[Backport 8.x] Add text structure API examples #3493

Merged 1 commit on Jan 9, 2025
57 changes: 34 additions & 23 deletions output/openapi/elasticsearch-openapi.json

Large diffs are not rendered by default.

78 changes: 47 additions & 31 deletions output/schema/schema.json

Large diffs are not rendered by default.

@@ -26,6 +26,22 @@ import { EcsCompatibilityType, FormatType } from '../_types/Structure'
 /**
  * Find the structure of a text field.
  * Find the structure of a text field in an Elasticsearch index.
+ *
+ * This API provides a starting point for extracting further information from log messages already ingested into Elasticsearch.
+ * For example, if you have ingested data into a very simple index that has just `@timestamp` and message fields, you can use this API to see what common structure exists in the message field.
+ *
+ * The response from the API contains:
+ *
+ * * Sample messages.
+ * * Statistics that reveal the most common values for all fields detected within the text and basic numeric statistics for numeric fields.
+ * * Information about the structure of the text, which is useful when you write ingest configurations to index it or similarly formatted text.
+ * * Appropriate mappings for an Elasticsearch index, which you could use to ingest the text.
+ *
+ * All this information can be calculated by the structure finder with no guidance.
+ * However, you can optionally override some of the decisions about the text structure by specifying one or more query parameters.
+ *
+ * If the structure finder produces unexpected results, specify the `explain` query parameter and an explanation will appear in the response.
+ * It helps determine why the returned structure was chosen.
  * @rest_spec_name text_structure.find_field_structure
  * @availability stack stability=stable visibility=public
  * @cluster_privileges monitor_text_structure
@@ -63,7 +79,7 @@ interface Request extends RequestBase {
    */
   ecs_compatibility?: EcsCompatibilityType
   /**
-   * If true, the response includes a field named `explanation`, which is an array of strings that indicate how the structure finder produced its result.
+   * If `true`, the response includes a field named `explanation`, which is an array of strings that indicate how the structure finder produced its result.
    * @server_default false
    */
   explain?: boolean
@@ -99,7 +115,7 @@ interface Request extends RequestBase {
   /**
    * If the format is `delimited`, you can specify whether values between delimiters should have whitespace trimmed from them.
    * If this parameter is not specified and the delimiter is pipe (`|`), the default value is true.
-   * Otherwise, the default value is false.
+   * Otherwise, the default value is `false`.
    */
   should_trim_fields?: boolean
   /**
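The query parameters documented in this specification are passed on the request URL. As a rough illustration, the sketch below assembles a `find_field_structure` path from `index`, `field`, and the optional `explain` and `should_trim_fields` parameters. The interface and function names are hypothetical and are not part of the specification or of any Elastic client.

```typescript
// Hypothetical helper, not part of the specification or any Elastic client.
// Parameter names mirror the query parameters documented in this pull request.
interface FindFieldStructureParams {
  index: string
  field: string
  explain?: boolean
  should_trim_fields?: boolean
}

function buildFindFieldStructurePath(params: FindFieldStructureParams): string {
  const pairs: string[] = [
    `index=${encodeURIComponent(params.index)}`,
    `field=${encodeURIComponent(params.field)}`
  ]
  // Optional parameters are sent only when the caller sets them,
  // so the documented server-side defaults apply otherwise.
  if (params.explain !== undefined) {
    pairs.push(`explain=${params.explain}`)
  }
  if (params.should_trim_fields !== undefined) {
    pairs.push(`should_trim_fields=${params.should_trim_fields}`)
  }
  return `_text_structure/find_field_structure?${pairs.join('&')}`
}

const examplePath = buildFindFieldStructurePath({
  index: 'test-logs',
  field: 'message',
  explain: true
})
// examplePath is '_text_structure/find_field_structure?index=test-logs&field=message&explain=true'
```

Leaving `explain` unset keeps the URL short and relies on the documented server default of `false`.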
@@ -0,0 +1,65 @@
# summary:
description: A successful response from `GET _text_structure/find_field_structure?index=test-logs&field=message`.
# type: response
# response_code: ''
value:
"{\n \"num_lines_analyzed\" : 22,\n \"num_messages_analyzed\" : 22,\n \"\
sample_start\" : \"[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider]\
\ [laptop] Java vector incubator API enabled; uses preferredBitSize=128\\n[2024-03-05T10:52:41,038][INFO\
\ ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]\\n\",\n \
\ \"charset\" : \"UTF-8\",\n \"format\" : \"semi_structured_text\",\n \"multiline_start_pattern\"\
\ : \"^\\\\[\\\\b\\\\d{4}-\\\\d{2}-\\\\d{2}[T ]\\\\d{2}:\\\\d{2}\",\n \"grok_pattern\"\
\ : \"\\\\[%{TIMESTAMP_ISO8601:timestamp}\\\\]\\\\[%{LOGLEVEL:loglevel} \\\\]\\\\\
[.*\",\n \"ecs_compatibility\" : \"disabled\",\n \"timestamp_field\" : \"timestamp\"\
,\n \"joda_timestamp_formats\" : [\n \"ISO8601\"\n ],\n \"java_timestamp_formats\"\
\ : [\n \"ISO8601\"\n ],\n \"need_client_timezone\" : true,\n \"mappings\"\
\ : {\n \"properties\" : {\n \"@timestamp\" : {\n \"type\" : \"date\"\
\n },\n \"loglevel\" : {\n \"type\" : \"keyword\"\n },\n \
\ \"message\" : {\n \"type\" : \"text\"\n }\n }\n },\n \"ingest_pipeline\"\
\ : {\n \"description\" : \"Ingest pipeline created by text structure finder\"\
,\n \"processors\" : [\n {\n \"grok\" : {\n \"field\" :\
\ \"message\",\n \"patterns\" : [\n \"\\\\[%{TIMESTAMP_ISO8601:timestamp}\\\
\\]\\\\[%{LOGLEVEL:loglevel} \\\\]\\\\[.*\"\n ],\n \"ecs_compatibility\"\
\ : \"disabled\"\n }\n },\n {\n \"date\" : {\n \
\ \"field\" : \"timestamp\",\n \"timezone\" : \"{{ event.timezone }}\"\
,\n \"formats\" : [\n \"ISO8601\"\n ]\n }\n\
\ },\n {\n \"remove\" : {\n \"field\" : \"timestamp\"\n\
\ }\n }\n ]\n },\n \"field_stats\" : {\n \"loglevel\" : {\n\
\ \"count\" : 22,\n \"cardinality\" : 1,\n \"top_hits\" : [\n \
\ {\n \"value\" : \"INFO\",\n \"count\" : 22\n }\n\
\ ]\n },\n \"message\" : {\n \"count\" : 22,\n \"cardinality\"\
\ : 22,\n \"top_hits\" : [\n {\n \"value\" : \"[2024-03-05T10:52:36,256][INFO\
\ ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled;\
\ uses preferredBitSize=128\",\n \"count\" : 1\n },\n {\n\
\ \"value\" : \"[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService \
\ ] [laptop] loaded module [repository-url]\",\n \"count\" : 1\n \
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,042][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [rest-root]\",\n \"count\" : 1\n \
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [ingest-user-agent]\",\n \"count\" : 1\n\
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [x-pack-core]\",\n \"count\" : 1\n \
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [x-pack-redact]\",\n \"count\" : 1\n \
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [lang-painless]]\",\n \"count\" : 1\n \
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [repository-s3]\",\n \"count\" : 1\n \
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [x-pack-analytics]\",\n \"count\" : 1\n\
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [x-pack-autoscaling]\",\n \"count\" : 1\n\
\ }\n ]\n },\n \"timestamp\" : {\n \"count\" : 22,\n \
\ \"cardinality\" : 14,\n \"earliest\" : \"2024-03-05T10:52:36,256\",\n \
\ \"latest\" : \"2024-03-05T10:52:49,199\",\n \"top_hits\" : [\n \
\ {\n \"value\" : \"2024-03-05T10:52:41,044\",\n \"count\" : 6\n\
\ },\n {\n \"value\" : \"2024-03-05T10:52:41,043\",\n \
\ \"count\" : 3\n },\n {\n \"value\" : \"2024-03-05T10:52:41,059\"\
,\n \"count\" : 2\n },\n {\n \"value\" : \"2024-03-05T10:52:36,256\"\
,\n \"count\" : 1\n },\n {\n \"value\" : \"2024-03-05T10:52:41,038\"\
,\n \"count\" : 1\n },\n {\n \"value\" : \"2024-03-05T10:52:41,042\"\
,\n \"count\" : 1\n },\n {\n \"value\" : \"2024-03-05T10:52:43,291\"\
,\n \"count\" : 1\n },\n {\n \"value\" : \"2024-03-05T10:52:46,098\"\
,\n \"count\" : 1\n },\n {\n \"value\" : \"2024-03-05T10:52:47,227\"\
,\n \"count\" : 1\n },\n {\n \"value\" : \"2024-03-05T10:52:47,259\"\
,\n \"count\" : 1\n }\n ]\n }\n }\n}"
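To make the `grok_pattern` in this response more concrete, here is a minimal sketch that applies an approximation of it to the first sample message. The two regular expressions are simplified stand-ins for the `TIMESTAMP_ISO8601` and `LOGLEVEL` grok patterns, chosen for this illustration only. The real grok definitions accept more variants, and `parseLogLine` is a hypothetical name.

```typescript
// Simplified regular-expression stand-ins for the grok patterns in the response.
// The real TIMESTAMP_ISO8601 and LOGLEVEL grok definitions accept more variants.
const TIMESTAMP_ISO8601 = '\\d{4}-\\d{2}-\\d{2}[T ]\\d{2}:\\d{2}:\\d{2}[.,]\\d+'
const LOGLEVEL = '[A-Za-z]+'

// Approximates: \[%{TIMESTAMP_ISO8601:timestamp}\]\[%{LOGLEVEL:loglevel} \]\[.*
const linePattern = new RegExp(`^\\[(${TIMESTAMP_ISO8601})\\]\\[(${LOGLEVEL}) \\]\\[.*`)

interface ParsedLine {
  timestamp: string
  loglevel: string
}

// Returns the two captured fields, or null when the line does not match.
function parseLogLine(line: string): ParsedLine | null {
  const match = linePattern.exec(line)
  if (match === null) {
    return null
  }
  return { timestamp: match[1], loglevel: match[2] }
}

const sampleLine =
  '[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled; uses preferredBitSize=128'
const parsed = parseLogLine(sampleLine)
// parsed is { timestamp: '2024-03-05T10:52:36,256', loglevel: 'INFO' }
```

In the returned `ingest_pipeline`, the `grok` processor performs this extraction, after which the `date` processor parses `timestamp` and the `remove` processor drops it.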
@@ -29,6 +29,7 @@ import { EcsCompatibilityType, FormatType } from '../_types/Structure'
  *
  * This API provides a starting point for ingesting data into Elasticsearch in a format that is suitable for subsequent use with other Elastic Stack functionality.
  * Use this API rather than the find text structure API if your input text has already been split up into separate messages by some other process.
+ *
  * The response from the API contains:
  *
  * * Sample messages.
@@ -38,6 +39,9 @@ import { EcsCompatibilityType, FormatType } from '../_types/Structure'
  *
  * All this information can be calculated by the structure finder with no guidance.
  * However, you can optionally override some of the decisions about the text structure by specifying one or more query parameters.
+ *
+ * If the structure finder produces unexpected results, specify the `explain` query parameter and an explanation will appear in the response.
+ * It helps determine why the returned structure was chosen.
  * @rest_spec_name text_structure.find_message_structure
  * @availability stack stability=stable visibility=public
  * @cluster_privileges monitor_text_structure
@@ -71,7 +75,8 @@ interface Request extends RequestBase {
    * @server_default false
    */
   explain?: boolean
-  /** The high level structure of the text.
+  /**
+   * The high level structure of the text.
    * By default, the API chooses the format.
    * In this default scenario, all rows must have the same number of fields for a delimited format to be detected.
    * If the format is `delimited` and the delimiter is not set, however, the API tolerates up to 5% of rows that have a different number of columns than the first row.
@@ -94,7 +99,7 @@ interface Request extends RequestBase {
   /**
    * If the format is `delimited`, you can specify whether values between delimiters should have whitespace trimmed from them.
    * If this parameter is not specified and the delimiter is pipe (`|`), the default value is true.
-   * Otherwise, the default value is false.
+   * Otherwise, the default value is `false`.
    */
   should_trim_fields?: boolean
   /**
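The default for `should_trim_fields` depends on the delimiter, which is easy to misread in prose. The sketch below restates the documented rule as a small function. The server applies this rule itself, so `effectiveShouldTrimFields` is a hypothetical helper for illustration only.

```typescript
// Hypothetical helper that restates the documented default;
// Elasticsearch applies this rule on the server side.
function effectiveShouldTrimFields(delimiter: string, shouldTrimFields?: boolean): boolean {
  // An explicit value from the caller always wins.
  if (shouldTrimFields !== undefined) {
    return shouldTrimFields
  }
  // Not specified: true only when the delimiter is pipe.
  return delimiter === '|'
}

const pipeDefault = effectiveShouldTrimFields('|') // true
const commaDefault = effectiveShouldTrimFields(',') // false
const pipeOverride = effectiveShouldTrimFields('|', false) // false
```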
@@ -0,0 +1,36 @@
# summary:
# method_request: POST _text_structure/find_message_structure
description: >
Run `POST _text_structure/find_message_structure` to analyze Elasticsearch log files.
# type: request
value:
"{\n \"messages\": [\n \"[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider]\
\ [laptop] Java vector incubator API enabled; uses preferredBitSize=128\",\n \
\ \"[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService ] [laptop] loaded\
\ module [repository-url]\",\n \"[2024-03-05T10:52:41,042][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [rest-root]\",\n \"[2024-03-05T10:52:41,043][INFO\
\ ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-core]\",\n \"[2024-03-05T10:52:41,043][INFO\
\ ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-redact]\",\n \"\
[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module\
\ [ingest-user-agent]\",\n \"[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [x-pack-monitoring]\",\n \"[2024-03-05T10:52:41,044][INFO\
\ ][o.e.p.PluginsService ] [laptop] loaded module [repository-s3]\",\n \"\
[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module\
\ [x-pack-analytics]\",\n \"[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [x-pack-ent-search]\",\n \"[2024-03-05T10:52:41,044][INFO\
\ ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-autoscaling]\",\n\
\ \"[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded\
\ module [lang-painless]]\",\n \"[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [lang-expression]\",\n \"[2024-03-05T10:52:41,059][INFO\
\ ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-eql]\",\n \"[2024-03-05T10:52:43,291][INFO\
\ ][o.e.e.NodeEnvironment ] [laptop] heap size [16gb], compressed ordinary object\
\ pointers [true]\",\n \"[2024-03-05T10:52:46,098][INFO ][o.e.x.s.Security \
\ ] [laptop] Security is enabled\",\n \"[2024-03-05T10:52:47,227][INFO\
\ ][o.e.x.p.ProfilingPlugin ] [laptop] Profiling is enabled\",\n \"[2024-03-05T10:52:47,259][INFO\
\ ][o.e.x.p.ProfilingPlugin ] [laptop] profiling index templates will not be installed\
\ or reinstalled\",\n \"[2024-03-05T10:52:47,755][INFO ][o.e.i.r.RecoverySettings\
\ ] [laptop] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]\"\
,\n \"[2024-03-05T10:52:47,787][INFO ][o.e.d.DiscoveryModule ] [laptop] using\
\ discovery type [multi-node] and seed hosts providers [settings]\",\n \"[2024-03-05T10:52:49,188][INFO\
\ ][o.e.n.Node ] [laptop] initialized\",\n \"[2024-03-05T10:52:49,199][INFO\
\ ][o.e.n.Node ] [laptop] starting ...\"\n ]\n}"
@@ -0,0 +1,65 @@
# summary:
description: A successful response from `POST _text_structure/find_message_structure`.
# type: response
# response_code: ''
value:
"{\n \"num_lines_analyzed\" : 22,\n \"num_messages_analyzed\" : 22,\n \"\
sample_start\" : \"[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider]\
\ [laptop] Java vector incubator API enabled; uses preferredBitSize=128\\n[2024-03-05T10:52:41,038][INFO\
\ ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]\\n\",\n \
\ \"charset\" : \"UTF-8\",\n \"format\" : \"semi_structured_text\",\n \"multiline_start_pattern\"\
\ : \"^\\\\[\\\\b\\\\d{4}-\\\\d{2}-\\\\d{2}[T ]\\\\d{2}:\\\\d{2}\",\n \"grok_pattern\"\
\ : \"\\\\[%{TIMESTAMP_ISO8601:timestamp}\\\\]\\\\[%{LOGLEVEL:loglevel} \\\\]\\\\\
[.*\",\n \"ecs_compatibility\" : \"disabled\",\n \"timestamp_field\" : \"timestamp\"\
,\n \"joda_timestamp_formats\" : [\n \"ISO8601\"\n ],\n \"java_timestamp_formats\"\
\ : [\n \"ISO8601\"\n ],\n \"need_client_timezone\" : true,\n \"mappings\"\
\ : {\n \"properties\" : {\n \"@timestamp\" : {\n \"type\" : \"date\"\
\n },\n \"loglevel\" : {\n \"type\" : \"keyword\"\n },\n \
\ \"message\" : {\n \"type\" : \"text\"\n }\n }\n },\n \"ingest_pipeline\"\
\ : {\n \"description\" : \"Ingest pipeline created by text structure finder\"\
,\n \"processors\" : [\n {\n \"grok\" : {\n \"field\" :\
\ \"message\",\n \"patterns\" : [\n \"\\\\[%{TIMESTAMP_ISO8601:timestamp}\\\
\\]\\\\[%{LOGLEVEL:loglevel} \\\\]\\\\[.*\"\n ],\n \"ecs_compatibility\"\
\ : \"disabled\"\n }\n },\n {\n \"date\" : {\n \
\ \"field\" : \"timestamp\",\n \"timezone\" : \"{{ event.timezone }}\"\
,\n \"formats\" : [\n \"ISO8601\"\n ]\n }\n\
\ },\n {\n \"remove\" : {\n \"field\" : \"timestamp\"\n\
\ }\n }\n ]\n },\n \"field_stats\" : {\n \"loglevel\" : {\n\
\ \"count\" : 22,\n \"cardinality\" : 1,\n \"top_hits\" : [\n \
\ {\n \"value\" : \"INFO\",\n \"count\" : 22\n }\n\
\ ]\n },\n \"message\" : {\n \"count\" : 22,\n \"cardinality\"\
\ : 22,\n \"top_hits\" : [\n {\n \"value\" : \"[2024-03-05T10:52:36,256][INFO\
\ ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled;\
\ uses preferredBitSize=128\",\n \"count\" : 1\n },\n {\n\
\ \"value\" : \"[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService \
\ ] [laptop] loaded module [repository-url]\",\n \"count\" : 1\n \
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,042][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [rest-root]\",\n \"count\" : 1\n \
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [ingest-user-agent]\",\n \"count\" : 1\n\
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [x-pack-core]\",\n \"count\" : 1\n \
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [x-pack-redact]\",\n \"count\" : 1\n \
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [lang-painless]]\",\n \"count\" : 1\n \
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [repository-s3]\",\n \"count\" : 1\n \
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [x-pack-analytics]\",\n \"count\" : 1\n\
\ },\n {\n \"value\" : \"[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService\
\ ] [laptop] loaded module [x-pack-autoscaling]\",\n \"count\" : 1\n\
\ }\n ]\n },\n \"timestamp\" : {\n \"count\" : 22,\n \
\ \"cardinality\" : 14,\n \"earliest\" : \"2024-03-05T10:52:36,256\",\n \
\ \"latest\" : \"2024-03-05T10:52:49,199\",\n \"top_hits\" : [\n \
\ {\n \"value\" : \"2024-03-05T10:52:41,044\",\n \"count\" : 6\n\
\ },\n {\n \"value\" : \"2024-03-05T10:52:41,043\",\n \
\ \"count\" : 3\n },\n {\n \"value\" : \"2024-03-05T10:52:41,059\"\
,\n \"count\" : 2\n },\n {\n \"value\" : \"2024-03-05T10:52:36,256\"\
,\n \"count\" : 1\n },\n {\n \"value\" : \"2024-03-05T10:52:41,038\"\
,\n \"count\" : 1\n },\n {\n \"value\" : \"2024-03-05T10:52:41,042\"\
,\n \"count\" : 1\n },\n {\n \"value\" : \"2024-03-05T10:52:43,291\"\
,\n \"count\" : 1\n },\n {\n \"value\" : \"2024-03-05T10:52:46,098\"\
,\n \"count\" : 1\n },\n {\n \"value\" : \"2024-03-05T10:52:47,227\"\
,\n \"count\" : 1\n },\n {\n \"value\" : \"2024-03-05T10:52:47,259\"\
,\n \"count\" : 1\n }\n ]\n }\n }\n}"
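The `field_stats` object is the part of the response most likely to be consumed programmatically. The sketch below types an abridged copy of it and picks out fields that carry a single distinct value, such as `loglevel` here. The interface names and the `constantFields` helper are hypothetical, and the field shapes are inferred from this one example response rather than from the full specification.

```typescript
// Shapes inferred from the example response; the full specification may allow more fields.
interface TopHit {
  value: string | number
  count: number
}

interface FieldStats {
  count: number
  cardinality: number
  top_hits: TopHit[]
  earliest?: string
  latest?: string
}

// Abridged copy of the `field_stats` object from the response above
const fieldStats: Record<string, FieldStats> = {
  loglevel: { count: 22, cardinality: 1, top_hits: [{ value: 'INFO', count: 22 }] },
  timestamp: {
    count: 22,
    cardinality: 14,
    earliest: '2024-03-05T10:52:36,256',
    latest: '2024-03-05T10:52:49,199',
    top_hits: [
      { value: '2024-03-05T10:52:41,044', count: 6 },
      { value: '2024-03-05T10:52:41,043', count: 3 }
    ]
  }
}

// Names of fields with a single distinct value across all analyzed messages
function constantFields(stats: Record<string, FieldStats>): string[] {
  return Object.keys(stats).filter((name) => stats[name].cardinality === 1)
}

const constants = constantFields(fieldStats) // ['loglevel']
```

A field that is constant across the sample may not be constant in the full data set, so a result like this is a hint about the sample rather than a property of the index.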