Commit 5f98fec

Add overlay for tokenizer in ML analysis_config
1 parent 6fbdb67 commit 5f98fec

2 files changed: +45 -3 lines changed

docs/overlays/elasticsearch-openapi-overlays.yaml

Lines changed: 22 additions & 1 deletion
@@ -56,4 +56,25 @@ actions:
           By default, this property has the following value: `{"match_all": {"boost": 1}}`.
         externalDocs:
           url: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
-          description: Query DSL
+          description: Query DSL
+  - target: "$.components['schemas']['ml._types:CategorizationAnalyzerDefinition'].properties.tokenizer"
+    description: Remove tokenizer object from ML anomaly detection analysis config
+    remove: true
+  - target: "$.components['schemas']['ml._types:CategorizationAnalyzerDefinition'].properties"
+    description: Re-add a simplified tokenizer object in ML anomaly detection analysis config
+    update:
+      tokenizer:
+        x-abbreviated: true
+        oneOf:
+          - type: object
+          - type: string
+        description: >
+          The name or definition of the tokenizer to use after character filters are applied.
+          This property is compulsory if `categorization_analyzer` is specified as an object.
+          Machine learning provides a tokenizer called `ml_standard` that tokenizes in a way that has been determined to produce good categorization results on a variety of log file formats for logs in English.
+          If you want to use that tokenizer but change the character or token filters, specify `"tokenizer": "ml_standard"` in your `categorization_analyzer`.
+          Additionally, the `ml_classic` tokenizer is available, which tokenizes in the same way as the non-customizable tokenizer in old versions of the product (before 6.2).
+          `ml_classic` was the default categorization tokenizer in versions 6.2 to 7.13, so if you need categorization identical to the default for jobs created in these versions, specify `"tokenizer": "ml_classic"` in your `categorization_analyzer`.
+        externalDocs:
+          url: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html
+          description: Tokenizer reference
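
The two actions added above work as a pair: the first deletes the existing `tokenizer` property and the second adds a simplified one under the same parent. A minimal Python sketch of why the removal comes first is shown below. It assumes that an overlay `update` merges its value into the targeted object rather than replacing it, and it walks plain dictionary keys instead of evaluating JSONPath. The helper names and the `$ref` value in the stand-in schema are hypothetical, not taken from the repository.

```python
import copy

SCHEMA_KEY = "ml._types:CategorizationAnalyzerDefinition"


def deep_merge(target, patch):
    """Recursively merge `patch` into `target` in place."""
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


def apply_remove(doc, path):
    """Mimic an overlay `remove: true` action for a list-of-keys path."""
    parent = doc
    for key in path[:-1]:
        parent = parent[key]
    parent.pop(path[-1], None)


def apply_update(doc, path, patch):
    """Mimic an overlay `update` action (assumed to merge, not replace)."""
    target = doc
    for key in path:
        target = target[key]
    deep_merge(target, patch)


def tokenizer_of(doc):
    """Return the tokenizer property schema from the stand-in document."""
    return doc["components"]["schemas"][SCHEMA_KEY]["properties"]["tokenizer"]


# Hypothetical stand-in for the generated schema; the real one is far deeper.
original = {
    "components": {"schemas": {SCHEMA_KEY: {"properties": {
        "char_filter": {"type": "array"},
        "tokenizer": {"$ref": "#/components/schemas/hypothetical.Tokenizer"},
    }}}}
}

# The simplified property from the overlay (description omitted for brevity).
simplified = {"tokenizer": {
    "x-abbreviated": True,
    "oneOf": [{"type": "object"}, {"type": "string"}],
}}

props_path = ["components", "schemas", SCHEMA_KEY, "properties"]

# Remove, then update: only the simplified definition remains.
result = copy.deepcopy(original)
apply_remove(result, props_path + ["tokenizer"])
apply_update(result, props_path, simplified)

# Update alone: the old reference survives next to the new keys.
merged_only = copy.deepcopy(original)
apply_update(merged_only, props_path, simplified)
```

Under that merge assumption, skipping the removal would leave the original deep definition in place alongside the abbreviated one, which is what the overlay is trying to avoid.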

docs/overlays/elasticsearch-serverless-openapi-overlays.yaml

Lines changed: 23 additions & 2 deletions
@@ -30,9 +30,10 @@ actions:
       x-beta: true
 # Remove and annotate items that are not shown in Bump.sh due to depth limits
   - target: "$.components['schemas']['ml._types:Datafeed'].properties.query"
+    description: Remove query object from ML anomaly detection datafeed
     remove: true
   - target: "$.components['schemas']['ml._types:Datafeed'].properties"
-    description: Re-add a simplified query object
+    description: Re-add a simplified query object in ML anomaly detection datafeed
     update:
       query:
         x-abbreviated: true
@@ -45,4 +46,24 @@ actions:
         externalDocs:
           url: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
           description: Query DSL
-
+  - target: "$.components['schemas']['ml._types:CategorizationAnalyzerDefinition'].properties.tokenizer"
+    description: Remove tokenizer object from ML anomaly detection analysis config
+    remove: true
+  - target: "$.components['schemas']['ml._types:CategorizationAnalyzerDefinition'].properties"
+    description: Re-add a simplified tokenizer object in ML anomaly detection analysis config
+    update:
+      tokenizer:
+        x-abbreviated: true
+        oneOf:
+          - type: object
+          - type: string
+        description: >
+          The name or definition of the tokenizer to use after character filters are applied.
+          This property is compulsory if `categorization_analyzer` is specified as an object.
+          Machine learning provides a tokenizer called `ml_standard` that tokenizes in a way that has been determined to produce good categorization results on a variety of log file formats for logs in English.
+          If you want to use that tokenizer but change the character or token filters, specify `"tokenizer": "ml_standard"` in your `categorization_analyzer`.
+          Additionally, the `ml_classic` tokenizer is available, which tokenizes in the same way as the non-customizable tokenizer in old versions of the product (before 6.2).
+          `ml_classic` was the default categorization tokenizer in versions 6.2 to 7.13, so if you need categorization identical to the default for jobs created in these versions, specify `"tokenizer": "ml_classic"` in your `categorization_analyzer`.
+        externalDocs:
+          url: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html
+          description: Tokenizer reference
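
The tokenizer description above says that `tokenizer` is required whenever `categorization_analyzer` is given as an object, and that `ml_standard` or `ml_classic` can be named while the character or token filters are customized. The Python sketch below illustrates this by building such an object as a plain dictionary. The helper name, the `pattern_replace` example filter, and its pattern are illustrative assumptions; only the `tokenizer` values come from the description, and `char_filter` and `filter` are assumed to be the sibling property names.

```python
import json


def build_categorization_analyzer(tokenizer="ml_standard",
                                  char_filters=None,
                                  token_filters=None):
    """Build a categorization_analyzer object (hypothetical helper).

    `tokenizer` is always set because the property is compulsory when the
    analyzer is specified as an object rather than as a name.
    """
    analyzer = {"tokenizer": tokenizer}
    if char_filters:
        analyzer["char_filter"] = list(char_filters)
    if token_filters:
        analyzer["filter"] = list(token_filters)
    return analyzer


# Keep the ML-provided tokenizer but add a custom character filter.
# The filter type and pattern are examples only.
custom = build_categorization_analyzer(
    tokenizer="ml_standard",
    char_filters=[{"type": "pattern_replace",
                   "pattern": "\\[statement:[^\\]]+\\]"}],
)

# Name ml_classic to match the default for jobs created in 6.2 to 7.13.
classic = build_categorization_analyzer(tokenizer="ml_classic")

print(json.dumps({"categorization_analyzer": custom}, indent=2))
```

Both the string form and the object form of `tokenizer` fit the `oneOf` in the simplified schema; this sketch only uses the string form.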
