Commit 94237ee

[DOCS] Puts lang ident example back. (#2608)
1 parent 1829c7b commit 94237ee

1 file changed, +114 -2 lines changed

docs/en/stack/ml/nlp/ml-nlp-lang-ident.asciidoc

Lines changed: 114 additions & 2 deletions
@@ -24,9 +24,10 @@ language traditionally uses. These languages are marked in the supported
 languages table (see below) with the `Latn` subtag. {lang-ident-cap} supports
 Unicode input.
 
+
 [discrete]
 [[ml-lang-ident-supported-languages]]
-=== Supported languages
+== Supported languages
 
 The table below contains the ISO codes and the English names of the languages
 that {lang-ident} supports. If a language has a 2-letter `ISO 639-1` code, the
@@ -82,8 +83,119 @@ script.
 <!-- lint enable -->
 ////
 
+
+[discrete]
+[[ml-lang-ident-example]]
+== Example of {lang-ident}
+
+In the following example, we feed the {lang-ident} trained model a short
+Hungarian text that contains diacritics and a couple of English words. The
+model identifies the text correctly as Hungarian with high probability.
+
+[source,js]
+----------------------------------
+POST _ingest/pipeline/_simulate
+{
+  "pipeline":{
+    "processors":[
+      {
+        "inference":{
+          "model_id":"lang_ident_model_1", <1>
+          "inference_config":{
+            "classification":{
+              "num_top_classes":5 <2>
+            }
+          },
+          "field_map":{
+          }
+        }
+      }
+    ]
+  },
+  "docs":[
+    {
+      "_source":{ <3>
+        "text":"Sziasztok! Ez egy rövid magyar szöveg. Nézzük, vajon sikerül-e azonosítania a language identification funkciónak? Annak ellenére is sikerülni fog, hogy a szöveg két angol szót is tartalmaz."
+      }
+    }
+  ]
+}
+----------------------------------
+//NOTCONSOLE
+
+<1> ID of the {lang-ident} trained model.
+<2> Specifies the number of languages to report by descending order of
+probability.
+<3> The source object that contains the text to identify.
+
+
+In the example above, the `num_top_classes` value indicates that only the top
+five languages (that is to say, the ones with the highest probability) are
+reported.
+
+The request returns the following response:
+
+[source,js]
+----------------------------------
+{
+  "docs" : [
+    {
+      "doc" : {
+        "_index" : "_index",
+        "_type" : "_doc",
+        "_id" : "_id",
+        "_source" : {
+          "text" : "Sziasztok! Ez egy rövid magyar szöveg. Nézzük, vajon sikerül-e azonosítania a language identification funkciónak? Annak ellenére is sikerülni fog, hogy a szöveg két angol szót is tartalmaz.",
+          "ml" : {
+            "inference" : {
+              "top_classes" : [ <1>
+                {
+                  "class_name" : "hu",
+                  "class_probability" : 0.9999936063740517,
+                  "class_score" : 0.9999936063740517
+                },
+                {
+                  "class_name" : "lv",
+                  "class_probability" : 2.5020248433413966E-6,
+                  "class_score" : 2.5020248433413966E-6
+                },
+                {
+                  "class_name" : "is",
+                  "class_probability" : 1.0150420723037688E-6,
+                  "class_score" : 1.0150420723037688E-6
+                },
+                {
+                  "class_name" : "ga",
+                  "class_probability" : 6.67935962773335E-7,
+                  "class_score" : 6.67935962773335E-7
+                },
+                {
+                  "class_name" : "tr",
+                  "class_probability" : 5.591166324774555E-7,
+                  "class_score" : 5.591166324774555E-7
+                }
+              ],
+              "predicted_value" : "hu", <2>
+              "model_id" : "lang_ident_model_1"
+            }
+          }
+        },
+        "_ingest" : {
+          "timestamp" : "2020-01-22T14:25:14.644912Z"
+        }
+      }
+    }
+  ]
+}
+----------------------------------
+//NOTCONSOLE
+
+<1> Contains scores for the most probable languages.
+<2> The ISO identifier of the language with the highest probability.
+
+
 [discrete]
 [[ml-lang-ident-readings]]
-=== Further reading
+== Further reading
 
 * {blog-ref}multilingual-search-using-language-identification-in-elasticsearch[Multilingual search using {lang-ident} in {es}]
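The request and response in this diff are plain JSON, so the same example can be driven from a script instead of the console. The following is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not part of the commit: `build_simulate_body` and `top_languages` are hypothetical helper names, the sample response is an abbreviated copy of the one in the diff (the `text` field and three of the five classes are omitted), and actually sending the body to `POST _ingest/pipeline/_simulate` is left to whichever Elasticsearch client you use.

```python
import json


# Hypothetical helper: builds the body of the POST _ingest/pipeline/_simulate
# request from the example. The defaults mirror the example's model ID and
# its num_top_classes value of 5.
def build_simulate_body(text, model_id="lang_ident_model_1", num_top_classes=5):
    return {
        "pipeline": {
            "processors": [
                {
                    "inference": {
                        "model_id": model_id,
                        "inference_config": {
                            "classification": {"num_top_classes": num_top_classes}
                        },
                        # Empty field map, as in the example request.
                        "field_map": {},
                    }
                }
            ]
        },
        "docs": [{"_source": {"text": text}}],
    }


# Hypothetical helper: for each simulated document, returns the
# predicted_value and the (class_name, class_probability) pairs from
# top_classes, in the order the response lists them.
def top_languages(response):
    results = []
    for entry in response["docs"]:
        inference = entry["doc"]["_source"]["ml"]["inference"]
        pairs = [
            (c["class_name"], c["class_probability"])
            for c in inference["top_classes"]
        ]
        results.append((inference["predicted_value"], pairs))
    return results


# Abbreviated copy of the example response (only two top classes kept).
sample_response = {
    "docs": [
        {
            "doc": {
                "_source": {
                    "ml": {
                        "inference": {
                            "top_classes": [
                                {"class_name": "hu", "class_probability": 0.9999936063740517},
                                {"class_name": "lv", "class_probability": 2.5020248433413966e-06},
                            ],
                            "predicted_value": "hu",
                            "model_id": "lang_ident_model_1",
                        }
                    }
                }
            }
        }
    ]
}

if __name__ == "__main__":
    body = build_simulate_body("Sziasztok! Ez egy rövid magyar szöveg.")
    # ensure_ascii=False keeps the Hungarian diacritics readable.
    print(json.dumps(body, ensure_ascii=False, indent=2))

    for predicted, pairs in top_languages(sample_response):
        print(predicted, pairs[0])
```

Returning `predicted_value` alongside the `top_classes` pairs makes it easy to check that the predicted language is also the highest-probability entry, as it is in the example response.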