@@ -20,13 +20,36 @@ meaning and user intent, rather than exact keyword matches.
ELSER is an out-of-domain model which means it does not require fine-tuning on
your own data, making it adaptable for various use cases out of the box.

+
+ [discrete]
+ [[elser-tokens]]
+ == Tokens - not synonyms
+
ELSER expands the indexed and searched passages into collections of terms that
are learned to co-occur frequently within a diverse set of training data. The
terms that the text is expanded into by the model _are not_ synonyms for the
+ <<<<<<< HEAD
search terms; they are learned associations. These expanded terms are weighted
as some of them are more significant than others. Then the {es}
{ref}/rank-features.html[rank features field type] is used to store the terms
and weights at index time, and to search against later.
+ =======
+ search terms; they are learned associations capturing relevance. These expanded
+ terms are weighted as some of them are more significant than others. Then the
+ {es} {ref}/sparse-vector.html[sparse vector]
+ (or {ref}/rank-features.html[rank features]) field type is used to store the
+ terms and weights at index time, and to search against later.
+ >>>>>>> f9c8a202 ([DOCS] Adds section about tokens to ELSER conceptual (#2568))
+
+ This approach provides a more understandable search experience compared to
+ dense vector embeddings. However, directly interpreting the tokens and weights
+ can be misleading, as the expansion essentially results in a vector in a very
+ high-dimensional space. Consequently, certain tokens, especially those with
+ low weight, contain information that is intertwined with other low-weight tokens
+ in the representation. In this regard, they function similarly to a dense vector
+ representation, making it challenging to separate their individual
+ contributions. This complexity can lead to misinterpretation if it is not
+ carefully considered during analysis.
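
To make the idea of weighted token expansions concrete, the following is a minimal sketch, not ELSER itself and not how {es} scores documents internally. It assumes that a query and a passage have each already been expanded into a token-to-weight mapping and that relevance is approximated by a sparse dot product over the shared tokens. The tokens, the weights, and the helper names `sparse_dot` and `contributions` are all hypothetical and chosen only for illustration.

```python
# Hypothetical expansions (token -> weight). These values are invented for
# illustration and are not real ELSER output.
doc_tokens = {"pizza": 1.9, "italian": 1.1, "cheese": 0.8, "oven": 0.3, "dough": 0.2}
query_tokens = {"pizza": 1.5, "restaurant": 1.2, "italian": 0.6, "dough": 0.1}


def sparse_dot(query, doc):
    """Sum the weight products over tokens present in both expansions."""
    return sum(weight * doc[token] for token, weight in query.items() if token in doc)


def contributions(query, doc):
    """Return each shared token's share of the score, largest first."""
    parts = {token: weight * doc[token] for token, weight in query.items() if token in doc}
    return sorted(parts.items(), key=lambda item: item[1], reverse=True)


score = sparse_dot(query_tokens, doc_tokens)
for token, part in contributions(query_tokens, doc_tokens):
    print(f"{token}: {part:.2f}")
print(f"total: {score:.2f}")
```

In this toy example a few high-weight tokens dominate the score, which is what makes the representation feel readable. In a real expansion with many low-weight tokens, the individual contributions are much harder to interpret in isolation.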
[discrete]