@@ -20,15 +20,30 @@ meaning and user intent, rather than exact keyword matches.
ELSER is an out-of-domain model which means it does not require fine-tuning on
your own data, making it adaptable for various use cases out of the box.
+
+ [discrete]
+ [[elser-tokens]]
+ == Tokens - not synonyms
+

ELSER expands the indexed and searched passages into collections of terms that
are learned to co-occur frequently within a diverse set of training data. The
terms that the text is expanded into by the model _are not_ synonyms for the
- search terms; they are learned associations. These expanded terms are weighted
- as some of them are more significant than others. Then the {es}
- {ref}/sparse-vector.html[sparse vector]
+ search terms; they are learned associations capturing relevance. These expanded
+ terms are weighted as some of them are more significant than others. Then the
+ {es} {ref}/sparse-vector.html[sparse vector]
(or {ref}/rank-features.html[rank features]) field type is used to store the
terms and weights at index time, and to search against later.

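+ As a sketch of how this works in practice, the expanded terms can be stored in
+ a `sparse_vector` field and searched with a `text_expansion` query. The index
+ name, field name, and model ID below are illustrative, not prescriptive:
+
+ [source,console]
+ ----
+ PUT my-index
+ {
+   "mappings": {
+     "properties": {
+       "ml.tokens": { "type": "sparse_vector" },
+       "text": { "type": "text" }
+     }
+   }
+ }
+
+ GET my-index/_search
+ {
+   "query": {
+     "text_expansion": {
+       "ml.tokens": {
+         "model_id": ".elser_model_2",
+         "model_text": "How is the weather in Jamaica?"
+       }
+     }
+   }
+ }
+ ----
+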
+ This approach provides a more understandable search experience compared to
+ vector embeddings. However, attempting to directly interpret the tokens and
+ weights can be misleading, as the expansion essentially results in a vector in a
+ very high-dimensional space. Consequently, certain tokens, especially those with
+ low weight, contain information that is intertwined with other low-weight tokens
+ in the representation. In this regard, they function similarly to a dense vector
+ representation, making it challenging to separate their individual
+ contributions. This complexity can potentially lead to misinterpretations if not
+ carefully considered during analysis.
+
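+ For example, the stored expansion for a short passage might look like the
+ following (the tokens and weights here are invented for illustration only):
+
+ [source,js]
+ ----
+ "ml.tokens": {
+   "weather": 1.53,
+   "jamaica": 1.72,
+   "forecast": 0.98,
+   "climate": 0.61,
+   "island": 0.23,
+   "music": 0.08
+ }
+ ----
+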

[discrete]
[[elser-req]]