@@ -20,13 +20,28 @@ meaning and user intent, rather than exact keyword matches.
ELSER is an out-of-domain model which means it does not require fine-tuning on
your own data, making it adaptable for various use cases out of the box.

+
+ [discrete]
+ [[elser-tokens]]
+ == Tokens - not synonyms
+
ELSER expands the indexed and searched passages into collections of terms that
are learned to co-occur frequently within a diverse set of training data. The
terms that the text is expanded into by the model _are not_ synonyms for the
- search terms; they are learned associations. These expanded terms are weighted
- as some of them are more significant than others. Then the {es}
- {ref}/rank-features.html[rank features field type] is used to store the terms
- and weights at index time, and to search against later.
+ search terms; they are learned associations capturing relevance. These expanded
+ terms are weighted as some of them are more significant than others. Then the
+ {es} {ref}/rank-features.html[rank features] field type is used to store the
+ terms and weights at index time, and to search against later.
+
+ This approach provides a more understandable search experience compared to
+ vector embeddings. However, attempting to directly interpret the tokens and
+ weights can be misleading, as the expansion essentially results in a vector in a
+ very high-dimensional space. Consequently, certain tokens, especially those with
+ low weight, contain information that is intertwined with other low-weight tokens
+ in the representation. In this regard, they function similarly to a dense vector
+ representation, making it challenging to separate their individual
+ contributions. This complexity can potentially lead to misinterpretations if not
+ carefully considered during analysis.


[discrete]
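
To make the added "Tokens - not synonyms" paragraph concrete, here is a minimal sketch of how weighted token expansions might be compared. It assumes the relevance score behaves like a dot product over the tokens shared by the query expansion and the document expansion; that is an assumption for illustration, not a statement of how {es} scores rank features internally. The token names, weights, and the helpers `sparse_dot` and `contributions` are all hypothetical.

```python
# Hypothetical token expansions; real ELSER output will differ.
doc_tokens = {
    "comfortable": 1.2,
    "sofa": 1.5,
    "couch": 0.9,
    "furniture": 0.6,
    "living": 0.3,
}
query_tokens = {"couch": 1.4, "sofa": 0.8, "seat": 0.5}


def sparse_dot(query, doc):
    """Sum the weight products over tokens present in both expansions."""
    return sum(weight * doc[token] for token, weight in query.items() if token in doc)


def contributions(query, doc):
    """Return (token, weight product) pairs for shared tokens, largest first."""
    parts = {token: weight * doc[token] for token, weight in query.items() if token in doc}
    return sorted(parts.items(), key=lambda item: item[1], reverse=True)


score = sparse_dot(query_tokens, doc_tokens)
for token, part in contributions(query_tokens, doc_tokens):
    print(f"{token}: {part:.2f}")
print(f"score: {score:.2f}")
```

In this toy case each shared token has a readable contribution, which is the "more understandable" part. With a real expansion there can be many low-weight tokens, and, as the new paragraph warns, their individual contributions are not meaningful in isolation.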