Commit 34054ee

committed
DOCS-1147 and DOCS-1206 text search
DOCS-1206 ulimits Add note to Keyword Search Tutorial to link to Text Search DOCS-1147 text index sharded cluster and replica sets and reorg draft of the text-search usage fix information about hyphens and copy edits added more info DOCS-1147 text search
1 parent a5bb81e commit 34054ee

25 files changed

+1555
-550
lines changed

source/applications.txt

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -57,3 +57,18 @@ The following documents provide patterns for developing application features:
5757
tutorial/isolate-sequence-of-operations
5858
tutorial/create-an-auto-incrementing-field
5959
tutorial/expire-data
60+
61+
Text Search Patterns
62+
--------------------
63+
64+
The following tutorials provide some patterns for
65+
text search usage:
66+
67+
.. toctree::
68+
:maxdepth: 1
69+
70+
tutorial/enable-text-search
71+
tutorial/search-for-text
72+
tutorial/create-text-index-on-multi-language-collection
73+
tutorial/return-text-queries-using-only-text-index
74+
tutorial/limit-number-of-items-scanned-for-text-search

source/applications/text-search.txt

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
1+
===========
2+
Text Search
3+
===========
4+
5+
.. default-domain:: mongodb
6+
7+
.. versionadded:: 2.4
8+
9+
Overview
10+
--------
11+
12+
Text search supports the search of string content in documents of a
13+
collection. Text search introduces a new :ref:`text
14+
<index-feature-text>` index type and a new :dbcommand:`text` command.
15+
16+
The text search process:
17+
18+
- tokenizes and stems the search term(s) during both the index creation
19+
and the text command execution.
20+
21+
- assigns a score to each document that contains the search term in the
22+
indexed fields. The score determines the relevance of a document to a
23+
given search query.
24+
25+
By default, the :dbcommand:`text` command returns at most the top 100
26+
matching documents as determined by the scores.
27+
28+
.. _create-text-index:
29+
30+
Create a ``text`` Index
31+
-----------------------
32+
33+
To perform text search, create a ``text`` index on the field or fields
34+
whose value is a string or an array of string elements. To create a
35+
``text`` index, use the :method:`db.collection.ensureIndex()` method
36+
with a document that contains field and value pairs where the value is
37+
the string literal ``text``.
38+
39+
.. important::
40+
41+
- Before you can :ref:`create a text index <create-text-index>` or
42+
:ref:`run the text command <text-search-text-command>`, you need
43+
to manually enable text search. See
44+
:doc:`/tutorial/enable-text-search` for information on how to
45+
enable the text search feature.
46+
47+
- Text indexes have significant storage requirements and performance
48+
costs. See :ref:`text index feature <index-feature-text>` for more
49+
information.
50+
51+
- .. include:: /includes/fact-text-index-limit-one.rst
52+
53+
The following example creates a ``text`` index on the fields
54+
``subject`` and ``content``:
55+
56+
.. code-block:: javascript
57+
58+
db.collection.ensureIndex(
59+
{
60+
subject: "text",
61+
content: "text"
62+
}
63+
)
64+
65+
This ``text`` index catalogs all string data in the ``subject`` field
66+
and the ``content`` field, where the field value is either a string or
67+
an array of string elements.
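As a rough illustration of which values such an index would pick up, the sketch below collects the string content of a field value. The helper ``collectStrings`` is hypothetical and not part of MongoDB; skipping non-string array elements is an assumption made for the sketch, not documented server behavior.

```javascript
// Hypothetical helper (not a MongoDB API): return the string content that a
// field value would contribute, per the rule "a string or an array of
// string elements".
function collectStrings(value) {
  if (typeof value === "string") {
    return [value];
  }
  if (Array.isArray(value)) {
    // Assumption for this sketch: non-string elements are simply skipped.
    return value.filter(function (element) {
      return typeof element === "string";
    });
  }
  return [];
}

var fromSubject = collectStrings("Weekly report");
var fromContent = collectStrings(["coffee", "cake"]);
var fromNumber = collectStrings(42);
```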
68+
69+
See :doc:`/core/text-index` for details on the options available when
70+
creating ``text`` indexes.
71+
72+
``text`` indexes can also be combined with
73+
ascending/descending index fields. See:
74+
75+
- :doc:`/tutorial/limit-number-of-items-scanned-for-text-search`
76+
77+
- :doc:`/tutorial/return-text-queries-using-only-text-index`
78+
79+
.. _text-search-text-command:
80+
81+
``text`` Command
82+
----------------
83+
84+
The :dbcommand:`text` command can search for words and phrases. The
85+
command matches on the complete stemmed words. For example, if a
86+
document field contains the word ``blueberry``, a search on the term
87+
``blue`` will not match the document. However, a search on either
88+
``blueberry`` or ``blueberries`` will match.
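The whole-word behavior can be sketched in plain JavaScript. The ``toyStem`` function below is a deliberately simplified stand-in written for this example; it is not the stemmer MongoDB uses, and ``matchesTerm`` is a hypothetical helper. The point is only that a term matches when its stem equals the stem of a complete word, never a prefix of one.

```javascript
// Toy suffix stemmer for illustration only; the server's stemmer is
// language-specific and more elaborate.
function toyStem(word) {
  var w = word.toLowerCase();
  if (w.length > 4 && w.slice(-3) === "ies") {
    return w.slice(0, -3) + "y";          // "blueberries" -> "blueberry"
  }
  if (w.length > 3 && w.slice(-1) === "s" && w.slice(-2) !== "ss") {
    return w.slice(0, -1);                // "muffins" -> "muffin"
  }
  return w;
}

// Hypothetical helper: a term matches only if its stem equals the stem of a
// complete word in the field, so "blue" does not match "blueberry".
function matchesTerm(fieldText, term) {
  var stems = fieldText.split(/[^A-Za-z]+/).filter(Boolean).map(toyStem);
  return stems.indexOf(toyStem(term)) !== -1;
}

var field = "Fresh blueberry muffins";
var matchBlue = matchesTerm(field, "blue");
var matchSingular = matchesTerm(field, "blueberry");
var matchPlural = matchesTerm(field, "blueberries");
```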
89+
90+
By default, the :dbcommand:`text` command returns the top 100 scoring documents
91+
in descending order, but you can specify a ``limit`` option to change
92+
the maximum number to return.
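The ordering and the ``limit`` cap can be pictured with a small sketch. ``topMatches`` is a hypothetical helper and the scores are invented sample values; the server computes real scores itself. In the shell, the cap would be passed as an option, for instance ``{ search: "coffee", limit: 10 }``.

```javascript
// Hypothetical helper mimicking "highest score first, capped at limit".
// The default of 100 mirrors the default described above.
function topMatches(scoredDocs, limit) {
  var max = (limit === undefined) ? 100 : limit;
  return scoredDocs
    .slice()                                          // leave the input untouched
    .sort(function (a, b) { return b.score - a.score; })
    .slice(0, max);
}

// Invented sample scores, for illustration only.
var scored = [
  { _id: 1, score: 0.75 },
  { _id: 2, score: 1.5 },
  { _id: 3, score: 1.0 }
];

var topTwo = topMatches(scored, 2);
var allByDefault = topMatches(scored);
```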
93+
94+
Given a collection with a ``text`` index, use the
95+
:method:`~db.collection.runCommand()` method to execute the
96+
:dbcommand:`text` command, as in:
97+
98+
.. code-block:: javascript
99+
100+
db.collection.runCommand( "text" , { search: <string> } )
101+
102+
For information and examples on various text search patterns, see
103+
:doc:`/tutorial/search-for-text`.
104+
105+
Text Search Output
106+
------------------
107+
108+
The :dbcommand:`text` command returns a document that contains the
109+
result set.
110+
111+
See :ref:`text-search-output` for information on the output.

source/contents.txt

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ MongoDB Manual Contents
1010
security
1111
crud
1212
aggregation
13+
applications/text-search
1314
indexes
1415
replication
1516
sharding

source/core/indexes.txt

Lines changed: 46 additions & 0 deletions
@@ -770,6 +770,52 @@ indexes are not suited for finding the closest documents to a
770770
particular location, when the closest documents are far away compared
771771
to bucket size.
772772

773+
.. index:: index; text
774+
.. index:: text index
775+
.. _index-feature-text:
776+
777+
``text`` Indexes
778+
~~~~~~~~~~~~~~~~
779+
780+
.. versionadded:: 2.4
781+
782+
MongoDB provides ``text`` indexes to support :doc:`text search
783+
</applications/text-search>` on a collection. You can only access the
784+
``text`` index with the :dbcommand:`text` command.
785+
786+
``text`` indexes are case-insensitive and can include any field that
787+
contains string data. ``text`` indexes drop language-specific stop
788+
words (e.g. in English, “the,” “an,” “a,” “and,” etc.) and use simple
789+
language-specific suffix stemming. See :ref:`text-search-languages` for
790+
the supported languages.
791+
792+
``text`` indexes have the following storage requirements and
793+
performance costs:
794+
795+
- Text indexes can be large. They contain one index entry for each
796+
unique post-stemmed word in each indexed field for each document
797+
inserted.
798+
799+
- Building a ``text`` index is very similar to building a large
800+
multi-key index, and will take longer than building a simple ordered
801+
(scalar) index on the same data.
802+
803+
- When building a large ``text`` index on an existing collection,
804+
ensure that you have a sufficiently-high open file descriptor limit.
805+
See the :ref:`recommended settings <oom-killer>`.
806+
807+
- ``text`` indexes will impact insertion throughput because MongoDB
808+
must add an index entry for each unique post-stemmed word in each
809+
indexed field of each new source document.
810+
811+
- Additionally, ``text`` indexes do not store phrases or information
812+
about the proximity of words in the documents. As a result, phrase
813+
queries will run much more effectively when the entire collection
814+
fits in RAM.
815+
816+
See :doc:`/applications/text-search` for more information on the text
817+
search feature.
818+
773819
.. index:: index; limitations
774820
.. _index-limitations:
775821

source/core/text-index.txt

Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,186 @@
1+
:orphan:
2+
3+
==============
4+
``text`` Index
5+
==============
6+
7+
.. default-domain:: mongodb
8+
9+
This document provides details on some of the options available when
10+
creating ``text`` indexes.
11+
12+
Specify a Name for the ``text`` Index
13+
-------------------------------------
14+
15+
The default name for the index consists of each index field name
16+
concatenated with ``_text``. Consider the ``text`` index on the fields
17+
``content``, ``users.comments``, and ``users.profiles``.
18+
19+
.. code-block:: javascript
20+
21+
db.collection.ensureIndex(
22+
{
23+
content: "text",
24+
"users.comments": "text",
25+
"users.profiles": "text"
26+
}
27+
)
28+
29+
The default name for the index is:
30+
31+
.. code-block:: javascript
32+
33+
"content_text_users.comments_text_users.profiles_text"
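The naming rule can be reproduced with a short sketch. ``defaultIndexName`` is a hypothetical helper written for this example, not a shell method; it joins each field name and its index type with underscores, which makes it easy to see how quickly the name grows with each added field.

```javascript
// Hypothetical helper: derive the default index name from a key
// specification by joining "<field>_<type>" pairs with underscores.
function defaultIndexName(keys) {
  return Object.keys(keys)
    .map(function (field) { return field + "_" + keys[field]; })
    .join("_");
}

var generatedName = defaultIndexName({
  content: "text",
  "users.comments": "text",
  "users.profiles": "text"
});
```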
34+
35+
To avoid creating an index with a name that exceeds the :limit:`index
36+
name length limit <Index Name Length>`, you can pass the ``name``
37+
option to the :method:`db.collection.ensureIndex()` method:
38+
39+
.. code-block:: javascript
40+
41+
db.collection.ensureIndex(
42+
{
43+
content: "text",
44+
"users.comments": "text",
45+
"users.profiles": "text"
46+
},
47+
{
48+
name: "MyTextIndex"
49+
}
50+
)
51+
52+
.. note::
53+
54+
To drop the ``text`` index, use the index name. To get the name of
55+
an index, use :method:`db.collection.getIndexes()`.
56+
57+
Index All Fields
58+
----------------
59+
60+
To allow for text search on all fields with string content, use the
61+
wildcard specifier (``$**``) to index all fields that contain string
62+
content.
63+
64+
The following example indexes any string value in the data of every
65+
field of every document in a collection and names it ``TextIndex``:
66+
67+
.. code-block:: javascript
68+
69+
db.collection.ensureIndex(
70+
{ "$**": "text" },
71+
{ name: "TextIndex" }
72+
)
73+
74+
.. _text-index-default-language:
75+
76+
Specify Languages for Text Index
77+
--------------------------------
78+
79+
The default language associated with the indexed data determines the
80+
list of stop words and the rules for the stemmer and tokenizer. The
81+
default language for the indexed data is ``english``.
82+
83+
To specify a different language, use the ``default_language`` option
84+
when creating the ``text`` index. See :ref:`text-search-languages` for
85+
the languages available for ``default_language``.
86+
87+
The following example creates a ``text`` index on the
88+
``content`` field and sets the ``default_language`` to
89+
``spanish``:
90+
91+
.. code-block:: javascript
92+
93+
db.collection.ensureIndex(
94+
{ content : "text" },
95+
{ default_language: "spanish" }
96+
)
97+
98+
.. seealso::
99+
100+
:doc:`/tutorial/create-text-index-on-multi-language-collection`
101+
102+
.. _text-index-internals-weights:
103+
104+
Control Results of Text Search with Weights
105+
-------------------------------------------
106+
107+
By default, the :dbcommand:`text` command returns matching documents
108+
based on scores, from highest to lowest. For a ``text`` index, the
109+
*weight* of an indexed field denotes the significance of the field
110+
relative to the other indexed fields in terms of the score. The score
111+
calculation for a given word in a document includes the weighted sum of
112+
the frequency for each of the indexed fields in that document.
113+
114+
The default weight is 1 for the indexed fields. To adjust the weights
115+
for the indexed fields, include the ``weights`` option in the
116+
:method:`db.collection.ensureIndex()` method.
117+
118+
.. warning::
119+
120+
Choose the weights carefully in order to prevent the need to reindex.
121+
122+
A collection ``blog`` has the following documents:
123+
124+
.. code-block:: javascript
125+
126+
{ _id: 1,
127+
content: "This morning I had a cup of coffee.",
128+
about: "beverage",
129+
keywords: [ "coffee" ]
130+
}
131+
132+
{ _id: 2,
133+
content: "Who doesn't like cake?",
134+
about: "food",
135+
keywords: [ "cake", "food", "dessert" ]
136+
}
137+
138+
To create a ``text`` index with different field weights for the
139+
``content`` field and the ``keywords`` field, include the ``weights``
140+
option to the :method:`~db.collection.ensureIndex()` method.
141+
142+
.. code-block:: javascript
143+
144+
db.blog.ensureIndex(
145+
{
146+
content: "text",
147+
keywords: "text",
148+
about: "text"
149+
},
150+
{
151+
weights: {
152+
content: 10,
153+
keywords: 5
154+
},
155+
name: "TextIndex"
156+
}
157+
)
158+
159+
The ``text`` index has the following fields and weights:
160+
161+
- ``content`` has a weight of 10,
162+
163+
- ``keywords`` has a weight of 5, and
164+
165+
- ``about`` has the default weight of 1.
166+
167+
These weights denote the relative significance of the indexed fields to
168+
each other. For instance, a term match in the ``content`` field has:
169+
170+
- ``2`` times (i.e. ``10:5``) the impact of a term match in the
171+
``keywords`` field and
172+
173+
- ``10`` times (i.e. ``10:1``) the impact of a term match in the
174+
``about`` field.
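The ratios above follow directly from the weights, as the sketch below shows. ``effectiveWeights`` is a hypothetical helper that fills in the default weight of 1 for any indexed field missing from the ``weights`` option; it only illustrates the relative weights and does not reproduce the server's score calculation.

```javascript
// Hypothetical helper: resolve the weight of every indexed field, using the
// default weight of 1 for fields not listed in the weights option.
function effectiveWeights(keys, weights) {
  var resolved = {};
  Object.keys(keys).forEach(function (field) {
    var listed = weights && weights[field] !== undefined;
    resolved[field] = listed ? weights[field] : 1;
  });
  return resolved;
}

var resolvedWeights = effectiveWeights(
  { content: "text", keywords: "text", about: "text" },
  { content: 10, keywords: 5 }
);

// Relative impact of a term match in "content" versus the other fields.
var contentVsKeywords = resolvedWeights.content / resolvedWeights.keywords;
var contentVsAbout = resolvedWeights.content / resolvedWeights.about;
```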
175+
176+
Tutorials
177+
---------
178+
179+
The following tutorials offer additional ``text`` index creation
180+
patterns:
181+
182+
- :doc:`/tutorial/create-text-index-on-multi-language-collection`
183+
184+
- :doc:`/tutorial/limit-number-of-items-scanned-for-text-search`
185+
186+
- :doc:`/tutorial/return-text-queries-using-only-text-index`
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
A collection can have at most **one** ``text`` index.
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
:doc:`Text search </applications/text-search>` is currently a
2+
*beta* feature. As a beta feature:
3+
4+
- You need to explicitly enable the feature before :ref:`creating a text
5+
index <create-text-index>` or using the :dbcommand:`text` command.
6+
7+
- To enable text search on :doc:`replica sets </core/replication>` and
8+
:doc:`sharded clusters </core/sharded-clusters>`, you need to
9+
enable it on **each and every** :program:`mongod` for replica
10+
sets and on **each and every** :program:`mongos` for sharded clusters.
