mongodb · tychoish · Nov 30, 2012 · Nov 21, 2012 · Nov 29, 2012 · Nov 29, 2012
diff --git a/source/aggregation.txt b/source/aggregation.txt
@@ -25,4 +25,5 @@ The following is the outline of the aggregation documentation:
    applications/aggregation
    tutorial/aggregation-examples
    reference/aggregation
+   applications/map-reduce
 
diff --git a/source/applications.txt b/source/applications.txt
@@ -39,6 +39,7 @@ The following documents outline basic application development topics:
    - :doc:`/applications/replication` 
    - :doc:`/applications/indexes`
    - :doc:`/applications/aggregation`
+   - :doc:`/applications/map-reduce`
 
 .. _application-patterns:
 
@@ -50,5 +51,6 @@ The following documents provide patterns for developing application features:
 .. toctree::
    :maxdepth: 2
 
+   tutorial/troubleshoot-map-reduce
    tutorial/perform-two-phase-commits
    tutorial/expire-data
diff --git a/source/applications/map-reduce.txt b/source/applications/map-reduce.txt
@@ -0,0 +1,287 @@
+==========
+Map-Reduce
+==========
+
+.. default-domain:: mongodb
+
+Map-reduce operations can handle complex aggregation tasks. To perform
+map-reduce operations, MongoDB provides the :dbcommand:`mapReduce`
+command and, in the :program:`mongo` shell, the
+:method:`db.collection.mapReduce()` wrapper method.
+
+.. contents:: This overview will cover: 
+   :backlinks: none
+   :local:
+   :depth: 1
+
+For many simple aggregation tasks, see the :doc:`aggregation framework
+</applications/aggregation>`.
+
+.. _map-reduce-examples:
+
+Map-Reduce Examples
+-------------------
+
+This section provides some map-reduce examples in the :program:`mongo`
+shell using the :method:`db.collection.mapReduce()` method:
+
+.. code-block:: javascript
+
+   db.collection.mapReduce(
+                            <map>,
+                            <reduce>,
+                            {
+                              <out>,
+                              <query>,
+                              <sort>,
+                              <limit>,
+                              <finalize>,
+                              <scope>,
+                              <jsMode>,
+                              <verbose>
+                            }
+                          )
+
+For more information on the parameters, see the
+:method:`db.collection.mapReduce()` reference page .
+
+.. include:: /includes/examples-map-reduce.rst
+   :start-after: map-reduce-document-prototype-begin
+
+.. _map-reduce-incremental:
+
+Incremental Map-Reduce
+----------------------
+
+If the map-reduce dataset is constantly growing, then rather than
+performing the map-reduce operation over the entire dataset each time
+you want to run map-reduce, you may want to perform an incremental
+map-reduce.
+
+To perform incremental map-reduce:
+
+#. Run a map-reduce job over the current collection and output the
+   result to a separate collection.
+
+#. When you have more data to process, run subsequent map-reduce job
+   with:
+
+   - the ``<query>`` parameter that specifies conditions that match
+     *only* the new documents.
+
+   - the ``<out>`` parameter that specifies the ``reduce`` action to
+     merge the new results into the existing output collection.
+
+Consider the following example where you schedule a map-reduce
+operation on a ``sessions`` collection to run at the end of each day.
+
+**Data Setup**
+
+The ``sessions`` collection contains documents that log users' session
+each day and can be simulated as follows:
+
+.. code-block:: javascript
+
+   db.sessions.save( { userid: "a", ts: ISODate('2011-11-03 14:17:00'), length: 95 } );
+   db.sessions.save( { userid: "b", ts: ISODate('2011-11-03 14:23:00'), length: 110 } );
+   db.sessions.save( { userid: "c", ts: ISODate('2011-11-03 15:02:00'), length: 120 } );
+   db.sessions.save( { userid: "d", ts: ISODate('2011-11-03 16:45:00'), length: 45 } );
+
+   db.sessions.save( { userid: "a", ts: ISODate('2011-11-04 11:05:00'), length: 105 } );
+   db.sessions.save( { userid: "b", ts: ISODate('2011-11-04 13:14:00'), length: 120 } );
+   db.sessions.save( { userid: "c", ts: ISODate('2011-11-04 17:00:00'), length: 130 } );
+   db.sessions.save( { userid: "d", ts: ISODate('2011-11-04 15:37:00'), length: 65 } );
+
+**Initial Map-Reduce of Current Collection**
+
+#. Define the ``<map>`` function that maps the ``userid`` to an
+   object that contains the fields ``userid``, ``total_time``, ``count``,
+   and ``avg_time``:
+
+   .. code-block:: javascript
+
+      var mapFunction = function() {
+                            var key = this.userid;
+                            var value = {
+                                          userid: this.userid,
+                                          total_time: this.length,
+                                          count: 1,
+                                          avg_time: 0
+                                         };
+
+                            emit( key, value );
+                        };
+
+#. Define the corresponding ``<reduce>`` function with two arguments
+   ``key`` and ``values`` to calculate the total time and the count.
+   The ``key`` corresponds to the ``userid``, and the ``values`` is an
+   array whose elements corresponds to the individual objects mapped to the
+   ``userid`` in the ``mapFunction``.
+
+   .. code-block:: javascript
+
+      var reduceFunction = function(key, values) {
+
+                              var reducedObject = {
+                                                    userid: key,
+                                                    total_time: 0,
+                                                    count:0,
+                                                    avg_time:0
+                                                  };
+
+                              values.forEach( function(value) {
+                                                    reducedObject.total_time += value.total_time;
+                                                    reducedObject.count += value.count;
+                                              } 
+                                            );
+                              return reducedObject;
+                           };
+
+#. Define ``<finalize>`` function with two arguments ``key`` and
+   ``reducedValue``. The function modifies the ``reducedValue`` document
+   to add another field ``average`` and returns the modified document.
+
+   .. code-block:: javascript
+
+      var finalizeFunction = function (key, reducedValue) {
+
+                                if (reducedValue.count > 0)
+                                    reducedValue.avg_time = reducedValue.total_time / reducedValue.count;
+
+                                return reducedValue;
+                             };
+
+#. Perform map-reduce on the ``session`` collection using the
+   ``mapFunction``, the ``reduceFunction``, and the
+   ``finalizeFunction`` functions. Output the results to a collection
+   ``session_stat``. If the ``session_stat`` collection already exists,
+   the operation will replace the contents:
+
+   .. code-block:: javascript
+
+      db.sessions.mapReduce( mapFunction,
+                             reduceFunction,
+                             {
+                               out: { reduce: "session_stat" },
+                               finalize: finalizeFunction
+                             }
+                           )
+
+**Subsequent Incremental Map-Reduce**
+
+Assume the next day, the ``sessions`` collection grows by the following documents:
+
+   .. code-block:: javascript
+
+      db.sessions.save( { userid: "a", ts: ISODate('2011-11-05 14:17:00'), length: 100 } );
+      db.sessions.save( { userid: "b", ts: ISODate('2011-11-05 14:23:00'), length: 115 } );
+      db.sessions.save( { userid: "c", ts: ISODate('2011-11-05 15:02:00'), length: 125 } );
+      db.sessions.save( { userid: "d", ts: ISODate('2011-11-05 16:45:00'), length: 55 } );
+
+5. At the end of the day, perform incremental map-reduce on the
+   ``sessions`` collection but use the ``query`` field to select only the
+   new documents. Output the results to the collection ``session_stat``,
+   but ``reduce`` the contents with the results of the incremental
+   map-reduce:
+
+   .. code-block:: javascript
+
+      db.sessions.mapReduce( mapFunction,
+                             reduceFunction,
+                             {
+                               query: { ts: { $gt: ISODate('2011-11-05 00:00:00') } },
+                               out: { reduce: "session_stat" },
+                               finalize: finalizeFunction
+                             }
+                           );
+
+.. _map-reduce-temporay-collection:
+
+Temporary Collection
+--------------------
+
+The map-reduce operation uses a temporary collection during processing.
+At completion, the temporary collection will be renamed to the
+permanent name atomically. Thus, one can perform a map-reduce operation
+periodically with the same target collection name without worrying
+about a temporary state of incomplete data. This is very useful when
+generating statistical output collections on a regular basis.
+
+.. _map-reduce-sharded-cluster:
+
+Sharded Cluster
+---------------
+
+Sharded Input
+~~~~~~~~~~~~~
+
+If the input collection is sharded, :program:`mongos` will
+automatically dispatch the map-reduce job to each shard to be executed
+in parallel. There is no special option required. :program:`mongos`
+will wait for jobs on all shards to finish.
+
+Sharded Output
+~~~~~~~~~~~~~~
+
+By default the output collection will not be sharded. The process is:
+
+- :program:`mongos` dispatches a map-reduce finish job to the shard
+  that will store the target collection.
+
+- The target shard will pull results from all other shards, run a final
+  reduce/finalize, and write to the output.
+
+- If using the sharded option in the ``<out>`` parameter, the output will be
+  sharded using ``_id`` as the shard key.
+
+.. versionchanged:: 2.2
+
+- If the output collection does not exist, the collection is created
+  and sharded on the ``_id`` field. Even if empty, its initial chunks
+  are created based on the result of the first step of the map-reduce
+  operation.
+
+- :program:`mongos` dispatches, in parallel, a map-reduce finish job
+  to every shard that owns a chunk.
+
+- Each shard will pull the results it owns from all other shards, run a
+  final reduce/finalize, and write to the output collection.
+
+.. note::
+
+   - During additional map-reduce jobs, chunk splitting will be done as needed.
+
+   - Balancing of chunks for the output collection is automatically
+     prevented during post-processing to avoid concurrency issues.
+
+Prior to version 2.1:
+
+- :program:`mongos` retrieves the results from each shard, doing a
+  merge sort to order the results, and performs a reduce/finalize as
+  needed. :program:`mongos` then writes the result to the output
+  collection in sharded mode.
+
+- Only a small amount of memory is required even for large datasets.
+
+- Shard chunks do not get automatically split and migrated during
+  insertion. Manual intervention is required until the chunks are
+  granular and balanced.
+
+.. warning:: 
+
+   Sharded output for mapreduce has been overhauled in v2.2. Its use in
+   earlier versions is not recommended.
+
+.. _map-reduce-additional-references:
+
+Additional References
+---------------------
+
+.. seealso::
+
+   - :doc:`/tutorial/troubleshoot-map-reduce`
+
+   - :wiki:`Map-Reduce Concurrency
+     <How+does+concurrency+work#Howdoesconcurrencywork-MapReduce>` 
+
+   - `MapReduce, Geospatial Indexes, and Other Cool Features <http://www.slideshare.net/mongosf/mapreduce-geospatial-indexing-and-other-cool-features-kristina-chodorow>`_ - Kristina Chodorow at MongoSF (April 2010)