
Commit bb785c1

Chris Choschmalliso authored and committed
Docsp 18343 source config output (#144)
* DOCSP-18343: source config output format
1 parent: d712d9c, commit: bb785c1

File tree

- source/source-connector/configuration-properties.txt
- source/source-connector/configuration-properties/copy-existing.txt
- source/source-connector/configuration-properties/output-format.txt

3 files changed: +293 -3 lines changed

source/source-connector/configuration-properties.txt

Lines changed: 11 additions & 3 deletions
@@ -41,10 +41,19 @@ See the following categories for a list of related configuration properties:
    * - :ref:`Change Stream Properties <source-configuration-change-stream>`
      - Specify your change stream pipelines and cursor settings.
 
+   * - :ref:`Output Format Properties <source-configuration-output-format>`
+     - Specify the format of the data the connector publishes to your
+       Kafka topic.
+
+   * - :ref:`Copy Existing Properties <source-configuration-copy-existing>`
+     - Specify what data the connector should convert to Change Stream
+       events.
+
    * - :ref:`Error Handling and Resume Properties <source-configuration-error-handling>`
      - Specify how the connector handles errors and resumes reading after an
        interruption.
 
+
 See the `Confluent Source Connector configuration documentation <https://docs.confluent.io/platform/current/installation/configuration/connect/source-connect-configs.html>`__
 for more information on these settings.
 
@@ -54,7 +63,6 @@ See the following categories for a list of related configuration properties:
    MongoDB Connection </source-connector/configuration-properties/mongodb-connection>
    Kafka Topic </source-connector/configuration-properties/kafka-topic>
    Change Stream </source-connector/configuration-properties/change-stream>
+   Output Format </source-connector/configuration-properties/output-format>
+   Copy Existing </source-connector/configuration-properties/copy-existing>
    Error Handling </source-connector/configuration-properties/error-handling>
-
-..
-   - Document/Data Format

source/source-connector/configuration-properties/copy-existing.txt

Lines changed: 115 additions & 0 deletions

@@ -0,0 +1,115 @@
.. _source-configuration-copy-existing:

========================
Copy Existing Properties
========================

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

Overview
--------

Use the following configuration settings to enable the copy existing
feature, which converts MongoDB collections into Change Stream events.

.. seealso::

   For an example of the copy existing feature, see the
   (TODO: link to Copy Existing Data Usage Example) Copy Existing Usage
   Example.

.. include:: /includes/source-config-link.rst

Settings
--------

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - Name
     - Description

   * - | **copy.existing**
     - | **Type:** boolean
       |
       | **Description:**
       | Whether to enable the copy existing feature, which converts all
         data in a MongoDB collection to Change Stream events and
         publishes them on Kafka topics. If MongoDB changes the source
         collection data after the connector starts the copy process,
         the connector creates events for the changes after it completes
         the copy process.
       | **Default**: ``false``
       | **Accepted Values**: ``true`` or ``false``

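       .. example::

          The following sketch of a source connector properties file
          enables the copy existing feature; the ``connection.uri``,
          ``database``, and ``collection`` values are placeholders for
          your own deployment:

          .. code-block:: none

             connection.uri=mongodb://localhost:27017
             database=stats
             collection=pageviews
             copy.existing=true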

   * - | **copy.existing.namespace.regex**
     - | **Type:** string
       |
       | **Description:**
       | Regular expression the connector uses to match namespaces from
         which to copy data. A namespace describes the MongoDB database
         name and collection separated by a period, for example
         ``databaseName.collectionName``.

       .. example::

          In the following example, the regular expression setting matches
          collections that start with "page" in the "stats" database.

          .. code-block:: none

             copy.existing.namespace.regex=stats\.page.*

          The "\\" character in the example above escapes the "." character
          that follows it in the regular expression. For more information on
          how to build regular expressions, see the Java API documentation on
          `Patterns <https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html>`__.

       | **Default**: ``""``
       | **Accepted Values**: A valid regular expression

   * - | **copy.existing.pipeline**
     - | **Type:** string
       |
       | **Description:**
       | An array of :manual:`pipeline operations </core/aggregation-pipeline/>`
         the connector runs when copying existing data. You can use this
         setting to filter the source collection and improve the use of
         indexes in the copying process.

       .. example::

          The following example shows how you can use the :manual:`$match </reference/operator/aggregation/match/>`
          aggregation operator to instruct the connector to copy only
          documents that contain a ``closed`` field with a value of ``false``.

          .. code-block:: none

             copy.existing.pipeline=[ { "$match": { "closed": "false" } } ]

       | **Default**: ``[]``
       | **Accepted Values**: Valid aggregation pipeline stages

   * - | **copy.existing.max.threads**
     - | **Type:** int
       |
       | **Description:**
       | The maximum number of threads the connector can use to copy data.
       | **Default**: the number of processors available in the environment
       | **Accepted Values**: An integer

   * - | **copy.existing.queue.size**
     - | **Type:** int
       |
       | **Description:**
       | The size of the queue the connector can use when copying data.
         The example following this table shows these copy settings
         combined.
       | **Default**: ``16000``
       | **Accepted Values**: An integer

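.. example::

   As a combined sketch of the settings above, the following hypothetical
   properties copy one set of namespaces through a filter pipeline with
   tuned parallelism; the namespace, pipeline, and sizing values are
   illustrative only:

   .. code-block:: none

      copy.existing=true
      copy.existing.namespace.regex=stats\.page.*
      copy.existing.pipeline=[ { "$match": { "closed": "false" } } ]
      copy.existing.max.threads=4
      copy.existing.queue.size=16000
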
source/source-connector/configuration-properties/output-format.txt

Lines changed: 167 additions & 0 deletions

@@ -0,0 +1,167 @@
.. _source-configuration-output-format:

========================
Output Format Properties
========================

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

Overview
--------

Use the following configuration settings to specify the format of data the
source connector publishes to Kafka topics.

.. include:: /includes/source-config-link.rst

Settings
--------

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Name
     - Description

   * - | **output.format.key**
     - | **Type:** string
       |
       | **Description:**
       | Specifies the data format in which the source connector outputs
         the key document.
       |
       | **Default**: ``json``
       | **Accepted Values**: ``bson``, ``json``, ``schema``

   * - | **output.format.value**
     - | **Type:** string
       |
       | **Description:**
       | Specifies the data format in which the source connector outputs
         the value document.
       |
       | **Default**: ``json``
       | **Accepted Values**: ``bson``, ``json``, ``schema``

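       .. example::

          As a sketch, the following properties configure the connector to
          output keys as JSON and values in Avro schema format; the choice
          of formats here is illustrative only:

          .. code-block:: none

             output.format.key=json
             output.format.value=schema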

   * - | **output.json.formatter**
     - | **Type:** string
       |
       | **Description:**
       | Class name of the JSON formatter the connector should use to
         output data.
       |
       | **Default**:

       .. code-block:: none

          com.mongodb.kafka.connect.source.json.formatter.DefaultJson

       | **Accepted Values**:
       | One of the following full class names:

       .. code-block:: none

          com.mongodb.kafka.connect.source.json.formatter.DefaultJson
          com.mongodb.kafka.connect.source.json.formatter.ExtendedJson
          com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson

       | Or the full class name of your custom JSON formatter.

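       .. example::

          As an illustrative sketch, the following setting selects the
          extended JSON formatter, which preserves BSON type information
          in the output (for example, a 64-bit integer value appears as
          ``{"$numberLong": "42"}``):

          .. code-block:: none

             output.json.formatter=com.mongodb.kafka.connect.source.json.formatter.ExtendedJson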

   * - | **output.schema.key**
     - | **Type:** string
       |
       | **Description:**
       | Specifies an AVRO schema definition for the key document of the
         `SourceRecord <https://kafka.apache.org/23/javadoc/org/apache/kafka/connect/source/SourceRecord.html>`__.

       .. seealso::

          For more information on AVRO schema, see the AVRO schema guide
          (TODO: link Fundamentals > Data Formats > AVRO schema page).

       | **Default**:

       .. code-block:: json

          {
            "type": "record",
            "name": "keySchema",
            "fields" : [ { "name": "_id", "type": "string" } ]
          }

       | **Accepted Values**: A valid AVRO schema

   * - | **output.schema.value**
     - | **Type:** string
       |
       | **Description:**
       | Specifies an AVRO schema definition for the value document of the
         `SourceRecord <https://kafka.apache.org/23/javadoc/org/apache/kafka/connect/source/SourceRecord.html>`__.

       .. seealso::

          For more information on AVRO schema, see the AVRO schema guide
          (TODO: link Fundamentals > Data Formats > AVRO schema page).

       | **Default**:

       .. code-block:: json

          {
            "name": "ChangeStream",
            "type": "record",
            "fields": [
              { "name": "_id", "type": "string" },
              { "name": "operationType", "type": ["string", "null"] },
              { "name": "fullDocument", "type": ["string", "null"] },
              { "name": "ns",
                "type": [{"name": "ns", "type": "record", "fields": [
                  {"name": "db", "type": "string"},
                  {"name": "coll", "type": ["string", "null"] } ]
                }, "null" ] },
              { "name": "to",
                "type": [{"name": "to", "type": "record", "fields": [
                  {"name": "db", "type": "string"},
                  {"name": "coll", "type": ["string", "null"] } ]
                }, "null" ] },
              { "name": "documentKey", "type": ["string", "null"] },
              { "name": "updateDescription",
                "type": [{"name": "updateDescription", "type": "record", "fields": [
                  {"name": "updatedFields", "type": ["string", "null"]},
                  {"name": "removedFields",
                    "type": [{"type": "array", "items": "string"}, "null"]
                  }] }, "null"] },
              { "name": "clusterTime", "type": ["string", "null"] },
              { "name": "txnNumber", "type": ["long", "null"]},
              { "name": "lsid", "type": [{"name": "lsid", "type": "record",
                "fields": [ {"name": "id", "type": "string"},
                            {"name": "uid", "type": "string"}] }, "null"] }
            ]
          }

       | **Accepted Values**: A valid AVRO schema

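       .. example::

          The following sketch shows one way you might override the default
          value schema with a minimal custom AVRO schema that keeps only the
          ``fullDocument`` field; the schema content is illustrative, not a
          recommended setting:

          .. code-block:: none

             output.schema.value={"type": "record", "name": "customValueSchema", "fields": [{"name": "fullDocument", "type": ["string", "null"]}]}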

   * - | **output.schema.infer.value**
     - | **Type:** boolean
       |
       | **Description:**
       | Whether the connector should infer the schema for the value
         document of the `SourceRecord <https://kafka.apache.org/23/javadoc/org/apache/kafka/connect/source/SourceRecord.html>`__.
         Because the connector processes each document in isolation, it
         may generate many schemas.

       .. important::

          The connector only reads this setting when you set your
          ``output.format.value`` setting to ``schema``.

       | **Default**: ``false``
       | **Accepted Values**: ``true`` or ``false``
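       .. example::

          Because the connector reads this setting only when the value
          format is ``schema``, a sketch configuration pairs the two
          settings as follows:

          .. code-block:: none

             output.format.value=schema
             output.schema.infer.value=true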
