
Commit 4d48977

biniona-mongodb, nathan-contino-mongo, rozza, and RWaltersMA authored and committed
(DOCSP-18211) source fundamentals schema (#132)
Co-authored-by: Nathan Contino <[email protected]>
Co-authored-by: Ross Lawley <[email protected]>
Co-authored-by: Robert Walters <[email protected]>
1 parent 22a5f82 commit 4d48977

File tree: 3 files changed (+189, -1 lines)

conf.py

Lines changed: 0 additions & 1 deletion
@@ -135,7 +135,6 @@
 
 html_sidebars = sconf.sidebars
 
-
 # -- Options for Epub output ---------------------------------------------------
 
 # Bibliographic Dublin Core info.

source/source-connector/fundamentals.txt

Lines changed: 1 addition & 0 deletions
@@ -9,5 +9,6 @@ Source Connector Fundamentals
    Change Streams </source-connector/fundamentals/change-streams>
    Document Metadata </source-connector/fundamentals/document-metadata>
    Scaling Source Connectors </source-connector/fundamentals/scaling-source-connectors>
+   Apply Schemas </source-connector/fundamentals/specify-schema>

source/source-connector/fundamentals/specify-schema.txt
Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@

=============
Apply Schemas
=============

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

Overview
--------

In this guide, you can learn how to apply schemas to incoming
documents in a {+mkc+} source connector.

There are two types of schema in Kafka Connect: **key schema** and
**value schema**. Kafka Connect sends messages to Apache Kafka containing both
a value and a key. A key schema enforces a structure for keys in messages
sent to Apache Kafka. A value schema enforces a structure for values in
messages sent to Apache Kafka.

.. important:: Note on Terminology

   The word "key" has a slightly different meaning in the context
   of BSON and Apache Kafka. In BSON, a "key" is a unique string identifier for
   a field in a document.

   In Apache Kafka, a "key" is a byte array sent in a message used to determine
   what partition of a topic to write the message to. Kafka keys can be
   duplicates of other keys or ``null``.

Specifying schemas in the {+mkc+} is optional, and you can specify any of the
following combinations of schemas:

- Only a value schema
- Only a key schema
- Both a value and key schema
- No schemas
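
For example, a configuration that applies both a key schema and a value schema
sets both of the output format properties covered later on this page:

.. code-block:: properties

   output.format.key=schema
   output.format.value=schema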

.. tip:: Benefits of Schema

   To see a discussion on the benefits of using schemas with Kafka Connect,
   see `this article from Confluent <https://docs.confluent.io/platform/current/schema-registry/index.html#ak-serializers-and-deserializers-background>`__.

To see full properties files for specifying a schema, see our usage example on
specifying a schema. <TODO: link to example>

To learn more about keys and values in Apache Kafka, see the
`official Apache Kafka introduction <http://kafka.apache.org/intro#intro_concepts_and_terms>`__.

Default Schemas
---------------

The {+mkc+} provides two default schemas:

- :ref:`A key schema for the _id field of MongoDB change event documents. <source-default-key-schema>`
- :ref:`A value schema for MongoDB change event documents. <source-default-value-schema>`

To learn more about change events, see our
:doc:`guide on change streams </source-connector/fundamentals/change-streams>`.

To learn more about the default schemas, see their definitions in the
:github:`MongoDB Kafka Connector source code <mongodb/mongo-kafka/blob/master/src/main/java/com/mongodb/kafka/connect/source/schema/AvroSchemaDefaults.java>`.

.. _source-default-key-schema:

Key Schema
~~~~~~~~~~

The {+mkc+} provides a default key schema for the ``_id`` field of change
event documents. You should use the default key schema unless you remove the
``_id`` field from your change event document using either of the transformations
:ref:`described in the schemas for transformed documents section of this guide <source-schema-for-modified-document>`.

If you specify either of these transformations and would like to use a key
schema for your incoming documents, you must specify a key schema
:ref:`as described in the specify schemas section of this guide <source-specify-avro-schema>`.

You can enable the default key schema with the following option:

.. code-block:: properties

   output.format.key=schema

.. _source-default-value-schema:

Value Schema
~~~~~~~~~~~~

The {+mkc+} provides a default value schema for change event documents. You
should use the default value schema unless you transform your change event
documents
:ref:`as described in the schemas for transformed documents section of this guide <source-schema-for-modified-document>`.

If you specify either of these transformations and would like to use a value schema
for your incoming documents, you must use one of the mechanisms described in the
:ref:`schemas for transformed documents section of this guide <source-schema-for-modified-document>`.

You can enable the default value schema with the following option:

.. code-block:: properties

   output.format.value=schema

.. _source-schema-for-modified-document:

Schemas For Transformed Documents
---------------------------------

There are two ways you can transform your change event documents in a
source connector:

- The ``publish.full.document.only=true`` option
- An aggregation pipeline that modifies the structure of change event documents
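
For illustration, either transformation might appear in a source connector
configuration as follows. This is a sketch; the aggregation pipeline shown is a
hypothetical stage that keeps only insert events.

.. code-block:: properties

   # Publish only the fullDocument field of each change event
   publish.full.document.only=true

   # Or transform change event documents with an aggregation pipeline
   pipeline=[{"$match": {"operationType": "insert"}}]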

If you transform your MongoDB change event documents,
you must do one of the following to apply schemas:

- :ref:`Specify schemas <source-specify-avro-schema>`
- :ref:`Have the connector infer a value schema <source-infer-a-schema>`

To learn more, see our
:doc:`guide on source connector configuration properties </source-connector/configuration-properties>`.

.. _source-specify-avro-schema:

Specify Schemas
~~~~~~~~~~~~~~~

You can specify schemas for incoming documents using Avro schema syntax. Click on
the following tabs to see how to specify a schema for document keys and values:

.. tabs::

   .. tab:: Key
      :tabid: key

      .. code-block:: properties

         output.format.key=schema
         output.schema.key=<your avro schema>

   .. tab:: Value
      :tabid: value

      .. code-block:: properties

         output.format.value=schema
         output.schema.value=<your avro schema>
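
For reference, the following is a minimal sketch of an Avro schema you could
supply as the value schema. It assumes the incoming documents contain
hypothetical ``title`` (string) and ``year`` (int) fields:

.. code-block:: json

   {
     "type": "record",
     "name": "exampleValueSchema",
     "fields": [
       { "name": "title", "type": "string" },
       { "name": "year", "type": "int" }
     ]
   }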

.. TODO: Make sure this link goes to correct avro schema page

To learn more about Avro Schema, see our
:doc:`guide on Avro schema </introduction/data-formats/avro>`.

.. _source-infer-a-schema:

Infer a Schema
~~~~~~~~~~~~~~

You can have your source connector infer a schema for incoming documents. This
option works well for development and for data sources that do not
frequently change structure, but for most production deployments we recommend that you
:ref:`specify a schema <source-specify-avro-schema>`.

You can have the MongoDB Kafka Connector infer a schema by specifying the
following options:

.. code-block:: properties

   output.format.value=schema
   output.schema.infer.value=true

.. note:: Cannot Infer Key Schema

   The {+mkc+} does not support key schema inference. If you want to use a key
   schema and transform your MongoDB change event documents, you must specify a
   key schema as described in
   :ref:`the specify schemas section of this guide <source-specify-avro-schema>`.

Properties Files
----------------

TODO: <Complete Source Connector Properties File For Default and Specified Schemas>
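
As a rough sketch, a source connector properties file that applies the default
key and value schemas might look like the following. The connector name,
connection URI, database, and collection are placeholder values.

.. code-block:: properties

   name=mongo-source-schema-example
   connector.class=com.mongodb.kafka.connect.MongoSourceConnector
   tasks.max=1

   # Placeholder connection settings
   connection.uri=mongodb://localhost:27017
   database=exampleDb
   collection=exampleCollection

   # Apply the default key and value schemas
   output.format.key=schema
   output.format.value=schema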
