
Commit 75d3a41

biniona-mongodb authored and kyuan-mongodb committed

(DOCSP-15786) Add Avro Overview (#121)

Co-authored-by: kyuan-mongodb <[email protected]>

1 parent 67aaa98 commit 75d3a41

File tree

5 files changed: +300 -1 lines changed

source/includes/avro-customers.avro

Lines changed: 27 additions & 0 deletions

{
  "type": "record",
  "name": "Customer",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "visits",
      "type": {
        "type": "array",
        "items": {
          "type": "long",
          "logicalType": "timestamp-millis"
        }
      }
    },
    {
      "name": "total_purchased",
      "type": {
        "type": "map",
        "values": "int"
      }
    }
  ]
}

source/includes/avro-customers.json

Lines changed: 11 additions & 0 deletions

{
  "name": "Jan",
  "visits": [
    {"$date": "2016-01-03T05:00:00.000Z"},
    {"$date": "2019-01-20T05:00:00.000Z"}
  ],
  "total_purchased": {
    "apples": 1,
    "bananas": 10
  }
}

source/introduction/data-formats.txt

Lines changed: 4 additions & 1 deletion

@@ -2,5 +2,8 @@
 Data Formats
 ============
 
+.. toctree::
+   :titlesonly:
+   :maxdepth: 1
+
 
-asdf
Lines changed: 257 additions & 0 deletions

====
Avro
====

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

Overview
--------

TODO: Refactor This Page as outlined in DOCSP-18210

In this guide, you can find the following information about Apache Avro:

- A brief description of Apache Avro
- A brief description of Avro schema
- An example Avro schema for a MongoDB collection
- How to configure the MongoDB Kafka Connector to use Avro schema, and the
  limitations of that configuration
- Complete properties files for Avro-enabled MongoDB Kafka Connector source
  and sink connectors
Apache Avro
-----------

Apache Avro is an open-source framework for serializing and transporting
data described by schemas. When working with the MongoDB Kafka Connector, you
have the option to specify schemas for source documents in a format defined by
Apache Avro called Avro schema. For more information on Avro schema, see this
guide's :ref:`section on Avro schema <avro-avro-schema>`.

For more information on Apache Avro, see the following resources:

- `Apache Avro Overview <https://avro.apache.org/docs/current/index.html>`__
- `Confluent Blog Post "Why Avro for Kafka Data?" <https://www.confluent.io/blog/avro-kafka-data/>`__

.. _avro-avro-schema:

Avro Schema
-----------

Avro schema is a JSON-based schema definition syntax provided by Apache Avro
that supports the specification of the following groups of data types:

- `Primitive Types <https://avro.apache.org/docs/current/spec.html#schema_primitive>`__
- `Complex Types <https://avro.apache.org/docs/current/spec.html#schema_complex>`__
- `Logical Types <https://avro.apache.org/docs/current/spec.html#Logical+Types>`__

Example Avro Schema for a MongoDB Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this section, you can see an example of an Avro schema that models documents
in a MongoDB collection.

Assume you have a MongoDB collection ``customers`` that contains documents
resembling the following:

.. literalinclude:: /includes/avro-customers.json
   :language: json

The following table describes the fields of documents in the ``customers``
collection and how you can describe them in Avro schema:

.. list-table::
   :header-rows: 1
   :widths: 33 33 34

   * - Field Name
     - Collection Field Type
     - Avro Schema

   * - ``name``
     - string field
     - `string <https://avro.apache.org/docs/current/spec.html#schema_primitive>`__ primitive type

   * - ``visits``
     - array field containing timestamps
     - `array <https://avro.apache.org/docs/current/spec.html#Arrays>`__
       complex type holding
       `timestamp-millis <https://avro.apache.org/docs/current/spec.html#Timestamp+%28millisecond+precision%29>`__ logical types

   * - ``total_purchased``
     - object field with string keys and integer values
     - `map <https://avro.apache.org/docs/current/spec.html#Maps>`__ complex
       type with `int <https://avro.apache.org/docs/current/spec.html#schema_primitive>`__
       primitive type values

The Avro schema for the ``customers`` collection looks like this:

.. literalinclude:: /includes/avro-customers.avro
   :language: json

For a list of all Avro schema types, see the
`Apache Avro specification <https://avro.apache.org/docs/current/spec.html>`__.

Avro in the MongoDB Kafka Connector
-----------------------------------

The MongoDB Kafka Connector has Avro-specific limitations and features.

Limited Support For Logical Types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The MongoDB Kafka Connector supports sink connector conversion from all Avro
primitive and complex types; however, sink connectors can only support
conversion from the following Avro logical types:

- ``decimal``
- ``date``
- ``time-millis``
- ``time-micros``
- ``timestamp-millis``
- ``timestamp-micros``

For a full list of Avro logical types, see the
`logical types section of the Avro specification <https://avro.apache.org/docs/current/spec.html#Logical+Types>`__.
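
As an illustration, a field using the supported ``decimal`` logical type could
be declared as follows. This snippet is a sketch based on the Avro
specification, which defines ``decimal`` as an annotation on the ``bytes`` or
``fixed`` types; the field name ``account_balance`` is hypothetical:

.. code-block:: json

   {
      "name": "account_balance",
      "type": {
         "type": "bytes",
         "logicalType": "decimal",
         "precision": 9,
         "scale": 2
      }
   }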

Source Connector Configuration Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The MongoDB Kafka Connector provides a default schema that matches change
stream event documents. If you specify an aggregation pipeline or the
``publish.full.document.only`` option, you should specify an Avro schema for
your source data with the following option:

.. code-block:: java

   output.schema.value=<your avro schema>
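
For illustration, the option might be set to a condensed, single-line form of
the ``customers`` schema shown earlier (a sketch, shortened here to a single
field for readability; how you format long property values can vary by
deployment):

.. code-block:: java

   output.schema.value={"type": "record", "name": "Customer", "fields": [{"name": "name", "type": "string"}]}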

.. note:: Infer Schema

   A MongoDB Kafka Connector source connector can infer a schema for incoming
   documents that do not fit the default schema. This option works well for
   data sources that do not frequently change structure, but for most
   deployments we recommend using the ``output.schema.value`` option instead
   to explicitly communicate your desired schema to the connector.

   You can have the MongoDB Kafka Connector infer schema by specifying the
   following options:

   .. code-block:: java

      output.format.value="schema"
      output.schema.infer.value=true

.. note:: Type Conversion

   You can find information about how the MongoDB Kafka Connector converts
   between Avro schema types and BSON types in the
   `MongoDB Kafka Connector source code <https://github.com/mongodb/mongo-kafka/blob/404adca5ffb3ae4cad028339d9136539c4fe9bd4/src/main/java/com/mongodb/kafka/connect/source/schema/BsonValueToSchemaAndValue.java#L72-L109>`__.

A MongoDB Kafka Connector source connector reads data from MongoDB into an
intermediate Kafka Connect data format. To send data from Kafka Connect to
Apache Kafka using Avro schema, you must use a converter and Confluent Schema
Registry. You can learn how to configure a MongoDB Kafka Connector source
connector to send Avro schema data to Apache Kafka in the
:ref:`Avro and Schema Registry section of this guide <avro-avro-and-schema-registry>`.

For more information on source connector configuration options, see our
:doc:`source connector guide </source-connector>`.

For more information on converters, see our
:doc:`converters guide </introduction/data-formats/converters>`.

Sink Connector Configuration Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A MongoDB Kafka Connector sink connector can receive Avro schema data from a
topic in Apache Kafka. You can learn how to configure a MongoDB Kafka Connector
sink connector to receive Avro schema data from Apache Kafka in the
:ref:`Avro and Schema Registry section of this guide <avro-avro-and-schema-registry>`.

.. _avro-avro-and-schema-registry:

Avro and Schema Registry
~~~~~~~~~~~~~~~~~~~~~~~~

Kafka Connect natively supports integration with **Confluent Schema Registry**
for source and sink connectors. Confluent Schema Registry is a tool for
storing, serving, and testing the compatibility of schemas.

You can integrate a source or sink connector using Avro schema
with Schema Registry by specifying the following properties in your connector
configuration:

.. code-block:: java

   key.converter=io.confluent.connect.avro.AvroConverter
   key.converter.schema.registry.url=<your schema registry uri>
   value.converter=io.confluent.connect.avro.AvroConverter
   value.converter.schema.registry.url=<your schema registry uri>
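
For example, with a Schema Registry instance available at the conventional
local development address (``http://localhost:8081``; an assumption about your
deployment), the value converter properties might read:

.. code-block:: java

   value.converter=io.confluent.connect.avro.AvroConverter
   value.converter.schema.registry.url=http://localhost:8081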

When applied to a source connector, these properties make the connector
perform the following actions:

- Automatically register new schemas with Schema Registry
- Send data to Apache Kafka in Avro schema format

When applied to a sink connector, these properties make the connector perform
the following actions:

- Retrieve the Avro schema corresponding to a received
  message from Schema Registry
- Use the retrieved Avro schema to parse the contents of the received message

.. important:: Avro Converter with a MongoDB Data Source

   Avro Converters are a great fit for data with a static structure, but are
   not a good fit for dynamic or changing data. MongoDB's schemaless document
   model supports dynamic data, so ensure your MongoDB data source has a
   static structure before specifying an Avro Converter.

For more information on converters, see our
:doc:`converters guide </introduction/data-formats/converters>`.

For more information on using Kafka Connect with Schema Registry, see
`Confluent's "Using Kafka Connect with Schema Registry" page <https://docs.confluent.io/platform/current/schema-registry/connect.html>`__.

Complete Properties Files
-------------------------

In this section we provide complete properties files for Avro-enabled MongoDB
Kafka Connector source and sink connectors.

Source Connector
~~~~~~~~~~~~~~~~

This configuration specifies an Avro schema for MongoDB documents processed by
your source connector and specifies that your connector should send data to
Apache Kafka using an Avro converter and Schema Registry:

.. code-block:: java

   publish.full.document.only=true
   output.format.value="schema"
   output.schema.value=<your avro schema>
   key.converter=io.confluent.connect.avro.AvroConverter
   key.converter.schema.registry.url=<your schema registry uri>
   value.converter=io.confluent.connect.avro.AvroConverter
   value.converter.schema.registry.url=<your schema registry uri>

Sink Connector
~~~~~~~~~~~~~~

This configuration specifies that your sink connector should read Avro data
from an Apache Kafka topic using an Avro converter and Schema Registry:

.. code-block:: java

   key.converter=io.confluent.connect.avro.AvroConverter
   key.converter.schema.registry.url=<your schema registry uri>
   value.converter=io.confluent.connect.avro.AvroConverter
   value.converter.schema.registry.url=<your schema registry uri>

Lines changed: 1 addition & 0 deletions

TODO: Complete this page
