
Commit 726f969

biniona-mongodb and jeff-allen-mongo authored and committed
(DOCSP-18540) Data Formats (#150)
Co-authored-by: Jeff Allen <[email protected]>
1 parent ff92235 commit 726f969

File tree

4 files changed, +219 −8 lines changed


source/introduction.txt

Lines changed: 1 addition & 0 deletions

@@ -8,5 +8,6 @@ Introduction
    Kafka and Kafka Connect </introduction/kafka-connect>
    Key Concepts </introduction/key-concepts>
    Data Formats </introduction/data-formats>
+   Converters </introduction/converters>
 
 asdf

source/introduction/data-formats.txt

Lines changed: 214 additions & 5 deletions
@@ -2,9 +2,218 @@
 Data Formats
 ============
 
-.. toctree::
-   :titlesonly:
-   :maxdepth: 1
+.. default-domain:: mongodb
 
-   Converters </introduction/data-formats/converters>
-   Avro Schema </introduction/data-formats/avro-schema>
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 2
+   :class: singlecol
+
+Overview
+--------
+
+In this guide, you can learn about the data formats you use when working with
+the {+mkc+} and your pipeline.
+
+.. _kafka-df-sample-doc:
+
+This guide uses the following sample document to show the behavior of the
+different formats:
+
+.. code-block:: json
+   :copyable: false
+
+   {company:"MongoDB"}
+
+JSON
+----
+
+JSON is a data-interchange format based on JavaScript object notation. You
+represent the :ref:`sample document <kafka-df-sample-doc>` in JSON like this:
+
+.. code-block:: json
+   :copyable: false
+
+   {"company":"MongoDB"}
+
+You may encounter the following data formats related to JSON when working
+with the {+mkc+}:
+
+- :ref:`Raw JSON <kafka-df-raw-json>`
+- :ref:`BSON <kafka-df-bson>`
+- :ref:`JSON Schema <kafka-df-json-schema>`
+
+For more information on JSON,
+see the `official JSON website <https://www.json.org/json-en.html>`__.
+
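As a side note for readers following along, the JSON representation shown here round-trips through any standard JSON library. A minimal Python sketch (illustrative only, not part of this commit):

```python
import json

# Parse the sample document from its JSON text representation.
doc = json.loads('{"company":"MongoDB"}')

# Serialize the dict back to a compact JSON string.
text = json.dumps(doc, separators=(",", ":"))
print(text)  # {"company":"MongoDB"}
```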
+.. _kafka-df-raw-json:
+
+Raw JSON
+~~~~~~~~
+
+Raw JSON is a data format that consists of JSON objects written as strings.
+You represent the :ref:`sample document <kafka-df-sample-doc>` in Raw JSON
+like this:
+
+.. code-block:: text
+   :copyable: false
+
+   "{\"company\":\"MongoDB\"}"
+
+You use Raw JSON when you specify a String Converter on a
+source or sink connector. <TODO: Link to Converters page when it is ready>
+
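The escaped form shown here is what you get when you serialize the JSON text itself as a string, so the inner quotes get backslash-escaped. A small Python sketch of that effect (illustrative only, not part of this commit):

```python
import json

# A JSON document, held as a plain string.
raw = '{"company":"MongoDB"}'

# Serializing that string as JSON escapes the inner quotes,
# producing the Raw JSON form.
escaped = json.dumps(raw)
print(escaped)  # "{\"company\":\"MongoDB\"}"
```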
+.. _kafka-df-bson:
+
+BSON
+~~~~
+
+BSON is a binary serialization encoding for JSON-like objects. BSON encodes
+the :ref:`sample document <kafka-df-sample-doc>` like this:
+
+.. code-block:: text
+   :copyable: false
+
+   \x1a\x00\x00\x00\x02company\x00\x08\x00\x00\x00MongoDB\x00\x00
+
+Your connectors use the BSON format to send and receive documents from your
+MongoDB deployment.
+
+For more information on BSON, see `the BSON specification <https://bsonspec.org/>`__.
+
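Those bytes follow the BSON layout directly: a little-endian int32 total document length, a ``0x02`` type byte for a string element, the C-string key, an int32 string length (including the trailing NUL), the value, and a closing ``0x00``. A hand-rolled, stdlib-only Python sketch of that framing (illustrative; real code would use a BSON library such as PyMongo's ``bson`` module):

```python
import struct

def bson_string_doc(key: str, value: str) -> bytes:
    """Encode a single-field {key: value} document as BSON by hand."""
    k = key.encode() + b"\x00"                  # C-string key
    v = value.encode() + b"\x00"                # string value + NUL
    element = b"\x02" + k + struct.pack("<i", len(v)) + v
    body = element + b"\x00"                    # document terminator
    return struct.pack("<i", len(body) + 4) + body  # total length prefix

encoded = bson_string_doc("company", "MongoDB")
print(encoded)  # b'\x1a\x00\x00\x00\x02company\x00\x08\x00\x00\x00MongoDB\x00\x00'
```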
+.. _kafka-df-json-schema:
+
+JSON Schema
+~~~~~~~~~~~
+
+JSON Schema is a syntax for specifying **schemas** for JSON objects. A schema
+is a definition attached to an {+ak+} topic that defines valid values for
+that topic.
+
+You can specify a schema for the :ref:`sample document <kafka-df-sample-doc>`
+with JSON Schema like this:
+
+.. code-block:: json
+   :copyable: false
+
+   {
+      "$schema": "http://json-schema.org/draft-07/schema",
+      "$id": "unique id",
+      "type": "object",
+      "title": "Example Schema",
+      "description": "JSON Schema for the sample document.",
+      "required": [
+         "company"
+      ],
+      "properties": {
+         "company": {
+            "$id": "another unique id",
+            "type": "string",
+            "title": "Company",
+            "description": "A field to hold the name of a company"
+         }
+      },
+      "additionalProperties": false
+   }
+
+You use JSON Schema when you apply JSON Schema converters to your connectors.
+<TODO: Link to this section of converters page>
+
+For more information, see the official
+`JSON Schema website <https://json-schema.org/>`__.
+
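In practice you would validate documents against such a schema with a JSON Schema library (for example the Python ``jsonschema`` package). As a rough stdlib-only sketch of what the ``required``, ``properties``, and ``additionalProperties`` keywords check, here is a hypothetical helper (not part of this commit, and far from a full validator):

```python
import json

schema = json.loads("""
{
  "type": "object",
  "required": ["company"],
  "properties": {"company": {"type": "string"}},
  "additionalProperties": false
}
""")

def check(doc: dict, schema: dict) -> bool:
    """Tiny subset of JSON Schema: required keys, string types, no extras."""
    if not all(key in doc for key in schema.get("required", [])):
        return False
    props = schema.get("properties", {})
    if not schema.get("additionalProperties", True):
        if any(key not in props for key in doc):
            return False
    return all(
        isinstance(doc[key], str)
        for key, spec in props.items()
        if key in doc and spec.get("type") == "string"
    )

print(check({"company": "MongoDB"}, schema))  # True
print(check({"company": 42}, schema))         # False
```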
+Avro
+----
+
+Apache Avro is an open-source framework for serializing and transporting
+data described by schemas. Avro defines two data formats relevant to the
+{+mkc+}:
+
+- :ref:`Avro schema <kafka-df-avro-schema>`
+- :ref:`Avro binary encoding <kafka-df-avro-encoding>`
+
+For more information on Apache Avro, see the
+`Apache Avro Documentation <https://avro.apache.org/docs/current/index.html>`__.
+
+.. _kafka-df-avro-schema:
+
+Avro Schema
+~~~~~~~~~~~
+
+Avro schema is a JSON-based schema definition syntax. Avro schema supports
+the specification of the following groups of data types:
+
+- `Primitive Types <https://avro.apache.org/docs/current/spec.html#schema_primitive>`__
+- `Complex Types <https://avro.apache.org/docs/current/spec.html#schema_complex>`__
+- `Logical Types <https://avro.apache.org/docs/current/spec.html#Logical+Types>`__
+
+.. important:: Sink Connectors and Logical Types
+
+   {+mkc+} sink connectors support all Avro schema primitive and complex
+   types; however, {+mkc+} sink connectors support only the following
+   logical types:
+
+   - ``decimal``
+   - ``date``
+   - ``time-millis``
+   - ``time-micros``
+   - ``timestamp-millis``
+   - ``timestamp-micros``
+
+You can construct an Avro schema for the :ref:`sample document <kafka-df-sample-doc>`
+like this:
+
+.. code-block:: json
+   :copyable: false
+
+   {
+     "type": "record",
+     "name": "example",
+     "doc": "example documents have a company field",
+     "fields": [
+       {
+         "name": "company",
+         "type": "string"
+       }
+     ]
+   }
+
+You use Avro schema when you
+:ref:`define a schema for a {+mkc+} source connector <source-specify-avro-schema>`.
+
+For a list of all Avro schema types, see the
+`Apache Avro specification <https://avro.apache.org/docs/current/spec.html>`__.
+
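Because an Avro schema is itself a JSON document, you can load and inspect it with an ordinary JSON parser. A small Python sketch (illustrative; a real pipeline would hand the schema to an Avro library):

```python
import json

avro_schema = json.loads("""
{
  "type": "record",
  "name": "example",
  "doc": "example documents have a company field",
  "fields": [{"name": "company", "type": "string"}]
}
""")

# List each field's name and type declared by the record schema.
for field in avro_schema["fields"]:
    print(field["name"], "->", field["type"])  # company -> string
```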
+.. _kafka-df-avro-encoding:
+
+Avro Binary Encoding
+~~~~~~~~~~~~~~~~~~~~
+
+Avro specifies a binary serialization encoding for JSON objects defined by
+an Avro schema.
+
+If you use the
+:ref:`preceding Avro schema <kafka-df-avro-schema>`, you can represent the
+:ref:`sample document <kafka-df-sample-doc>` with Avro binary encoding
+like this:
+
+.. code-block:: text
+   :copyable: false
+
+   \x0eMongoDB
+
+You use Avro binary encoding when you specify an Avro Converter on a
+source or sink connector. <TODO: Link to Converters page when it is ready>
+
+To learn more about Avro binary encoding, see
+`this section of the Avro specification <https://avro.apache.org/docs/current/spec.html#Data+Serialization+and+Deserialization>`__.
+
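The leading ``\x0e`` byte comes from Avro's string encoding: the byte length is written as a zigzag-encoded variable-length long (7 becomes ``0x0e``), followed by the UTF-8 bytes. A stdlib-only Python sketch of that rule for non-negative lengths (illustrative, not part of this commit):

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a non-negative int as an Avro zigzag long in varint bytes."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small magnitudes to small codes
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # set the continuation bit
        else:
            out.append(byte)
            return bytes(out)

def avro_string(s: str) -> bytes:
    """Avro string = zigzag varint byte length + UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data

print(avro_string("MongoDB"))  # b'\x0eMongoDB'
```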
+.. _kafka-db-byte-arrays:
+
+Byte Arrays
+-----------
+
+A byte array is a consecutive sequence of unstructured bytes.
+
+You can represent the sample document as a byte array using any of the
+encodings mentioned above.
+
+You use byte arrays when your converters send data to or receive data
+from {+ak+}. For more information on converters, see our guide on converters.
+<TODO: link to converters page>
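For instance, the Raw JSON form of the sample document becomes a byte array by UTF-8 encoding it, which is roughly what a String Converter produces on the way to {+ak+}. A trivial Python illustration (not part of this commit):

```python
# Raw JSON text of the sample document.
raw = '{"company":"MongoDB"}'

# UTF-8 encoding yields the unstructured byte sequence sent over the wire.
payload = raw.encode("utf-8")
print(list(payload[:3]))  # [123, 34, 99] -> '{', '"', 'c'
```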

source/introduction/data-formats/avro.txt

Lines changed: 4 additions & 3 deletions
@@ -14,7 +14,8 @@ Avro
 Overview
 --------
 
-TODO: Refactor This Page as outlined in DOCSP-18210
+<TODO: Delete this page before releasing the docs>
+
 
 In this guide, you can find the following information about Apache Avro:
 
@@ -164,7 +165,7 @@ For more information on source connector configuration options, see our
 :doc:`source connector guide </source-connector>`.
 
 For more information on converters, see our
-:doc:`converters guide </introduction/data-formats/converters>`.
+converters guide.
 
 Sink Connector Configuration Options
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -208,7 +209,7 @@ actions:
 - Use the retrieved Avro schema to parse the contents of the received message
 
 For more information on converters, see our
-:doc:`converters guide </introduction/data-formats/converters>`.
+converters guide.
 
 For more information on using Kafka Connect with Schema Registry, see
 `Confluent's "Using Kafka Connect with Schema Registry" page <https://docs.confluent.io/platform/current/schema-registry/connect.html>`__.
