====
Avro
====

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

Overview
--------

In this guide, you can find the following information about Apache Avro:

- A brief description of Apache Avro
- A brief description of Avro schema
- An example Avro schema for a MongoDB collection
- How to configure the MongoDB Kafka Connector to use Avro schema, and the
  limitations of that configuration
- Complete properties files for Avro-enabled MongoDB Kafka Connector source
  and sink connectors

Apache Avro
-----------

Apache Avro is an open-source framework for serializing and transporting
data described by schemas. When working with the MongoDB Kafka Connector, you
have the option to specify schemas for source documents in a format defined by
Apache Avro called Avro schema. For more information on Avro schema, see this
guide's :ref:`section on Avro schema <avro-avro-schema>`.

For more information on Apache Avro, see the following resources:

- `Apache Avro Overview <https://avro.apache.org/docs/current/index.html>`__
- `Confluent Blog Post "Why Avro for Kafka Data?" <https://www.confluent.io/blog/avro-kafka-data/>`__

.. _avro-avro-schema:

Avro Schema
-----------

Avro schema is a JSON-based schema definition syntax provided by Apache Avro
that supports the specification of the following groups of data types:

- `Primitive Types <https://avro.apache.org/docs/current/spec.html#schema_primitive>`__
- `Complex Types <https://avro.apache.org/docs/current/spec.html#schema_complex>`__
- `Logical Types <https://avro.apache.org/docs/current/spec.html#Logical+Types>`__

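For example, the following hypothetical Avro schema describes a record that
uses one type from each group: a ``string`` primitive type, an ``array``
complex type, and a ``timestamp-millis`` logical type:

.. code-block:: json

   {
     "type": "record",
     "name": "example",
     "fields": [
       {"name": "id", "type": "string"},
       {"name": "tags", "type": {"type": "array", "items": "string"}},
       {"name": "created", "type": {"type": "long", "logicalType": "timestamp-millis"}}
     ]
   }
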
Example Avro Schema for a MongoDB Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this section, you can see an example of an Avro schema that models documents
in a MongoDB collection.

Assume you have a MongoDB collection ``customers`` that contains documents resembling
the following:

.. literalinclude:: /includes/avro-customers.json
   :language: json

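For illustration, a hypothetical ``customers`` document might resemble the
following sketch. The field values are invented, timestamps are shown in
MongoDB Extended JSON, and the actual included example may differ:

.. code-block:: json

   {
     "name": "Emily Johnson",
     "visits": [
       {"$date": "2021-07-25T17:30:00.000Z"},
       {"$date": "2021-10-19T14:00:00.000Z"}
     ],
     "total_purchased": {
       "bread": 2,
       "milk": 4
     }
   }
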
The following table describes the fields of documents in the ``customers``
collection and how you can represent them in Avro schema:

.. list-table::
   :header-rows: 1
   :widths: 33 33 34

   * - Field Name
     - Field Type in the Collection
     - Avro Schema Type

   * - ``name``
     - string field
     - `string <https://avro.apache.org/docs/current/spec.html#schema_primitive>`__ primitive type

   * - ``visits``
     - array field containing timestamps
     - `array <https://avro.apache.org/docs/current/spec.html#Arrays>`__
       complex type holding
       `timestamp-millis <https://avro.apache.org/docs/current/spec.html#Timestamp+%28millisecond+precision%29>`__ logical types

   * - ``total_purchased``
     - object field with string keys and integer values
     - `map <https://avro.apache.org/docs/current/spec.html#Maps>`__ complex
       type with `int <https://avro.apache.org/docs/current/spec.html#schema_primitive>`__
       primitive type values

The Avro schema for the ``customers`` collection looks like this:

.. literalinclude:: /includes/avro-customers.avro
   :language: json

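Based on the field mappings in the preceding table, the schema might resemble
the following sketch (the record name ``Customer`` and the exact layout are
assumptions):

.. code-block:: json

   {
     "type": "record",
     "name": "Customer",
     "fields": [
       {"name": "name", "type": "string"},
       {"name": "visits", "type": {
         "type": "array",
         "items": {"type": "long", "logicalType": "timestamp-millis"}
       }},
       {"name": "total_purchased", "type": {"type": "map", "values": "int"}}
     ]
   }
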
For a list of all Avro schema types, see the
`Apache Avro specification <https://avro.apache.org/docs/current/spec.html>`__.

Avro in the MongoDB Kafka Connector
-----------------------------------

The MongoDB Kafka Connector has Avro-specific features and limitations.

Limited Support for Logical Types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The MongoDB Kafka Connector supports sink connector conversion from all Avro
primitive and complex types. However, sink connectors support conversion from
only the following Avro logical types, one of which is illustrated in the
sketch after this list:

- ``decimal``
- ``date``
- ``time-millis``
- ``time-micros``
- ``timestamp-millis``
- ``timestamp-micros``

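For example, a field that uses the supported ``decimal`` logical type might be
declared as follows in an Avro schema (the field name, precision, and scale
are hypothetical):

.. code-block:: json

   {
     "name": "price",
     "type": {
       "type": "bytes",
       "logicalType": "decimal",
       "precision": 9,
       "scale": 2
     }
   }
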
For a full list of Avro logical types, see the
`logical types section of the Avro specification <https://avro.apache.org/docs/current/spec.html#Logical+Types>`__.

Source Connector Configuration Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The MongoDB Kafka Connector provides a default schema that matches change
stream event documents. If you specify an aggregation pipeline or the
``publish.full.document.only`` option, you should specify an Avro schema for
your source data with the following option:

.. code-block:: properties

   output.schema.value=<your avro schema>

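For example, if your source documents contain only a string ``name`` field,
the option might look like this (the schema shown is a hypothetical minimal
example):

.. code-block:: properties

   output.schema.value={"type": "record", "name": "customers", "fields": [{"name": "name", "type": "string"}]}
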
.. note:: Infer Schema

   A MongoDB Kafka Connector source connector can infer a schema for incoming
   documents that do not fit the default schema. This option works well for data
   sources that do not frequently change structure, but for most deployments we
   recommend using the ``output.schema.value`` option instead to explicitly
   communicate your desired schema to the connector.

   You can have the MongoDB Kafka Connector infer schema by specifying the
   following options:

   .. code-block:: properties

      output.format.value=schema
      output.schema.infer.value=true

.. note:: Type Conversion

   You can find information about how the MongoDB Kafka Connector converts
   between Avro schema types and BSON types in the
   `MongoDB Kafka Connector source code <https://github.com/mongodb/mongo-kafka/blob/404adca5ffb3ae4cad028339d9136539c4fe9bd4/src/main/java/com/mongodb/kafka/connect/source/schema/BsonValueToSchemaAndValue.java#L72-L109>`__.

A MongoDB Kafka Connector source connector reads data from MongoDB into an
intermediate Kafka Connect data format. To send data from Kafka Connect to
Apache Kafka using Avro schema, you must use a converter and Confluent Schema
Registry. You can learn how to configure a MongoDB Kafka Connector source
connector to send Avro schema data to Apache Kafka in the
:ref:`Avro and Schema Registry section of this guide <avro-avro-and-schema-registry>`.

For more information on source connector configuration options, see our
:doc:`source connector guide </source-connector>`.

For more information on converters, see our
:doc:`converters guide </introduction/data-formats/converters>`.

Sink Connector Configuration Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A MongoDB Kafka Connector sink connector can receive Avro schema data from a topic in Apache
Kafka. You can learn how to configure a MongoDB Kafka Connector sink
connector to receive Avro schema data from Apache Kafka in the
:ref:`Avro and Schema Registry section of this guide <avro-avro-and-schema-registry>`.

.. _avro-avro-and-schema-registry:

Avro and Schema Registry
~~~~~~~~~~~~~~~~~~~~~~~~

Kafka Connect natively supports integration with **Confluent Schema Registry**
for source and sink connectors. Confluent Schema Registry is a tool for storing
and serving schemas and for testing schema compatibility.

You can integrate a source or sink connector that uses Avro schema
with Schema Registry by specifying the following properties in your connector
configuration:

.. code-block:: properties

   key.converter=io.confluent.connect.avro.AvroConverter
   key.converter.schema.registry.url=<your schema registry uri>
   value.converter=io.confluent.connect.avro.AvroConverter
   value.converter.schema.registry.url=<your schema registry uri>

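For example, with a Schema Registry instance listening at a hypothetical local
address, the filled-in properties might look like this:

.. code-block:: properties

   key.converter=io.confluent.connect.avro.AvroConverter
   key.converter.schema.registry.url=http://localhost:8081
   value.converter=io.confluent.connect.avro.AvroConverter
   value.converter.schema.registry.url=http://localhost:8081
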
When applied to a source connector, these properties make the connector perform
the following actions:

- Automatically register new schemas with Schema Registry
- Send data to Apache Kafka in Avro schema format

When applied to a sink connector, these properties make the connector perform
the following actions:

- Retrieve the Avro schema corresponding to a received
  message from Schema Registry
- Use the retrieved Avro schema to parse the contents of the received message

.. important:: Avro Converter with a MongoDB Data Source

   Avro Converters are a great fit for data with a static structure, but are not
   a good fit for dynamic or changing data. MongoDB's schemaless document
   model supports dynamic data, so ensure your MongoDB data source has a static
   structure before specifying an Avro Converter.

For more information on converters, see our
:doc:`converters guide </introduction/data-formats/converters>`.

For more information on using Kafka Connect with Schema Registry, see
`Confluent's "Using Kafka Connect with Schema Registry" page <https://docs.confluent.io/platform/current/schema-registry/connect.html>`__.

Complete Properties Files
-------------------------

In this section, we provide complete properties files for Avro-enabled MongoDB
Kafka Connector source and sink connectors.

Source Connector
~~~~~~~~~~~~~~~~

This configuration specifies an Avro schema for MongoDB documents processed by your
source connector and specifies that your connector should send data to Apache
Kafka using an Avro converter and Schema Registry:

.. code-block:: properties

   publish.full.document.only=true
   output.format.value=schema
   output.schema.value=<your avro schema>
   key.converter=io.confluent.connect.avro.AvroConverter
   key.converter.schema.registry.url=<your schema registry uri>
   value.converter=io.confluent.connect.avro.AvroConverter
   value.converter.schema.registry.url=<your schema registry uri>

Sink Connector
~~~~~~~~~~~~~~

This configuration specifies that your sink connector should read Avro data from an
Apache Kafka topic using an Avro converter and Schema Registry:

.. code-block:: properties

   key.converter=io.confluent.connect.avro.AvroConverter
   key.converter.schema.registry.url=<your schema registry uri>
   value.converter=io.confluent.connect.avro.AvroConverter
   value.converter.schema.registry.url=<your schema registry uri>