Commit 21a49f8

DOCSP-27245: improved schema inference for nested docs (#85)
* DOCSP-27245: improved schema inference for nested docs
* RW comment
* DB PR fixes 1
1 parent cac34ca commit 21a49f8

2 files changed: +20 -1 lines changed

source/source-connector/fundamentals/specify-schema.txt

Lines changed: 18 additions & 1 deletion
@@ -86,7 +86,7 @@ schema for your incoming documents, you must specify a key schema

 You can enable the default key schema with the following option:

-.. code-block:: java
+.. code-block:: properties

    output.format.key=schema

@@ -182,6 +182,23 @@ following options:
    output.format.value=schema
    output.schema.infer.value=true

+The source connector can infer schemas for incoming documents that
+contain nested documents stored in arrays. Starting in Version 1.9 of the
+{+connector+}, schema inference will gather the appropriate data type
+for fields instead of defaulting to a ``string`` type assignment if there are
+differences between nested documents described by the following cases:
+
+- A field is present in one document but missing in another.
+- A field is present in one document but ``null`` in another.
+- A field is an array with elements of any type in one document but
+  has additional elements or elements of other data types in another.
+- A field is an array with elements of any type in one document but an
+  empty array in another.
+
+If field types conflict between nested documents, the connector
+pushes the conflict down to the schema for the field and defaults to a
+``string`` type assignment.
+
 .. note:: Cannot Infer Key Schema

    The {+connector+} does not support key schema inference. If you want to use a key
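
Reviewer note: the inference cases added in this hunk can be illustrated with a small model. The Python below is a toy sketch under stated assumptions, not the connector's implementation. The `infer` and `merge` helpers, the nested-dict schema representation, and the type labels (`int64`, `boolean`, `float64`) are all hypothetical and chosen only for illustration. It models a missing field, a ``null`` value, or an empty array as "no type information yet", and falls back to ``string`` only where two known types disagree.

```python
# Toy model of the merge rules described in the added paragraph.
# NOT the connector's code; names and schema shapes are hypothetical.
STRING = "string"


def infer(value):
    """Return a toy schema: a type label, {"array": elem}, or {"struct": fields}.

    None means "type unknown" (null value, or an empty array's element type).
    """
    if value is None:
        return None
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return STRING
    if isinstance(value, list):
        elem = None
        for item in value:
            elem = merge(elem, infer(item))
        return {"array": elem}
    if isinstance(value, dict):
        return {"struct": {name: infer(v) for name, v in value.items()}}
    raise TypeError(f"unsupported value: {value!r}")


def merge(a, b):
    """Combine two toy schemas inferred for the same field."""
    if a is None:  # missing, null, or empty array on one side
        return b
    if b is None:
        return a
    if a == b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        if "struct" in a and "struct" in b:
            fields = dict(a["struct"])
            for name, schema in b["struct"].items():
                fields[name] = merge(fields.get(name), schema)
            return {"struct": fields}
        if "array" in a and "array" in b:
            return {"array": merge(a["array"], b["array"])}
    # Conflicting types: the conflict stays on this field only.
    return STRING
```

In this model, merging `{"sku": "a1", "qty": 2}` with `{"sku": "b2", "qty": None, "tags": ["x"]}` keeps `qty` as `int64` and adds `tags` as an array of strings, while a field that is an integer in one nested document and a boolean in another becomes `string` without affecting its sibling fields.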

source/whats-new.txt

Lines changed: 2 additions & 0 deletions
@@ -40,6 +40,8 @@ What's New in 1.9
 - Introduced the ``change.stream.full.document.before.change`` setting that
   allows you to access and configure the pre-image of an update
   operation in the change stream event document.
+- Improved :ref:`schema inference <source-infer-a-schema>` for nested
+  documents contained in arrays.

 .. _kafka-connector-whats-new-1.8.1:
