Document Table Constraint Enforcement Behavior in Custom Table Providers Guide #16340

kosiew · 2025-06-09T11:19:50Z

Which issue does this PR close?

Closes #16309 (Add documentation on constraint enforcements)

Rationale for this change

Table constraints like primary keys, uniqueness, and foreign keys are common features in relational systems, but DataFusion does not currently enforce or optimize based on most of them. This lack of enforcement isn't clearly documented, which can lead to confusion for TableProvider authors and users expecting standard SQL behavior. This PR aims to clarify that and guide users with expectations and references for typical implementations.

What changes are included in this PR?

Adds documentation to the custom-table-providers.md file describing how DataFusion currently treats table constraints.
Notes that some constraints (like nullability) are enforced, but others (like uniqueness or PK/FK constraints) are not.
References relevant background discussion and highlights the optimizer's current limitations in leveraging constraint metadata.
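As an illustration of the enforced versus unenforced distinction noted above, here is a minimal sketch that could be tried in datafusion-cli. The table name `constraint_demo` is hypothetical, and the outcomes described in the comments are what the description above implies, not captured output.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE constraint_demo (
    id INTEGER PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);

-- Duplicate primary key values: per the description above, uniqueness is not
-- checked, so this insert is expected to be accepted.
INSERT INTO constraint_demo VALUES (1, 'Alice'), (1, 'Bob');

-- NULL in a NOT NULL column: per the description above, nullability is
-- enforced, so this insert is expected to be rejected with an error.
INSERT INTO constraint_demo VALUES (2, NULL);

-- Inspect what was actually stored.
SELECT id, name FROM constraint_demo;
```

If the final query returns two rows sharing `id = 1`, the primary key declaration was recorded as metadata but not validated against the data.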

Are these changes tested?

N/A – This change is purely documentation-related and does not include or require any code or behavior changes.

Are there any user-facing changes?

Yes – this change updates the documentation to make constraint behavior more transparent for users implementing custom TableProviders.


alamb

Thank you @kosiew -- this is a great improvement on what we have

I think we could merge this PR as is and update it in a follow-on PR, so I am approving it.

alamb · 2025-06-11T01:16:33Z

docs/source/library-user-guide/table-constraints.md

+The optimizer also does not assume that these constraints hold when
+rewriting queries. For example, declaring a column as a primary key will
+not allow the optimizer to skip a `DISTINCT` aggregation.


I didn't think this was true -- I was pretty sure there is some ordering / functional dependency check that relies on declared constraints, but I couldn't find it quickly when searching

Maybe @mustafasrepo remembers 🤔

kosiew · 2025-06-11

Hi @alamb,

You're right.

I tested this in datafusion-cli

-- Test 1: Create table with more data to see if DISTINCT appears
CREATE TABLE test_pk_large (
    id INTEGER PRIMARY KEY,
    name VARCHAR(50)
);

-- Insert duplicate names but unique IDs
INSERT INTO test_pk_large VALUES
    (1, 'Alice'),
    (2, 'Alice'),
    (3, 'Bob'),
    (4, 'Bob'),
    (5, 'Charlie');

-- Test DISTINCT on primary key column
EXPLAIN SELECT DISTINCT id FROM test_pk_large;
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │         bytes: 376        │ |
|               | │       format: memory      │ |
|               | │          rows: 1          │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+

-- Test 2
CREATE TABLE test_no_pk (
    id INTEGER,
    name VARCHAR(50)
);

-- Insert unique IDs (same as before)
INSERT INTO test_no_pk VALUES
    (1, 'Alice'),
    (2, 'Alice'),
    (3, 'Bob'),
    (4, 'Bob'),
    (5, 'Charlie');

EXPLAIN SELECT DISTINCT id FROM test_no_pk;
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │       AggregateExec       │ |
|               | │    --------------------   │ |
|               | │        group_by: id       │ |
|               | │                           │ |
|               | │           mode:           │ |
|               | │      FinalPartitioned     │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │    CoalesceBatchesExec    │ |
|               | │    --------------------   │ |
|               | │     target_batch_size:    │ |
|               | │            8192           │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │      RepartitionExec      │ |
|               | │    --------------------   │ |
|               | │ partition_count(in->out): │ |
|               | │          10 -> 10         │ |
|               | │                           │ |
|               | │    partitioning_scheme:   │ |
|               | │      Hash([id@0], 10)     │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │      RepartitionExec      │ |
|               | │    --------------------   │ |
|               | │ partition_count(in->out): │ |
|               | │          1 -> 10          │ |
|               | │                           │ |
|               | │    partitioning_scheme:   │ |
|               | │    RoundRobinBatch(10)    │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       AggregateExec       │ |
|               | │    --------------------   │ |
|               | │        group_by: id       │ |
|               | │       mode: Partial       │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │         bytes: 376        │ |
|               | │       format: memory      │ |
|               | │          rows: 1          │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+

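In other words, declared constraints do affect the optimizer: with `id` declared as the primary key, the plan for `SELECT DISTINCT id` contains no aggregation, while without the declaration it contains partial and final `AggregateExec` nodes.
I'll remove this paragraph.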


xudong963

Thank you

alamb · 2025-06-12T04:21:47Z

Thank you @kosiew and @xudong963

kosiew added 2 commits June 9, 2025 18:24

Add documentation for table constraint enforcement in DataFusion

eb9fff5

Merge branch 'main' into doc-16309

6d3b5c4

github-actions bot added the documentation label (Improvements or additions to documentation) Jun 9, 2025

kosiew added 3 commits June 9, 2025 19:20

Merge branch 'main' into doc-16309

6fd4527

Add link to table constraints documentation in index.rst

d6d51db

fix: correct markdown link references for table constraints documentation

3a9ee8d

alamb approved these changes Jun 11, 2025


kosiew and others added 3 commits June 11, 2025 15:57

Update docs/source/library-user-guide/table-constraints.md

46859f2

Co-authored-by: Andrew Lamb

fix: remove incorrect information about optimizer constraints in table constraints documentation

27e396c

prettier fix

8da0cbe

xudong963 approved these changes Jun 11, 2025


alamb merged commit 0e84041 into apache:main Jun 12, 2025
5 checks passed


