This change of syntax is a breaking change for every user of data-diff, not to mention our own documentation and videos. Why not add a
Mainly to support flags for the dbt functionality. I think it would be confusing to have flags that only apply when the What would the behavior of
This is very minor, but we should be consistent about saying database/table 1/2 vs. database/table a/b. We should use a/b instead of 1/2. Erez will recall that I was very annoying about this when he was naming columns for materialized tables. While I don't feel strongly about 1 vs. a, we settled on a, and that is reflected in this guide: https://docs.datafold.com/guides/os_diff 🙏
We already have this kind of behavior for hashdiff vs joindiff. It might be confusing, but I don't know of an easy way to avoid it. Also, I believe we can allow the We already do a bit of syntax shenanigans here to allow the The last alternative is to create a new entry-point, like Either way, breaking the API is a definite no-no.
Switched to option flags; dbt-specific functionality is prefixed by --dbt-
I did move some of the option handling up to the main method where I thought it made sense (--debug, --no-tracking, etc.). For now, the majority of the other options are ignored when using --dbt or --dbt --cloud, but we can add support for them over time.
Force-pushed from 4ce9f39 to 27e0a87
Tested compatibility with dbt: I didn't cover every adapter (just Snowflake), but that should not affect the dbt-core artifacts being parsed.
Hey @williebsweet @nolar, I think I've addressed all of the feedback. Re-requested review.
Good enough for me
Adds options to data-diff:
data-diff --dbt
and data-diff --dbt --cloud
With some suboptions:
--dbt-profiles-dir
--dbt-project-dir
The main options parse a dbt project's artifacts and profiles.yml in order to easily run multiple diffs without specifying the typical connection strings and table names.
Currently, it will run a diff for each successful model in the project's last run.
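As a rough illustration of "each successful model in the project's last run", here is a minimal sketch that filters dbt's run_results.json for model nodes whose status is "success". This is not the code in this PR: the helper names successful_models and load_run_results are hypothetical, and the sketch assumes the artifact sits in the default target/ directory of the project.

```python
from __future__ import annotations

import json
from pathlib import Path


def successful_models(run_results: dict) -> list[str]:
    """Return the unique_ids of model nodes that succeeded in the last run.

    Hypothetical helper for illustration; it relies only on the "results",
    "status" and "unique_id" keys of dbt's run_results.json artifact.
    """
    return [
        result["unique_id"]
        for result in run_results.get("results", [])
        if result.get("status") == "success"
        and result.get("unique_id", "").startswith("model.")
    ]


def load_run_results(project_dir: str) -> dict:
    # Assumption: the artifact is in the default target/ directory.
    path = Path(project_dir) / "target" / "run_results.json"
    return json.loads(path.read_text())


# Made-up artifact contents, trimmed to the keys the filter reads.
example = {
    "results": [
        {"unique_id": "model.demo.orders", "status": "success"},
        {"unique_id": "model.demo.customers", "status": "error"},
        {"unique_id": "test.demo.not_null_orders_id", "status": "pass"},
    ]
}

print(successful_models(example))
```

Failed models and non-model nodes such as tests are skipped, so only models that were actually rebuilt would be diffed.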
The main configuration step is to add one or both of the following to the vars section of the dbt_project.yml:
If a dbt project is organized at a schema level (e.g. all models write to a single schema), both are needed. If the dbt project writes to multiple schemas, just the database var is needed.
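One way to read the rule above, sketched under stated assumptions: when both a production database and a production schema are configured, the production table lives in that schema; when only the database is configured, the model keeps its own schema and only the database changes. The function prod_table_path and its parameter names are hypothetical and do not correspond to the actual var names or to the implementation in this PR.

```python
from __future__ import annotations

from typing import Optional


def prod_table_path(
    model_schema: str,
    model_name: str,
    prod_database: str,
    prod_schema: Optional[str] = None,
) -> tuple[str, str, str]:
    """Return (database, schema, table) for the production side of a diff.

    Hypothetical helper: the parameter names are illustrative only.
    """
    # Schema-level project: every model is written to one production schema.
    # Multi-schema project: keep the schema the model was built in.
    schema = prod_schema if prod_schema is not None else model_schema
    return (prod_database, schema, model_name)


# Both vars set: all models resolve to the single production schema.
print(prod_table_path("dev_jdoe", "orders", "PROD", "ANALYTICS"))

# Only the database var set: the model's own schema is preserved.
print(prod_table_path("marts", "orders", "PROD"))
```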
If you're using the --dbt --cloud option, a Datafold API key is required in the DATAFOLD_API_KEY environment variable. It also requires a datasource_id in the vars section:
Caveats: