Major Features and Improvements
Bug Fixes and Other Changes
Crop values in natural language stats generator.
Switch to using PyBind11 instead of SWIG for wrapping C++ libraries.
CSV decoder support for multivalent columns by using tfx_bsl's decoder.
When inferring a schema entry for a feature, do not add a shape with dim = 0
when min_num_values = 0.
Add utility methods tfdv.get_slice_stats to get statistics for a slice and tfdv.compare_slices to compare statistics of two slices using Facets.
Make tfdv.load_stats_text and tfdv.write_stats_text public.
Add PTransforms tfdv.WriteStatisticsToText and tfdv.WriteStatisticsToTFRecord to write the statistics proto to text and
TFRecord files, respectively.
Modify tfdv.load_statistics to handle reading statistics from TFRecord and
text files.
Add an extra requirement group mutual-information. As a result, barebone
TFDV no longer requires scikit-learn.
Add an extra requirement group visualization. As a result, barebone TFDV
no longer requires ipython.
Add an extra requirement group all that specifies all the extra
dependencies TFDV needs. Use pip install tensorflow-data-validation[all]
to pull in those dependencies.
Depends on pyarrow>=0.16,<0.17.
Depends on apache-beam[gcp]>=2.20,<3.
Depends on ipython>=7,<8;python_version>="3".
Depends on scikit-learn>=0.18,<0.24.
Depends on tensorflow>=1.15,!=2.0.*,<3.
Depends on tensorflow-metadata>=0.22.0,<0.23.
Depends on tensorflow-transform>=0.22,<0.23.
Depends on tfx-bsl>=0.22,<0.23.
Known Issues
(Known issue resolution) It is no longer necessary to use Apache Beam 2.17
when running TFDV on Windows. The current release of Apache Beam will work.
Breaking Changes
tfdv.GenerateStatistics now accepts a PCollection of pa.RecordBatch
instead of pa.Table.
All the TFDV coders now output a PCollection of pa.RecordBatch instead of
a PCollection of pa.Table.
tfdv.validate_instances and tfdv.api.validation_api.IdentifyAnomalousExamples now take pa.RecordBatch as input instead of pa.Table.
The StatsGenerator interface (and all its sub-classes) now takes pa.RecordBatch as the input data instead of pa.Table.
Custom slicing functions now accept a pa.RecordBatch instead of pa.Table as input and should output a tuple (slice_key, record_batch).