Version 0.22.0

@dhruvesh09 released this 15 May 23:36
9e29f13

Major Features and Improvements

Bug Fixes and Other Changes

  • Crop values in natural language stats generator.
  • Switch to using PyBind11 instead of SWIG for wrapping C++ libraries.
  • CSV decoder support for multivalent columns by using tfx_bsl's decoder.
  • When inferring a schema entry for a feature, do not add a shape with dim = 0
    when min_num_values = 0.
  • Add utility methods tfdv.get_slice_stats to get statistics for a slice and
    tfdv.compare_slices to compare statistics of two slices using Facets.
  • Make tfdv.load_stats_text and tfdv.write_stats_text public.
  • Add PTransforms tfdv.WriteStatisticsToText and
    tfdv.WriteStatisticsToTFRecord to write statistics proto to text and
    tfrecord files respectively.
  • Modify tfdv.load_statistics to handle reading statistics from TFRecord and
    text files.
  • Add an extra requirement group, mutual-information. As a result, a
    bare-bones TFDV install no longer requires scikit-learn.
  • Add an extra requirement group, visualization. As a result, a bare-bones
    TFDV install no longer requires ipython.
  • Add an extra requirement group, all, that specifies all the extra
    dependencies TFDV needs. Use pip install tensorflow-data-validation[all]
    to pull in those dependencies.
  • Depends on pyarrow>=0.16,<0.17.
  • Depends on apache-beam[gcp]>=2.20,<3.
  • Depends on ipython>=7,<8;python_version>="3".
  • Depends on scikit-learn>=0.18,<0.24.
  • Depends on tensorflow>=1.15,!=2.0.*,<3.
  • Depends on tensorflow-metadata>=0.22.0,<0.23.
  • Depends on tensorflow-transform>=0.22,<0.23.
  • Depends on tfx-bsl>=0.22,<0.23.

Known Issues

  • (Known issue resolution) It is no longer necessary to use Apache Beam 2.17
    when running TFDV on Windows. The current release of Apache Beam will work.

Breaking Changes

  • tfdv.GenerateStatistics now accepts a PCollection of pa.RecordBatch
    instead of pa.Table.
  • All the TFDV coders now output a PCollection of pa.RecordBatch instead of
    a PCollection of pa.Table.
  • tfdv.validate_instances and
    tfdv.api.validation_api.IdentifyAnomalousExamples now take
    pa.RecordBatch as input instead of pa.Table.
  • The StatsGenerator interface (and all its sub-classes) now takes
    pa.RecordBatch as the input data instead of pa.Table.
  • Custom slicing functions now accept a pa.RecordBatch instead of
    pa.Table as input and should output a tuple (slice_key, record_batch).

Deprecations

  • Python 2 support is deprecated.