Release 0.12.0

@paulgc paulgc released this 22 Feb 02:14

Major Features and Improvements

  • Add support for computing statistics over slices of data.
  • Performance improvement due to optimizing inner loops.
  • Add support for generating statistics from a pandas dataframe.
  • Performance improvement due to pre-allocating tf.Example in
    TFExampleDecoder.
  • Performance improvement due to merging common stats generator, numeric stats
    generator and string stats generator as a single basic stats generator.
  • Performance improvement due to merging top-k and uniques generators.
  • Add a validate_instance function, which checks a single example for
    anomalies.
  • Add a utility method get_statistics_html, which returns HTML that can be
    used for Facets visualization outside of a notebook.
  • Add support for schema inference of semantic domains.
  • Performance improvement on statistics computation over a pandas dataframe.
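
To make the slicing feature above concrete, here is a minimal, library-free sketch of computing statistics per slice. It does not use or reproduce the TFDV API: the helper names `slice_by_feature` and `numeric_stats`, and the choice to slice on a single feature's value, are assumptions made purely for illustration.

```python
from collections import defaultdict


def slice_by_feature(examples, feature):
    """Group examples by the value of one feature (a hypothetical slicing rule)."""
    slices = defaultdict(list)
    for example in examples:
        slices[example.get(feature)].append(example)
    return dict(slices)


def numeric_stats(examples, feature):
    """Return count, mean, min and max of a numeric feature, skipping missing values."""
    values = [ex[feature] for ex in examples if ex.get(feature) is not None]
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


examples = [
    {"country": "US", "age": 30},
    {"country": "US", "age": 40},
    {"country": "CA", "age": 25},
]

# One statistics record per slice, keyed by the slice value.
sliced_stats = {
    key: numeric_stats(rows, "age")
    for key, rows in slice_by_feature(examples, "country").items()
}
```

The same per-slice loop applies to any statistic; only the grouping rule changes.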

Bug Fixes and Other Changes

  • Use constant 'BYTES_VALUE' in the statistics proto to represent a bytes
    value which cannot be decoded as a utf-8 string.
  • Introduce CombinerFeatureStatsGenerator, a specialized interface for
    combiners that do not require cross-feature computations.
  • Expand unit test coverage.
  • Add an optional frequency threshold that allows keeping only the most
    frequent values that are present in a minimum number of examples.
  • Add an optional desired batch size that allows specification of the number
    of examples to include in each batch.
  • Depends on numpy>=1.14.5,<2.
  • Depends on protobuf>=3.6.1,<4.
  • Depends on apache-beam[gcp]>=2.10,<3.
  • Depends on tensorflow-metadata>=0.12.1,<0.13.
  • Depends on scikit-learn>=0.18,<1.
  • Depends on IPython>=5.0.
  • Requires pre-installed tensorflow>=1.12,<2.
  • Revise example notebook and update it to be able to run in Colab and Jupyter.
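
The frequency threshold and desired batch size options above can be pictured with a small plain-Python sketch. This is an illustration of the two ideas under stated assumptions, not TFDV's implementation: the function names `batches` and `top_k_with_threshold` and their parameters are hypothetical, and each example is assumed to hold a list of values per feature.

```python
from collections import Counter
from itertools import islice


def batches(examples, desired_batch_size):
    """Yield lists of at most desired_batch_size examples, in input order."""
    iterator = iter(examples)
    while True:
        batch = list(islice(iterator, desired_batch_size))
        if not batch:
            return
        yield batch


def top_k_with_threshold(examples, feature, k, frequency_threshold):
    """Return up to k most frequent values seen in at least frequency_threshold examples."""
    counts = Counter()
    for example in examples:
        # Count each distinct value once per example, not once per occurrence.
        counts.update(set(example.get(feature, [])))
    frequent = [(value, n) for value, n in counts.items() if n >= frequency_threshold]
    # Sort by descending count, breaking ties by value for a stable result.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return frequent[:k]


examples = [
    {"tag": ["a", "b"]},
    {"tag": ["a"]},
    {"tag": ["a", "c"]},
    {"tag": ["b"]},
]

batch_sizes = [len(batch) for batch in batches(examples, desired_batch_size=3)]
top_values = top_k_with_threshold(examples, "tag", k=5, frequency_threshold=2)
```

Here the value `c` appears in only one example, so a threshold of 2 drops it even though `k` would have allowed it.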

Breaking Changes

  • Represent a batch as a list of ndarrays instead of an ndarray of ndarrays.
  • Modify decoders to return ndarrays of type numpy.float32 for FLOAT features.
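
For code that consumed the old batch representation, the two breaking changes above amount to roughly the following difference in NumPy terms. This is a sketch only: the exact layout of a batch inside TFDV is not spelled out in these notes, so the object-dtype array used for the old form and the variable names are assumptions.

```python
import numpy as np

# Assumed old form: one object-dtype ndarray whose elements are ndarrays.
old_batch = np.empty(2, dtype=object)
old_batch[0] = np.array([1.0, 2.0])  # float64 by default
old_batch[1] = np.array([3.0])

# New form: a plain Python list of ndarrays, with FLOAT values held as float32.
new_batch = [np.asarray(values, dtype=np.float32) for values in old_batch]

# Code that relied on ndarray attributes of the batch itself (for example
# old_batch.shape) needs to switch to list operations such as len(new_batch).
num_examples = len(new_batch)
dtypes = [values.dtype for values in new_batch]
```

Downstream code that compares decoded FLOAT values against float64 data may want to cast explicitly so both sides share a dtype.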