Release 0.12.0

paulgc released this 22 Feb 02:14

06e30e3

Major Features and Improvements

Add support for computing statistics over slices of data.
Performance improvement due to optimizing inner loops.
Add support for generating statistics from a pandas dataframe.
Performance improvement due to pre-allocating tf.Example in TFExampleDecoder.
Performance improvement due to merging common stats generator, numeric stats generator and string stats generator as a single basic stats generator.
Performance improvement due to merging top-k and uniques generators.
Add a validate_instance function, which checks a single example for anomalies.
Add a utility method get_statistics_html, which returns HTML that can be used for Facets visualization outside of a notebook.
Add support for schema inference of semantic domains.
Performance improvement on statistics computation over a pandas dataframe.
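
To illustrate what computing statistics over slices of data means, here is a minimal pure-Python sketch. It is not this library's implementation or API: the helper name stats_by_slice and the dict-based example format are assumptions made only for illustration.

```python
from collections import defaultdict
from statistics import mean


def stats_by_slice(examples, slice_feature, numeric_feature):
    """Group examples by `slice_feature` and summarize `numeric_feature`
    within each group. Hypothetical helper, not a library function."""
    groups = defaultdict(list)
    for example in examples:
        # Examples missing either feature are skipped in this sketch.
        if slice_feature in example and numeric_feature in example:
            groups[example[slice_feature]].append(example[numeric_feature])
    return {
        key: {
            "count": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in groups.items()
    }


examples = [
    {"country": "US", "age": 30},
    {"country": "US", "age": 40},
    {"country": "DE", "age": 25},
]
slice_stats = stats_by_slice(examples, "country", "age")
```

Each key of slice_stats is one slice (here, one country), and its value holds the statistics computed only over the examples that fall in that slice.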

Bug Fixes and Other Changes

Use constant 'BYTES_VALUE' in the statistics proto to represent a bytes value which cannot be decoded as a utf-8 string.
Introduce CombinerFeatureStatsGenerator, a specialized interface for combiners that do not require cross-feature computations.
Expand unit test coverage.
Add optional frequency threshold that allows keeping only the most frequent values that are present in a minimum number of examples.
Add optional desired batch size that allows specification of the number of examples to include in each batch.
Depends on numpy>=1.14.5,<2.
Depends on protobuf>=3.6.1,<4.
Depends on apache-beam[gcp]>=2.10,<3.
Depends on tensorflow-metadata>=0.12.1,<0.13.
Depends on scikit-learn>=0.18,<1.
Depends on IPython>=5.0.
Requires pre-installed tensorflow>=1.12,<2.
Revise example notebook and update it to be able to run in Colab and Jupyter.
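
The frequency threshold and desired batch size options listed above can be sketched conceptually in plain Python. This is only an illustration of the two ideas, not the library's code: the function names, the parameter names, and the reading of "present in a minimum number of examples" as a per-example count are all assumptions.

```python
from collections import Counter
from itertools import islice


def frequent_values(examples, feature, frequency_threshold):
    """Keep values of `feature` that appear in at least
    `frequency_threshold` examples (hypothetical helper)."""
    counts = Counter()
    for example in examples:
        # Count each distinct value once per example, not once per occurrence.
        counts.update(set(example.get(feature, [])))
    return {value: n for value, n in counts.items() if n >= frequency_threshold}


def batches(examples, desired_batch_size):
    """Yield lists of up to `desired_batch_size` examples (hypothetical helper)."""
    iterator = iter(examples)
    while True:
        batch = list(islice(iterator, desired_batch_size))
        if not batch:
            return
        yield batch


examples = [
    {"tag": ["a", "b"]},
    {"tag": ["a", "a"]},
    {"tag": ["c"]},
]
kept = frequent_values(examples, "tag", frequency_threshold=2)
batch_sizes = [len(b) for b in batches(examples, desired_batch_size=2)]
```

With a threshold of 2, only the value "a" survives, because "b" and "c" each occur in a single example. The last batch is smaller than the requested size when the number of examples is not a multiple of it.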

Breaking Changes

Represent batch as a list of ndarrays instead of ndarrays of ndarrays.
Modify decoders to return ndarrays of type numpy.float32 for FLOAT features.
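
A small NumPy sketch of the two batch layouts may help when updating code for these changes. It only illustrates the data structures named above; the sample values and variable names are made up, and it does not call any decoder from this library.

```python
import numpy as np

# Two examples with a different number of values for the same feature.
values = [[1.0, 2.0, 3.0], [4.0]]

# Old layout: an ndarray (dtype=object) whose elements are ndarrays.
old_batch = np.empty(len(values), dtype=object)
for i, v in enumerate(values):
    old_batch[i] = np.array(v)

# New layout: a plain Python list of ndarrays, using numpy.float32
# for the values of a FLOAT feature.
new_batch = [np.array(v, dtype=np.float32) for v in values]
```

Code that relied on ndarray attributes of the batch itself, such as old_batch.shape, would need to use len(new_batch) or iterate over the list instead.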
