Releases
v0.12.0
paulgc released this 22 Feb 02:14
Major Features and Improvements
Add support for computing statistics over slices of data.
Performance improvement due to optimizing inner loops.
Add support for generating statistics from a pandas dataframe.
Performance improvement due to pre-allocating tf.Example in
TFExampleDecoder.
Performance improvement due to merging common stats generator, numeric stats
generator and string stats generator as a single basic stats generator.
Performance improvement due to merging top-k and uniques generators.
Add a validate_instance function, which checks a single example for
anomalies.
Add a utility method get_statistics_html, which returns HTML that can be
used for Facets visualization outside of a notebook.
Add support for schema inference of semantic domains.
Performance improvement on statistics computation over a pandas dataframe.
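The idea behind computing statistics over slices of data can be pictured with a small, library-free sketch. It is a conceptual illustration only and does not use the TFDV API: the helper names (slice_examples, numeric_stats) and the sample features (country, age) are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def slice_examples(examples, slice_feature):
    """Group examples by the value they hold for `slice_feature`."""
    slices = defaultdict(list)
    for example in examples:
        slices[example.get(slice_feature)].append(example)
    return dict(slices)


def numeric_stats(examples, feature):
    """Compute a few basic statistics for one numeric feature."""
    values = [ex[feature] for ex in examples if ex.get(feature) is not None]
    return {
        "num_examples": len(examples),
        "num_non_missing": len(values),
        "mean": mean(values) if values else None,
        "min": min(values) if values else None,
        "max": max(values) if values else None,
    }


# Hypothetical data: each example is a dict of feature name -> value.
examples = [
    {"country": "US", "age": 30},
    {"country": "US", "age": 40},
    {"country": "CA", "age": 25},
    {"country": "CA", "age": None},
]

# One statistics record per slice, keyed by the slicing feature's value.
per_slice = {
    key: numeric_stats(group, "age")
    for key, group in slice_examples(examples, "country").items()
}
```

Each slice gets its own statistics record, so a problem confined to one subset of the data (here, the missing age in the CA slice) stays visible instead of being averaged away in the overall statistics.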
Bug Fixes and Other Changes
Use constant '__BYTES_VALUE__' in the statistics proto to represent a bytes
value which cannot be decoded as a utf-8 string.
Introduce CombinerFeatureStatsGenerator, a specialized interface for
combiners that do not require cross-feature computations.
Expand unit test coverage.
Add optional frequency threshold that allows keeping only the most frequent
values that are present in a minimum number of examples.
Add optional desired batch size that allows specification of the number of
examples to include in each batch.
Depends on numpy>=1.14.5,<2.
Depends on protobuf>=3.6.1,<4.
Depends on apache-beam[gcp]>=2.10,<3.
Depends on tensorflow-metadata>=0.12.1,<0.13.
Depends on scikit-learn>=0.18,<1.
Depends on IPython>=5.0.
Requires pre-installed tensorflow>=1.12,<2.
Revise example notebook and update it to be able to run in Colab and Jupyter.
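The optional frequency threshold listed above can be sketched in plain Python as follows. This is a minimal illustration that assumes the threshold counts the number of examples containing a value rather than total occurrences. The function name and parameters are hypothetical and are not TFDV's implementation or option names.

```python
from collections import Counter


def top_k_with_frequency_threshold(examples, k, min_examples):
    """Return up to k (value, example_count) pairs, most frequent first.

    A value is kept only if it appears in at least `min_examples` examples.
    """
    presence = Counter()
    for values in examples:
        # Count each distinct value once per example.
        presence.update(set(values))
    frequent = [(v, c) for v, c in presence.items() if c >= min_examples]
    # Sort by descending count, then by value for a deterministic order.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return frequent[:k]


# Hypothetical string feature: one list of values per example.
examples = [["a", "b"], ["a", "a", "c"], ["a", "b"], ["d"]]
top_values = top_k_with_frequency_threshold(examples, k=2, min_examples=2)
```

Here "c" and "d" each appear in only one example, so they are dropped before the top-k cut is applied.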
Breaking Changes
Represent batch as a list of ndarrays instead of ndarrays of ndarrays.
Modify decoders to return ndarrays of type numpy.float32 for FLOAT features.
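The two breaking changes can be illustrated with NumPy alone. This sketch assumes a batch holds, for a given feature, one array of values per example. The variable names and sample values are hypothetical, and the float64 dtype in the old layout is chosen only for contrast.

```python
import numpy as np

# Hypothetical values of one FLOAT feature for three examples.
per_example_values = [[1.0, 2.5], [3.0], []]

# Old layout: an object-dtype ndarray whose elements are themselves ndarrays.
old_layout = np.empty(len(per_example_values), dtype=object)
for i, values in enumerate(per_example_values):
    old_layout[i] = np.array(values, dtype=np.float64)

# New layout: a plain Python list of ndarrays, with FLOAT values as float32.
new_layout = [np.array(values, dtype=np.float32) for values in per_example_values]

# Code that called ndarray methods on the batch itself must now treat it as a list.
num_values_per_example = [arr.size for arr in new_layout]
```

Code that relied on the outer container being an ndarray (for example, by reading its shape or dtype) or on decoded FLOAT values having a wider dtype would need adjusting after this release.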