Data Transform Process

brightcoder01 edited this page Jan 13, 2020 · 45 revisions

Normalize Table Schema: Wide Table

Transform the table schema into a wide one (i.e., one table column per feature) if the original schema is not already wide. This can be implemented as a batch processing job, for example a MaxCompute job.
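A minimal sketch of the wide-schema normalization, shown here with pandas on a toy table rather than an actual MaxCompute job (the table layout and column names are hypothetical):

```python
import pandas as pd

# Hypothetical long-format table: one row per (sample_id, feature_name, value).
long_df = pd.DataFrame({
    "sample_id": [1, 1, 2, 2],
    "feature_name": ["age", "income", "age", "income"],
    "value": [25, 5000, 32, 7000],
})

# Pivot to the wide schema: one column per feature, one row per sample.
wide_df = (
    long_df.pivot(index="sample_id", columns="feature_name", values="value")
    .reset_index()
)
```

A production job would express the same pivot in SQL (e.g., with conditional aggregation) so it runs inside the data warehouse instead of in memory.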

Do Statistics Using SQL

Calculate the statistics (such as mean, standard deviation, min, and max) that the following transform code generation depends on.
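The statistics job can be a single generated SQL statement. A hedged sketch of such a generator follows; the function name and column names are illustrative, and the aggregate function names may differ across SQL dialects:

```python
def build_stats_sql(table, numeric_columns):
    """Generate one SQL statement computing mean/stddev/min/max per column."""
    exprs = []
    for col in numeric_columns:
        exprs.append("AVG({0}) AS {0}_mean".format(col))
        exprs.append("STDDEV({0}) AS {0}_stddev".format(col))
        exprs.append("MIN({0}) AS {0}_min".format(col))
        exprs.append("MAX({0}) AS {0}_max".format(col))
    return "SELECT {} FROM {}".format(", ".join(exprs), table)


sql = build_stats_sql("iris", ["sepal_length", "sepal_width"])
```

The resulting single-row query scans the table once, and its output feeds directly into the generated transform code.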

Generate the Code of Data Transform Stage From SQLFlow

We can combine Keras layers with feature columns to do the data transformation. Please look at the Google Cloud Sample for reference.
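A small sketch of the generated code's shape: a Keras functional model whose first layer is a `tf.keras.layers.Lambda` applying z-score normalization. The feature name and the mean/stddev constants are assumed to come from the statistics job above:

```python
import numpy as np
import tensorflow as tf

# Assumed outputs of the SQL statistics job (hypothetical values).
AGE_MEAN, AGE_STD = 38.0, 12.0

inputs = tf.keras.Input(shape=(1,), name="age")
# The transform is an ordinary layer, so it is saved with the model.
normalized = tf.keras.layers.Lambda(lambda x: (x - AGE_MEAN) / AGE_STD)(inputs)
outputs = tf.keras.layers.Dense(1)(normalized)
model = tf.keras.Model(inputs, outputs)
```

Because the normalization lives inside the model graph, serving the saved model applies exactly the same transform as training.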

Feature Transform Library Based on TensorFlow OP

Build a common set of transform functions using TensorFlow ops. These functions can be fed into tf.keras.layers.Lambda or passed as the normalizer_fn of numeric_column.
Because the transform function set is built upon TensorFlow ops, the transforms become part of the model graph, which ensures consistency between training and inference.
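One way such a library entry might look: a factory that returns a pure-TensorFlow-op transform, usable either as a `normalizer_fn` or inside a `Lambda` layer (the function and feature names here are illustrative, not the library's actual API):

```python
import tensorflow as tf

def standardize(mean, stddev):
    """Return a transform built only from TensorFlow ops, so the same
    graph runs at training and at inference time."""
    def _fn(x):
        return (tf.cast(x, tf.float32) - mean) / stddev
    return _fn

# Usable as the normalizer_fn of a numeric feature column ...
age_col = tf.feature_column.numeric_column(
    "age", normalizer_fn=standardize(38.0, 12.0))

# ... or wrapped in a Lambda layer.
age_layer = tf.keras.layers.Lambda(standardize(38.0, 12.0))
```

Implementing transforms as closures over precomputed statistics keeps the generated model code free of any Python-side preprocessing.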

Transform Code Structure

- Transform Work: tf.keras.layers.Lambda
- Multiple Column Transform: tf.keras.layers.Lambda
- Feature Column: Categorical Mapper
- Embedding: Dense Embedding Layer or embedding_column?
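The two embedding options above can be sketched side by side; the feature name, vocabulary, and dimension are hypothetical:

```python
import tensorflow as tf

# Option 1: categorical mapper + embedding_column (feature-column path).
city_ids = tf.feature_column.categorical_column_with_vocabulary_list(
    "city", vocabulary_list=["beijing", "hangzhou", "shanghai"])
city_emb_col = tf.feature_column.embedding_column(city_ids, dimension=8)

# Option 2: a dense tf.keras.layers.Embedding over pre-mapped integer ids
# (assumes the categorical mapping already happened upstream).
emb_layer = tf.keras.layers.Embedding(input_dim=3, output_dim=8)
```

The Lambda-based transforms compose more naturally with the Embedding layer, while embedding_column keeps everything in the feature-column path; which to pick is the open question this list raises.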
