ElasticDL Serving Solution Explore

Motivation

Besides training, model serving is a very important part of the end-to-end machine learning lifecycle. Publishing the trained model as a service in production makes the model valuable in the real world.

At the current stage, ElasticDL focuses on the training part. After a training job completes, we neither have our own serving infrastructure nor can we reuse any existing one to serve the model. Our target is to figure out the serving solution.

Direction

  • Store the ElasticDL model in the SavedModel format.
    SavedModel is the universal serialization format for TensorFlow models. It is language neutral and can be loaded by multiple frameworks (such as TFServing, TFLite, TensorFlow.js and so on). We choose to store the ElasticDL model in the SavedModel format. In this way, we can leverage these various mature solutions to serve our model. A minimal export sketch follows this list.
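A minimal sketch of exporting a model in the SavedModel format, assuming a plain tf.keras model; the model definition and export path below are made up for illustration and are not the actual ElasticDL export code.

```python
import tensorflow as tf

# A toy Keras model standing in for a trained ElasticDL model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

# Export in the SavedModel format; the resulting directory can be loaded
# by TF Serving, the TFLite converter, TensorFlow.js converters, etc.
export_dir = "/tmp/elasticdl_demo_model/1"  # hypothetical path
tf.saved_model.save(model, export_dir)

# Reload to verify the exported serving signatures.
loaded = tf.saved_model.load(export_dir)
print(list(loaded.signatures.keys()))  # e.g. ['serving_default']
```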

Challenges

  • How to save a model trained with the parameter server as SavedModel?
    For large models, ElasticDL uses Redis to store the embedding vectors. As a next step, we are planning to design a parameter server to store the variables and embeddings. In our model definition, we use ElasticDL.Embedding instead of tf.keras.layers.Embedding to interact with our parameter server. ElasticDL.Embedding uses tf.py_function to make RPC calls to the parameter server.
    However, at the model-saving stage, the customized ElasticDL.Embedding layer is not mapped to any native TensorFlow op and can't be saved into SavedModel. The embedding vectors stored in the parameter server are lost, so the embedding lookup can't work in the serving process. A simplified sketch of this pattern follows this list.
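A simplified sketch of why a py_function-based lookup does not survive SavedModel export, assuming a toy layer and a fake lookup function in place of the real ElasticDL.Embedding and its RPC client:

```python
import numpy as np
import tensorflow as tf

def lookup_from_parameter_server(ids):
    # Stand-in for the RPC call that fetches embedding vectors from the
    # parameter server; here it just returns random 8-dim vectors.
    ids = ids.numpy()
    return np.random.rand(len(ids), 8).astype(np.float32)

class PSEmbedding(tf.keras.layers.Layer):
    """Toy lookup layer that delegates to an external store via tf.py_function."""

    def call(self, ids):
        values = tf.py_function(lookup_from_parameter_server, [ids], Tout=tf.float32)
        values.set_shape((None, 8))
        return values

# The Python body wrapped by tf.py_function is not serialized into the graph,
# so a SavedModel exported from a model containing this layer keeps only a
# py_function placeholder: the lookup logic and the embedding vectors held by
# the parameter server are not part of the exported artifact.
```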

Ideas and Experiments
