
ElasticDL Serving Solution Explore

Motivation

Besides training, model serving is a very important part in the end-to-end machine learning lifecycle. Publishing the trained model as a service in production can make the model valuable in the real world.

At the current stage, ElasticDL focuses on the training part. We neither have our own serving infrastructure nor can we directly reuse an existing one to serve our trained models. Our goal is to figure out a serving solution.

Direction

                               Master Storage   AllReduce    Parameter Server
Small or Medium Size Model     SavedModel       SavedModel   SavedModel
Large Size Model               N/A              N/A          Distributed Parameter Serving

Store the ElasticDL model in the SavedModel format.
SavedModel is the universal serialization format for TensorFlow models. It's language neutral and can be loaded by multiple frameworks (such as TF Serving, TFLite, TensorFlow.js and so on). We choose to store the ElasticDL model in the SavedModel format. In this way, we can leverage various mature solutions to serve our model in different scenarios.
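As a quick illustration (a generic sketch, not ElasticDL-specific), exporting a Keras model with tf.saved_model.save produces a self-contained directory that runtimes such as TF Serving can load:

import tensorflow as tf

# Export a toy Keras model as a SavedModel directory.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
tf.saved_model.save(model, "/tmp/demo_savedmodel/1")

# Load it back to confirm the export is self-contained; the serving
# signature is what TF Serving invokes for prediction requests.
loaded = tf.saved_model.load("/tmp/demo_savedmodel/1")
print(list(loaded.signatures.keys()))  # typically ['serving_default']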

Challenges

  • How to save the model trained with the parameter server as SavedModel?
    For large models, we are designing a parameter server to store the variables and embeddings. Currently we use Redis as a temporary solution. In our model definition, we use ElasticDL.Embedding instead of tf.keras.layers.Embedding to interact with our parameter server. ElasticDL.Embedding uses tf.py_function to make an RPC call to the parameter server (see the sketch after this list).
    But at the stage of saving the model, the customized ElasticDL.Embedding layer is not mapped to any native TensorFlow op and can't be saved into SavedModel. The embedding vectors stored in the parameter server are lost, so embedding lookup can't work in the serving process.
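To make the challenge concrete, here is a minimal sketch of an embedding layer that routes its lookup through tf.py_function. The names (PyFuncEmbedding, lookup_from_ps) are hypothetical and the lookup just returns random vectors; the real ElasticDL.Embedding issues a gRPC request instead. Because the lookup runs as arbitrary Python code, there is no native op to serialize into a SavedModel.

import numpy as np
import tensorflow as tf

def lookup_from_ps(ids, dim):
    # Hypothetical stand-in for the gRPC call that would query the parameter
    # server; here it simply returns random vectors of the embedding dimension.
    return np.random.rand(len(ids.numpy()), dim).astype(np.float32)

class PyFuncEmbedding(tf.keras.layers.Layer):
    def __init__(self, output_dim, **kwargs):
        super(PyFuncEmbedding, self).__init__(**kwargs)
        self.output_dim = output_dim

    def call(self, inputs):
        flat_ids = tf.reshape(inputs, [-1])
        # tf.py_function executes the lookup as plain Python, so it cannot be
        # expressed as a native TensorFlow op in an exported SavedModel.
        values = tf.py_function(
            lambda ids: lookup_from_ps(ids, self.output_dim),
            [flat_ids],
            tf.float32,
        )
        new_shape = tf.concat([tf.shape(inputs), [self.output_dim]], axis=0)
        return tf.reshape(values, new_shape)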

Ideas and Experiments

1. Build a custom embedding layer that trains with ElasticDL.Embedding and exports the model in SavedModel format with Keras.Layers.Embedding.

The custom embedding layer contains an ElasticDL.Embedding instance and a Keras.Layers.Embedding instance. It selects which embedding instance to use in the call API according to the environment variables FRAMEWORK and SAVED_MODEL. In ElasticDL, FRAMEWORK is set to "ElasticDL" and SAVED_MODEL decides which instance is used. During training with ElasticDL, SAVED_MODEL is set to "False" and the custom layer calls the ElasticDL.Embedding instance. After training, SAVED_MODEL is set to "True"; then the Keras.Layers.Embedding instance is used when tf.saved_model.save calls the custom layer to export the SavedModel. Meanwhile, the variables in the Keras.Layers.Embedding instance are replaced with those from ElasticDL.Embedding. Outside ElasticDL, FRAMEWORK is not set and the custom layer uses Keras.Embedding for both training and export.

To verify the feasibility, we define a custom layer as follows:

import os
import tensorflow as tf
import numpy as np

from tensorflow.keras.layers import Input, Embedding, Dense, Flatten
from elasticdl.python.elasticdl.layers.embedding import Embedding as elasticDL_Embedding
from tensorflow.keras import layers
from tensorflow.python.keras.utils import tf_utils

class TestCustomEmbedding(layers.Layer):
    def __init__(self,
                 input_dim,
                 output_dim,
                 **kwargs
                ):
        super(TestCustomEmbedding, self).__init__(**kwargs)
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.edl_embedding_layer = elasticDL_Embedding(self.output_dim)
        self.keras_embedding_layer = Embedding(self.input_dim, self.output_dim)
    
    def call(self, inputs):
        # Environment variables decide which embedding instance handles the call.
        is_exporting_saved_model = os.getenv('SAVED_MODEL') == 'True'
        is_elastic = os.getenv('FRAMEWORK') == 'ElasticDL'
            
        def _true_fn(inputs):
            _replace_weights_with_edl()
            out = self.keras_embedding_layer(inputs)
            return out
        
        def _false_fn(inputs):
            if is_elastic:
                out = self.edl_embedding_layer(inputs)
            else:
                out = self.keras_embedding_layer(inputs)
            return out
        
        def _replace_weights_with_edl():
            # variable.csv mocks the embedding vectors that would be fetched
            # from the ElasticDL parameter server via gRPC.
            import pandas as pd
            var_values = pd.read_csv('variable.csv')
            custom_param = var_values.values
            for var in self.keras_embedding_layer.trainable_variables:
                var.assign(custom_param)
        
        return tf_utils.smart_cond(is_exporting_saved_model,
                                   lambda: _true_fn(inputs),
                                   lambda: _false_fn(inputs)
                                  )

In TestCustomEmbedding, the variables in the Keras.Embedding instance are replaced with the values in variable.csv. The variable.csv file mocks the variable values of the ElasticDL.Embedding instance, which would be fetched from the parameter server via gRPC.
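To make the experiment reproducible, the mock file can be generated with a small hypothetical helper; the shape matches the Embedding(10, 4) used in the model below:

import numpy as np
import pandas as pd

# Write a 10 x 4 matrix to variable.csv so that
# pd.read_csv('variable.csv').values has the same shape as the variable of
# Keras Embedding(input_dim=10, output_dim=4).
mock_embedding = np.random.rand(10, 4)
pd.DataFrame(mock_embedding).to_csv('variable.csv', index=False)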

Then, we define a Keras model with TestCustomEmbedding as below:

inputs = Input(shape=(10,))
embedding = TestCustomEmbedding(10,4)(inputs)
flatten = Flatten()(embedding)
output = Dense(1, activation='sigmoid')(flatten)
model = tf.keras.Model(inputs=[inputs], outputs=[output])

os.environ['SAVED_MODEL'] = 'False'
input_array = tf.constant([[1,2,3,4,1,1,1,1,1,0]])
output = model.call(input_array, training=True)
print('training output : ', output)
output = model.call(input_array)
print('predict output : ',output)

The output

training output :  tf.Tensor([[0.48767245]], shape=(1, 1), dtype=float32)
predict output :  tf.Tensor([[0.48767245]], shape=(1, 1), dtype=float32)

Then we set SAVED_MODEL to "True", export the SavedModel, and view the model output with the same input.

os.environ['SAVED_MODEL'] = 'True'
# save model in SavedModel format
tf.saved_model.save(model, "./tmp/custom_embedding/123")
output = model.call(input_array)
print('predict output in saved_model : ', output)

The output

predict output in saved_model :  tf.Tensor([[0.99985003]], shape=(1, 1), dtype=float32)

Then, we publish a service with the SavedModel using TF Serving and request the server with the same input values.

curl -d '{"instances": [[1,2,3,4,1,1,1,1,1,0]]}' -X POST http://localhost:8501/v1/models/model:predict

The response

{
    "predictions": [[0.999850035]]
}
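The same request can also be issued from Python (assuming TF Serving is running locally on port 8501 with the model name "model", as in the curl command above):

import json
import requests

# Query the TF Serving REST endpoint with the same input as the curl example.
payload = {"instances": [[1, 2, 3, 4, 1, 1, 1, 1, 1, 0]]}
response = requests.post(
    "http://localhost:8501/v1/models/model:predict",
    data=json.dumps(payload),
)
print(response.json())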

So, we have verified that the custom layer can use ElasticDL.Embedding during training and then use Keras.Embedding, loaded with the variables from ElasticDL.Embedding, to export the model in SavedModel format.

Open Question

  • Is the following scenario possible? The user writes tf.keras.layers.Embedding in the model definition. While running the model in ElasticDL with the parameter server turned on, the native Keras Embedding layer is replaced with the ElasticDL.Embedding layer so that it interacts with the parameter server. In this way, the user writes the model using the native TensorFlow API but still executes it in a distributed way in ElasticDL, which is more user-friendly. A possible approach is sketched below.
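One conceivable way to do this (a rough sketch only; maybe_swap_embedding_layers is a hypothetical helper, not an existing ElasticDL API) is to clone the user's Keras model and swap every native Embedding layer for ElasticDL.Embedding when the parameter server is enabled:

import tensorflow as tf
from elasticdl.python.elasticdl.layers.embedding import Embedding as EdlEmbedding

def maybe_swap_embedding_layers(model, ps_enabled):
    # Rebuild the user's model, replacing every native tf.keras.layers.Embedding
    # with ElasticDL.Embedding when the parameter server is turned on.
    if not ps_enabled:
        return model

    def _clone_layer(layer):
        if isinstance(layer, tf.keras.layers.Embedding):
            return EdlEmbedding(layer.output_dim)
        return layer.__class__.from_config(layer.get_config())

    return tf.keras.models.clone_model(model, clone_function=_clone_layer)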