ElasticDL code overall review discussion
Bright edited this page Sep 25, 2019
·
11 revisions
- A common scenario is a task list that contains a run of consecutive training tasks followed by a run of consecutive evaluation tasks. Because the dataset uses prefetch, while a worker is handling the last training task in the run, prefetching pulls all of the following evaluation tasks into that same worker.
- How should early stopping be done? It needs both the training and the evaluation metrics to make the decision.
- Refactor the evaluation process.
- Each time an ElasticDL command is executed, the client builds a new image. Support reusing an existing image.
- Support failover of the EmbeddingService Redis cluster. At present, it is a single point of failure.
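
For the early stopping item, one possible decision rule is patience-based: stop once the evaluation loss has failed to improve for a fixed number of consecutive evaluation rounds. The sketch below is only an illustration under that assumption. `EarlyStopper`, `rounds_until_stop`, `patience`, and `min_delta` are hypothetical names, not part of ElasticDL, and the rule uses the evaluation loss alone, whereas a real design might also compare it against the training metrics.

```python
from dataclasses import dataclass


@dataclass
class EarlyStopper:
    """Hypothetical helper: signals a stop when evaluation loss stops improving."""

    patience: int = 3            # evaluation rounds tolerated without improvement
    min_delta: float = 0.0       # smallest decrease that counts as an improvement
    best: float = float("inf")   # best evaluation loss seen so far
    bad_rounds: int = 0          # consecutive rounds without improvement

    def should_stop(self, eval_loss: float) -> bool:
        if eval_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = eval_loss
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience


def rounds_until_stop(eval_losses, patience=3, min_delta=0.0):
    """Return the 1-based evaluation round at which training would stop, or None."""
    stopper = EarlyStopper(patience=patience, min_delta=min_delta)
    for round_index, loss in enumerate(eval_losses, start=1):
        if stopper.should_stop(loss):
            return round_index
    return None
```

In a master/worker setup, a check like this would presumably run wherever the evaluation metrics are aggregated, so that the remaining training tasks can be dropped once it returns `True`.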