
Commit 70042b8

annaluo676 authored and yijiezh committed
Add README for RL directory; Typo fix in network compression README file (#940)
* Add README for RL directory; Typo fix in network compression README file
* Modify README for batch example
1 parent 3d155d8 commit 70042b8

3 files changed: +26 -8 lines changed


reinforcement_learning/README.md

Lines changed: 23 additions & 3 deletions
@@ -1,6 +1,26 @@
-# Common Reinforcement Learning Examples
+# Amazon SageMaker Examples
 
-These examples demonstrate how to train reinforcement learning models on SageMaker.
+### Common Reinforcement Learning Examples
 
-## FAQ
+These examples demonstrate how to train reinforcement learning models on SageMaker for a wide range of applications.
+
+- [Contextual Bandit with Live Environment](bandits_statlog_vw_customEnv) illustrates how you can manage your own contextual multi-armed bandit workflow on SageMaker, using the built-in [Vowpal Wabbit](https://github.com/VowpalWabbit/vowpal_wabbit) (VW) container to train and deploy contextual bandit models.
+- [Cartpole](rl_cartpole_coach) uses the SageMaker RL base [docker image](https://github.com/aws/sagemaker-rl-container) to balance a broom upright.
+- [Cartpole Batch](rl_cartpole_batch_coach) uses batch RL techniques to train Cartpole with offline data.
+- [Cartpole Spot Training](rl_managed_spot_cartpole_coach) uses SageMaker Managed Spot instances to train at a lower cost.
+- [DeepRacer](rl_deepracer_robomaker_coach_gazebo) gives a glimpse of the architecture used to get DeepRacer working with AWS RoboMaker.
+- [HVAC](rl_hvac_coach_energyplus) optimizes energy use based on the [EnergyPlus](https://energyplus.net/) simulator.
+- [Knapsack](rl_knapsack_coach_custom) is an example of using RL to address an operations research problem.
+- [Mountain Car](rl_mountain_car_coach_gymEnv) is a classic control RL problem, in which an under-powered car is tasked with climbing a steep mountain and is only successful when it reaches the top.
+- [Network Compression](rl_network_compression_ray_custom) reduces the size of a trained network using an RL algorithm.
+- [Object Tracker](rl_objecttracker_robomaker_coach_gazebo) trains a TurtleBot object tracker using Amazon SageMaker RL coupled with AWS RoboMaker.
+- [Portfolio Management](rl_portfolio_management_coach_customEnv) shows how to re-distribute capital across a set of financial assets using RL algorithms.
+- [Predictive Auto-scaling](rl_predictive_autoscaling_coach_customEnv) scales a production service via an RL approach, adding and removing resources in reaction to dynamically changing load.
+- [Resource Allocation](rl_resource_allocation_ray_customEnv) solves three canonical online and stochastic decision-making problems using RL algorithms.
+- [Roboschool Ray](rl_roboschool_ray) demonstrates how to use [Ray](https://rise.cs.berkeley.edu/projects/ray/) to scale RL training in different ways, and how to leverage SageMaker's Automatic Model Tuning functionality to optimize the training of an RL model.
+- [Roboschool Stable Baselines](rl_roboschool_stable_baselines) is an example of using [stable-baselines](https://stable-baselines.readthedocs.io/en/master/) to train RL algorithms.
+- [Tic-tac-toe](rl_tic_tac_toe_coach_customEnv) uses RL to train a policy, which then plays locally and interactively within the notebook.
+- [Traveling Salesman and Vehicle Routing](rl_traveling_salesman_vehicle_routing_coach) is an example of using RL to address operations research problems.
+
+### FAQ
 https://github.com/awslabs/amazon-sagemaker-examples#faq
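The Coach-based examples in the list above are all launched the same way through the SageMaker Python SDK. As a minimal, hypothetical sketch of that shared pattern (the entry-point script, IAM role, toolkit version, and instance type below are placeholders, not values taken from these notebooks):

```python
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

# All concrete values below are illustrative placeholders.
estimator = RLEstimator(
    entry_point="train-coach.py",      # hypothetical script-mode training script
    toolkit=RLToolkit.COACH,           # Intel Coach RL toolkit
    toolkit_version="0.11.0",          # placeholder toolkit version
    framework=RLFramework.TENSORFLOW,
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder IAM role
    instance_type="ml.m4.xlarge",
    instance_count=1,
)
estimator.fit()  # launches the SageMaker RL training job
```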

reinforcement_learning/rl_cartpole_batch_coach/README.md

Lines changed: 1 addition & 3 deletions
@@ -1,8 +1,6 @@
 # Training Batch Reinforcement Learning Policies with Amazon SageMaker RL
 
-In many real-world problems, the reinforcement learning agent cannot interact with neither the real environment nor a simulated one. On one hand, creating a simulator that imitates the real environment dynamic could be quite complex and on the other, letting the learning agent attempt sub-optimal actions in the real world is quite risky. In such cases, the learning agent can only have access to batches of offline data that generated by some deployed policy. The learning agent need to utilize these data correctly to learn a better policy to solve the problem.
-
-This notebook shows an example of how to use batch reinforcement learning techniques to address such type of real-world problems: training a new policy from offline dataset when there is no way to interact with real environments or simulators. This example is a simple toy demonstrating how one might begin to address this real and challenging problem. We use gym `CartPole-v0` as a fake simulated system to generate offline dataset and the RL agents are trained using Amazon SageMaker RL.
+For many real-world problems, the reinforcement learning (RL) agent needs to learn from historical data that was generated by some deployed policy. For example, we may have historical data of experts playing games, users interacting with a website, or sensor data from a control system. This notebook shows an example of how to use batch RL to train a new policy from an offline dataset. We use gym `CartPole-v0` as a stand-in for a real system to generate the offline dataset, and the RL agents are trained using Amazon SageMaker RL.
 
 ## Contents
 
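As a rough illustration of the setup described in the new paragraph above, the following hypothetical sketch uses gym `CartPole-v0` with a random behavior policy to produce an offline transition dataset. It assumes the classic (pre-0.26) gym step API; the dataset format and file name are placeholders, not the notebook's actual pipeline:

```python
import gym
import numpy as np

# Hypothetical sketch: collect offline transitions from CartPole-v0
# using a uniformly random behavior policy (classic gym API assumed).
env = gym.make("CartPole-v0")
transitions = []  # (state, action, reward, next_state, done) tuples

for episode in range(100):
    state = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()  # random behavior policy
        next_state, reward, done, _ = env.step(action)
        transitions.append((state, action, reward, next_state, done))
        state = next_state

# Placeholder file name; a real pipeline would stage the data in S3
# so a SageMaker training job can read it.
np.save("cartpole_offline_dataset.npy", np.array(transitions, dtype=object))
```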

reinforcement_learning/rl_network_compression_ray_custom/README.md

Lines changed: 2 additions & 2 deletions
@@ -2,13 +2,13 @@
 
 ## What is network compression?
 
-Network compression is the process of reducing the size of a trained network, either by removing certain layers or by shrinking layers, while maintaining performance. This notebook implements the a version of network compression using reinforcement learning algorithm similar to the one proposed in [1].
+Network compression is the process of reducing the size of a trained network, either by removing certain layers or by shrinking layers, while maintaining performance. This notebook implements a version of network compression using a reinforcement learning algorithm similar to the one proposed in [1].
 
 [1] [Ashok, Anubhav, Nicholas Rhinehart, Fares Beainy, and Kris M. Kitani. "N2N learning: network to network compression via policy gradient reinforcement learning." arXiv preprint arXiv:1709.06030 (2017)](https://arxiv.org/abs/1709.06030).
 
 ## This Example
 
-In this example the network compression notebook uses a Sagemaker docker image containing Ray, tensorflow and OpenAI Gym. The network modification module is
+In this example, the network compression notebook uses a SageMaker docker image containing Ray, TensorFlow, and OpenAI Gym. The network modification module is
 treated as a simulation where the actions produced by the reinforcement learning algorithm (remove, shrink, etc.) can be run. The notebook defines a set of actions for each module. It
 demonstrates how one can use the SageMaker Python SDK `script` mode with a `Tensorflow+Ray+Gym` container. You can run
 `rl_network_compression_a3c_ray_tensorflow_NetworkCompressionEnv.ipynb` from a SageMaker notebook instance.
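To make the `script` mode usage mentioned in this README concrete, here is a minimal, hypothetical sketch of launching such a Ray + TensorFlow job with the SageMaker Python SDK's `RLEstimator`. The entry point, source directory, toolkit version, and instance type are assumptions for illustration, not values from this repository:

```python
import sagemaker
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

role = sagemaker.get_execution_role()  # assumes a SageMaker notebook instance

# Hypothetical sketch: script name, source_dir, and versions are placeholders.
estimator = RLEstimator(
    entry_point="train_network_compression.py",  # hypothetical Ray training script
    source_dir="src",                             # placeholder source directory
    toolkit=RLToolkit.RAY,
    toolkit_version="0.6.5",                      # placeholder Ray toolkit version
    framework=RLFramework.TENSORFLOW,
    role=role,
    instance_type="ml.p3.2xlarge",                # placeholder GPU instance choice
    instance_count=1,
)
estimator.fit()  # runs the script inside the Tensorflow+Ray+Gym container
```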
