Add README for RL directory; Typo fix in network compression README file #940

Merged: 2 commits, Dec 2, 2019

26 changes: 23 additions & 3 deletions reinforcement_learning/README.md
@@ -1,6 +1,26 @@
# Common Reinforcement Learning Examples
# Amazon SageMaker Examples

These examples demonstrate how to train reinforcement learning models on SageMaker.
### Common Reinforcement Learning Examples

## FAQ
These examples demonstrate how to train reinforcement learning models on SageMaker for a wide range of applications.

- [Contextual Bandit with Live Environment](bandits_statlog_vw_customEnv) illustrates how you can manage your own contextual multi-armed bandit workflow on SageMaker using the built-in [Vowpal Wabbit](https://github.com/VowpalWabbit/vowpal_wabbit) (VW) container to train and deploy contextual bandit models.
- [Cartpole](rl_cartpole_coach) uses the SageMaker RL base [docker image](https://github.com/aws/sagemaker-rl-container) to balance a broom upright.
- [Cartpole Batch](rl_cartpole_batch_coach) uses batch RL techniques to train Cartpole with offline data.
- [Cartpole Spot Training](rl_managed_spot_cartpole_coach) trains Cartpole on SageMaker Managed Spot instances at a lower cost.
- [DeepRacer](rl_deepracer_robomaker_coach_gazebo) gives a glimpse of the architecture used to get DeepRacer working with AWS RoboMaker.
- [HVAC](rl_hvac_coach_energyplus) optimizes energy use based on the [EnergyPlus](https://energyplus.net/) simulator.
- [Knapsack](rl_knapsack_coach_custom) is an example of using RL to address an operations research problem.
- [Mountain Car](rl_mountain_car_coach_gymEnv) is a classic control RL problem, in which an under-powered car is tasked with climbing a steep mountain, and is only successful when it reaches the top.
- [Network Compression](rl_network_compression_ray_custom) reduces the size of a trained network using an RL algorithm.
- [Object Tracker](rl_objecttracker_robomaker_coach_gazebo) trains a TurtleBot object tracker using Amazon SageMaker RL coupled with AWS RoboMaker.
- [Portfolio Management](rl_portfolio_management_coach_customEnv) shows how to redistribute capital across a set of different financial assets using RL algorithms.
- [Predictive Auto-scaling](rl_predictive_autoscaling_coach_customEnv) scales a production service via an RL approach, adding and removing resources in reaction to dynamically changing load.
- [Resource Allocation](rl_resource_allocation_ray_customEnv) solves three canonical online and stochastic decision making problems using RL algorithms.
- [Roboschool Ray](rl_roboschool_ray) demonstrates how to use [Ray](https://rise.cs.berkeley.edu/projects/ray/) to scale RL training in different ways, and how to leverage SageMaker's Automatic Model Tuning functionality to optimize the training of an RL model.
- [Roboschool Stable Baseline](rl_roboschool_stable_baselines) is an example of using [stable-baselines](https://stable-baselines.readthedocs.io/en/master/) to train RL algorithms.
- [Tic-tac-toe](rl_tic_tac_toe_coach_customEnv) uses RL to train a policy that you can then play against locally and interactively within the notebook.
- [Traveling Salesman and Vehicle Routing](rl_traveling_salesman_vehicle_routing_coach) is an example of using RL to address operations research problems.
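
All of these examples are launched the same general way: a training script is handed to the SageMaker Python SDK's `RLEstimator`. The sketch below is illustrative only; the entry point, role ARN, instance settings and `RLCOACH_PRESET` value are placeholders rather than values taken from any particular notebook.

```python
# Illustrative sketch of launching an RL training job with the SageMaker Python SDK.
# The script name, role ARN, instance type and preset below are placeholders.
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

estimator = RLEstimator(
    entry_point="train-coach.py",                 # hypothetical training script
    source_dir="src",                             # hypothetical source directory
    toolkit=RLToolkit.COACH,                      # Coach toolkit; Ray is also supported
    toolkit_version="0.11.0",
    framework=RLFramework.TENSORFLOW,
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # replace with your execution role
    train_instance_type="ml.m5.xlarge",
    train_instance_count=1,
    hyperparameters={"RLCOACH_PRESET": "preset-cartpole-clippedppo"},  # assumed preset name
)

estimator.fit()  # starts the SageMaker training job
```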

### FAQ
https://github.com/awslabs/amazon-sagemaker-examples#faq
4 changes: 1 addition & 3 deletions reinforcement_learning/rl_cartpole_batch_coach/README.md
@@ -1,8 +1,6 @@
# Training Batch Reinforcement Learning Policies with Amazon SageMaker RL

In many real-world problems, the reinforcement learning agent cannot interact with neither the real environment nor a simulated one. On one hand, creating a simulator that imitates the real environment dynamic could be quite complex and on the other, letting the learning agent attempt sub-optimal actions in the real world is quite risky. In such cases, the learning agent can only have access to batches of offline data that generated by some deployed policy. The learning agent need to utilize these data correctly to learn a better policy to solve the problem.

This notebook shows an example of how to use batch reinforcement learning techniques to address such type of real-world problems: training a new policy from offline dataset when there is no way to interact with real environments or simulators. This example is a simple toy demonstrating how one might begin to address this real and challenging problem. We use gym `CartPole-v0` as a fake simulated system to generate offline dataset and the RL agents are trained using Amazon SageMaker RL.
For many real-world problems, the reinforcement learning (RL) agent needs to learn from historical data that was generated by some deployed policy. For example, we may have historical data of experts playing games, users interacting with a website, or sensor data from a control system. This notebook shows an example of how to use batch RL to train a new policy from an offline dataset. We use gym `CartPole-v0` as a fake simulated system to generate the offline dataset, and the RL agents are trained using Amazon SageMaker RL.
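
As a rough sketch of what such an offline dataset can look like, the snippet below rolls out a random behaviour policy in gym `CartPole-v0` and records one transition per step; the record fields and the output file name are illustrative assumptions, not the format the notebook actually produces.

```python
# Sketch: generate a toy offline dataset from CartPole-v0 with a random behaviour policy.
# Field names and output format are illustrative only.
import json
import gym

env = gym.make("CartPole-v0")
dataset = []

for episode in range(10):
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()               # stand-in for a deployed policy
        next_obs, reward, done, _ = env.step(action)
        dataset.append({
            "observation": obs.tolist(),
            "action": int(action),
            "reward": float(reward),
            "next_observation": next_obs.tolist(),
            "done": bool(done),
        })
        obs = next_obs

with open("cartpole_offline_data.json", "w") as f:       # a batch RL trainer would consume this
    json.dump(dataset, f)
```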

## Contents

reinforcement_learning/rl_network_compression_ray_custom/README.md
@@ -2,13 +2,13 @@

## What is network compression?

Network compression is the process of reducing the size of a trained network, either by removing certain layers or by shrinking layers, while maintaining performance. This notebook implements the a version of network compression using reinforcement learning algorithm similar to the one proposed in [1].
Network compression is the process of reducing the size of a trained network, either by removing certain layers or by shrinking layers, while maintaining performance. This notebook implements a version of network compression using a reinforcement learning algorithm similar to the one proposed in [1].

[1] [Ashok, Anubhav, Nicholas Rhinehart, Fares Beainy, and Kris M. Kitani. "N2N learning: network to network compression via policy gradient reinforcement learning." arXiv preprint arXiv:1709.06030 (2017)](https://arxiv.org/abs/1709.06030).

## This Example

In this example the network compression notebook uses a Sagemaker docker image containing Ray, tensorflow and OpenAI Gym. The network modification module is
In this example the network compression notebook uses a SageMaker docker image containing Ray, TensorFlow and OpenAI Gym. The network modification module is
treated as a simulation in which the actions produced by the reinforcement learning algorithm (remove, shrink, etc.) can be run. The notebook defines a set of actions for each module. It
demonstrates how one can use the SageMaker Python SDK `script` mode with a `Tensorflow+Ray+Gym` container. You can run
`rl_network_compression_a3c_ray_tensorflow_NetworkCompressionEnv.ipynb` from a SageMaker notebook instance.
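
To make the idea of the network modification module as a simulation more concrete, here is a toy Gym environment that walks over the layers of a network and applies one of a small set of compression actions per layer. It is an illustration only: the action set, size bookkeeping and reward are invented here and are not the environment defined in the notebook.

```python
# Toy illustration of layer-wise compression actions exposed through the Gym interface.
# Not the notebook's actual environment; all numbers here are made up.
import gym
import numpy as np
from gym import spaces


class ToyCompressionEnv(gym.Env):
    """Step through a network's layers, choosing one compression action per layer."""

    ACTIONS = ["keep", "shrink", "remove"]

    def __init__(self, num_layers=10):
        self.num_layers = num_layers
        self.action_space = spaces.Discrete(len(self.ACTIONS))
        # Observation: current layer index (normalized) and remaining relative model size.
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(2,), dtype=np.float32)

    def reset(self):
        self.layer = 0
        self.size = 1.0                        # 1.0 means the uncompressed network
        return self._obs()

    def step(self, action):
        if self.ACTIONS[action] == "shrink":
            self.size = max(0.0, self.size - 0.05)
        elif self.ACTIONS[action] == "remove":
            self.size = max(0.0, self.size - 0.10)
        self.layer += 1
        done = self.layer >= self.num_layers
        # Toy reward: smaller is better. A realistic reward, as in [1], would also
        # account for the accuracy of the compressed network.
        reward = (1.0 - self.size) if done else 0.0
        return self._obs(), reward, done, {}

    def _obs(self):
        return np.array([self.layer / self.num_layers, self.size], dtype=np.float32)
```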