
Commit 746be85

Tutorial (#1599)
1 parent c45a888 commit 746be85

5 files changed (+141, -180 lines)


examples/pytorch/text-generator/README.md

Lines changed: 61 additions & 166 deletions
@@ -4,9 +4,7 @@ _WARNING: you are on the master branch; please refer to examples on the branch c
 
 This example shows how to deploy a realtime text generation API using a GPT-2 model from Hugging Face's transformers library.
 
-<br>
-
-## Implement your predictor
+## Implement your Predictor
 
 1. Create a Python file named `predictor.py`.
 2. Define a Predictor class with a constructor that loads and initializes the model.
@@ -22,7 +20,6 @@ from transformers import GPT2Tokenizer, GPT2LMHeadModel
 class PythonPredictor:
     def __init__(self, config):
         self.device = "cuda" if torch.cuda.is_available() else "cpu"
-        print(f"using device: {self.device}")
         self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
         self.model = GPT2LMHeadModel.from_pretrained("gpt2").to(self.device)
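The hunks in this diff show the `PythonPredictor` class only in fragments. For reference, here is a minimal sketch of the complete `predictor.py` the tutorial describes, assembled from the fragments above and below; the exact `generate()` arguments are an assumption rather than a verbatim copy from the commit:

```python
# predictor.py -- sketch reconstructed from the fragments shown in this diff
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel


class PythonPredictor:
    def __init__(self, config):
        # use a GPU when one is available, otherwise fall back to CPU
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
        self.model = GPT2LMHeadModel.from_pretrained("gpt2").to(self.device)

    def predict(self, payload):
        # encode the prompt, sample a continuation, and decode it back to text
        tokens = self.tokenizer.encode(payload["text"], return_tensors="pt").to(self.device)
        prediction = self.model.generate(tokens, max_length=50, do_sample=True)  # max_length is an assumption
        return self.tokenizer.decode(prediction[0])
```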

@@ -33,11 +30,7 @@ class PythonPredictor:
         return self.tokenizer.decode(prediction[0])
 ```
 
-Here are the complete [Predictor docs](../../../docs/deployments/realtime-api/predictors.md).
-
-<br>
-
-## Specify your Python dependencies
+## Specify Python dependencies
 
 Create a `requirements.txt` file to specify the dependencies needed by `predictor.py`. Cortex will automatically install them into your runtime once you deploy:

@@ -48,36 +41,39 @@ torch
 transformers==3.0.*
 ```
 
-<br>
-
-## Configure your API
-
-Create a `cortex.yaml` file and add the configuration below. A `RealtimeAPI` provides a runtime for inference and makes your `predictor.py` implementation available as a web service that can serve realtime predictions:
+## Deploy your model locally
 
-```yaml
-# cortex.yaml
+You can create APIs from any Python runtime that has access to Docker (e.g. the Python shell or a Jupyter notebook):
 
-- name: text-generator
-  kind: RealtimeAPI
-  predictor:
-    type: python
-    path: predictor.py
-```
+```python
+import cortex
 
-Here are the complete [API configuration docs](../../../docs/deployments/realtime-api/api-configuration.md).
+cx_local = cortex.client("local")
 
-<br>
+api_spec = {
+    "name": "text-generator",
+    "kind": "RealtimeAPI",
+    "predictor": {
+        "type": "python",
+        "path": "predictor.py"
+    }
+}
 
-## Deploy your model locally
+cx_local.deploy(api_spec, project_dir=".", wait=True)
+```
 
-`cortex deploy` takes your Predictor implementation along with the configuration from `cortex.yaml` and creates a web API:
+## Consume your API
 
-```bash
-$ cortex deploy
+```python
+import requests
 
-creating text-generator (RealtimeAPI)
+endpoint = cx_local.get_api("text-generator")["endpoint"]
+payload = {"text": "hello world"}
+print(requests.post(endpoint, payload).text)
 ```
 
+## Manage your APIs using the CLI
+
 Monitor the status of your API using `cortex get`:
 
 ```bash
@@ -91,11 +87,11 @@ Show additional information for your API (e.g. its endpoint) using `cortex get <
 
 ```bash
 $ cortex get text-generator
+
 status   last update   avg request   2XX
 live     1m            -             -
 
 endpoint: http://localhost:8889
-...
 ```
 
 You can also stream logs from your API:
@@ -106,18 +102,6 @@ $ cortex logs text-generator
 ...
 ```
 
-Once your API is live, use `curl` to test your API (it will take a few seconds to generate the text):
-
-```bash
-$ curl http://localhost:8889 \
-    -X POST -H "Content-Type: application/json" \
-    -d '{"text": "machine learning is"}'
-
-"machine learning is ..."
-```
-
-<br>
-
 ## Deploy your model to AWS
 
 Cortex can automatically provision infrastructure on your AWS account and deploy your models as production-ready web services:
@@ -126,19 +110,26 @@ Cortex can automatically provision infrastructure on your AWS account and deploy
 $ cortex cluster up
 ```
 
-This creates a Cortex cluster in your AWS account, which will take approximately 15 minutes.
+This creates a Cortex cluster in your AWS account, which will take approximately 15 minutes. After your cluster is created, you can deploy to your cluster by using the same code and configuration as before:
 
-After your cluster is created, you can deploy your model to your cluster by using the same code and configuration as before:
+```python
+import cortex
 
-```bash
-$ cortex deploy --env aws
+cx_aws = cortex.client("aws")
 
-creating text-generator (RealtimeAPI)
-```
+api_spec = {
+    "name": "text-generator",
+    "kind": "RealtimeAPI",
+    "predictor": {
+        "type": "python",
+        "path": "predictor.py"
+    }
+}
 
-_Note that the `--env` flag specifies the name of the CLI environment to use. [CLI environments](../../../docs/miscellaneous/environments.md) contain the information necessary to connect to your cluster. The default environment is `local`, and when the cluster was created, a new environment named `aws` was created to point to the cluster. You can change the default environment with `cortex env default <env_name`)._
+cx_aws.deploy(api_spec, project_dir=".")
+```
 
-Monitor the status of your APIs using `cortex get`:
+Monitor the status of your APIs using `cortex get` from your CLI:
 
 ```bash
 $ cortex get --watch
@@ -156,62 +147,32 @@ Show additional information for your API (e.g. its endpoint) using `cortex get <
 $ cortex get text-generator --env aws
 
 status   up-to-date   requested   last update   avg request   2XX
-live     1            1           17m           -             -
+live     1            1           1m            -             -
 
-metrics dashboard: https://us-west-2.console.aws.amazon.com/cloudwatch/home#dashboards:name=cortex
 endpoint: https://***.execute-api.us-west-2.amazonaws.com/text-generator
-...
 ```
 
-Use your new endpoint to make requests to your API on AWS:
-
-```bash
-$ curl https://***.execute-api.us-west-2.amazonaws.com/text-generator \
-    -X POST -H "Content-Type: application/json" \
-    -d '{"text": "machine learning is"}'
-
-"machine learning is ..."
-```
-
-<br>
-
-## Perform a rolling update
-
-When you make a change to your `predictor.py` or your `cortex.yaml`, you can update your api by re-running `cortex deploy`.
+## Run on GPUs
 
-Let's modify `predictor.py` to set the length of the generated text based on a query parameter:
+If your Cortex cluster is using GPU instances (configured during cluster creation) or if you are running locally with an NVIDIA GPU, you can run your text generator API on GPUs. Add the `compute` field to your API configuration and re-deploy:
 
 ```python
-# predictor.py
-
-import torch
-from transformers import GPT2Tokenizer, GPT2LMHeadModel
-
+api_spec = {
+    "name": "text-generator",
+    "kind": "RealtimeAPI",
+    "predictor": {
+        "type": "python",
+        "path": "predictor.py"
+    },
+    "compute": {
+        "gpu": 1
+    }
+}
 
-class PythonPredictor:
-    def __init__(self, config):
-        self.device = "cuda" if torch.cuda.is_available() else "cpu"
-        print(f"using device: {self.device}")
-        self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
-        self.model = GPT2LMHeadModel.from_pretrained("gpt2").to(self.device)
-
-    def predict(self, payload, query_params):  # this line is updated
-        input_length = len(payload["text"].split())
-        output_length = int(query_params.get("length", 20))  # this line is added
-        tokens = self.tokenizer.encode(payload["text"], return_tensors="pt").to(self.device)
-        prediction = self.model.generate(tokens, max_length=input_length + output_length, do_sample=True)  # this line is updated
-        return self.tokenizer.decode(prediction[0])
-```
-
-Run `cortex deploy` to perform a rolling update of your API:
-
-```bash
-$ cortex deploy --env aws
-
-updating text-generator (RealtimeAPI)
+cx_aws.deploy(api_spec, project_dir=".")
 ```
 
-You can track the status of your API using `cortex get`:
+As your new API is initializing, the old API will continue to respond to prediction requests. Once the API's status becomes "live" (with one up-to-date replica), traffic will be routed to the updated version. You can track the status of your API using `cortex get`:
 
 ```bash
 $ cortex get --env aws --watch
@@ -220,78 +181,12 @@ realtime api   status     up-to-date   stale   requested   last update   avg r
 text-generator   updating   0            1       1           29s           -             -
 ```
 
-As your new implementation is initializing, the old implementation will continue to be used to respond to prediction requests. Eventually the API's status will become "live" (with one up-to-date replica), and traffic will be routed to the updated version.
-
-Try your new code:
-
-```bash
-$ curl https://***.execute-api.us-west-2.amazonaws.com/text-generator?length=30 \
-    -X POST -H "Content-Type: application/json" \
-    -d '{"text": "machine learning is"}'
-
-"machine learning is ..."
-```
-
-<br>
-
-## Run on GPUs
-
-If your cortex cluster is using GPU instances (configured during cluster creation), you can run your text generator API on GPUs. Add the `compute` field to your API configuration:
-
-```yaml
-# cortex.yaml
-
-- name: text-generator
-  kind: RealtimeAPI
-  predictor:
-    type: python
-    path: predictor.py
-  compute:
-    gpu: 1
-```
-
-Run `cortex deploy` to update your API with this configuration:
-
-```bash
-$ cortex deploy --env aws
-
-updating text-generator (RealtimeAPI)
-```
-
-You can use `cortex get` to check the status of your API, and once it's live, prediction requests should be faster.
-
-### A note about rolling updates in dev environments
-
-In development environments, you may wish to disable rolling updates since rolling updates require additional cluster resources. For example, a rolling update of a GPU-based API will require at least two GPUs, which can require a new instance to spin up if your cluster only has one instance. To disable rolling updates, set `max_surge` to 0 in the `update_strategy` configuration:
-
-```yaml
-# cortex.yaml
-
-- name: text-generator
-  kind: RealtimeAPI
-  predictor:
-    type: python
-    path: predictor.py
-  compute:
-    gpu: 1
-  update_strategy:
-    max_surge: 0
-```
-
-<br>
-
 ## Cleanup
 
-Run `cortex delete` to delete each API:
+Deleting APIs will free up cluster resources and allow Cortex to scale down to the minimum number of instances you specified during cluster creation:
 
-```bash
-$ cortex delete text-generator --env local
-
-deleting text-generator
-
-$ cortex delete text-generator --env aws
+```python
+cx_local.delete_api("text-generator")
 
-deleting text-generator
+cx_aws.delete_api("text-generator")
 ```
-
-Running `cortex delete` will free up cluster resources and allow Cortex to scale down to the minimum number of instances you specified during cluster creation. It will not spin down your cluster.
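A note on the new "Consume your API" snippet: `requests.post(endpoint, payload)` sends the payload form-encoded. If you want to send JSON explicitly, matching the `Content-Type: application/json` header used by the curl examples this commit removes, a variation like the following should also work (assuming the predictor reads a JSON body):

```python
import cortex
import requests

cx_local = cortex.client("local")  # as created in the tutorial's local deployment step

endpoint = cx_local.get_api("text-generator")["endpoint"]
# json= serializes the dict and sets the Content-Type: application/json header
response = requests.post(endpoint, json={"text": "machine learning is"})
print(response.text)
```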

examples/pytorch/text-generator/cortex.yaml

Lines changed: 0 additions & 10 deletions
This file was deleted.
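With this commit the example no longer ships a `cortex.yaml`; per the README diff above, the API configuration now lives in code as the `api_spec` dict passed to the Python client. A minimal sketch of the replacement workflow, using only values shown in the diff:

```python
import cortex

# configuration that previously lived in cortex.yaml, now expressed in code
api_spec = {
    "name": "text-generator",
    "kind": "RealtimeAPI",
    "predictor": {
        "type": "python",
        "path": "predictor.py"
    }
}

cx_local = cortex.client("local")
cx_local.deploy(api_spec, project_dir=".", wait=True)
```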
