<!-- Delete on release branches -->
<!-- CORTEX_VERSION_README_MINOR -->

[install](https://docs.cortex.dev/install) • [documentation](https://docs.cortex.dev) • [examples](https://github.com/cortexlabs/cortex/tree/0.23/examples) • [community](https://gitter.im/cortexlabs/cortex)

# Deploy machine learning models to production

Cortex is an open source platform for deploying, managing, and scaling machine learning in production.

## Model serving infrastructure

* Supports deploying TensorFlow, PyTorch, sklearn and other models as realtime or batch APIs.
* Ensures high availability with availability zones and automated instance restarts.
* Runs inference on spot instances with on-demand backups.
* Autoscales to handle production workloads.

#### Configure Cortex

```yaml
# cluster.yaml

region: us-east-1
instance_type: g4dn.xlarge
min_instances: 10
max_instances: 100
spot: true
```

#### Spin up Cortex on your AWS account

```text
$ cortex cluster up --config cluster.yaml

○ configuring autoscaling ✓
○ configuring networking ✓
○ configuring logging ✓

cortex is ready!
```

<br>

## Reproducible deployments

* Package dependencies, code, and configuration for reproducible deployments.
* Configure compute, autoscaling, and networking for each API.
* Integrate with your data science platform or CI/CD system.
* Test locally before deploying to your cluster.

#### Implement a predictor

```python
# predictor.py

class PythonPredictor:
    ...

    def predict(self, payload):
        return self.model(payload["text"])[0]
```
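
The excerpt above elides most of the class. For illustration, here is a minimal, self-contained predictor with the same shape: an `__init__` that receives the API's configuration and a `predict` that receives the parsed request payload. The echo logic is a placeholder, not a real model:

```python
class PythonPredictor:
    def __init__(self, config):
        # config holds the values from the predictor section of the API spec
        self.config = config

    def predict(self, payload):
        # payload is the parsed JSON body of the request
        return payload["text"].upper()


# the class can be exercised locally, without a cluster
predictor = PythonPredictor(config={})
print(predictor.predict({"text": "hello"}))  # prints "HELLO"
```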

#### Configure an API

```python
api_spec = {
    "name": "text-generator",
    "kind": "RealtimeAPI",
    "predictor": {
        "type": "python",
        "path": "predictor.py"
    },
    "compute": {
        "gpu": 1,
        "mem": "8Gi"
    },
    "autoscaling": {
        "min_replicas": 1,
        "max_replicas": 10
    },
    "networking": {
        "api_gateway": "public"
    }
}
```
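
Because the spec is an ordinary Python dict, it can be generated or sanity-checked in code before deploying. The helper below is purely illustrative and not part of the Cortex API:

```python
def validate_spec(spec):
    """Illustrative sanity check for an API spec; not part of the Cortex API."""
    required = {"name", "kind", "predictor"}
    missing = required - spec.keys()
    if missing:
        raise ValueError(f"API spec is missing fields: {sorted(missing)}")
    return True


# works with the api_spec defined above; shown here with a minimal example
assert validate_spec({
    "name": "text-generator",
    "kind": "RealtimeAPI",
    "predictor": {"type": "python", "path": "predictor.py"},
})
```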

<br>

## Scalable machine learning APIs

* Scale to handle production workloads with request-based autoscaling.
* Stream performance metrics and logs to any monitoring tool.
* Serve many models efficiently with multi-model caching.
* Configure traffic splitting for A/B testing.
* Update APIs without downtime.
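
As a sketch of the traffic-splitting bullet above, a splitter can be expressed in the same dict style as the RealtimeAPI spec; the field names below are assumptions for illustration, not copied from the Cortex docs:

```python
# Hypothetical traffic-splitting spec for A/B testing two versions of an API;
# the field names mirror the RealtimeAPI spec style and are assumptions.
splitter_spec = {
    "name": "text-generator",
    "kind": "TrafficSplitter",
    "apis": [
        {"name": "text-generator-a", "weight": 90},
        {"name": "text-generator-b", "weight": 10},
    ],
}

# weights are percentages and should total 100
assert sum(api["weight"] for api in splitter_spec["apis"]) == 100
```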

#### Deploy to your cluster

```python
import cortex

cx = cortex.client("aws")
cx.deploy(api_spec, project_dir=".")

# creating https://example.com/text-generator
```

#### Consume your API

```python
import requests

endpoint = "https://example.com/text-generator"
payload = {"text": "hello world"}
response = requests.post(endpoint, json=payload)
prediction = response.json()
```

<br>

## Get started

```bash
pip install cortex
```

See the [installation guide](https://docs.cortex.dev/install) for next steps.