<br>

# Deploy, manage, and scale machine learning models in production

Cortex is a cloud native model serving platform for machine learning engineering teams.

<br>

## Use cases

* **Realtime machine learning** - build NLP, computer vision, and other APIs and integrate them into any application.
* **Large-scale inference** - scale realtime or batch inference workloads across hundreds or thousands of instances.
* **Consistent MLOps workflows** - create streamlined and reproducible MLOps workflows for any machine learning team.

<br>

## Deploy

* Deploy TensorFlow, PyTorch, ONNX, and other models using a simple CLI or Python client.
* Run realtime inference, batch inference, asynchronous inference, and training jobs.
* Define preprocessing and postprocessing steps in Python and chain workloads seamlessly (see the sketch below).

```text
$ cortex deploy apis.yaml

• creating text-generator (realtime API)
• creating image-classifier (batch API)
• creating video-analyzer (async API)

all APIs are ready!
```
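
The `apis.yaml` file passed to `cortex deploy` lists the APIs to create, and a Python-based API is backed by a predictor class that defines how to load the model and handle requests. A minimal sketch of a realtime text generation API, assuming the Hugging Face `transformers` library (the file names and the compute and autoscaling values are illustrative):

```python
# predictor.py

from transformers import pipeline


class PythonPredictor:
    def __init__(self, config):
        # load the model once, when the API replica starts
        self.model = pipeline(task="text-generation")

    def predict(self, payload):
        # run inference on each request and return the first generated sequence
        return self.model(payload["text"])[0]
```

```yaml
# apis.yaml

- name: text-generator
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py  # the class defined above
  compute:
    gpu: 1
    mem: 8Gi
  autoscaling:
    min_replicas: 1
    max_replicas: 10
```

Once deployed, the API serves HTTP requests, e.g. `curl http://example.com/text-generator -X POST -H "Content-Type: application/json" -d '{"text": "hello world"}'`.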

## Manage

* Create A/B tests and shadow pipelines with configurable traffic splitting (see the sketch below).
* Automatically stream logs from every workload to your favorite log management tool.
* Monitor your workloads with pre-built Grafana dashboards and add your own custom dashboards.

```text
$ cortex get

API                TYPE       GPUs
text-generator     realtime   32
image-classifier   batch      64
video-analyzer     async      16
```
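
Traffic splitting is configured declaratively, like any other API. A sketch of an 80/20 A/B test, assuming a `TrafficSplitter` kind along the lines of the Cortex docs (the API names and weights are illustrative):

```yaml
# traffic_splitter.yaml

- name: text-generator
  kind: TrafficSplitter
  apis:
    - name: text-generator-a  # receives ~80% of requests
      weight: 80
    - name: text-generator-b  # receives ~20% of requests
      weight: 20
```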

## Scale

* Configure workload and cluster autoscaling to efficiently handle large-scale production workloads.
* Create clusters with different types of instances for different types of workloads.
* Spend less on cloud infrastructure by letting Cortex manage spot or preemptible instances (see the cluster configuration sketch below).

```text
$ cortex cluster info

provider: aws
region: us-east-1
instance_types: [c5.xlarge, g4dn.xlarge]
spot_instances: true
min_instances: 10
max_instances: 100
```
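
The cluster itself is also defined declaratively. A sketch of a cluster configuration consistent with the output above (the exact schema depends on the Cortex version), which could be applied with a command like `cortex cluster up cluster.yaml`:

```yaml
# cluster.yaml

provider: aws
region: us-east-1
instance_types: [c5.xlarge, g4dn.xlarge]  # CPU instances for light workloads, GPU instances for inference
spot_instances: true                      # let Cortex manage cheaper spot capacity
min_instances: 10
max_instances: 100
```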