
Update README.md to reflect Python client #1595


Merged: 5 commits, Nov 25, 2020
Changes from all commits
118 changes: 55 additions & 63 deletions README.md
@@ -5,8 +5,7 @@

<!-- Delete on release branches -->
<!-- CORTEX_VERSION_README_MINOR -->

[install](https://docs.cortex.dev/install) • [documentation](https://docs.cortex.dev) • [examples](https://github.com/cortexlabs/cortex/tree/0.23/examples) • [support](https://gitter.im/cortexlabs/cortex)
[install](https://docs.cortex.dev/install) • [documentation](https://docs.cortex.dev) • [examples](https://github.com/cortexlabs/cortex/tree/0.23/examples) • [community](https://gitter.im/cortexlabs/cortex)

# Deploy machine learning models to production

@@ -16,49 +15,45 @@
Cortex is an open source platform for deploying, managing, and scaling machine learning in production.

## Model serving infrastructure

* Supports deploying TensorFlow, PyTorch, sklearn and other models as realtime or batch APIs
* Ensures high availability with availability zones and automated instance restarts
* Scales to handle production workloads with request-based autoscaling
* Runs inference on spot instances with on-demand backups
* Manages traffic splitting for A/B testing
* Supports deploying TensorFlow, PyTorch, sklearn and other models as realtime or batch APIs.
* Ensures high availability with availability zones and automated instance restarts.
* Runs inference on spot instances with on-demand backups.
* Autoscales to handle production workloads.

#### Configure your cluster:
#### Configure Cortex

```yaml
# cluster.yaml

region: us-east-1
availability_zones: [us-east-1a, us-east-1b]
api_gateway: public
instance_type: g4dn.xlarge
min_instances: 10
max_instances: 100
spot: true
```

#### Spin up your cluster on your AWS account:
#### Spin up Cortex on your AWS account

```text
$ cortex cluster up --config cluster.yaml

○ configuring autoscaling ✓
○ configuring networking ✓
○ configuring logging ✓
○ configuring metrics dashboard ✓

cortex is ready!
```
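If it helps to confirm the cluster came up, the CLI can report on it; a minimal check, assuming the `cortex cluster info` subcommand accepts the same config file:

```text
$ cortex cluster info --config cluster.yaml
```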

<br>

## Reproducible model deployments
## Reproducible deployments

* Implement request handling in Python
* Customize compute, autoscaling, and networking for each API
* Package dependencies, code, and configuration for reproducible deployments
* Test locally before deploying to your cluster
* Package dependencies, code, and configuration for reproducible deployments.
* Configure compute, autoscaling, and networking for each API.
* Integrate with your data science platform or CI/CD system.
* Test locally before deploying to your cluster.

#### Implement a predictor:
#### Implement a predictor

```python
# predictor.py

from transformers import pipeline

class PythonPredictor:
    def __init__(self, config):
        self.model = pipeline(task="text-generation")

    def predict(self, payload):
        return self.model(payload["text"])[0]
```
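Cortex calls `__init__()` once per replica at startup to load the model, and `predict()` on every request.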

#### Configure an API:

```yaml
# cortex.yaml

name: text-generator
kind: RealtimeAPI
predictor:
  path: predictor.py
compute:
  gpu: 1
  mem: 4Gi
autoscaling:
  min_replicas: 1
  max_replicas: 10
networking:
  api_gateway: public
```

#### Deploy to production:

```text
$ cortex deploy cortex.yaml

creating https://example.com/text-generator

$ curl https://example.com/text-generator \
    -X POST -H "Content-Type: application/json" \
    -d '{"text": "deploy machine learning models to"}'

"deploy machine learning models to production"
```

#### Configure an API

```python
api_spec = {
    "name": "text-generator",
    "kind": "RealtimeAPI",
    "predictor": {
        "type": "python",
        "path": "predictor.py"
    },
    "compute": {
        "gpu": 1,
        "mem": "8Gi",
    },
    "autoscaling": {
        "min_replicas": 1,
        "max_replicas": 10
    },
    "networking": {
        "api_gateway": "public"
    }
}
```
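Per the "test locally" bullet above, the same spec could be exercised in a local environment before going to AWS; a minimal sketch, assuming the client also accepts a "local" environment name alongside "aws":

```python
import cortex

# assumption: a "local" environment mirrors the "aws" client shown below
cx_local = cortex.client("local")
cx_local.deploy(api_spec, project_dir=".")
```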

<br>

## API management
## Scalable machine learning APIs

* Monitor API performance
* Aggregate and stream logs
* Customize prediction tracking
* Update APIs without downtime
* Scale to handle production workloads with request-based autoscaling.
* Stream performance metrics and logs to any monitoring tool.
* Serve many models efficiently with multi model caching.
* Configure traffic splitting for A/B testing.
* Update APIs without downtime.

#### Manage your APIs:

```text
$ cortex get

realtime api       status   replicas   last update   latency   requests

text-generator     live     34         9h            247ms     71828
object-detector    live     13         15h           23ms      828459

batch api          running jobs   last update

image-classifier   5              10h
```

#### Deploy to your cluster

```python
import cortex

cx = cortex.client("aws")
cx.deploy(api_spec, project_dir=".")

# creating https://example.com/text-generator
```
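To mirror the `cortex get` overview above from Python, the client would need a status accessor; a sketch, assuming a `get_api` method on the client (its name is not shown in this diff):

```python
# assumption: get_api returns status/metadata for a deployed API
api = cx.get_api("text-generator")
print(api)  # e.g. status, replica count, endpoint
```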

#### Consume your API

```python
import requests

endpoint = "https://example.com/text-generator"
payload = {"text": "hello world"}
prediction = requests.post(endpoint, json=payload)  # json=, since the predictor reads payload["text"]
```
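The A/B-testing bullet above implies a spec for splitting traffic; a hedged sketch, assuming a `TrafficSplitter` kind and two already-deployed realtime APIs (the names here are hypothetical):

```python
# hypothetical: route 80/20 between two deployed realtime APIs
split_spec = {
    "name": "text-generator",
    "kind": "TrafficSplitter",
    "apis": [
        {"name": "text-generator-a", "weight": 80},
        {"name": "text-generator-b", "weight": 20},
    ],
}

cx.deploy(split_spec)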

<br>

## Get started

```text
$ pip install cortex
```

```bash
pip install cortex
```
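To confirm the install, the CLI can print its version; a quick check, assuming `pip install cortex` also provides the `cortex` CLI:

```text
$ cortex version
```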

See the [installation guide](https://docs.cortex.dev/install) for next steps.