Improve download error messages #477


Merged
9 changes: 8 additions & 1 deletion README.md
@@ -52,6 +52,13 @@ python3 torchchat.py download llama3
View available models with `python3 torchchat.py list`. You can also remove downloaded models
with `python3 torchchat.py remove llama3`.

### Common Issues

* **CERTIFICATE_VERIFY_FAILED**:
Run `pip install --upgrade certifi`.
* **Access to model is restricted and you are not in the authorized list. Visit \[link\] to ask for access**:
Some models require an additional step to access. Follow the link to fill out the request form on HuggingFace.
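For the certificate error, it can also help to check which CA bundle Python is actually using; a minimal stdlib-only sketch (independent of torchchat):

```python
import ssl

# Show where this Python installation looks for trusted CA certificates.
# A missing or stale bundle here is a common cause of CERTIFICATE_VERIFY_FAILED;
# `pip install --upgrade certifi` refreshes the certifi-provided bundle.
paths = ssl.get_default_verify_paths()
print(paths.cafile or paths.capath or "no default CA bundle configured")
```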

## What can you do with torchchat?

* Run models via PyTorch / Python:
@@ -106,7 +113,7 @@ Quantization is the process of converting a model into a more memory-efficient r

Depending on the model and the target device, different quantization recipes may be applied. Torchchat contains two example configurations to optimize performance: `config/data/cuda.json` for GPU-based systems and `config/data/mobile.json` for mobile systems. The GPU configuration targets memory bandwidth, which is a scarce resource even in powerful GPUs (and, to a lesser degree, memory footprint, to fit large models into a device's memory). The mobile configuration targets memory footprint, because on many devices a single application is limited to as little as GB or less of memory.

You can use the quantization recipes in conjunction with any of the `chat`, `generate` and `browser` commands to test their impact and accelerate model execution. You will apply these recipes to the export commands below, to optimize the exported models. To adapt these recipes or write your own, please refer to the [quantization overview](docs/quantization.md).
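Each recipe file is plain JSON mapping quantizer names to their options. As a rough, hypothetical illustration of that shape (the key and options below are illustrative and may not match the actual schema in `config/data/cuda.json` or `config/data/mobile.json`):

```python
import json

# Hypothetical recipe: quantizer name -> options. The real recipes live in
# config/data/cuda.json and config/data/mobile.json in the torchchat repo.
recipe_text = '{"linear:int4": {"groupsize": 256}}'

recipe = json.loads(recipe_text)
for quantizer, options in recipe.items():
    print(f"{quantizer}: {options}")
```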

---
*TO BE REPLACED BY SUITABLE WORDING PROVIDED BY LEGAL*
17 changes: 14 additions & 3 deletions download.py
@@ -5,6 +5,7 @@
 # LICENSE file in the root directory of this source tree.
 import os
 import shutil
+import sys
 import urllib.request
 from pathlib import Path
 from typing import Optional
@@ -35,10 +36,20 @@ def _download_hf_snapshot(
             ignore_patterns="*safetensors*",
         )
     except HTTPError as e:
-        if e.response.status_code == 401:
-            raise RuntimeError(
-                "Access denied. Run huggingface-cli login to authenticate."
+        if e.response.status_code == 401:  # Missing HuggingFace CLI login.
+            print(
+                "Access denied. Create a HuggingFace account and run 'pip3 install huggingface_hub' and 'huggingface-cli login' to authenticate.",
+                file=sys.stderr
             )
+            exit(1)
+        elif e.response.status_code == 403:  # No access to the specific model.
+            # The error message includes a link to request access to the given model.
+            # This prints nicely and does not include a traceback.
+            print(
+                str(e),
+                file=sys.stderr
+            )
+            exit(1)
         else:
             raise e
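The change above boils down to mapping HTTP status codes from the model hub onto user-facing messages and exiting without a traceback. A minimal, self-contained sketch of that pattern (the helper name is illustrative and not part of the torchchat codebase):

```python
import sys
from typing import Optional


def explain_http_error(status_code: int, detail: str) -> Optional[str]:
    """Map an HTTP status from the model hub onto a user-facing message.

    Returns None for statuses the caller should re-raise. Illustrative
    helper only; this function is not part of torchchat.
    """
    if status_code == 401:  # Not authenticated with the HuggingFace CLI.
        return (
            "Access denied. Create a HuggingFace account and run "
            "'pip3 install huggingface_hub' and 'huggingface-cli login' "
            "to authenticate."
        )
    if status_code == 403:  # Authenticated, but not authorized for this model.
        # The server's own message already contains the access-request link.
        return detail
    return None  # Unexpected status: let the caller re-raise the exception.


if __name__ == "__main__":
    message = explain_http_error(401, "")
    if message is not None:
        # A real caller would print to stderr and exit(1), as the PR does.
        print(message, file=sys.stderr)
```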
