
Commit 3936766

Merge branch 'main' into upd-nanotron
2 parents edd44a4 + 989f5f5

75 files changed: +2267 −2373 lines

.github/workflows/slow_tests.yaml

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+name: Slow end to end tests
+
+on:
+  push:
+    branches:
+      - main
+      - v*-release
+  pull_request:
+    branches:
+      - main
+
+jobs:
+  run_tests:
+    name: Run tests
+    runs-on: 'aws-g4dn-2xlarge-use1-public-80'
+    steps:
+      - name: Install Git LFS
+        run: |
+          if ! command -v git-lfs &> /dev/null; then
+            echo "Installing Git LFS..."
+            sudo apt-get update && sudo apt-get install -y git-lfs
+            git lfs install
+          else
+            echo "Git LFS already installed."
+          fi
+
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          lfs: true
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+
+      - name: Install the project
+        run: uv sync --extra dev
+
+      - name: Ensure cache directories exist
+        run: mkdir -p cache/models cache/datasets
+
+      - name: Run tests
+        env:
+          HF_HOME: "cache/models"
+          HF_DATASETS_CACHE: "cache/datasets"
+        run: uv run pytest --disable-pytest-warnings --runslow tests/slow_tests
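
For reference, a rough local equivalent of the new slow-test job, assuming uv and Git LFS are installed and the repository is checked out:

```bash
# Sketch: approximate the slow-test job locally (assumes uv and git-lfs are installed)
git lfs install
uv sync --extra dev
mkdir -p cache/models cache/datasets
HF_HOME="cache/models" HF_DATASETS_CACHE="cache/datasets" \
  uv run pytest --disable-pytest-warnings --runslow tests/slow_tests
```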

.github/workflows/tests.yaml

Lines changed: 46 additions & 33 deletions
@@ -11,36 +11,49 @@ on:
 
 jobs:
   run_tests:
-    name: Run tests
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v3
-        with:
-          lfs: 'true'
-      - name: Setup Python environment
-        uses: actions/setup-python@v4
-        with:
-          python-version: '3.10'
-          cache: 'pip'
-      - name: Install lighteval in editable mode
-        run: |
-          pip install -e .[dev,extended_tasks,multilingual,litellm]
-      - name: Get cached files
-        uses: actions/cache@v4
-        id: get-cache
-        with:
-          path: "cache"
-          key: test-cache-HF
-      - name: Test
-        env:
-          HF_TEST_TOKEN: ${{ secrets.HF_TEST_TOKEN }}
-          HF_HOME: "cache/models"
-          HF_DATASETS_CACHE: "cache/datasets"
-        run: | # PYTHONPATH="${PYTHONPATH}:src" HF_DATASETS_CACHE="cache/datasets" HF_HOME="cache/models"
-          python -m pytest -x --disable-pytest-warnings
-      - name: Write cache
-        uses: actions/cache@v4
-        with:
-          path: "cache"
-          key: test-cache-HF
+    name: Run tests
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          lfs: true
+
+      - name: Cache Hugging Face models
+        uses: actions/cache@v4
+        with:
+          path: cache/models
+          key: hf-models-${{ runner.os }}-${{ github.ref }}
+          restore-keys: hf-models-${{ runner.os }}-
+
+      - name: Cache Hugging Face datasets
+        uses: actions/cache@v4
+        with:
+          path: cache/datasets
+          key: hf-datasets-${{ runner.os }}-${{ github.ref }}
+          restore-keys: hf-datasets-${{ runner.os }}-
+
+      - name: Cache uv virtual environment
+        uses: actions/cache@v4
+        with:
+          path: .venv
+          key: uv-env-${{ runner.os }}-${{ hashFiles('pyproject.toml') }}
+          restore-keys: uv-env-${{ runner.os }}-
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+
+      - name: Install the project
+        run: uv sync --extra dev
+
+      - name: Ensure cache directories exist
+        run: mkdir -p cache/models cache/datasets
+
+      - name: Run tests
+        env:
+          HF_TEST_TOKEN: ${{ secrets.HF_TEST_TOKEN }}
+          HF_HOME: "cache/models"
+          HF_DATASETS_CACHE: "cache/datasets"
+        run: uv run pytest -x --disable-pytest-warnings

README.md

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ Here’s a quick command to evaluate using the Accelerate backend:
 
 ```shell
 lighteval accelerate \
-    "pretrained=gpt2" \
+    "model_name=gpt2" \
     "leaderboard|truthfulqa:mc|0|0"
 ```

docs/source/_toctree.yml

Lines changed: 2 additions & 2 deletions
@@ -23,8 +23,8 @@
     title: Use vllm as backend
   - local: use-sglang-as-backend
     title: Use SGLang as backend
-  - local: evaluate-the-model-on-a-server-or-container
-    title: Evaluate on Server
+  - local: use-huggingface-inference-endpoints-or-tgi-as-backend
+    title: Use Hugging Face inference endpoints or TGI as backend
   - local: contributing-to-multilingual-evaluations
     title: Contributing to multilingual evaluations
   title: Guides

docs/source/adding-a-custom-task.mdx

Lines changed: 1 addition & 1 deletion
@@ -171,7 +171,7 @@ Once your file is created you can then run the evaluation with the following command:
 
 ```bash
 lighteval accelerate \
-    "pretrained=HuggingFaceH4/zephyr-7b-beta" \
+    "model_name=HuggingFaceH4/zephyr-7b-beta" \
     "community|{custom_task}|{fewshots}|{truncate_few_shot}" \
     --custom-tasks {path_to_your_custom_task_file}
 ```

docs/source/package_reference/models.mdx

Lines changed: 0 additions & 4 deletions
@@ -31,10 +31,6 @@
 ### Open AI Models
 [[autodoc]] models.endpoints.openai_model.OpenAIClient
 
-## Nanotron Model
-### NanotronLightevalModel
-[[autodoc]] models.nanotron.nanotron_model.NanotronLightevalModel
-
 ## VLLM Model
 ### VLLMModel
 [[autodoc]] models.vllm.vllm_model.VLLMModelConfig

docs/source/quicktour.mdx

Lines changed: 5 additions & 5 deletions
@@ -27,7 +27,7 @@ To evaluate `GPT-2` on the Truthful QA benchmark with [🤗
 
 ```bash
 lighteval accelerate \
-    "pretrained=gpt2" \
+    "model_name=openai-community/gpt2" \
     "leaderboard|truthfulqa:mc|0|0"
 ```
 
@@ -59,7 +59,7 @@ When specifying a path to file, it should start with `./`.
 
 ```bash
 lighteval accelerate \
-    "pretrained=gpt2" \
+    "model_name=openai-community/gpt2" \
     ./path/to/lighteval/examples/tasks/recommended_set.txt
 # or, e.g., "leaderboard|truthfulqa:mc|0|0|,leaderboard|gsm8k|3|1"
 ```
@@ -79,7 +79,7 @@ You can then evaluate a model using data parallelism on 8 GPUs like follows:
 ```bash
 accelerate launch --multi_gpu --num_processes=8 -m \
     lighteval accelerate \
-    "pretrained=gpt2" \
+    "model_name=openai-community/gpt2" \
     "leaderboard|truthfulqa:mc|0|0"
 ```
 
@@ -92,7 +92,7 @@ To evaluate a model using pipeline parallelism on 2 or more GPUs, run:
 
 ```bash
 lighteval accelerate \
-    "pretrained=gpt2,model_parallel=True" \
+    "model_name=openai-community/gpt2,model_parallel=True" \
     "leaderboard|truthfulqa:mc|0|0"
 ```
 
@@ -129,7 +129,7 @@ accelerate).
 - **add_special_tokens** (bool, optional, defaults to True): Whether to add special tokens to the input sequences.
   If `None`, the default value will be set to `True` for seq2seq models (e.g. T5) and
   `False` for causal models.
-- **model_parallel** (bool, optional, defaults to False):
+- **model_parallel** (bool, optional, defaults to None):
   True/False: force to use or not the `accelerate` library to load a large
   model across multiple devices.
   Default: None which corresponds to comparing the number of processes with

docs/source/saving-and-reading-results.mdx

Lines changed: 14 additions & 0 deletions
@@ -31,6 +31,20 @@ This will create a Tensorboard dashboard in a HF org set with the `--results-org
 option.
 
 
+## Pushing results to WandB
+
+You can push the results to WandB by setting `--wandb`. This will initialize a WandB
+run and log the results.
+
+WandB arguments need to be set as environment variables:
+
+```
+export WANDB_PROJECT="lighteval"
+```
+
+You can find a list of variables in the [wandb documentation](https://docs.wandb.ai/guides/track/environment-variables/).
+
+
 ## How to load and investigate details
 
 ### Load from local detail files
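
For reference, an end-to-end sketch of the WandB flow added above; the model and task below are illustrative:

```bash
# Sketch: log an evaluation run to WandB (model and task chosen for illustration)
export WANDB_PROJECT="lighteval"
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0|0" \
    --wandb
```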

docs/source/evaluate-the-model-on-a-server-or-container.mdx renamed to docs/source/use-huggingface-inference-endpoints-or-tgi-as-backend.mdx

Lines changed: 6 additions & 26 deletions
@@ -25,15 +25,12 @@ be deleted afterwards).
 __configuration file example:__
 
 ```yaml
-model:
-  base_params:
-    # Pass either model_name, or endpoint_name and true reuse_existing
-    # endpoint_name: "llama-2-7B-lighteval" # needs to be lower case without special characters
-    # reuse_existing: true # defaults to false; if true, ignore all params in instance, and don't delete the endpoint after evaluation
+model_parameters:
+  reuse_existing: false # if true, ignore all params in instance, and don't delete the endpoint after evaluation
+  # endpoint_name: "llama-2-7B-lighteval" # needs to be lower case without special characters
   model_name: "meta-llama/Llama-2-7b-hf"
-  # revision: "main" # defaults to "main"
+  revision: "main" # defaults to "main"
   dtype: "float16" # can be any of "awq", "eetq", "gptq", "4bit' or "8bit" (will use bitsandbytes), "bfloat16" or "float16"
-  instance:
   accelerator: "gpu"
   region: "eu-west-1"
   vendor: "aws"
@@ -44,7 +41,7 @@ model:
   namespace: null # The namespace under which to launch the endpoint. Defaults to the current user's namespace
   image_url: null # Optionally specify the docker image to use when launching the endpoint model. E.g., launching models with later releases of the TGI container with support for newer models.
   env_vars:
-      null # Optional environment variables to include when launching the endpoint. e.g., `MAX_INPUT_LENGTH: 2048`
+    null # Optional environment variables to include when launching the endpoint. e.g., `MAX_INPUT_LENGTH: 2048`
 ```
 
 ### Text Generation Inference (TGI)
@@ -55,25 +52,8 @@ serverless inference.
 __configuration file example:__
 
 ```yaml
-model:
-  instance:
+model_parameters:
   inference_server_address: ""
   inference_server_auth: null
   model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
 ```
-
-### OpenAI API
-
-Lighteval also supports evaluating models on the OpenAI API. To do so you need to set your OpenAI API key in the environment variable.
-
-```bash
-export OPENAI_API_KEY={your_key}
-```
-
-And then run the following command:
-
-```bash
-lighteval endpoint openai \
-    {model-name} \
-    <task parameters>
-```

docs/source/use-inference-providers-as-backend.mdx

Lines changed: 3 additions & 3 deletions
@@ -11,7 +11,7 @@ Lighteval allows to use Hugging Face's Inference Providers to evaluate llms on s
 
 ```bash
 lighteval endpoint inference-providers \
-    "model=deepseek-ai/DeepSeek-R1,provider=hf-inference" \
+    "model_name=deepseek-ai/DeepSeek-R1,provider=hf-inference" \
     "lighteval|gsm8k|0|0"
 ```
 
@@ -28,13 +28,13 @@ lighteval endpoint inference-providers \
 with the following config file:
 
 ```yaml
-model:
+model_parameters:
   model_name: "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
   provider: "novita"
   timeout: null
   proxies: null
   parallel_calls_count: 10
-generation:
+generation_parameters:
   temperature: 0.8
   top_k: 10
   max_new_tokens: 10000
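
For context, a config file like the one above would typically be passed to the CLI in place of the inline model string; the file path below is illustrative, not taken from this commit:

```bash
# Sketch: point the inference-providers backend at a YAML config (path is illustrative)
lighteval endpoint inference-providers \
    examples/model_configs/inference_providers.yaml \
    "lighteval|gsm8k|0|0"
```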

docs/source/use-litellm-as-backend.mdx

Lines changed: 15 additions & 12 deletions
@@ -10,10 +10,14 @@ Documentation for available APIs and compatible endpoints can be found [here](ht
 
 ```bash
 lighteval endpoint litellm \
-    "gpt-3.5-turbo" \
-    "lighteval|gsm8k|0|0"
+    "provider=openai,model_name=gpt-3.5-turbo" \
+    "lighteval|gsm8k|0|0" \
+    --use-chat-template
 ```
 
+> [!WARNING]
+> `--use-chat-template` is required for litellm to work properly.
+
 ## Using a config file
 
 Litellm allows generation with any OpenAI compatible endpoint, for example you
@@ -22,17 +26,16 @@ can evaluate a model running on a local vllm server.
 To do so you will need to use a config file like so:
 
 ```yaml
-model:
-  base_params:
+model_parameters:
   model_name: "openai/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
   base_url: "URL OF THE ENDPOINT YOU WANT TO USE"
   api_key: "" # remove or keep empty as needed
-generation:
-  temperature: 0.5
-  max_new_tokens: 256
-  stop_tokens: [""]
-  top_p: 0.9
-  seed: 0
-  repetition_penalty: 1.0
-  frequency_penalty: 0.0
+  generation_parameters:
+    temperature: 0.5
+    max_new_tokens: 256
+    stop_tokens: [""]
+    top_p: 0.9
+    seed: 0
+    repetition_penalty: 1.0
+    frequency_penalty: 0.0
 ```
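
As a companion sketch for the "local vllm server" case mentioned above (assumes vLLM is installed locally; the model and port are illustrative):

```bash
# Sketch: serve an OpenAI-compatible endpoint locally with vLLM, then set
# base_url in the config above to it (e.g. "http://localhost:8000/v1")
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --port 8000
```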
