Commit ce0a261

fix: add known issues
1 parent ca85c0c commit ce0a261

1 file changed (+4, -2 lines changed)

README.md

Lines changed: 4 additions & 2 deletions
@@ -164,7 +164,7 @@ bigcodebench.sanitize --samples samples.jsonl
 
 # 💡 If you want to get the calibrated results:
 bigcodebench.sanitize --samples samples.jsonl --calibrate
-# Sanitized code will be produced to `samples-sanitized-calibrate.jsonl`
+# Sanitized code will be produced to `samples-sanitized-calibrated.jsonl`
 
 # 💡 If you are storing codes in directories:
 bigcodebench.sanitize --samples /path/to/vicuna-[??]b_temp_[??]
@@ -197,7 +197,7 @@ You are strongly recommended to use a sandbox such as [docker](https://docs.dock
 docker run -v $(pwd):/bigcodebench terryzho/bigcodebench-evaluate:latest --subset [complete|instruct] --samples samples.jsonl
 # ...Or locally ⚠️
 bigcodebench.evaluate --subset [complete|instruct] --samples samples.jsonl
-# ...If the ground truth is working
+# ...If the ground truth is working locally
 bigcodebench.evaluate --subset [complete|instruct] --samples samples.jsonl --no-gt
 ```
 
@@ -288,6 +288,8 @@ We will share pre-generated code samples from LLMs we have [evaluated](https://b
 
 - [ ] Due to the flakes in the evaluation, the execution results may vary slightly (~0.5%) between runs. We are working on improving the evaluation stability.
 
+- [ ] We are aware of the issue that some users may need to use a proxy to access the internet. We are working on a subset of the tasks that do not require internet access to evaluate the code.
+
 ## 📜 Citation
 
 ```bibtex