
Commit 865e7e6

Merge branch 'main' into agent
2 parents 7f36ff6 + d80a044 commit 865e7e6

18 files changed

+1057
-158
lines changed

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -166,4 +166,5 @@ repos/
 config.yml
 hydra_outputs/
 .commit0*
-.agent*
+.agent*
+docs/analysis_*.md

LICENSE

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
Copyright (c) 2024 Wenting Zhao
2+
3+
Permission is hereby granted, free of charge, to any person obtaining a copy
4+
of this software and associated documentation files (the "Software"), to deal
5+
in the Software without restriction, including without limitation the rights
6+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7+
copies of the Software, and to permit persons to whom the Software is
8+
furnished to do so, subject to the following conditions:
9+
10+
The above copyright notice and this permission notice shall be included in all
11+
copies or substantial portions of the Software.
12+
13+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19+
SOFTWARE.

README.md

Lines changed: 169 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1 +1,170 @@
11
# Commit0
2+
3+
<a href="https://commit-0.github.io/">Commit0</a> is a from-scratch AI coding challenge. Can you create a library from commit 0?
4+
5+
<p align="center">
6+
<img src="docs/arch.png" width="500px">
7+
</p>
8+
9+
<p align="center">
10+
<a href="https://commit-0.github.io/">
11+
<img src="https://img.shields.io/badge/Read-Docs-green.svg"/>
12+
</a>
13+
</p>
14+
15+
16+
The benchmark consists of 57 core Python libraries. The challenge is to rebuild these libraries and pass their unit tests. All libraries have:
17+
18+
* Significant test coverage
19+
* Detailed specification and documentation
20+
* Lint and type checking
21+
22+
Commit0 is an interactive environment that makes it easy to design and test new agents. You can:
23+
24+
* Efficiently run tests in isolated environments
25+
* Distribute testing and development across cloud systems
26+
* Track and log all changes made throughout
27+
28+
To install Commit0, run:
29+
30+
```
31+
pip install commit0
32+
```
33+
34+
Commit0 provides several commands to facilitate the process of cloning, building, testing, and evaluating repositories. Here's an overview of the available commands:
35+
36+
### Setup
37+
38+
<p align=center>
39+
<img src="docs/commit0.gif" width="500px">
40+
</p>
41+
42+
Use `commit0 setup [OPTIONS] REPO_SPLIT` to clone a repository split.
43+
Available options include:
44+
45+
| Argument | Type | Description | Default |
46+
|----------|------|-------------|---------|
47+
| `repo_split` | str | Split of repositories to clone | |
48+
| `--dataset-name` | str | Name of the Hugging Face dataset | `wentingzhao/commit0_combined` |
49+
| `--dataset-split` | str | Split of the Hugging Face dataset | `test` |
50+
| `--base-dir` | str | Base directory to clone repos to | `repos/` |
51+
| `--commit0-dot-file-path` | str | Path where stateful commit0 configs are stored | `.commit0.yaml` |
52+
53+
### Build
54+
55+
Use `commit0 build [OPTIONS]` to build the Commit0 split chosen in the Setup stage.
56+
Available options include:
57+
58+
| Argument | Type | Description | Default |
59+
|----------|------|-------------|---------|
60+
| `--num-workers` | int | Number of workers | `8` |
61+
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
62+
| `--verbose` | int | Verbosity level (1 or 2) | `1` |
63+
64+
### Get Tests
65+
66+
Use `commit0 get-tests REPO_NAME` to get tests for a Commit0 repository.
67+
68+
| Argument | Type | Description | Default |
69+
|----------|------|-------------|---------|
70+
| `repo_name` | str | Name of the repository to get tests for | |
71+
72+
### Test
73+
74+
Use `commit0 test [OPTIONS] REPO_OR_REPO_PATH [TEST_IDS]` to run tests on a Commit0 repository.
75+
Available options include:
76+
77+
| Argument | Type | Description | Default |
78+
|----------|------|-------------|---------|
79+
| `repo_or_repo_path` | str | Directory of the repository to test | |
80+
| `test_ids` | str | Test IDs to run | |
81+
| `--branch` | str | Branch to test | |
82+
| `--backend` | str | Backend to use for testing | `modal` |
83+
| `--timeout` | int | Timeout for tests in seconds | `1800` |
84+
| `--num-cpus` | int | Number of CPUs to use | `1` |
85+
| `--reference` | bool | Test the reference commit | `False` |
86+
| `--coverage` | bool | Get coverage information | `False` |
87+
| `--rebuild` | bool | Rebuild an image | `False` |
88+
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
89+
| `--verbose` | int | Verbosity level (1 or 2) | `1` |
90+
| `--stdin` | bool | Read test names from stdin | `False` |
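
The options in the table above can also be assembled programmatically. The following is a minimal Python sketch, not part of Commit0 itself: the helper name `build_test_command` is hypothetical, and the repository path, test ID, and branch in the usage example are placeholders. It only uses the string and integer options listed in the table and places them before the positional arguments, matching the usage line `commit0 test [OPTIONS] REPO_OR_REPO_PATH [TEST_IDS]`.

```python
import shlex
from typing import List, Optional


def build_test_command(
    repo_or_repo_path: str,
    test_ids: Optional[str] = None,
    branch: Optional[str] = None,
    backend: str = "modal",
    timeout: int = 1800,
    num_cpus: int = 1,
) -> List[str]:
    """Assemble an argv list for `commit0 test` (hypothetical helper)."""
    cmd = ["commit0", "test"]
    if branch is not None:
        cmd += ["--branch", branch]
    # Defaults mirror the documented defaults in the table.
    cmd += ["--backend", backend]
    cmd += ["--timeout", str(timeout)]
    cmd += ["--num-cpus", str(num_cpus)]
    # Positional arguments go last, as in the documented usage line.
    cmd.append(repo_or_repo_path)
    if test_ids is not None:
        cmd.append(test_ids)
    return cmd


if __name__ == "__main__":
    # Placeholder path, test ID, and branch name for illustration only.
    argv = build_test_command(
        "repos/example", "tests/test_example.py", branch="my-branch"
    )
    print(shlex.join(argv))
```

The resulting list could be passed to `subprocess.run` once `commit0` is installed and a backend is configured. The boolean options are left out because the table does not show how they are spelled on the command line.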
91+
92+
### Evaluate
93+
94+
Use `commit0 evaluate [OPTIONS]` to evaluate the Commit0 split chosen in the Setup stage.
95+
Available options include:
96+
97+
| Argument | Type | Description | Default |
98+
|----------|------|-------------|---------|
99+
| `--branch` | str | Branch to evaluate | |
100+
| `--backend` | str | Backend to use for evaluation | `modal` |
101+
| `--timeout` | int | Timeout for evaluation in seconds | `1800` |
102+
| `--num-cpus` | int | Number of CPUs to use | `1` |
103+
| `--num-workers` | int | Number of workers to use | `8` |
104+
| `--reference` | bool | Evaluate the reference commit | `False` |
105+
| `--coverage` | bool | Get coverage information | `False` |
106+
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
107+
| `--rebuild` | bool | Rebuild images | `False` |
108+
109+
### Lint
110+
111+
Use `commit0 lint [OPTIONS] REPO_OR_REPO_DIR` to lint files in a repository.
112+
Available options include:
113+
114+
| Argument | Type | Description | Default |
115+
|----------|------|-------------|---------|
116+
| `repo_or_repo_dir` | str | Directory of the repository to lint | |
117+
| `--files` | List[Path] | Files to lint (optional) | |
118+
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
119+
| `--verbose` | int | Verbosity level (1 or 2) | `1` |
120+
121+
### Save
122+
123+
Use `commit0 save [OPTIONS] OWNER BRANCH` to save the Commit0 split to GitHub.
124+
Available options include:
125+
126+
| Argument | Type | Description | Default |
127+
|----------|------|-------------|---------|
128+
| `owner` | str | Owner of the repository | |
129+
| `branch` | str | Branch to save | |
130+
| `--github-token` | str | GitHub token for authentication | |
131+
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
132+
133+
## Agent
134+
135+
### Config
136+
137+
Use `agent config [OPTIONS] AGENT_NAME` to set up the configuration for an agent.
138+
Available options include:
139+
140+
| Argument | Type | Description | Default |
141+
|----------|------|-------------|---------|
142+
| `agent_name` | str | Agent to use. Only [aider](https://aider.chat/) is supported for now. | `aider` |
143+
| `--model-name` | str | LLM to use. See the [aider documentation](https://aider.chat/docs/llms.html) for all supported models. | `claude-3-5-sonnet-20240620` |
144+
| `--use-user-prompt` | bool | Use a custom prompt instead of the default prompt. | `False` |
145+
| `--user-prompt` | str | The prompt sent to agent. | See code for details. |
146+
| `--run-tests` | bool | Run tests after code modifications for feedback. This requires setting up `docker` or `modal` first; refer to the commit0 docs. | `False` |
147+
| `--max-iteration` | int | Maximum number of agent iterations. | `3` |
148+
| `--use-repo-info` | bool | Include the repository information. | `False` |
149+
| `--max-repo-info-length` | int | Maximum length of the repository information to use. | `10000` |
150+
| `--use-unit-tests-info` | bool | Include the unit tests information. | `False` |
151+
| `--max-unit-tests-info-length` | int | Maximum length of the unit tests information to use. | `10000` |
152+
| `--use-spec-info` | bool | Include the spec information. | `False` |
153+
| `--max-spec-info-length` | int | Maximum length of the spec information to use. | `10000` |
154+
| `--use-lint-info` | bool | Include the lint information. | `False` |
155+
| `--max-lint-info-length` | int | Maximum length of the lint information to use. | `10000` |
156+
| `--pre-commit-config-path` | str | Path to the pre-commit config file. This is needed for running `lint`. | `.pre-commit-config.yaml` |
157+
| `--agent-config-file` | str | Path to write the agent config. | `.agent.yaml` |
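
To illustrate how the `--use-*-info` and `--max-*-info-length` pairs could interact, here is a minimal Python sketch. It is not the Commit0 implementation: `PromptOptions` and `build_prompt` are hypothetical names, the lengths are assumed to be measured in characters, and the order in which sections are joined is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PromptOptions:
    """Hypothetical container mirroring a subset of the documented flags."""

    use_repo_info: bool = False
    max_repo_info_length: int = 10000
    use_spec_info: bool = False
    max_spec_info_length: int = 10000


def build_prompt(
    user_prompt: str,
    repo_info: str,
    spec_info: str,
    options: PromptOptions,
) -> str:
    """Append each enabled section, truncated to its maximum length."""
    parts = [user_prompt]
    if options.use_repo_info:
        # Assumption: the limit is a character count, cut from the start.
        parts.append(repo_info[: options.max_repo_info_length])
    if options.use_spec_info:
        parts.append(spec_info[: options.max_spec_info_length])
    return "\n\n".join(parts)
```

With all `use_*` options left at `False`, the sketch returns the user prompt unchanged, which matches the documented defaults.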
158+
159+
### Running
160+
161+
Use `agent run [OPTIONS] BRANCH` to execute an agent on a specific branch.
162+
Available options include:
163+
164+
| Argument | Type | Description | Default |
165+
|----------|------|-------------|---------|
166+
| `branch` | str | Name of the branch to run the agent on | |
167+
| `--backend` | str | Test backend to run the agent on. Ignore this option unless the agent was configured with `--run-tests`. | `modal` |
168+
| `--log-dir` | str | Log directory to store the logs. | `logs/aider` |
169+
| `--max-parallel-repos` | int | Maximum number of repositories for the agent to run in parallel. Runs sequentially if set to 1. | `1` |
170+
| `--display-repo-progress-num` | int | Number of repository progress entries displayed while running. | `5` |
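
The behaviour described for `--max-parallel-repos` (sequential when set to 1, parallel otherwise) can be sketched as follows. This is an illustration only: `run_on_repos` and the `worker` callback are hypothetical, and a thread pool is assumed here, whereas the actual agent may use a different concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_on_repos(
    repos: Sequence[str],
    worker: Callable[[str], str],
    max_parallel_repos: int = 1,
) -> List[str]:
    """Run `worker` on every repo, in parallel only when requested."""
    if max_parallel_repos < 1:
        raise ValueError("max_parallel_repos must be at least 1")
    if max_parallel_repos == 1:
        # Sequential path: one repository at a time.
        return [worker(repo) for repo in repos]
    with ThreadPoolExecutor(max_workers=max_parallel_repos) as pool:
        # Executor.map returns results in the order of the inputs.
        return list(pool.map(worker, repos))
```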

docs/agent.md

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
## Running
2+
3+
Commit0 provides a command-line `agent` for configuring and
4+
running AI agents to assist with code development and testing.
5+
In this example we use [Aider](https://aider.chat/) as the
6+
baseline code completion agent.
7+
8+
```bash
9+
pip install aider-chat
10+
```
11+
12+
First we assume there is an underlying `commit0`
13+
project that is configured. To create a new project,
14+
run the commit0 `setup` command.
15+
16+
```bash
17+
commit0 setup lite
18+
```
19+
20+
Next we need to configure the backend for the agent.
21+
Currently we only support the aider backend. Config
22+
can also be used to pass in arguments.
23+
24+
```bash
25+
export ANTHROPIC_API_KEY="..."
26+
agent config aider
27+
```
28+
29+
Finally we run the underlying agent. This will create a display
30+
that shows the current progress of the agent.
31+
32+
```bash
33+
agent run
34+
```
35+
36+
37+
### Extending
38+
Refer to `class Agents` in `agent/agents.py`. You can design your own agent by inheriting from the `Agents` class and implementing the `run` method.
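
As a rough illustration of this pattern, the sketch below defines a stand-in `Agents` base class and a subclass that implements `run`. The real class in `agent/agents.py` is not shown here, so the constructor argument, the `run` signature, and the `EchoAgent` name are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class Agents(ABC):
    """Stand-in for the base class in agent/agents.py (signature assumed)."""

    def __init__(self, max_iteration: int) -> None:
        self.max_iteration = max_iteration

    @abstractmethod
    def run(self, message: str) -> str:
        """Process one request; subclasses must implement this."""


class EchoAgent(Agents):
    """Hypothetical agent that returns its input instead of calling an LLM."""

    def run(self, message: str) -> str:
        return f"[echo x{self.max_iteration}] {message}"


if __name__ == "__main__":
    agent = EchoAgent(max_iteration=3)
    print(agent.run("implement the missing functions"))
```

When adapting this to the real base class, match the actual `run` signature rather than the one assumed above.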
39+
40+
## Notes
41+
42+
43+
* Aider automatically retries certain API errors. For details, see [here](https://github.com/paul-gauthier/aider/blob/75e1d519da9b328b0eca8a73ee27278f1289eadb/aider/sendchat.py#L17).
44+
* When increasing `--max-parallel-repos`, be mindful of aider's [60-second retry timeout](https://github.com/paul-gauthier/aider/blob/75e1d519da9b328b0eca8a73ee27278f1289eadb/aider/sendchat.py#L39). Set this value according to your API tier so that rate limit errors do not stop processes.
45+
* Currently, the agent skips files with more than 1500 lines. See `agent/agent_utils.py#L199` for details.
46+
* Running a full `all` commit0 split costs approximately $100 with Claude Sonnet 3.5.
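
The 1500-line limit from the notes above can be approximated with a small filter. This sketch is not the code in `agent/agent_utils.py`: the function names are hypothetical, and it assumes that a file with exactly 1500 lines is still processed.

```python
from pathlib import Path
from typing import Iterable, List

MAX_LINES = 1500  # limit stated in the notes


def count_lines(path: Path) -> int:
    """Count lines without loading the whole file into memory."""
    with path.open("r", encoding="utf-8", errors="ignore") as handle:
        return sum(1 for _ in handle)


def files_within_limit(
    paths: Iterable[Path], max_lines: int = MAX_LINES
) -> List[Path]:
    """Keep only files that have at most `max_lines` lines."""
    return [path for path in paths if count_lines(path) <= max_lines]
```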

docs/analysis.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
2+
| | Name | Summary | |
3+
|--|--------|----------|--|
4+
||[reference](/analysis_reference)|3628 / 33 ; duration: 18.66s||
5+
||[test-save-commit0](/analysis_test-save-commit0)|0 / 0 ; duration: 0.00s||
6+
||[model_name-claude-3-5-sonnet-20240620__run_tests-0__use_lint_info-0__use_spec_info-0](/analysis_model_name-claude-3-5-sonnet-20240620__run_tests-0__use_lint_info-0__use_spec_info-0)|0 / 0 ; duration: 0.00s||

docs/api.md

Lines changed: 135 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,135 @@
1+
## Commit0
2+
3+
Commit0 provides several commands to facilitate the process of cloning, building, testing, and evaluating repositories. Here's an overview of the available commands:
4+
5+
### Setup
6+
7+
Use `commit0 setup [OPTIONS] REPO_SPLIT` to clone a repository split.
8+
Available options include:
9+
10+
| Argument | Type | Description | Default |
11+
|----------|------|-------------|---------|
12+
| `repo_split` | str | Split of repositories to clone | |
13+
| `--dataset-name` | str | Name of the Hugging Face dataset | `wentingzhao/commit0_combined` |
14+
| `--dataset-split` | str | Split of the Hugging Face dataset | `test` |
15+
| `--base-dir` | str | Base directory to clone repos to | `repos/` |
16+
| `--commit0-dot-file-path` | str | Path where stateful commit0 configs are stored | `.commit0.yaml` |
17+
18+
### Build
19+
20+
Use `commit0 build [OPTIONS]` to build the Commit0 split chosen in the Setup stage.
21+
Available options include:
22+
23+
| Argument | Type | Description | Default |
24+
|----------|------|-------------|---------|
25+
| `--num-workers` | int | Number of workers | `8` |
26+
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
27+
| `--verbose` | int | Verbosity level (1 or 2) | `1` |
28+
29+
### Get Tests
30+
31+
Use `commit0 get-tests REPO_NAME` to get tests for a Commit0 repository.
32+
33+
| Argument | Type | Description | Default |
34+
|----------|------|-------------|---------|
35+
| `repo_name` | str | Name of the repository to get tests for | |
36+
37+
### Test
38+
39+
Use `commit0 test [OPTIONS] REPO_OR_REPO_PATH [TEST_IDS]` to run tests on a Commit0 repository.
40+
Available options include:
41+
42+
| Argument | Type | Description | Default |
43+
|----------|------|-------------|---------|
44+
| `repo_or_repo_path` | str | Directory of the repository to test | |
45+
| `test_ids` | str | Test IDs to run | |
46+
| `--branch` | str | Branch to test | |
47+
| `--backend` | str | Backend to use for testing | `modal` |
48+
| `--timeout` | int | Timeout for tests in seconds | `1800` |
49+
| `--num-cpus` | int | Number of CPUs to use | `1` |
50+
| `--reference` | bool | Test the reference commit | `False` |
51+
| `--coverage` | bool | Get coverage information | `False` |
52+
| `--rebuild` | bool | Rebuild an image | `False` |
53+
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
54+
| `--verbose` | int | Verbosity level (1 or 2) | `1` |
55+
| `--stdin` | bool | Read test names from stdin | `False` |
56+
57+
### Evaluate
58+
59+
Use `commit0 evaluate [OPTIONS]` to evaluate the Commit0 split chosen in the Setup stage.
60+
Available options include:
61+
62+
| Argument | Type | Description | Default |
63+
|----------|------|-------------|---------|
64+
| `--branch` | str | Branch to evaluate | |
65+
| `--backend` | str | Backend to use for evaluation | `modal` |
66+
| `--timeout` | int | Timeout for evaluation in seconds | `1800` |
67+
| `--num-cpus` | int | Number of CPUs to use | `1` |
68+
| `--num-workers` | int | Number of workers to use | `8` |
69+
| `--reference` | bool | Evaluate the reference commit | `False` |
70+
| `--coverage` | bool | Get coverage information | `False` |
71+
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
72+
| `--rebuild` | bool | Rebuild images | `False` |
73+
74+
### Lint
75+
76+
Use `commit0 lint [OPTIONS] REPO_OR_REPO_DIR` to lint files in a repository.
77+
Available options include:
78+
79+
| Argument | Type | Description | Default |
80+
|----------|------|-------------|---------|
81+
| `repo_or_repo_dir` | str | Directory of the repository to lint | |
82+
| `--files` | List[Path] | Files to lint (optional) | |
83+
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
84+
| `--verbose` | int | Verbosity level (1 or 2) | `1` |
85+
86+
### Save
87+
88+
Use `commit0 save [OPTIONS] OWNER BRANCH` to save the Commit0 split to GitHub.
89+
Available options include:
90+
91+
| Argument | Type | Description | Default |
92+
|----------|------|-------------|---------|
93+
| `owner` | str | Owner of the repository | |
94+
| `branch` | str | Branch to save | |
95+
| `--github-token` | str | GitHub token for authentication | |
96+
| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` |
97+
98+
## Agent
99+
100+
### Config
101+
102+
Use `agent config [OPTIONS] AGENT_NAME` to set up the configuration for an agent.
103+
Available options include:
104+
105+
| Argument | Type | Description | Default |
106+
|----------|------|-------------|---------|
107+
| `agent_name` | str | Agent to use. Only [aider](https://aider.chat/) is supported for now. | `aider` |
108+
| `--model-name` | str | LLM to use. See the [aider documentation](https://aider.chat/docs/llms.html) for all supported models. | `claude-3-5-sonnet-20240620` |
109+
| `--use-user-prompt` | bool | Use a custom prompt instead of the default prompt. | `False` |
110+
| `--user-prompt` | str | The prompt sent to agent. | See code for details. |
111+
| `--run-tests` | bool | Run tests after code modifications for feedback. This requires setting up `docker` or `modal` first; refer to the commit0 docs. | `False` |
112+
| `--max-iteration` | int | Maximum number of agent iterations. | `3` |
113+
| `--use-repo-info` | bool | Include the repository information. | `False` |
114+
| `--max-repo-info-length` | int | Maximum length of the repository information to use. | `10000` |
115+
| `--use-unit-tests-info` | bool | Include the unit tests information. | `False` |
116+
| `--max-unit-tests-info-length` | int | Maximum length of the unit tests information to use. | `10000` |
117+
| `--use-spec-info` | bool | Include the spec information. | `False` |
118+
| `--max-spec-info-length` | int | Maximum length of the spec information to use. | `10000` |
119+
| `--use-lint-info` | bool | Include the lint information. | `False` |
120+
| `--max-lint-info-length` | int | Maximum length of the lint information to use. | `10000` |
121+
| `--pre-commit-config-path` | str | Path to the pre-commit config file. This is needed for running `lint`. | `.pre-commit-config.yaml` |
122+
| `--agent-config-file` | str | Path to write the agent config. | `.agent.yaml` |
123+
124+
### Running
125+
126+
Use `agent run [OPTIONS] BRANCH` to execute an agent on a specific branch.
127+
Available options include:
128+
129+
| Argument | Type | Description | Default |
130+
|----------|------|-------------|---------|
131+
| `branch` | str | Name of the branch to run the agent on | |
132+
| `--backend` | str | Test backend to run the agent on. Ignore this option unless the agent was configured with `--run-tests`. | `modal` |
133+
| `--log-dir` | str | Log directory to store the logs. | `logs/aider` |
134+
| `--max-parallel-repos` | int | Maximum number of repositories for the agent to run in parallel. Runs sequentially if set to 1. | `1` |
135+
| `--display-repo-progress-num` | int | Number of repository progress entries displayed while running. | `5` |

docs/arch.png

281 KB

docs/baseline.md

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
# Baseline
2+
3+
Commit0 contains a baseline system based on
4+
the [Aider](https://aider.chat/) code generation
5+
system.
6+
7+
...

docs/commit0.gif

22.3 MB
