Commit 5bd6b74

Merge branch 'main' into aider

2 parents 69f5064 + 881d931

17 files changed: +511 −473 lines

.github/workflows/system.yml

Lines changed: 7 additions & 2 deletions

```diff
@@ -18,13 +18,18 @@ jobs:
         uses: docker/setup-buildx-action@v3
       - name: Install the project
         run: uv sync
-      - name: Clone
+      - name: Set up commit0
         run: uv run commit0 clone simpy
-      - name: Setup
+      - name: Build docker images
         run: uv run commit0 build simpy
       - name: Get tests
         run: uv run commit0 get-tests simpy
       - name: Test
         run: uv run commit0 test-reference simpy tests/test_event.py::test_succeed
       - name: Evaluate
         run: uv run commit0 evaluate-reference simpy
+      - name: Save
+        env:
+          GITHUB_TOKEN: ${{ secrets.MY_GITHUB_TOKEN }}
+        run: |
+          uv run commit0 save simpy test-save-commit0
```
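The workflow above exercises the commit0 CLI end to end. As a sketch only (the commands are taken verbatim from the workflow steps; actually executing them requires `uv`, `commit0`, and Docker to be installed), the same pipeline can be driven from Python:

```python
import subprocess

# The CI steps above as (step name, command) pairs, copied from the workflow.
PIPELINE = [
    ("Set up commit0", "uv run commit0 clone simpy"),
    ("Build docker images", "uv run commit0 build simpy"),
    ("Get tests", "uv run commit0 get-tests simpy"),
    ("Test", "uv run commit0 test-reference simpy tests/test_event.py::test_succeed"),
    ("Evaluate", "uv run commit0 evaluate-reference simpy"),
    ("Save", "uv run commit0 save simpy test-save-commit0"),
]

def run_pipeline(dry_run: bool = True) -> list:
    """Run each step in order; with dry_run=True, only collect the commands."""
    executed = []
    for name, cmd in PIPELINE:
        if not dry_run:
            subprocess.run(cmd.split(), check=True)  # stop on the first failing step
        executed.append(cmd)
    return executed
```

With `dry_run=False` this mirrors the CI run: each step must succeed before the next starts, matching the fail-fast behavior of workflow steps.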

README.md

Lines changed: 1 addition & 158 deletions

````diff
@@ -1,158 +1 @@
-# spec2repo
-
-We set up a task where, given a specification, the goal is to produce an implementation of that specification.
-Specifically, we are interested in converting library specifications to implementations (i.e., repositories).
-We lay out the steps to create a spec2repo example and perform an evaluation on the example using the SWE-bench framework.
-
-First, install the required packages:
-```
-pip install -r requirements.txt
-```
-
-Please provide the following information for the list of repositories in a YAML file:
-```
-repos.yml
-0: # just an index
-  name: [repository_name] # in the form of {organization_name}/{library_name}
-  commit: [commit_sha]
-  tag: [version_tag]
-  setup:
-    - [command_1]
-    - [command_2]
-    - ...
-```
-There are two options to specify the version of the library:
-you can either provide a specific commit or a specific tag. You cannot specify both at the same time.
-Finally, include the commands that set up the library from a local repository.
-For example, to create an example for ``msiemens/tinydb`` with version 4.8:
-```
-repos.yml
-0:
-  name: "msiemens/tinydb"
-  commit: null
-  tag: "v4.8.0"
-  setup:
-    - "python -m pip install --upgrade pip twine"
-    - "pip install poetry"
-    - "poetry install"
-```
-
-We are now ready to generate the dataset. Before that, add your GitHub token to the environment:
-```
-export GITHUB_TOKEN=[github_token]
-```
-Now run:
-```
-python create-data/build_dataset.py repos.json --hf_name wentingzhao/spec2repo
-```
-where ``repos.json`` is the file we specified above, and ``wentingzhao/spec2repo`` is where you want to upload the dataset on HF.
-This command produces the base commit (with function bodies removed), the gold patch that passes all unit tests, and all test function names.
-Note that this script will create a fork of the library. The fork will be created under the organization ``spec2repo``.
-You can change the organization to somewhere else, but if you want to create a fork under ``spec2repo``, please contact Wenting Zhao to be added to the organization.
-
-Now that the dataset has been generated, we move on to using SWE-bench to perform an evaluation.
-First, follow the instructions in the [Docker setup guide](https://docs.docker.com/engine/install/) to install Docker on your machine.
-If you're setting up on Linux, we recommend the [post-installation steps](https://docs.docker.com/engine/install/linux-postinstall/) as well.
-
-To install SWE-bench:
-```bash
-git clone https://github.com/princeton-nlp/SWE-bench.git
-cd SWE-bench
-pip install -e .
-```
-
-Now, let's add a configuration file to build a Docker environment for the library in a YAML file:
-```
-configs/specs.yml
-spec2repo/tinydb:
-  "1.0":
-    python: 3.11
-    install: "python -m pip install --upgrade pip twine; pip install poetry; poetry install"
-    test_cmd: "pytest"
-```
-To adapt this for your own library, leave ``1.0`` unchanged, specify the Python version with ``python``, how to locally build the library with ``install``, and how to run tests with ``test_cmd``.
-
-You also need to write your own function to process the test logs. Please add your function in ``configs/log_parsers.py``. The function should take in a log text file and return a dictionary that maps each test function to its test status, such as passed or failed. After that, update the global variable ``ADD_MAP_REPO_TO_PARSER``:
-```
-configs/log_parsers.py
-def parse_log_tinydb(log: str) -> dict[str, str]:
-    """
-    Parser for test logs generated with the TinyDB framework
-
-    Args:
-        log (str): log content
-    Returns:
-        dict: test case to test status mapping
-    """
-    test_status_map = {}
-    pattern = r"^(.*\/.*)::(.*)\s+\w+\s+\[\s*(\d+%)\]$"
-    for line in log.split("\n"):
-        line = line.strip()
-        m = re.match(pattern, line)
-        if m:
-            line = line.split()
-            test, value = line[:2]
-            if value == "PASSED":
-                test_status_map[test] = TestStatus.PASSED.value
-            else:
-                test_status_map[test] = TestStatus.FAILED.value
-    return test_status_map
-
-ADD_MAP_REPO_TO_PARSER = {
-    "spec2repo/tinydb": parse_log_tinydb,
-}
-```
-
-Finally, run the evaluation for the created example using the gold patch with the following script:
-```
-python run.py \
-    --dataset_name wentingzhao/spec2repo \
-    --split train \
-    --max_workers 2 \
-    --predictions_path 'gold' \
-    --instance_ids spec2repo__tinydb-01 \
-    --run_id validate-gold \
-    --spec_config configs/specs.yml
-```
-
-## Baseline
-### Baseline Input & Output
-
-A simple baseline evaluation can be described like this:
-```python
-def run_baseline(base_model, agent, prompt, context, target, error_history) -> test_results, error_message:
-    pass
-```
-
-**Input**
-
-`base_model`: base LLM, e.g. `gpt-4o`, `claude-3-5-sonnet-20240620`
-
-`agent`: agent, e.g. `aider`, `opendevin`, `None`
-
-`prompt`: the prompt/instruction given to `agent`/`base_model`
-
-`context`: there are 3 types of context
-- `context-type-1`: reference doc/pdf/website
-- `context-type-2`: unit tests that the target will be tested with
-- `context-type-3`: repo info
-  - skeleton of the repo (filenames under each dir)
-  - function stubs
-  - function names in each file (granularity needs to be specified)
-
-`target`: target function or file for the agent or base_model to complete
-`edit_history`: all edit histories; each entry contains the previous implementation, the updated implementation, and the corresponding error message
-
-**Output**
-
-`test_results`: WIP
-`error_message`: WIP
-
-## Baseline Evaluation & Ablation
-
-There are mainly 3 axes:
-- different `base_model`
-- different `agent`
-- different `context`
-
-The current priority is to run `gpt-4o` + `aider` with a certain `context` to get a first baseline result.
+# Commit0
````
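The log parser removed with the old README keys on pytest's progress lines. A self-contained variant of that idea (the regex is simplified, and plain status strings stand in for the repo's `TestStatus` enum — both are assumptions of this sketch):

```python
import re

# Match pytest lines such as "tests/test_db.py::test_insert PASSED [ 50%]"
PATTERN = r"^(.*\/.*)::(\S+)\s+(\w+)\s+\[\s*\d+%\]$"

def parse_pytest_log(log: str) -> dict:
    """Map each test id in a pytest log to 'PASSED' or 'FAILED'."""
    status = {}
    for line in log.splitlines():
        m = re.match(PATTERN, line.strip())
        if m:
            test = f"{m.group(1)}::{m.group(2)}"
            status[test] = "PASSED" if m.group(3) == "PASSED" else "FAILED"
    return status
```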

commit0/__main__.py

Lines changed: 22 additions & 3 deletions

```diff
@@ -3,6 +3,7 @@
 import commit0.harness.build
 import commit0.harness.setup
 import commit0.harness.evaluate
+import commit0.harness.save
 import copy
 import sys
 import os
@@ -20,7 +21,7 @@ def main() -> None:
     )
     # type check config values
     cs = ConfigStore.instance()
-    cs.store(name="user", node=Commit0Config)
+    cs.store(name="user", group="Commit0Config", node=Commit0Config)
     # have hydra to ignore all command-line arguments
     sys_argv = copy.deepcopy(sys.argv)
     sys.argv = [sys.argv[0]]
@@ -29,8 +30,14 @@ def main() -> None:
     # after hydra gets all configs, put command-line arguments back
     sys.argv = sys_argv
     # repo_split: split from command line has a higher priority than split in hydra
-    if command in ["clone", "build", "evaluate", "evaluate-reference"]:
-        if len(sys.argv) == 3:
+    if command in [
+        "clone",
+        "build",
+        "evaluate",
+        "evaluate-reference",
+        "save",
+    ]:
+        if len(sys.argv) >= 3:
             if sys.argv[2] not in SPLIT:
                 raise ValueError(
                     f"repo split must be from {', '.join(SPLIT.keys())}, but you provided {sys.argv[2]}"
@@ -52,6 +59,7 @@ def main() -> None:
             config.dataset_split,
             config.repo_split,
             config.num_workers,
+            config.backend,
         )
     elif command == "get-tests":
         repo = sys.argv[2]
@@ -85,6 +93,17 @@ def main() -> None:
             config.timeout,
             config.num_workers,
         )
+    elif command == "save":
+        organization = sys.argv[3]
+        commit0.harness.save.main(
+            config.dataset_name,
+            config.dataset_split,
+            config.repo_split,
+            config.base_dir,
+            organization,
+            config.branch,
+            config.github_token,
+        )


 if __name__ == "__main__":
```
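The dispatcher above temporarily hides the command-line arguments from Hydra and restores them afterwards for its own dispatch. A minimal standalone sketch of that save-and-restore pattern, with Hydra's parsing replaced by an arbitrary callable (`load_hydra_config` in the usage comment is hypothetical):

```python
import copy
import sys

def call_with_argv_hidden(fn):
    """Invoke fn() with sys.argv reduced to the program name, then restore argv."""
    saved_argv = copy.deepcopy(sys.argv)  # stash the real command-line arguments
    sys.argv = [sys.argv[0]]              # hide everything from fn (e.g. Hydra)
    try:
        return fn()
    finally:
        sys.argv = saved_argv             # put the arguments back for later dispatch

# usage:
# config = call_with_argv_hidden(load_hydra_config)
```

The `try`/`finally` guarantees argv is restored even if the parser raises, which the straight-line version in the diff does not.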

commit0/configs/base.yaml

Lines changed: 3 additions & 0 deletions

```diff
@@ -16,3 +16,6 @@ num_workers: 8
 backend: local
 branch: ai
 timeout: 1_800
+
+# save related
+github_token: null
```

commit0/configs/config_class.py

Lines changed: 4 additions & 0 deletions

```diff
@@ -1,4 +1,5 @@
 from dataclasses import dataclass
+from typing import Optional


 @dataclass
@@ -21,3 +22,6 @@ class Commit0Config:
     branch: str
     # timeout for running pytest
     timeout: int
+
+    # save related
+    github_token: Optional[str]
```
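`github_token` is typed `Optional[str]` so the `github_token: null` default in base.yaml round-trips as `None`. A trimmed sketch of this dataclass pattern, keeping only the fields visible in this hunk (the `= None` default is an addition for the sketch, not part of the diff):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Commit0ConfigSketch:
    # other config fields from the full class are omitted in this sketch
    branch: str
    # timeout for running pytest
    timeout: int
    # save related; stays None when the YAML supplies `null`
    github_token: Optional[str] = None
```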

commit0/configs/user.yaml

Lines changed: 2 additions & 0 deletions

```diff
@@ -1,3 +1,5 @@
 defaults:
   - base
   - _self_
+
+backend: local
```

commit0/harness/build.py

Lines changed: 9 additions & 4 deletions

```diff
@@ -4,9 +4,9 @@
 from datasets import load_dataset
 from typing import Iterator

+from commit0.harness.constants import RepoInstance, SPLIT
 from commit0.harness.docker_build import build_repo_images
 from commit0.harness.spec import make_spec
-from commit0.harness.constants import RepoInstance, SPLIT

 logging.basicConfig(
     level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
@@ -15,7 +15,11 @@


 def main(
-    dataset_name: str, dataset_split: str, repo_split: str, num_workers: int
+    dataset_name: str,
+    dataset_split: str,
+    repo_split: str,
+    num_workers: int,
+    backend: str,
 ) -> None:
     dataset: Iterator[RepoInstance] = load_dataset(dataset_name, split=dataset_split)  # type: ignore
     specs = []
@@ -26,8 +30,9 @@ def main(
         spec = make_spec(example)
         specs.append(spec)

-    client = docker.from_env()
-    build_repo_images(client, specs, num_workers)
+    if backend == "local":
+        client = docker.from_env()
+        build_repo_images(client, specs, num_workers)


 __all__ = []
```
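The new `backend` parameter turns the Docker build into a conditional: only the `local` backend touches Docker at all. A standalone sketch of the guard, with the Docker client and builder injected as callables so it runs without Docker (the function and parameter names are this sketch's, not the repo's):

```python
def maybe_build_images(specs, num_workers, backend, make_client, build_fn):
    """Build repo images only for the local backend; report whether a build ran.

    make_client stands in for docker.from_env, build_fn for build_repo_images.
    """
    if backend != "local":
        return False  # non-local backends skip the Docker build entirely
    client = make_client()
    build_fn(client, specs, num_workers)
    return True
```

Injecting the client factory keeps the guard testable: a non-local backend must never even construct a Docker client.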

commit0/harness/constants.py

Lines changed: 6 additions & 0 deletions

```diff
@@ -11,6 +11,11 @@ class RepoInstance(TypedDict):
     test: Dict[str, str]


+class Files(TypedDict):
+    eval_script: Dict[str, Path]
+    patch: Dict[str, Path]
+
+
 # Constants - Evaluation Log Directories
 BASE_IMAGE_BUILD_DIR = Path("logs/build_images/base")
 REPO_IMAGE_BUILD_DIR = Path("logs/build_images/repo")
@@ -34,6 +39,7 @@ class RepoInstance(TypedDict):
     "get-tests",
     "evaluate",
     "evaluate-reference",
+    "save",
 ]
 # repo splits
 SPLIT_MINITORCH = ["minitorch"]
```
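The new `Files` TypedDict groups per-repo paths for eval scripts and patches. A self-contained sketch of its shape (the concrete paths below are illustrative, not taken from the repo):

```python
from pathlib import Path
from typing import Dict, TypedDict

class Files(TypedDict):
    eval_script: Dict[str, Path]
    patch: Dict[str, Path]

# hypothetical contents for one repo
files: Files = {
    "eval_script": {"simpy": Path("logs/run_evaluation/simpy/eval.sh")},
    "patch": {"simpy": Path("logs/run_evaluation/simpy/patch.diff")},
}
```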

commit0/harness/docker_utils.py

Lines changed: 0 additions & 43 deletions

```diff
@@ -140,49 +140,6 @@ def delete_file_from_container(container: Container, file_path: str) -> None:
         raise Exception(f"General Error: {str(e)}")


-def copy_ssh_pubkey_from_container(container: Container) -> None:
-    """Copy the SSH public key from a Docker container to the local authorized_keys file.
-
-    Args:
-    ----
-        container (Container): Docker container to copy the key from.
-
-    Raises:
-    ------
-        docker.errors.APIError: If there is an error calling the Docker API.
-        Exception: If the file reading or writing process fails.
-
-    """
-    try:
-        exit_code, output = container.exec_run("cat /root/.ssh/id_rsa.pub")
-        if exit_code != 0:
-            raise Exception(f"Error reading file: {output.decode('utf-8').strip()}")
-        public_key = output.decode("utf-8").strip()
-
-        local_authorized_keys_path = os.path.expanduser("~/.ssh/authorized_keys")
-        os.makedirs(os.path.dirname(local_authorized_keys_path), exist_ok=True)
-        if not os.path.exists(local_authorized_keys_path):
-            # Since the file does not exist, create it
-            open(local_authorized_keys_path, "a").close()
-            write = True
-        else:
-            with open(local_authorized_keys_path, "r") as authorized_keys_file:
-                content = authorized_keys_file.read()
-            if public_key not in content:
-                write = True
-            else:
-                write = False
-
-        if write:
-            with open(local_authorized_keys_path, "a") as authorized_keys_file:
-                authorized_keys_file.write(public_key + "\n")
-
-    except docker.errors.APIError as e:
-        raise docker.errors.APIError(f"Docker API Error: {str(e)}")
-    except Exception as e:
-        raise Exception(f"General Error: {str(e)}")
-
-
 def write_to_container(container: Container, data: str, dst: Path) -> None:
     """Write a string to a file in a docker container"""
     # echo with heredoc to file
```
