Code and data for the ICSE 2025 paper "HumanEvo: An Evolving-aware Benchmark for More Realistic Evaluation of Repository-level Code Generation"
In this paper, we identify two common flaws in existing evaluation approaches for repository-level code generation.
To provide LLMs with a more realistic evaluation scenario, we construct HumanEvo. The following is the construction pipeline of HumanEvo.
- Clone this repository.
- Run `conda env create -f environment.yml` to create a conda environment named `HumanEvo`.
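For example, a minimal sketch of the setup, assuming `conda` is on your `PATH` and the environment name declared in `environment.yml` is `HumanEvo`:

```bash
# Create the conda environment defined in environment.yml
conda env create -f environment.yml
# Activate it by the name declared in environment.yml
conda activate HumanEvo
```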
- HumanEvo_construct
  - collect
    - make_repo
      - call_make_repo.py : call make_repo.sh to make a mirror repo for a high-quality GitHub repository
      - make_repo.sh
    - build_dataset.py : build the initial dataset for validation
    - get_top_pypi.py : get high-quality Python repositories
    - print_pulls.py : crawl pull requests from the target repository
    - run_build_dataset.sh
    - utils.py
  - get_version
    - extract_web
    - get_version_java.py : get the version for each pull request (PR)
    - get_version_python.py : get the version for each PR
  - validation
    - constans.py
    - context_manager.py
    - engine_validation.py
    - run_validation.sh
Take the construction pipeline of HumanEvo-Python as an example (a hedged end-to-end sketch follows this list):
- Run `get_top_pypi.py` to collect high-quality repositories.
- Run `call_make_repo.py` to make mirror repos.
- Run `print_pulls.py` to crawl pull requests from the target repository.
- Run `run_build_dataset.sh` to handle the initial PRs.
- Run `get_version_python.py` to get a version number for each crawled PR.
- Run `run_validation.sh` to invoke `engine_validation.py` and validate the crawled PRs.
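The sketch below chains these steps as a shell script. The directory layout follows the tree above, but the scripts' exact command-line arguments (repository names, output paths) are omitted here and are assumptions for illustration; consult `run_build_dataset.sh` and `run_validation.sh` for the invocations actually used in this repository.

```bash
#!/usr/bin/env bash
# Hedged sketch of the HumanEvo-Python construction pipeline.
# Arguments are intentionally omitted; check each script's interface
# (or the provided shell wrappers) before running.
set -e

cd HumanEvo_construct/collect
python get_top_pypi.py               # 1. collect high-quality PyPI repositories
python make_repo/call_make_repo.py   # 2. make mirror repos via make_repo.sh
python print_pulls.py                # 3. crawl pull requests from the target repository
bash run_build_dataset.sh            # 4. build the initial dataset from the crawled PRs

cd ../get_version
python get_version_python.py         # 5. resolve a version number for each crawled PR

cd ../validation
bash run_validation.sh               # 6. invoke engine_validation.py to validate the PRs
```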
After doing all this, we can make sure that each PR we get is of high quality and is covered by the project's test framework.
The command to run the evaluation is in `eval/run.sh`: `cd eval` and run `bash run.sh`. Please remember to fill in all the needed paths before you run the source code.
Here is an example:
```bash
python run_eval.py \
    --instances_path "../HumanEvo/HumanEvo_Python.json" \
    --log_dir "./log" \
    --num_workers 1 \
    --path_conda "path/to/your/conda" \
    --testbed "./testbed" \
    --language "python" \
    --timeout 1800 \
    --verbose
```
Before the evaluation itself starts, it may take a while to clone all the target repositories and create a runtime environment for every single task instance; both steps may not complete in one round, so please be patient.
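If setup is interrupted, the natural recovery is simply to re-run the same command. A minimal sketch, under the assumption (not verified against `run_eval.py`) that existing clones under `./testbed` and already-created environments are detected and reused:

```bash
# Re-run the evaluation after an interrupted setup round.
# Assumption: run_eval.py reuses existing clones in ./testbed and
# previously created environments rather than recreating them.
cd eval
bash run.sh
```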