Code and data for the ICSE 2025 paper "HumanEvo: An Evolving-aware Benchmark for More Realistic Evaluation of Repository-level Code Generation"

📝Overview

In this paper, we identify two common flaws in existing evaluation approaches for repository-level code generation:

  1. Future context leakage
  2. Useful context missing

(Figure: issues of future context leakage and useful context missing)

To provide LLMs with a more realistic evaluation scenario, we construct HumanEvo. The following figure shows the construction pipeline of HumanEvo.

(Figure: construction pipeline of HumanEvo)

🔥Source code

🐳Environment

  1. clone this repository
  2. run conda env create -f environment.yml to create a conda environment named HumanEvo
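
For example (a minimal sketch, assuming the repository is cloned over HTTPS and that conda is already installed):

git clone https://github.com/DeepSoftwareAnalytics/HumanEvo.git
cd HumanEvo
conda env create -f environment.yml   # creates a conda environment named HumanEvo
conda activate HumanEvo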

🔧To run the HumanEvo construction pipeline

  • HumanEvo_construct
    • collect
      • make_repo
        • call_make_repo.py : call make_repo.sh to create a mirror repo for a high-quality GitHub repository
        • make_repo.sh
      • build_dataset.py : build the initial dataset for validation
      • get_top_pypi.py : collect high-quality Python repositories
      • print_pulls.py : crawl pull requests from the target repository
      • run_build_dataset.sh
      • utils.py
    • get_version
      • extract_web
        • get_version_java.py : get version for each pull request (PR)
        • get_version_python.py : get version for each PR
    • validation
      • constans.py
      • context_manager.py
      • engine_validation.py
      • run_validation.sh

Take the construction pipeline of HumanEvo-Python as an example (a command sketch follows the steps):

  1. run get_top_pypi.py to collect high-quality repositories
  2. run call_make_repo.py to create mirror repos
  3. run print_pulls.py to crawl pull requests from the target repository
  4. run run_build_dataset.sh to process the initial PRs
  5. run get_version_python.py to get the version number for each crawled PR
  6. run run_validation.sh to invoke engine_validation.py to validate the crawled PRs
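
Putting the steps together, a rough command sketch looks like the following. The relative paths follow the file tree above; script arguments are omitted here and may be required, so check each script's source (or the provided .sh wrappers) before running.

cd HumanEvo_construct/collect
python get_top_pypi.py              # 1. collect high-quality PyPI repositories
python make_repo/call_make_repo.py  # 2. create mirror repos via make_repo.sh
python print_pulls.py               # 3. crawl pull requests from the target repository
bash run_build_dataset.sh           # 4. process the initial PRs
cd ../get_version/extract_web
python get_version_python.py        # 5. get the version number for each crawled PR
cd ../../validation
bash run_validation.sh              # 6. invoke engine_validation.py to validate the crawled PRs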

After completing all these steps, we can make sure the PRs we get are of high quality and covered by the project's test framework.

🎨To run evaluation

The command to run evaluation is in eval/run.sh: cd into eval and run bash run.sh. Please remember to fill in all the required paths before you run the source code.

Here is an example:

python run_eval.py \
    --instances_path "../HumanEvo/HumanEvo_Python.json" \
    --log_dir "./log" \
    --num_workers 1 \
    --path_conda "path/to/your/conda" \
    --testbed "./testbed" \
    --language "python" \
    --timeout 1800 \
    --verbose
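
Before launching a long run, it can help to confirm that the benchmark file is readable and count its top-level task instances (a minimal sketch; the path matches the example above, so adjust it to your checkout):

python -c "import json; print(len(json.load(open('../HumanEvo/HumanEvo_Python.json'))))"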

Before the program starts the evaluation process, it may take a while to clone all the target repositories and create runtime environments for every single task instance. Both steps may not complete in one round, so please be patient.
