Skip to content

DeepSoftwareAnalytics/RepoTransBench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

📂 RepoTransBench: A Real-World Benchmark for Repository-Level Code Translation

📄 Here, we anonymously provide the data, automation scripts, prompt templates, and experimental results of RepoTransBench.

In this paper, we introduce a real-world benchmark for repository-level code translation.

📦 Repository Dataset: Download the repository dataset from RepositoryDataset and use the command tar -zxvf python_repos.tar.gz to extract the dataset to the ./repos directory.

🔬 Experimental Results: Download the experimental results from ExperimentalResults and use the command tar -zxvf experiment_results.tar.gz to extract the result files.

🔧 Research Questions: The research questions results and corresponding scripts are available at the ./RQ directory.


Translation Level

Translation Level

Translation Performance

Model Success@1 Success@2 Success@3 Build@1 Build@2 Build@3 APR
Llama-3.1-8B-Inst 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Llama-3.1-70B-Inst 1.33% 2.33% 3.00% 2.67% 4.33% 6.00% 1.30%
Llama-3.1-405B-Inst 2.67% 3.33% 4.00% 5.67% 8.00% 10.00% 4.70%
DeepSeek-V2.5 3.00% 4.67% 6.00% 12.00% 17.00% 20.00% 6.20%
GPT-3.5-Turbo 0.67% 1.00% 1.00% 2.33% 4.00% 5.00% 1.10%
GPT-4 2.33% 3.33% 4.00% 4.33% 7.00% 9.00% 2.00%
GPT-4o 4.00% 6.33% 8.00% 9.00% 14.67% 19.00% 6.40%
Claude-3.5-Sonnet 7.33% 10.33% 12.00% 28.33% 37.67% 42.00% 16.50%
CodeLlama-34B-Inst 0.00% 0.00% 0.00% 0.37% 0.67% 1.00% 0.00%
Codestral-22B 2.08% 3.33% 5.00% 5.90% 8.33% 12.00% 2.60%
DeepSeek-Coder-V2-Inst 4.86% 6.33% 7.00% 16.84% 20.33% 24.00% 8.40%

Debugging Performance

Debug Results


⚠️ If you want to obtain the results from scratch, please follow these steps:

🛠️ Set-Up: Download the docker container from Docker4RepoTransBench and load it to construct your docker environment.

🚀 Evaluation

The evaluation command is as follows, we provide examples for GPT-4o:

# Translation and debugging
python main.py \
    --enable_translate \
    --model_name 'GPT-4o' \
    --enable_debug \
    --debug_mode 'filter'
# Translation only
python main.py \
    --enable_translate \
    --model_name 'GPT-4o'
# Debugging only
python main.py \
    --model_name 'GPT-4o' \
    --enable_history '' \ 
    --history_time '' \  # History time of translation results
    --enable_debug \
    --debug_mode 'filter'

Citation

If you find this benchmark or dataset helpful, please cite us:

@article{wang2024repotransbench,
  title={RepoTransBench: A Real-World Benchmark for Repository-Level Code Translation},
  author={Wang, Yanli and Wang, Yanlin and Wang, Suiquan and Guo, Daya and Chen, Jiachi and Grundy, John and Liu, Xilin and Ma, Yuchi and Mao, Mingzhi and Zhang, Hongyu and others},
  journal={arXiv preprint arXiv:2412.17744},
  year={2024}
}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages