📄 Here, we anonymously provide the data, automation scripts, prompt templates, and experimental results of RepoTransBench, a real-world benchmark for repository-level code translation introduced in our paper.
📦 Repository Dataset: Download the repository dataset from RepositoryDataset and use the command `tar -zxvf python_repos.tar.gz` to extract the dataset to the `./repos` directory.
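For example, a minimal extraction sequence (assuming the archive has been downloaded into the project root; adjust paths to your setup):

```bash
# Create the target directory and extract the dataset archive into it.
# The archive name matches the one linked above; the destination is an assumption
# based on the ./repos directory mentioned in this README.
mkdir -p ./repos
tar -zxvf python_repos.tar.gz -C ./repos
```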
🔬 Experimental Results: Download the experimental results from ExperimentalResults and use the command `tar -zxvf experiment_results.tar.gz` to extract the result files.
🔧 Research Questions: The results for each research question and the corresponding scripts are available in the `./RQ` directory.
Model | Success@1 | Success@2 | Success@3 | Build@1 | Build@2 | Build@3 | APR |
---|---|---|---|---|---|---|---|
Llama-3.1-8B-Inst | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
Llama-3.1-70B-Inst | 1.33% | 2.33% | 3.00% | 2.67% | 4.33% | 6.00% | 1.30% |
Llama-3.1-405B-Inst | 2.67% | 3.33% | 4.00% | 5.67% | 8.00% | 10.00% | 4.70% |
DeepSeek-V2.5 | 3.00% | 4.67% | 6.00% | 12.00% | 17.00% | 20.00% | 6.20% |
GPT-3.5-Turbo | 0.67% | 1.00% | 1.00% | 2.33% | 4.00% | 5.00% | 1.10% |
GPT-4 | 2.33% | 3.33% | 4.00% | 4.33% | 7.00% | 9.00% | 2.00% |
GPT-4o | 4.00% | 6.33% | 8.00% | 9.00% | 14.67% | 19.00% | 6.40% |
Claude-3.5-Sonnet | 7.33% | 10.33% | 12.00% | 28.33% | 37.67% | 42.00% | 16.50% |
CodeLlama-34B-Inst | 0.00% | 0.00% | 0.00% | 0.37% | 0.67% | 1.00% | 0.00% |
Codestral-22B | 2.08% | 3.33% | 5.00% | 5.90% | 8.33% | 12.00% | 2.60% |
DeepSeek-Coder-V2-Inst | 4.86% | 6.33% | 7.00% | 16.84% | 20.33% | 24.00% | 8.40% |
🛠️ Set-Up: Download the Docker image from Docker4RepoTransBench and load it to set up your Docker environment.
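A minimal loading sketch (the archive and image names below are assumptions; replace them with whatever Docker4RepoTransBench actually provides):

```bash
# Load the downloaded image archive into Docker.
# 'repotransbench_docker.tar' is an assumed file name -- use the file you downloaded.
docker load -i repotransbench_docker.tar

# List loaded images to find the image name/tag, then start an interactive container.
docker images
docker run -it <loaded_image_name> /bin/bash   # <loaded_image_name> is a placeholder
```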
The evaluation commands are as follows; we provide examples for GPT-4o:
```bash
# Translation and debugging
python main.py \
    --enable_translate \
    --model_name 'GPT-4o' \
    --enable_debug \
    --debug_mode 'filter'

# Translation only
python main.py \
    --enable_translate \
    --model_name 'GPT-4o'

# Debugging only
# (--history_time: history time of the translation results to debug)
python main.py \
    --model_name 'GPT-4o' \
    --enable_history '' \
    --history_time '' \
    --enable_debug \
    --debug_mode 'filter'
```
If you find this benchmark or dataset helpful, please cite us:
```bibtex
@article{wang2024repotransbench,
  title={RepoTransBench: A Real-World Benchmark for Repository-Level Code Translation},
  author={Wang, Yanli and Wang, Yanlin and Wang, Suiquan and Guo, Daya and Chen, Jiachi and Grundy, John and Liu, Xilin and Ma, Yuchi and Mao, Mingzhi and Zhang, Hongyu and others},
  journal={arXiv preprint arXiv:2412.17744},
  year={2024}
}
```