TiC-LM

This software project accompanies the following research paper:

TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining, Li, J., Armandpour, M., Mirzadeh, I., Mehta, S., Shankar, V., Vemulapalli, R., Bengio, S., Tuzel, O., Farajtabar, M., Pouransari, H., and Faghri, F., arXiv preprint, 2025.

Overview

We provide a high-level overview of the structure and main components (i.e., dataset creation, training, evaluation) of this codebase:

Each of these folders contains its own instructions for setting up its environment and using its code. For more specific details, see the README in each folder.

Note on Data

This benchmark provides scripts for reproducing the training and evaluation data from publicly available sources. It does not redistribute any original data. All data accessed for the creation of these scripts was obtained prior to August 2024. The data sourced from Uncyclopedia, StackExchange, and code repositories is used solely for benchmark construction and evaluation, and is not used for training any models. Users are responsible for adhering to the terms of service and licensing agreements of the respective data sources. Please be aware that data from public sources can change over time, which may affect the reproducibility of this benchmark. We recommend reporting standard deviations across multiple training and evaluation runs. The scripts provided in this benchmark are released under the ASCL license.
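
As an illustration of the recommendation to report standard deviations, the sketch below summarizes one metric over repeated runs. It is not part of this codebase: the helper name `summarize_runs` and the example scores are hypothetical, and it assumes each run yields a single scalar metric.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of a metric over repeated runs.

    `scores` holds one scalar per independent training/evaluation run.
    This helper is hypothetical and not part of the TiC-LM codebase.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Made-up metric values from three runs, for illustration only.
    example_scores = [10.0, 12.0, 14.0]
    mean, std = summarize_runs(example_scores)
    print(f"{mean:.2f} +/- {std:.2f} over {len(example_scores)} runs")
```

`statistics.stdev` computes the sample standard deviation (dividing by n - 1), which is the usual choice when only a handful of runs is available.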

Acknowledgements

Our codebase is built using multiple open source contributions. Please see ACKNOWLEDGEMENTS for more details.

License

This software and accompanying data and models have been released under the following licenses:

Citation

If you find this repository useful or use this code in your research, please cite the following paper:

@article{li2025ticlm,
  title={TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining},
  author={Li, Jeffrey and Armandpour, Mohammadreza and Mirzadeh, Iman and Mehta, Sachin and Shankar, Vaishaal and Vemulapalli, Raviteja and Bengio, Samy and Tuzel, Oncel and Farajtabar, Mehrdad and Pouransari, Hadi and Faghri, Fartash},
  journal={arXiv preprint arXiv:2504.02107},
  year={2025}
}
