Replies: 2 comments
-
Hi, it appears that models with the Focused Transformer (FoT) improve perplexity as context increases up to 64k tokens, is that right?
-
https://www.marktechpost.com/2023/07/10/meet-longllama-a-large-language-model-capable-of-handling-long-contexts-of-256k-tokens/?amp
https://arxiv.org/abs/2307.03170
https://github.com/CStanKonrad/long_llama
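
To check the claim above empirically, here is a minimal sketch of measuring perplexity at a few context lengths. It assumes the Hugging Face checkpoint `syzymon/long_llama_3b` and the standard `transformers`/`torch` APIs (the checkpoint name and the helper `perplexity` are illustrative, not from this thread):

```python
# Sketch: compare perplexity at different context lengths with LongLLaMA.
# Assumptions (not from this thread): checkpoint "syzymon/long_llama_3b",
# and that the custom modeling code accepts `labels` like a standard causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("syzymon/long_llama_3b")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_3b",
    torch_dtype=torch.float32,
    trust_remote_code=True,  # LongLLaMA ships custom modeling code
)
model.eval()

def perplexity(text: str, context_len: int) -> float:
    """Perplexity over the last `context_len` tokens of `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids[:, -context_len:]
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token-level cross-entropy
    return torch.exp(loss).item()

# Hypothetical usage on one long document `long_doc`:
# for n in (2048, 8192, 65536):
#     print(n, perplexity(long_doc, n))
```

If the FoT result holds, the printed perplexity should decrease as the context length grows toward 64k.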