
GPT-Neo

GPT-Neo is an implementation of model- and data-parallel GPT-2- and GPT-3-style models by EleutherAI, built on Mesh TensorFlow for distributed support and designed specifically for TPUs.

Causal Language Modelling

Causal language modelling is the task of predicting the token that follows a sequence of tokens. The model attends only to the left context (the tokens to the left of the position being predicted) (HuggingFace, n.d.), which makes it well suited to generation tasks.
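As an illustration, the snippet below generates a completion left-to-right with the Hugging Face `transformers` port of the smallest GPT-Neo checkpoint; the prompt and sampling settings are arbitrary examples, not project defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Hugging Face port of the 125M GPT-Neo checkpoint.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")

# Each new token is predicted from the tokens to its left only (causal attention).
output_ids = model.generate(
    **inputs,
    max_length=48,
    do_sample=True,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,  # GPT-Neo has no pad token; reuse EOS
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```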

Gradient Accumulation
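Gradient accumulation simulates a larger effective batch size on memory-limited hardware: the gradients of several micro-batches are summed before a single optimizer step is applied. A minimal PyTorch sketch follows; the toy model, data, and `accumulation_steps` value are placeholders, not the project's actual training configuration.

```python
import torch
import torch.nn as nn

# Toy setup: a linear classifier and 32 random micro-batches of size 2.
model = nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
micro_batches = [(torch.randn(2, 16), torch.randint(0, 4, (2,))) for _ in range(32)]

accumulation_steps = 8  # effective batch size = 2 * 8 = 16
optimizer.zero_grad()
for step, (inputs, labels) in enumerate(micro_batches):
    loss = loss_fn(model(inputs), labels) / accumulation_steps  # scale so summed grads average
    loss.backward()                                             # gradients add up in param.grad
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # one weight update per accumulated "large" batch
        optimizer.zero_grad()  # clear gradients for the next accumulation window
```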

GPT-Neo architecture
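GPT-Neo follows the GPT-2/GPT-3 decoder-only transformer design; its main architectural difference is that it alternates global (full) and local (windowed) self-attention layers. The hyperparameters of a released checkpoint can be inspected from its published config, as in the sketch below (shown for the 125M model; the printed values come from the config itself rather than being hard-coded assumptions).

```python
from transformers import AutoConfig

# Read the architecture hyperparameters of the 125M checkpoint from its config.
config = AutoConfig.from_pretrained("EleutherAI/gpt-neo-125M")
print(config.num_layers, config.hidden_size, config.num_heads)  # layers, hidden size, attention heads
print(config.attention_types)  # pattern of alternating "global" and "local" attention layers
print(config.window_size)      # window size used by the local-attention layers
```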

Model base

  • To view the model cards, please click the links provided in the Model column below.

| Model | Dataset Used | pass@1 | pass@2 | pass@5 | pass@10 |
| --- | --- | --- | --- | --- | --- |
| gpt-neo-125M | The Pile | 0.12% | 0.24% | 0.61% | 1.22% |
| gpt-neo-125M | APPS (Train) | 0.06% | 0.12% | 0.30% | 0.61% |
| gpt-neo-125M | APPS (Train + Test) | TBD... | | | |
| gpt-neo-1.3B | APPS (Train) | TBD... | | | |
| gpt-neo-1.3B | APPS (Train + Test) | Desc... | | | |
| gpt-neo-125M | Code Clippy Data | 0.00% | 0.00% | 0.00% | 0.00% |
| gpt-neo-125M | Code Clippy Data (Deduplicated) | 0.00% | 0.00% | 0.00% | 0.00% |
| gpt-neo-125M | Code Search Net Challenge (All) | 0.00% | 0.00% | 0.00% | 0.00% |
| gpt-neo-125M | Code Search Net Challenge (Python) | 0.00% | 0.00% | 0.00% | 0.00% |
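pass@k is the probability that at least one of k sampled completions for a problem passes its unit tests; it is commonly estimated with the unbiased estimator from the Codex paper (Chen et al., 2021). A minimal sketch of that estimator follows; the sample counts in the usage example are hypothetical.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k),
    computed as a numerically stable product (Chen et al., 2021).
    n = samples drawn per problem, c = samples that pass the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Hypothetical example: 200 samples per problem, 3 of which pass.
print(pass_at_k(n=200, c=3, k=1))   # ~0.015
print(pass_at_k(n=200, c=3, k=10))  # ~0.14
```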
