GPTNeoX: Use static rotary embedding #1498

Merged
merged 2 commits into huggingface:main on Feb 1, 2024

Conversation

@dwyatte (Contributor) commented Jan 26, 2024

What does this PR do?

transformers 4.35 removed rotary embeddings from GPTNeoX's weights (see the line diff in huggingface/transformers@253f9a3). This applies the same fix as #793, which generates them on the fly using the appropriate values from the config file.

Fixes #1460
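
For context, a minimal sketch of what generating the rotary embeddings on the fly from the config can look like. Attribute names such as rotary_pct and rotary_emb_base follow the GPTNeoX config convention; this is an illustration under those assumptions, not the exact code in this PR:

import torch

def static_inv_freq(config, device="cpu"):
    # Illustrative sketch, not the exact TGI implementation: rebuild the rotary
    # inverse frequencies from config values instead of loading the
    # gpt_neox.layers.*.attention.rotary_emb.inv_freq weight from the checkpoint.
    head_dim = config.hidden_size // config.num_attention_heads
    rotary_dim = int(head_dim * config.rotary_pct)
    # Standard RoPE frequency schedule: base ** (-2i / rotary_dim)
    return 1.0 / (
        config.rotary_emb_base
        ** (torch.arange(0, rotary_dim, 2, dtype=torch.float32, device=device) / rotary_dim)
    )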

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@OlivierDehaene OR @Narsil

@drbh (Collaborator) commented Jan 29, 2024

Hi @dwyatte, thank you for the contribution. Unfortunately, this change appears to break the neox tests.

Tested locally with:

cd integration-tests/
pytest -s -vv models/test_flash_neox.py

Note: you may have to comment out the skip flags for some tests.

Would you be able to review these changes and ensure they pass the tests? Thank you!

@dwyatte (Contributor, Author) commented Jan 30, 2024

@drbh, thanks for the reply.

Is there a guide to developing with TGI's Docker image to manage all the dependencies when running the integration tests? I've gotten pretty far compiling everything myself, but it seems like a never-ending rabbit hole (I'm on to Flash Attention now, and even it recommends using TGI's image):

2024-01-30T00:04:51.061971Z  WARN text_generation_launcher: Could not import Flash Attention enabled models: Flash Attention is not installed.
Use the official Docker image (ghcr.io/huggingface/text-generation-inference:latest) or install flash attention with `cd server && make install install-flash-attention`

I can probably mount my code and the integration tests into the container, but I'm probably missing something easy (from reading through the source code, maybe setting the DOCKER_IMAGE env var).

Update: I was able to get a dev env set up, and some of these tests are broken on main as well. I suspect the snapshots weren't updated. Failing assertions:

# every sequence in the batch is expected to generate identical text
assert all(
    [text == generated_texts[0] for text in generated_texts]
), generated_texts
# generated responses must match the recorded snapshot
assert responses == response_snapshot

@dwyatte (Contributor, Author) commented Jan 30, 2024

OK @drbh, 871e5e7 fixes the actual rotary embedding logic (thanks for flagging), but I suspect there may still be failures in the local tests due to stale snapshots, since they fail on main too.

Wondering if we should update these as part of this PR.

@drbh (Collaborator) commented Jan 30, 2024

Awesome, thank you for resolving this, @dwyatte. I actually think those tests have been flaky for some time now, and neox is not a current priority, so we can continue to skip them for now. Overall the PR looks good and produces reasonable responses when tested locally, but I think we should get a quick second look before approving.

@Narsil, what do you think about approving when tests are flaky/skipped?

Thank you both!

@Narsil (Collaborator) commented Jan 31, 2024

Thanks a lot for this contribution.

We'll take the contribution as-is. It possibly breaks some things, but overall it is more aligned with other modeling code, so I'm OK to add it.
Neox non-flash is not tested, but this is neox flash.

@dwyatte (Contributor, Author) commented Jan 31, 2024

Thanks @Narsil. Happy to try contributing fixes for other breakage, including tests, as time allows, but I do need GPTNeoX flash support, so this was the minimal change, and it aligns with the static embedding use in other models, as you mention. Appreciate the approval!

@Narsil merged commit 13c62be into huggingface:main on Feb 1, 2024
kdamaszk pushed a commit to kdamaszk/tgi-gaudi that referenced this pull request on Apr 29, 2024.

Successfully merging this pull request may close these issues.

Problem loading model with GPTNeoX architecture (weight gpt_neox.layers.0.attention.rotary_emb.inv_freq does not exist)