
Fixing frequency penalty #1811


Merged: 2 commits into huggingface:main on Apr 30, 2024

Conversation

@martinigoyanes (Contributor) commented Apr 25, 2024

Thank you so much for the work you are doing; this is my little contribution to this great thing you have built. I hope it is useful and helpful. Please don't hesitate to discuss any matters that are not clear!

I am basing my implementation of frequency penalty on OpenAI's implementation: https://platform.openai.com/docs/guides/text-generation/parameter-details

The problem I see with TGI's current implementation is that it does not take into account the frequency of tokens that have already been sampled in the current generation stream. Also, the scaling of the adjusted token logits is done differently for positive and negative logits, whereas in OpenAI's implementation the token frequency is taken into account and the scaling is always a subtraction (if the penalty is positive) or an addition (if the penalty is negative).

This leads to corrupt generations, as I mentioned in issue #1810. Moreover, after my tests, other issues are also gone, like the one where some requests with `frequency_penalty = 1.0` overrule other requests (with `frequency_penalty = 0.0`) in the same batch and thereby corrupt all generations in the batch. Basically, padding does not affect this implementation, so I believe the `score *= input_ids.ne(0)` masking is not needed anymore.

| Frequency penalty | -1.0 | 0.0 | 1.0 |
| -- | -- | -- | -- |
| Before my change | https://paste.mozilla.org/JxqGJkWY | https://paste.mozilla.org/hrztJ56h | https://paste.mozilla.org/pBSEH2zw |
| After my change | https://paste.mozilla.org/7gXCi7zo | https://paste.mozilla.org/ZR9rJ92g | https://paste.mozilla.org/gHaD2YnC |
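
For reference, here is a minimal PyTorch sketch of the OpenAI-style rule described above. This is an illustration only (the function name and signature are hypothetical, not the code in this PR):

```python
import torch

def apply_frequency_penalty(
    logits: torch.Tensor,        # [batch, vocab_size] next-token logits
    generated_ids: torch.Tensor, # [batch, num_generated] tokens sampled so far
    penalty: float,              # frequency_penalty; 0.0 disables it
) -> torch.Tensor:
    # Count how many times each vocabulary token has appeared among the
    # tokens generated so far (assumes generated_ids holds only real
    # sampled tokens, with no padding mixed in).
    counts = torch.zeros_like(logits)
    counts.scatter_add_(
        1, generated_ids, torch.ones_like(generated_ids, dtype=logits.dtype)
    )
    # OpenAI-style rule: always subtract count * penalty. A positive penalty
    # lowers the logits of repeated tokens; a negative penalty raises them.
    # There is no sign-dependent scaling, and tokens with count == 0 are
    # left untouched.
    return logits - counts * penalty
```

For example, with `generated_ids = torch.tensor([[3, 3, 5]])` and `penalty = 1.0`, token 3's logit drops by 2.0 and token 5's by 1.0, while all other logits stay unchanged.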

@drbh self-assigned this Apr 26, 2024
@martinigoyanes (Contributor, Author) commented
Hey, I noticed the links to the generation examples were broken, so I have updated them!

@martinigoyanes (Contributor, Author) commented Apr 30, 2024

Hey @drbh, I saw you created a new branch off of this one. Why is that?

Btw, I have rebased and fixed the styling! Could you guys maybe take a look and give me your feedback? @drbh @Narsil @OlivierDehaene

This is the commit with my changes: 949b889

Are you guys comfortable with merging this branch?

Thank you!

@Narsil (Collaborator) commented Apr 30, 2024

We create branches to run the CI because our secrets will not be available on forks (and we need them to run the integration tests, unfortunately).

The rebase created a lot of bogus commits; is there any way you could remove them? I'm happy to help with the rebase if you want.

Thanks a lot for the fix!

@Narsil force-pushed the fix-frequency-penalty branch from fa9713d to 21ec539 on April 30, 2024 at 10:12
@Narsil (Collaborator) commented Apr 30, 2024

I took the liberty of doing the rebase so your fix could be included in the upcoming release.

@Narsil merged commit 9192de5 into huggingface:main on Apr 30, 2024 (4 of 8 checks passed).
@martinigoyanes (Contributor, Author) commented
Thank you so much, @Narsil! Okay, now I understand why the fork happened. Keep up the great work you guys are doing with TGI!

(and sorry for the mess I created with my rebase :/ )

@Narsil (Collaborator) commented Apr 30, 2024

No worries :) Cheers

Nilabhra pushed a commit to TII-AI-Research-Center/text-generation-inference that referenced this pull request on May 14, 2024.
kdamaszk pushed a commit to kdamaszk/tgi-gaudi that referenced this pull request on Jun 10, 2024.