Skip to content

Update emoji regex #11584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 29, 2020
Merged

Update emoji regex #11584

merged 10 commits into from
May 29, 2020

Conversation

mrsdizzie
Copy link
Member

@mrsdizzie mrsdizzie commented May 23, 2020

Replace regex matching with string index matching from our existing list of known emoji. This makes it more performant and now accurately matches two emoji next to each other (previously didn't always know the difference between two emoji and one emoji made up of two separate emoji)

When matching emoji, use a regex built from the data we have instead of something generic using unicode ranges. A generic regex can't tell the difference between two separate emoji next to each other or one emoji that is built out of two separate emoji next to each other.

This means that emoji that are next to each other without space in between will be now accurately spanned individually with proper title etc...
Copy link
Member

@6543 6543 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks not bad, hope it does not comsume mouch mem ...
... emoji at all eat mem 😅 throu it should be worth it 🍏

@GiteaBot GiteaBot added the lgtm/need 1 This PR needs approval from one additional maintainer to be merged. label May 23, 2020
zeripath and others added 3 commits May 23, 2020 16:45
Sort dataset from largest to smallest so that emjois that are made up of a combination of other emojis will match first before matching just one of them
@jolheiser jolheiser added the type/enhancement An improvement of existing functionality label May 24, 2020
@jolheiser jolheiser added this to the 1.13.0 milestone May 24, 2020
@lafriks
Copy link
Member

lafriks commented May 26, 2020

Looks good. Can't we use finding : characters and doing map key lookup for text between them?

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels May 26, 2020
@mrsdizzie
Copy link
Member Author

@lafriks this one isn't for short codes -- it is for unicode points for literal emoji

The short code one already does do that, finds the text between : and then does a map lookup.

@mrsdizzie
Copy link
Member Author

I think this should be back port to 1.12 also since it fixes some code not released yet

@zeripath
Copy link
Contributor

make lg-tm work

@zeripath
Copy link
Contributor

When you're quite ready LGTM...

@zeripath zeripath merged commit 4c1ff57 into go-gitea:master May 29, 2020
@zeripath
Copy link
Contributor

@mrsdizzie I think you're right. I've marked this as kind/bug too.

@zeripath
Copy link
Contributor

Please send backport

mrsdizzie added a commit to mrsdizzie/gitea that referenced this pull request May 29, 2020
When matching emoji, use a regex built from the data we have instead of something generic using unicode ranges. A generic regex can't tell the difference between two separate emoji next to each other or one emoji that is built out of two separate emoji next to each other.

This means that emoji that are next to each other without space in between will be now accurately spanned individually with proper title etc...
@zeripath zeripath added the backport/done All backports for this PR have been created label May 29, 2020
zeripath pushed a commit that referenced this pull request May 29, 2020
When matching emoji, use a regex built from the data we have instead of something generic using unicode ranges. A generic regex can't tell the difference between two separate emoji next to each other or one emoji that is built out of two separate emoji next to each other.

This means that emoji that are next to each other without space in between will be now accurately spanned individually with proper title etc...
ydelafollye pushed a commit to ydelafollye/gitea that referenced this pull request Jul 31, 2020
When matching emoji, use a regex built from the data we have instead of something generic using unicode ranges. A generic regex can't tell the difference between two separate emoji next to each other or one emoji that is built out of two separate emoji next to each other.

This means that emoji that are next to each other without space in between will be now accurately spanned individually with proper title etc...
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport/done All backports for this PR have been created lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/bug type/enhancement An improvement of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants