Update emoji regex #11584
When matching emoji, use a regex built from the data we have instead of a generic one based on Unicode ranges. A generic regex can't tell the difference between two separate emoji next to each other and one emoji that is built out of two separate emoji. This means that emoji placed next to each other with no space in between are now each wrapped in their own span with the proper title, etc.
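To make the difference concrete, here is a minimal sketch in Go. It is not the code from this PR: `knownEmoji` is a tiny hypothetical stand-in for the project's emoji data, and `genericPattern` and `buildPattern` are illustrative names. It compares a broad Unicode-range pattern with an alternation built from the known list.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// knownEmoji is a tiny, hypothetical stand-in for the project's emoji data.
var knownEmoji = []string{"😅", "🍏", "👍"}

// genericPattern matches any run of code points in a broad emoji range,
// so it cannot tell where one emoji ends and the next begins.
var genericPattern = regexp.MustCompile(`[\x{1F300}-\x{1FAFF}]+`)

// buildPattern joins the known emoji into one alternation, escaping each
// entry so that it is matched literally.
func buildPattern(list []string) *regexp.Regexp {
	quoted := make([]string, len(list))
	for i, e := range list {
		quoted[i] = regexp.QuoteMeta(e)
	}
	return regexp.MustCompile(strings.Join(quoted, "|"))
}

func main() {
	text := "😅🍏" // two emoji with no space in between

	// The range-based pattern sees a single run.
	fmt.Println(len(genericPattern.FindAllString(text, -1))) // 1

	// The data-built pattern finds each emoji on its own.
	fmt.Println(len(buildPattern(knownEmoji).FindAllString(text, -1))) // 2
}
```

With separate matches, each emoji can be wrapped and titled individually, which is the behavior the description above is after.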
Looks not bad, hope it does not consume much mem ...
... emoji eat mem in general 😅 though it should be worth it 🍏
Sort dataset from largest to smallest so that emoji that are made up of a combination of other emoji will match first, before matching just one of their parts
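Why the ordering matters can be sketched as follows, assuming the pattern is a plain alternation compiled with Go's `regexp` package, which prefers earlier alternatives when several match at the same position. The four-entry dataset and the helper names `compile` and `sortLongestFirst` are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

const (
	man   = "\U0001F468"
	woman = "\U0001F469"
	girl  = "\U0001F467"
	// family is one emoji built from three others joined by U+200D (ZWJ).
	family = man + "\u200D" + woman + "\u200D" + girl
)

// compile builds a literal alternation in the order given.
func compile(list []string) *regexp.Regexp {
	quoted := make([]string, len(list))
	for i, e := range list {
		quoted[i] = regexp.QuoteMeta(e)
	}
	return regexp.MustCompile(strings.Join(quoted, "|"))
}

// sortLongestFirst returns a copy ordered by byte length, longest first,
// so combined emoji are tried before their individual parts.
func sortLongestFirst(list []string) []string {
	sorted := append([]string(nil), list...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return len(sorted[i]) > len(sorted[j])
	})
	return sorted
}

func main() {
	data := []string{man, woman, girl, family}

	// Unsorted: the parts win, so the family is split into three matches.
	fmt.Println(len(compile(data).FindAllString(family, -1))) // 3

	// Longest first: the combined emoji is matched as a single unit.
	fmt.Println(len(compile(sortLongestFirst(data)).FindAllString(family, -1))) // 1
}
```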
Looks good. Can't we use finding
@lafriks this one isn't for short codes; it is for the Unicode code points of literal emoji. The short code one already does that: it finds the text between : and then does a map lookup.
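For readers unfamiliar with the short code path mentioned here, a rough sketch of "find the text between : and do a map lookup" could look like this. It is an assumption about the general shape, not the project's code: `aliasToEmoji`, `shortCodePattern`, and `replaceShortCodes` are hypothetical names, and the two-entry table is made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// aliasToEmoji is a hypothetical two-entry alias table.
var aliasToEmoji = map[string]string{
	"green_apple": "🍏",
	"sweat_smile": "😅",
}

// shortCodePattern finds candidate short codes such as ":green_apple:".
var shortCodePattern = regexp.MustCompile(`:[A-Za-z0-9_+-]+:`)

// replaceShortCodes swaps known short codes for their emoji and leaves
// unknown ones untouched.
func replaceShortCodes(s string) string {
	return shortCodePattern.ReplaceAllStringFunc(s, func(m string) string {
		if e, ok := aliasToEmoji[strings.Trim(m, ":")]; ok {
			return e
		}
		return m
	})
}

func main() {
	fmt.Println(replaceShortCodes("nice :green_apple: :nope:")) // nice 🍏 :nope:
}
```

Because the delimiters make each short code unambiguous, this path does not face the adjacency problem that literal emoji do.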
I think this should be backported to 1.12 as well, since it fixes some code that has not been released yet
make lg-tm work
When you're quite ready LGTM...
@mrsdizzie I think you're right. I've marked this as kind/bug too.
Please send backport
Replace regex matching with string index matching against our existing list of known emoji. This is more performant and now accurately matches two emoji next to each other (previously the matcher didn't always know the difference between two emoji and one emoji made up of two separate emoji).
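One way to picture string index matching without a regex is the naive sketch below. It is not the implementation from this PR and makes no performance claim of its own: `sortedEmoji` is a hypothetical list assumed to be ordered longest first, and `findEmojiSpans` is an illustrative helper that returns byte offsets for each known emoji it finds.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sortedEmoji is a hypothetical list of known emoji, longest first.
var sortedEmoji = []string{
	"\U0001F468\u200D\U0001F469\u200D\U0001F467", // family (ZWJ sequence)
	"😅",
	"🍏",
}

// findEmojiSpans returns the [start, end) byte offsets of every known
// emoji in s, using plain prefix checks instead of a regex.
func findEmojiSpans(s string) [][2]int {
	var spans [][2]int
	for i := 0; i < len(s); {
		matched := false
		for _, e := range sortedEmoji {
			if strings.HasPrefix(s[i:], e) {
				spans = append(spans, [2]int{i, i + len(e)})
				i += len(e)
				matched = true
				break
			}
		}
		if !matched {
			// Not an emoji we know: step over one rune.
			_, size := utf8.DecodeRuneInString(s[i:])
			i += size
		}
	}
	return spans
}

func main() {
	fmt.Println(findEmojiSpans("hi 😅🍏")) // [[3 7] [7 11]]
}
```

Each span can then be sliced out of the input and wrapped on its own, so two adjacent emoji stay separate while a combined emoji is kept whole.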