Update emoji regex #11584

mrsdizzie · 2020-05-23T14:47:25Z

Replace regex matching with string index matching against our existing list of known emoji. This improves performance and now correctly matches two emoji placed next to each other (previously the matcher couldn't always tell the difference between two separate emoji and a single emoji composed of two separate emoji).
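A minimal sketch of string index matching over a known-emoji list, in Go (Gitea's language). The function names `findEmojiIndex` and `splitEmoji` and the three-person family sequence used as test data are hypothetical illustrations, not Gitea's actual code. Each known emoji is located with `strings.Index`. The earliest start wins, and when two matches start at the same position the longer (combined) emoji is preferred, so a ZWJ sequence is not split into its parts.

```go
package main

import (
	"fmt"
	"strings"
)

// findEmojiIndex (hypothetical) returns the [start, end) byte offsets of the
// first emoji in s, or nil if none is found. Every entry in known must be
// non-empty. Earliest start wins; at equal starts the longer emoji wins.
func findEmojiIndex(s string, known []string) []int {
	var best []int
	for _, e := range known {
		i := strings.Index(s, e)
		if i < 0 {
			continue
		}
		if best == nil || i < best[0] || (i == best[0] && i+len(e) > best[1]) {
			best = []int{i, i + len(e)}
		}
	}
	return best
}

// splitEmoji (hypothetical) applies findEmojiIndex repeatedly and returns
// every emoji found in s, in order.
func splitEmoji(s string, known []string) []string {
	var out []string
	for {
		loc := findEmojiIndex(s, known)
		if loc == nil {
			return out
		}
		out = append(out, s[loc[0]:loc[1]])
		s = s[loc[1]:]
	}
}

func main() {
	man, woman, girl := "\U0001F468", "\U0001F469", "\U0001F467"
	family := man + "\u200D" + woman + "\u200D" + girl // one emoji built from three
	known := []string{man, woman, girl, family}

	fmt.Println(findEmojiIndex("hi "+family, known)) // [3 21]
	fmt.Println(len(splitEmoji(man+woman, known)))   // 2: two adjacent emoji
	fmt.Println(len(splitEmoji(family, known)))      // 1: one combined emoji
}
```

This scans the input once per known emoji, so the cost grows with the size of the dataset; treat it as an illustration of the matching rule rather than a tuned implementation.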

When matching emoji, use a regex built from the emoji data we already have instead of a generic one based on Unicode ranges. A generic regex can't tell the difference between two separate emoji next to each other and a single emoji that is built out of two separate emoji. As a result, emoji placed next to each other without a space in between will now each be wrapped in their own span with the proper title, etc.
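A sketch of a regex built from the emoji data rather than from Unicode ranges, assuming a plain list of emoji strings plus a hypothetical emoji-to-name map for the title. `buildEmojiRegexp` and `wrapEmoji` are illustrative names, and the `<span class="emoji" title="...">` markup is an assumption about the output format. Go's `regexp` uses leftmost-first matching, so among alternatives that match at the same position the earlier one wins; sorting longest-first makes a combined emoji beat the shorter emoji it begins with.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"sort"
	"strings"
)

// buildEmojiRegexp (hypothetical) compiles an alternation of every known
// emoji, longest first, so combined emoji are preferred by leftmost-first
// matching.
func buildEmojiRegexp(known []string) *regexp.Regexp {
	sorted := append([]string(nil), known...)
	sort.SliceStable(sorted, func(i, j int) bool { return len(sorted[i]) > len(sorted[j]) })
	quoted := make([]string, len(sorted))
	for i, e := range sorted {
		quoted[i] = regexp.QuoteMeta(e)
	}
	return regexp.MustCompile(strings.Join(quoted, "|"))
}

// wrapEmoji (hypothetical) wraps each matched emoji in its own span,
// using the name from names as an HTML-escaped title.
func wrapEmoji(re *regexp.Regexp, names map[string]string, s string) string {
	return re.ReplaceAllStringFunc(s, func(e string) string {
		return fmt.Sprintf(`<span class="emoji" title="%s">%s</span>`, html.EscapeString(names[e]), e)
	})
}

func main() {
	man := "\U0001F468"
	family := "\U0001F468\u200D\U0001F469\u200D\U0001F467"
	thumbsUp := "\U0001F44D"
	names := map[string]string{man: "man", family: "family", thumbsUp: "+1"}

	re := buildEmojiRegexp([]string{man, thumbsUp, family})
	fmt.Println(len(re.FindAllString(thumbsUp+thumbsUp, -1))) // 2
	fmt.Println(re.FindAllString(family, -1)[0] == family)    // true
	fmt.Println(wrapEmoji(re, names, "hi "+thumbsUp+thumbsUp))
}
```

Calling `re.Longest()` would be an alternative to sorting, since it switches the regexp to leftmost-longest matching.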

6543

Looks not bad, hope it does not consume too much memory... emoji in general eat memory 😅 though it should be worth it 🍏

Sort dataset from largest to smallest so that emoji made up of a combination of other emoji match first, before any one of their component emoji can match.
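Why the sort order matters can be shown with a "first hit wins" matcher. This is a sketch: `sortLongestFirst` and `firstPrefixMatch` are hypothetical names, and the dataset is reduced to two entries. Sorting by descending byte length puts a combined emoji ahead of the single emoji it starts with; `sort.SliceStable` keeps equal-length entries in their original order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortLongestFirst (hypothetical) orders emoji by descending byte length.
func sortLongestFirst(known []string) {
	sort.SliceStable(known, func(i, j int) bool { return len(known[i]) > len(known[j]) })
}

// firstPrefixMatch (hypothetical) returns the first entry of known that s
// starts with, so its result depends on the order of known.
func firstPrefixMatch(s string, known []string) string {
	for _, e := range known {
		if strings.HasPrefix(s, e) {
			return e
		}
	}
	return ""
}

func main() {
	man := "\U0001F468"
	family := man + "\u200D\U0001F469\u200D\U0001F467"
	known := []string{man, family}

	fmt.Println(firstPrefixMatch(family, known) == man) // true: unsorted, the lone "man" wins
	sortLongestFirst(known)
	fmt.Println(firstPrefixMatch(family, known) == family) // true: sorted, the full family wins
}
```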

lafriks · 2020-05-26T18:10:27Z

Looks good. Couldn't we instead find the : characters and do a map key lookup for the text between them?

mrsdizzie · 2020-05-26T18:20:02Z

@lafriks this one isn't for shortcodes; it's for the Unicode code points of literal emoji.

The shortcode matcher already does that: it finds the text between : characters and then does a map lookup.
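The shortcode approach described here (find the text between : characters, then look it up in a map) can be sketched as follows. `replaceShortcodes` and the two-entry `codes` map are hypothetical; Gitea's real shortcode table and function names may differ. When the text between two colons is not a known code, the closing colon is kept as a possible opener for the next code, so input such as "12:30 :smile:" still resolves.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceShortcodes (hypothetical) replaces :name: with codes[name] when the
// name is known and leaves all other text untouched.
func replaceShortcodes(s string, codes map[string]string) string {
	var b strings.Builder
	i := 0
	for {
		start := strings.IndexByte(s[i:], ':')
		if start < 0 {
			break
		}
		start += i
		end := strings.IndexByte(s[start+1:], ':')
		if end < 0 {
			break
		}
		end += start + 1
		if emoji, ok := codes[s[start+1:end]]; ok {
			b.WriteString(s[i:start])
			b.WriteString(emoji)
			i = end + 1 // skip past the closing colon
		} else {
			// Not a known code: keep the text, but let the closing
			// colon start the next candidate.
			b.WriteString(s[i:end])
			i = end
		}
	}
	b.WriteString(s[i:])
	return b.String()
}

func main() {
	codes := map[string]string{"smile": "\U0001F604", "+1": "\U0001F44D"}
	fmt.Println(replaceShortcodes("ship it :+1: at 12:30 :smile:", codes)) // ship it 👍 at 12:30 😄
}
```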

mrsdizzie · 2020-05-28T13:06:48Z

I think this should also be backported to 1.12, since it fixes code that hasn't been released yet.

zeripath · 2020-05-29T16:07:25Z

make lg-tm work

zeripath · 2020-05-29T16:07:54Z

When you're quite ready LGTM...

zeripath · 2020-05-29T16:09:43Z

@mrsdizzie I think you're right. I've marked this as kind/bug too.

zeripath · 2020-05-29T16:09:58Z

Please send backport

6543 approved these changes May 23, 2020

GiteaBot added the lgtm/need 1 This PR needs approval from one additional maintainer to be merged. label May 23, 2020

zeripath and others added 3 commits May 23, 2020 16:45

Merge branch 'master' into emoji-regex

695db61

Fix matching

8b55257

Merge branch 'master' into emoji-regex

dadae7f

jolheiser added the type/enhancement An improvement of existing functionality label May 24, 2020

jolheiser added this to the 1.13.0 milestone May 24, 2020

Use string methods instead of regex for less resource use

72503fc

lafriks approved these changes May 26, 2020

GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels May 26, 2020

Merge branch 'master' into emoji-regex

ea4b203

techknowlogick and others added 4 commits May 29, 2020 00:01

Merge branch 'master' into emoji-regex

950af60

Merge branch 'master' into emoji-regex

6c85f82

Merge branch 'master' into emoji-regex

ca478e7

Merge branch 'master' into emoji-regex

ebb218c

zeripath merged commit 4c1ff57 into go-gitea:master May 29, 2020

zeripath added the backport/v1.12 label May 29, 2020

zeripath added the type/bug label May 29, 2020

mrsdizzie mentioned this pull request May 29, 2020

Update emoji regex (#11584) #11679

Merged

zeripath added the backport/done All backports for this PR have been created label May 29, 2020

go-gitea locked and limited conversation to collaborators Nov 24, 2020
