
Commit 352d199

feature #53096 [Intl] [Emoji] Move emoji data in a new component (smnandre)
This PR was merged into the 7.1 branch.

Discussion
----------

[Intl] [Emoji] Move emoji data in a new component

| Q             | A
| ------------- | ---
| Branch?       | 7.1
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | yes
| Issues        |
| License       | MIT

This PR moves all the emoji data & code from the Intl component into its own new Emoji component.

Objectives/reasons:

* reduce the size of a standard "--webapp" install
* allow usage of Intl (required if your app uses these validators/form types: BIC, country, currency, language, locale, timezone) without downloading all the emoji data
* allow usage of Emoji without downloading all the Intl data

Thanks to all the reviewers for the feedback, opinions, and advice ❤️

---

---

Original (obsolete) post below

---

This PR moves all the emoji data & code from the Intl component into its own new IntlEmoji component...

... and hopefully opens a debate about the future of the Intl component, its role, and the way we handle "data" in the framework and the repositories.

> [!IMPORTANT]
> 🎙️ DISCLAIMER: This PR contains both metrics and opinions. The metrics were collected this morning in a neutral and transparent manner, so they can be [reproduced by anyone](https://gist.github.com/smnandre/38b081253fc8813a5e638661650b01f6). However, the opinions I present here are just that, opinions, and should not be interpreted as objective truths or claims of fact.

Update: details/summary blocks added to improve readability.

<details>
<summary>

## What is symfony/intl?

</summary>

Repository: https://github.com/symfony/intl
Documentation: https://symfony.com/doc/current/components/intl.html
Unicode: https://home.unicode.org/technical-quick-start-guide/

### Responsibilities

Currently, it seems to me this component:

* provides a polyfill-ish layer for the intl PHP extension
* provides exhaustive formatting + locale data for countries, currencies, dates, ...
* provides methods to check ISO codes or identifiers
* provides an emoji -> description dictionary for every possible combination

### Opinionated remarks

Two comments highlight the "blurred lines" I believe this component navigates:

#### 1) Access or data?

> This component provides **access** to the localization **data** of the ICU library.

Maybe my English is at fault, but it seems to me it does not provide access to the data... it provides the data.

#### 2) Unicode = CLDR + ICU + UTC

> This component provides access to the localization data of the **ICU library**.

CLDR (where the emoji data comes from) is not in the ICU library. I'm quibbling over details here, I know. But I think it illustrates the fuzzy "scope" and "responsibilities" of this component. Which brings us close to the problem...

</details>

<details>
<summary>

## symfony/intl is massive

</summary>

The data included in the Intl component is massive (especially the emoji descriptions), and it will keep growing every six months.

I looked at the following cases:

* source code
  * symfony/symfony: https://github.com/symfony/symfony/archive/refs/heads/7.0.zip
  * symfony/intl: https://github.com/symfony/intl/archive/refs/heads/7.0.zip
* installed
  * standalone: new folder + `composer require symfony/intl`
  * webapp: new folder + `symfony new --webapp`

Versions:

* 6.0, 6.1, 6.2, 6.3, 6.4, 7.0
* Emoji data was added in 6.2

### Some metrics...

|                 | Size, zip (MB) | Size, unzipped (MB) | Files | PHP files |
| --------------- | -------------- | ------------------- | ----- | --------- |
| symfony/symfony | 12.91          | 58.6                | 6006  | 4729      |
| symfony/intl    | 7.5            | 43.2                | 1517  | 1487      |
| intl / symfony  | 58.1%          | 73.8%               | 25.3% | 31.5%     |

So symfony/intl accounts for about a quarter of all the files in the monorepo (almost a third of its PHP files)... and nearly **75% of its total disk size**.
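
The last row of the table is each symfony/intl figure divided by the matching symfony/symfony figure. The small Python sketch below (Python is used only as a calculator; the variable names are illustrative) recomputes that row from the 7.0 values in the table. It also compares the intl sources with everything else in the monorepo. Recomputing from the rounded table values gives 73.7% and 31.4% for the unzipped-size and PHP-file columns, a tenth of a point below the table. The conclusion is unchanged.

```python
# 7.0 figures copied from the "Some metrics..." table (sizes in MB).
SYMFONY = {"zip_mb": 12.91, "unzipped_mb": 58.6, "files": 6006, "php_files": 4729}
INTL = {"zip_mb": 7.5, "unzipped_mb": 43.2, "files": 1517, "php_files": 1487}


def share(part: float, whole: float) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# One percentage per column: symfony/intl divided by symfony/symfony.
SHARES = {column: share(INTL[column], SYMFONY[column]) for column in SYMFONY}

# Everything in the monorepo that is not symfony/intl (other components, bridges, bundles).
REST_MB = SYMFONY["unzipped_mb"] - INTL["unzipped_mb"]
INTL_VS_REST = round(INTL["unzipped_mb"] / REST_MB, 1)

if __name__ == "__main__":
    for column, pct in SHARES.items():
        print(f"{column}: {pct}%")
    print(f"rest of the monorepo: {REST_MB:.1f} MB, symfony/intl is {INTL_VS_REST}x that")
```

The remaining 15.4 MB of sources is roughly 2.8 times smaller than symfony/intl alone.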
### ...over time

**Size (in MB) of the sources**

|         | 6.0  | 6.1  | 6.2  | 6.3  | 6.4  | 7.0  |
| ------- | ---- | ---- | ---- | ---- | ---- | ---- |
| intl    | 15.1 | 15.1 | 41.8 | 41.9 | 43.2 | 43.2 |
| symfony | 28.8 | 29.1 | 56.5 | 58.1 | 59.8 | 58.6 |

It was already big in previous versions, but since the emoji data was added (6.2), it's off the charts. symfony/intl alone is more than twice as big as all the other components, all the bridges, and all the bundles. Combined.

And it's not over. At all.

### Why it'll grow more

The ICU data used by the component is well-defined and constrained by 'real-world' factors, so we can expect only minor changes to countries, formatting data, etc. It's unlikely, for instance, that 200 new countries will suddenly emerge in 2024.

Emojis, however, may become a major challenge in the near future. New ones are added with every CLDR release. Apart from the occasional big batch (like the upcoming 2000 hieroglyphs), this should be a gradual increase.

What bothers me more is the 'combinatorial nature' of these descriptions. We generate a line of text for every combination, and that's why this component is so large. But this is just the beginning of what could be exponential growth.

As of today, the 'hand' emoji has variations for skin color (I'm not certain, but let's say there are 6 possible colors), and emojis with multiple people often vary by gender ('boy and two girls'). In the upcoming release, a new variable is the concept of 'left-handed' versus 'right-handed'. So we'll need a new line for every existing emoji with a visible hand. For every emoji where two hands are visible, we'll need far more than one.

I don't remember if it's already implemented, but there has been discussion about doing the same for a person's age, or for some hairstyles.

So the symfony/intl component could very soon be 50GB, and a short while later 10^80TB. There's no way it will shrink... or even grow more slowly.
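
The counting argument above can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that each person shown in an emoji independently takes one value per attribute. It uses the 6 skin colors guessed above and a hypothetical 2-valued handedness attribute. Real Unicode/CLDR sequences do not encode every combination, so these are not actual CLDR counts; the function name is made up.

```python
from math import prod

# Illustrative numbers only (assumptions, not CLDR data):
SKIN_TONES = 6   # "let's say there are 6 possible colors"
HANDEDNESS = 2   # hypothetical new attribute: left-handed / right-handed


def description_lines(values_per_attribute: list[int], people: int) -> int:
    """Number of description lines if every person in the emoji independently
    takes one value for each attribute (one line per combination)."""
    combos_per_person = prod(values_per_attribute)
    return combos_per_person ** people


if __name__ == "__main__":
    for people in (1, 2, 3):
        before = description_lines([SKIN_TONES], people)
        after = description_lines([SKIN_TONES, HANDEDNESS], people)
        print(f"{people} person(s): {before} -> {after} lines (x{after // before})")
```

Under these assumptions, adding one two-valued attribute takes a single person from 6 to 12 lines. Two people go from 36 to 144 lines, and three people from 216 to 1728. The multiplier is 2^n for an emoji showing n people, which is where the exponential growth comes from.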
</details>

<details>
<summary>

## And... where is the problem?

</summary>

I see negative effects at three very different levels.

### Developer experience

Whether or not these values are low in absolute terms (and I have no doubt that everyone will have their own opinion on this), the reality is that users are downloading a component that is more than twice as heavy as all the others combined. This inevitably affects installation times, bandwidth, update times, static analysis, IDE indexing, etc.

A prime example is Docker on macOS, which was a real pain until OrbStack arrived recently. That performance nightmare was directly related to the number of files mounted in a volume.

### Contributor experience

I've lost count of how many times I've seen a contributor propose a feature only to be told: it's userland. (Full disclosure: I understand and share this point of view.) But it can be frustrating to see the door closed to a few classes while Symfony itself contains hundreds of lines like 'young woman with dark hair and kid'.

### Real-world consequences: ecological & financial costs

I have no desire to open a debate (on either of those topics). But again, these small things have real-world consequences. We are talking about Symfony, so the impact is enormous, even for small matters.

### What is the real impact?

Download data, as provided by Packagist (collected today):

| Package | Installs (total) | Installs (last 30 days) |
|---------|------------------|-------------------------|
| symfony/intl | 111,000,000 | 2,800,000 |
| [symfony/http-foundation](https://packagist.org/packages/symfony/http-foundation) | 528,000,000 | 11,250,000 |
| symfony/console | 678,000,000 | 12,900,000 |
| symfony/translation | 508,000,000 | 10,400,000 |

Let's at least agree that "it's not without impact".

</details>

<details>
<summary>

## Why is it used?

</summary>

### For its quality

Please don't misinterpret my message.
I'm not criticizing the value of the component or questioning its qualities. Besides, my opinion wouldn't carry any weight on that matter anyway. And I'm absolutely convinced that a lot of people install this component knowing exactly what they're doing.

### For another reason

But there are also people who install... Symfony. The recommended installation procedure, as outlined in the documentation on the website, is to install the web application skeleton, which requires symfony/intl.

To revisit the argument from earlier, I'm really not sure anyone realizes after installation why their vendor directory is 80MB and what it's used for (young woman with...). I'm unsure why symfony/intl is included by default in a new project while other components are not. As a developer, I would appreciate being able to install a small, lightweight application, or to get more packages for the same amount of overhead :)

### For another reason (bis)

There is a third reason, and once again, I may not fully understand the situation (or have all the backstory behind it).

The Country validator requires symfony/intl to validate a given string as a valid ISO alpha country code. To do this, it retrieves the list of country names (indexed by code) from the locale data. Consequently, it's not possible to use the BIC, Country, Currency, and probably other constraints without symfony/intl.

So, if a developer installs symfony/validator and then wants to validate a BIC, they cannot do so without downloading 80MB of locale-specific data.

Wouldn't it be simpler to have a couple of ISO classes/methods in the Validator component? Or perhaps to create a small component just for that purpose? Having to parse giant files just to check whether "FR" is a valid ISO country code seems quite inefficient to me.

</details>

## Well: Suggestions

So, a personal conclusion and some suggestions...
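
The "couple of ISO classes/methods" idea from the validator discussion can be illustrated simply: checking an ISO 3166-1 alpha-2 code only needs a static set of codes, not per-locale files of country names. The Python sketch below is hypothetical. The function names are invented, and the code set is a deliberately truncated subset (a real one would embed the full list). Matching is case-sensitive, which is an arbitrary choice. The BIC check assumes the classic 8- or 11-character layout and only verifies that structure plus the country code in positions 5-6. It is not a substitute for Symfony's Bic constraint.

```python
import re

# Hypothetical, deliberately truncated subset of ISO 3166-1 alpha-2 codes.
# A real implementation would embed the complete list (roughly 250 codes).
ALPHA2_CODES = frozenset({"BE", "CA", "CH", "DE", "ES", "FR", "GB", "IT", "JP", "US"})

# Classic BIC layout (assumption): 4-letter bank code, 2-letter country code,
# 2-character location code, optional 3-character branch code.
BIC_PATTERN = re.compile(r"[A-Z]{4}[A-Z]{2}[A-Z0-9]{2}(?:[A-Z0-9]{3})?")


def is_valid_alpha2(code: str) -> bool:
    """Set membership check: no locale data and no country names needed."""
    return code in ALPHA2_CODES


def is_plausible_bic(bic: str) -> bool:
    """Check the BIC structure, then reuse the ISO check on its country part."""
    if BIC_PATTERN.fullmatch(bic) is None:
        return False
    return is_valid_alpha2(bic[4:6])


if __name__ == "__main__":
    for value in ("FR", "fr", "XX"):
        print(value, is_valid_alpha2(value))
    for value in ("DEUTDEFF", "DEUTDEFF500", "DEUTXXFF", "DEUT"):
        print(value, is_plausible_bic(value))
```

The lookup is a constant-time set membership test, so validating "FR" does not require loading any locale-specific data.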
### TODO list

Sooner

* move "emoji" out of Symfony\Intl
* remove "intl" from symfony/webapp
* fix the validator requirements

Later

* move "intl" out of the monorepo
* create a distinct class/component to handle ISO lists

### Discussions

- [x] Handle the BC layer (require symfony/intl-emoji in symfony/intl?)
- [x] Find a way to handle the exponential growth of those files

--

Open to any feedback :)

Commits
-------

f5ba7e31fa Move & adapt "emoji code" from Intl into its own component
2 parents 914f94c + 4a8d00e commit 352d199


166 files changed, +28 / -490892 lines changed


CHANGELOG.md

Lines changed: 7 additions & 1 deletion
@@ -1,10 +1,16 @@
 CHANGELOG
 =========

+7.1
+---
+
+ * Move all emoji code & data to a new `symfony/emoji` component
+ * Deprecate `EmojiTransliterator` in favor of `Symfony\Component\Emoji\EmojiTransliterator`
+
 6.4
 ---

- * Add support for ISO-3166-1 numeric codes with `Countries::getNumericCode()`, `Countries::getNumericCodes()`,
+ * Add support for ISO-3166-1 numeric codes with `Countries::getNumericCode()`, `Countries::getNumericCodes()`,
   `Countries::numericCodeExists()` and `Countries::getAlpha2FromNumeric()`

 6.3

README.md

Lines changed: 4 additions & 0 deletions
@@ -3,6 +3,10 @@ Intl Component

 The Intl component provides access to the localization data of the ICU library.

+If you have the zlib extension enabled, you can compress the data by running:
+
+php vendor/symfony/intl/Resources/bin/compress
+
 Resources
 ---------

Resources/data/transliterator/emoji/emoji-af.php

Lines changed: 0 additions & 3663 deletions
This file was deleted.

Resources/data/transliterator/emoji/emoji-am.php

Lines changed: 0 additions & 3663 deletions
This file was deleted.

Resources/data/transliterator/emoji/emoji-ar.php

Lines changed: 0 additions & 3663 deletions
This file was deleted.

Resources/data/transliterator/emoji/emoji-ar_sa.php

Lines changed: 0 additions & 3663 deletions
This file was deleted.

Resources/data/transliterator/emoji/emoji-as.php

Lines changed: 0 additions & 3663 deletions
This file was deleted.

Resources/data/transliterator/emoji/emoji-ast.php

Lines changed: 0 additions & 274 deletions
This file was deleted.
