[build] simplify makeqstrdata heuristic #4564

Merged 2 commits into adafruit:main on Apr 19, 2021
Conversation

tyomitch commented Apr 5, 2021

The simpler one saves ~150 more bytes per translation.

jepler previously approved these changes Apr 6, 2021

I'm not sure which board or translation you verified, but on trinket_m0 with de_DE this saved more like 28 bytes. However, anything is welcome.

tyomitch commented Apr 6, 2021

Hmm, indeed, parsing the CI logs shows that the effect varies a lot across boards and translations:

[graph: savings per board and translation]

Average savings per language:

[graph: average savings per language]


dhalbert commented Apr 6, 2021

Given the +/- spread, maybe delay merging this?

Could this be calculated per language both ways, and the best chosen? Or does it have to be uniform across languages? If this change makes a difference, maybe a different calculation would be even better for some languages.


jepler commented Apr 6, 2021

It would be nice to figure out a "right" heuristic. I have some doodles but they've never become mergeable.

The general idea is:

  • build an initial huffman table without dictionary entries
  • now we can count the original # of bits of any word candidate
  • and we can estimate the number of bits of the new dictionary symbol by finding where it would fall in the dictionary
  • of course we know the number of uses of the word

The true bit savings is uses * (original_bits - new_bits) - (dictionary symbol overhead)
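That estimate could be sketched as follows; the function name, parameters, and the use of information content (-log2 p) in place of a real Huffman table's code lengths are my own assumptions, not jepler's actual doodles:

```python
import math
from collections import Counter

def estimated_net_savings(word, uses, char_freq, total_chars,
                          new_symbol_bits, overhead_bits):
    """Estimate net bits saved by giving `word` its own dictionary symbol.

    Approximates each character's Huffman code length by its information
    content, -log2(p), which is close to the true length for a
    near-optimal code.
    """
    original_bits = sum(-math.log2(char_freq[c] / total_chars) for c in word)
    return uses * (original_bits - new_symbol_bits) - overhead_bits

# Toy corpus standing in for a translation's message strings.
corpus = "the quick brown fox jumps over the lazy dog " * 20
char_freq = Counter(corpus)
```

A frequent word clears the dictionary-symbol overhead easily, while a rare one does not; only candidates with positive estimated savings would be worth adding.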

The tree where I worked on this is jepler/translation-compression-improvements and specifically jepler@260d5ca


tyomitch commented Apr 6, 2021

To clarify, my original motivation was: why use a sophisticated formula with a log and a few magic constants when a near-trivial estimate performs no worse?

@jepler's work, which aims at improving the compression by increasing its sophistication, goes in a totally different direction.

The simpler one saves, on average, 51 more bytes per translation;
the biggest translation per board is reduced, on average, by 85 bytes.
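To illustrate the contrast (both scoring functions below are hypothetical stand-ins of my own devising, not the actual old and new makeqstrdata formulas):

```python
import math

def score_sophisticated(word, occurrences):
    # Hypothetical log-plus-magic-constants scoring, in the spirit of
    # the formula being replaced; NOT the real makeqstrdata heuristic.
    return occurrences * len(word) - 2 * math.log2(occurrences + 1) - 3

def score_simple(word, occurrences):
    # Near-trivial estimate: bytes saved if each occurrence collapses
    # to a single dictionary reference.
    return occurrences * (len(word) - 1)

candidates = [("error", 30), ("value", 12), ("x", 40)]
# On this toy input both scores pick the same best candidate.
best_simple = max(candidates, key=lambda wc: score_simple(*wc))
best_soph = max(candidates, key=lambda wc: score_sophisticated(*wc))
```

When the two scores rank candidates almost identically, the simpler one wins on readability alone.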

tyomitch commented Apr 9, 2021

Given the +/- spread, maybe delay merging this?

With a slightly different heuristic (but still simpler than the one currently used), the spread looks more convincing:

[graph: per-board savings with the revised heuristic]

As for the per-language stats, note that the limiting factor for a board is its biggest translation: it doesn't matter if other translations get smaller; you still cannot add new code unless it fits with the biggest translation. The new graph includes a bar for the average savings for the biggest translation per board:

[graph: average savings per language, plus a bar for the biggest translation per board]


jepler commented Apr 9, 2021

Can you share the scripts to generate the graphs?


dhalbert commented Apr 9, 2021

The complexity of the formula is not really important, because it contributes negligibly to build time, I would think. The important thing is to reduce the size of the largest language build.


tyomitch commented Apr 9, 2021

Can you share the scripts to generate the graphs?

import os, re

# Scan downloaded CI build logs in the current directory and emit
# board,translation,free-bytes as CSV.
for fn in os.listdir():
    if os.path.isfile(fn) and ("build-arm " in fn or "build-riscv " in fn):
        board = re.split('[()]', fn)[1]
        # these boards don't print raw code size in their build logs
        if board in ("spresense", "teensy40", "teensy41", "feather_m7_1011", "feather_mimxrt1011",
                     "feather_mimxrt1062", "imxrt1010_evk", "imxrt1020_evk", "imxrt1060_evk",
                     "metro_m7_1011"):
            continue
        with open(fn, "r") as f:
            head = "Build " + board + " for "
            lines = iter(f)
            for line in lines:
                if head in line:
                    tr = line.split(head)[1].split()[0]
                    assert "make: Entering directory" in next(lines)
                    assert "Use make V=1, make V=2" in next(lines)
                    # skip known noise lines until the size-summary line
                    while re.search(r"\{\}|QSTR updated|FREEZE|\{'sku':|hex\tfilename|boot2.elf|Including User C Module from|Font missing|section `.bss' type changed to PROGBITS", next(lines)):
                        pass
                    free = next(lines).split("bytes used, ")[1].split()[0]
                    print(board + "," + tr + "," + free)

This generates CSV that I open with Excel for graphing.

(The 10 singled-out boards, as well as xtensa ones, don't print out raw code size in their build logs.)
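The CSV this prints (board, translation, free bytes) can then be reduced to the limiting translation per board, i.e. the build with the least free space. A sketch (the helper name is mine):

```python
import csv
import io

def biggest_translation_per_board(csv_text):
    """Map each board to the (translation, free_bytes) pair with the
    least free space: the build that limits how much code can be added."""
    worst = {}
    for board, tr, free in csv.reader(io.StringIO(csv_text)):
        free = int(free)
        if board not in worst or free < worst[board][1]:
            worst[board] = (tr, free)
    return worst

# Hypothetical sample of the script's output.
sample = "trinket_m0,de_DE,804\ntrinket_m0,en_US,2100\nfeather_m0,fr,1500\n"
limits = biggest_translation_per_board(sample)
```

Running this on the CSVs from two CI runs and differencing per board would give the biggest-translation comparison discussed above.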


tyomitch commented Apr 9, 2021

The complexity of the formula is not really important, because it contributes negligibly to build time, I would think.

Certainly so; but it does contribute to the time it takes to read the code and understand how it works :)

tyomitch commented:

@jepler @dhalbert ping?

dhalbert commented:

@tyomitch - Hi, I was worried about some builds growing while others shrank, based on the graphs above, which might cause overflows for other languages. I wondered if there was a formula that showed reductions more uniformly.

tyomitch commented:

@tyomitch - Hi, I was worried about some builds growing while others shrank, based on the graphs above, which might cause overflows for other languages. I wondered if there was a formula that showed reductions more uniformly.

Sure; my point above was that if the biggest translation per board shrinks, it doesn't matter if other translations for the board grow.
For completeness, the graph below shows the effect on the biggest translation for each board:

[graph: change in size of the biggest translation per board]

dynalora_usb grew by 156 bytes, nine other boards grew by up to 52 bytes, and all other boards shrank.
The exceptional growth of dynalora_usb takes away <0.4% of the currently free space, so it's very far from overflowing. Of the tightly fitting M0 boards with <1KB free, all shrank.

The graph below shows the percentage of currently free space gained or lost:

[graph: savings as a percentage of currently free space, per board]

tyomitch added a commit to tyomitch/circuitpython that referenced this pull request Apr 17, 2021
Split out of adafruit#4564

No functional change.
tannewt left a comment

I'm fine merging this as-is. I trust it reduces the size of the largest languages.

tannewt merged commit e54e5e3 into adafruit:main Apr 19, 2021
dhalbert commented:

@tyomitch - sorry, I missed your reply to my last comment; I did not mean to let it languish.

tyomitch deleted the patch-1 branch April 28, 2021 07:42