[build] simplify makeqstrdata heuristic #4564

Merged 2 commits into adafruit:main on Apr 19, 2021
Conversation

tyomitch commented Apr 5, 2021

The simpler one saves ~150 more bytes per translation.

jepler previously approved these changes Apr 6, 2021

I'm not sure which board or translation you verified, but on trinket_m0 with de_DE this saved more like 28 bytes. However, anything is welcome.

tyomitch commented Apr 6, 2021

Hmm, indeed, parsing the CI logs shows that the effect varies a lot across boards and translations:

[graph: savings per board and translation]

Average savings per language:

[graph: average savings per language]


dhalbert commented Apr 6, 2021

Given the +/- spread, maybe delay merging this?

Could this be calculated per language both ways, and the best chosen? Or does it have to be uniform across languages? If this change makes a difference, maybe a different calculation would be even better for some languages.


jepler commented Apr 6, 2021

It would be nice to figure out a "right" heuristic. I have some doodles but they've never become mergeable.

The general idea is:

  • build an initial huffman table without dictionary entries
  • now we can count the original # of bits of any word candidate
  • and we can estimate the number of bits of the new dictionary symbol by finding where it would fall in the dictionary
  • of course we know the number of uses of the word

The true bit savings is uses * (original_bits - new_bits) - (dictionary symbol overhead)
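That estimate could be sketched as follows; the function name, parameters, and the use of information content (-log2 p) in place of a real Huffman table's code lengths are my own assumptions, not jepler's actual doodles:

```python
import math
from collections import Counter

def estimated_net_savings(word, uses, char_freq, total_chars,
                          new_symbol_bits, overhead_bits):
    """Estimate net bits saved by giving `word` its own dictionary symbol.

    Approximates each character's Huffman code length by its information
    content, -log2(p), which is close to the true length for a
    near-optimal code.
    """
    original_bits = sum(-math.log2(char_freq[c] / total_chars) for c in word)
    return uses * (original_bits - new_symbol_bits) - overhead_bits

# Toy corpus standing in for a translation's message strings.
corpus = "the quick brown fox jumps over the lazy dog " * 20
char_freq = Counter(corpus)
```

A frequent word clears the dictionary-symbol overhead easily, while a rare one does not; only candidates with positive estimated savings would be worth adding.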

The tree where I worked on this is jepler/translation-compression-improvements and specifically jepler@260d5ca


tyomitch commented Apr 6, 2021

To clarify, my original motivation was: why use a sophisticated formula with a log and a few magic constants when a near-trivial estimate performs no worse?

@jepler's work, which aims at improving the compression by increasing its sophistication, goes in a totally different direction.

The simpler one saves, on average, 51 more bytes per translation;
the biggest translation per board is reduced, on average, by 85 bytes.
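To illustrate the contrast (both scoring functions below are hypothetical stand-ins of my own devising, not the actual old and new makeqstrdata formulas):

```python
import math

def score_sophisticated(word, occurrences):
    # Hypothetical log-plus-magic-constants scoring, in the spirit of
    # the formula being replaced; NOT the real makeqstrdata heuristic.
    return occurrences * len(word) - 2 * math.log2(occurrences + 1) - 3

def score_simple(word, occurrences):
    # Near-trivial estimate: bytes saved if each occurrence collapses
    # to a single dictionary reference.
    return occurrences * (len(word) - 1)

candidates = [("error", 30), ("value", 12), ("x", 40)]
# On this toy input both scores pick the same best candidate.
best_simple = max(candidates, key=lambda wc: score_simple(*wc))
best_soph = max(candidates, key=lambda wc: score_sophisticated(*wc))
```

When the two scores rank candidates almost identically, the simpler one wins on readability alone.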

tyomitch commented Apr 9, 2021

Given the +/- spread, maybe delay merging this?

With a slightly different heuristic (but still simpler than the one currently used), the spread looks more convincing:

[graph: per-board savings with the revised heuristic]

As for the per-language stats, note that the limiting factor for a board is its biggest translation: it doesn't matter if other translations get smaller; you still cannot add new code unless it fits with the biggest translation. The new graph includes a bar for the average savings for the biggest translation per board:

[graph: average savings per language, plus a bar for the biggest translation per board]


jepler commented Apr 9, 2021

Can you share the scripts to generate the graphs?


dhalbert commented Apr 9, 2021

The complexity of the formula is not really important, because it contributes negligibly to build time, I would think. The important thing is to reduce the size of the largest language build.


tyomitch commented Apr 9, 2021

Can you share the scripts to generate the graphs?

import os, re

# Scan downloaded CI build logs in the current directory and emit
# board,translation,free-bytes as CSV.
for fn in os.listdir():
    if os.path.isfile(fn) and ("build-arm " in fn or "build-riscv " in fn):
        board = re.split('[()]', fn)[1]
        # these boards don't print raw code size in their build logs
        if board in ("spresense", "teensy40", "teensy41", "feather_m7_1011", "feather_mimxrt1011",
                     "feather_mimxrt1062", "imxrt1010_evk", "imxrt1020_evk", "imxrt1060_evk",
                     "metro_m7_1011"):
            continue
        with open(fn, "r") as f:
            head = "Build " + board + " for "
            lines = iter(f)
            for line in lines:
                if head in line:
                    tr = line.split(head)[1].split()[0]
                    assert "make: Entering directory" in next(lines)
                    assert "Use make V=1, make V=2" in next(lines)
                    # skip known noise lines until the size-summary line
                    while re.search(r"\{\}|QSTR updated|FREEZE|\{'sku':|hex\tfilename|boot2.elf|Including User C Module from|Font missing|section `.bss' type changed to PROGBITS", next(lines)):
                        pass
                    free = next(lines).split("bytes used, ")[1].split()[0]
                    print(board + "," + tr + "," + free)

This generates CSV that I open with Excel for graphing.

(The 10 singled-out boards, as well as xtensa ones, don't print out raw code size in their build logs.)
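The CSV this prints (board, translation, free bytes) can then be reduced to the limiting translation per board, i.e. the build with the least free space. A sketch (the helper name is mine):

```python
import csv
import io

def biggest_translation_per_board(csv_text):
    """Map each board to the (translation, free_bytes) pair with the
    least free space: the build that limits how much code can be added."""
    worst = {}
    for board, tr, free in csv.reader(io.StringIO(csv_text)):
        free = int(free)
        if board not in worst or free < worst[board][1]:
            worst[board] = (tr, free)
    return worst

# Hypothetical sample of the script's output.
sample = "trinket_m0,de_DE,804\ntrinket_m0,en_US,2100\nfeather_m0,fr,1500\n"
limits = biggest_translation_per_board(sample)
```

Running this on the CSVs from two CI runs and differencing per board would give the biggest-translation comparison discussed above.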


tyomitch commented Apr 9, 2021

The complexity of the formula is not really important, because it contributes negligibly to build time, I would think.

Certainly so; but it does contribute to the time it takes to read the code and understand how it works :)

tyomitch commented:

@jepler @dhalbert ping?

dhalbert commented:

@tyomitch - Hi, I was worried about some builds growing while others shrank, based on the graphs above, which might cause overflows for other languages. I wondered if there was a formula that showed reductions more uniformly.

tyomitch commented:

@tyomitch - Hi, I was worried about some builds growing while others shrank, based on the graphs above, which might cause overflows for other languages. I wondered if there was a formula that showed reductions more uniformly.

Sure; my point above was that if the biggest translation per board shrinks, it doesn't matter if other translations for the board grow.
For completeness, the graph below shows the effect on the biggest translation for each board:

[graph: change in size of the biggest translation per board]

dynalora_usb grew by 156 bytes, nine other boards grew by up to 52 bytes, and all other boards shrank.
The exceptional growth of dynalora_usb takes away <0.4% of the currently free space, so it's very far from overflowing. Of the tightly fitting M0 boards with <1KB free, all shrank.

The graph below shows the percentage of currently free space gained or lost:

[graph: savings as a percentage of currently free space, per board]

tyomitch added a commit to tyomitch/circuitpython that referenced this pull request Apr 17, 2021
Split out of adafruit#4564

No functional change.
tannewt left a comment

I'm fine merging this as-is. I trust it reduces the size of the largest languages.

tannewt merged commit e54e5e3 into adafruit:main Apr 19, 2021
dhalbert commented:

@tyomitch - sorry, I missed your reply to my last comment; I did not mean to let it languish.

tyomitch deleted the patch-1 branch April 28, 2021 07:42