Skip to content

Reduce .text footprint of the network stack #6293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 5, 2018
Merged

Reduce .text footprint of the network stack #6293

merged 1 commit into from
Sep 5, 2018

Conversation

Taiki-San
Copy link
Contributor

@Taiki-San Taiki-San commented Mar 7, 2018

Description

Patches to LWIP and some code from netsocket was using s[n]printf and sscanf to generate and parse strings based on integers. Specifically, converting MAC addresses and IPv6 addresses to and from their hexadecimal form and IPv4 addresses to and from their printable decimal form.

The use of those functions in public APIs forced the compiler to bring the full parser necessary to make *printf/sscanf work. This parser is massive (in order to handle floating point for instance) and several of its components were in the largest functions in the final binary independently, despite the monolithics tcp_input and tcp_output.

The small test program I was using had the following memory breakdown before linking to the network stack when compiling with GCC and a release profile:

Text 38.2KB Data 1.23KB BSS 9.21KB
ROM 39.5KB RAM 10.4KB

The same program using the default network stack (K64F & Ethernet):

Text 114KB Data 1.76KB BSS 54.4KB
ROM 116KB RAM 56.1KB

Then the same program using the network stack included in the PR, not using *printf/sscanf and thus not linking with the expensive parser:

Text 87.3KB Data 1.4KB BSS 54.3KB
ROM 88.7KB RAM 55.7KB

Although the RAM change is negligeable (but still down), the .text impact is massive (-23%).

No network Default network Patched network *printf parser only
ROM change 1x 3.03x 2.24x 0.71x
Overhead 0kB 76.5kB 49.2kB 27.3kB

I couldn't find where to put the small utils I had to write (a basic itoa and two bin2hex) so they are inline and bin2hex had to be duplicated.

Pull request type

  • Fix
  • Refactor
  • New target
  • Feature
  • Breaking change

@Taiki-San
Copy link
Contributor Author

Not sure where the CI failure come from. Can I have more details on what do I need to fix?

@0xc0170
Copy link
Contributor

0xc0170 commented Mar 19, 2018

0xc0170 requested review from kjbracey-arm and mikaleppanen

Can you please review?

@kjbracey
Copy link
Contributor

I'm not a total fan of this - for most real-world code, including anything with tracing, printf is already a sunk cost - if everyone starts having their own local versions, it's not a net win for anything other than the networking equivalent of "blinky"

scanf on the other hand is less often used, so maybe that half is worth taking.

I would be interested in which libraries are pulling in FP code unnecessarily - some at least should be making the effort to spot that no format strings include FP and thus exclude it.

@Taiki-San
Copy link
Contributor Author

I agree that having each occurrence of the code having its own custom parser isn't really necessary. Maybe those basics utils (itoa/bin2hex/hex2bin) would make sense in mbedOS' stdlib? As for their use there, I'd argue it's a fairly straightforward parser not warranting much of sprintf overall complexity (even ignoring the FP code).

You're also right that avoiding printf/scanf based functions would be a complex endeavour for real world applications, but maybe forcing the cost on all users might be an overkill. If string generation utils are necessary, the application may be using its own basic toolkit not to have to pay the printf penalty after going through the same debugging as I did.

As for reducing said penalty, I'm not sure this is feasible at compile time without making mbedOS's printf application not parse floating points at all, or adding a specific function to parse floating points. There would be however no way to detect a misuse of one of those functions at compile time, as the decision is only known when the parser of the format string is run at runtime. This is because as far as I know, current compilers don't precompile the printf parser code and any flag-supporting code must be present.

I don't really see how to get this space back without having to add complexity, either by specialised utils or by making the default functions not behave like it does elsewhere.

@kjbracey
Copy link
Contributor

Not sure about current versions of the various toolchains, but I know historically ARMCC eliminated floating point code by having the compiler check the format string literal, and generate a call to _printf if there was no floating point in it and printf if there was (or if the format string was not a constant). Then at link time you either got full printf or cut-down _printf. Compiler magic - compilers are allowed to be aware of the standard library and do tricks like that.

My point is not so much each bit here having its custom thing, it's that there's no point you not using sprintf if the application or some other library is still using it. You've just increased the code size in that case. I guess if the cost is negligable, it's not too bad, but I wouldn't want to encourage the pattern - the cost will build up.

Do you have the numbers for code size increase with the "don't use sprintf" version if sprintf is still linked in?

@Taiki-San
Copy link
Contributor Author

Impressive! I wasn't aware of that! It'd appear GCC doesn't have that feature though...

That make sense, I'll run my tests again on the last version of mbedOS and give you the overhead of the patch.

@Taiki-San
Copy link
Contributor Author

Taiki-San commented Mar 20, 2018

Here are the results based on a small test program I had on hand (something based on uvisor I stripped out).
The compilation command I used was the following: mbed compile -m K64F -t GCC_ARM --profile mbed-os/tools/profiles/release.json -c

//current-lwip-glue - no sprintf application use

+------------------+--------+-------+-------+
| Module           |  .text | .data |  .bss |
+------------------+--------+-------+-------+
| [fill]           |    243 |     8 |  2201 |
| [lib]/c.a        |  30809 |  2472 |    89 |
| [lib]/gcc.a      |   3368 |     0 |     0 |
| [lib]/misc       |    248 |     8 |    28 |
| main.o           |    198 |     0 |     4 |
| mbed-os/drivers  |     98 |     0 |     0 |
| mbed-os/features |  45680 |   172 | 46072 |
| mbed-os/hal      |   1445 |     4 |    68 |
| mbed-os/platform |   1634 |   260 |   149 |
| mbed-os/rtos     |  14711 |   168 |  6073 |
| mbed-os/targets  |   6776 |    12 |   384 |
| network.o        |   1176 |     0 |   492 |
| secure_box.o     |    110 |     0 |     0 |
| Subtotals        | 106496 |  3104 | 55560 |
+------------------+--------+-------+-------+
Total Static RAM memory (data + bss): 58664 bytes
Total Flash memory (text + data): 109600 bytes

//both-stripped-from-lwip-glue - no sprintf application use

+------------------+-------+-------+-------+
| Module           | .text | .data |  .bss |
+------------------+-------+-------+-------+
| [fill]           |   179 |     8 |  2553 |
| [lib]/c.a        |  4712 |  2108 |    89 |
| [lib]/gcc.a      |   808 |     0 |     0 |
| [lib]/misc       |   248 |     8 |    28 |
| main.o           |   198 |     0 |     4 |
| mbed-os/drivers  |    98 |     0 |     0 |
| mbed-os/features | 45792 |   172 | 46072 |
| mbed-os/hal      |  1445 |     4 |    68 |
| mbed-os/platform |  1630 |   260 |   149 |
| mbed-os/rtos     | 14711 |   168 |  6073 |
| mbed-os/targets  |  6776 |    12 |   384 |
| network.o        |  1176 |     0 |   492 |
| secure_box.o     |   110 |     0 |     0 |
| Subtotals        | 77883 |  2740 | 55912 |
+------------------+-------+-------+-------+
Total Static RAM memory (data + bss): 58652 bytes
Total Flash memory (text + data): 80623 bytes

//sscanf-stripped-from-lwip-glue - no sprintf application use

+------------------+-------+-------+-------+
| Module           | .text | .data |  .bss |
+------------------+-------+-------+-------+
| [fill]           |   237 |     8 |  2201 |
| [lib]/c.a        | 17659 |  2472 |    89 |
| [lib]/gcc.a      |  3144 |     0 |     0 |
| [lib]/misc       |   248 |     8 |    28 |
| main.o           |   198 |     0 |     4 |
| mbed-os/drivers  |    98 |     0 |     0 |
| mbed-os/features | 45711 |   172 | 46072 |
| mbed-os/hal      |  1445 |     4 |    68 |
| mbed-os/platform |  1634 |   260 |   149 |
| mbed-os/rtos     | 14711 |   168 |  6073 |
| mbed-os/targets  |  6776 |    12 |   384 |
| network.o        |  1176 |     0 |   492 |
| secure_box.o     |   110 |     0 |     0 |
| Subtotals        | 93147 |  3104 | 55560 |
+------------------+-------+-------+-------+
Total Static RAM memory (data + bss): 58664 bytes
Total Flash memory (text + data): 96251 bytes

//both-stripped-from-lwip-glue - with sprintf application use

+------------------+-------+-------+-------+
| Module           | .text | .data |  .bss |
+------------------+-------+-------+-------+
| [fill]           |   232 |     8 |  2151 |
| [lib]/c.a        | 17507 |  2472 |    89 |
| [lib]/gcc.a      |  3144 |     0 |     0 |
| [lib]/misc       |   248 |     8 |    28 |
| main.o           |   198 |     0 |     4 |
| mbed-os/drivers  |    98 |     0 |     0 |
| mbed-os/features | 45792 |   172 | 46072 |
| mbed-os/hal      |  1445 |     4 |    68 |
| mbed-os/platform |  1634 |   260 |   149 |
| mbed-os/rtos     | 14711 |   168 |  6073 |
| mbed-os/targets  |  6776 |    12 |   384 |
| network.o        |  1228 |     0 |   542 |
| secure_box.o     |   110 |     0 |     0 |
| Subtotals        | 93123 |  3104 | 55560 |
+------------------+-------+-------+-------+
Total Static RAM memory (data + bss): 58664 bytes
Total Flash memory (text + data): 96227 bytes

It looks like the three calls to s[n]printf generate more code/data than using the utils and their implementation (maybe because the format strings aren't necessary?).

@kjbracey
Copy link
Contributor

I don't think you've got the numbers I want for comparison there - I'd want multiple builds, all with sprintf in use by the application. Then look at "current-lwip-glue", "sprintf-stripped-from-lwip-glue" and "both-stripped-from-lwip-glue".

Maybe worth checking the same with both sprintf+sscanf in use by application, although I'm inclined to believe scanf is little enough used to make avoiding it in utility libraries worthwhile.

@Taiki-San
Copy link
Contributor Author

Taiki-San commented Mar 20, 2018

Those numbers were from multiple build but only one included application use of sprintf (I updated the table names with your syntax).
Do you want me to make two new builds (sscanf-stripped-from-lwip-glue and current-lwip-glue) with the application using sprintf?

@Taiki-San
Copy link
Contributor Author

Taiki-San commented Mar 20, 2018

We are the new test results:
//both-stripped-from-lwip-glue - with sprintf application use

+------------------+-------+-------+-------+
| Module           | .text | .data |  .bss |
+------------------+-------+-------+-------+
| [fill]           |   229 |     8 |  2201 |
| [lib]/c.a        | 17507 |  2472 |    89 |
| [lib]/gcc.a      |  3144 |     0 |     0 |
| [lib]/misc       |   248 |     8 |    28 |
| main.o           |   198 |     0 |     4 |
| mbed-os/drivers  |    98 |     0 |     0 |
| mbed-os/features | 45792 |   172 | 46072 |
| mbed-os/hal      |  1445 |     4 |    68 |
| mbed-os/platform |  1634 |   260 |   149 |
| mbed-os/rtos     | 14711 |   168 |  6073 |
| mbed-os/targets  |  6776 |    12 |   384 |
| network.o        |  1215 |     0 |   492 |
| secure_box.o     |   110 |     0 |     0 |
| Subtotals        | 93107 |  3104 | 55560 |
+------------------+-------+-------+-------+
Total Static RAM memory (data + bss): 58664 bytes
Total Flash memory (text + data): 96211 bytes

//sscanf-stripped-from-lwip-glue - with sprintf application use

+------------------+-------+-------+-------+
| Module           | .text | .data |  .bss |
+------------------+-------+-------+-------+
| [fill]           |   241 |     8 |  2201 |
| [lib]/c.a        | 17659 |  2472 |    89 |
| [lib]/gcc.a      |  3144 |     0 |     0 |
| [lib]/misc       |   248 |     8 |    28 |
| main.o           |   198 |     0 |     4 |
| mbed-os/drivers  |    98 |     0 |     0 |
| mbed-os/features | 45711 |   172 | 46072 |
| mbed-os/hal      |  1445 |     4 |    68 |
| mbed-os/platform |  1634 |   260 |   149 |
| mbed-os/rtos     | 14711 |   168 |  6073 |
| mbed-os/targets  |  6776 |    12 |   384 |
| network.o        |  1239 |     0 |   492 |
| secure_box.o     |   110 |     0 |     0 |
| Subtotals        | 93214 |  3104 | 55560 |
+------------------+-------+-------+-------+
Total Static RAM memory (data + bss): 58664 bytes
Total Flash memory (text + data): 96318 bytes

//current-lwip-glue - with sprintf application use

+------------------+--------+-------+-------+
| Module           |  .text | .data |  .bss |
+------------------+--------+-------+-------+
| [fill]           |    239 |     8 |  2201 |
| [lib]/c.a        |  30809 |  2472 |    89 |
| [lib]/gcc.a      |   3368 |     0 |     0 |
| [lib]/misc       |    248 |     8 |    28 |
| main.o           |    198 |     0 |     4 |
| mbed-os/drivers  |     98 |     0 |     0 |
| mbed-os/features |  45680 |   172 | 46072 |
| mbed-os/hal      |   1445 |     4 |    68 |
| mbed-os/platform |   1634 |   260 |   149 |
| mbed-os/rtos     |  14711 |   168 |  6073 |
| mbed-os/targets  |   6776 |    12 |   384 |
| network.o        |   1239 |     0 |   492 |
| secure_box.o     |    110 |     0 |     0 |
| Subtotals        | 106555 |  3104 | 55560 |
+------------------+--------+-------+-------+
Total Static RAM memory (data + bss): 58664 bytes
Total Flash memory (text + data): 109659 bytes

The application code using sprintf follows:

int status = 0;
while(!shutDown)
{
	char string[50];
	sprintf(string, "%d", status++);
	proxyNewRequest(string, 80, string, sizeof(string));
}

@kjbracey
Copy link
Contributor

Close, but I wanted "sprintf-stripped-from-lwip-glue" - so I can see whether not using sprintf is costing something over using the sprintf already there.

@Taiki-San
Copy link
Contributor Author

Taiki-San commented Mar 20, 2018

Here you go
//sprintf-stripped-from-lwip-glue with sprintf application use

+------------------+--------+-------+-------+
| Module           |  .text | .data |  .bss |
+------------------+--------+-------+-------+
| [fill]           |    243 |     8 |  2201 |
| [lib]/c.a        |  30657 |  2472 |    89 |
| [lib]/gcc.a      |   3368 |     0 |     0 |
| [lib]/misc       |    248 |     8 |    28 |
| main.o           |    198 |     0 |     4 |
| mbed-os/drivers  |     98 |     0 |     0 |
| mbed-os/features |  45761 |   172 | 46072 |
| mbed-os/hal      |   1445 |     4 |    68 |
| mbed-os/platform |   1634 |   260 |   149 |
| mbed-os/rtos     |  14711 |   168 |  6073 |
| mbed-os/targets  |   6776 |    12 |   384 |
| network.o        |   1215 |     0 |   492 |
| secure_box.o     |    110 |     0 |     0 |
| Subtotals        | 106464 |  3104 | 55560 |
+------------------+--------+-------+-------+
Total Static RAM memory (data + bss): 58664 bytes
Total Flash memory (text + data): 109568 bytes

//sprintf-stripped-from-lwip-glue no sprintf application use

+------------------+--------+-------+-------+
| Module           |  .text | .data |  .bss |
+------------------+--------+-------+-------+
| [fill]           |    246 |     8 |  2201 |
| [lib]/c.a        |  30657 |  2472 |    89 |
| [lib]/gcc.a      |   3368 |     0 |     0 |
| [lib]/misc       |    248 |     8 |    28 |
| main.o           |    198 |     0 |     4 |
| mbed-os/drivers  |     98 |     0 |     0 |
| mbed-os/features |  45761 |   172 | 46072 |
| mbed-os/hal      |   1445 |     4 |    68 |
| mbed-os/platform |   1634 |   260 |   149 |
| mbed-os/rtos     |  14711 |   168 |  6073 |
| mbed-os/targets  |   6776 |    12 |   384 |
| network.o        |   1172 |     0 |   492 |
| secure_box.o     |    110 |     0 |     0 |
| Subtotals        | 106424 |  3104 | 55560 |
+------------------+--------+-------+-------+
Total Static RAM memory (data + bss): 58664 bytes
Total Flash memory (text + data): 109528 bytes

It appears that not using sprintf in the glue saves 32 bytes of Flash memory.
Those results were surprising but by stripping sscanf, I could get back the fully stripped build size so I'm guessing sscanf rely on parts of the sprintf parser.

@0xc0170
Copy link
Contributor

0xc0170 commented Apr 9, 2018

@kjbracey-arm @mikaleppanen What is the review status for this pull request?

@kjbracey
Copy link
Contributor

I'd like to hold off on this one, and maybe revisit after 5.9. I do buy the removal of sscanf, but don't really fancy trying to avoid sprintf.

But this is going to add to integration hassle work with the feature-emac branch coming in shortly, so let's put it on hold for now.

@Taiki-San
Copy link
Contributor Author

I'm not sure what the feature-emac branch is about but I don't see what integration hassle work you may be referring to. The point of this PR isn't to prevent any use of sprintf/sscanf but to avoid using those functions when trivial conversion is possible, on a case by case basis and in code critical for minimal setups.
Those are powerful functions and whenever complex parsing is involved (and hopefully when the feature is optional), sprintf and sscanf are much safer to use in place of a ton of custom parsers.
Although those functions have a cost that should be minimised whenever possible, the problem of using them in the core of the network stack is that it increases the space taken by a minimal mbedOS setup that could be used for instance as a redundant network setup when trying to perform in-place OTA updates.
I'd like to request reconsideration, but would understand if this kind of very specific use case isn't a priority for the project.

@kjbracey
Copy link
Contributor

There's a major restructuring about to happen to this code area, currently on the feature-emac branch. Any changes made on master will need to be reintegrated into the new version, causing us work. If you were to submit the patch as a PR to feature-emac instead of master we might be able to consider it there.

I'm not in favour of trying to avoid sprintf if we need to format stuff - formatting stuff is so common that it makes sense to use a central call to do it. And if it's fat, hassle the guys providing it to make it smaller. (Like that non-floating-point _sprintf trick previously referred to).

But how about just eliminating the need to format stuff? In a minimal image as you describe just sending and receiving packets why is any text formatting happening at all? Can you investigate that? Are we unnecessarily filling in a text buffer no-one's using? Should we moving that to be "on-demand", which would allow linker exclusion?

There is a general API issue here in that quite a few calls accept/return text strings, which is unfortunate. If they were binary they'd be more efficient.

@Taiki-San
Copy link
Contributor Author

Taiki-San commented Apr 27, 2018

Understood, I didn't know about this restructuring (is it related to ARMv8-M?).

Although even a 'free' sprintf is more costly that itoa/bin2hex/hex2bin, I get your point. Complexity has a cost that shouldn't be trivially ignored.

Removing this string processing would be ideal, but after a quick investigation it appears this isn't going to be trivial. After having a look at a disassembler, it appears that sscanf uses are called by parts of C++ classes ( NetworkStack::gethostbyname call SocketAddress::set_ip_address which depending of the input calls ipv6_from_address and ipv4_from_address) and doesn't appear to be strippable (at least not by GCC).
ipv[4-6]_to_address are properly stripped when not explicitely used.
mbed_lwip_record_mac_address caches the MAC address as a string and is called by mbed_lwip_emac_init which is called by mbed_lwip_bringup_2 and is at the core of the network initialization code path. This function isn't critical as the string it generates is only used by mbed_lwip_get_mac_address but this function is exposed through a C++ class (EthernetInterface) and thus can't be stripped. Changing the API to return the binary version and provide an helper to generate the string would be a solution but I'm not qualified to recommand API changes. I'd remark that old lwip code generate multiple strings (mbed_lwip_get_ip_address...) but relies on functions such as ip4addr_ntoa_r to do so, which craft them manually without using sprintf.

It looks like stripping either sprintf of sscanf from builds not using those functionnalities would either require to get the compiler to strip parts of a C++ class, or changing the API of those classes.
There are probably significant gains to be made by stripping unecessary methods of exposed C++ classes but I'm unsure of how to do that.
Do you see a way achieve that?

@kjbracey
Copy link
Contributor

The restructuring is basically an improvement to the way Ethernet+Wi-fi drivers are written and installed.

Removing this string processing would be ideal, but after a quick investigation it appears this isn't going to be trivial.

Agreed. The problem is really the design of those classes and APIs. The calls to "get_ip_address" should be available in a form that returns a SocketAddress, not a string. The thing that NetworkStack::gethostbyname is doing is awkward because of that, and invokes the printf and the scanf.

I think that's where I'd like to attack it - add new binary APIs where they're missing, making everything possible without going anywhere near IPv4 or IPv6 strings, unless you're printing or parsing at the user end.

All compilers should already be stripping methods that aren't used. Although I guess that's not true for virtual methods. Hmm. So maybe just not using them isn't good enough.

@pan-
Copy link
Member

pan- commented Apr 27, 2018

Although I guess that's not true for virtual methods.

That's not true with the current compilation options but it can be done with devirtualization and link time optimization enabled. Code compiled in release would greatly benefits of it as in most cases an interfaces class have a unique (final) implementation.

@kjbracey
Copy link
Contributor

My belief (guess) was that base class methods not used because they were always overridden in any instantiated class would be eliminated, as no vtbl would reference them, but virtual methods in a final class that were never called would remain.

@pan-
Copy link
Member

pan- commented Apr 27, 2018

The base class constructor is called when the derived class is constructed therefore the base class vtable is referenced. I guess it might be elided if the compiler has visibility over the base class constructor and the inlining threshold is sufficient.

@adbridge
Copy link
Contributor

@Taiki-San This needs a rebase due to conflicts. Also could you please address the review comments. In order for this to make it into mbed-os-5.9, this needs to be reviewed/through ci and merged by this Friday.

@Taiki-San
Copy link
Contributor Author

Understood, I'll rebase shortly.
As for the issues raised by @kjbracey-arm, not sure what the solution would be.
Although the core suggestion was to prevent the unecessary code to be linked when not actively used, further discussion demonstrated that it wouldn't be trivial to do/require API changes.
Moreover, considering the first point (that there is no point in avoiding sprintf if it'll be used by the application code) was more or less rendered moot (the helper functions introduced by this commit takes up less space than a call to sprintf, even if the application uses sprintf elsewhere), how should I address it?

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 28, 2018

kjbracey-arm approved these changes 4 minutes ago

Is this now ready for review (one outstanding "requested changes") ? Please remove needs preceding PR if the nanostack PR was resolved (I can see it is still open).

@kjbracey
Copy link
Contributor

This uses the first preceding PR, which is already in.

It could be refined later when the second libservice PR goes in after 5.10. (Dropping ipv6_is_valid)

@0xc0170
Copy link
Contributor

0xc0170 commented Sep 5, 2018

/morph build

@mbed-ci
Copy link

mbed-ci commented Sep 5, 2018

Build : FAILURE

Build number : 3008
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6293/

@Taiki-San
Copy link
Contributor Author

What's the build failure cause?

@0xc0170
Copy link
Contributor

0xc0170 commented Sep 5, 2018

What's the build failure cause?

One of the servers went down, restarting

/morph build

@mbed-ci
Copy link

mbed-ci commented Sep 5, 2018

Build : SUCCESS

Build number : 3010
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6293/

Triggering tests

/morph test
/morph export-build
/morph mbed2-build

@mbed-ci
Copy link

mbed-ci commented Sep 5, 2018

@cmonr
Copy link
Contributor

cmonr commented Sep 5, 2018

/morph mbed2-build

@mbed-ci
Copy link

mbed-ci commented Sep 5, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants