-
Notifications
You must be signed in to change notification settings - Fork 3k
Reduce .text footprint of the network stack #6293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Not sure where the CI failure come from. Can I have more details on what do I need to fix? |
Can you please review? |
I'm not a total fan of this - for most real-world code, including anything with tracing, printf is already a sunk cost - if everyone starts having their own local versions, it's not a net win for anything other than the networking equivalent of "blinky" scanf on the other hand is less often used, so maybe that half is worth taking. I would be interested in which libraries are pulling in FP code unnecessarily - some at least should be making the effort to spot that no format strings include FP and thus exclude it. |
I agree that having each occurrence of the code having its own custom parser isn't really necessary. Maybe those basics utils (itoa/bin2hex/hex2bin) would make sense in mbedOS' stdlib? As for their use there, I'd argue it's a fairly straightforward parser not warranting much of sprintf overall complexity (even ignoring the FP code). You're also right that avoiding printf/scanf based functions would be a complex endeavour for real world applications, but maybe forcing the cost on all users might be an overkill. If string generation utils are necessary, the application may be using its own basic toolkit not to have to pay the printf penalty after going through the same debugging as I did. As for reducing said penalty, I'm not sure this is feasible at compile time without making mbedOS's printf application not parse floating points at all, or adding a specific function to parse floating points. There would be however no way to detect a misuse of one of those functions at compile time, as the decision is only known when the parser of the format string is run at runtime. This is because as far as I know, current compilers don't precompile the printf parser code and any flag-supporting code must be present. I don't really see how to get this space back without having to add complexity, either by specialised utils or by making the default functions not behave like it does elsewhere. |
Not sure about current versions of the various toolchains, but I know historically ARMCC eliminated floating point code by having the compiler check the format string literal, and generate a call to My point is not so much each bit here having its custom thing, it's that there's no point you not using Do you have the numbers for code size increase with the "don't use sprintf" version if sprintf is still linked in? |
Impressive! I wasn't aware of that! It'd appear GCC doesn't have that feature though... That make sense, I'll run my tests again on the last version of mbedOS and give you the overhead of the patch. |
Here are the results based on a small test program I had on hand (something based on uvisor I stripped out). //current-lwip-glue - no sprintf application use
//both-stripped-from-lwip-glue - no sprintf application use
//sscanf-stripped-from-lwip-glue - no sprintf application use
//both-stripped-from-lwip-glue - with sprintf application use
It looks like the three calls to |
I don't think you've got the numbers I want for comparison there - I'd want multiple builds, all with sprintf in use by the application. Then look at "current-lwip-glue", "sprintf-stripped-from-lwip-glue" and "both-stripped-from-lwip-glue". Maybe worth checking the same with both sprintf+sscanf in use by application, although I'm inclined to believe scanf is little enough used to make avoiding it in utility libraries worthwhile. |
Those numbers were from multiple build but only one included application use of sprintf (I updated the table names with your syntax). |
We are the new test results:
//sscanf-stripped-from-lwip-glue - with sprintf application use
//current-lwip-glue - with sprintf application use
The application code using
|
Close, but I wanted "sprintf-stripped-from-lwip-glue" - so I can see whether not using sprintf is costing something over using the sprintf already there. |
Here you go
//sprintf-stripped-from-lwip-glue no sprintf application use
It appears that not using sprintf in the glue saves 32 bytes of Flash memory. |
@kjbracey-arm @mikaleppanen What is the review status for this pull request? |
I'd like to hold off on this one, and maybe revisit after 5.9. I do buy the removal of sscanf, but don't really fancy trying to avoid sprintf. But this is going to add to integration hassle work with the feature-emac branch coming in shortly, so let's put it on hold for now. |
I'm not sure what the |
There's a major restructuring about to happen to this code area, currently on the feature-emac branch. Any changes made on master will need to be reintegrated into the new version, causing us work. If you were to submit the patch as a PR to feature-emac instead of master we might be able to consider it there. I'm not in favour of trying to avoid sprintf if we need to format stuff - formatting stuff is so common that it makes sense to use a central call to do it. And if it's fat, hassle the guys providing it to make it smaller. (Like that non-floating-point But how about just eliminating the need to format stuff? In a minimal image as you describe just sending and receiving packets why is any text formatting happening at all? Can you investigate that? Are we unnecessarily filling in a text buffer no-one's using? Should we moving that to be "on-demand", which would allow linker exclusion? There is a general API issue here in that quite a few calls accept/return text strings, which is unfortunate. If they were binary they'd be more efficient. |
Understood, I didn't know about this restructuring (is it related to ARMv8-M?). Although even a 'free' Removing this string processing would be ideal, but after a quick investigation it appears this isn't going to be trivial. After having a look at a disassembler, it appears that It looks like stripping either |
The restructuring is basically an improvement to the way Ethernet+Wi-fi drivers are written and installed.
Agreed. The problem is really the design of those classes and APIs. The calls to "get_ip_address" should be available in a form that returns a I think that's where I'd like to attack it - add new binary APIs where they're missing, making everything possible without going anywhere near IPv4 or IPv6 strings, unless you're printing or parsing at the user end. All compilers should already be stripping methods that aren't used. Although I guess that's not true for virtual methods. Hmm. So maybe just not using them isn't good enough. |
That's not true with the current compilation options but it can be done with devirtualization and link time optimization enabled. Code compiled in release would greatly benefits of it as in most cases an interfaces class have a unique (final) implementation. |
My belief (guess) was that base class methods not used because they were always overridden in any instantiated class would be eliminated, as no vtbl would reference them, but virtual methods in a final class that were never called would remain. |
The base class constructor is called when the derived class is constructed therefore the base class vtable is referenced. I guess it might be elided if the compiler has visibility over the base class constructor and the inlining threshold is sufficient. |
@Taiki-San This needs a rebase due to conflicts. Also could you please address the review comments. In order for this to make it into mbed-os-5.9, this needs to be reviewed/through ci and merged by this Friday. |
Understood, I'll rebase shortly. |
Is this now ready for review (one outstanding "requested changes") ? Please remove needs preceding PR if the nanostack PR was resolved (I can see it is still open). |
This uses the first preceding PR, which is already in. It could be refined later when the second libservice PR goes in after 5.10. (Dropping |
/morph build |
Build : FAILUREBuild number : 3008 |
What's the build failure cause? |
One of the servers went down, restarting /morph build |
Build : SUCCESSBuild number : 3010 Triggering tests/morph test |
Test : SUCCESSBuild number : 2777 |
/morph mbed2-build |
Exporter Build : SUCCESSBuild number : 2622 |
Description
Patches to LWIP and some code from netsocket was using
s[n]printf
andsscanf
to generate and parse strings based on integers. Specifically, converting MAC addresses and IPv6 addresses to and from their hexadecimal form and IPv4 addresses to and from their printable decimal form.The use of those functions in public APIs forced the compiler to bring the full parser necessary to make
*printf/sscanf
work. This parser is massive (in order to handle floating point for instance) and several of its components were in the largest functions in the final binary independently, despite the monolithicstcp_input
andtcp_output
.The small test program I was using had the following memory breakdown before linking to the network stack when compiling with GCC and a release profile:
The same program using the default network stack (K64F & Ethernet):
Then the same program using the network stack included in the PR, not using
*printf/sscanf
and thus not linking with the expensive parser:Although the RAM change is negligeable (but still down), the .text impact is massive (-23%).
*printf
parser onlyI couldn't find where to put the small utils I had to write (a basic itoa and two bin2hex) so they are inline and bin2hex had to be duplicated.
Pull request type