Skip to content

Refactor memap for speed #5125

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Sep 21, 2017

Conversation

theotherjimmy
Copy link
Contributor

@theotherjimmy theotherjimmy commented Sep 15, 2017

Improves performance by 10-15x on my machine. See below for individual compiler comparisons.

Resolves #4976
Resolves #5124
Resolves #5127

Todo:

  • Parsing tests (let's never let this non-performance bug in again)
  • Resolve the Malformed input... output from GCC

@theotherjimmy
Copy link
Contributor Author

@MarceloSalazar Could you take a look?

@theotherjimmy
Copy link
Contributor Author

Adding needs work for parsing tests.

@theotherjimmy
Copy link
Contributor Author

theotherjimmy commented Sep 15, 2017

GCC Performance

Speedup: 10.86

Before

# git checkout master 
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
# time python2 tools/memap.py -t GCC_ARM BUILD/tests/nrf51_dk/gcc_arm/TESTS/events/queue/queue.map -d 0
+------------------+-------+-------+------+
| Module           | .text | .data | .bss |
+------------------+-------+-------+------+
| [fill]           |    90 |     3 |   41 |
| [lib]/libc.a     | 24469 |  2472 |   56 |
| [lib]/libgcc.a   |  7134 |     0 |    0 |
| [lib]/libnosys.a |    32 |     0 |    0 |
| [lib]/misc       |   208 |     8 |   25 |
| blinky/BUILD     | 20425 |   261 | 5324 |
| fat-fs/BUILD     |  1389 |     0 | 1644 |
| main.o           |  9751 |     4 |  413 |
| mbed-os/BUILD    |  9215 |    12 |  769 |
| Subtotals        | 72713 |  2760 | 8272 |
+------------------+-------+-------+------+
Total Static RAM memory (data + bss): 11032 bytes
Total Flash memory (text + data): 75473 bytes

real    1.012522411s

After

# git checkout improve-memap-performance
Switched to branch 'improve-memap-performance'
Your branch is up-to-date with 'theotherjimmy/improve-memap-performance'.
# time python2 tools/memap.py -t GCC_ARM BUILD/tests/nrf51_dk/gcc_arm/TESTS/events/queue/queue.map -d 0
Malformed input found when parsing GCC map: linker stubs
Malformed input found when parsing GCC map: linker stubs
Malformed input found when parsing GCC map: linker stubs
Malformed input found when parsing GCC map: linker stubs
+--------------------------------+-------+-------+------+
| Module                         | .text | .data | .bss |
+--------------------------------+-------+-------+------+
| TESTS/events                   |  9751 |     4 |  413 |
| [fill]                         |    90 |     3 |   41 |
| [lib]/libc.a                   | 24469 |  2472 |   56 |
| [lib]/libgcc.a                 |  7134 |     0 |    0 |
| [lib]/libnosys.a               |    32 |     0 |    0 |
| [lib]/misc                     |   208 |     8 |   25 |
| drivers/I2C.o                  |    18 |     0 |    0 |
| drivers/InterruptIn.o          |    90 |     0 |    0 |
| drivers/RawSerial.o            |   392 |     0 |    0 |
| drivers/SerialBase.o           |   398 |     0 |    0 |
| drivers/Ticker.o               |   238 |     0 |    0 |
| drivers/Timeout.o              |   167 |     0 |    0 |
| drivers/Timer.o                |   202 |     0 |    0 |
| drivers/TimerEvent.o           |   180 |     0 |    0 |
| events/EventQueue.o            |    62 |     0 |    0 |
| events/equeue                  |  1514 |     0 |   93 |
| features/frameworks            |  7286 |    69 |  513 |
| hal/mbed_gpio.o                |   220 |     0 |    0 |
| hal/mbed_lp_ticker_api.o       |    56 |     0 |   24 |
| hal/mbed_sleep_manager.o       |   219 |     0 |    2 |
| hal/mbed_ticker_api.o          |   436 |     0 |    0 |
| hal/mbed_us_ticker_api.o       |    56 |     0 |   24 |
| platform/mbed_alloc_wrappers.o |    44 |     0 |    0 |
| platform/mbed_assert.o         |    85 |     0 |    0 |
| platform/mbed_board.o          |   366 |     0 |    0 |
| platform/mbed_critical.o       |    62 |     0 |    0 |
| platform/mbed_error.o          |    44 |     0 |    1 |
| platform/mbed_retarget.o       |   647 |     4 |  264 |
| platform/mbed_wait_api_rtos.o  |    64 |     0 |    0 |
| rtos/TARGET_CORTEX             | 10647 |   180 | 6036 |
| rtos/Thread.o                  |     8 |     0 |    0 |
| targets/TARGET_NORDIC          |  7528 |    20 |  780 |
| Subtotals                      | 72713 |  2760 | 8272 |
+--------------------------------+-------+-------+------+
Total Static RAM memory (data + bss): 11032 bytes
Total Flash memory (text + data): 75473 bytes

real    0.093222451s

@theotherjimmy
Copy link
Contributor Author

theotherjimmy commented Sep 15, 2017

Arm Compiler 5/6 performance

Speedup: 14.67

Before

# git checkout master 
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
# time python2 tools/memap.py -t ARM BUILD/tests/nrf51_dk/arm/TESTS/events/queue/queue.map -e table
+-----------------+-------+-------+------+
| Module          | .text | .data | .bss |
+-----------------+-------+-------+------+
| [lib]/c_p.l     | 11603 |    16 |  348 |
| [lib]/cpprt_p.l |    44 |     0 |    0 |
| [lib]/fz_ps.l   |    32 |     0 |    0 |
| [lib]/m_ps.l    |    44 |     0 |    0 |
| anon$$obj.o     |    32 |     0 |    0 |
| blinky/BUILD    | 22684 |   395 | 5115 |
| fat-fs/BUILD    |  1680 |  2369 |   76 |
| main.o          | 10902 |     8 |  408 |
| mbed-os/BUILD   | 10677 |    74 | 1104 |
| Subtotals       | 57698 |  2862 | 7051 |
+-----------------+-------+-------+------+
Total Static RAM memory (data + bss): 9913 bytes
Total Flash memory (text + data): 60560 bytes

real    0.884907055s

After

# git checkout improve-memap-performance
Switched to branch 'improve-memap-performance'
Your branch is up-to-date with 'theotherjimmy/improve-memap-performance'.
# time python2 tools/memap.py -t ARM BUILD/tests/nrf51_dk/arm/TESTS/events/queue/queue.map -e table
+--------------------------------+-------+-------+------+
| Module                         | .text | .data | .bss |
+--------------------------------+-------+-------+------+
| ./main.o                       | 10902 |     8 |  408 |
| [lib]/c_p.l                    | 11603 |    16 |  348 |
| [lib]/cpprt_p.l                |    44 |     0 |    0 |
| [lib]/fz_ps.l                  |    32 |     0 |    0 |
| [lib]/m_ps.l                   |    44 |     0 |    0 |
| anon$$obj.o                    |    32 |     0 |    0 |
| drivers/I2C.o                  |    76 |     0 |    0 |
| drivers/InterruptIn.o          |    85 |     0 |    0 |
| drivers/RawSerial.o            |   324 |     0 |    0 |
| drivers/SerialBase.o           |   460 |     0 |    0 |
| drivers/Ticker.o               |   198 |     0 |    0 |
| drivers/Timeout.o              |    64 |     0 |    0 |
| drivers/Timer.o                |   188 |     0 |    0 |
| drivers/TimerEvent.o           |   144 |     0 |    0 |
| events/EventQueue.o            |    76 |     0 |    0 |
| events/equeue                  |  1838 |     8 |   88 |
| features/frameworks            |  8049 |   140 |  512 |
| hal/mbed_gpio.o                |   256 |     0 |    0 |
| hal/mbed_lp_ticker_api.o       |    56 |     0 |   24 |
| hal/mbed_sleep_manager.o       |   216 |     2 |    0 |
| hal/mbed_ticker_api.o          |   464 |     0 |    0 |
| hal/mbed_us_ticker_api.o       |    56 |     0 |   24 |
| platform/FileBase.o            |   244 |     4 |   52 |
| platform/FilePath.o            |   170 |     0 |    0 |
| platform/mbed_alloc_wrappers.o |    16 |     0 |    0 |
| platform/mbed_assert.o         |    84 |     0 |    0 |
| platform/mbed_board.o          |   380 |     0 |    0 |
| platform/mbed_critical.o       |    64 |     0 |    0 |
| platform/mbed_error.o          |    44 |     1 |    0 |
| platform/mbed_retarget.o       |  1191 |     8 |  376 |
| platform/mbed_wait_api_rtos.o  |    56 |     0 |    0 |
| rtos/Mutex.o                   |   252 |     0 |    0 |
| rtos/TARGET_CORTEX             | 11987 |  2569 | 4300 |
| rtos/Thread.o                  |     8 |     0 |    0 |
| targets/TARGET_NORDIC          |  7995 |   106 |  919 |
| Subtotals                      | 57698 |  2862 | 7051 |
+--------------------------------+-------+-------+------+
Total Static RAM memory (data + bss): 9913 bytes
Total Flash memory (text + data): 60560 bytes

real    0.060380164s

@theotherjimmy
Copy link
Contributor Author

theotherjimmy commented Sep 15, 2017

IAR Performance

Speedup: 11.38

Before

# git checkout master 
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
# time python2 tools/memap.py -t IAR BUILD/tests/nrf51_dk/iar/TESTS/events/queue/queue.map
+----------------------+-------+-------+------+
| Module               | .text | .data | .bss |
+----------------------+-------+-------+------+
| [lib]/dl6M_tlf.a     | 11882 |   480 |  716 |
| [lib]/dlpp6M_tl_fc.a |    64 |     0 |    0 |
| [lib]/m6M_tl.a       |  1368 |     0 |    0 |
| [lib]/rt6M_tl.a      |   974 |     0 |    0 |
| [misc]               |   239 |     0 |    0 |
| blinky/BUILD         | 20712 |   488 | 5514 |
| fat-fs/BUILD         |  1520 |     0 | 1700 |
| main.o               | 13206 |     0 |  413 |
| mbed-os/BUILD        |  8870 |     8 |  818 |
| Subtotals            | 58835 |   976 | 9161 |
+----------------------+-------+-------+------+
Total Static RAM memory (data + bss): 10137 bytes
Total Flash memory (text + data): 59811 bytes

real    0.757136647s

After

git checkout improve-memap-performance
Switched to branch 'improve-memap-performance'
Your branch is up-to-date with 'theotherjimmy/improve-memap-performance'.
# time python2 tools/memap.py -t IAR BUILD/tests/nrf51_dk/iar/TESTS/events/queue/queue.map
+--------------------------------+-------+-------+------+
| Module                         | .text | .data | .bss |
+--------------------------------+-------+-------+------+
| ./main.o                       | 13206 |     0 |  413 |
| [lib]/dl6M_tlf.a               | 11882 |   480 |  716 |
| [lib]/dlpp6M_tl_fc.a           |    64 |     0 |    0 |
| [lib]/m6M_tl.a                 |  1368 |     0 |    0 |
| [lib]/rt6M_tl.a                |   974 |     0 |    0 |
| [misc]                         |   239 |     0 |    0 |
| drivers/RawSerial.o            |   276 |     0 |    0 |
| drivers/SerialBase.o           |   420 |     0 |    0 |
| drivers/Ticker.o               |   206 |     0 |    0 |
| drivers/Timeout.o              |    64 |     0 |    0 |
| drivers/Timer.o                |   210 |     0 |    0 |
| drivers/TimerEvent.o           |   146 |     0 |    0 |
| events/EventQueue.o            |    72 |     0 |    0 |
| events/equeue                  |  1556 |     0 |   96 |
| features/frameworks            |  7076 |   196 |  356 |
| hal/mbed_gpio.o                |   220 |     0 |    0 |
| hal/mbed_lp_ticker_api.o       |    48 |     0 |   24 |
| hal/mbed_sleep_manager.o       |   208 |     0 |    2 |
| hal/mbed_ticker_api.o          |   660 |     0 |    0 |
| hal/mbed_us_ticker_api.o       |    48 |     0 |   24 |
| platform/ATCmdParser.o         |    32 |     0 |    0 |
| platform/FileBase.o            |    96 |     0 |   56 |
| platform/FilePath.o            |   100 |     0 |    0 |
| platform/mbed_alloc_wrappers.o |    10 |     0 |    0 |
| platform/mbed_assert.o         |    84 |     0 |    0 |
| platform/mbed_board.o          |   600 |     0 |    0 |
| platform/mbed_critical.o       |    54 |     0 |    0 |
| platform/mbed_error.o          |    40 |     0 |    1 |
| platform/mbed_retarget.o       |   400 |     0 |  124 |
| platform/mbed_rtc_time.o       |   156 |     0 |    0 |
| platform/mbed_wait_api_rtos.o  |    56 |     0 |    0 |
| rtos/Mutex.o                   |   192 |     0 |    0 |
| rtos/RtosTimer.o               |    24 |     0 |    0 |
| rtos/TARGET_CORTEX             | 10866 |   180 | 6432 |
| rtos/Thread.o                  |    68 |     0 |    0 |
| targets/TARGET_NORDIC          |  7114 |   120 |  917 |
| Subtotals                      | 58835 |   976 | 9161 |
+--------------------------------+-------+-------+------+
Total Static RAM memory (data + bss): 10137 bytes
Total Flash memory (text + data): 59811 bytes

real    0.066505842s

@theotherjimmy
Copy link
Contributor Author

I had received in person bug reports from:

So you guys might want to check this PR out.

@MarceloSalazar
Copy link

MarceloSalazar commented Sep 18, 2017

The speedup is great! :)
But I'm seeing that 'mbed-os' folder is gone, which was one of the things that help to understand where memory is going (things that are part of mbed-os vs other external libraries, and application).
Can we check this please?

@theotherjimmy
Copy link
Contributor Author

@MarceloSalazar That was part of the cause of testing being so horribly slow on some people's machines. Check #5124. I'm going to add a bit of detail as to why this is a problem.

@theotherjimmy
Copy link
Contributor Author

But I'm seeing that 'mbed-os' folder is gone,

Only for tests.

@theotherjimmy
Copy link
Contributor Author

Can we check this please?

This is the intended behavior of mbed OS tests.

@MarceloSalazar
Copy link

But I'm seeing that 'mbed-os' folder is gone,
Only for tests.
This is the intended behavior of mbed OS tests.

Could we add/concatenate the 'mbed-os' text, so the output is consistent with the apps' output as well? In the end, tests as also some kind of apps.

@theotherjimmy
Copy link
Contributor Author

so the output is consistent with the apps' output as well?

To some degree, this PR is already consistent between the apps and tests; As this PR stands, the output is always relative to the root of the app's containing directory. You are suggesting that it "Look the same" which is a different form of consistency.

@theotherjimmy
Copy link
Contributor Author

FYI, I'm going to be completely reworking this PR. There's yet another speedup to take advantage of: the resources object already contains all of the information that we could get from scanning. So let's not be silly and just use that.

@theotherjimmy
Copy link
Contributor Author

Hmmmm... bad rebase. Just a sec.

@theotherjimmy theotherjimmy force-pushed the improve-memap-performance branch 2 times, most recently from 2b6d4c6 to 7fe7eee Compare September 18, 2017 20:24
@theotherjimmy
Copy link
Contributor Author

There we go; that should be a clean history.

Instead of scanning.

Is a 8ms/15% speedup.
We just don't care if we don't know where they go
@theotherjimmy theotherjimmy force-pushed the improve-memap-performance branch from 07687aa to 6d135c2 Compare September 18, 2017 20:45
Now we don't have to scan!

This is a 20% speed improvement
@theotherjimmy theotherjimmy force-pushed the improve-memap-performance branch from 9375cb6 to fdd4ae3 Compare September 18, 2017 22:18
@theotherjimmy
Copy link
Contributor Author

Second round here: IAR is now down to 60ms, and ARM down to 55ms. No files were scanned.

@0xc0170
Copy link
Contributor

0xc0170 commented Sep 20, 2017

What is the current status of this patch?

Any more reviews from the referenced ppl here?

@MarceloSalazar
Copy link

From the technical point of view, I'll let others comment, but the figures speak for themselves :)

From the usability point of view, I still believe we need consistency and generate output that looks the same (IMO tests are basically apps with main components: mbed-os, C/C++ library, and the app/tests itself). Also, there are consumers for this output that would like consistency, either CI systems that need to keep track of this information or others working on UI tools to display/analyze it.

@theotherjimmy
Copy link
Contributor Author

@0xc0170 I have to write a test for GCC parsing. Then we can run it through a morph test.

@theotherjimmy
Copy link
Contributor Author

@MarceloSalazar I have to say that I don't think it's "consistent" to do path mangling for only tests. As a user, If I saw a .o file in the output, I would expect to see it exactly where memap says it is, relative to the build directory.

IMO tests are basically apps with main components: mbed-os, C/C++ library, and the app/tests itself

I think this is the point of contention. They are built that way (to some extent) but they are not structured that way in the filesystem. Like I said in another comment, I think the form of consistency I am suggesting is good for users, as it can help them discover how tests are build and structured and does not hide either of those aspects of test.

@sg- Could you share your opinion?

@sg-
Copy link
Contributor

sg- commented Sep 20, 2017

if I created a folder test and imported mbed-os into it and then ran mbed test would that solve the problem?

@theotherjimmy
Copy link
Contributor Author

theotherjimmy commented Sep 20, 2017

if I created a folder test and imported mbed-os into it and then ran mbed test would that solve the problem?

That would create a build error. You would have 2 of each Mbed OS symbol.

If you also added a .mbed file, then ran mbed compile that would produce the output @MarceloSalazar is suggesting.

It's just not used anymore
@theotherjimmy
Copy link
Contributor Author

/morph test

@0xc0170
Copy link
Contributor

0xc0170 commented Sep 21, 2017

@adbridge @MarceloSalazar Please review again

@MarceloSalazar
Copy link

Looks good! :)

@mbed-bot
Copy link

Result: SUCCESS

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 1350

All builds and test passed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants