Moving some examples testing from K64F to K66F #8835

OPpuolitaival · 2018-11-22T08:21:12Z

Description

After analysing our CI I noticed that:

The slowest part is now examples testing.
The slowest part of examples testing is K64F, because it is configured for so many examples.

I found multiple ways to fix this problem inside CI, but the fastest is to balance the examples between K64F and K66F. I don't think we lose any meaningful test coverage with this change. It makes our CI bottleneck a bit faster and buys time for a better solution.

Before:
K64F:
nanostack-border-router
mbed-os-example-mesh-minimal
mbed-os-example-tls
mbed-os-example-ble
mbed-os-example-nvstore
mbed-os-example-devicekey
mbed-os-example-thread-statistics
mbed-os-example-sys-info
mbed-os-example-cpu-usage
mbed-os-example-cpu-stats
mbed-os-example-error-handling
mbed-os-example-bootloader
mbed-os-example-blockdevice
K66F:
nanostack-border-router
mbed-os-example-mesh-minimal

After:
K64F:
mbed-os-example-sys-info
mbed-os-example-cpu-usage
mbed-os-example-cpu-stats
mbed-os-example-error-handling
mbed-os-example-bootloader
mbed-os-example-blockdevice

K66F:
nanostack-border-router
mbed-os-example-mesh-minimal
mbed-os-example-tls
mbed-os-example-ble
mbed-os-example-nvstore
mbed-os-example-devicekey
mbed-os-example-thread-statistics
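
The effect of the rebalancing can be sketched with a small Python snippet. It is only an illustration: the example names are copied from the lists above, the `largest_job` helper is hypothetical, and it assumes every example takes roughly the same time to build, which this PR does not measure.

```python
# Example names copied from the Before/After lists in this description.
MESH = ["nanostack-border-router", "mbed-os-example-mesh-minimal"]
MOVED = [
    "mbed-os-example-tls",
    "mbed-os-example-ble",
    "mbed-os-example-nvstore",
    "mbed-os-example-devicekey",
    "mbed-os-example-thread-statistics",
]
KEPT = [
    "mbed-os-example-sys-info",
    "mbed-os-example-cpu-usage",
    "mbed-os-example-cpu-stats",
    "mbed-os-example-error-handling",
    "mbed-os-example-bootloader",
    "mbed-os-example-blockdevice",
]

BEFORE = {"K64F": MESH + MOVED + KEPT, "K66F": MESH}
AFTER = {"K64F": KEPT, "K66F": MESH + MOVED}


def largest_job(assignment):
    """Return (target, example_count) for the target with the most examples.

    Hypothetical helper: assumes equal build time per example, so the
    target with the most examples is the one that finishes last.
    """
    target = max(assignment, key=lambda name: len(assignment[name]))
    return target, len(assignment[target])


print(largest_job(BEFORE))  # ('K64F', 13)
print(largest_job(AFTER))   # ('K66F', 7)
```

Under that assumption, the longest per-target job shrinks from 13 examples to 7.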

Pull request type

[ ] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[x] Test update
[ ] Breaking change

0xc0170 · 2018-11-22T08:33:32Z

How does this improve it? This is just a build, so whatever target is there should not block it in our test farm (we are only building the code)?

OPpuolitaival · 2018-11-22T09:03:05Z

@0xc0170 The exporter job is now slower than the greentea test job, and this is the slowest part of the exporter job.

0xc0170 · 2018-11-22T09:30:54Z

I approve this as a quick fix if it is really required, but I still fail to understand why building examples is affected by which target is selected.

Exporters are not tests on a device. We do not need an actual device, so we can fire off as many builds as we can.

bulislaw · 2018-11-22T09:37:49Z

That doesn't make sense unless something is really broken. How does balancing build-only jobs across different device types (?) help speed up the build? Do we have some weird limit on parallel builds for a single device type?

cmonr · 2018-11-22T11:49:22Z

@ARMmbed/mbed-os-maintainers @bulislaw

This isn't a problem with the new CI. This problem existed as a quirk of the old CI as well.
The last target that would be building in the old export CI was almost always the K64F, because when this list was parsed, the K64F target would get the most examples to test.

Fyi: #8246 (comment)

0xc0170 · 2018-11-22T14:08:16Z

From examples.py, are we using this function from do_compile: results = lib.compile_repos(config, args.toolchains, args.mcu, args.profile, examples) ?

How much faster does this PR make it? Can't we do something with the example script instead? If we add more devices to the examples.json file, and their example counts match the K64F's, we will face the same problem again.

Let's get some numbers and a path forward. As said previously, if this decreases the time significantly and we don't lose coverage, it can be a temporary workaround. As a fix, we need to look deeper into how to parallelize the examples test (using as many resources as we have/can).

bulislaw

As a temporary workaround.

OPpuolitaival · 2018-11-26T11:29:11Z

The job timed out; we need to restart the whole pipeline.

0xc0170 · 2018-11-26T11:55:05Z

The job timed out; we need to restart the whole pipeline.

Not just exporters? Is this still up to date? If yes, please restart it.

cmonr · 2018-11-26T15:49:27Z

The PR is active in CI.

studavekar · 2018-11-26T15:49:29Z

@ARMmbed/mbed-os-maintainers @bulislaw

This isn't a problem with the new CI. This problem existed as a quirk of the old CI as well.
The last target that would be building in the old export CI was almost always the K64F, because when this list was parsed, the K64F target would get the most examples to test.

Fyi: #8246 (comment)

@cmonr @OPpuolitaival Why is it taking more than 5 hours now, when it's supposed to complete within an hour? This looks like a workaround added for the slower exporter job. I do like the idea of splitting the build; however, we need to root-cause and fix the actual issue of the longer build time.

cmonr · 2018-11-26T15:50:14Z

we need to original longer build time.

@studavekar ?

studavekar · 2018-11-26T15:52:01Z

@cmonr Updated. We need to root-cause and fix why it takes 5 hours to complete.

cmonr · 2018-11-26T15:58:40Z

@studavekar The root cause of the exporter issue was apparently Windows symlinking, which was mentioned in other PRs.

That being said, I'm still in favor of this change.

How much faster does this PR make it? Can't we do something with the example script instead? If we add more devices to the examples.json file, and their example counts match the K64F's, we will face the same problem again.

@0xc0170 The problem is that parallelization happens on the Compiler/Target level, which means that when a single exporter node is allocated, it will run all examples for a given Target/Compiler pairing.

The main modification that could be done to the script could be to build the exported jobs in parallel, which might not be a quick fix. Outside of this, the Jenkins pipeline itself would need a way to split the list of examples, which definitely would not be a quick fix, and I'd rather have @ARMmbed/mbed-os-test focus on other things.
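
As a rough illustration of what splitting the list of examples could look like, here is a minimal Python sketch. `split_examples` is a hypothetical helper, not part of examples.py or the Jenkins pipeline, and the example names are placeholders.

```python
def split_examples(examples, workers):
    """Distribute examples round-robin into `workers` lists.

    Hypothetical helper for illustration only: each returned list could
    be handed to a separate build node for the same Target/Compiler pair.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return [examples[i::workers] for i in range(workers)]


# Placeholder names; a real run would use the examples configured for a target.
examples = ["example-%d" % i for i in range(7)]
chunks = split_examples(examples, 3)

for index, chunk in enumerate(chunks):
    print(index, chunk)
```

With 7 examples and 3 workers, the chunks hold 3, 2 and 2 examples, so no single node builds the whole list.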

studavekar · 2018-11-26T16:05:52Z

@cmonr The old exporter used symlinks too; it's difficult to comprehend how Windows symlinks would add 4 hours to the build time.

However, if that has been root-caused and fixed, then I am happy with this change.

cmonr · 2018-11-26T16:09:21Z

@studavekar I think there was a subtle distinction to be made with the old CI.
Iirc, the symlinks were created in Linux, archived, and extracted when the Exporter job was started.

In the new CI, I think the symlinks were being created in the node itself. @OPpuolitaival could probably answer better.

studavekar · 2018-11-26T16:15:20Z

@studavekar I think there was a subtle distinction to be made with the old CI.
Iirc, the symlinks were created in Linux, archived, and extracted when the Exporter job was started.

In the new CI, I think the symlinks were being created in the node itself. @OPpuolitaival could probably answer better.

No, that is not true. For Windows nodes, symlinks are created on the node itself, even in the old CI.

cmonr · 2018-11-26T16:17:06Z

@studavekar Oh. Well, that was my misunderstanding then.

OPpuolitaival · 2018-11-27T16:02:32Z

Exporters script is now fixed

0xc0170 · 2018-11-27T16:05:59Z

Exporters script is now fixed

What does this mean? Is this patch no longer needed, or?

mbed-ci · 2018-11-27T18:11:44Z

Test run: SUCCESS

Summary: 4 of 4 test jobs passed
Build number : 4
Build artifacts
Build logs

cmonr · 2018-11-27T19:26:18Z

@0xc0170 I think that was him saying that he was starting CI on the PR?

cmonr · 2018-11-27T19:27:31Z

@OPpuolitaival By the way, I don't appreciate that this PR was started when we're still working to get the remaining five PRs needed for code freeze, and how we found out it was started.

Would have at least appreciated a heads up that this was being started and/or tested.

OPpuolitaival · 2018-11-28T12:44:27Z

@cmonr Jenkins was totally idle the whole morning, Finland time. It should not be a big deal to warm it up a bit.

cmonr · 2018-11-28T19:32:49Z

@cmonr Jenkins was totally idle the whole morning, Finland time. It should not be a big deal to warm it up a bit.

@OPpuolitaival Ah, sorry about that. I wasn't aware it had gone idle. We/I can get a bit antsy when it comes to tracking exactly what is in CI, especially during a release. A heads up/comment somewhere next time would be appreciated.

Moving some examples testing from K64F to K66F

dcafca7

0xc0170 added the needs: review label Nov 22, 2018

0xc0170 requested a review from a team November 22, 2018 08:32

bulislaw approved these changes Nov 22, 2018

cmonr approved these changes Nov 26, 2018

cmonr added the release-version: 5.11.1 label Nov 26, 2018

cmonr added ready for merge and removed needs: review labels Nov 27, 2018

cmonr merged commit d6b2a1a into ARMmbed:master Nov 29, 2018

cmonr removed the ready for merge label Nov 29, 2018

0xc0170 mentioned this pull request Feb 11, 2019

K64, 840_DK targets; Gatt and security examples, removed LED #9644

Closed
