Skip to content

Moving some examples testing from K64F to K66F #8835

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 29, 2018

Conversation

OPpuolitaival
Copy link
Contributor

Description

After analysing our CI I noticed that:

  • Slowest part now is examples testing
  • Slowest part of example testing is K64F because that is configured so many examples

I figured out multiple ways to fix this problem inside CI. Still fasters way is to do balancing between K64F and K66F in example testing. I think that we are not missing meaningful testing coverage with this change. This make our CI bottle neck a bit faster and give more time for better solution.

Before:
K64F:
nanostack-border-router
mbed-os-example-mesh-minimal
mbed-os-example-tls
mbed-os-example-ble
mbed-os-example-nvstore
mbed-os-example-devicekey
mbed-os-example-thread-statistics
mbed-os-example-sys-info
mbed-os-example-cpu-usage
mbed-os-example-cpu-stats
mbed-os-example-error-handling
mbed-os-example-bootloader
mbed-os-example-blockdevice
K66F:
nanostack-border-router
mbed-os-example-mesh-minimal

After:
K64F:
mbed-os-example-sys-info
mbed-os-example-cpu-usage
mbed-os-example-cpu-stats
mbed-os-example-error-handling
mbed-os-example-bootloader
mbed-os-example-blockdevice

K66F:
nanostack-border-router
mbed-os-example-mesh-minimal
mbed-os-example-tls
mbed-os-example-ble
mbed-os-example-nvstore
mbed-os-example-devicekey
mbed-os-example-thread-statistics

Pull request type

[ ] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[x] Test update
[ ] Breaking change

@0xc0170 0xc0170 requested a review from a team November 22, 2018 08:32
@0xc0170
Copy link
Contributor

0xc0170 commented Nov 22, 2018

How does this improve it ? This is just a build, so no matter what target is there, should not block it in our test farm (only building the code) ?

@OPpuolitaival
Copy link
Contributor Author

@0xc0170 Exporter is now slower than greentea test job. And this is slowest part of exporters job

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 22, 2018

I approve as quick fix if really required but still failing to understand why building examples is affected by what target is selected.

Exporters are not tests on device - we do not need actual device, so can fire as many builds as we can.

@bulislaw
Copy link
Member

That doesn't make sense unless something is really broken. How balancing, build only, jobs through different devices types (?) helps speed the build? Do we have some weird limit on parallel builds for single device type?

@cmonr
Copy link
Contributor

cmonr commented Nov 22, 2018

@ARMmbed/mbed-os-maintainers @bulislaw

This isn't a problem with the new CI. This problem existed as a quirk of the old CI as well.
The last target that would be building in the old export CI was almost always the K64F, because when this list was parsed, the K64F target would get the most examples to test.

Fyi: #8246 (comment)

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 22, 2018

From examples.py, are we using this function from do_compile: results = lib.compile_repos(config, args.toolchains, args.mcu, args.profile, examples) ?

How faster this PR makes it? Can't we do something rather with the example script ? If we add more devices in examples.json file - might match the number of k64f, we will face the same problem again.

Let's get some numbers and path forward. As previously, if this decreases the number in great number, and we dont loose the coverage - can be temporary workaround. As a fix, we need to look deeper how to parallelize the examples test (using as much resources as we have/can).

Copy link
Member

@bulislaw bulislaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a temporary workaround.

@OPpuolitaival
Copy link
Contributor Author

Job timeouted.. need to restart whole pipeline

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 26, 2018

Job timeouted.. need to restart whole pipeline

Not just exporters? Is this sill up to date? If yes, please restart it

@cmonr
Copy link
Contributor

cmonr commented Nov 26, 2018

The PR is active in CI.

@studavekar
Copy link
Contributor

studavekar commented Nov 26, 2018

@ARMmbed/mbed-os-maintainers @bulislaw

This isn't a problem with the new CI. This problem existed as a quirk of the old CI as well.
The last target that would be building in the old export CI was almost always the K64F, because when this list was parsed, the K64F target would get the most examples to test.

Fyi: #8246 (comment)

@cmonr @OPpuolitaival why it's taking more than 5 hours now? when its supposed to complete within an hour. This looks like workaround which is been added to for slower exporter job. I do like the idea to split the build, however, we need to root cause fix actual issue of longer build time.

@cmonr
Copy link
Contributor

cmonr commented Nov 26, 2018

we need to original longer build time.

@studavekar ?

@studavekar
Copy link
Contributor

studavekar commented Nov 26, 2018

@cmonr updated. we need root cause and fix why its 5 hours to complete.

@cmonr
Copy link
Contributor

cmonr commented Nov 26, 2018

@studavekar The root cause of the exporter issue was apparently Windows symlinking, which was mentioned in other PRs.

That being said, I'm still in favor if this change.

How faster this PR makes it? Can't we do something rather with the example script ? If we add more devices in examples.json file - might match the number of k64f, we will face the same problem again.

@0xc0170 The problem is that parallelization happens on the Compiler/Target level, which means that when a single exporter node is allocated, it will run all examples for a given Target/Compiler pairing.

The main modification that could be done to the script could be to build the exported jobs in parallel, which might not be a quick fix. Outside of this, the Jenkins pipeline itself would need a way to split the list of examples, which definitely would not be a quick fix, and I'd rather have @ARMmbed/mbed-os-test focus on other things.

@studavekar
Copy link
Contributor

@cmonr old exporter used symlink too, it's difficult to comprehend windows symlink will bump the time to additional 4 hour.

However if that is been root caused and fixed then am happy with this change.

@cmonr
Copy link
Contributor

cmonr commented Nov 26, 2018

@studavekar I think there was a subtle distinction to be made with the old CI.
Iirc, the symlinks were created in Linux, archived, and extracted when the Exporter job was started.

In the new CI, I think the symlinks were being created in the node itself. @OPpuolitaival could probably answer better.

@studavekar
Copy link
Contributor

@studavekar I think there was a subtle distinction to be made with the old CI.
Iirc, the symlinks were created in Linux, archived, and extracted when the Exporter job was started.

In the new CI, I think the symlinks were being created in the node itself. @OPpuolitaival could probably answer better.

No that is not true, for windows node symlinks are created on the node itself even for old CI.

@cmonr
Copy link
Contributor

cmonr commented Nov 26, 2018

@studavekar Oh. Well, that was my misunderstanding then.

@OPpuolitaival
Copy link
Contributor Author

Exporters script is now fixed

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 27, 2018

Exporters script is now fixed

What does this mean? Not needed this patch or ?

@mbed-ci
Copy link

mbed-ci commented Nov 27, 2018

Test run: SUCCESS

Summary: 4 of 4 test jobs passed
Build number : 4
Build artifacts
Build logs

@cmonr
Copy link
Contributor

cmonr commented Nov 27, 2018

@0xc0170 I think that was him saying that he was starting CI on the PR?

@cmonr
Copy link
Contributor

cmonr commented Nov 27, 2018

@OPpuolitaival By the way, I don't appreciate that this PR was started when we're still working to get the remaining five PRs needed for code freeze, and how we found out it was started.

Would have at least appreciated a heads up that this was being started and/or tested.

@OPpuolitaival
Copy link
Contributor Author

@cmonr jenkins is totally in idle state whole morning in Finland time. Should not be big thing to warm up it a bit

@cmonr
Copy link
Contributor

cmonr commented Nov 28, 2018

@cmonr jenkins is totally in idle state whole morning in Finland time. Should not be big thing to warm up it a bit

@OPpuolitaival Ah, sorry about that. I wasn't aware it had gone idle. We/I can get a bit antsy when it comes to tracking exactly what is in CI, especially during a release. A heads up/comment somewhere next time would be appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants