Skip to content

Adjustment of netsocket tests to the lossy mesh network limitation. #11654

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 20, 2019

Conversation

tymoteuszblochmobica
Copy link
Contributor

@tymoteuszblochmobica tymoteuszblochmobica commented Oct 8, 2019

Description

Netsocket tests are modified according to loosy mesh network environment.
Some tests are skipped.
The failures need to be investigated by the infra team.
Local network setup, passes all tests

Pull request type

[x ] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@mikter
@SeppoTakalo
@michalpasztamobica

Release Notes

@ciarmcom
Copy link
Member

ciarmcom commented Oct 8, 2019

@tymoteuszblochmobica, thank you for your changes.
@SeppoTakalo @mikter @michalpasztamobica @ARMmbed/mbed-os-test @ARMmbed/mbed-os-ipcore @ARMmbed/mbed-os-maintainers please review.

Comment on lines 167 to 177
printf("MBED: IP address is '%s'\n", address ? address : "null");
#if MBED_CONF_TARGET_NETWORK_DEFAULT_INTERFACE_TYPE == MESH
uint8_t addr_hex[16];
stoip6(address, strlen(address), addr_hex);
if ((addr_hex[0] & 0xe0) != 0x20) {
printf("Adress is not globlal Please check Border Router TEST SKIP \n");
GREENTEA_TESTSUITE_RESULT(false);
}
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This same code block has now been copied on many places. Can it me just one function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could follow the way this is done with a function+macro for SKIP_IF_TCP_UNSUPPORTED?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, @tymoteuszblochmobica pointed out to me, that this is already only copied once per test directory. We can wrap it with a function, but we'll have to declare and define this function for every test directory (tcp, tls, dns, udp) anyway. We've been through similar discussion in #10037 and we agreed that there is no function sharing between different test suites, unfortunately.

So I'm afraid this cannot be just one function.

Comment on lines 228 to 231
#if MBED_CONF_TARGET_NETWORK_DEFAULT_INTERFACE_TYPE != MESH
Case("TCPSOCKET_RECV_100K", TCPSOCKET_RECV_100K),
Case("TCPSOCKET_RECV_100K_NONBLOCK", TCPSOCKET_RECV_100K_NONBLOCK),
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this makes sense?

Does it really help if we start skipping test cases?

Copy link
Contributor Author

@tymoteuszblochmobica tymoteuszblochmobica Oct 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a big challenge to pass 100k recv in mesh.

After increasing timeouts up to 20 minutes tests mostly pass but still sometimes fails.
Smaller timeouts causes 100% fails

Its rfc864 tests so receiving DUT cant slow down transmision if mesh network is unable to deliver packets on time.

So problem with TCP retransmision traffic will arise.

Maybe the solution it to keep this increased timeouts for mesh and run tests only in RF box under no other devices "RF traffic "and "clear channel" condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test skip reverted. It is successful under local mesh BR. Testing mesh performance under heavy RF traffic is beyond the scope of netsockets tests.

Comment on lines 194 to 196
#if MBED_CONF_TARGET_NETWORK_DEFAULT_INTERFACE_TYPE != MESH
Case("UDPSOCKET_ECHOTEST_BURST_NONBLOCK", UDPSOCKET_ECHOTEST_BURST_NONBLOCK),
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something we should not skip.
Its basically close to what DTLS handshake is sending. Not accurate, but still emulates it.
Now we are just hiding problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test skip reverted. It is successful under local mesh BR. Testing mesh performance under heavy RF traffic is beyond the scope of netsockets tests.

Comment on lines 199 to 201
#if MBED_CONF_TARGET_NETWORK_DEFAULT_INTERFACE_TYPE != MESH
Case("UDPSOCKET_ECHOTEST_BURST", UDPSOCKET_ECHOTEST_BURST),
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.. I don't see the point of skipping this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test skip reverted. It is successful under local mesh BR. Testing mesh performance under heavy RF traffic is beyond the scope of netsockets tests.

Comment on lines 52 to 54
static const int pkt_sizes[PKTS] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, \
100, 200, 300
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then who tests that those bigger packets can pass if we are not going to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test skip reverted. It is successful under local mesh BR. Testing mesh performance under heavy RF traffic is beyond the scope of netsockets tests.

@0xc0170
Copy link
Contributor

0xc0170 commented Oct 18, 2019

@SeppoTakalo @tymoteuszblochmobica What is the status for this PR? Looks like the review has not been completed

@tymoteuszblochmobica tymoteuszblochmobica force-pushed the mesh branch 2 times, most recently from 472d254 to 512e237 Compare October 28, 2019 12:59
@tymoteuszblochmobica
Copy link
Contributor Author

Seppo can you review again?

@tymoteuszblochmobica tymoteuszblochmobica force-pushed the mesh branch 2 times, most recently from 3703305 to 7c8abc5 Compare October 28, 2019 13:05
uint8_t addr_hex[16];
stoip6(address, strlen(address), addr_hex);
if ((addr_hex[0] & 0xe0) != 0x20) {
printf("Adress is not globlal Please check Border Router TEST SKIP \n");
Copy link

@AnttiKauppila AnttiKauppila Nov 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix typos adress -> Address and globlal -> global (+ 2 other files where you have copy pasted same mistake)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 6, 2019

CI started

@mbed-ci
Copy link

mbed-ci commented Nov 6, 2019

Test run: FAILED

Summary: 1 of 5 test jobs failed
Build number : 1
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_greentea-test

@michalpasztamobica
Copy link
Contributor

@tymoteuszblochmobica , apparently your check if the test is running for mesh network didn't work, although I can't see any obvious problem with it.
The logs all say:

[1573059766.57][CONN][RXD] MBED: IP address is '192.168.64.108'
[1573059766.76][CONN][RXD] Address is not global Please check Border Router TEST SKIP

I doubt that our CI is running any mesh network tests. This shouldn't trigger.

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 7, 2019

@tymoteuszblochmobica Please talk to @VeliMattiLahtela

@tymoteuszblochmobica
Copy link
Contributor Author

tymoteuszblochmobica commented Nov 7, 2019

Strange
Found in NetworkInterfaceDefaults.cpp
/* Finally the dispatch from the JSON default interface type to the specific

  • subclasses. It's our job to configure - the default NetworkInterface is
  • preconfigured - the specific subtypes' defaults are not (necessarily).
    */
    #define ETHERNET 1
    #define WIFI 2
    #define MESH 3
    #define CELLULAR 4
    #if MBED_CONF_TARGET_NETWORK_DEFAULT_INTERFACE_TYPE == 3 or prvevious
    #define MESH 3
    in main.cpp for each test is working.

@SeppoTakalo
Copy link
Contributor

What strange there is about that?
This is how default network interface is selected.

For example test configuration on mesh, see: https://github.com/ARMmbed/mbed-os/blob/master/tools/test_configs/6lowpanInterface_router.json#L26..L36

stoip6(address, strlen(address), addr_hex);
if ((addr_hex[0] & 0xe0) != 0x20) {
printf("Address is not global Please check Border Router TEST SKIP \n");
GREENTEA_TESTSUITE_RESULT(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be TEST_SKIP_MESSAGE("Mesh not working")

I don't really know the difference why

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to skip the tests? We just want them to fail with a reasonable message, pointing to the border router issue, to avoid unnecessary debugging/bug reports.

@tymoteuszblochmobica
Copy link
Contributor Author

Fixed

#if MBED_CONF_TARGET_NETWORK_DEFAULT_INTERFACE_TYPE == MESH
uint8_t addr_hex[16];
stoip6(address, strlen(address), addr_hex);
if ((addr_hex[0] & 0xe0) != 0x20) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are these why is there a check for address for mesh

To me this looks like bug in code that is circumvented in tests?

In applications we should wait for Global UP event and then everything should work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikter , this PR is now only trying to improve error reporting due to mesh network in RAAS being heavily loaded and often unable to perform UPD tests. We can tell that this situation occurs by observing the IP address our DUT got. If we did not receive a global address, the test will fail anyway, so we just want to minimize the amount of unnecessary debugging and print a reasonable error message suggesting

Now that you mentioned it, perhaps it is neater to wait for GLOBAL_UP event instead of checking if the address is global or not, but the essence of the changes remains the same - don't run the tests if the global connection failed and print a reasonable error message, instead.

@tymoteuszblochmobica , can we try observing the events instead of observing the IP address? I imagine we could set a blocking semaphore after connect() returns and only release it in a GLOBAL_UP event callback. If the semaphore is not released in time, the test will timeout. If the last message printed is "Waiting for GLOBAL_UP" I think that will be a clear indication that there is a connectivity problem. Does this make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with GLOBAL_UP wait.

Copy link

@mikter mikter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test checking that address is 20**** meaning some default public IPv6 address and if not skip it.

This should be fixed in mesh API if there is a bug that ULA prefix is global

@mikter
Copy link

mikter commented Nov 8, 2019

If you check file meshinterfacenanostack or any other mesh there is a check when local or Global is UP and there Global UP is given even for ULA prefixes

How is mesh.connect supposed to work wait until GLOBAL_UP or LOCAL_UP. if it should wait for GLOBAL_UP the bug is in the mesh apis

if it should stop even at LOCAL_UP the bug is in test case and it should follow the same check as mesh_minimal_application and do following while loop where it waits the Global Address. not failure

@michalpasztamobica
Copy link
Contributor

Regarding what connect() should do... The connect() function will try to bringup() a particular interface. The bringup() implementations (Thread, for example) will block on the connect_semaphore, which is released in Nanostack::Interface::network_handler whenever the status is either CONNECTED, LOCAL or GLOBAL. So any of them means that connect() succeeded. This makes sense to me. The network may not always require global connectivity. This is also in line with @kjbracey-arm 's comment in #11714 - a local IPv6 address is perfectly valid, so to me connect() should return when local connectivity is present.

I agree with you, @mikter , that it is better to wait for GLOBAL_UP event until test times out. @tymoteuszblochmobica , do you also agree? Can you modify the code so tests wait until GLOBAL_UP is present?

@kjbracey
Copy link
Contributor

kjbracey commented Nov 8, 2019

There's no clearly right behaviour here.

It's similar to lwIP's dilemma about what do do when asked for IPv4+IPv6, preferring IPv6, and it gets IPv4 first. How long should "connect" wait for IPv6 before returning? Should it wait, even?

There we did decide to make it wait a bit longer to see if IPv6 came up, on the assumption that apps might just be written to do a simple "connect" then check "IPv4 or IPv6 address"?

You could do the same thing here - wait a bit longer after the local up, to see if we get a global address, for simple apps that are going to check the address immediately after connect return. It would be consistent.

@tymoteuszblochmobica
Copy link
Contributor Author

Replaced with GLOBAL_UP waiting loop.
@mikter Is it OK now?

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 15, 2019

@mikter Please review

@tymoteuszblochmobica
Copy link
Contributor Author

Rebased

Copy link
Contributor

@michalpasztamobica michalpasztamobica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@0xc0170 , it seems that this is now also ready to go (as soon as the current release is over). Would you please adjust the labels?

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 20, 2019

CI started

@mbed-ci
Copy link

mbed-ci commented Nov 20, 2019

Test run: SUCCESS

Summary: 5 of 5 test jobs passed
Build number : 2
Build artifacts

@0xc0170 0xc0170 removed the needs: CI label Nov 20, 2019
@0xc0170 0xc0170 merged commit 48f38a4 into ARMmbed:master Nov 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants