Skip to content

rand() and srand() are made sparse to not have same block. #9820

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

deepikabhavnani
Copy link

Description

Random number generation API's - rand() and srand() are made sparse to not have same block.

Failure was noted when 2 threads getting same random number were trying to write to same block and rechecking resulted in error (since block was over-written by another thread).
Random number generation in case of IAR 8, requires TLS support and without TLS it may result into same random number since same main thread space is used for all threads.

This may happen in other cases as well when rand() gives same values for for 2-threads.
Future solution recommendation will be:
Instead of checking the same random data written, checksum logic can be used in future. Threads are allowed to overwrite the data (block commands are not in mutex) in test, so instead of verifying the exact data some checksum of data should be checked to make sure data was not jumbled up 2-threads accessing it.

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@ARMmbed/mbed-os-storage

@ARMmbed/mbed-os-maintainers - Commit in applied to #9818 PR to test with IAR 8 and here to test with IAR 7 as well.

Please consider merging this when approved.

Random number generation in case of IAR 8, requires TLS support.
Thread local storage is not part of Mbed OS, test is updated to have random
numbers sparse, but in future random number creation should be moved to main
thread, or use some other logic for randomization instead of rand() call.
@deepikabhavnani
Copy link
Author

Started CI to test the fix

@mbed-ci
Copy link

mbed-ci commented Feb 23, 2019

Test run: SUCCESS

Summary: 12 of 12 test jobs passed
Build number : 1
Build artifacts

@offirko
Copy link
Contributor

offirko commented Feb 24, 2019

PR #9650 (delivered 2 weeks ago) fixed block overwrite by different threads.
Did the "Failure was noted" happen on the modified test?
#9650

If the failure did not happen on the updated test then Pls see my remark to Martin's question at #9650, and let me know if its sufficient.

AFAIK, seeding at +1 or +2 shouldn't make a difference, because a +1 difference in seed results in a "random" difference in the rand series (and this test doesn't really require a random series, only a set of 50 different numbers).

@deepikabhavnani
Copy link
Author

Error was noted after #9650. I just tried different seed as +1 was not working with IAR8 but the proper solution I will leave it to storage team as stated in description. And +2 worked for all compilers so it will be stable for some time

‘Time’ should not be used for rand it will have its own issue and with time we will have sec precision.

Generating a random number table for threads to use will be a good option.

@kjbracey
Copy link
Contributor

kjbracey commented Feb 25, 2019

Simplest solution is to make sure the threads can never collide by interleaving their accesses:

int threaded_rand_number = (rand() * SPIF_TEST_NUM_OF_THREADS) + this_thread_number;
bd_addr_t block = (threaded_rand_number * block_size) % block_device.size();

(This might be too simplisitic if trying to stress test the actual storage device and its controller - the striped access pattern wouldn't fully exercise a cache - but for a general a test of our software and all the block device drivers should be fine).

@offirko
Copy link
Contributor

offirko commented Feb 25, 2019

@deepikabhavnani - can you please supply logs for the failing in IAR8, probably requires a JIRA as well

@0xc0170
Copy link
Contributor

0xc0170 commented Feb 25, 2019

@deepikabhavnani - can you please supply logs for the failing in IAR8, probably requires a JIRA as well

This one failed http://mbed-os-ci.s3-website-eu-west-1.amazonaws.com/?prefix=jenkins-ci/ARMmbed/mbed-os/mbed-os-ci/PR-9811/artifacts/9811/1/greentea-test/1239/FAIL/, test can be found #9811 (run twice to confirm the failures)

@offirko
Copy link
Contributor

offirko commented Feb 25, 2019

@0xc0170 : The change suggested by @deepikabhavnani uses a different series of random numbers. It resolves the problem for IAR8 but might later on introduce more conflicts under a different setup.
The change suggested by @kjbracey-arm will resolve the conflicts all together. @theamirocohen is currently working on removing the random block test for FlashIAP because it often has sectors of different sizes and the test doesn't support this scenario. I believe for the long run the plan is to remove this random block test for SPIF and QSPIF as well, because they too often have sectors of different sizes.

@deepikabhavnani
Copy link
Author

Closing this PR and will wait for @theamirocohen to add the fix.

@0xc0170
Copy link
Contributor

0xc0170 commented Feb 26, 2019

The change suggested by @kjbracey-arm will resolve the conflicts all together. @theamirocohen is currently working on removing the random block test for FlashIAP because it often has sectors of different sizes and the test doesn't support this scenario. I believe for the long run the plan is to remove this random block test for SPIF and QSPIF as well, because they too often have sectors of different sizes.

OK, but IAR8 feature branch is now ready and includes this fix. Let me verify this is OK with you to go ahead and fix on master asap.

@theamirocohen
Copy link
Contributor

@0xc0170
#9721 takes care of both issues (removing the random block test for FlashIAP & @kjbracey-arm fix for random issues).
For SPIF and QSPIF components test they will be soon merged into the general block device tests.

@deepikabhavnani
Copy link
Author

@0xc0170
Please go ahead with fix in #9721. I will add a PR to revert change in IAR 8 feature branch.

@theamirocohen
Copy link
Contributor

@deepikabhavnani @0xc0170
#9855 handles the SPIF and QSPIF components test merge to general block device tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants