Skip to content

Add thread safety to CRC class #7781

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 29, 2018
Merged

Conversation

deepikabhavnani
Copy link

Description

Add thread safety to CRC class. Only commit for this PR is 839f46a

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Feature
[ ] Breaking change

Dependent on : #7735

CC @geky

}
return 0;
unlock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The series of if statements was cleaner IMO. Not a block, but in the future this may be better written with a series of unlocks in each return statement, or a goto to a cleanup block if the cleanup is too expensive.

Generally gotos are considered harmful, but this is one of the places where a goto makes the intention clearer than a series of nested if statements.

Copy link
Author

@deepikabhavnani deepikabhavnani Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree, had a look again. I would prefer this way as it looks neat, instead of repeated cleanup code / goto. I would have agreed to your point if if statements were 4/5 level deeper.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with @geky, I personally prefer when the expected/successful path of the algorithm is not buried inside nested if statements.

    int32_t error = compute_partial_start(crc);
    if (error) { goto cleanup; }
    
    error = compute_partial(buffer, size, crc);
    if (error) { goto cleanup; }

    error = compute_partial_stop(crc);
    if (error) { goto cleanup; }

    return 0;

cleanup:
    *crc = 0;
    return error;

Copy link
Contributor

@geky geky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, nice test 👍

@cmonr
Copy link
Contributor

cmonr commented Aug 13, 2018

@deepikabhavnani Please take a look at some of the docs. Looks like Travis is back to normal.

Copy link
Member

@pan- pan- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure to understand the design or the use case of the change.

  • Global/Singleton lock will induce unnecessary contention if multiples different CRCs are computed in parallel.
  • CRCs are usually not calculated from different sources (do you get some bytes from a thread and other bytes from others threads ?); it is usually not a shared resource either. Why would it be thread safe by default ?

I would have split this PR into two prs: one that handle optimized CRC and the other that handles the locking part.

@@ -22,6 +22,8 @@ namespace mbed {
/** \addtogroup drivers */
/** @{*/

SingletonPtr<PlatformMutex> mbed_crc_mutex;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a singleton ? It is not possible to have a mutex per MbedCRC instance ?
If the mutex is global then all crc calculations are serialized across threads; I'm not sure this is what we want.

Copy link
Author

@deepikabhavnani deepikabhavnani Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locking is needed for HW, since we will have one HW module performing CRC computation for all the instances. I understand this is an overhead for software only computations or combination of software and hardware.

We can have lock/unlock only in case of hardware, does that look like correct solution or will cause confusion?

*crc = 0;
return status;
int32_t status = 0;
lock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use RAII lock objects ? It would reduce the code size and ensure no unlock are forgotten.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes RAII reduce the code size?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mostly referring to the line of code required but in the final binary; we can observe code size reduction too as the compiler is very good at optimizing destruction of objects on the stack.

Here you go: https://godbolt.org/g/nXarPA

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it will behave the same if you have a cleanup label to jump to: https://godbolt.org/z/rgagh-

But good to know gcc can do that automatically for RAII objects. It's a shame it can't figure out that the unlock() calls can be shared.

}
return 0;
unlock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with @geky, I personally prefer when the expected/successful path of the algorithm is not buried inside nested if statements.

    int32_t error = compute_partial_start(crc);
    if (error) { goto cleanup; }
    
    error = compute_partial(buffer, size, crc);
    if (error) { goto cleanup; }

    error = compute_partial_stop(crc);
    if (error) { goto cleanup; }

    return 0;

cleanup:
    *crc = 0;
    return error;

@@ -264,6 +288,20 @@ class MbedCRC {
return width;
}

/** Acquire exclusive access to CRC hardware/software
*/
virtual void lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this virtual ?
Note: same question for the destructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm guessing to match the other driver classes.

Unrelated, but what are your thoughts on a "Lockable" abstract class to standardize this interface?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the polymorphic use case for such thing ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make the RAII mutex generic for these classes for one.

I was more thinking of this as extra documentation. And that way we make the API is standard.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use case in my mind was: application if sure of using software only CRC can overwrite these locks to be empty functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make the RAII mutex generic for these classes for one.

I'm slow today; I still don't see the use case; these class lock and unlock functions aren't supposed to be internal and self contained ?

@deepikabhavnani
Copy link
Author

Global/Singleton lock will induce unnecessary contention if multiples different CRCs are computed in parallel.

Global/Singleton lock is needed when using HW for CRC computation, we cannot do different algorithms in parallel with HW

CRCs are usually not calculated from different sources (do you get some bytes from a thread and other bytes from others threads ?); it is usually not a shared resource either. Why would it be thread safe by default ?

Agree for software point of view, but it is a share resource in case of HW. Please see https://github.com/ARMmbed/mbed-os/blob/master/targets/TARGET_Freescale/TARGET_MCUXpresso_MCUS/TARGET_MCU_K64F/mbed_crc_api.c and newly added test case https://github.com/ARMmbed/mbed-os/pull/7781/files#diff-add91f4d5e17cbab3ddf004660cf1d4bR137

CRC computation is splitted in 3 parts start compute_partial and stop, if another thread configures hw with different polynomial in between than results won't be correct

@pan-
Copy link
Member

pan- commented Aug 14, 2018

I do understand that we need to protect the hardware resource that compute CRC from concurrent access with a single mutex as access to that resource must be serialized. However, what I do not understand is why this requirement impacts software computation of the CRC. lock and unlock should do nothing if the mode is not hardware.

CRC computation is splitted in 3 parts start compute_partial and stop, if another thread configures hw with different polynomial in between than results won't be correct

It ain't pretty but it make sense .

@deepikabhavnani
Copy link
Author

lock and unlock should do nothing if the mode is not hardware.

👍 Will correct the behavior for software in that case and change lock/unlock functions to be non-virtual.

@geky
Copy link
Contributor

geky commented Aug 14, 2018

It's not entirely clear, but the HAL CRC API is designed to keep the current state of the CRC inside hardware registers. I'm guessing this is because fetching the CRC value from the hardware adds a cost that isn't always needed. @scartmell-arm

Here's an example from the user's perspective:

void super_duper_fast_barcode_scanner() {
    Scanner scanner;
    MbedCRC<0xabcdabcd, 32> crc;

    crc.lock(); // lock so no one else messes with the crc hardware while we're using it
    crc.compute_partial_start();

    for (int i = 0; i < 100000000; i++) { // barcode is really long
        uint8_t c = scanner.scan(); // scans one character at a time
        crc.compute_partial(&c, 1);
    }

    uint32_t crc;
    crc.compute_partial_stop(&crc);
    crc.unlock();

    MBED_ASSERT(crc == 0);
}

If this example is correct, the lock/unlock functions are very much a part of the public API. This matches the current API for Serial/SPI and other streaming classes were function calls can end up with interspersed data.

Actually looking at this example raises a few other questions

  1. Why have crc be an argument for start and compute partial? Isn't it only available at the stop stage?
  2. Can start/stop lock/unlock the hardware for the user?

@pan-
Copy link
Member

pan- commented Aug 15, 2018

I thought again about it and I came to the conclusion that lock/unlock functions should not be exposed to the application side as they are redundant and error prone:

When a CRC is computed by the hardware, the hardware lock must be held for the entirety of the computation. The following sequence would be a programming error if hardware crc is used:

crc.lock(); 
crc.compute_partial_start(...);
crc.unlock()

// get the data

crc.lock(); 
crc.compute_partial(...);
crc.compute_partial_stop(...);
crc.unlock()

The lock should not be released until the computation ends.

Therefore when a CRC is computed with hardware, the function compute_partial_start should acquire the lock and the function compute_partial_stop should release it. If an error happen in compute_partial; the lock can be released too.

@deepikabhavnani
Copy link
Author

I don't agree with this approach. Scenario above will be a programming error if hardware CRC is used and user should be aware of that. Hence user should have control of lock/unlock.

Also in all the drivers, we give that advantage to users. I would like to be consistent with all the drivers.
https://github.com/ARMmbed/mbed-os/blob/master/drivers/SPI.h#L140
https://github.com/ARMmbed/mbed-os/blob/master/drivers/I2C.h#L148

@pan-
Copy link
Member

pan- commented Aug 15, 2018

I don't think the comparison with classes such as SPI is right: the SPI class doesn't needs to be locked to be used as locking is internal and the class itself keeps enough data to reconfigure the hardware whenever it acquires the bus. The CRC class doesn't offer that.

In CRC - unlike the SPI API - the flow of a transaction is well defined with prologue and epilogue functions that must be called by the user. Why not use these functions to lock and unlock the mutex associated with the hardware ? They have to be called anyway.

That would make the class thread safe like the other drivers we propose; it isn't at the moment.

@deepikabhavnani
Copy link
Author

deepikabhavnani commented Aug 15, 2018

the SPI class doesn't needs to be locked to be used as locking is internal

I believe this is not correct, though you can reconfigure hardware during acquire bus, external locks are still needed. https://github.com/ARMmbed/mbed-os/blob/master/drivers/SPI.h#L65

In case of SD driver as well https://github.com/ARMmbed/sd-driver/blob/master/SDBlockDevice.h#L215,
we have locks as few operations on SPI bus should be done sequentially before device can free the bus.

Consider a application where I am sure only single type of CRC (say CRC-16) for two different modules. In this case start will be called just once during initialization and combination of compute_partial and stop multiple times by both modules and guarded.

If user needs complete serialization then compute should be used. Else I don;t see purpose of other 3 API's.

@deepikabhavnani
Copy link
Author

deepikabhavnani commented Aug 15, 2018

Consider a application where I am sure only single type of CRC (say CRC-16) for two different modules. In this case start will be called just once during initialization and combination of compute_partial and stop multiple times by both modules and guarded.

I did testing for this use case and it didn't worked from HAL side. HAL API;s does not pass the previous CRC (which could be used to configure the initial seed).

@geky - Please note compute_partial for LFS will not work for below sequence.

uint32_t crc = 0xffffffff;
compute_partial(&test, sizeof(test), &crc);
crc = 0xffffffff;
compute_partial(&test, sizeof(test), &crc);

It has to be

uint32_t crc = 0xffffffff;
compute(&test, sizeof(test), &crc);
crc = 0xffffffff;
compute(&test, sizeof(test), &crc);

Since we can't do much with compute_partial in hardware without compute_start. It make sense to have 'lock' and 'unlock' as private and set lock in start and only unlock in stop/error. Unless we are planning to update hal to allow setting seed value in compute_partial.

Copy link
Member

@pan- pan- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the hal would change to accept an opaque state in the function that handles partial computation.

That would allow chunked parallel or interleaved hardware computation of different type of CRCs. As it is right now, the following use case is still broken without changes in the HAL (it has nothing to do with locking ...):

MbedCRC crc_a;
MbedCRC crc_b;

crc_a.compute_partial_start(...);
crc_b.compute_partial_start(...);

crc_a.compute_partial(...);
crc_b.compute_partial(...);

crc_a.compute_partial_stop(...);
crc_b.compute_partial_stop(...);

@scartmell-arm @bulislaw

@cmonr cmonr requested review from a user and bulislaw August 16, 2018 12:45
@cmonr cmonr removed the needs: work label Aug 16, 2018
@bulislaw
Copy link
Member

Ideally the hal would change to accept an opaque state in the function that handles partial computation.
That would allow chunked parallel or interleaved hardware computation of different type of CRCs. As it is right now, the following use case is still broken without changes in the HAL (it has nothing to do with locking ...):

We considered it, but it wasn't an usecase. The driver should be able to protect the access to HW without bothering user to know when it's necessary to lock the object.

@cmonr
Copy link
Contributor

cmonr commented Aug 21, 2018

/morph build

@cmonr
Copy link
Contributor

cmonr commented Aug 21, 2018

@deepikabhavnani Please take a a look at the CI failures.

Thread safety is added to serialize the hardware CRC and will not
impact the software CRC.
@deepikabhavnani
Copy link
Author

deepikabhavnani commented Aug 21, 2018

@deepikabhavnani Please take a a look at the CI failures.

Forgot to update test application (lock/unlock API's are private in MbedCRC class)

/morph build

@geky
Copy link
Contributor

geky commented Aug 21, 2018

We considered it, but it wasn't an usecase. The driver should be able to protect the access to HW without bothering user to know when it's necessary to lock the object.

How in the world could this work in a multithreaded system? Is only one thread allowed to perform CRC at a time?

@cmonr
Copy link
Contributor

cmonr commented Aug 22, 2018

@bulislaw Would you mind answering @geky's question?

@deepikabhavnani
Copy link
Author

@cmonr @geky - With current limitation in hal, yes only one thread allowed to perform CRC at a time.
However, this should be considered in future CRC hal updates.

@deepikabhavnani
Copy link
Author

@cmonr - Can we start build on this?

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 27, 2018

/morph build

@mbed-ci
Copy link

mbed-ci commented Aug 27, 2018

Build : SUCCESS

Build number : 2932
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/7781/

Triggering tests

/morph test
/morph export-build
/morph mbed2-build

@mbed-ci
Copy link

mbed-ci commented Aug 27, 2018

@mbed-ci
Copy link

mbed-ci commented Aug 28, 2018

@cmonr
Copy link
Contributor

cmonr commented Aug 28, 2018

Will need to retest when CI load is lower.

@NirSonnenschein
Copy link
Contributor

/morph test

1 similar comment
@cmonr
Copy link
Contributor

cmonr commented Aug 29, 2018

/morph test

@mbed-ci
Copy link

mbed-ci commented Aug 29, 2018

Test : SUCCESS

Build number : 2697
Test logs :http://mbed-os-logs.s3-website-us-west-1.amazonaws.com/?prefix=logs/7781/2697

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants