Nuvoton: Fix mbedtls crypto ECC AC management failed #8985

ccli8 · 2018-12-06T08:44:57Z

Description

In mbed-os 5.11, Mbed TLS doesn't guarantee that mbedtls_internal_ecp_init()/mbedtls_internal_ecp_free() are paired. When they are not paired, the system hangs in Nuvoton's ECC AC management. This PR mainly lifts that restriction by narrowing the ECC AC open period to just the actual ECC AC operation. With this modification, pairing mbedtls_internal_ecp_init()/mbedtls_internal_ecp_free() is no longer required; for example, multiple mbedtls_internal_ecp_init() calls followed by a single mbedtls_internal_ecp_free() call are allowed.

Related target

NUMAKER_PFM_NUC472
NUMAKER_PFM_M487/NUMAKER_IOT_M487

Related issue

#8927

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Mbed TLS doesn't guarantee that mbedtls_internal_ecp_init()/mbedtls_internal_ecp_free() are paired. To avoid simultaneous operations on the same ECC accelerator, we narrow the ECC accelerator open period to just the actual ECC accelerator operation in internal_run_eccop()/internal_run_modop().
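
A minimal sketch of the narrowing described in this commit message, assuming a single ownership flag for the accelerator. All `sketch_` names are hypothetical and do not come from the Nuvoton port. Unlike the actual patch, which waits for the accelerator, this sketch simply reports "busy" when the accelerator is already in use.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ownership flag for the single ECC accelerator. */
static atomic_flag sketch_narrow_ecc_busy = ATOMIC_FLAG_INIT;

/* init/free no longer open or close the accelerator, so calling them
 * unpaired (for example init, init, free) cannot leave it locked. */
int sketch_narrow_ecp_init(void) { return 0; }
void sketch_narrow_ecp_free(void) { }

/* Placeholder for a real point operation; always succeeds. */
int sketch_narrow_sample_op(void) { return 0; }

/* The accelerator is held only for the duration of one operation. */
int sketch_narrow_run_eccop(int (*op)(void))
{
    /* test_and_set returns the previous value: true means already held. */
    if (atomic_flag_test_and_set(&sketch_narrow_ecc_busy)) {
        return -1; /* assumption: report busy instead of spinning */
    }
    int ret = op();
    atomic_flag_clear(&sketch_narrow_ecc_busy);
    return ret;
}
```

With this shape, two init calls followed by one free leave the flag clear, so a later operation still acquires the accelerator.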

ciarmcom · 2018-12-06T10:00:28Z

@ccli8, thank you for your changes.
@ARMmbed/mbed-os-crypto @ARMmbed/mbed-os-maintainers please review.

0xc0170 · 2018-12-06T10:30:31Z

@ccli8 Does this fix issue #8927 completely, or are there still failures reported?

yanesca · 2018-12-06T10:46:49Z

features/mbedtls/targets/TARGET_NUVOTON/TARGET_M480/ecp/ecp_internal_alt.c

-    MBEDTLS_MPI_CHK(ecc_done ? 0 : -1);
-
+
+    MBEDTLS_MPI_CHK(ecc_done ? 0 : MBEDTLS_ERR_SSL_HW_ACCEL_FAILED);


MBEDTLS_ERR_SSL_HW_ACCEL_FAILED is for hardware accelerators that do TLS record processing, could we please return MBEDTLS_ERR_PLATFORM_HW_ACCEL_FAILED instead?

@yanesca I fixed it in 406d480.
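
A small sketch of how the requested error code propagates through a check macro of the same shape as MBEDTLS_MPI_CHK. The `SKETCH_` names are hypothetical stand-ins so the snippet compiles without the Mbed TLS headers, and the numeric value is the one I believe platform.h uses; check your own header before relying on it.

```c
#include <stdbool.h>

/* Assumed to match MBEDTLS_ERR_PLATFORM_HW_ACCEL_FAILED in platform.h. */
#define SKETCH_ERR_PLATFORM_HW_ACCEL_FAILED (-0x0070)

/* Same pattern as MBEDTLS_MPI_CHK: store the result in ret and jump to
 * the cleanup label on any non-zero value. */
#define SKETCH_MPI_CHK(f)       \
    do {                        \
        if ((ret = (f)) != 0)   \
            goto cleanup;       \
    } while (0)

/* Returns 0 when the accelerator reported completion, otherwise the
 * platform-level hardware failure code. */
int sketch_check_ecc_done(bool ecc_done)
{
    int ret = 0;

    SKETCH_MPI_CHK(ecc_done ? 0 : SKETCH_ERR_PLATFORM_HW_ACCEL_FAILED);
    /* Further accelerator steps would follow here. */

cleanup:
    return ret;
}
```

Returning a named error code instead of -1 lets callers distinguish a hardware failure from other errors.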

yanesca · 2018-12-06T10:46:57Z

features/mbedtls/targets/TARGET_NUVOTON/TARGET_M480/ecp/ecp_internal_alt.c

-    MBEDTLS_MPI_CHK(ecc_done ? 0 : -1);
-
+
+    MBEDTLS_MPI_CHK(ecc_done ? 0 : MBEDTLS_ERR_SSL_HW_ACCEL_FAILED);


MBEDTLS_ERR_SSL_HW_ACCEL_FAILED is for hardware accelerators that do TLS record processing, could we please return MBEDTLS_ERR_PLATFORM_HW_ACCEL_FAILED instead?

@yanesca As above

…ED when ECC H/W acceleration is failed

MBEDTLS_ERR_SSL_HW_ACCEL_FAILED is for hardware accelerators that do TLS record processing. Replace it with MBEDTLS_ERR_PLATFORM_HW_ACCEL_FAILED when ECC H/W acceleration fails.

ccli8 · 2018-12-07T02:22:35Z

Does this fix issue #8927 completely, or are there still failures reported?

@0xc0170 No. This PR just fixes the TLS connection failure (simple-mbed-cloud-client-tests-dev_mgmt-connect/Initialize Simple PDMC or simple-mbed-cloud-client-tests-dev_mgmt-update/Initialize Simple PDMC) on mbed-os 5.11 rc or the master branch. The other PDMC failures are still being investigated.

kjbracey · 2018-12-07T08:09:15Z

In mbed-os 5.11, Mbed TLS doesn't guarantee mbedtls_internal_ecp_init()/mbedtls_internal_ecp_free() are paired.

Is that by design, or an error? It looks to me like the intent is that they be paired, and by code inspection they seem to be paired within a single operation.

Is this a thread-safety issue? If it is, then taking a mutex in init and releasing it in free might be the way to go?

Looking at the patch, you are implementing your own locking on the internal op. That locking is unpleasant because it isn't using a mutex, just busy-waiting. That busy-waiting will fail the instant anyone uses different thread priorities. It would be better to use a mutex, but then that may be a bit heavyweight if it's in an inner loop. (Not sure if it is).

Given that contention on the ECP unit is likely to be low, I think you really would be best off just taking a mutex in init, and releasing it in free, minimising the overhead.
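
A rough sketch of the "take a mutex in init, release it in free" suggestion. It uses a POSIX mutex as a stand-in for whatever RTOS mutex the port would use, and all `sketch_` names are hypothetical. It assumes init and free are always called as a pair from the same thread.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for an RTOS mutex guarding the single ECC accelerator. */
static pthread_mutex_t sketch_ecc_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Written only while the mutex is held. */
static bool sketch_ecc_held = false;

/* Blocks until the accelerator is free, then claims it. */
int sketch_mutex_ecp_init(void)
{
    pthread_mutex_lock(&sketch_ecc_mutex);
    sketch_ecc_held = true;
    return 0;
}

/* Must run on the thread that called init: a mutex has an owner. */
void sketch_mutex_ecp_free(void)
{
    sketch_ecc_held = false;
    pthread_mutex_unlock(&sketch_ecc_mutex);
}

/* Inspection helper for single-threaded checks; reads without locking. */
bool sketch_ecc_is_held(void)
{
    return sketch_ecc_held;
}
```

A blocked waiter sleeps in the scheduler rather than spinning, which is the property the busy-wait loop lacks.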

yanesca · 2018-12-07T08:45:28Z

Is that by design, or an error?

It is by design. The calls are not necessarily paired. If there is more than one mbedtls_ecp_point context in the application (or even in Mbed TLS itself), their lifetimes can overlap in any way. Therefore, for example, calling init on both before calling free on either is perfectly acceptable.

kjbracey · 2018-12-07T08:50:56Z

Looking at the code, the mbedtls_internal_ecp_init/free calls being discussed here are made as a pair within a single operation (mbedtls_ecp_mul_restartable or mbedtls_ecp_muladd_restartable).

That's distinct from mbedtls_ecp_point_init/free, which can indeed have arbitrarily overlapping lifetimes.

kjbracey · 2018-12-07T08:55:25Z

The config file description for MBEDTLS_ECP_INTERNAL_ALT says:

The functions mbedtls_internal_ecp_init and mbedtls_internal_ecp_deinit are called before and after each point operation and provide an opportunity to implement optimized set up and tear down instructions.

(Someone has apparently renamed deinit to free without updating the comment. Possibly using init/free is confusing, because it does make it look like an arbitrary lifetime thing like a point or context).

yanesca · 2018-12-07T09:18:19Z

Indeed these are completely different functions than the ones that can overlap. These should be called in pairs and as far as I can tell they are.

yanesca · 2018-12-07T09:45:28Z

I have found an issue that might be the reason for this. I'll put up a PR with the fix soon.

kjbracey · 2018-12-07T09:54:17Z

Rereading the patch more closely, I see that the busy-wait loop was already there, and does follow the locking pattern I was suggesting. If there is indeed an mbed TLS issue, then you may not need to move it now, but I'd still like to see it turned into a real mutex as soon as possible.

That loop is going to bite us at some point - I know some of the PDMC code likes using different thread priorities, and it's just asking for deadlock.

yanesca · 2018-12-07T11:27:47Z

I have raised a PR with the fix: #9005

@ccli8 Can you please test if it resolves the issue you are having?

ccli8 · 2018-12-10T05:44:27Z

@yanesca The ECC double initialization issue reproduces in the PDMC Greentea test. After applying #9005, the issue disappears.

@kjbracey-arm I will close this PR because #9005 has fixed the issue. I will follow your suggestion and raise another PR that replaces the busy-wait with a mutex.

ccli8 · 2018-12-11T06:20:15Z

@kjbracey-arm I am replacing the busy-wait loop with a mutex, but it fails the PDMC Greentea test. The error scenario is that, for the same mbedtls_sha256_context, mbedtls_sha256_init() is called in one thread but mbedtls_sha256_free() is called in another, so unlocking the mutex in mbedtls_sha256_free() fails. It seems reasonable that mbedtls_sha256_init()/mbedtls_sha256_free() for the same mbedtls_sha256_context can be called in different threads. Note that Nuvoton's SHA accelerator doesn't support context save and restore, so I need to hold the mutex for the whole lifetime of ctx and cannot hold it for just part of the SHA computation. Or should I just roll back to the busy-wait loop?

void mbedtls_sha256_init(mbedtls_sha256_context *ctx)
{
    /* Try locking mutex. On success, go H/W SHA, else go S/W SHA. */
}

void mbedtls_sha256_free(mbedtls_sha256_context *ctx)
{
    /* If H/W SHA, unlock mutex */
}

kjbracey · 2018-12-13T09:18:17Z

If you really can't context save and restore, rather than use a mutex, you could use a semaphore. However, if you wait for the semaphore, that opens you up to priority inversion problems - a little better than the busy wait loop, but can still deadlock.

Another alternative is a separate task that picks up SHA operations from a queue. Gets rid of deadlock. But...

Regardless of mechanism, if the hardware is exclusively held for the entire duration of a SHA context, that could cause significant scheduling problems. A long-running normal priority SHA operation could totally lock out some high-priority time-critical small SHA from a high-priority thread. That's conceptually the sort of thing that would kill WiSUN, say - unable to send a secure ack because someone is encrypting a big packet payload to send. (We hit that bug in Nanostack - failed because had only 1 context. Added second context to make it work).

So I think the solution may have to be that you use a semaphore, but don't wait, and if the semaphore is not available, use a software fallback.

But then if you never wait for the semaphore, then it doesn't have to be an actual RTOS semaphore - a simple flag like you have now would do.
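
The "try, never wait, fall back to software" approach can be sketched with a plain atomic flag. This is only an illustration: the `sketch_` names and the context layout are hypothetical, and the hashing itself is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical context: records which path this context was given. */
typedef struct {
    bool uses_hw;
} sketch_sha_context;

/* Set while some context owns the single SHA accelerator. */
static atomic_flag sketch_sha_hw_busy = ATOMIC_FLAG_INIT;

/* Try to claim the accelerator once. Never waits: if another context
 * holds it, this context takes the software path instead. */
void sketch_sha_init(sketch_sha_context *ctx)
{
    /* test_and_set returns the previous value: false means we got it. */
    ctx->uses_hw = !atomic_flag_test_and_set(&sketch_sha_hw_busy);
}

/* Release the accelerator if this context held it. The flag has no
 * owning thread, so free may run on a different thread than init. */
void sketch_sha_free(sketch_sha_context *ctx)
{
    if (ctx->uses_hw) {
        ctx->uses_hw = false;
        atomic_flag_clear(&sketch_sha_hw_busy);
    }
}
```

Because nothing ever blocks, a long-running hardware hash cannot lock out a second context; that context just runs in software.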

ccli8 added 3 commits December 5, 2018 17:57

[Nuvoton] Fix possible hang in crypto_submodule_release()

e75c34c

[M487] Fix return error code when ECC H/W acceleration is failed

bed46e3

ciarmcom requested review from a team December 6, 2018 10:00

ciarmcom added the needs: review label Dec 6, 2018

yanesca reviewed Dec 6, 2018

View reviewed changes

0xc0170 added needs: work and removed needs: review labels Dec 6, 2018

yanesca mentioned this pull request Dec 7, 2018

Mbed TLS: Fix ECC hardware double initialization #9005

Merged

cmonr added the needs: preceding PR label Dec 7, 2018

ccli8 closed this Dec 10, 2018

studavekar removed the needs: work label Dec 10, 2018

ccli8 mentioned this pull request Dec 13, 2018

Nuvoton: Fix crypto AC management #9081

Merged

ccli8 deleted the nuvoton_fix_crypto_ecc branch December 20, 2018 06:40

cmonr removed the needs: preceding PR label Dec 20, 2018
