Skip to content

mbedtls/F439ZI: add md5 HASH feature #4160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 13, 2017

Conversation

adustm
Copy link
Member

@adustm adustm commented Apr 11, 2017

Description

Enable MD5 for STM32F439ZI
Add HW acceleratoin for MD5 on STM32F439ZI
is #3950

Status

DEVELOPMENT

Steps to test or reproduce

To test this feature, you have to modify TESTS/mbedtls/selfttest/main.cpp in order to call md5 self test:
add:
#include "mbedtls/md5.h"
then

#if defined(MBEDTLS_MD5_C)
MBEDTLS_SELF_TEST_TEST_CASE(mbedtls_md5_self_test)
#endif

then

#if defined(MBEDTLS_MD5_C)
    Case("mbedtls_md5_self_test", mbedtls_md5_self_test_test_case),
#endif

Copy link
Contributor

@RonEld RonEld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added my comments. It seems there are commits not related to MD5. Please check

*/
void mbedtls_md5_update( mbedtls_md5_context *ctx, const unsigned char *input, size_t ilen )
{
HAL_HASH_MD5_Accumulate(&ctx->hhash_md5, (uint8_t *)input, ilen);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if ilen < 64? Do you need accumulation buffer in the context?

*/
typedef struct
{
uint32_t total[2]; /*!< number of bytes processed */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this referenced?

typedef struct
{
uint32_t total[2]; /*!< number of bytes processed */
uint32_t state[4]; /*!< intermediate digest state */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this referenced?

{
uint32_t total[2]; /*!< number of bytes processed */
uint32_t state[4]; /*!< intermediate digest state */
unsigned char buffer[64]; /*!< data block being processed */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this referenced?

#endif

#ifdef __cplusplus
extern "C" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed

#endif

#ifdef __cplusplus
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed

@adustm adustm force-pushed the STM_md5_f439zi branch from 1b1eafe to 98f3074 Compare May 4, 2017 17:00
@adustm
Copy link
Member Author

adustm commented May 4, 2017

rebase done + remove unused includes + handle every MD5 sizes (including the <4 bytes that are not supported by the HW)
Kind regards
Armelle

Copy link
Contributor

@RonEld RonEld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments which I need addressing

void mbedtls_md5_update( mbedtls_md5_context *ctx, const unsigned char *input, size_t ilen )
{
unsigned char i=0;
int currentlen = ilen;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_t is usually unsigned. If ilen would be a very large number, you will get a negative value for the signed currentlen

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

memcpy(ctx->sbuf,input+(i+1)*64-ctx->sbuf_len, ctx->sbuf_len);
// ctx->sbuf_len remains the same
}
currentlen -= 64;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to check if currentlen >= 64 before decreasing 64 from it, instead of later adding 64 in case we reached a negative value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my problem is the loop condition : it is dependent on the size of (sbuf + currentlen).
what would you write ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write something like:

if (currentlen >= 64)
    currentlen -= 64;

But I see your point now, because ctx->sbuf_len doesn't change. I think it can be improved though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not applicable anymore

i++;
}
if (currentlen <0) {
currentlen +=64;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see previous comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see previous comment

}
ctx->hhash_md5.Phase = HAL_HASH_PHASE_PROCESS;
}
while ((currentlen+ctx->sbuf_len) >=64) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this exactly the same functionality as HAL_HASH_MD5_Accumulate?
Would it be more efficient to call HAL_HASH_MD5_Accumulate which is done in HW, instead of proccessing every MD5 block?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand your comment : is it on line 96 or line 98 ?
If line 96: it is related to md5_selftest #1 where the buffer length is 0. The new memory optimization mechanism skips the call to HAL_HASH_MD5_Accumulate, then the HASH HW block is not aware that we are in MD5 mode (see line 94)., and the md5_finish function will not work.

If line 98: your comment is not clear to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adustm I meant lines 98 - 105. Doesn't HAL_HASH_MD5_Accumulate behave the same way? Doesn't it do a HW hash on the whole input buffer, and the residue accumulates in an internal buffer? If so, wouldn't it be more efficient to let the HW do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RonEld those lines are for the specific case when the input length is zero (as in md5_selftest #1 )
I have extracted from HAL_HASH_MD5_Accumulate function the HW flag to set when the size is 0 (we just need to set the MD5 mode and exit the phase = READY status).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adustm I think you mean lines 89-97.
If I understand correct, if, for example the input length is 128, then (currentlen+ctx->sbuf_len) >=64 , and then mbedtls_md5_process would be called twice, with 64 bytes each time. Wouldn't it be enough to call HAL_HASH_MD5_Accumulate once with 128 bytes? Also, if the input length is 129 bytes, then mbedtls_md5_process would be called twice, with 64 bytes each time, and another byte will be stored in ctx->sbuf. TO my understanding of HAL_HASH_MD5_Accumulate, it can be called with 129 bytes, and it will accumulate the extra byte, until HAL_HASH_MD5_Finish is called. Am I right?

Copy link
Member Author

@adustm adustm May 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RonEld
I could call HAL_HASH_MD5_Accumulate once with 128 bytes only if the ctx->sbuf does not yet contain anything.
What if mbedtls_md5_update is called with a 50 size (which would lead to bufferize 50 bytes in ctx->sbuf), then another call to md5_update with a size of 128 ?
I would need 1st to bufferize the 14 1st samples, then Accumulate 64, then store etc etc...

Isn't it what you had in mind when asking this 64bytes storing mechanism ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adustm What I had in mind is sending to HAL_HASH_MD5_Accumulate a buffer size of 64*x, and the residue store in a 64 byte buffer. But, this is before I saw\thought that HAL_HASH_MD5_Accumulate has this mechanism already. I would check how HAL_HASH_MD5_Accumulate behaves if the input is not a multiple of 64.

Copy link
Member Author

@adustm adustm May 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RonEld HAL_HASH_MD5_Accumulate behaves badly only if we want to accumulate less than 4 bytes and call again the update function after (ie not the end of the buffer).
64 bytes and more is not an issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adustm In this case, I think we should have a special case only for 4 bytes accumulation. Or at least have a sw buffer to accumulate up to 64 bytes, process it when finished, after that, call HAL_HASH_MD5_Accumulate with the rest of the input, as long as it is a multiple of 64, and then store in sbuf the residue of the input.
This suggestion has:

  • memcpy to sbuf from the input, length min( 64 - ctx->sbuf_len , ilen )
  • optionally call mbedtls_md5_process if sbuf_len is 64
  • optionally call HAL_HASH_MD5_Accumulate if the ilen - ctx->sbuf_len > 64, with a length of (ilen - ctx->sbuf_len % 64) * 64)', and offset ctx->sbuf_len`
  • optionally memcpy the residue to sbuf, and update ctx->sbuf_len to 'ilen - (ilen - ctx - sbuf_len % 64) * 64)'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new rework of the function :

  • if the sbuf already contains something, I fill it and launch HASH_MD5_accumulate
  • then I launch the HW Accumulation for every multiple of 4 bytes of the input buffer
  • if needed I only store the 1 2 or 3 remaining bytes of the input.

@@ -970,6 +970,7 @@
"extra_labels": ["STM", "STM32F7", "STM32F756", "STM32F756xG", "STM32F756ZG"],
"supported_toolchains": ["ARM", "uARM", "GCC_ARM", "IAR"],
"default_toolchain": "ARM",
"macros": ["TRANSACTION_QUEUE_SIZE_SPI=2", "USBHOST_OTHER", "MBEDTLS_CONFIG_HW_SUPPORT", "MBEDTLS_MD5_C"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding MD5 here is still pending to be resolved and agreed

}
}

void mbedtls_md5_process( mbedtls_md5_context *ctx, const unsigned char data[64] )
Copy link
Contributor

@RonEld RonEld May 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

64 is used all over this code. I think it should be changed to a define, for example MBEDTLS_MD5_BLOCK_SIZE, in the beginning of md5_alt.c

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@adustm
Copy link
Member Author

adustm commented May 18, 2017

Hello @RonEld
You can have a look at commit 1726556 that in my opinion contains a better handling of the storing part
cheers

Copy link
Contributor

@RonEld RonEld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a lot better.
We still need to resolve the discussion on the medtls features in targets.json

mbedtls_md5_process(ctx, ctx->sbuf);
// now process every input as long as it is %4 bytes
size_t iter = currentlen / 4;
HAL_HASH_MD5_Accumulate(&ctx->hhash_md5, (uint8_t *)(input+MBEDTLS_MD5_BLOCK_SIZE-ctx->sbuf_len), (iter*4));
Copy link
Contributor

@RonEld RonEld May 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ho will HAL_HASH_MD5_Accumulate behave if iter = 0 ?

Copy link
Member Author

@adustm adustm May 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @RonEld ,
It will call HASH_WriteData in stm32f4xx_hal_hash.c line 225

  for(buffercounter = 0U; buffercounter < Size; buffercounter+=4U)
  {
    HASH->DIN = *(uint32_t*)inputaddr;
    inputaddr+=4U;
  }

It will do nothing in case the Size is =0 (I just tested it for the 3 toolchains GCC_ARM / IAR / ARM)

@adustm
Copy link
Member Author

adustm commented May 23, 2017

Hello @yanesca, you have been added as a reviewer by @0xc0170 .
Could you also review this PR, please ?

Thanks in advance,
Armelle

@gilles-peskine-arm
Copy link

Almost identical to PR #4156, see my comments there.

@andresag01
Copy link

This is equivalent to #4156, so same comments.

@adustm
Copy link
Member Author

adustm commented Jun 2, 2017

Hello @gilles-peskine-arm @andresag01 ,
I have reworked this branch to allow multiple context.
Let me know if you think it's ok.
Kind regards

Copy link

@gilles-peskine-arm gilles-peskine-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, see my comments in PR #4162

Copy link

@gilles-peskine-arm gilles-peskine-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, see my comments in PR #4162

ctx->ctx_save_str = HASH->STR;
for (i=0;i<38;i++) {
ctx->ctx_save_csr[i] = HASH->CSR[i];
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adustm: Thanks for making these changes. I have a couple of questions regarding this functionality though:

  • Why is it necessary to copy the CR and STR registers? I tried looking for the description of these registers in the reference manual and under CSR it states:

These registers contain the complete internal register states of the hash processor, and are
useful when a context swap has to be done because a high-priority task has to use the hash
processor while it is already in use by another task.

Therefore, my understanding is that only the values from the CSR register are required.

  • Why is it necessary to only copy the first 38 words? Shouldnt it be possible to just copy the registers HASH_CSRx that are needed or the full 54 words?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @andresag01
Here is the full content of the Reference Manual for the context swapping chapter:
Procedure where the data are loaded by software

The context can be saved only when no block is currently being processed. That is, you
must wait for DINIS = 1 (the last block has been processed and the input FIFO is empty) or
NBW ≠ 0 (the FIFO is not full and no processing is ongoing).
• Context saving:
Store the contents of the following registers into memory:
HASH_IMR
HASH_STR
HASH_CR
– HASH_CSR0 to HASH_CSR50 on STM32F415/417xx, and HASH_CSR0 to
HASH_CSR53 on STM32F43xxx.
• Context restoring:
[...]

and a note:

Note: If context swapping does not involve HMAC operations, the HASH_CSR38 to
HASH_CSR50 (STM32F415/417xx) and HASH_CSR38 to HASH_CSR53 (STM32F43xxx)
registers do not have to be saved and restored.

We are not using the interrupts in mbed-os: I did not save HASH_IMR register. But STR and CR are required to be saved-restored.

I don't save HASH_CSR38 to HASH_CSR53 because we don't use the hardware acceleration for HMAC (it is done by SW at mbedtls level)

HASH->CR = (ctx->ctx_save_cr | HASH_CR_INIT);
for (i=0;i<38;i++) {
HASH->CSR[i] = ctx->ctx_save_csr[i];
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Is it necessary here to check the status register to ensure that we are not affecting any ongoing operation of the accelerator? For example, what happens if the DMA bit is set because some data is being processed and the accelerator's state is simultaneously being changes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DMA is not used in mbed-os.
Aren't you planning some kind of mutex system later on ? It should be handled at a higher level, shouldn't it ?
You are right : I assume that there are not multiple instances of operation at the same time.

What if I cannot run the function and I cannot return any error code anyway ? Should I wait with a timeout or exit as soon as it does not work... ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @adustm
I agree that with the current framework/interface, there is not much we can do here. I think the best way is to return immediately, and record in a comment that an error code should be returned here, like you did it elsewhere in similar situation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your reply. I think to answer these planning questions it would be good to include @sbutcher-arm and @yanesca in the discussion.

{
st_md5_restore_hw_context(ctx);
if (ctx->sbuf_len > 0) {
if (HAL_HASH_MD5_Accumulate(&ctx->hhash_md5, ctx->sbuf, ctx->sbuf_len) != 0) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adustm: In similar PR you commented that:

the HW is not able to handle less that 32bits input (4 bytes). I then need to buffer 1, 2 or 3 bytes of any input whose length is not modulo 4.

Does the MD5 accelerator have a similar constraints? In other words, if mbedtls_md5_finish() is called and ctx->sbuf_len is either 1, 2 or 3 then this call to accumulate will probably not have the desired effect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hello @andresag01
You are right : MD5 / SHA1 and SHA256 are using the same HW block (HASH).
FYI with the new version, I only manage 512 bits blocks (to allow the context swap). ctx->sbuf_len can now have a value up to 63.

This is ok to have such sizes in the mbedtls_md5_finish function, because when calling the HAL_DIGEST after Accumulate, the hardware will handle automatic padding. (this is not inserting zeroes in the middle of a message, only at its end).

kind regards

Copy link

@andresag01 andresag01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adustm: Many thanks for considering our suggestions and reworking your PR. I think the changes are looking good, I just had a few questions/comments.

On a separate note, I assume that similar changes will need to be adapted for some of your other PRs, such as #4160. Right?

@adustm
Copy link
Member Author

adustm commented Jun 6, 2017

Hello @andresag01
I was requested to split into several PRs at the start. So I did 1 pull request for each feature (MD5 / SHA1 / SHA256) and for each platform (NUCLE0_F439ZI / NUCLEO_F756ZG). I use cherry pick from 1 PR to the other, so that PRs are at the same level of functionality and merge will be easy.

@andresag01
Copy link

@adustm: many thanks for replying to my questions. I understand that it is a bit tedious to cherry-pick the commits between the PRs all the time, I was just wanted to clarify...

Copy link
Contributor

@yanesca yanesca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @adustm

Thank you for bearing with us and making all these changes!

Could you please add that last little check @andresag01 suggested at line 39?

I think after that we can formally approve this PR and it is ready to merge. (The same holds for all the similar PRs that are up to date and in sync with this one)

@adustm
Copy link
Member Author

adustm commented Jun 8, 2017

@yanesca @andresag01
I just added a check that the HW is not busy in save and restore functions.

static int st_md5_restore_hw_context(mbedtls_md5_context *ctx)
{
    uint32_t i;
    uint32_t tickstart;
    /* allow multi-instance of HASH use: save context for HASH HW module CR */
    /* Check that there is no HASH activity on going */
    tickstart = HAL_GetTick();
    while ((HASH->SR & (HASH_FLAG_BUSY | HASH_FLAG_DMAS)) != 0) {
        if ((HAL_GetTick() - tickstart) > ST_MD5_TIMEOUT) {
            return 0; // timeout: HASH processor is busy
        }
    }

The code is ready to return the timeout error when it will be available.

void mbedtls_md5_process( mbedtls_md5_context *ctx, const unsigned char data[ST_MD5_BLOCK_SIZE] )
{
    if (st_md5_restore_hw_context(ctx) != 1) {
        return; // Return HASH_BUSY timout error here
    }

md5 selftest + md5multi instance test are ready
Kind regards

@adustm
Copy link
Member Author

adustm commented Jun 8, 2017

Hello, could someone check what is wrong in the Cam-CI uvisor Build & Test ?

@adustm
Copy link
Member Author

adustm commented Jun 12, 2017

Hello,
Thanks @yanesca for approving.
@andresag01 do you approve the PR or is something missing ?

@0xc0170 do you know what is failing in the Cam-CI uvisor Build & Test script ?
Kind regards
Armelle

@adustm
Copy link
Member Author

adustm commented Jun 13, 2017

Hello @0xc0170
FYI this PR is almost the same as #4156 but for NUCLEO_F439ZI

@0xc0170
Copy link
Contributor

0xc0170 commented Jun 13, 2017

uvisor is already moved to the new codebase, and this branch is far behind, we will cherry-pick rebase, thus good to merge tot the branch .

@0xc0170 0xc0170 merged commit ae59925 into ARMmbed:mbed-os-workshop-17q2 Jun 13, 2017
@adustm
Copy link
Member Author

adustm commented Jun 23, 2017

Hello @adbridge @0xc0170
Any idea when this will be integrated in the master branch ?

cc @screamerbg

@adbridge
Copy link
Contributor

@adustm This PR needs to be redirected to master and then rebased when done .

@0xc0170
Copy link
Contributor

0xc0170 commented Jun 28, 2017

@adustm This PR needs to be redirected to master and then rebased when done .

Can you please look at this? I tried locally to see it, there are depedencies (other pull requests), it was not clear to me what they are, I found at least one, but based on the conflicts I have seen, I missed some other. Here's a view of my try to get this up to master: https://github.com/0xc0170/mbed-os/commits/fix_4160

@adustm
Copy link
Member Author

adustm commented Jun 29, 2017

Hello @0xc0170 , I fetched your fix_4160 branch. It's ok. Thank you.
Armelle

@RobMeades
Copy link
Contributor

@adustm : when do you think this PR might be redirected towards master? I see that SHA1, SHA256 and AES are all there, MD5 would make the set :-).

@0xc0170
Copy link
Contributor

0xc0170 commented Jul 3, 2017

Hello @0xc0170 , I fetched your fix_4160 branch. It's ok. Thank you.

Shall I send it as PR @adustm ?

@adustm
Copy link
Member Author

adustm commented Jul 3, 2017

Shall I send it as PR @adustm ?

LGTM @0xc0170 thanks

@adustm adustm deleted the STM_md5_f439zi branch October 11, 2018 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants