-
Notifications
You must be signed in to change notification settings - Fork 3k
mbedtls/F439ZI: add md5 HASH feature #4160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mbedtls/F439ZI: add md5 HASH feature #4160
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added my comments. It seems there are commits not related to MD5. Please check
*/ | ||
void mbedtls_md5_update( mbedtls_md5_context *ctx, const unsigned char *input, size_t ilen ) | ||
{ | ||
HAL_HASH_MD5_Accumulate(&ctx->hhash_md5, (uint8_t *)input, ilen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if ilen < 64? Do you need accumulation buffer in the context?
*/ | ||
typedef struct | ||
{ | ||
uint32_t total[2]; /*!< number of bytes processed */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this referenced?
typedef struct | ||
{ | ||
uint32_t total[2]; /*!< number of bytes processed */ | ||
uint32_t state[4]; /*!< intermediate digest state */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this referenced?
{ | ||
uint32_t total[2]; /*!< number of bytes processed */ | ||
uint32_t state[4]; /*!< intermediate digest state */ | ||
unsigned char buffer[64]; /*!< data block being processed */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this referenced?
#endif | ||
|
||
#ifdef __cplusplus | ||
extern "C" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be removed
#endif | ||
|
||
#ifdef __cplusplus | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be removed
rebase done + remove unused includes + handle every MD5 sizes (including the <4 bytes that are not supported by the HW) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments which I need addressing
void mbedtls_md5_update( mbedtls_md5_context *ctx, const unsigned char *input, size_t ilen ) | ||
{ | ||
unsigned char i=0; | ||
int currentlen = ilen; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t
is usually unsigned. If ilen
would be a very large number, you will get a negative value for the signed currentlen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
memcpy(ctx->sbuf,input+(i+1)*64-ctx->sbuf_len, ctx->sbuf_len); | ||
// ctx->sbuf_len remains the same | ||
} | ||
currentlen -= 64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to check if currentlen >= 64
before decreasing 64 from it, instead of later adding 64 in case we reached a negative value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my problem is the loop condition : it is dependent on the size of (sbuf + currentlen).
what would you write ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would write something like:
if (currentlen >= 64)
currentlen -= 64;
But I see your point now, because ctx->sbuf_len doesn't change. I think it can be improved though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not applicable anymore
i++; | ||
} | ||
if (currentlen <0) { | ||
currentlen +=64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see previous comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see previous comment
} | ||
ctx->hhash_md5.Phase = HAL_HASH_PHASE_PROCESS; | ||
} | ||
while ((currentlen+ctx->sbuf_len) >=64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this exactly the same functionality as HAL_HASH_MD5_Accumulate
?
Would it be more efficient to call HAL_HASH_MD5_Accumulate
which is done in HW, instead of proccessing every MD5 block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand your comment : is it on line 96 or line 98 ?
If line 96: it is related to md5_selftest #1 where the buffer length is 0. The new memory optimization mechanism skips the call to HAL_HASH_MD5_Accumulate
, then the HASH HW block is not aware that we are in MD5 mode (see line 94)., and the md5_finish function will not work.
If line 98: your comment is not clear to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adustm I meant lines 98 - 105. Doesn't HAL_HASH_MD5_Accumulate
behave the same way? Doesn't it do a HW hash on the whole input buffer, and the residue accumulates in an internal buffer? If so, wouldn't it be more efficient to let the HW do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RonEld those lines are for the specific case when the input length is zero (as in md5_selftest #1
)
I have extracted from HAL_HASH_MD5_Accumulate
function the HW flag to set when the size is 0 (we just need to set the MD5 mode and exit the phase = READY status).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adustm I think you mean lines 89-97.
If I understand correct, if, for example the input length is 128, then (currentlen+ctx->sbuf_len) >=64
, and then mbedtls_md5_process
would be called twice, with 64 bytes each time. Wouldn't it be enough to call HAL_HASH_MD5_Accumulate
once with 128 bytes? Also, if the input length is 129 bytes, then mbedtls_md5_process
would be called twice, with 64 bytes each time, and another byte will be stored in ctx->sbuf. TO my understanding of HAL_HASH_MD5_Accumulate
, it can be called with 129 bytes, and it will accumulate the extra byte, until HAL_HASH_MD5_Finish
is called. Am I right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RonEld
I could call HAL_HASH_MD5_Accumulate once with 128 bytes only if the ctx->sbuf does not yet contain anything.
What if mbedtls_md5_update is called with a 50 size (which would lead to bufferize 50 bytes in ctx->sbuf), then another call to md5_update with a size of 128 ?
I would need 1st to bufferize the 14 1st samples, then Accumulate 64, then store etc etc...
Isn't it what you had in mind when asking this 64bytes storing mechanism ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adustm What I had in mind is sending to HAL_HASH_MD5_Accumulate
a buffer size of 64*x, and the residue store in a 64 byte buffer. But, this is before I saw\thought that HAL_HASH_MD5_Accumulate
has this mechanism already. I would check how HAL_HASH_MD5_Accumulate
behaves if the input is not a multiple of 64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RonEld HAL_HASH_MD5_Accumulate
behaves badly only if we want to accumulate less than 4 bytes and call again the update function after (ie not the end of the buffer).
64 bytes and more is not an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adustm In this case, I think we should have a special case only for 4 bytes accumulation. Or at least have a sw buffer to accumulate up to 64 bytes, process it when finished, after that, call HAL_HASH_MD5_Accumulate
with the rest of the input, as long as it is a multiple of 64, and then store in sbuf
the residue of the input.
This suggestion has:
- memcpy to sbuf from the input, length
min( 64 - ctx->sbuf_len , ilen )
- optionally call
mbedtls_md5_process
if sbuf_len is 64 - optionally call
HAL_HASH_MD5_Accumulate
if theilen - ctx->sbuf_len > 64
, with a length of(ilen - ctx->sbuf_len % 64) * 64)', and offset
ctx->sbuf_len` - optionally memcpy the residue to
sbuf
, and updatectx->sbuf_len
to 'ilen - (ilen - ctx - sbuf_len % 64) * 64)'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new rework of the function :
- if the sbuf already contains something, I fill it and launch HASH_MD5_accumulate
- then I launch the HW Accumulation for every multiple of 4 bytes of the input buffer
- if needed I only store the 1 2 or 3 remaining bytes of the input.
targets/targets.json
Outdated
@@ -970,6 +970,7 @@ | |||
"extra_labels": ["STM", "STM32F7", "STM32F756", "STM32F756xG", "STM32F756ZG"], | |||
"supported_toolchains": ["ARM", "uARM", "GCC_ARM", "IAR"], | |||
"default_toolchain": "ARM", | |||
"macros": ["TRANSACTION_QUEUE_SIZE_SPI=2", "USBHOST_OTHER", "MBEDTLS_CONFIG_HW_SUPPORT", "MBEDTLS_MD5_C"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding MD5 here is still pending to be resolved and agreed
} | ||
} | ||
|
||
void mbedtls_md5_process( mbedtls_md5_context *ctx, const unsigned char data[64] ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
64
is used all over this code. I think it should be changed to a define, for example MBEDTLS_MD5_BLOCK_SIZE
, in the beginning of md5_alt.c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Hello @RonEld |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a lot better.
We still need to resolve the discussion on the medtls features in targets.json
mbedtls_md5_process(ctx, ctx->sbuf); | ||
// now process every input as long as it is %4 bytes | ||
size_t iter = currentlen / 4; | ||
HAL_HASH_MD5_Accumulate(&ctx->hhash_md5, (uint8_t *)(input+MBEDTLS_MD5_BLOCK_SIZE-ctx->sbuf_len), (iter*4)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ho will HAL_HASH_MD5_Accumulate
behave if iter = 0 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @RonEld ,
It will call HASH_WriteData
in stm32f4xx_hal_hash.c line 225
for(buffercounter = 0U; buffercounter < Size; buffercounter+=4U)
{
HASH->DIN = *(uint32_t*)inputaddr;
inputaddr+=4U;
}
It will do nothing in case the Size
is =0
(I just tested it for the 3 toolchains GCC_ARM / IAR / ARM)
Almost identical to PR #4156, see my comments there. |
This is equivalent to #4156, so same comments. |
+ remove unused includes files
Hello @gilles-peskine-arm @andresag01 , |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, see my comments in PR #4162
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, see my comments in PR #4162
ctx->ctx_save_str = HASH->STR; | ||
for (i=0;i<38;i++) { | ||
ctx->ctx_save_csr[i] = HASH->CSR[i]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adustm: Thanks for making these changes. I have a couple of questions regarding this functionality though:
- Why is it necessary to copy the CR and STR registers? I tried looking for the description of these registers in the reference manual and under CSR it states:
These registers contain the complete internal register states of the hash processor, and are
useful when a context swap has to be done because a high-priority task has to use the hash
processor while it is already in use by another task.
Therefore, my understanding is that only the values from the CSR register are required.
- Why is it necessary to only copy the first 38 words? Shouldnt it be possible to just copy the registers HASH_CSRx that are needed or the full 54 words?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @andresag01
Here is the full content of the Reference Manual for the context swapping chapter:
Procedure where the data are loaded by software
The context can be saved only when no block is currently being processed. That is, you
must wait for DINIS = 1 (the last block has been processed and the input FIFO is empty) or
NBW ≠ 0 (the FIFO is not full and no processing is ongoing).
• Context saving:
Store the contents of the following registers into memory:
– HASH_IMR
– HASH_STR
– HASH_CR
– HASH_CSR0 to HASH_CSR50 on STM32F415/417xx, and HASH_CSR0 to
HASH_CSR53 on STM32F43xxx.
• Context restoring:
[...]
and a note:
Note: If context swapping does not involve HMAC operations, the HASH_CSR38 to
HASH_CSR50 (STM32F415/417xx) and HASH_CSR38 to HASH_CSR53 (STM32F43xxx)
registers do not have to be saved and restored.
We are not using the interrupts in mbed-os: I did not save HASH_IMR register. But STR and CR are required to be saved-restored.
I don't save HASH_CSR38
to HASH_CSR53
because we don't use the hardware acceleration for HMAC (it is done by SW at mbedtls level)
HASH->CR = (ctx->ctx_save_cr | HASH_CR_INIT); | ||
for (i=0;i<38;i++) { | ||
HASH->CSR[i] = ctx->ctx_save_csr[i]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Is it necessary here to check the status register to ensure that we are not affecting any ongoing operation of the accelerator? For example, what happens if the DMA bit is set because some data is being processed and the accelerator's state is simultaneously being changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DMA is not used in mbed-os.
Aren't you planning some kind of mutex system later on ? It should be handled at a higher level, shouldn't it ?
You are right : I assume that there are not multiple instances of operation at the same time.
What if I cannot run the function and I cannot return any error code anyway ? Should I wait with a timeout or exit as soon as it does not work... ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @adustm
I agree that with the current framework/interface, there is not much we can do here. I think the best way is to return immediately, and record in a comment that an error code should be returned here, like you did it elsewhere in similar situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your reply. I think to answer these planning questions it would be good to include @sbutcher-arm and @yanesca in the discussion.
{ | ||
st_md5_restore_hw_context(ctx); | ||
if (ctx->sbuf_len > 0) { | ||
if (HAL_HASH_MD5_Accumulate(&ctx->hhash_md5, ctx->sbuf, ctx->sbuf_len) != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adustm: In similar PR you commented that:
the HW is not able to handle less that 32bits input (4 bytes). I then need to buffer 1, 2 or 3 bytes of any input whose length is not modulo 4.
Does the MD5 accelerator have a similar constraints? In other words, if mbedtls_md5_finish()
is called and ctx->sbuf_len
is either 1, 2 or 3 then this call to accumulate will probably not have the desired effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hello @andresag01
You are right : MD5 / SHA1 and SHA256 are using the same HW block (HASH).
FYI with the new version, I only manage 512 bits blocks (to allow the context swap). ctx->sbuf_len can now have a value up to 63.
This is ok to have such sizes in the mbedtls_md5_finish
function, because when calling the HAL_DIGEST
after Accumulate
, the hardware will handle automatic padding. (this is not inserting zeroes in the middle of a message, only at its end).
kind regards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @andresag01 |
@adustm: many thanks for replying to my questions. I understand that it is a bit tedious to cherry-pick the commits between the PRs all the time, I was just wanted to clarify... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @adustm
Thank you for bearing with us and making all these changes!
Could you please add that last little check @andresag01 suggested at line 39?
I think after that we can formally approve this PR and it is ready to merge. (The same holds for all the similar PRs that are up to date and in sync with this one)
@yanesca @andresag01
The code is ready to return the timeout error when it will be available.
md5 selftest + md5multi instance test are ready |
Hello, could someone check what is wrong in the Cam-CI uvisor Build & Test ? |
Hello, @0xc0170 do you know what is failing in the Cam-CI uvisor Build & Test script ? |
uvisor is already moved to the new codebase, and this branch is far behind, we will cherry-pick rebase, thus good to merge tot the branch . |
Hello @adbridge @0xc0170 cc @screamerbg |
@adustm This PR needs to be redirected to master and then rebased when done . |
Can you please look at this? I tried locally to see it, there are depedencies (other pull requests), it was not clear to me what they are, I found at least one, but based on the conflicts I have seen, I missed some other. Here's a view of my try to get this up to master: https://github.com/0xc0170/mbed-os/commits/fix_4160 |
Hello @0xc0170 , I fetched your fix_4160 branch. It's ok. Thank you. |
@adustm : when do you think this PR might be redirected towards |
Description
Enable MD5 for STM32F439ZI
Add HW acceleratoin for MD5 on STM32F439ZI
is #3950
Status
DEVELOPMENT
Steps to test or reproduce
To test this feature, you have to modify TESTS/mbedtls/selfttest/main.cpp in order to call md5 self test:
add:
#include "mbedtls/md5.h"
then
then