Skip to content

Change the fb buffer as 16 bytes aligned #419

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jason-mao
Copy link
Contributor

@jason-mao jason-mao commented Jul 13, 2022

The 128 bit aligned buffer have good performance for esp_jpeg encoding. The aligned data to improve the memory load time, combined with esp32-S3 128bit SIMD instructions optimization, we got about 10 fps on OV3660 camera with YUV422 and VGA.

@kxbin
Copy link

kxbin commented Jul 19, 2022

Hello, may I ask if there is test data, how much fps has been improved?

Copy link
Member

@igrr igrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @jason-mao!

Have left one note about the usage of heap_caps_aligned_alloc in older releases.

Also would you mind sharing, what is the observed performance improvement?
Perhaps you could add a test case or some other way to benchmark the improvement you have observed?

@dongmenhai
Copy link

@jason-mao Hello,How can I test this feature to take effect? I use esp32s3 to run a demo of pic_server, call the frame2jpg function 30 times and calculate the time. By comparison, I found that the method of modifying cam_obj->frames[x].fb.buf to 128 bit alignment did not change the speed of jpeg encoding

@jason-mao
Copy link
Contributor Author

Hello, may I ask if there is test data, how much fps has been improved?

This modification just to make the 128bit SIMD instructions to load data at theoretical performance, the unaligned data has extra time. This is only the first step. Most of the real improvement time is in code assembly optimization. I think I can add some code on esp-adf to test it.

@jason-mao
Copy link
Contributor Author

Thank you for the PR @jason-mao!

Have left one note about the usage of heap_caps_aligned_alloc in older releases.

Also would you mind sharing, what is the observed performance improvement? Perhaps you could add a test case or some other way to benchmark the improvement you have observed?

I have update on #419 (comment)

@jason-mao
Copy link
Contributor Author

@jason-mao Hello,How can I test this feature to take effect? I use esp32s3 to run a demo of pic_server, call the frame2jpg function 30 times and calculate the time. By comparison, I found that the method of modifying cam_obj->frames[x].fb.buf to 128 bit alignment did not change the speed of jpeg encoding

Sorry, maybe I caused some misunderstanding. I explanation here is clearly #419 (comment)

@jason-mao jason-mao force-pushed the bugfix/change_fb_to_aligned_buffer branch from 846a3f4 to 8b7416a Compare August 23, 2022 07:38
@jason-mao jason-mao force-pushed the bugfix/change_fb_to_aligned_buffer branch from 8b7416a to 03b1eab Compare August 23, 2022 07:45
Copy link
Member

@igrr igrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest version LGTM.

Given the typical framebuffer sizes, loosing a few bytes of heap due to alignment doesn't seem to be a big issue. So I'm okay with this change even without adding SIMD optimizations to the rest of this project.

@me-no-dev PTAL as well.

@me-no-dev me-no-dev merged commit 36121e1 into espressif:master Aug 23, 2022
@me-no-dev
Copy link
Member

Thanks @jason-mao

@weilian1977
Copy link

The 128 bit aligned buffer have good performance for esp_jpeg encoding. The aligned data to improve the memory load time, combined with esp32-S3 128bit SIMD instructions optimization, we got about 10 fps on OV3660 camera with YUV422 and VGA.

can provide test code?

@weilian1977
Copy link

Hello, may I ask if there is test data, how much fps has been improved?

This modification just to make the 128bit SIMD instructions to load data at theoretical performance, the unaligned data has extra time. This is only the first step. Most of the real improvement time is in code assembly optimization. I think I can add some code on esp-adf to test it.

https://github.com/espressif/esp-adf-libs/blob/master/esp_codec/include/codec/esp_jpeg_version.h??

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants