[rocm6.4_internal_testing] [ROCm] Improvements to non-vectorized elementwise kernels #1875

Merged
merged 1 commit into from
Jan 31, 2025

Conversation

jerrymannil

  • Unroll loops manually to hide memory access latency
  • Strided access for coalesced memory accesses

Co-authors: @akadutta @doru1004 @amd-hhashemi @carlobertolli
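
The PR description does not show the kernel code itself, so the following is only a minimal host-side sketch of the two ideas listed above, written in plain C++ so it can be compiled without a GPU. It assumes a layout in which each thread handles `kUnroll` elements and consecutive threads touch consecutive addresses on every unrolled step. All names here (`kBlockSize`, `kUnroll`, `strided_index`, `thread_body`, `elementwise`) are hypothetical and are not taken from the merged commit.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sizes chosen for readability; real kernels use far larger blocks.
constexpr std::size_t kBlockSize = 4;  // emulated threads per block
constexpr std::size_t kUnroll = 4;     // elements processed by each thread

// Element index touched by thread `tid` of block `block` on unrolled step `j`.
// On a given step, neighbouring threads read neighbouring elements, which is
// the access pattern a GPU can coalesce into wide memory transactions.
std::size_t strided_index(std::size_t block, std::size_t tid, std::size_t j) {
    return block * kBlockSize * kUnroll + j * kBlockSize + tid;
}

// Emulates the work of a single thread in a non-vectorized elementwise kernel.
template <typename F>
void thread_body(const std::vector<float>& in, std::vector<float>& out,
                 std::size_t block, std::size_t tid, F op) {
    float regs[kUnroll] = {};
    std::size_t idx[kUnroll] = {};

    // Phase 1: issue every load before any arithmetic, so that on a GPU the
    // independent loads can be in flight at the same time.
    for (std::size_t j = 0; j < kUnroll; ++j) {
        idx[j] = strided_index(block, tid, j);
        if (idx[j] < in.size()) {
            regs[j] = in[idx[j]];
        }
    }

    // Phase 2: apply the operation and store, with the same bounds check.
    for (std::size_t j = 0; j < kUnroll; ++j) {
        if (idx[j] < in.size()) {
            out[idx[j]] = op(regs[j]);
        }
    }
}

// Emulates the grid launch by looping over blocks and threads sequentially.
template <typename F>
std::vector<float> elementwise(const std::vector<float>& in, F op) {
    std::vector<float> out(in.size());
    const std::size_t per_block = kBlockSize * kUnroll;
    const std::size_t blocks = (in.size() + per_block - 1) / per_block;
    for (std::size_t b = 0; b < blocks; ++b) {
        for (std::size_t t = 0; t < kBlockSize; ++t) {
            thread_body(in, out, b, t, op);
        }
    }
    return out;
}
```

For example, `elementwise(in, [](float x) { return 2.0f * x + 1.0f; })` applies the lambda to every element of `in`. The alternative layout, where each thread owns a contiguous chunk (`tid * kUnroll + j`), would make neighbouring threads read addresses `kUnroll` elements apart on each step, which is the pattern the strided indexing is meant to avoid.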

@rocm-repo-management-api

Jenkins build for 6424cdb5f52ff53bffcdae4fffc4381b7f143587 commit is in progress

@pruthvistony pruthvistony merged commit 2e48656 into ROCm:rocm6.4_internal_testing Jan 31, 2025
1 check was pending
@BLOrange-AMD BLOrange-AMD changed the title [ROCm] Improvements to non-vectorized elementwise kernels [rocm6.4_internal_testing] [ROCm] Improvements to non-vectorized elementwise kernels Feb 7, 2025
dnikolaev-amd pushed a commit that referenced this pull request Apr 17, 2025
* Unroll loops manually to hide memory access latency
* Strided access for coalesced memory accesses

Co-authors: @akadutta @doru1004 @amd-hhashemi @carlobertolli
(cherry picked from commit 2e48656)
dnikolaev-amd pushed a commit that referenced this pull request Apr 24, 2025
* Unroll loops manually to hide memory access latency
* Strided access for coalesced memory accesses

Co-authors: @akadutta @doru1004 @amd-hhashemi @carlobertolli
(cherry picked from commit 2e48656)
dnikolaev-amd pushed a commit that referenced this pull request Apr 24, 2025
* Unroll loops manually to hide memory access latency
* Strided access for coalesced memory accesses

Co-authors: @akadutta @doru1004 @amd-hhashemi @carlobertolli
(cherry picked from commit 2e48656)
2 participants