Skip to content

[Compiler-rt] Add AArch64 routines for __arm_agnostic("sme_za_state") #120059

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 20, 2024

Conversation

sdesmalen-arm
Copy link
Collaborator

Copy link
Contributor

@kmclaughlin-arm kmclaughlin-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added two more minor comments, but otherwise this LGTM.

Copy link
Collaborator

@paulwalker-arm paulwalker-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more recommendations but otherwise this looks good to me.

There's still some room for improvement not least regarding __arm_sme_state_size where we go to pains to ensure the returned value is the minimum necessary.

I wonder if we need to be this strict because we could for example always return the current maximum when SME is available regardless of the state of the lazy save or SME2 availability. With that said, I've no evidence these routines have performance concerns so am happy to see how things play out.

Comment on lines 296 to 297
stur wzr, [x0, #10]
strh wzr, [x0, #14]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an alignment perspective it would be better as:

strh    wzr, [x0, #10]
str     wzr, [x0, #12]

// Store ZT0
str zt0, [x18]
add x18, x18, #64
b 1f
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch is not necessary because we should fall through.

@sdesmalen-arm sdesmalen-arm merged commit 811f2a6 into llvm:main Dec 20, 2024
7 checks passed
@nico
Copy link
Contributor

nico commented Dec 24, 2024

This broke our our chromium/android build: https://issues.chromium.org/issues/385758019

[124576/130655] SOLINK ./libcapture_unittests__library.so
python3 ../../build/toolchain/gcc_solink_wrapper.py --readelf=../../third_party/llvm-build/Release+Asserts/bin/llvm-readelf --nm=../../third_p...(too long)
ld.lld: error: undefined symbol: __arm_tpidr2_save
>>> referenced by rotate_sme.cc:32 (../../third_party/libyuv/source/rotate_sme.cc:32)
>>>               libyuv_sme/rotate_sme.o:(TransposeWxH_SME) in archive obj/third_party/libyuv/libyuv_sme.a
>>> referenced by rotate_sme.cc:101 (../../third_party/libyuv/source/rotate_sme.cc:101)
>>>               libyuv_sme/rotate_sme.o:(TransposeUVWxH_SME) in archive obj/third_party/libyuv/libyuv_sme.a
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

Is that expected? Are we holding something wrong? Or is that too little information to say something useful?

@MacDue
Copy link
Member

MacDue commented Dec 24, 2024

Is that expected? Are we holding something wrong? Or is that too little information to say something useful?

I don't think it's expected (looks like the chrome build is clang trunk).

It looks like the +sme2 support detection is not working:

// Fails to build on clang trunk (+sme2 not set)
void foo(void)  __arm_streaming_compatible {
  asm(".arch armv9-a+sme2");
  asm("smstart");
  asm("ldr zt0, [sp]");
}
// (side note: It seems smstart does not require +sme to be assembled)

See: https://godbolt.org/z/osdfGd6dn


It looks like either of these would work:

// Builds successfully on clang trunk.
__attribute__((target("arch=armv9-a+sme2")))
void foo(void)  __arm_streaming_compatible {
  asm("smstart");
  asm("ldr zt0, [sp]");
}

// OR:

void foo2(void)  __arm_streaming_compatible {
  asm(".arch armv9-a+sme2\n"
      "smstart\n"
      "ldr zt0, [sp]\n");
}

See: https://godbolt.org/z/aqTTvKcE5

@ZequanWu
Copy link
Contributor

Is that expected? Are we holding something wrong? Or is that too little information to say something useful?

I don't think it's expected (looks like the chrome build is clang trunk).

It looks like the +sme2 support detection is not working:

// Fails to build on clang trunk (+sme2 not set)
void foo(void)  __arm_streaming_compatible {
  asm(".arch armv9-a+sme2");
  asm("smstart");
  asm("ldr zt0, [sp]");
}
// (side note: It seems smstart does not require +sme to be assembled)

See: https://godbolt.org/z/osdfGd6dn

It looks like either of these would work:

// Builds successfully on clang trunk.
__attribute__((target("arch=armv9-a+sme2")))
void foo(void)  __arm_streaming_compatible {
  asm("smstart");
  asm("ldr zt0, [sp]");
}

// OR:

void foo2(void)  __arm_streaming_compatible {
  asm(".arch armv9-a+sme2\n"
      "smstart\n"
      "ldr zt0, [sp]\n");
}

See: https://godbolt.org/z/aqTTvKcE5

Is it a regression introduced by this change? If so, please revert it if it takes a while to fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants