Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for SGEMM_DIRECT Kernel based on SME1 #5084

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

vaiskv
Copy link

@vaiskv vaiskv commented Jan 19, 2025

This PR contains support for sgemm_direct kernel based on SME1 architecture.
sgemm_direct kernel handles a special case of cblas_sgemm() level 3 API where aplha =1 and beta=0.

@martin-frbg
Copy link
Collaborator

Thanks. Convenient that this special case does not imply debugging TRMM and SYMM as with the general case SGEMM kernel in #5011 that I still hope to get to soon. :/
Having unguarded -march flags that require a recent compiler is not going to work for all users however, I think we'll need the same kludge as used on x86_64 - ifdef the source on HAVE_SME and use an empty function declaration in the #else branch if not available, then rely on the required compiler flags being provided by Makefile.arm64 (or kernel/Makefile in the DYNAMIC_ARCH case, as currently for SkylakeX and newer) . Also I don't think one can use HWCAP on Apple, there should be a feature flag HAVE_SME provided by the build scripts (and "eventually" by the runtime detection code in dynamic_arm64.c too)

@martin-frbg
Copy link
Collaborator

HarmonyOS doesn't seem to support HWCAP either, and AppleClang balks at the "else if" introduced in common_s.h. I'll see if I can unravel and test locally.

@vaiskv
Copy link
Author

vaiskv commented Jan 20, 2025

Sure, Thanks!. I am working on restructuring the code to HAVE_SME flag as per your suggestion.

common_s.h Outdated
@@ -213,9 +213,9 @@
#ifdef ARCH_X86_64
#define SGEMM_DIRECT_PERFORMANT gotoblas -> sgemm_direct_performant
#define SGEMM_DIRECT gotoblas -> sgemm_direct
#else
#else if ARCH_ARM64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#elif ARCH_ARM64

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@martin-frbg
Copy link
Collaborator

Embarassingly it looks as if I nuked the HAVE_SME I had put in cpuid_arm64.c back in november (#4971) with some later change... bad things still sometimes happen when I jump between machines that are not always on the internet :(

@vaiskv
Copy link
Author

vaiskv commented Jan 20, 2025

I have a question:
I am compiling the library with the following command:

make BINARY=64 CC=aarch64-linux-android33-clang ONLY_CBLAS=1 HOSTCC=gcc TARGET=ARMV9SME DYNAMIC_ARCH=1

From the interface/gemm.c file, SGEMM_DIRECT kernel gets compiled only when DYNAMIC_ARCH=1. So when the library is compiled with DYNAMIC_ARCH=1 and if the TARGET is set to ARMV8 instead of AMRV9SME, does the function defined in kernel/arm64/sgemm_direct_arm64_sme1.c also be part of the library? I am assuming it won't be part of it as we have guarded the file with HAVE_SME. But then how can we ensure the library is supported on all Arm targets (Arm v8 , v9 etc)?

@martin-frbg
Copy link
Collaborator

With DYNAMIC_ARCH, TARGET is only used for the common code (interface/gemm.c and all the other interfaces, driver/level3 and so on), and the codes under kernel/arm64 are compiled in a loop with TARGET_CORE set to each of the individual models ARMV8, ARMV8SVE, etc. supported in DYNAMIC_ARCH configuration. So what I tried to express is that your kernel/arm64/sgemm_direct_arm64_sme1.c should look roughly like

#ifdef HAVE_SME
void CNAME (... 
{
your sme code goes here
}
#else 
void CNAME(...)
{}
#endif

so that the compiler finds something to compile (even if it is an empty function) whether it is running for a target with -march=armv9+sme... temporarily defined or not.
And maybe instead of string comparing the current cpu name in the DYNAMIC_ARCH situation at runtime, we should have a supports_sme() function in driver/others/dynamic_arm64.c that does the AT_HWCAP call (or its alternatives for other platforms) and returns 0 or 1, like the support_avx512 is used in interface/gemm.c for x86_64

@vaiskv
Copy link
Author

vaiskv commented Jan 20, 2025

Got it. I will update the code and push the updated patch. Thanks!

@martin-frbg
Copy link
Collaborator

Umm, TARGET=ARMV9SME tells me you're already building on AymenQ's unmerged #5011 ? In that case I might refrain from putting back the HAVE_SME and we could just live with the TARGET name(s) like in the Skylakex sgemm_direct kernel.
(I'm just unsure if the Apple M4 is going to be that representative of other SME-capable Arm64 targets to come)

@vaiskv vaiskv force-pushed the topic/sgemm_direct_sme1 branch 3 times, most recently from e875f09 to 3bce73c Compare January 23, 2025 05:37
@martin-frbg
Copy link
Collaborator

Needs

#ifndef HWCAP2_SME
#define HWCAP2_SME 1<<23
#endif

in dynamic_arm64.c

@martin-frbg
Copy link
Collaborator

Also, kernel/setparam-ref.c will need a small expansion of the ifdef block referring to sgemm_direct - currently this has only the "ifdef x86_64" part, which is why DYNAMIC_ARCH fails to build on all arm64 platforms in CI.

@vaiskv vaiskv force-pushed the topic/sgemm_direct_sme1 branch from 3bce73c to d3ef3a4 Compare January 24, 2025 07:34
@martin-frbg
Copy link
Collaborator

Hmm, seems that your "conflicting" changes in cmake/system.cmake have already been in the develop branch since early december (PR 5003), that part can probably be removed from this PR ?
The other conflict that git cannot resolve automatically appears to be caused by my merging the "thread throttling profile for NEOVERSEV1" into interface/gemm.c - I think it is just the line differences that are throwing the automation off.

* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
@vaiskv vaiskv force-pushed the topic/sgemm_direct_sme1 branch from d3ef3a4 to da50c80 Compare January 27, 2025 05:52
@vaiskv
Copy link
Author

vaiskv commented Jan 27, 2025

Resolved merge conflicts in interface/gemm.c file. Also rebased cmake/system.cmake with develop branch.

@vaiskv vaiskv requested a review from martin-frbg January 27, 2025 06:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants