Numpy should use Apple Accelerate instead of OpenBLAS on Apple Silicon #181251
Comments
The Accelerate update in macOS 13.3 could allow us to reconsider https://docs.brew.sh/Formula-Cookbook#linear-algebra-libraries (perhaps from Sonoma onward, in case a user is on 13.0 - 13.2). This would require a documentation change and switching all formulae to Accelerate for Sonoma onward. It may also be worth exploring optimization opportunities in OpenBLAS if anyone wants to try tuning it.
EDIT: Just noting that I've seen worse SVD performance with the default OMP thread count, whatever that is (maybe total P+E cores). After adjusting it, OpenBLAS is still slower than Accelerate, but not as bad as 10x.
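For anyone who wants to try the thread adjustment described above, here is a minimal sketch that caps OpenMP threads at the performance-core count. It assumes the macOS sysctl key `hw.perflevel0.physicalcpu` reports P-cores on Apple Silicon and falls back to `os.cpu_count()` everywhere else. `performance_core_count` is a hypothetical helper name, and whether "P-cores only" is actually the best setting has not been measured here.

```python
import os
import subprocess


def performance_core_count() -> int:
    """Best-effort count of performance cores; falls back to all logical cores."""
    try:
        # Assumption: this sysctl key exists on Apple Silicon macOS.
        out = subprocess.run(
            ["sysctl", "-n", "hw.perflevel0.physicalcpu"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
        count = int(out)
        if count > 0:
            return count
    except (OSError, subprocess.CalledProcessError, ValueError):
        pass  # not macOS, key missing, or unexpected output
    return os.cpu_count() or 1


# Should be set before NumPy (and its BLAS library) is imported.
# setdefault keeps any value the user already exported.
os.environ.setdefault("OMP_NUM_THREADS", str(performance_core_count()))
```

Placing this at the very top of a benchmark script keeps an explicit `OMP_NUM_THREADS=4` on the command line in control, since `setdefault` does not overwrite it.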
It may also be worth experimenting with FlexiBLAS, which allows runtime selection of the BLAS backend. Fedora decided to switch in Fedora 33 (https://fedoraproject.org/wiki/Changes/FlexiBLAS_as_BLAS/LAPACK_manager), though it does restrict some licensing due to LGPLv3.
EDIT: FlexiBLAS formula request - #181938
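To illustrate what runtime selection looks like in practice, here is a rough sketch that runs a script in a child process with the `FLEXIBLAS` environment variable set. The helper names `flexiblas_env` and `run_with_backend` are hypothetical, and the backend names used in this thread (APPLE, OPENBLASOPENMP, NETLIB) depend on how the local FlexiBLAS install is configured.

```python
import os
import subprocess
import sys


def flexiblas_env(backend, omp_threads=None, base=None):
    """Return a copy of the environment that selects a FlexiBLAS backend."""
    env = dict(os.environ if base is None else base)
    env["FLEXIBLAS"] = backend
    if omp_threads is not None:
        env["OMP_NUM_THREADS"] = str(omp_threads)
    return env


def run_with_backend(script, backend, omp_threads=None):
    """Run a Python script in a child process under the chosen backend."""
    return subprocess.run(
        [sys.executable, script],
        env=flexiblas_env(backend, omp_threads),
        check=True,
    )
```

For example, `run_with_backend("mysvd.py", "OPENBLASOPENMP", omp_threads=4)` would be the equivalent of prefixing the command with `FLEXIBLAS=OPENBLASOPENMP OMP_NUM_THREADS=4`, assuming the interpreter is linked against FlexiBLAS.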
Sample results. venv is the PyPI numpy wheel. The others are locally rebuilt brew.
❯ hyperfine --runs 2 'venv/bin/python3 mysvd.py' 'FLEXIBLAS=APPLE python3.12 mysvd.py' 'FLEXIBLAS=OPENBLASOPENMP python3.12 mysvd.py' 'FLEXIBLAS=OPENBLASOPENMP OMP_NUM_THREADS=4 python3.12 mysvd.py'
Benchmark 1: venv/bin/python3 mysvd.py
Time (mean ± σ): 11.344 s ± 0.977 s [User: 10.263 s, System: 3.067 s]
Range (min … max): 10.653 s … 12.034 s 2 runs
Benchmark 2: FLEXIBLAS=APPLE python3.12 mysvd.py
Time (mean ± σ): 10.436 s ± 0.129 s [User: 10.094 s, System: 3.084 s]
Range (min … max): 10.344 s … 10.527 s 2 runs
Benchmark 3: FLEXIBLAS=OPENBLASOPENMP python3.12 mysvd.py
Time (mean ± σ): 67.164 s ± 1.838 s [User: 133.651 s, System: 36.703 s]
Range (min … max): 65.865 s … 68.464 s 2 runs
Benchmark 4: FLEXIBLAS=OPENBLASOPENMP OMP_NUM_THREADS=4 python3.12 mysvd.py
Time (mean ± σ): 27.695 s ± 0.139 s [User: 26.499 s, System: 9.204 s]
Range (min … max): 27.596 s … 27.793 s 2 runs
Summary
FLEXIBLAS=APPLE python3.12 mysvd.py ran
1.09 ± 0.09 times faster than venv/bin/python3 mysvd.py
2.65 ± 0.04 times faster than FLEXIBLAS=OPENBLASOPENMP OMP_NUM_THREADS=4 python3.12 mysvd.py
6.44 ± 0.19 times faster than FLEXIBLAS=OPENBLASOPENMP python3.12 mysvd.py
EDIT: Also a bit sad that OpenBLAS is worse than NETLIB. As previously mentioned, the OpenMP thread count may be a reason.
❯ FLEXIBLAS=NETLIB python3.12 mysvd.py
mean of 10 runs: 1.58209s
❯ FLEXIBLAS=OPENBLASOPENMP OMP_NUM_THREADS=1 python3.12 mysvd.py
mean of 10 runs: 1.74133s
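The contents of mysvd.py are not shown in this thread. For readers who want to reproduce a comparison like the one above, a minimal sketch of a script that prints output in the same "mean of 10 runs" form might look like the following. The matrix size, the seed, and the names `mean_runtime` and `svd_workload` are all assumptions, not the original script.

```python
import statistics
import time

try:
    import numpy as np
except ImportError:  # lets the timing helper be used without NumPy
    np = None


def mean_runtime(fn, runs=10):
    """Call fn() `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def svd_workload(n=500, seed=0):
    """Build a zero-argument callable that runs one SVD of an n x n matrix."""
    # Assumption: a dense random matrix; the original workload is unknown.
    matrix = np.random.default_rng(seed).standard_normal((n, n))
    return lambda: np.linalg.svd(matrix)


if __name__ == "__main__":
    if np is None:
        print("NumPy is not installed")
    else:
        runs = 10
        print(f"mean of {runs} runs: {mean_runtime(svd_workload(), runs):.5f}s")
```

The matrix is built once outside the timed region so that only the SVD call is measured. Absolute timings from this sketch will not match the numbers above, since the problem size differs.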
A couple of notes: Accelerate mishandles single-precision floats -- see e.g.:
brew gist-logs <formula> link OR brew config AND brew doctor output
Verification
My "brew doctor" output says "Your system is ready to brew." and am still able to reproduce my issue.
I ran "brew update" and am still able to reproduce my issue.
I have resolved all warnings from "brew doctor" and that did not fix my problem.
What were you trying to do (and why)?
Initially I used Miniconda for Python on my M1 Mac, but now the Python provided by brew is compatible with Apple Silicon.
I am trying to use Homebrew Python on my Mac.
What happened (include all command output)?
NumPy is installed as a dependency of various packages on my system, but the provided installation is not optimized for Apple Silicon.
See numpy/numpy#24961
On my machine:
A performance test runs up to 10 times faster in my .env than with my system Python. I suppose this can slow down all packages that depend on the system NumPy.
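To compare the two installations, it can help to check which BLAS each NumPy build reports. The sketch below captures the text printed by `numpy.show_config()` and searches it for well-known library names. `numpy_config_text` and `guess_blas` are hypothetical helpers, and the keyword match is only a heuristic, since the config output format differs between NumPy versions.

```python
import contextlib
import io


def numpy_config_text():
    """Capture what numpy.show_config() prints to stdout."""
    import numpy as np  # imported lazily so guess_blas works without NumPy

    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


def guess_blas(config_text):
    """Heuristic: return the first known BLAS name found in the config text."""
    lowered = config_text.lower()
    for name in ("accelerate", "flexiblas", "openblas", "mkl"):
        if name in lowered:
            return name
    return "unknown"
```

Running `print(guess_blas(numpy_config_text()))` once with the brew Python and once inside the virtual environment should show whether the two builds link against different libraries.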
What did you expect to happen?
The NumPy provided by brew should be compiled with Apple Silicon optimizations.
Step-by-step reproduction instructions (by running brew commands)