Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't keep box muller transform state between kernel launches #649

Merged
merged 6 commits into from
Jan 28, 2025

Conversation

neworderofjamie
Copy link
Contributor

@neworderofjamie neworderofjamie commented Jan 13, 2025

This builds on the new hybrid HIP/CUDA backend from #647 so review that first!

As described in #648, beyond its (arguably excessive) 192 bit of state, the curand XORWOW RNG used to provide randomness for neuron and custom connectivity updates also stores 160 bits of box muller transform state in the curandState struct (BM draws two numbers and produces two normally distributed values so this state is used to cache one of those results for subsequent calls to curand_normal).

In this PR, when using CUDA and HIP backends, we create our own XORWowStateInternal struct in definitions.h without the BM state and store this in memory. At the start of the neuron and custom connectivity update kernels, we copy the fields from the XORWowStateInternal struct into a local curandState and, at the end, we copy them back.

Excitingly, because we are very memory bandwidth bound, this makes the neuron kernel on the cortical microcircuit model about 60% faster. On my A5000 (running for 1 second):

Before[seconds] After [seconds]
neuron update 0.18 0.11
presynaptic update 0.23 0.24

Copy link

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 98.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.97%. Comparing base (54101ca) to head (beb19d4).
Report is 19 commits behind head on master.

Files with missing lines Patch % Lines
include/genn/genn/code_generator/environment.h 96.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #649      +/-   ##
==========================================
+ Coverage   88.62%   88.97%   +0.35%     
==========================================
  Files         111      108       -3     
  Lines       15062    14768     -294     
==========================================
- Hits        13348    13140     -208     
+ Misses       1714     1628      -86     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@neworderofjamie neworderofjamie force-pushed the xorwow_mem_reduce branch 3 times, most recently from e61dc4b to 1294128 Compare January 15, 2025 12:55
…ase`` entries as well as initialisers for cleanup code
…IP backend

* Write struct to definitions
* Use new class for allocating memory and struct fields
* Reimplemented population RNG preamble and postamble in ``BackendSIMT::getPopulationRNG`` using new destructor mechanism to copy from and to internal struct
Copy link
Member

@tnowotny tnowotny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow ... what a detail and what an effect. It doesn't necessarily look very elegant but the savings seem worth it!

Base automatically changed from hip to master January 28, 2025 19:13
@neworderofjamie neworderofjamie merged commit e9e40b9 into master Jan 28, 2025
3 checks passed
@neworderofjamie neworderofjamie deleted the xorwow_mem_reduce branch January 28, 2025 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants