-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework initialization of constants & class variables #15333
Merged
straight-shoota
merged 12 commits into
crystal-lang:master
from
ysbaddaden:refactor/crystal-once
Jan 20, 2025
Merged
Changes from all commits
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
b55fc14
Improve `__crystal_once` performance
ysbaddaden 55228fd
LLVM optimization
ysbaddaden 9c1da89
LLVM optimization: reduce complexity at call site
ysbaddaden 6ebfcf4
Fix: macro 'class_property' must be defined
ysbaddaden 71a1c4d
Fix: only enabled for 1.16.0-dev (not 1.15.0-dev)
ysbaddaden 2296069
Add LLVM optimization to legacy __crystal_once
ysbaddaden 2c1c20b
Fix: LLVM 14 and below need explicit pointer cast (i8* -> i1*)
ysbaddaden 273978b
Fix: missing :nodoc:
ysbaddaden e7527d3
Extract Intrinsics.unreachable
ysbaddaden e42a575
Only cast i8* to i1* on LLVM < 15
ysbaddaden 4f5092e
No need for `Crystal.once_mutex` getter
ysbaddaden cf58312
Fix: crystal tool format
ysbaddaden File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,54 +1,142 @@ | ||
# This file defines the functions `__crystal_once_init` and `__crystal_once` expected | ||
# by the compiler. `__crystal_once` is called each time a constant or class variable | ||
# has to be initialized and is its responsibility to verify the initializer is executed | ||
# only once. `__crystal_once_init` is executed only once at the beginning of the program | ||
# and the result is passed on each call to `__crystal_once`. | ||
|
||
# This implementation uses an array to store the initialization flag pointers for each value | ||
# to find infinite loops and raise an error. In multithread mode a mutex is used to | ||
# avoid race conditions between threads. | ||
|
||
# :nodoc: | ||
class Crystal::OnceState | ||
@rec = [] of Bool* | ||
|
||
def once(flag : Bool*, initializer : Void*) | ||
unless flag.value | ||
if @rec.includes?(flag) | ||
raise "Recursion while initializing class variables and/or constants" | ||
# This file defines two functions expected by the compiler: | ||
# | ||
# - `__crystal_once_init`: executed only once at the beginning of the program | ||
# and, for the legacy implementation, the result is passed on each call to | ||
# `__crystal_once`. | ||
# | ||
# - `__crystal_once`: called each time a constant or class variable has to be | ||
# initialized and is its responsibility to verify the initializer is executed | ||
# only once and to fail on recursion. | ||
|
||
# In multithread mode a mutex is used to avoid race conditions between threads. | ||
# | ||
# On Win32, `Crystal::System::FileDescriptor#@@reader_thread` spawns a new | ||
# thread even without the `preview_mt` flag, and the thread can also reference | ||
# Crystal constants, leading to race conditions, so we always enable the mutex. | ||
|
||
{% if compare_versions(Crystal::VERSION, "1.16.0-dev") >= 0 %} | ||
# This implementation uses an enum over the initialization flag pointer for | ||
# each value to find infinite loops and raise an error. | ||
|
||
module Crystal | ||
# :nodoc: | ||
enum OnceState : Int8 | ||
Processing = -1 | ||
Uninitialized = 0 | ||
Initialized = 1 | ||
end | ||
|
||
{% if flag?(:preview_mt) || flag?(:win32) %} | ||
@@once_mutex = uninitialized Mutex | ||
|
||
ysbaddaden marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# :nodoc: | ||
def self.once_mutex=(@@once_mutex : Mutex) | ||
end | ||
@rec << flag | ||
{% end %} | ||
|
||
Proc(Nil).new(initializer, Pointer(Void).null).call | ||
flag.value = true | ||
# :nodoc: | ||
# Using @[NoInline] so LLVM optimizes for the hot path (var already | ||
# initialized). | ||
@[NoInline] | ||
def self.once(flag : OnceState*, initializer : Void*) : Nil | ||
{% if flag?(:preview_mt) || flag?(:win32) %} | ||
@@once_mutex.synchronize { once_exec(flag, initializer) } | ||
{% else %} | ||
once_exec(flag, initializer) | ||
{% end %} | ||
|
||
@rec.pop | ||
# safety check, and allows to safely call `Intrinsics.unreachable` in | ||
# `__crystal_once` | ||
unless flag.value.initialized? | ||
System.print_error "BUG: failed to initialize constant or class variable\n" | ||
LibC._exit(1) | ||
end | ||
end | ||
|
||
private def self.once_exec(flag : OnceState*, initializer : Void*) : Nil | ||
case flag.value | ||
in .initialized? | ||
return | ||
in .uninitialized? | ||
flag.value = :processing | ||
Proc(Nil).new(initializer, Pointer(Void).null).call | ||
flag.value = :initialized | ||
in .processing? | ||
raise "Recursion while initializing class variables and/or constants" | ||
end | ||
end | ||
end | ||
|
||
# on Win32, `Crystal::System::FileDescriptor#@@reader_thread` spawns a new | ||
# thread even without the `preview_mt` flag, and the thread can also reference | ||
# Crystal constants, leading to race conditions, so we always enable the mutex | ||
# TODO: can this be improved? | ||
{% if flag?(:preview_mt) || flag?(:win32) %} | ||
@mutex = Mutex.new(:reentrant) | ||
# :nodoc: | ||
fun __crystal_once_init : Nil | ||
{% if flag?(:preview_mt) || flag?(:win32) %} | ||
Crystal.once_mutex = Mutex.new(:reentrant) | ||
{% end %} | ||
end | ||
|
||
# :nodoc: | ||
# | ||
# Using `@[AlwaysInline]` allows LLVM to optimize const accesses. Since this | ||
# is a `fun` the function will still appear in the symbol table, though it | ||
# will never be called. | ||
@[AlwaysInline] | ||
fun __crystal_once(flag : Crystal::OnceState*, initializer : Void*) : Nil | ||
return if flag.value.initialized? | ||
|
||
Crystal.once(flag, initializer) | ||
|
||
# tell LLVM that it can optimize away repeated `__crystal_once` calls for | ||
# this global (e.g. repeated access to constant in a single funtion); | ||
# this is truly unreachable otherwise `Crystal.once` would have panicked | ||
Intrinsics.unreachable unless flag.value.initialized? | ||
end | ||
{% else %} | ||
# This implementation uses a global array to store the initialization flag | ||
# pointers for each value to find infinite loops and raise an error. | ||
|
||
# :nodoc: | ||
class Crystal::OnceState | ||
@rec = [] of Bool* | ||
|
||
@[NoInline] | ||
def once(flag : Bool*, initializer : Void*) | ||
unless flag.value | ||
@mutex.synchronize do | ||
previous_def | ||
if @rec.includes?(flag) | ||
raise "Recursion while initializing class variables and/or constants" | ||
end | ||
@rec << flag | ||
|
||
Proc(Nil).new(initializer, Pointer(Void).null).call | ||
flag.value = true | ||
|
||
@rec.pop | ||
end | ||
end | ||
{% end %} | ||
end | ||
|
||
# :nodoc: | ||
fun __crystal_once_init : Void* | ||
Crystal::OnceState.new.as(Void*) | ||
end | ||
|
||
# :nodoc: | ||
fun __crystal_once(state : Void*, flag : Bool*, initializer : Void*) | ||
state.as(Crystal::OnceState).once(flag, initializer) | ||
end | ||
|
||
{% if flag?(:preview_mt) || flag?(:win32) %} | ||
@mutex = Mutex.new(:reentrant) | ||
|
||
@[NoInline] | ||
def once(flag : Bool*, initializer : Void*) | ||
unless flag.value | ||
@mutex.synchronize do | ||
previous_def | ||
end | ||
end | ||
end | ||
{% end %} | ||
ysbaddaden marked this conversation as resolved.
Show resolved
Hide resolved
|
||
end | ||
|
||
# :nodoc: | ||
fun __crystal_once_init : Void* | ||
Crystal::OnceState.new.as(Void*) | ||
end | ||
|
||
# :nodoc: | ||
@[AlwaysInline] | ||
fun __crystal_once(state : Void*, flag : Bool*, initializer : Void*) | ||
return if flag.value | ||
state.as(Crystal::OnceState).once(flag, initializer) | ||
Intrinsics.unreachable unless flag.value | ||
end | ||
{% end %} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nit-picky, but in my v2 branch, I replaced this trinary enum with a
Int8
so all comparisons are done with 0 (== Initialized
becomes> 0
).Comparing with
0
results in fewer (or smaller) assembly instructions on most CPUsThe only arch I'm really familiar with is risc-v, so here's an example in risc-v:
Of course this is micro-optimization, but if this is inlined into every const access, it could be noticable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: I fixed the examples below... I stupidly used a UInt8 🤦
On x86_64 the only difference is in the jump instruction:
Same for ARM32:
But AArch64 indeed only needs one instruction instead of two for the equality check:And same for AArch64:NOTE: I'm not fluent in the assembly of each arch. I compiled a tiny program with
--cross-compile --target=...
then usedobjdump --disassemble
from a crosschain build ofbinutils
to compare the LLVM generated assembly.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me that looks like an unsigned comparison. So it only checks
x != 0
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wrote a commit, but didn't push it (yet): I'm wondering if we'd really benefit from the change in practice 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested again, it's actually a
jle
!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BlobCodes I run my tests again and updated my comment above.
We can probably do better in manually written assembly, but LLVM actually generates the same assembly for all 3 architectures. The only difference stands in the jump instruction 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see in a follow up if we can squeeze even more performance with this idea.