Core: Improve iteration speed of Region.Register objects #4583
Without implementing `__iter__` directly, calling `iter()` on a Region.Register on Python 3.12 would return a new generator implemented as follows:

```py
def __iter__(self) -> int:
    i = 0
    try:
        while True:
            v = self[i]
            yield v
            i += 1
    except IndexError:
        return None
```

This was determined by disassembling the returned generator with `dis.dis()` and then constructing a function that disassembles into the same bytecode.

The iterator returned by `iter(self._list)` is faster than this generator, so using it slightly improves generation performance on average.

Iteration of Region.Register objects is used a lot in `CollectionState.update_reachable_regions` in both of the private _update methods that get called. The performance gain here will vary depending on how many regions a world has and how many exits those regions have on average.

For a game like Blasphemous, with a lot of regions and exits, generation of 10 template Blasphemous yamls with `--skip_output --seed 1` and progression balancing disabled went from 19.0s to 16.4s (14.2% reduction in generation duration).
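A minimal sketch of the comparison described above, not the actual Archipelago code. It assumes the register derives from `collections.abc.MutableSequence`, whose `__iter__` mixin is an index-based generator like the one shown; `MixinRegister`, `DirectRegister` and `time_iteration` are hypothetical names used only for illustration.

```python
from collections.abc import MutableSequence
import timeit


class MixinRegister(MutableSequence):
    """Hypothetical stand-in for a register that does not define __iter__."""

    def __init__(self, items=()):
        self._list = list(items)

    def __getitem__(self, index):
        return self._list[index]

    def __setitem__(self, index, value):
        self._list[index] = value

    def __delitem__(self, index):
        del self._list[index]

    def __len__(self):
        return len(self._list)

    def insert(self, index, value):
        self._list.insert(index, value)


class DirectRegister(MixinRegister):
    """Same register, but iteration is delegated to the underlying list."""

    def __iter__(self):
        return iter(self._list)


def time_iteration(cls, size=1_000, number=2_000):
    """Return the seconds taken to fully iterate a register `number` times."""
    register = cls(range(size))
    return timeit.timeit(lambda: list(register), number=number)


if __name__ == "__main__":
    # Absolute timings depend on the machine and Python version.
    print("inherited generator:", time_iteration(MixinRegister))
    print("iter(self._list):   ", time_iteration(DirectRegister))
```

Both classes yield the same items in the same order; only the iterator object differs (a Python-level generator versus the built-in list iterator).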
Bytecode printed from
Calling the dunder method has to:

1. Look up the dunder method for that object/class
2. Bind a new method instance to the object instance
3. Call the method with its arguments
4. Run the appropriate operation on the object

Whereas running the appropriate operation on the object from the start skips straight to step 4.

Region.Register.__getitem__ is called a lot without ArchipelagoMW#4583. In that case, generation of 10 template Blasphemous yamls with `--skip_output --seed 1` and progression balancing disabled went from 19.0s to 18.8s (1.3% reduction in generation duration).

From profiling with `timeit`

```py
def __getitem__(self, index: int) -> Location:
    return self._list[index]
```

appears to be about twice as fast as the old code:

```py
def __getitem__(self, index: int) -> Location:
    return self._list.__getitem__(index)
```

Besides this, there is not expected to be any noticeable difference in performance, and there is not expected to be any difference in semantics with these changes.
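One way the `timeit` comparison above could be reproduced is sketched below. The class names and `time_getitem` are hypothetical and not part of the code base. The measured time also includes the lambda call and the outer `register[50]` dispatch, so the ratio seen here may be smaller than when timing the two method bodies on their own.

```python
import timeit


class DunderCallRegister:
    """Hypothetical register using the explicit dunder call (old code)."""

    def __init__(self, items):
        self._list = list(items)

    def __getitem__(self, index):
        return self._list.__getitem__(index)


class SubscriptRegister:
    """Hypothetical register using the subscript operator (new code)."""

    def __init__(self, items):
        self._list = list(items)

    def __getitem__(self, index):
        return self._list[index]


def time_getitem(cls, number=500_000):
    """Return the seconds taken for `number` single-index lookups."""
    register = cls(range(100))
    return timeit.timeit(lambda: register[50], number=number)


if __name__ == "__main__":
    # Absolute timings depend on the machine and Python version.
    print("self._list.__getitem__(index):", time_getitem(DunderCallRegister))
    print("self._list[index]:            ", time_getitem(SubscriptRegister))
```

Both variants return the same values for integer and slice indices and raise `IndexError` for out-of-range indices, which matches the claim that semantics do not change.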
From looking at the CPython source code, https://github.com/python/cpython/blob/3.12/Modules/_collectionsmodule.c#L413 ( Line 775 in 1fe8024
Lines 802 to 806 in 1fe8024
Huh
What is this fixing or adding?
Without implementing `__iter__` directly, calling iter() on a Region.Register on Python 3.12 would return a new generator that reads `self[i]` for increasing `i` until an IndexError is raised. This was determined by disassembling the returned generator with dis.dis() and then constructing a function that disassembles into the same bytecode.

The iterator returned by `iter(self._list)` is faster than this generator, so using it slightly improves generation performance on average.

Iteration of Region.Register objects is used a lot in `CollectionState.update_reachable_regions` in both of the private _update methods that get called. The performance gain here will vary depending on how many regions a world has and how many exits those regions have on average.

For a game like Blasphemous, with a lot of regions and exits, generation of 10 template Blasphemous yamls with `--skip_output --seed 1` and progression balancing disabled went from 19.0s to 16.4s (14.2% reduction in generation duration).

How was this tested?
Generations were run before and after the change, comparing the duration and output of the same seeds.