Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Improve iteration speed of Region.Register objects #4583

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Mysteryem
Copy link
Contributor

What is this fixing or adding?

Without implementing __iter__ directly, calling iter() on a Region.Register on Python 3.12 would return a new generator implemented as follows:

        def __iter__(self) -> int:
            i = 0
            try:
                while True:
                    v = self[i]
                    yield v
                    i += 1
            except IndexError:
                return None

This was determined by disassembling the returned generator with dis.dis() and then constructing a function that disassembles into the same bytecode.

The iterator returned by iter(self._list) is faster than this generator, so using it slightly improves generation performance on average.

Iteration of Region.Register objects is used a lot in CollectionState.update_reachable_regions in both of the private _update methods that get called. The performance gain here will vary depending on how many regions a world has and how many exits those regions have on average.

For a game like Blasphemous, with a lot of regions and exits, generation of 10 template Blasphemous yamls with --skip_output --seed 1 and progression balancing disabled went from 19.0s to 16.4s (14.2% reduction in generation duration).

How was this tested?

Generations were run before and afterwards. Comparing duration and output of the same seeds.

Without implementing __iter__ directly, calling iter() on a
Region.Register on Python 3.12 would return a new generator implemented
as follows:
```py
        def __iter__(self) -> int:
            i = 0
            try:
                while True:
                    v = self[i]
                    yield v
                    i += 1
            except IndexError:
                return None
```
This was determined by disassembling the returned generator with
dis.dis() and then constructing a function that disassembles into the
same bytecode.

The iterator returned by `iter(self._list)` is faster than this
generator, so using it slightly improves generation performance on
average.

Iteration of Region.Register objects is used a lot in
`CollectionState.update_reachable_regions` in both of the private
_update methods that get called. The performance gain here will vary
depending on how many regions a world has and how many exits those
regions have on average.

For a game like Blasphemous, with a lot of regions and exits, generation
of 10 template Blasphemous yamls with `--skip_output --seed 1` and
progression balancing disabled went from 19.0s to 16.4s (14.2% reduction
in generation duration).
@github-actions github-actions bot added affects: core Issues/PRs that touch core and may need additional validation. waiting-on: peer-review Issue/PR has not been reviewed by enough people yet. labels Jan 30, 2025
@Mysteryem
Copy link
Contributor Author

Bytecode printed from dis.dis(iter(world.get_region("Menu").exits)) on Python 3.12:

1022           0 RETURN_GENERATOR
               2 POP_TOP
               4 RESUME                   0

1023           6 LOAD_CONST               1 (0)
               8 STORE_FAST               1 (i)

1024          10 NOP

1025          12 NOP

1026     >>   14 LOAD_FAST                0 (self)
              16 LOAD_FAST                1 (i)
              18 BINARY_SUBSCR
              22 STORE_FAST               2 (v)

1027          24 LOAD_FAST                2 (v)
              26 YIELD_VALUE              2
              28 RESUME                   1
              30 POP_TOP

1028          32 LOAD_FAST                1 (i)
              34 LOAD_CONST               2 (1)
              36 BINARY_OP               13 (+=)
              40 STORE_FAST               1 (i)

1025          42 JUMP_BACKWARD           15 (to 14)
         >>   44 PUSH_EXC_INFO

1029          46 LOAD_GLOBAL              0 (IndexError)
              56 CHECK_EXC_MATCH
              58 POP_JUMP_IF_FALSE        3 (to 66)
              60 POP_TOP

1030          62 POP_EXCEPT
              64 RETURN_CONST             0 (None)

1029     >>   66 RERAISE                  0
         >>   68 COPY                     3
              70 POP_EXCEPT
              72 RERAISE                  1
         >>   74 CALL_INTRINSIC_1         3 (INTRINSIC_STOPITERATION_ERROR)
              76 RERAISE                  1
ExceptionTable:
  4 to 8 -> 74 [0] lasti
  12 to 42 -> 44 [0]
  44 to 60 -> 68 [1] lasti
  62 to 64 -> 74 [0] lasti
  66 to 66 -> 68 [1] lasti
  68 to 72 -> 74 [0] lasti

Mysteryem added a commit to Mysteryem/Archipelago-ahit that referenced this pull request Jan 30, 2025
Calling the dunder method has to:
1. Look up the dunder method for that object/class
2. Bind a new method instance to the object instance
3. Call the method with its arguments
4. Run the appropriate operation on the object

Whereas running the appropriate operation on the object from the start
skips straight to step 4.

Region.Register.__getitem__ is called a lot without ArchipelagoMW#4583. In that case,
generation of 10 template Blasphemous yamls with
`--skip_output --seed 1` and progression balancing disabled went from
19.0s to 18.8s (1.3% reduction in generation duration).

From profiling with `timeit`
```py
        def __getitem__(self, index: int) -> Location:
            return self._list[index]
```
appears to be about twice as fast as the old code:
```py
        def __getitem__(self, index: int) -> Location:
            return self._list.__getitem__(index)
```

Besides this, there is not expected to be any noticeable difference in
performance, and there is not expected to be any difference in semantics
with these changes.
@Mysteryem
Copy link
Contributor Author

From looking at the CPython source code, deque.extend() does not appear to implement any optimisations for being passed list/tuple objects (would use the https://docs.python.org/3/c-api/sequence.html#c.PySequence_Fast family of functions), so providing it a fast iterator is probably the best that can be done currently.

https://github.com/python/cpython/blob/3.12/Modules/_collectionsmodule.c#L413 (deque_extend_impl as of CPython 3.13)

queue.extend(new_region.exits)

Archipelago/BaseClasses.py

Lines 802 to 806 in 1fe8024

queue.extend(new_region.exits)
self.path[new_region] = (new_region.name, self.path.get(connection, None))
new_connection = True
# sweep for indirect connections, mostly Entrance.can_reach(unrelated_Region)
queue.extend(blocked_connections)

@NewSoupVi NewSoupVi added the is: enhancement Issues requesting new features or pull requests implementing new features. label Jan 30, 2025
Copy link
Member

@NewSoupVi NewSoupVi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects: core Issues/PRs that touch core and may need additional validation. is: enhancement Issues requesting new features or pull requests implementing new features. waiting-on: peer-review Issue/PR has not been reviewed by enough people yet.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants