Avoid hwloc version conflicts by forcing pocl to load before mpi4py #169

Merged 2 commits on Dec 8, 2020

matthiasdiener (Member) commented Dec 6, 2020

Fixes illinois-ceesd/emirge#73

Fixes pocl-related crashes of the type:

Not enough memory to run on this device.

at startup.

These happen only when running with mpi4py, e.g.

$ python wave-eager.py

works fine with the "wrong" libhwloc version, but

$ python -m mpi4py wave-eager-mpi.py

does not.

The error arises because we build mpi4py from source against the system MPI and hwloc, but pocl against conda's hwloc, so the two hwloc versions might differ (v1 vs. v2). Since libhwloc v1 and v2 internally use the same version number, only one of them can be loaded at a time, which means that the first package to load hwloc determines the hwloc version for everyone else.

This seems to be especially problematic when pocl is built against hwloc v1, but runs with hwloc v2 (e.g. on dunkel).
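To see which libhwloc actually ended up in a running process, one option is to inspect the process's memory map. The following is a minimal diagnostic sketch, not part of this PR: it assumes Linux (it reads `/proc/self/maps`), and the helper names `hwloc_libs_in_maps` and `loaded_hwloc_libs` are hypothetical.

```python
import re
from pathlib import Path

# Matches any mapped path that contains "libhwloc.so", with or without a
# version suffix after ".so".
_HWLOC_RE = re.compile(r"(\S*libhwloc\.so\S*)")


def hwloc_libs_in_maps(maps_text):
    """Return the sorted, de-duplicated libhwloc paths found in a maps dump."""
    found = set()
    for line in maps_text.splitlines():
        match = _HWLOC_RE.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


def loaded_hwloc_libs():
    """Linux only: list libhwloc shared objects mapped into this process."""
    maps = Path("/proc/self/maps")
    if not maps.exists():
        # Not on Linux (or /proc is unavailable): nothing to report.
        return []
    return hwloc_libs_in_maps(maps.read_text())


if __name__ == "__main__":
    # Call this after the imports of interest (e.g. pyopencl, mpi4py) have run.
    print(loaded_hwloc_libs())
```

Calling `loaded_hwloc_libs()` once after the imports in the serial example and once in the MPI example should show whether the two runs picked up hwloc from different locations.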

This PR makes sure that we are running with pocl's hwloc version by forcing pocl to load before mpi4py.
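The PR's actual change is not shown in this conversation. As an illustration of the "first loader wins" idea only, the sketch below imports packages in a fixed order from Python. It rests on two assumptions: that querying OpenCL platforms through pyopencl makes the ICD loader load pocl (and with it pocl's hwloc), and that nothing else has loaded hwloc earlier in the process. The helper names `import_in_order` and `load_pocl_before_mpi` are hypothetical.

```python
import importlib


def import_in_order(module_names):
    """Import modules in the given order; return the names that were imported."""
    imported = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Package is not installed in this environment; skip it.
            continue
        imported.append(name)
    return imported


def load_pocl_before_mpi():
    """Touch OpenCL first, then import MPI (see the assumptions above)."""
    imported = import_in_order(["pyopencl"])
    if imported:
        import pyopencl as cl
        try:
            # Assumption: this platform query is what triggers loading pocl.
            cl.get_platforms()
        except cl.Error:
            # No usable OpenCL platform; continue with the MPI import anyway.
            pass
    imported += import_in_order(["mpi4py.MPI"])
    return imported
```

Whether ordering imports inside the script is sufficient when the script is launched with `python -m mpi4py` is itself an assumption here; the diagnostic approach of checking the loaded libhwloc path is the way to confirm it on a given machine.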

Tested on dunkel and lassen.

@matthiasdiener matthiasdiener self-assigned this Dec 6, 2020
inducer (Contributor) left a comment

This works? That's amazing, and it's much cleaner than what I proposed. Thanks!

matthiasdiener (Member, Author) commented Dec 7, 2020

> This works? That's amazing, and it's much cleaner than what I proposed. Thanks!

It works for me, but I'd be happy if you could check if it works for you too.

The test is just:

$ ./install.sh --branch=fix-no-mem
$ source config/activate_env.sh
$ cd mirgecom/examples
$ python -m mpi4py ./wave-eager-mpi.py

inducer (Contributor) commented Dec 8, 2020

Yup, can confirm that this works (on koelsch). Thanks!

Development

Successfully merging this pull request may close these issues.

Adapting to the system hwloc situation