OCI image types not supported on clusters with enroot < 3.4.1 #310

Open
terrykong opened this issue Oct 16, 2023 · 5 comments
@terrykong
Contributor

If you try to start a Slurm job on a cluster with enroot < 3.4.1 using an OCI-type image, you will hit this error:

pyxis: importing docker image ...
slurmstepd: error: pyxis: child 261774 failed with error code: 1
slurmstepd: error: pyxis: failed to import docker image
slurmstepd: error: pyxis: printing contents of log file ...
slurmstepd: error: pyxis:     [INFO] Querying registry for permission grant
slurmstepd: error: pyxis:     [INFO] Authenticating with user: XXXXXXXXX
slurmstepd: error: pyxis:     [INFO] Using credentials from file: XXXXXXXXXX
slurmstepd: error: pyxis:     [INFO] Authentication succeeded
slurmstepd: error: pyxis:     [INFO] Fetching image manifest list
slurmstepd: error: pyxis:     [INFO] Fetching image manifest
slurmstepd: error: pyxis:     [ERROR] URL https://ghcr.io/v2/nvidia/jax/manifests/latest returned error code: 404 Not Found
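To tell whether a cluster falls under the affected range, the installed enroot version can be compared against 3.4.1. The following is a minimal sketch, not something taken from pyxis or enroot: `version_lt` is a hypothetical helper, it relies on `sort -V` from GNU coreutils, and it assumes `enroot version` prints a bare version string.

```shell
# Hypothetical helper: succeeds when version $1 is strictly older than $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Assumption: `enroot version` prints a bare version string such as 3.4.0.
# installed="$(enroot version)"
# if version_lt "$installed" 3.4.1; then
#     echo "enroot $installed predates 3.4.1: OCI images will fail to import"
# fi
```

The commented-out part has to run on a node where enroot is installed, for example inside an interactive Slurm allocation.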

You can check if the image is an OCI type by running:

docker manifest inspect "$image" | jq -r .mediaType
# Returns application/vnd.oci.image.index.v1+json for OCI type
# Returns application/vnd.docker.distribution.manifest.v2+json for docker type
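For scripting the check, the media type string can be mapped to a short label. This is only a sketch: `classify_media_type` is a hypothetical name, and it assumes that every `application/vnd.oci.image.*` value should be treated as the OCI type and every `application/vnd.docker.distribution.manifest.*` value as the docker type.

```shell
# Hypothetical helper: map a manifest mediaType string to
# "oci", "docker" or "unknown".
classify_media_type() {
    case "$1" in
        application/vnd.oci.image.*) echo oci ;;
        application/vnd.docker.distribution.manifest.*) echo docker ;;
        *) echo unknown ;;
    esac
}

# Usage (needs docker and jq, plus access to the registry):
# classify_media_type "$(docker manifest inspect "$image" | jq -r .mediaType)"
```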

Aside from updating the enroot version on the cluster, we should investigate whether we can support both.

FWIW, as a workaround, you can convert the image from the OCI type to the Docker format (schemaVersion 2). Here's how I was able to do it with the skopeo tool:

docker run -v $HOME/.docker:/root/.docker:ro --rm quay.io/skopeo/stable copy --authfile /root/.docker/config.json --format v2s2 docker://ghcr.io/nvidia/jax:latest docker://my-private-registry/owner/repo/jax:latest
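If several images need converting, the command above can be parameterized. The sketch below only prints the command so that it can be reviewed before running; `skopeo_convert_cmd` is a hypothetical helper, and it assumes the same auth file location and skopeo container image as the command above.

```shell
# Hypothetical helper: print (do not run) the skopeo command that converts
# an image to the Docker v2s2 format.
# Arguments: source image reference, destination image reference.
skopeo_convert_cmd() {
    printf 'docker run -v %s/.docker:/root/.docker:ro --rm quay.io/skopeo/stable copy --authfile /root/.docker/config.json --format v2s2 docker://%s docker://%s\n' \
        "$HOME" "$1" "$2"
}

# Usage: print the command for one image, inspect it, then run it by hand.
# skopeo_convert_cmd ghcr.io/nvidia/jax:latest my-private-registry/owner/repo/jax:latest
```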

nouiz assigned yhtang and unassigned nouiz on Oct 16, 2023

@nouiz
Collaborator

nouiz commented Oct 16, 2023

Reassigning to @yhtang, as he has already worked on this.

@olupton
Collaborator

olupton commented Oct 16, 2023

I believe I'm seeing a different side of the same thing while looking at #263:

curl -s -H 'Authorization: Bearer xxx' https://ghcr.io/v2/nvidia/upstream-pax/manifests/latest
{"errors":[{"code":"MANIFEST_UNKNOWN","message":"OCI index found, but Accept header does not support OCI indexes"}]}

@yhtang
Collaborator

yhtang commented Oct 16, 2023

Opened issue #311 as a solution.

@yhtang
Collaborator

yhtang commented Oct 16, 2023

> I believe I'm seeing a different side of the same thing while looking at #263:
>
> curl -s -H 'Authorization: Bearer xxx' https://ghcr.io/v2/nvidia/upstream-pax/manifests/latest
> {"errors":[{"code":"MANIFEST_UNKNOWN","message":"OCI index found, but Accept header does not support OCI indexes"}]}

This could be addressed by adding -H 'Accept: application/vnd.oci.image.index.v1+json, application/vnd.oci.image.manifest.v1+json, application/vnd.docker.distribution.manifest.v2+json' to the curl command line.
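Put together, the request might look like the sketch below. The function names `manifest_accept_header` and `fetch_manifest` are hypothetical, the registry host is the ghcr.io endpoint from the quoted command, and the bearer token is a placeholder you have to supply yourself.

```shell
# Hypothetical helper: print an Accept header value that covers both
# OCI and Docker manifest media types.
manifest_accept_header() {
    printf '%s, %s, %s\n' \
        'application/vnd.oci.image.index.v1+json' \
        'application/vnd.oci.image.manifest.v1+json' \
        'application/vnd.docker.distribution.manifest.v2+json'
}

# Hypothetical wrapper around the curl call quoted above.
# Arguments: bearer token, repository (e.g. nvidia/upstream-pax), tag.
fetch_manifest() {
    curl -s \
        -H "Authorization: Bearer $1" \
        -H "Accept: $(manifest_accept_header)" \
        "https://ghcr.io/v2/$2/manifests/$3"
}

# Usage: fetch_manifest xxx nvidia/upstream-pax latest
```

With the OCI index type in the Accept header, the registry should be able to return the index instead of the MANIFEST_UNKNOWN error.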

@yhtang
Collaborator

yhtang commented Oct 20, 2023

This has been documented in our landing page README for a while:
[image]
