
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch) #19

Open
leezy0311 opened this issue Nov 21, 2024 · 3 comments

Comments

@leezy0311

This problem occurs when I use an RTX 4090:

```
Traceback (most recent call last):
  File "main_qm9.py", line 298, in <module>
    main(args)
  File "main_qm9.py", line 235, in main
    train_err = train_one_epoch(model=model, criterion=criterion, norm_factor=norm_factor,
  File "/home/zyli/equiformer/engine.py", line 63, in train_one_epoch
    pred = model(f_in=data.x, pos=data.pos, batch=data.batch,
  File "/home/zyli/anaconda3/envs/equiformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zyli/equiformer/nets/graph_attention_transformer.py", line 885, in forward
    node_features = blk(node_input=node_features, node_attr=node_attr,
  File "/home/zyli/anaconda3/envs/equiformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zyli/equiformer/nets/graph_attention_transformer.py", line 646, in forward
    node_features = self.ga(node_input=node_features,
  File "/home/zyli/anaconda3/envs/equiformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zyli/equiformer/nets/graph_attention_transformer.py", line 508, in forward
    alpha = torch_geometric.utils.softmax(alpha, edge_dst)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_sub_exp(float* tsrc_1, float* tsrc_max_9, float* output_1) {
{
  if ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x) < 140632ll ? 1 : 0) {
    float v = __ldg(tsrc_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
    float v_1 = __ldg(tsrc_max_9 + ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 35158ll + 4ll * (((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) % 35158ll));
    output_1[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = expf(v - v_1);
  }
}
}
```

```
(equiformer) zyli@ubuntu:~/equiformer$ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia
Collecting package metadata (current_repodata.json): failed

CondaHTTPError: HTTP 000 CONNECTION FAILED for url https://conda.anaconda.org/pytorch-nightly/noarch/current_repodata.json
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
'https://conda.anaconda.org/pytorch-nightly/noarch'
```

I'm using the same environment as you, since I created it from env_equiformer.yml.

@yilunliao
Member

Hi @leezy0311

Does the problem only happen on the RTX 4090 (or other newer GPUs)?

@leezy0311
Author

> Hi @leezy0311
>
> Does the problem only happen on the RTX 4090 (or other newer GPUs)?

Yes, it only happens on the 40-series GPUs.

@yilunliao
Member

Hi @leezy0311

Sorry for the late reply.
I suspect the issue is the CUDA version: newer GPUs may not be compatible with the CUDA version pinned in the env file.
Can you check whether upgrading CUDA to 11.8+ and the corresponding torch (and other related packages) solves the issue?
I don't have an RTX 40 on hand, so I might not be able to test that.
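For context on why the upgrade should help: the RTX 40 series (Ada Lovelace) reports compute capability 8.9 (`sm_89`), and NVRTC only accepts that value for `--gpu-architecture` from CUDA 11.8 onward, so an older pinned toolkit fails exactly as in the traceback above. A minimal sketch of that compatibility check (the version table is my assumption for this illustration, compiled from NVIDIA release notes, and `nvrtc_supports` is a hypothetical helper, not part of the repo):

```python
# Map each GPU architecture to the first CUDA release whose NVRTC
# accepts it as --gpu-architecture (assumed values for illustration).
MIN_CUDA_FOR_ARCH = {
    "sm_70": (9, 0),   # Volta
    "sm_75": (10, 0),  # Turing
    "sm_80": (11, 0),  # Ampere (A100)
    "sm_86": (11, 1),  # Ampere (RTX 30 series)
    "sm_89": (11, 8),  # Ada Lovelace (RTX 40 series)
    "sm_90": (11, 8),  # Hopper (H100)
}

def nvrtc_supports(arch: str, cuda_version: tuple) -> bool:
    """Return True if NVRTC from `cuda_version` accepts the given arch."""
    required = MIN_CUDA_FOR_ARCH.get(arch)
    if required is None:
        return False  # unknown architecture: NVRTC would reject it
    return cuda_version >= required

# An env pinned to an older toolkit (e.g. CUDA 11.1) predates sm_89,
# which reproduces the "invalid value for --gpu-architecture" failure.
print(nvrtc_supports("sm_89", (11, 1)))  # False: the reported error
print(nvrtc_supports("sm_89", (11, 8)))  # True after upgrading
```

In a working environment you can also run `torch.cuda.get_arch_list()` to see which `sm_XY` targets the installed PyTorch build was compiled for; if `sm_89` is missing, the build predates the 40 series.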
