Sorry for the late reply.
I suspect the issue is the CUDA version: newer GPUs might not be compatible with the CUDA version in the env file.
Could you try upgrading CUDA to 11.8+, along with the corresponding torch (and other related packages), and see whether that solves the issue?
I don't have an RTX 40 on hand, so I might not be able to test that.
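To check whether the installed build is likely the culprit before reinstalling anything, something like the following may help. It is only a rough sketch: the `MIN_CUDA_FOR_CAPABILITY` table and the helper names are hypothetical and written from memory of NVIDIA's release notes, so please verify the values. It compares the CUDA version torch was built against with the compute capability the GPU reports.

```python
# Rough sketch: does the CUDA toolkit bundled with torch know this GPU?
# ASSUMPTION: the table below is illustrative and filled in from memory of
# NVIDIA's release notes; verify the values before relying on them.
MIN_CUDA_FOR_CAPABILITY = {
    (8, 0): (11, 0),  # e.g. A100
    (8, 6): (11, 1),  # e.g. RTX 30 series
    (8, 9): (11, 8),  # e.g. RTX 40 series
    (9, 0): (11, 8),  # e.g. H100
}


def parse_version(text):
    """Turn a version string such as '11.8' into the tuple (11, 8)."""
    return tuple(int(part) for part in text.split(".")[:2])


def cuda_knows_capability(cuda_version, capability):
    """Return True/False, or None if the capability is not in the table."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    # torch.version.cuda is None on CPU-only builds, so guard against that.
    if torch is not None and torch.cuda.is_available() and torch.version.cuda:
        capability = torch.cuda.get_device_capability(0)
        print("torch:", torch.__version__, "built with CUDA", torch.version.cuda)
        print("device capability:", capability)
        print("architectures in this build:", torch.cuda.get_arch_list())
        print("toolkit knows this GPU:",
              cuda_knows_capability(torch.version.cuda, capability))
```

If the last line prints `False`, the runtime compiler shipped with that torch build probably rejects the GPU's architecture flag, which would match the `nvrtc: error: invalid value for --gpu-architecture` message.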
This problem occurs when I use an RTX 4090:
Traceback (most recent call last):
File "main_qm9.py", line 298, in <module>
main(args)
File "main_qm9.py", line 235, in main
train_err = train_one_epoch(model=model, criterion=criterion, norm_factor=norm_factor,
File "/home/zyli/equiformer/engine.py", line 63, in train_one_epoch
pred = model(f_in=data.x, pos=data.pos, batch=data.batch,
File "/home/zyli/anaconda3/envs/equiformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zyli/equiformer/nets/graph_attention_transformer.py", line 885, in forward
node_features = blk(node_input=node_features, node_attr=node_attr,
File "/home/zyli/anaconda3/envs/equiformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zyli/equiformer/nets/graph_attention_transformer.py", line 646, in forward
node_features = self.ga(node_input=node_features,
File "/home/zyli/anaconda3/envs/equiformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zyli/equiformer/nets/graph_attention_transformer.py", line 508, in forward
alpha = torch_geometric.utils.softmax(alpha, edge_dst)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)
nvrtc compilation failed:
#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)
template<typename T>
__device__ T maximum(T a, T b) {
return isnan(a) ? a : (a > b ? a : b);
}
template<typename T>
__device__ T minimum(T a, T b) {
return isnan(a) ? a : (a < b ? a : b);
}
extern "C" __global__
void fused_sub_exp(float* tsrc_1, float* tsrc_max_9, float* output_1) {
{
if ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)<140632ll ? 1 : 0) {
float v = __ldg(tsrc_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
float v_1 = __ldg(tsrc_max_9 + ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 35158ll + 4ll * (((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) % 35158ll));
output_1[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = expf(v - v_1);
}}
}
(equiformer) zyli@ubuntu:~/equiformer$ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia
Collecting package metadata (current_repodata.json): failed
CondaHTTPError: HTTP 000 CONNECTION FAILED for url https://conda.anaconda.org/pytorch-nightly/noarch/current_repodata.json
Elapsed: -
An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
'https://conda.anaconda.org/pytorch-nightly/noarch'
I use the same environment as you, since I created the environment from env_equiformer.yml.