ACM Gordon Bell COVID 2022 Finalist

This is the official implementation of our work "Running ahead of evolution—AI-based simulation for predicting future high-risk SARS-CoV-2 variants". A PDF version of the published paper can be downloaded from here.

The never-ending emergence of SARS-CoV-2 variants of concern (VOCs) has challenged pandemic control worldwide. To develop effective drugs and vaccines, one needs to efficiently simulate mutations in the SARS-CoV-2 spike receptor-binding domain (RBD) and identify high-risk variants. We pretrain a large protein language model on approximately 408 million protein sequences and build a high-throughput screening pipeline that predicts binding affinity and antibody escape. As the first work on SARS-CoV-2 RBD mutation simulation, we successfully identify mutations in the RBD regions of 5 VOCs and can screen millions of potential variants in seconds. Our workflow scales to 4096 NPUs with 96.5% scalability and a 493.9× speedup in mixed-precision computing, while achieving a peak performance of 366.8 PFLOPS (34.9% of the theoretical peak) on Pengcheng Cloudbrain-II. Our method paves the way for simulating coronavirus evolution in order to prepare for a future pandemic that will inevitably take place.
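The scaling figures above can be cross-checked with a few lines of arithmetic. This is a sketch under stated assumptions: "scalability" is taken to mean measured speedup divided by ideal linear speedup, and the 8-NPU baseline is inferred (it is not stated in this README) because it is the baseline that turns a 493.9× speedup at 4096 NPUs into 96.5%.

```python
# Back-of-the-envelope check of the reported scaling figures.
# Assumption: scalability = measured speedup / ideal speedup, where the
# ideal speedup is (target NPUs / baseline NPUs).

TARGET_NPUS = 4096
BASELINE_NPUS = 8          # inferred, NOT stated in this README
SPEEDUP = 493.9            # reported mixed-precision speedup
PEAK_PFLOPS = 366.8        # reported peak performance
FRACTION_OF_PEAK = 0.349   # reported fraction of theoretical peak


def scaling_efficiency(speedup: float, n_target: int, n_base: int) -> float:
    """Measured speedup divided by the ideal linear speedup."""
    ideal = n_target / n_base
    return speedup / ideal


def implied_theoretical_peak(sustained: float, fraction: float) -> float:
    """Theoretical peak implied by a measured value and its fraction of peak."""
    return sustained / fraction


efficiency = scaling_efficiency(SPEEDUP, TARGET_NPUS, BASELINE_NPUS)
peak = implied_theoretical_peak(PEAK_PFLOPS, FRACTION_OF_PEAK)
per_npu_tflops = peak * 1000 / TARGET_NPUS

print(f"scaling efficiency: {efficiency:.1%}")               # 96.5%
print(f"implied theoretical peak: {peak:.0f} PFLOPS")        # 1051 PFLOPS
print(f"implied peak per NPU: {per_npu_tflops:.1f} TFLOPS")  # 256.6 TFLOPS
```

Under these assumptions the reported numbers are mutually consistent, and the implied theoretical peak works out to roughly 256.6 TFLOPS of mixed-precision throughput per NPU.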

Requirements

MindSpore >= 1.6

Prepare data

Download the UniRef90 dataset.
Convert the downloaded data to .txt files with one protein sequence per line, organized as follows:

.your_data_dir
  ├─01.txt
  ...
  └─09.txt
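The conversion step is not scripted in this README. A minimal sketch, assuming the download is the UniProt FASTA release (for example uniref90.fasta.gz) and that sequences are spread round-robin over nine shards to match the layout above; the function names and the sharding rule are hypothetical, not taken from this repository:

```python
import gzip
from pathlib import Path
from typing import Iterator, List, TextIO


def read_fasta(handle: TextIO) -> Iterator[str]:
    """Yield sequences from a FASTA stream, joining wrapped sequence lines."""
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):  # header line starts a new record
            if chunks:
                yield "".join(chunks)
                chunks = []
        else:
            chunks.append(line)
    if chunks:  # flush the last record
        yield "".join(chunks)


def fasta_to_txt_shards(fasta_path: str, out_dir: str, num_shards: int = 9) -> List[Path]:
    """Write one sequence per line, distributing records round-robin over
    01.txt, 02.txt, ... (hypothetical sharding rule)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = [out / f"{i + 1:02d}.txt" for i in range(num_shards)]
    opener = gzip.open if fasta_path.endswith(".gz") else open
    files = [p.open("w") for p in paths]
    try:
        with opener(fasta_path, "rt") as handle:
            for i, seq in enumerate(read_fasta(handle)):
                files[i % num_shards].write(seq + "\n")
    finally:
        for f in files:
            f.close()
    return paths
```

For example, `fasta_to_txt_shards("uniref90.fasta.gz", "your_data_dir")` would produce `01.txt` through `09.txt`. Round-robin keeps the shards roughly equal in size without first counting the records.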

Convert the data to MindRecord format:

python prepared_data.py --data_url your_data_dir --save_dir your_save_dir

For more details, see google-bert.
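The core of google-bert's pretraining data preparation is masked-language-model corruption. The sketch below applies it to protein sequences at the amino-acid level. It is an assumption about what the data script does, not a description of it: the vocabulary, `max_len` and `max_predictions` values are hypothetical, and the MindRecord writing step is omitted. As in google-bert's `create_pretraining_data.py`, a fixed number of positions (about 15%, capped by `max_predictions`) is selected, and each selected token is replaced by `[MASK]` 80% of the time, by a random amino acid 10% of the time, and left unchanged 10% of the time.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical vocabulary: special tokens plus the 20 standard amino acids.
SPECIAL = ["[PAD]", "[CLS]", "[SEP]", "[MASK]", "[UNK]"]
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
VOCAB = {tok: i for i, tok in enumerate(SPECIAL + AMINO_ACIDS)}


def encode(seq: str, max_len: int) -> List[int]:
    """Character-level tokenization with [CLS]/[SEP], truncation and padding."""
    body = [VOCAB.get(aa, VOCAB["[UNK]"]) for aa in seq[: max_len - 2]]
    ids = [VOCAB["[CLS]"]] + body + [VOCAB["[SEP]"]]
    return ids + [VOCAB["[PAD]"]] * (max_len - len(ids))


def mask_tokens(
    ids: List[int],
    mask_prob: float = 0.15,
    max_predictions: int = 20,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (corrupted_ids, labels); labels is -1 where nothing is predicted."""
    rng = rng or random.Random()
    specials = {VOCAB[t] for t in SPECIAL}
    aa_ids = [VOCAB[a] for a in AMINO_ACIDS]
    # Only real residues are eligible; [CLS]/[SEP]/[PAD] are never masked.
    candidates = [i for i, tok in enumerate(ids) if tok not in specials]
    rng.shuffle(candidates)
    num_to_predict = min(max_predictions, max(1, round(len(candidates) * mask_prob)))
    corrupted = list(ids)
    labels = [-1] * len(ids)
    for i in candidates[:num_to_predict]:
        labels[i] = ids[i]
        r = rng.random()
        if r < 0.8:
            corrupted[i] = VOCAB["[MASK]"]
        elif r < 0.9:
            corrupted[i] = rng.choice(aa_ids)
        # Otherwise keep the original token (it is still a prediction target).
    return corrupted, labels
```

Keeping 10% of the targets unchanged follows BERT's reasoning: the model cannot assume that every unmasked token is already correct.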

Pretrain model

python run_pretrain.py --enable_modelarts True

Note: this code has only been tested on Pengcheng Cloud2. To run it on your own machine, you will need to make some modifications.

Pretrained Models Availability

Download

Generate mutations

python generate_mutation.py --generate_number 1000 --rbd_name wild_type --load_checkpoint_path pretrained_ckpt_path
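This README does not document how `generate_mutation.py` samples variants. The sketch below shows one generic way to use a masked language model for this task. It picks a few sites, asks a model for an amino-acid distribution at each one, samples a substitution, and repeats until it has collected `n` distinct variants. The proposer is a hypothetical stand-in (uniform here) for the pretrained checkpoint. `max_sites` and `offset` are illustrative parameters, not flags of the real script.

```python
import random
from typing import Callable, Dict, List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical stand-in for the pretrained model: given a sequence and a
# position to mask, return a probability for each amino acid.
Proposer = Callable[[str, int], Dict[str, float]]


def uniform_proposer(seq: str, pos: int) -> Dict[str, float]:
    return {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def mutation_label(wt: str, mutant: str, offset: int = 1) -> str:
    """Standard notation such as 'N501Y' (wild-type residue, position, new
    residue); multiple substitutions are joined with '+'."""
    return "+".join(
        f"{a}{i + offset}{b}" for i, (a, b) in enumerate(zip(wt, mutant)) if a != b
    )


def generate_variants(
    wt: str,
    n: int,
    propose: Proposer,
    max_sites: int = 3,
    offset: int = 1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    rng = rng or random.Random()
    seen = set()
    variants: List[str] = []
    for _ in range(100 * n):  # bounded attempts in case the space is small
        if len(variants) == n:
            break
        seq = list(wt)
        k = rng.randint(1, min(max_sites, len(wt)))
        for pos in rng.sample(range(len(wt)), k):
            probs = propose("".join(seq), pos)
            # Exclude the current residue so every chosen site really changes.
            choices = [aa for aa in probs if aa != seq[pos]]
            weights = [probs[aa] for aa in choices]
            seq[pos] = rng.choices(choices, weights=weights)[0]
        mutant = "".join(seq)
        if mutant not in seen:  # deduplicate at the sequence level
            seen.add(mutant)
            variants.append(mutation_label(wt, mutant, offset))
    return variants
```

To turn this into model-guided sampling, replace `uniform_proposer` with a function that masks `pos` and reads the model's softmax output. `offset` lets the labels use spike numbering (for example N501Y) instead of 1-based indices into the RBD string.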

Acknowledgments

This repository is based on the official MindSpore BERT code; see that code for more details.

ZhiweiNiepku/SARS-CoV-2_mutation_simulation
