Skip to content

Commit

Permalink
Merge branch 'dev' into nf-core-template-merge-3.1.2
Browse files Browse the repository at this point in the history
  • Loading branch information
charles-plessy authored Jan 21, 2025
2 parents caaa3b3 + 75ea4d4 commit 53e3569
Show file tree
Hide file tree
Showing 93 changed files with 4,926 additions and 994 deletions.
1 change: 0 additions & 1 deletion .github/workflows/awsfulltest.yml
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,6 @@ jobs:
- name: Launch workflow via Seqera Platform
uses: seqeralabs/action-tower-launch@v2
# TODO nf-core: You can customise AWS full pipeline tests as required
# Add full size test data (but still relatively small datasets for few samples)
# on the `test_full.config` test runs with only one set of parameters
with:
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ jobs:
strategy:
matrix:
NXF_VER:
- "24.04.2"
- "24.10.1"
- "latest-everything"
profile:
- "conda"
Expand Down
15 changes: 9 additions & 6 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,14 +3,17 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v1.1.1 - [date]
## v2.0.0~dev - [unreleased]

Initial release of nf-core/pairgenomealign, created with the [nf-core](https://nf-co.re/) template.
## [v1.1.1](https://github.com/nf-core/pairgenomealign/releases/tag/1.1.1) "Kani nabe" - [December 17th, 2024]

This release brings the pipeline to the standards of Nextflow 24.10.1 and
nf-core 3.1.0. No changes were made to the alignment process.

### `Added`
## [v1.1.0](https://github.com/nf-core/pairgenomealign/releases/tag/1.1.0) "Nattou maki" - [September 27th, 2024]

### `Fixed`
Added a new `softmask` parameter, to optionally keep original softmasking.

### `Dependencies`
## [v1.0.0](https://github.com/nf-core/pairgenomealign/releases/tag/1.0.0) "Sweet potato" - [August 27th, 2024]

### `Deprecated`
Initial release of nf-core/pairgenomealign, created with the [nf-core](https://nf-co.re/) template.
18 changes: 15 additions & 3 deletions CITATIONS.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,15 +8,27 @@

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
## Pipeline design

> Charles Plessy, Michael J. Mansfield, Aleksandra Bliznina, Aki Masunaga, Charlotte West, Yongkai Tan, Andrew W. Liu, Jan Grašič, María Sara del Río Pisula, Gaspar Sánchez-Serna, Marc Fabrega-Torrus, Alfonso Ferrández-Roldán, Vittoria Roncalli, Pavla Navratilova, Eric M. Thompson, Takeshi Onuma, Hiroki Nishida, Cristian Cañestro, Nicholas M. Luscombe. Extreme genome scrambling in marine planktonic Oikopleura dioica cryptic species. Genome Res. 2024. 34: 426-440; doi: [10.1101/2023.05.09.539028](https://doi.org/10.1101/gr.278295.123). PubMed ID: [38621828](https://pubmed.ncbi.nlm.nih.gov/38621828/)
## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
- [LAST](https://gitlab.com/mcfrith/last/)

> Kiełbasa SM, Wan R, Sato K, Horton P, Frith MC. Adaptive seeds tame genomic sequence comparison. Genome Res. 2011 21(3):487-93. doi: 10.1101/gr.113985.110. PubMed PMID: 21209072 (This describes the main algorithms used by LAST.)
> Frith MC, Noé L. Improved search heuristics find 20,000 new alignments between human and mouse genomes. doi: 10.1093/nar/gku104 PubMed PMID: 24493737 (This describes sensitive DNA seeding (MAM8 and MAM4)
> Frith MC, Kawaguchi R. Split-alignment of genomes finds orthologies more accurately. Genome Biology. 2015 16:106. doi: 10.1186/s13059-015-0670-9 PubMed PMID: 25994148 (Describes the split alignment algorithm, and its application to whole genome alignment.)
> Hamada M, Ono Y, Asai K Frith MC. Training alignment parameters for arbitrary sequencers with LAST-TRAIN. Bioinformatics. 2017 33(6):926-928. doi: 10.1093/bioinformatics/btw742 PubMed PMID: 28039163 (Describes last-train.)
> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
> Frith MC, Shaw J, Spouge JL. How to optimally sample a sequence for rapid analysis. doi: 10.1093/bioinformatics/btad057 PubMed PMID: 36702468 (Describes the lastdb -u RY sparsity options.)
- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
## Software packaging/containerisation tools

Expand Down
59 changes: 33 additions & 26 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,10 @@
</h1>

[![GitHub Actions CI Status](https://github.com/nf-core/pairgenomealign/actions/workflows/ci.yml/badge.svg)](https://github.com/nf-core/pairgenomealign/actions/workflows/ci.yml)
[![GitHub Actions Linting Status](https://github.com/nf-core/pairgenomealign/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/pairgenomealign/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/pairgenomealign/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
[![GitHub Actions Linting Status](https://github.com/nf-core/pairgenomealign/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/pairgenomealign/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/pairgenomealign/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.13910535-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.13910535)
[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)

[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/)
[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.10.1-23aa62.svg)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
Expand All @@ -19,46 +19,44 @@

## Introduction

**nf-core/pairgenomealign** is a bioinformatics pipeline that ...
**nf-core/pairgenomealign** is a bioinformatics pipeline that aligns one or more _query_ genomes to a _target_ genome, and plots pairwise representations.

<!-- TODO nf-core:
Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
major pipeline sections and the types of output it produces. You're giving an overview to someone new
to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
-->
![Tubemap workflow summary](docs/images/pairgenomealign-tubemap.png "Tubemap workflow summary")

<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
The main steps of the pipeline are:

1. Genome QC ([`assembly-scan`](https://github.com/rpetit3/assembly-scan)).
2. Genome indexing ([`lastdb`](https://gitlab.com/mcfrith/last/-/blob/main/doc/lastdb.rst)).
3. Genome pairwise alignments ([`lastal`](https://gitlab.com/mcfrith/last/-/blob/main/doc/lastal.rst)).
4. Alignment plotting ([`last-dotplot`](https://gitlab.com/mcfrith/last/-/blob/main/doc/last-dotplot.rst)).

The pipeline can generate four kinds of outputs, called _many-to-many_, _many-to-one_, _one-to-many_ and _one-to-one_, depending on whether sequences of one genome are allowed match the other genome multiple times or not.

These alignments are output in [MAF](https://genome.ucsc.edu/FAQ/FAQformat.html#format5) format, and optional line plot representations are output in PNG format.

## Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
Explain what rows and columns represent. For instance (please edit as appropriate):
First, prepare a samplesheet with your input data that looks as follows:

`samplesheet.csv`:

```csv
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
sample,fasta
query_1,path-to-query-genome-file-one.fasta
query_2,path-to-query-genome-file-two.fasta
```

Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
-->
Each row represents a fasta file, this can also contain multiple rows to accomodate multiple query genomes in fasta format.

Now, you can run the pipeline using:

<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->

```bash
nextflow run nf-core/pairgenomealign \
-profile <docker/singularity/.../institute> \
--target sequencefile.fa \
--input samplesheet.csv \
--outdir <OUTDIR>
```
Expand All @@ -76,11 +74,15 @@ For more details about the output files and reports, please refer to the

## Credits

nf-core/pairgenomealign was originally written by charles-plessy.
`nf-core/pairgenomealign` was originally written by [charles-plessy](https://github.com/charles-plessy); the original versions are available at <https://github.com/oist/plessy_pairwiseGenomeComparison>.

We thank the following people for their extensive assistance in the development of this pipeline:

<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
- [Mahdi Mohammed](https://github.com/U13bs1125) ported the original pipeline to _nf-core_ template 2.14.x.
- [Martin Frith](https://github.com/mcfrith/), the author of LAST, gave us extensive feedback and advices.
- [Michael Mansfield](https://github.com/mjmansfi) tested the pipeline and provided critical comments.
- [Aleksandra Bliznina](https://github.com/aleksandrabliznina) contributed to the creation of the initial `last/*` modules.
- [Jiashun Miao](https://github.com/miaojiashun) and [Huyen Pham](https://github.com/ngochuyenpham) tested the pipeline on vertebrate genomes.

## Contributions and Support

Expand All @@ -90,10 +92,15 @@ For further information or help, don't hesitate to get in touch on the [Slack `#

## Citations

<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
<!-- If you use nf-core/pairgenomealign for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
If you use this pipeline, please cite:

> **Extreme genome scrambling in marine planktonic Oikopleura dioica cryptic species.**
> Charles Plessy, Michael J. Mansfield, Aleksandra Bliznina, Aki Masunaga, Charlotte West, Yongkai Tan, Andrew W. Liu, Jan Grašič, María Sara del Río Pisula, Gaspar Sánchez-Serna, Marc Fabrega-Torrus, Alfonso Ferrández-Roldán, Vittoria Roncalli, Pavla Navratilova, Eric M. Thompson, Takeshi Onuma, Hiroki Nishida, Cristian Cañestro, Nicholas M. Luscombe.
> _Genome Res._ 2024. 34: 426-440; doi: [10.1101/2023.05.09.539028](https://doi.org/10.1101/gr.278295.123). PubMed ID: [38621828](https://pubmed.ncbi.nlm.nih.gov/38621828/)
[OIST research news article](https://www.oist.jp/news-center/news/2024/4/25/oikopleura-who-species-identity-crisis-genome-community)

<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
And also please cite the [LAST papers](https://gitlab.com/mcfrith/last/-/blob/main/doc/last-papers.rst).

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

Expand Down
1 change: 0 additions & 1 deletion assets/methods_description_template.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,6 @@ description: "Suggested text and references to use when describing pipeline usag
section_name: "nf-core/pairgenomealign Methods Description"
section_href: "https://github.com/nf-core/pairgenomealign"
plot_type: "html"
## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
## You inject any metadata in the Nextflow '${workflow}' object
data: |
<h4>Methods</h4>
Expand Down
49 changes: 49 additions & 0 deletions assets/multiqc_config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,3 +13,52 @@ report_section_order:
export_plots: true

disable_version_detection: true

custom_data:
train:
file_format: "tsv"
section_name: "Training parameter statistics"
plot_type: "table"
headers:
id:
title: "ID"
description: "target___query"
substitution_percent_identity:
title: "Substitution Percent Identity"
"last -t":
title: "Temperature"
description: "Parameter for converting between scores and probability ratios.
This affects the column ambiguity estimates. A score is converted to a probability
ratio by this formula: exp(score / TEMPERATURE). The default value is 1/lambda,
where lambda is the scale factor of the scoring matrix, which is calculated
by the method of Yu and Altschul (YK Yu et al. 2003, PNAS 100(26):15688-93)."
"last -a":
title: "Gap existence"
description: "Gap existence cost (lastal -a)"
"last -b":
title: "Gap extension"
description: "Gap extension cost (lastal -b)"
"last -A":
title: "Insertion existence"
description: "Insertion existence cost (lastal -A)"
"last -B":
title: "Insertion extension"
description: "Insertion extension cost (lastal -B)"
last_o2o:
file_format: "tsv"
section_name: "Alignment statistics"
plot_type: "table"
headers:
id:
title: "ID"
description: "target__query"
TotalAlignmentLength:
title: "Total alignment length"
PercentSimilarity:
title: "Percent similarity"

sp:
last_o2o:
fn: "*o2o_aln.tsv"
train:
fn: "*train.tsv"
Loading

0 comments on commit 53e3569

Please sign in to comment.