generated from snakemake-workflows/snakemake-workflow-template
Commit
* chore: release 1.0.0
* feat: create symlink for selected clusters
* docs: add detailed descriptions of separate steps
* feat: add bandage graph
* feat: visualize tree cluster with ggtree
* chore: remove threads for bandage
* fix: correct logging
* chore: release 1.0.0

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Matin Nuhamunada <matinnu@biosustain.dtu.dk>
1 parent d4dd5ed, commit eb90925
Showing 11 changed files with 216 additions and 17 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
# Changelog | ||
|
||
## 1.0.0 (2022-04-20) | ||
|
||
|
||
### Features | ||
|
||
* add polishing step ([d16c791](https://www.github.com/matinnuhamunada/genome_assembly_tryouts/commit/d16c791b54e0cd447ba533fa65d0b35d93eda9a0)) | ||
* initial attempt of trycycler ([2ae19d4](https://www.github.com/matinnuhamunada/genome_assembly_tryouts/commit/2ae19d4003c09801c486748a6a09f32c2ecb3256)) | ||
* split steps ([0eed51f](https://www.github.com/matinnuhamunada/genome_assembly_tryouts/commit/0eed51ffe6c938bd630d82791051b1daf9dbfd8b)) | ||
|
||
|
||
### Bug Fixes | ||
|
||
* clean files ([796bc78](https://www.github.com/matinnuhamunada/genome_assembly_tryouts/commit/796bc78a7fc392b85cbbaee8ad85d17397be7e3b)) | ||
* correct input folder name ([cbad160](https://www.github.com/matinnuhamunada/genome_assembly_tryouts/commit/cbad1602c3f119331428f3300cf5bb2dcf29cae1)) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
name: R_env | ||
channels: | ||
- conda-forge | ||
- bioconda | ||
- defaults | ||
dependencies: | ||
- r-base | ||
- bioconductor-ggtree | ||
- bioconductor-treeio | ||
- bioconductor-ggtreeextra | ||
- r-ggplot2 | ||
- r-ape | ||
- r-dplyr | ||
- r-argparser | ||
- r-aplot | ||
- r-tidytree |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
#library("treedataverse") | ||
library("argparser") | ||
library("aplot") | ||
#library("svglite") | ||
library("ape") | ||
library("dplyr") | ||
library("ggplot2") | ||
library("tidytree") | ||
library("treeio") | ||
library("ggtree") | ||
library("ggtreeExtra") | ||
|
||
# Parse command line arguments | ||
p <- arg_parser("Draw tree from trycycler cluster, with cluster names, sequence lengths, and read depths") | ||
|
||
p <- add_argument(p, "--input", | ||
short = "-i", | ||
help = "Newick file generated by trycycler", | ||
default = "contigs.newick") | ||
|
||
p <- add_argument(p, "--output", | ||
short = "-o", | ||
default = "tree_cluster.png", | ||
help = "Output image file (PNG)") | ||
|
||
argv <- parse_args(p) | ||
|
||
|
||
# READFILE | ||
newick <- read.newick(argv$input) | ||
|
||
# GENERATE EMPTY DATAFRAME | ||
metadata <- data.frame(old_label = vector(mode = "character"), | ||
label = vector(mode = "character"), | ||
cluster_name = vector(mode = "character"), | ||
length = vector(mode = "integer"), | ||
depth = vector(mode = "numeric")) | ||
|
||
# EXTRACT DATA FROM TREE | ||
label <- newick$tip.label | ||
for (i in seq_along(label)) { | ||
item <- sapply(strsplit(label[i],"_"), function(x){ | ||
old_label <- label[i] | ||
new_label <- paste(x[[2]], x[[3]], sep = "") | ||
cluster_name <- paste(x[[1]], x[[2]], sep = "_") | ||
length <- as.integer(x[length(x) - 2]) | ||
tip_depth <- tail(x, n=1) | ||
depth <- as.numeric(substring(tip_depth, 1, nchar(tip_depth) - 1)) | ||
output <- c(old_label, new_label, cluster_name, length, depth) | ||
}) | ||
metadata[nrow(metadata) + 1, ] <- item | ||
} | ||
metadata <- transform(metadata, length = as.integer(length), | ||
depth = as.numeric(depth)) | ||
|
||
# RENAME TIP LABEL AND MERGE DATA INTO TREE | ||
tree <- rename_taxa(newick, metadata, old_label, label) | ||
drops <- c("old_label") | ||
tree_data <- metadata[ , !(names(metadata) %in% drops)] | ||
p <- ggtree(tree) %<+% tree_data | ||
|
||
|
||
# DRAW TREE | ||
g <- p + geom_tiplab(offset = .005, | ||
hjust = 0.005, | ||
align=TRUE, | ||
size = 3.2) + | ||
geom_tippoint(aes(shape = cluster_name, color = cluster_name)) + | ||
theme_tree2() + | ||
theme(legend.position = "right") | ||
|
||
# DRAW BUBBLES | ||
p2 <- ggplot(tree_data, aes(x=0, y = label, label=length)) + | ||
geom_point(aes(size = length, fill = depth), alpha = 0.9, shape = 21) + | ||
geom_text(hjust=-0.5, vjust=0.5, size=2) + | ||
scale_size(trans = "log10") + | ||
scale_fill_gradient(low = "white", high = "blue") + | ||
theme(axis.title = element_blank(), | ||
axis.ticks.x = element_blank(), | ||
axis.text = element_blank(), | ||
panel.background=element_blank(), | ||
panel.border=element_blank(), | ||
panel.grid.major=element_blank(), | ||
plot.background=element_blank(), | ||
) | ||
|
||
# CREATE COMPOSITE GRAPH | ||
p2 %>% insert_left(g, width = 2) | ||
|
||
ggsave(argv$output, dpi = 300) |
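The label-parsing step in the R script above (the `# EXTRACT DATA FROM TREE` loop) can be easier to follow outside the plotting code. Below is a minimal Python sketch of the same field extraction. The tip-label layout `cluster_001_A_contig_1_5000_bp_12.5x` is an assumption inferred from the indices the R code uses, not a format confirmed by this commit, and `TipMetadata` and `parse_tip_label` are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class TipMetadata(NamedTuple):
    """Fields the R script stores per tree tip (names mirror its data frame)."""
    old_label: str
    label: str
    cluster_name: str
    length: int
    depth: float


def parse_tip_label(tip_label: str) -> TipMetadata:
    """Split an underscore-delimited tip label into its metadata fields.

    Assumed layout (hypothetical): cluster_<id>_<assembly>_..._<length>_bp_<depth>x
    """
    fields = tip_label.split("_")
    # Three leading and three trailing fields are read, so require at least six.
    if len(fields) < 6:
        raise ValueError(f"Unexpected tip label format: {tip_label!r}")
    return TipMetadata(
        old_label=tip_label,
        label=fields[1] + fields[2],             # e.g. "001" + "A" -> "001A"
        cluster_name=f"{fields[0]}_{fields[1]}",  # e.g. "cluster_001"
        length=int(fields[-3]),                   # third field from the end
        depth=float(fields[-1][:-1]),             # drop the trailing "x"
    )


if __name__ == "__main__":
    print(parse_tip_label("cluster_001_A_contig_1_5000_bp_12.5x"))
```

Keeping the depth as a float in this sketch preserves fractional read depths, which matters if the value is later mapped to a continuous fill scale.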
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
from pathlib import Path | ||
import yaml | ||
import sys | ||
|
||
def get_clusters(filepath): | ||
    """ | ||
    Read the selected-clusters YAML file and return its contents as a dictionary. | ||
    """ | ||
    try: | ||
        with open(filepath) as file: | ||
            # safe_load is sufficient for plain mappings and lists | ||
            selected_cluster = yaml.safe_load(file) | ||
        return selected_cluster | ||
    except FileNotFoundError: | ||
        sys.stderr.write(f"No cluster selected. The file <{filepath}> was not found. Check your config.yaml.\n") | ||
        raise | ||
|
||
def symlink_cluster(strain, cluster, cluster_file, source_dir = 'data/interim/02_trycycler_cluster', target_dir = 'data/interim/03_trycycler_consensus'): | ||
    """ | ||
    Symlink the selected contigs of one cluster into the consensus directory. | ||
    """ | ||
    clusters = get_clusters(cluster_file) | ||
    source_path = Path(source_dir) | ||
    target_path = Path(target_dir) | ||
|
||
    for contigs in clusters[strain][cluster]: | ||
        source = source_path / strain / cluster / "1_contigs" / f"{contigs}.fasta" | ||
        target = target_path / strain / cluster / "1_contigs" / f"{contigs}.fasta" | ||
        target.parent.mkdir(parents=True, exist_ok=True) | ||
        try: | ||
            target.symlink_to(source.resolve(), target_is_directory=False) | ||
        except FileExistsError: | ||
            # Replace a symlink left over from a previous run | ||
            if target.is_symlink(): | ||
                target.unlink(missing_ok=True) | ||
                target.symlink_to(source.resolve(), target_is_directory=False) | ||
    # Touch an empty log file to mark the cluster as processed | ||
    with open(str(target_path / strain / f"{cluster}_copy.log"), 'w') as f: | ||
        f.write("") | ||
    return | ||
|
||
if __name__ == "__main__": | ||
    symlink_cluster(sys.argv[1], sys.argv[2], sys.argv[3]) |
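The script above indexes the parsed cluster file as `clusters[strain][cluster]` and iterates over contig names, which suggests a nested mapping of strain to cluster to a list of contigs. The sketch below illustrates that assumed layout and the relinking step with the standard library only. The strain, cluster and contig names are hypothetical, the dictionary stands in for the parsed YAML, and `link_contig` and `link_selected` are illustrative helpers rather than functions from this repository.

```python
import tempfile
from pathlib import Path

# Hypothetical selection: what the parsed cluster YAML is assumed to look like.
selected = {"strain_A": {"cluster_001": ["A_contig_1", "B_contig_1"]}}


def link_contig(source: Path, target: Path) -> None:
    """Create (or refresh) a symlink at target pointing to source."""
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink():
        # Remove a link left by an earlier run so relinking is idempotent.
        target.unlink()
    target.symlink_to(source.resolve())


def link_selected(selection: dict, source_dir: Path, target_dir: Path) -> list:
    """Link every selected contig, mirroring the strain/cluster/1_contigs layout."""
    linked = []
    for strain, clusters in selection.items():
        for cluster, contigs in clusters.items():
            for contig in contigs:
                rel = Path(strain) / cluster / "1_contigs" / f"{contig}.fasta"
                link_contig(source_dir / rel, target_dir / rel)
                linked.append(target_dir / rel)
    return linked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "02_cluster", Path(tmp) / "03_consensus"
        for contig in selected["strain_A"]["cluster_001"]:
            fasta = src / "strain_A" / "cluster_001" / "1_contigs" / f"{contig}.fasta"
            fasta.parent.mkdir(parents=True, exist_ok=True)
            fasta.write_text(">seq\nACGT\n")
        for link in link_selected(selected, src, dst):
            print(link.name, "is a symlink:", link.is_symlink())
```

Checking `is_symlink()` before linking, instead of catching the failure afterwards, makes a rerun replace stale links while still raising if a regular file unexpectedly occupies the target path.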