AlexsLemonade · maud-p · Aug 9, 2024
diff --git a/analyses/cell-type-wilms-tumor-06/00_fetal_reference_kidney.Rmd b/analyses/cell-type-wilms-tumor-06/00_fetal_reference_kidney.Rmd
@@ -0,0 +1,240 @@
+---
+title: "Build azimuth compatible fetal kidney reference from the kidney cell atlas"
+author: "Maud PLASCHKA"
+date: '2024-08-07'
+output: 
+  html_document: 
+    toc: yes
+    toc_float: yes
+    code_folding: hide
+    highlight: pygments
+    df_print: paged
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE,
+                      message=FALSE,
+                      warning=FALSE)
+```
+
+
+## Introduction
+
+The aim is to build an Azimuth-compatible reference for fetal kidney from the kidney cell atlas.
+The rds data can be downloaded from https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds
+
+## Packages
+
+Load required packages in the following chunk, if needed.
+Do not install packages here; only load them with the `library()` function.
+
+```{r packages, message=FALSE, warning=FALSE}
+library("Seurat")
+library(sctransform)
+library(Azimuth)
+library(SCpubr)
+library(tidyverse)
+library(patchwork)
+library(SeuratWrappers)
+```
+
+
+## Base directories
+
+```{r base paths, eval=TRUE, include=TRUE}
+# The base path for the OpenScPCA repository, found by its (hidden) .git directory
+repository_base <- rprojroot::find_root(rprojroot::is_git_root)
+
+# The current data directory, found within the repository base directory
+data_dir <- file.path(repository_base, "data", "2024-07-08", "SCPCP000006")
+
+# The path to this module
+module_base <- file.path(repository_base, "analyses", "cell-type-wilms-tumor-06")
+```
+
+
+## Input files
+
+
+The rds data can be downloaded from https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds
+
+Please note that this download link permanently references the current version of the dataset (08/2024). 
+If this dataset is updated, a new download link will be created that permanently references the next version of this dataset.
+
+We save it in the `marker-sets` folder of the module.
+Note to the Data Lab: should we save it somewhere else?
+I suggest placing this rds file here temporarily and removing it once the reference is built.
+
+
+```{r path_to_data}
+path_to_data <- file.path(module_base, "marker-sets/fetal_full.rds")
+
+url = "https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds"
+
+download.file(url, path_to_data)
+```
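A note on the download step above: `download.file()` is subject to R's `timeout` option, which defaults to 60 seconds and is often too short for large files, and the chunk re-fetches the file on every knit. A minimal sketch of a guarded download follows. The helper name `download_if_missing` and the one-hour timeout are assumptions made for illustration and are not part of this PR.

```r
# Hypothetical helper (not part of the PR): fetch `url` to `dest` only when
# `dest` is absent, with a longer timeout than R's 60-second default.
download_if_missing <- function(url, dest, timeout_sec = 3600) {
  if (file.exists(dest)) {
    return(invisible(FALSE)) # nothing was downloaded
  }
  # Make sure the destination directory exists
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  # Raise the timeout for this call only and restore it on exit
  old <- options(timeout = max(timeout_sec, getOption("timeout")))
  on.exit(options(old), add = TRUE)
  # download.file() returns 0 on success; "wb" requests a binary transfer
  status <- download.file(url, destfile = dest, mode = "wb")
  if (status != 0) {
    unlink(dest) # do not leave a partial file behind
    stop("Download failed with status ", status)
  }
  invisible(TRUE)
}
```

Under these assumptions, `download_if_missing(url, path_to_data)` could stand in for the bare `download.file()` call. Since the notebook later deletes `fetal_full.rds`, the guard only helps when a knit is interrupted before that point.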
+## Output file
+
+The Azimuth-compatible fetal kidney reference will be saved in the `marker-sets` folder of the module.
+
+
+```{r path_to_output}
+path_to_output <- file.path(module_base, "marker-sets")
+```
+
+
+## Configuration and thresholds
+
+We store in a config list named `cfg` the parameters used throughout the analysis to filter on p-value, log fold change, percentage of expression, etc.
+
+
+```{r cfg}
+cfg = list()
+cfg$padj_thershold = 0.05
+cfg$lfc_threshold = 1
+cfg$rate1_threshold = 0.5
+
+
+set.seed(12345)
+```
+
+
+# Create an Azimuth-compatible reference
+
+```{r pre_process, echo=TRUE, fig.height=7, fig.width=12, message=FALSE, warning=FALSE, out.width='100%'}
+seurat <- readRDS(path_to_data)
+
+d2 <- do_DimPlot(seurat, reduction = "umap", dims = c(1,2), group.by = "compartment", label = TRUE, repel = TRUE) + NoLegend() + ggtitle("umap reduction before SCTransform")
+d3 <- do_DimPlot(seurat, reduction = "umap", dims = c(1,2), group.by = "cell_type", label = TRUE, repel = TRUE) + NoLegend()
+
+d2 | d3
+
+options(future.globals.maxSize= 891289600000)
+s <- SCTransform(seurat, verbose = FALSE, method = "glmGamPoi", conserve.memory = TRUE)
+s <- RunPCA(s, npcs = 50, verbose = FALSE)
+s <- RunUMAP(s, dims = 1:50, verbose = FALSE, return.model = TRUE)
+```
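For reference, the `future.globals.maxSize` option set in the chunk above is expressed in bytes, so the literal is hard to read at a glance. A short base-R sketch of the arithmetic shows that it equals 850000 MiB, or roughly 830 GiB. The variable names are illustrative only.

```r
# Value used in the chunk above, in bytes
max_size_bytes <- 891289600000

# The same number written as mebibytes times bytes-per-MiB
max_size_mib <- 850000
stopifnot(max_size_mib * 1024^2 == max_size_bytes)

# Converted to GiB for readability
max_size_gib <- max_size_bytes / 1024^3
print(round(max_size_gib, 1)) # 830.1
```

Writing the option as `850000 * 1024^2` would keep the same limit while making the unit visible.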
+
+```{r create_ref, echo=TRUE, fig.height=7, fig.width=12, message=FALSE, warning=FALSE, out.width='100%'}
+options(future.globals.maxSize= 891289600000)
+Fetal_kidney <- AzimuthReference(
+  s,
+  refUMAP = "umap",
+  refDR = "pca",
+  refAssay = "SCT",
+  dims = 1:50,
+  k.param = 31,
+  plotref = "umap",
+  plot.metadata = NULL,
+  ori.index = NULL,
+  colormap = NULL,
+  assays = NULL,
+  metadata = c("compartment", "cell_type"),
+  reference.version = "0.0.0",
+  verbose = FALSE
+)
+
+
+d2 <- do_DimPlot(Fetal_kidney, reduction = "refUMAP", dims = c(1,2), group.by = "compartment", label = TRUE, repel = TRUE) + NoLegend() + ggtitle("umap reduction after SCTransform and Seurat workflow")
+d3 <- do_DimPlot(Fetal_kidney, reduction = "refUMAP", dims = c(1,2), group.by = "cell_type", label = TRUE, repel = TRUE) + NoLegend()
+
+d2 | d3
+
+# save reference in rds file
+SaveAnnoyIndex(object = Fetal_kidney[["refdr.annoy.neighbors"]], file = file.path(path_to_output, "idx.annoy"))
+saveRDS(object = Fetal_kidney, file = file.path(path_to_output, "ref.Rds"))
+
+# finally, we can remove the fetal_full.rds file from the marker-sets directory
+file.remove(path_to_data)
+```
+## Characterization of compartment and cell types in the reference
+
+Here, we use an unbiased approach to find transcripts that characterize the different compartments and cell types.
+
+The goal is simply to obtain marker genes for the different populations, in case some are of interest for the Wilms tumor annotations.
+
+We run `DElegate::FindAllMarkers2` to find markers of the different clusters and manually check that they make sense.
+`DElegate::FindAllMarkers2` is an improved version of `Seurat::FindAllMarkers` based on a pseudobulk differential expression method.
+Please see the preprint from Christoph Hafemeister: https://www.biorxiv.org/content/10.1101/2023.03.28.534443v1
+and the tool described here: https://github.com/cancerbits/DElegate
+
+### Find marker genes for each compartment
+
+
+```{r markers_compartment, fig.width=8, fig.height=7, out.width='100%'}
+de_results   <- DElegate::FindAllMarkers2(s, group_column = "compartment")
+
+#filter the most relevant markers
+s.markers <- de_results[de_results$padj < cfg$padj_thershold & de_results$log_fc > cfg$lfc_threshold & de_results$rate1 > cfg$rate1_threshold,]
+
+
+DT::datatable(s.markers, caption = ("marker genes"), 
+              extensions = 'Buttons', 
+              options = list(  dom = 'Bfrtip',
+
+                               buttons = c( 'csv', 'excel')))
+
+# Select top 5 genes for heatmap plotting
+s.markers <- na.omit(s.markers)
+s.markers %>%
+    group_by(group1) %>%
+    top_n(n =  5, wt = log_fc) -> top5
+
+# subset for plotting
+Idents(s) <- s$compartment
+cells <- WhichCells(s, downsample = 100)
+ss <- subset(s, cells = cells)
+ss <- ScaleData(ss, features = top5$feature)
+
+p1 <- SCpubr::do_DimPlot(s, reduction="umap", group.by = "compartment", label = TRUE, repel = TRUE) + ggtitle("compartment")
+p2 <- DoHeatmap(ss, features = top5$feature,  cells = cells, group.by = "compartment") + NoLegend() + 
+  scale_fill_gradientn(colors =  c("#01665e","#35978f",'darkslategray3', "#f7f7f7", "#fee391","#fec44f","#F9AD03")) 
+p3 <- ggplot(s@meta.data, aes(compartment, fill = compartment)) + geom_bar() + NoLegend()
+
+
+common_title <- sprintf("Unsupervised clustering %s, %d cells", s@meta.data$orig.ident[1], ncol(s))
+show((((p1 / p3) + plot_layout(heights = c(3,2)) | p2) ) + plot_layout(widths = c(1, 2)) + plot_layout(heights = c(3,1)) + plot_annotation(title = common_title))
+
+
+```
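The threshold filter and the later `na.omit()` in the chunk above interact: subsetting a data frame with a logical vector that contains `NA` returns all-`NA` rows, which is presumably why `na.omit()` is needed before plotting (and those rows would still appear in the `DT::datatable` output). A minimal base-R sketch on a made-up table follows. The gene names and values are invented; only the column names `feature`, `group1`, `padj`, `log_fc` and `rate1` follow the chunk above.

```r
# Toy stand-in for the FindAllMarkers2() result; all values are invented
de_toy <- data.frame(
  feature = c("geneA", "geneB", "geneC", "geneD"),
  group1  = c("x", "x", "y", "y"),
  padj    = c(0.001, NA, 0.20, 0.01),
  log_fc  = c(2.5, 3.0, 1.8, 1.2),
  rate1   = c(0.9, 0.8, 0.7, 0.6)
)

# Same three conditions as the notebook, with its threshold values inlined
keep <- de_toy$padj < 0.05 & de_toy$log_fc > 1 & de_toy$rate1 > 0.5

# Logical indexing keeps an all-NA row wherever `keep` is NA
with_na <- de_toy[keep, ]

# which() returns only the positions where `keep` is TRUE
without_na <- de_toy[which(keep), ]

print(nrow(with_na))    # 3: geneA, one all-NA row, geneD
print(nrow(without_na)) # 2: geneA, geneD
```

Wrapping the condition in `which()` is one way to drop such rows at filter time, so that the table and the heatmap start from the same set of markers.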
+
+
+### Find marker genes for each cell type
+
+
+```{r markers_cell, fig.width=15, fig.height=17, out.width='100%'}
+de_results   <- DElegate::FindAllMarkers2(s, group_column = "cell_type")
+
+#filter the most relevant markers
+s.markers <- de_results[de_results$padj < cfg$padj_thershold & de_results$log_fc > cfg$lfc_threshold & de_results$rate1 > cfg$rate1_threshold,]
+
+
+DT::datatable(s.markers, caption = ("marker genes"), 
+              extensions = 'Buttons', 
+              options = list(  dom = 'Bfrtip',
+
+                               buttons = c( 'csv', 'excel')))
+
+# Select top 5 genes for heatmap plotting
+s.markers <- na.omit(s.markers)
+s.markers %>%
+    group_by(group1) %>%
+    top_n(n =  5, wt = log_fc) -> top5
+
+# subset for plotting
+Idents(s) <- s$cell_type
+cells <- WhichCells(s, downsample = 100)
+ss <- subset(s, cells = cells)
+ss <- ScaleData(ss, features = top5$feature)
+
+p1 <- SCpubr::do_DimPlot(s, reduction="umap", group.by = "cell_type", label = TRUE, repel = TRUE) + ggtitle("cell_type") + NoLegend()
+p2 <- DoHeatmap(ss, features = top5$feature,  cells = cells, group.by = "cell_type") + NoLegend() + 
+  scale_fill_gradientn(colors =  c("#01665e","#35978f",'darkslategray3', "#f7f7f7", "#fee391","#fec44f","#F9AD03")) 
+p3 <- ggplot(s@meta.data, aes(cell_type, fill = cell_type)) + geom_bar() + NoLegend() + scale_x_discrete(guide = guide_axis(angle = 90))
+
+
+common_title <- sprintf("Unsupervised clustering %s, %d cells", s@meta.data$orig.ident[1], ncol(s))
+show((((p1 / p3) + plot_layout(heights = c(3,2)) | p2) ) + plot_layout(widths = c(1, 1)) + plot_layout(heights = c(3,1)) + plot_annotation(title = common_title))
+
+
+```
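Both marker chunks select the top genes per group with `top_n()`, which dplyr has marked as superseded since version 1.0.0 in favour of `slice_max()`. A minimal sketch of the equivalent selection on an invented table is given below, assuming dplyr >= 1.0.0 is installed. It keeps two genes per group rather than five to keep the toy data small.

```r
library(dplyr)

# Toy marker table; all values are invented for illustration
markers_toy <- data.frame(
  feature = paste0("gene", 1:6),
  group1  = rep(c("x", "y"), each = 3),
  log_fc  = c(3.0, 1.5, 2.2, 1.1, 4.0, 2.0)
)

# Keep the 2 genes with the highest log_fc in each group
top2 <- markers_toy %>%
  group_by(group1) %>%
  slice_max(order_by = log_fc, n = 2) %>%
  ungroup()

print(top2$feature) # "gene1" "gene3" "gene5" "gene6"
```

Unlike `top_n()`, `slice_max()` also returns rows sorted by decreasing `log_fc` within each group, which changes the order in which features would be passed to `DoHeatmap()`.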
@@ -67,6 +67,25 @@ Some differenices are expected, some marker genes or pathways are associated wit
 
 ## Output files
 
+## Human fetal kidney reference 
+
+Wilms tumors can contain up to three histologies that resemble fetal kidney: blastema, stroma, and epithelia [1-2].  
+Because of their histological similarity to fetal kidneys, Wilms tumors are thought to arise from developmental derangements in embryonic renal progenitors.
+
+We thus decided to use the human fetal kidney atlas to transfer labels to the Wilms tumor samples using Azimuth.
+You can find more about the human kidney atlas here: https://www.kidneycellatlas.org/ [3]
+
+REF: 
+[1] https://www.ncbi.nlm.nih.gov/books/NBK373356/
+[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9915828/
+[3] https://www.science.org/doi/10.1126/science.aat5031
+
+The first step of the module is thus to download the reference data and build the Azimuth-compatible reference.
+This can be achieved by running `00_fetal_reference_kidney.Rmd`.
+Briefly, this will download the reference data from the cellxgene platform: https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds
+and create an Azimuth-compatible Seurat object that will be saved in the `marker-sets` folder as the `ref.Rds` and `idx.annoy` files.
+
+
 ## Marker sets
 
 This folder is a resource for later validation of the annotated cell types.