Adding 00-reference to build azimuth kidney reference #706

Closed
wants to merge 61 commits into from
61 commits (the diff below shows changes from 1 commit):
6a23d16
Adding 00-reference to build azimuth kidney reference
maud-p Aug 9, 2024
20f04e0
Update analyses/cell-type-wilms-tumor-06/00_fetal_reference_kidney.Rmd
maud-p Aug 9, 2024
4691f7d
Update analyses/cell-type-wilms-tumor-06/00_fetal_reference_kidney.Rmd
maud-p Aug 9, 2024
bd2a7e0
Update analyses/cell-type-wilms-tumor-06/00_fetal_reference_kidney.Rmd
maud-p Aug 9, 2024
7fb795d
Update analyses/cell-type-wilms-tumor-06/00_fetal_reference_kidney.Rmd
maud-p Aug 9, 2024
0f2bc73
Update analyses/cell-type-wilms-tumor-06/00_fetal_reference_kidney.Rmd
maud-p Aug 9, 2024
557df2c
Update analyses/cell-type-wilms-tumor-06/results/README.md
maud-p Aug 9, 2024
d02eaa8
add to PR #706
maud-p Aug 13, 2024
9deb67c
switch to selfcontained mode to render the plots
maud-p Aug 14, 2024
3d8268e
Update analyses/cell-type-wilms-tumor-06/00_reference.R
maud-p Aug 14, 2024
3ab1e5e
changes to PR #706
maud-p Aug 14, 2024
dad3312
changes to #706
maud-p Aug 16, 2024
4282e29
Uncomment workflow triggers, download relevant project, and run workflow
jaclyn-taroni Aug 16, 2024
6c70fbe
Merge pull request #4 from jaclyn-taroni/jaclyn-taroni/ci-wilms-06
maud-p Aug 19, 2024
cc22690
Update analyses/cell-type-wilms-tumor-06/00_run_workflow.R
maud-p Aug 19, 2024
7d60c65
Update analyses/cell-type-wilms-tumor-06/00_run_workflow.R
maud-p Aug 19, 2024
b081722
Update analyses/cell-type-wilms-tumor-06/00_run_workflow.R
maud-p Aug 19, 2024
c7deac5
Update analyses/cell-type-wilms-tumor-06/00_run_workflow.R
maud-p Aug 19, 2024
470d073
Update analyses/cell-type-wilms-tumor-06/00_run_workflow.R
maud-p Aug 19, 2024
a47e8c2
Update analyses/cell-type-wilms-tumor-06/00_run_workflow.R
maud-p Aug 19, 2024
3d32a5b
Update analyses/cell-type-wilms-tumor-06/notebook_template/00b_charac…
maud-p Aug 19, 2024
f6daa9b
Update analyses/cell-type-wilms-tumor-06/notebook_template/02b_label-…
maud-p Aug 19, 2024
4c3ba05
Update analyses/cell-type-wilms-tumor-06/results/README.md
maud-p Aug 19, 2024
9ce015e
Update analyses/cell-type-wilms-tumor-06/notebook_template/02b_label-…
maud-p Aug 19, 2024
b9f5eff
Update analyses/cell-type-wilms-tumor-06/00_run_workflow.R
maud-p Aug 19, 2024
8eecdf5
Update analyses/cell-type-wilms-tumor-06/00_run_workflow.R
maud-p Aug 19, 2024
430b080
Update analyses/cell-type-wilms-tumor-06/notebook_template/00b_charac…
maud-p Aug 19, 2024
c480e83
Changes to PR#706
Aug 19, 2024
4eef88a
Changes to PR#706
Aug 19, 2024
4cf4b70
Update README.md
maud-p Aug 19, 2024
ce31417
debug set up renv error
Aug 19, 2024
c8fffe1
UPDATE renv file
maud-p Aug 19, 2024
b1f6b84
few changes/typos PR #706
maud-p Aug 21, 2024
7060f5c
add system dependencies
maud-p Aug 21, 2024
01b81b8
add ggplotify
maud-p Aug 27, 2024
70d61b4
Update dependencies.R
maud-p Aug 27, 2024
513f7de
Update 00_run_workflow.R
maud-p Aug 27, 2024
1c3dda4
Update run_cell-type-wilms-tumor-06.yml
maud-p Aug 27, 2024
9aeda5b
Update dependencies.R
maud-p Aug 27, 2024
3f7458f
Add files via upload
maud-p Aug 27, 2024
9cd70dd
Update renv.lock
maud-p Aug 28, 2024
8c4e6f4
Update dependencies.R
maud-p Aug 28, 2024
fd0a140
Update renv.lock
maud-p Aug 28, 2024
9abb234
Update 02a_label-transfer_fetal_full_reference_Cao.Rmd
maud-p Aug 28, 2024
0e55ed5
Update 02a_label-transfer_fetal_full_reference_Cao.Rmd
maud-p Aug 28, 2024
c01a3ad
Update 02a_label-transfer_fetal_full_reference_Cao.Rmd
maud-p Aug 28, 2024
0705141
Delete analyses/cell-type-wilms-tumor-06/notebook/SCPCS000168 directory
maud-p Aug 28, 2024
4b761a4
Update 02a_label-transfer_fetal_full_reference_Cao.Rmd
maud-p Aug 28, 2024
268c648
Update 02a_label-transfer_fetal_full_reference_Cao.Rmd
maud-p Aug 28, 2024
f05dc97
transitry comment few lines to faster the debugging process
maud-p Aug 28, 2024
0fcaaed
Update 00_run_workflow.R
maud-p Aug 28, 2024
6d0f854
Update 00_run_workflow.R
maud-p Aug 28, 2024
878dc04
Update run_cell-type-wilms-tumor-06.yml
maud-p Aug 28, 2024
f98da21
Update renv.lock
maud-p Aug 28, 2024
36fa88e
Try to debug k.weight
maud-p Aug 29, 2024
578f2bc
Debug RunAzimuth
maud-p Aug 29, 2024
4d96949
Try debug RunAzimuth
maud-p Aug 29, 2024
41549c0
correct ERROR in notebook 02b_
maud-p Aug 29, 2024
f6d6aec
Try debug RunAzimuth
maud-p Aug 29, 2024
036e799
Expand timeout for reference download
maud-p Aug 29, 2024
0ed2e63
Try debug RunAzimuth
maud-p Aug 29, 2024
240 changes: 240 additions & 0 deletions analyses/cell-type-wilms-tumor-06/00_fetal_reference_kidney.Rmd
@@ -0,0 +1,240 @@
---
title: "Build an Azimuth-compatible fetal kidney reference from the Kidney Cell Atlas"
author: "Maud PLASCHKA"
date: '2024-08-07'
output:
  html_document:
    toc: yes
    toc_float: yes
    code_folding: hide
    highlight: pygments
    df_print: paged
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      message = FALSE,
                      warning = FALSE)
```


## Introduction

The aim is to build an Azimuth-compatible fetal kidney reference from the Kidney Cell Atlas.
The rds data can be downloaded from https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds

## Packages

Load required packages in the following chunk, if needed.
Do not install packages here; only load them with the `library()` function.

```{r packages, message=FALSE, warning=FALSE}
library(Seurat)
library(sctransform)
library(Azimuth)
library(SCpubr)
library(tidyverse)
library(patchwork)
library(SeuratWrappers)
```

Review comment (Member):

I was poking around locally to see if I could better understand the Azimuth problem, and I think we need to get SeuratWrappers into the `renv.lock` file.

I was able to install it from RStudio within the Docker container with:

`remotes::install_github("satijalab/seurat-wrappers@8d46d6c47c089e193fe5c02a8c23970715918aa9")`

(This is the most recent commit.)

If you run `renv::snapshot()`, I expect some packages might get removed because the code in this branch doesn't account for what is getting added in #704. That would be okay, though; we'd need to resolve it in whichever branch gets merged second.


## Base directories

```{r base paths, eval=TRUE, include=TRUE}
# The base path for the OpenScPCA repository, found by its (hidden) .git directory
repository_base <- rprojroot::find_root(rprojroot::is_git_root)

# The current data directory, found within the repository base directory
data_dir <- file.path(repository_base, "data", "2024-07-08", "SCPCP000006")

# The path to this module
module_base <- file.path(repository_base, "analyses", "cell-type-wilms-tumor-06")
```


## Input files


The rds data can be downloaded from https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds

Please note that this download link permanently references the current version of the dataset (08/2024).
If this dataset is updated, a new download link will be created that permanently references the next version of this dataset.
Review comment (Member):

You would need to change this text if using params.


We save it in the `marker-sets` folder of the module.
Note to the DataLab: should we save it somewhere else?
I suggest placing this rds file here only transiently and removing it once the reference is built.
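Because the file is large and is meant to live here only transiently, a download step that skips work when the file already exists and raises R's default 60-second timeout may be convenient. The sketch below is illustrative only: the helper name `download_if_missing()` and the 3600-second timeout are assumptions, not part of the module.

```r
# Hypothetical helper (not part of the module): download `url` to `dest`
# only if `dest` does not exist yet. Returns TRUE if a download happened.
download_if_missing <- function(url, dest, timeout_seconds = 3600) {
  if (file.exists(dest)) {
    return(invisible(FALSE))
  }
  # Large files can exceed the default timeout; raise it temporarily
  # and restore the previous value when the function exits.
  old_options <- options(timeout = max(timeout_seconds, getOption("timeout")))
  on.exit(options(old_options), add = TRUE)
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  # mode = "wb" keeps the binary .rds file intact on Windows
  status <- download.file(url, destfile = dest, mode = "wb")
  if (status != 0) {
    stop("Download failed with status ", status)
  }
  invisible(TRUE)
}
```

In the notebook this would be called as `download_if_missing(url, path_to_data)`; re-knitting then reuses an existing copy instead of downloading it again.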


```{r path_to_data}
path_to_data <- file.path(module_base, "marker-sets", "fetal_full.rds")

url <- "https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds"

# mode = "wb" is needed to download the binary .rds file correctly on Windows
download.file(url, path_to_data, mode = "wb")
```

Review comment (Member):

I think we could move this URL to a parameter with this as the default. I'll comment above with that suggestion.

## Output file

The Azimuth-compatible fetal kidney reference will be saved in the `marker-sets` folder of the module.

Review comment (Member):

Update this text to reflect whatever happens in the chunk below!

```{r path_to_output}
path_to_output <- file.path(module_base, "marker-sets")
```


## Configuration and thresholds

We store the parameters used throughout the analysis to filter marker genes (adjusted p-value, log fold change, and fraction of cells expressing the gene) in a list named `cfg`.


```{r cfg}
cfg <- list()
cfg$padj_thershold <- 0.05   # maximum adjusted p-value
cfg$lfc_threshold <- 1       # minimum log fold change
cfg$rate1_threshold <- 0.5   # minimum fraction of expressing cells in the tested group

set.seed(12345)
```
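To show how these three thresholds act together on a marker table, here is a small sketch applied to made-up data. The helper `filter_markers()` and the toy values are hypothetical; the column names (`padj`, `log_fc`, `rate1`, `group1`) are the ones this notebook uses on the `DElegate::FindAllMarkers2()` output, and the field name `padj_thershold` keeps the notebook's spelling.

```r
# Hypothetical helper (not part of the notebook): keep rows of a marker
# table that pass all three thresholds stored in a cfg-like list.
filter_markers <- function(de_results, cfg) {
  keep <- de_results$padj < cfg$padj_thershold &
    de_results$log_fc > cfg$lfc_threshold &
    de_results$rate1 > cfg$rate1_threshold
  # which() treats NA comparisons as FALSE, so rows with missing
  # statistics are dropped instead of becoming all-NA rows
  de_results[which(keep), , drop = FALSE]
}

# Toy input with made-up values
toy_cfg <- list(padj_thershold = 0.05, lfc_threshold = 1, rate1_threshold = 0.5)
toy_results <- data.frame(
  feature = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  group1  = c("stroma", "stroma", "epithelial", "epithelial"),
  padj    = c(0.001, 0.2, 0.01, NA),
  log_fc  = c(2.5, 3.0, 0.5, 2.0),
  rate1   = c(0.8, 0.9, 0.7, 0.6)
)
filter_markers(toy_results, toy_cfg)
```

With plain logical indexing, as in the marker chunks below, a row whose `padj` is `NA` comes back as a row of `NA`s, which is why those chunks call `na.omit()` afterwards; `which()` avoids that extra step.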


# Create an Azimuth-compatible reference

```{r pre_process, echo=TRUE, fig.height=7, fig.width=12, message=FALSE, warning=FALSE, out.width='100%'}
seurat <- readRDS(path_to_data)

d2 <- do_DimPlot(seurat, reduction = "umap", dims = c(1,2), group.by = "compartment", label = TRUE, repel = TRUE) + NoLegend() + ggtitle("umap reduction before SCTransform")
d3 <- do_DimPlot(seurat, reduction = "umap", dims = c(1,2), group.by = "cell_type", label = TRUE, repel = TRUE) + NoLegend()

d2 | d3

# raise the size limit for globals exported to future workers
options(future.globals.maxSize = 891289600000)
s <- SCTransform(seurat, verbose = FALSE, method = "glmGamPoi", conserve.memory = TRUE)
s <- RunPCA(s, npcs = 50, verbose = FALSE)
# return.model = TRUE keeps the UMAP model so that queries can later be projected onto it
s <- RunUMAP(s, dims = 1:50, verbose = FALSE, return.model = TRUE)
```

Review comment (Member):

If you'd like, you could still read what is in scratch for this plotting. Although, does `do_DimPlot()` perform any normalization? Would it be more appropriate to plot after the steps in 117-119? That would change my strategy a bit for the script I proposed.

```{r create_ref, echo=TRUE, fig.height=7, fig.width=12, message=FALSE, warning=FALSE, out.width='100%'}
options(future.globals.maxSize= 891289600000)
Fetal_kidney <- AzimuthReference(
  s,
  refUMAP = "umap",
  refDR = "pca",
  refAssay = "SCT",
  dims = 1:50,
  k.param = 31,
  plotref = "umap",
  plot.metadata = NULL,
  ori.index = NULL,
  colormap = NULL,
  assays = NULL,
  metadata = c("compartment", "cell_type"),
  reference.version = "0.0.0",
  verbose = FALSE
)

d2 <- do_DimPlot(Fetal_kidney, reduction = "refUMAP", dims = c(1,2), group.by = "compartment", label = TRUE, repel = TRUE) + NoLegend() + ggtitle("umap reduction after SCTransform and Seurat workflow")
d3 <- do_DimPlot(Fetal_kidney, reduction = "refUMAP", dims = c(1,2), group.by = "cell_type", label = TRUE, repel = TRUE) + NoLegend()

d2 | d3

# save the reference: the Annoy index and the Seurat object
SaveAnnoyIndex(object = Fetal_kidney[["refdr.annoy.neighbors"]], file = file.path(path_to_output, "idx.annoy"))
saveRDS(object = Fetal_kidney, file = file.path(path_to_output, "ref.Rds"))

# finally, we can remove the fetal_full.rds file from the marker-sets directory
file.remove(path_to_data)
```

Review thread on the `AzimuthReference()` call:

Review comment (jaclyn-taroni, Member, Aug 9, 2024):

I am wondering if you have had any luck running `AzimuthReference()` within a script? 🤔 Because if so, maybe we make this a script instead of a notebook.

Edited to add: If we did use a script, we'd probably want to use optparse to specify the different parameters (i.e., it would replace our params strategy).

Review comment (jaclyn-taroni, Member, Aug 9, 2024):

I know very little about Azimuth, so forgive me if this is a naive question! How is the reference generated here different from what is available on Zenodo? https://zenodo.org/records/4738021#.YJIW4C2ZNQI

Edit: I assume the difference is in the input downloaded from CELLxGENE?

Review comment (Contributor Author):

I tried to run the same code in a script instead of an R Markdown notebook, but the result was the same: I couldn't run it with one click on "Source". 🤔

Review comment (Member):

I am new to this, but I am now under the impression we can use Seurat to accomplish many of the same things as Azimuth: https://azimuth.hubmapconsortium.org/#General

> Can I run the app myself?
>
> The source code is available here. However, for users interested in performing these analyses outside the context of the Azimuth app, we suggest using Seurat v4 and using our vignette on Mapping and annotating query datasets as an example. You can also download a Seurat v4 R script from the app once your analysis is complete to reproduce the results locally.

(h/t @jashapiro)

Following those links, I assume we'd want to use this section as a reference: https://satijalab.org/seurat/articles/integration_mapping.html#cell-type-classification-using-an-integrated-reference

Being unable to run this successfully except chunk by chunk gives me pause – it would be hard to test this notebook in GitHub Actions.

Review comment (Contributor Author):

The reference you suggest can also be of interest, but from a quick look it contains 15 different organs. The one I wanted to use is composed only of cells from the kidney, and from what I understood the annotations have been done by kidney experts.
But I could give a try with the one you suggest, might be more straightforward!
I'll compare the two on a few samples.

Review comment (Member):

> But I could give a try with the one you suggest, might be more straightforward! I'll compare the two on a few samples.

I would be very curious about this result in general!

Another option would be to try using Seurat (#706 (comment)) with the kidney dataset.

I am trying to avoid the `AzimuthReference()` bug if at all possible 😄

Review comment (Contributor Author):

Thank you very much for looking into Seurat/Azimuth!
The label transfer using `FindTransferAnchors` and `TransferData` (described in the link you suggested, https://satijalab.org/seurat/articles/integration_mapping.html#cell-type-classification-using-an-integrated-reference) is what I used before Seurat and Azimuth v5! I'll go back to it!

Review comment (Member):

This could also come out when using the script.

## Characterization of compartment and cell types in the reference

Here, we use an unbiased approach to find transcripts that characterize the different compartments and cell types.

This is only to get marker genes of the different populations, in case some of them are of interest for the Wilms tumor annotations.

We run `DElegate::FindAllMarkers2` to find markers of the different groups and manually check whether they make sense.
`DElegate::FindAllMarkers2` is an improved version of `Seurat::FindAllMarkers` based on pseudobulk differential expression methods.
Please check the preprint from Christoph Hafemeister: https://www.biorxiv.org/content/10.1101/2023.03.28.534443v1
and the tool described here: https://github.com/cancerbits/DElegate
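To make the "pseudobulk" idea concrete: counts are summed over all cells that belong to the same replicate, and the resulting gene-by-replicate matrix is then tested with bulk RNA-seq methods. The sketch below shows only the aggregation step on a made-up matrix; it is not DElegate's implementation, and how DElegate defines replicates when no replicate column is given may differ.

```r
# Toy counts: genes in rows, cells in columns (made-up values)
counts <- matrix(
  c(1, 0, 3, 2,
    0, 5, 1, 0,
    2, 2, 0, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"),
                  c("cell1", "cell2", "cell3", "cell4"))
)
# Hypothetical replicate label for each cell
replicate_id <- c("r1", "r1", "r2", "r2")

# rowsum() sums rows sharing a group label, so transpose to put cells in
# rows, then transpose back to get a genes x replicates matrix.
pseudobulk <- t(rowsum(t(counts), group = replicate_id))
pseudobulk
```

Summing preserves the total number of counts, so each replicate column behaves like a bulk library that standard count-based tests can handle.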

### Find marker genes for each compartment


Review comment (Member):

Would it be appropriate to save the output of this chunk in results? Will the results be used later?

```{r markers_compartment, fig.width=8, fig.height=7, out.width='100%'}

de_results <- DElegate::FindAllMarkers2(s, group_column = "compartment")

#filter the most relevant markers
s.markers <- de_results[de_results$padj < cfg$padj_thershold & de_results$log_fc > cfg$lfc_threshold & de_results$rate1 > cfg$rate1_threshold,]


DT::datatable(s.markers, caption = ("marker genes"),
extensions = 'Buttons',
options = list( dom = 'Bfrtip',

buttons = c( 'csv', 'excel')))

# Select top 5 genes for heatmap plotting
s.markers <- na.omit(s.markers)
s.markers %>%
group_by(group1) %>%
top_n(n = 5, wt = log_fc) -> top5

# subset for plotting
Idents(s) <- s$compartment
cells <- WhichCells(s, downsample = 100)
ss <- subset(s, cells = cells)
ss <- ScaleData(ss, features = top5$feature)

p1 <- SCpubr::do_DimPlot(s, reduction="umap", group.by = "compartment", label = TRUE, repel = TRUE) + ggtitle("compartment")
p2 <- DoHeatmap(ss, features = top5$feature, cells = cells, group.by = "compartment") + NoLegend() +
scale_fill_gradientn(colors = c("#01665e","#35978f",'darkslategray3', "#f7f7f7", "#fee391","#fec44f","#F9AD03"))
p3 <- ggplot(s@meta.data, aes(compartment, fill = compartment)) + geom_bar() + NoLegend()


common_title <- sprintf("Unsupervised clustering %s, %d cells", s@meta.data$orig.ident[1], ncol(s))
show((((p1 / p3) + plot_layout(heights = c(3,2)) | p2) ) + plot_layout(widths = c(1, 2)) + plot_layout(heights = c(3,1)) + plot_annotation(title = common_title))
```

Review comment (Member):

I think this is the same color palette as what is used below? So I'd recommend saving it as a variable so you would only need to change it in one place:

`heatmap_color_palette <- c("#01665e","#35978f",'darkslategray3', "#f7f7f7", "#fee391","#fec44f","#F9AD03")`
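The `group_by(group1) %>% top_n(n = 5, wt = log_fc)` step used in both marker chunks can be sketched in base R, which makes the per-group selection explicit. The helper `top_n_per_group()` and the toy table are hypothetical; the column names follow the marker tables above.

```r
# Hypothetical filtered marker table (made-up values)
markers <- data.frame(
  feature = c("A1", "A2", "A3", "B1", "B2"),
  group1  = c("stroma", "stroma", "stroma", "epithelial", "epithelial"),
  log_fc  = c(1.5, 3.2, 2.1, 4.0, 1.2)
)

# Keep the n rows with the largest weight within each group.
# Each group contributes at most n rows, whereas dplyr::top_n()
# keeps every row tied at the cutoff.
top_n_per_group <- function(df, group_col, wt_col, n = 5) {
  per_group <- lapply(split(df, df[[group_col]]), function(d) {
    head(d[order(d[[wt_col]], decreasing = TRUE), , drop = FALSE], n)
  })
  out <- do.call(rbind, per_group)
  rownames(out) <- NULL
  out
}

top2 <- top_n_per_group(markers, "group1", "log_fc", n = 2)
top2$feature
```

`split()` orders groups alphabetically, so the epithelial markers come first here.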


### Find marker genes for each of the cell types


Review comment (Member):

Would it be appropriate to save the output of this chunk in results? Will the results be used later?

```{r markers_cell, fig.width=15, fig.height=17, out.width='100%'}

de_results <- DElegate::FindAllMarkers2(s, group_column = "cell_type")

#filter the most relevant markers
s.markers <- de_results[de_results$padj < cfg$padj_thershold & de_results$log_fc > cfg$lfc_threshold & de_results$rate1 > cfg$rate1_threshold,]


DT::datatable(s.markers, caption = ("marker genes"),
extensions = 'Buttons',
options = list( dom = 'Bfrtip',

buttons = c( 'csv', 'excel')))

# Select top 5 genes for heatmap plotting
s.markers <- na.omit(s.markers)
s.markers %>%
group_by(group1) %>%
top_n(n = 5, wt = log_fc) -> top5

# subset for plotting
Idents(s) <- s$cell_type
cells <- WhichCells(s, downsample = 100)
ss <- subset(s, cells = cells)
ss <- ScaleData(ss, features = top5$feature)

p1 <- SCpubr::do_DimPlot(s, reduction="umap", group.by = "cell_type", label = TRUE, repel = TRUE) + ggtitle("cell_type") + NoLegend()
p2 <- DoHeatmap(ss, features = top5$feature, cells = cells, group.by = "cell_type") + NoLegend() +
scale_fill_gradientn(colors = c("#01665e","#35978f",'darkslategray3', "#f7f7f7", "#fee391","#fec44f","#F9AD03"))
p3 <- ggplot(s@meta.data, aes(cell_type, fill = cell_type)) + geom_bar() + NoLegend() + scale_x_discrete(guide = guide_axis(angle = 90))


common_title <- sprintf("Unsupervised clustering %s, %d cells", s@meta.data$orig.ident[1], ncol(s))
show((((p1 / p3) + plot_layout(heights = c(3,2)) | p2) ) + plot_layout(widths = c(1, 1)) + plot_layout(heights = c(3,1)) + plot_annotation(title = common_title))


```
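In both marker chunks, `WhichCells(s, downsample = 100)` caps the number of cells per identity before plotting, so that large populations do not dominate the heatmap. The same idea can be sketched in base R on made-up identities; the helper `downsample_per_group()` is hypothetical, and Seurat's internal implementation may differ.

```r
# Made-up per-cell identities (stand-in for Idents(s))
set.seed(12345)
idents <- c(rep("stroma", 250), rep("epithelial", 40), rep("immune", 120))
cell_names <- paste0("cell", seq_along(idents))

# Keep at most `max_per_group` randomly chosen cells per identity;
# identities that are already smaller keep all of their cells.
downsample_per_group <- function(cells, groups, max_per_group = 100) {
  kept <- lapply(split(cells, groups), function(x) {
    if (length(x) > max_per_group) sample(x, max_per_group) else x
  })
  unname(unlist(kept))
}

subset_cells <- downsample_per_group(cell_names, idents, max_per_group = 100)
table(idents[match(subset_cells, cell_names)])
```

Here the 40 epithelial cells are all kept, while stroma and immune are each reduced to 100 cells.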
19 changes: 19 additions & 0 deletions analyses/cell-type-wilms-tumor-06/README.md
@@ -67,6 +67,25 @@ Some differenices are expected, some marker genes or pathways are associated wit

## Output files

## Human fetal kidney reference

Wilms tumors can contain up to three histologies that resemble fetal kidney: blastema, stroma, and epithelia [1-2].
Because of their histological similarity to fetal kidneys, Wilms tumors are thought to arise from developmental derangements in embryonic renal progenitors.

We thus decided to use the human fetal kidney atlas to transfer labels to the Wilms tumor samples using Azimuth.
You can find more about the human kidney atlas here: https://www.kidneycellatlas.org/ [3]

References:

- [1] https://www.ncbi.nlm.nih.gov/books/NBK373356/
- [2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9915828/
- [3] https://www.science.org/doi/10.1126/science.aat5031

The first step of the module is thus to download the data and build the Azimuth-compatible reference.
This is done by running `00_fetal_reference_kidney.Rmd`.
Briefly, this notebook downloads the reference data from the CELLxGENE platform (https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds)
and creates an Azimuth-compatible Seurat object; the object and its Annoy index are saved in the `marker-sets` folder as `ref.Rds` and `idx.annoy`, respectively.

Review comment (Member):

Update to reflect the use of params (to let folks know where to look to see what is being used!) and where it will be saved.



## Marker sets

This folder is a resource for later validation of the annotated cell types.