-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The percentage of selected cell is: 0.000% #42
Comments
Hi Yuanjian,
I may need your data to debug it. Could you please send me your data to
***@***.***? For data privacy, you can send me a subset of data
with this problem.
Thanks,
Duanchen
…On Tue, 13 Dec 2022 at 09:46, Yuanjian Huang ***@***.***> wrote:
Hello Scissor,
I'm running Scissor, but no matter what alpha I used, it always came out 0
selected cells.
Could you please help me with this issue?
Thanks!
Best,
Yuanjian
infos <- Scissor_change(bulk_dataset=GeneMatrix_HGNC_input, sc_dataset=Merge2.combined, phenotype=phenotype, tag = tag, alpha = seq(1,5,1)/10000000, cutoff = 0.01, family = "binomial", Save_file = "Merge2_Scissor.RData")
[1] "|**************************************************|"
[1] "Performing quality-check for the correlations"
[1] "The five-number summary of correlations:"
0% 25% 50% 75% 100% -0.04986036 0.14333164 0.23372394 0.35075314 0.79686469
[1] "|**************************************************|"
[1] "Current phenotype contains 48 Wild type and 17 Mutated samples."
[1] "Perform logistic regression on the given phenotypes:"
[1] "alpha = 1e-07"
[1] "Scissor identified 0 Scissor+ cells and 0 Scissor- cells."
[1] "The percentage of selected cell is: 0.000%"
—
Reply to this email directly, view it on GitHub
<#42>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AEYQWDEAUKAV7XE5PMTAUILWM7IQJANCNFSM6AAAAAAS4T6KVU>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Hello Duanchen, Data on Google drive: https://drive.google.com/file/d/1ZWuyQD7LVIfS1DBHDjdoP5-ZoAo16NBY/view?usp=sharing (2.1GB) Sys.setenv(LANG = "en_US")
library(Seurat)
library(ggplot2)
library(sctransform)
library(dplyr)
library(patchwork)
library(SeuratDisk)
library(TCGAbiolinks)
library(SummarizedExperiment)
library(DT)
library(biomaRt)
library(Scissor)
memory.limit(512000)
options(future.globals.maxSize = 256000*1024^2)
# 1. Get the patient list with the interested gene mutated
# download TCGA CNV information
query_1 <- GDCquery(project = "TCGA-COAD", data.category = "Simple Nucleotide Variation", data.type = "Masked Somatic Mutation", workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking", legacy = FALSE, access= 'open', data.format='TXT')
GDCdownload(query_1)
CNV <- GDCprepare(query=query_1, save=TRUE, save.filename='C:/Users/Park_Lab/Documents/TCGA_COAD_CNV.rda')
write.csv(CNV, file="C:/Users/Park_Lab/Documents/TCGA_COAD_CNV.csv", sep = ",", row.names=TRUE, col.names = TRUE)
# retrieve the patient list with the interested gene mutated
mutation_df <- CNV[CNV$Hugo_Symbol == 'MSH6',] # subset CNV table into a table with only interested gene mutated
write.csv(mutation_df, file="C:/Users/Park_Lab/Documents/mutation_df.csv", sep = ",", row.names=TRUE, col.names = TRUE)
mutation_patient=as.data.frame(mutation_df$Tumor_Sample_Barcode) # use column `Tumor_Sample_Barcode` to build a new data frame
colnames(mutation_patient) <- c("Patient_ID") # rename the name of column 0 to Patient_ID
mutation_patient$Patient_ID <- substr(mutation_patient$Patient_ID, 0, 12) # only keep the first 12 character as Patient_ID, these patients are with interested gene mutated, note that this patient list includes all cancer types
mutation_patient$Status <- 1 # set 1 as the mutation status
write.csv(mutation_patient, file="C:/Users/Park_Lab/Documents/mutation_patient_list.csv", sep = ",", row.names=TRUE, col.names = TRUE)
# 2. Get the patient list with interested pathological type
# download TCGA-COAD clinical index data
clinical_index <- GDCquery_clinic(project = "TCGA-COAD", type = "Clinical", save.csv = TRUE) # will generate TCGA-COAD_clinical.csv automatically
# retrieve the patient list with the interested pathological type
disease_clinical <- clinical_index %>% filter(primary_diagnosis=="Mucinous adenocarcinoma") # only keep the mucinous cancer patients clinical data
disease_patient=as.data.frame(disease_clinical$submitter_id) # use column `submitter_id` to build a new data frame
colnames(disease_patient) <- c("Patient_ID") # rename the name of column 0 to Patient_ID, note that this patient list includes all mucinous cancer with or without interested gene mutated
write.csv(disease_patient, file="C:/Users/Park_Lab/Documents/disease_patient_list1.csv", sep = ",", row.names=TRUE, col.names = TRUE)
# 3. Prepare phenotype list (For patient diagnosed with one pathological type, set mutated ones as 1, wild type as 0)
disease_patient$Status=mutation_patient$Status[match(disease_patient$Patient_ID, mutation_patient$Patient_ID)] # copy mutation_patient status (1) to disease_patient if disease_patient also exist in mutation_patient data frame
disease_patient[is.na(disease_patient)] = 0 # set left patient status as 0, indicating wild type
write.csv(disease_patient, file="C:/Users/Park_Lab/Documents/disease_patient_list2.csv", sep = ",", row.names=TRUE, col.names = TRUE)
# 4. Prepare human gene symbol annotated TCGA-COAD RNA-seq gene matrix
# download TCGA-COAD gene matrix
query_2 <- GDCquery(project = "TCGA-COAD", data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", workflow.type = "STAR - Counts", legacy = FALSE, access= 'open', data.format='TXT', experimental.strategy='RNA-Seq', sample.type=c('Primary Tumor')) # didn't download 'Solid Tissue Normal' data, because this script is for tumor CNV
GDCdownload(query_2)
RNA_seq <- GDCprepare(query=query_2, save=TRUE, save.filename='C:/Users/Park_Lab/Documents/TCGA_COAD_RNA_seq.rda')
GeneMatrix <- assay(RNA_seq)
write.csv(GeneMatrix, file="C:/Users/Park_Lab/Documents/TCGA_COAD_GeneMatrix.csv", sep = ",", row.names=TRUE, col.names = TRUE)
# convert human ensembl_id of gene matrix to gene_symbol
human <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")
ensembl_id <- rownames(GeneMatrix)
gene_symbol <- getBM(attributes = c('ensembl_gene_id_version', 'hgnc_symbol'), filters = 'ensembl_gene_id_version', values = ensembl_id, mart = human)
head(gene_symbol)
gene_symbol <- gene_symbol[!(gene_symbol$hgnc_symbol==""),] # remove rows with empty human gene
gene_symbol <- gene_symbol[!duplicated(gene_symbol$ensembl_gene_id_version),] # remove rows with duplicated human ensembl
gene_symbol <- gene_symbol[!duplicated(gene_symbol$hgnc_symbol),] # remove rows with duplicated human gene
class(GeneMatrix)
GeneMatrix_df<- as.data.frame(GeneMatrix)
class(GeneMatrix_df)
GeneMatrix_df$genes <- gene_symbol$hgnc_symbol[match(rownames(GeneMatrix_df), gene_symbol$ensembl_gene_id_version)] # copy gene symbols to gene expression matrix, matching human ensembl id
GeneMatrix_df <- na.omit(GeneMatrix_df) # remove rows with 'NA' in 'genes' column
rownames(GeneMatrix_df) <- GeneMatrix_df$genes # set 'genes' as rownames
GeneMatrix_HGNC = GeneMatrix_df[,!(colnames(GeneMatrix_df) %in% c('genes'))] # remove 'genes' column
write.csv(GeneMatrix_HGNC, file="C:/Users/Park_Lab/Documents/GeneMatrix_HGNC.csv", sep = ",", row.names=TRUE, col.names = TRUE) # gene matrix with gene_symbol
# 5. Match gene matrix and phenotype list
# subset gene matrix into the patients that only exist in the phenotype list
colnames(GeneMatrix_HGNC) <- substr(colnames(GeneMatrix_HGNC), 0, 12) # only keep the first 12 character of sample ID to for matching with Patient_ID later
GeneMatrix_HGNC <- GeneMatrix_HGNC[, !duplicated(colnames(GeneMatrix_HGNC))] # 1 patient may have >1 samples, after simplify sample ID, there will be duplicates, but phenotype list has no duplicates, so we have to remove these sample duplicates
GeneMatrix_HGNC <- subset(GeneMatrix_HGNC, select = colnames(GeneMatrix_HGNC) %in% disease_patient$Patient_ID) # only include patients that also exist in the phenotype list
write.csv(GeneMatrix_HGNC, file="C:/Users/Park_Lab/Documents/GeneMatrix_HGNC_disease.csv", sep = ",", row.names=TRUE, col.names = TRUE)
# make gene matrix and phenotype list the same order
GeneMatrix_HGNC_sorted <- GeneMatrix_HGNC[ , order(match(colnames(GeneMatrix_HGNC), disease_patient$Patient_ID))] # sort gene matrix the same order as patient ids of phenotype list
all(colnames(GeneMatrix_HGNC_sorted) == disease_patient$Patient_ID) # check whether the orders of sample ids in gene matrix and phenotype list are the same
write.csv(GeneMatrix_HGNC_sorted, file="C:/Users/Park_Lab/Documents/GeneMatrix_HGNC_disease_sorted.csv", sep = ",", row.names=TRUE, col.names = TRUE)
# make the sample id annotated numeric phenotype list
class(disease_patient)
rownames(disease_patient) <- disease_patient$Patient_ID # set Patient_ID as data frame index
disease_patient_input = disease_patient[,!(colnames(disease_patient) %in% c('Patient_ID'))] # remove column 'Patient_ID'
disease_patient_input # disease_patient becomes numeric, only the values of column 'Status' will show
names(disease_patient_input) = rownames(disease_patient) # copy the rownames back
disease_patient_input
class(disease_patient_input)
table(disease_patient_input) # show statistics of status 0 and 1
write.csv(disease_patient_input, file="C:/Users/Park_Lab/Documents/disease_patient_input.csv", sep = ",", row.names=TRUE, col.names = TRUE)
# 6. Prepare scRNA-seq data
Merge2.combined <- readRDS(file = "C:/Users/Park_Lab/Documents/Merge2.combined_all.rds")
# 7. Run Scissor
phenotype <- disease_patient_input
tag <- c('Wild type', 'Mutated') # name the phenotype 0 (Wild type) first, then phenotype 1 (Mutated)
GeneMatrix_HGNC_input <- as.matrix(GeneMatrix_HGNC_sorted) # convert gene matrix (data.frame) to 'matrix'
# data is too big, the original Scissor() got errors. To fix this error, run the code of Scissor_change.R and as_matrix.R first, then run below steps
infos <- Scissor_change(bulk_dataset=GeneMatrix_HGNC_input, sc_dataset=Merge2.combined, phenotype=phenotype, tag = tag, alpha = 0.000012, cutoff = 0.01, family = "binomial", Save_file = "Merge2_Scissor.RData")
# MSH6 0.00001, 82.336%; 0.000011, 81.302%; 0.000012-0.00005, 0;
# TP53 >0.0000001, 0 |
Hi, I met the same problem that Scissor turned out to select 0.000%. Could I ask you how you work this out? |
Hi, I met the same problem that Scissor turned out to select 0.000%. Could I ask you how you work this out? |
Hello Scissor,
I'm running Scissor, but no matter what alpha I used, it always came out 0 selected cells.
Could you please help me with this issue?
Thanks!
Best,
Yuanjian
The text was updated successfully, but these errors were encountered: