
Add merged object to AUCell workflow #1023

Open
wants to merge 11 commits into base: main


allyhawkins
Member

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

Addresses #1017

What is the goal of this pull request?

I would like to be able to look at cell type assignments across all samples together, rather than trying to assign cell types to each sample individually. To do that, we want to look at the merged object; in particular, we want AUCell results from the merged object.

Briefly describe the general approach you took to achieve this goal.

This PR makes the necessary modifications to the workflow and script that runs AUCell.

  • I added an is_merged option to handle special cases for the merged object. The merged object is formatted slightly differently, which affects how the SCE object is filtered to remove non-detected genes.
  • I also noticed that the matrix wasn't quite in a sparse form. It's a sparse DelayedMatrix object of type "double", so it shows the 0s explicitly and causes an error when used as direct input to AUCell. I'll file an issue in scpcaTools, where we make the merged object, because I think we do want to make sure the output there is actually a dgCMatrix.

If known, do you anticipate filing additional pull requests to complete this analysis module?

Yes. I already started looking at the results here to make sure this was a reasonable thing to spend time on. The next PR will look at the results in an exploratory notebook, using our existing notebook template as a guide.

Results

What is the name of your results bucket on S3?

s3://researcher-211125375652-us-east-2/cell-type-ewings/results/aucell-ews-signatures

What types of results does your code produce (e.g., table, figure)?

The workflow produces TSV files with the results. I re-ran the whole workflow and copied the updated files for the individual libraries and the merged library to S3.

What is your summary of the results?

Coming next!

Author checklists

Analysis module and review

Reproducibility checklist

  • Code in this pull request has been added to the GitHub Action workflow that runs this module.
  • The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
  • If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
  • If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

@allyhawkins allyhawkins requested review from sjspielman and removed request for jaclyn-taroni February 3, 2025 19:56
@allyhawkins allyhawkins changed the title Allyhawkins/aucell ewing merged Add merged object to AUCell workflow Feb 3, 2025
@allyhawkins
Member Author

Hmm, it looks like GHA doesn't have the credentials to download from the workflow results bucket?
See https://github.com/AlexsLemonade/OpenScPCA-analysis/actions/runs/13121720493/job/36609019909?pr=1023

@jaclyn-taroni
Member

I suspect the more accurate statement is that download-results.py relies on listing objects, which is not allowed. cc: @jashapiro

@jashapiro
Member

I think you are missing the --test-data flag for the download-results.py script, so it is trying to use the true results, which require credentials.
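A minimal sketch of the distinction being described, assuming download-results.py is invoked with python from the repository root and that --test-data is the only flag that matters here. The helper name build_download_cmd and the choice to key off GITHUB_ACTIONS (which GitHub Actions sets to "true" on its runners) are hypothetical, not how the module workflow is actually written; the idea is that CI runs request test data, while the real results require credentials.

```shell
#!/bin/sh
# Hypothetical helper: print the download-results.py command to run.
# Assumption: CI runners cannot read the real results bucket, so they
# must request test data with --test-data instead.
build_download_cmd() {
  local cmd="python download-results.py"
  # GitHub Actions sets GITHUB_ACTIONS=true on its runners
  if [ "${GITHUB_ACTIONS:-}" = "true" ]; then
    cmd="$cmd --test-data"
  fi
  printf '%s\n' "$cmd"
}

# Usage: run the printed command, e.g.
#   eval "$(build_download_cmd)"
```

In practice the workflow step can simply pass --test-data directly, since it only ever runs in CI; the conditional just makes the same call usable for local runs with credentials.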

Member

@sjspielman sjspielman left a comment


Looks good! Suggested a few changes, but don't need to see again.

I'd recommend updating the scripts/README usage as well here:

To run this script using the default parameters use the following command:
```sh
Rscript 01-aucell.R \
--sce_file <path to processed SCE file> \
--output_file <path to TSV file to save AUC results>
```

Also, while I was finding this line in that README, I came across this line, which has the wrong extension; it should be .R, not .Rmd:

```sh
Rscript 04-run-infercnv.Rmd \
--annotations_file <path to save annotations file> \
--reference_cell_file <path to file with table of normal cell barcodes> \
--output_dir <full path to folder to save results> \
--threads 4
```
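Because Rscript expects a plain R script, any README line that passes it a .Rmd file is probably a typo like the one above. A small sketch for finding and fixing such lines, assuming the matching .R script exists alongside the .Rmd name; the function names are hypothetical, and the sed pattern only rewrites the token that immediately follows Rscript.

```shell
#!/bin/sh
# Hypothetical helpers for catching "Rscript <name>.Rmd" typos in a README.

# Print each matching line with its line number (grep exits 1 if none match).
find_rscript_rmd() {
  grep -nE 'Rscript [^ ]+\.Rmd' "$1"
}

# Rewrite "Rscript <name>.Rmd" to "Rscript <name>.R" in place.
# Assumption: <name>.R exists; confirm before committing the change.
fix_rscript_rmd() {
  local readme="$1"
  local tmp
  tmp="$(mktemp)" || return 1
  # Copy back with cat (not mv) so the README keeps its original permissions
  sed -E 's/(Rscript [^ ]+)\.Rmd/\1.R/g' "$readme" > "$tmp" &&
    cat "$tmp" > "$readme"
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, running find_rscript_rmd on the module's scripts README (path assumed to be scripts/README.md) should list the 04-run-infercnv line, and fix_rscript_rmd would change it to call 04-run-infercnv.R.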

allyhawkins and others added 2 commits February 3, 2025 16:28
Co-authored-by: Stephanie Spielman <stephanie.spielman@gmail.com>
4 participants