Cellranger_scCloud_from_fastq

Running cellranger_scCloud starting with fastq files

Single Cell Portal (SCP) supports running Cell Ranger and scCloud for visualization of clustering results on the web. The cellranger_scCloud pipeline uses Cell Ranger to process Chromium single-cell RNA-seq output by aligning reads and generating a feature-barcode matrix. scCloud accepts the matrix and performs low-quality cell filtering, variable gene selection, batch correction, dimensionality reduction, diffusion map calculation, graph-based clustering, and 2D (e.g. t-SNE/FLE) and 3D (diffusion maps) visualization calculations. Running cellranger_scCloud in SCP is an easy way to visualize your scRNA-seq data and share your experiment with other researchers.

This tutorial demonstrates how to input fastq data to cellranger_scCloud in SCP by stepping through an example analysis workflow. It should take 15-30 minutes to set up and about 2 hours for the analysis to run. Reach out by email (scp-support@broadinstitute.zendesk.com) or on the #scp Slack channel with any questions or comments.

By the end, you will reproduce the following interactive visualization in Single Cell Portal:

Step 1. Create a study

You will first need to create a study so there is a place to work with your files. When you create the study, make sure you use a billing project that is not the Default Project. Note: if you would like to use our analysis test billing project that is free of charge, please contact us at scp-support@broadinstitute.zendesk.com.

Please see instructions here to create a study.

Step 2. Check for Compute Access

If you see the "Analysis" tab, you have a study that is set up correctly for you to run pipelines.

Step 3. Upload data to a study

For this tutorial, please download 1k Brain Nuclei from an E18 Mouse, a 10x Genomics public data set.

You will need to upload the fastq.gz files from the nuclei_900_fastqs folder to the bucket associated with the study you created in step 1.

Here are instructions to upload files to your bucket using gsutil.

Upload folders of FASTQs

Each folder should contain all FASTQ files for one sample. Use sample names for the folder names. To do this, add the folder name (i.e., the sample name) to the destination of the gsutil cp command, after the bucket address; the folder will be created for you automatically. See the example below:

# Initial example command
# To copy all fastq.gz files in your current folder to the top level of the bucket
gsutil cp *.fastq.gz gs://xx-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# Modified example command
# Copying all fastq.gz files into a "nuclei_900" folder in the bucket.
gsutil cp *.fastq.gz gs://xx-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/nuclei_900/
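
If you have several samples, the same pattern can be repeated per sample. The sketch below assumes, purely for illustration, that each sample's FASTQ files sit in a local folder named after the sample; the `upload_samples` function and the `DRY_RUN` variable are hypothetical helpers, not part of gsutil or SCP. By default it only prints the `gsutil cp` commands so you can check the destinations before running them.

```shell
# Hypothetical placeholder: replace with the bucket address for your study.
BUCKET="gs://xx-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Print (or, with DRY_RUN=0, run) one "gsutil cp" per sample name given.
# Assumption: each sample's fastq.gz files are in a local folder named
# after the sample, e.g. ./nuclei_900/*.fastq.gz
upload_samples() {
  for sample in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: show the command without touching the bucket.
      echo "gsutil cp ${sample}/*.fastq.gz ${BUCKET}/${sample}/"
    else
      gsutil cp "${sample}"/*.fastq.gz "${BUCKET}/${sample}/"
    fi
  done
}

upload_samples nuclei_900
```

Once the printed commands look right, setting `DRY_RUN=0` before calling `upload_samples` would run the copies for real.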

Create sample sheet for input fastqs

The sample sheet must contain at least the Sample, Reference and Flowcell columns (Flowcell is the location of the folder of fastqs for the sample).

A detailed description of the sample sheet for fastqs (aka. "cell ranger count only" mode) can be found at: https://kco-cloud.readthedocs.io/en/latest/cellranger.html#only-run-the-count-part

For this tutorial, use the sample sheet below, replacing the Google bucket placeholder (gs://xx-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) with the Google bucket address for your study.

Sample,Flowcell,Reference
nuclei_900,gs://xx-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/nuclei_900,mm10
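
If you prefer to create the file from the command line, a minimal sketch follows. The `BUCKET` and `SAMPLE` variables and the `sample_sheet.csv` file name are illustrative assumptions; substitute the bucket address for your own study.

```shell
# Hypothetical placeholder: replace with the bucket address for your study.
BUCKET="gs://xx-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
SAMPLE="nuclei_900"

# Write the header row, then one row for the sample.
# Flowcell is the bucket folder that holds the sample's fastq files.
{
  printf 'Sample,Flowcell,Reference\n'
  printf '%s,%s/%s,%s\n' "$SAMPLE" "$BUCKET" "$SAMPLE" "mm10"
} > sample_sheet.csv

# Show the result so it can be checked before uploading.
cat sample_sheet.csv
```

Generating the sheet this way keeps the Flowcell path in step with the folder name used in the upload command above.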

Upload sample sheet to your study

From the Summary page for your study, select "Settings":

From the Settings page for your study, select "Upload/Edit Data":

In the Upload Wizard, select the Miscellaneous tab and upload your sample sheet:

You have now uploaded all the data necessary to run cellranger_scCloud!

Step 4. Run pipeline

Go to the "Analysis" tab in your study.

Select the pipeline “cellranger_scCloud_workflow”.

Set parameters and submit your analysis.

cellranger_scCloud_workflow.input_csv_file: look in the dropdown menu for your uploaded sample sheet.

Set up the following parameters:

Output directory: your choice of folder name for your output files
cellranger_scCloud_workflow.sc_cloud_output_prefix: prefix applied to all files in the analysis
cellranger_scCloud_workflow.run_mkfastq: no (the inputs are already FASTQ files, so the mkfastq step is not needed)

Default settings for the cellranger_scCloud workflow:
- cellranger_scCloud_workflow.cellranger_version: 2.2.0
- cellranger_scCloud_workflow.run_count: yes
- cellranger_scCloud_workflow.run_mkfastq: yes
- cellranger_scCloud_workflow.sccloud_version: 0.6.0

Optional settings for the cellranger_scCloud workflow (defined):
- cellranger_scCloud_workflow.expect_cells
- cellranger_scCloud_workflow.force_cells

Feel free to check in on the pipeline while it runs (click Refresh Table as needed). You can also come back to the submission history at any time to check on the status of your workflows.

Step 5. Sync Your Outputs

The portal lets you inspect the outputs of your pipelines before officially adding them to your study. This is particularly important if your study is public or shared, because it lets you catch bad runs before publicizing or sharing the files.

After running a pipeline, you can access the output files to inspect them, but they must be synced to the portal study before you can make interactive plots with them. Click the Sync button to select which files to link to the portal. A few quick details:

You do not have to sync all files.
Unsynced files will stay in your bucket unless deleted.
You can sync some files now and some files later.

Submission History Action Buttons

View Run Info: View submission information.
Sync: Copy the run outputs over to the portal. These will gain the sharing permissions of your study (e.g., if the study is private, the synced files will be private; if it is public, they will be public).
Show Errors: Show errors associated with errored workflow runs.
Delete Submission: Delete the submission and the outputs it generated, but not the inputs.

Step 6. Review and Sync!

You will be prompted to provide details for every file found in the workspace. The form for each file updates to match the file type you select. For 10X files, we will guess how you want to sync the run to optimize portal functionality. Feel free to edit these guesses or just use the defaults. Click Sync to sync each file. For more details, please read our sync detail page.

Edit cluster file names: edit the default portal file names for readability (default names are long to ensure uniqueness during processing).
Add species to expression matrix: annotate your expression matrix with the correct species.
Click Sync for each file from the workflow results.

NOTE: If you do not want a file to show up in your study, you can either leave the form blank, or click "Don't Sync" to remove the form from the page. The portal will ignore that file for now. You can add it in at a later date if you wish by re-running the 'Sync Workspace' function for this study.

Step 7. Enjoy Your Data in Your Study

OK, done! If you selected defaults, you should have a study with two interactive plots, metadata labels, and files to download and share.
