Workflow: Single-cell RNA-Seq Analyze

Fetched 2023-01-09 13:04:21 GMT

Single-cell RNA-Seq Analyze
===========================

Runs filtering, normalization, scaling, integration (optional), and clustering for single or aggregated single-cell RNA-Seq datasets.


Inputs

ID Type Title Doc
threads Integer (Optional)

Number of cores/cpus to use. Default: 1

dimensions Integer[] (Optional)

Dimensionality to use in UMAP projection and when constructing nearest-neighbor graph before clustering (from 1 to 50). If single value N is provided, use from 1 to N dimensions. If multiple values are provided, subset to only selected dimensions. Default: from 1 to 10

resolution Float[] (Optional)

Clustering resolution applied to the constructed nearest-neighbor graph. Can be set as an array. Default: 0.3, 0.5, 1.0

minimum_pct Float (Optional)

For putative gene markers identification include only those genes that are detected in at least this fraction of cells in either of the two tested clusters. Ignored if '--diffgenes' is not set. Default: 0.1

test_to_use

Statistical test to use for putative gene markers identification. Ignored if '--diffgenes' is not set. Default: wilcox

umap_method

UMAP implementation to run. If set to 'umap-learn', use --umetric 'correlation'. Default: uwot

umap_metric

The metric to use to compute distances in high dimensional space for UMAP. Default: cosine

umap_spread Float (Optional)

The effective scale of embedded points on UMAP. In combination with '--mindist' it determines how clustered/clumped the embedded points are. Default: 1

mito_pattern String (Optional)

Regex pattern to identify mitochondrial genes. Default: '^Mt-'

umap_mindist Float (Optional)

Controls how tightly the embedding is allowed to compress points together on UMAP. Larger values ensure embedded points are more evenly distributed, while smaller values allow the algorithm to optimise more accurately with regard to local structure. Sensible values are in the range 0.001 to 0.5. Default: 0.3

barcodes_data File (Optional)

Path to the headerless TSV/CSV file with the list of barcodes to select cells of interest (one barcode per line). Prefilters input feature-barcode matrix to include only selected cells. Default: use all cells.

grouping_data File (Optional)

Path to the TSV/CSV file that defines datasets grouping. First column: 'library_id', with values and order matching the 'library_id' column from the '--identity' file. Second column: 'condition'. Default: each dataset is assigned to its own group.

maximum_genes Integer[] (Optional)

Include cells with the number of genes not greater than this value. If multiple values are provided, each of them will be applied to the corresponding dataset from the '--mex' input based on the '--identity' file. Default: 5000 (applied to all datasets)

minimum_genes Integer[] (Optional)

Include cells where at least this many genes are detected. If multiple values are provided, each of them will be applied to the corresponding dataset from the '--mex' input based on the '--identity' file. Default: 250 (applied to all datasets)

minimum_logfc Float (Optional)

For putative gene markers identification include only those genes that on average have log fold change difference in expression between every tested pair of clusters not lower than this value. Ignored if '--diffgenes' is not set. Default: 0.25

regress_genes Boolean (Optional)

Regress genes per cell counts as a confounding source of variation. Default: false

cluster_metric

Distance metric used when constructing nearest-neighbor graph before clustering. Default: euclidean

umap_neighbors Integer (Optional)

Determines the number of neighboring points used in UMAP. Larger values will result in more global structure being preserved at the loss of detailed local structure. This parameter should generally be in the range 5 to 50. Default: 30

cell_cycle_data File (Optional)

Path to the TSV/CSV file with the information for cell cycle score assignment. First column: 'phase'. Second column: 'gene_id'. If the loaded Seurat object already includes cell cycle scores in the 'S.Score' and 'G2M.Score' metadata columns, they will be removed. Default: skip cell cycle score assignment.

regress_rna_umi Boolean (Optional)

Regress UMI per cell counts as a confounding source of variation. Default: false

rna_minimum_umi Integer[] (Optional)

Include cells where at least this many UMI (transcripts) are detected. If multiple values are provided, each of them will be applied to the corresponding dataset from the '--mex' input based on the '--identity' file. Default: 500 (applied to all datasets)

genes_of_interest String[] (Optional)

Genes of interest to build gene expression plots. Default: None

maximum_mito_perc Float (Optional)

Include cells with the percentage of transcripts mapped to mitochondrial genes not greater than this value. Default: 5 (applied to all datasets)

regress_cellcycle Boolean (Optional)

Regress cell cycle scores as a confounding source of variation. Ignored if --cellcycle is not provided. Default: false

regress_mito_perc Boolean (Optional)

Regress the percentage of transcripts mapped to mitochondrial genes as a confounding source of variation. Default: false

rna_minimum_cells Integer (Optional)

Include only genes detected in at least this many cells. Default: 5 (applied to all datasets)

integration_method

Integration method used for joint analysis of multiple datasets. Automatically set to 'none' if the loaded Seurat object includes only one dataset. Default: seurat

identify_diff_genes Boolean (Optional)

Identify differentially expressed genes (putative gene markers) between each pair of clusters for all resolutions. Default: false

vector_memory_limit Integer (Optional)

Maximum vector memory in GB allowed to be used by R. Default: 128

aggregation_metadata File (Optional)

Path to the metadata TSV/CSV file to set the datasets identities. If '--mex' points to the Cell Ranger Aggregate outputs, the aggregation.csv file can be used. If input is not provided, the default dummy_metadata.csv will be used instead.

normalization_method

Normalization method applied to gene expression counts. If the loaded Seurat object includes multiple datasets, normalization will be run independently for each of them, unless integration is disabled with --ntgr set to 'none'. Default: sct

minimum_novelty_score Float[] (Optional)

Include cells with the novelty score not lower than this value, calculated as log10(genes)/log10(UMI). If multiple values are provided, each of them will be applied to the corresponding dataset from the '--mex' input based on the '--identity' file. Default: 0.8 (applied to all datasets)

parallel_memory_limit Integer (Optional)

Maximum memory in GB allowed to be shared between the workers when using multiple --cpus. Default: 32

highly_var_genes_count Integer (Optional)

Number of highly variable genes used in datasets integration, scaling and dimensionality reduction. Default: 3000

only_positive_diff_genes Boolean (Optional)

For putative gene markers identification return only positive markers. Ignored if '--diffgenes' is not set. Default: false

feature_bc_matrices_folder File

Path to the compressed folder with feature-barcode matrix from Cell Ranger Count/Aggregate experiment in MEX format.
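
The array-valued inputs and QC thresholds described above can be sketched in Python. This is an illustration only, not the workflow's own code: every function name below is hypothetical, and details the descriptions do not spell out (how a length mismatch between thresholds and datasets is handled, or that mito_pattern is matched case-sensitively with a regex search) are assumptions.

```python
import math
import re


def expand_dimensions(dimensions=None):
    """Hypothetical helper: resolve the 'dimensions' input to an explicit list."""
    if not dimensions:                       # default: dimensions 1 to 10
        return list(range(1, 11))
    if len(dimensions) == 1:                 # single value N: use 1..N
        return list(range(1, dimensions[0] + 1))
    return list(dimensions)                  # several values: use exactly these


def per_dataset(values, n_datasets):
    """Hypothetical helper: give every dataset its own threshold."""
    if len(values) == 1:                     # one value applies to all datasets
        return list(values) * n_datasets
    if len(values) != n_datasets:            # assumption: a mismatch is an error
        raise ValueError("expected 1 or %d values" % n_datasets)
    return list(values)


def novelty_score(genes, umi):
    """log10(genes) / log10(UMI), as in the minimum_novelty_score description."""
    return math.log10(genes) / math.log10(umi)


def mito_percentage(counts, pattern="^Mt-"):
    """Percentage of a cell's transcripts mapped to genes matching the pattern."""
    total = sum(counts.values())
    mito = sum(n for gene, n in counts.items() if re.search(pattern, gene))
    return 100.0 * mito / total


def passes_qc(counts, min_genes=250, max_genes=5000, min_umi=500,
              min_novelty=0.8, max_mito=5.0, pattern="^Mt-"):
    """Apply the documented default thresholds to one cell's gene -> UMI counts."""
    genes = sum(1 for n in counts.values() if n > 0)
    umi = sum(counts.values())
    # The gene and UMI checks run first, so the log-based score is only
    # computed for cells with enough counts (with the default thresholds).
    return (min_genes <= genes <= max_genes
            and umi >= min_umi
            and novelty_score(genes, umi) >= min_novelty
            and mito_percentage(counts, pattern) <= max_mito)
```

For example, `per_dataset([250], 3)` expands a single minimum_genes value to one threshold per dataset, and `passes_qc` can then be called once per cell with the threshold that belongs to that cell's dataset.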

Steps

ID Runs Label Doc
sc_rna_filter
../tools/sc-rna-filter.cwl (CommandLineTool)
Single-cell RNA-Seq Filtering Analysis

Single-cell RNA-Seq Filtering Analysis
======================================

Filters single-cell RNA-Seq datasets based on the common QC metrics.

sc_rna_reduce
../tools/sc-rna-reduce.cwl (CommandLineTool)
Single-cell RNA-Seq Dimensionality Reduction Analysis

Single-cell RNA-Seq Dimensionality Reduction Analysis
=====================================================

Integrates multiple single-cell RNA-Seq datasets, reduces dimensionality using PCA.

sc_rna_cluster
../tools/sc-rna-cluster.cwl (CommandLineTool)
Single-cell RNA-Seq Cluster Analysis

Single-cell RNA-Seq Cluster Analysis
====================================

Clusters single-cell RNA-Seq datasets, identifies gene markers.

uncompress_feature_bc_matrices
../tools/tar-extract.cwl (CommandLineTool)
TAR extract

TAR extract
===========

Extracts the content of a TAR file into a folder.
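
What the uncompress_feature_bc_matrices step does can be approximated with Python's standard tarfile module. This is a minimal sketch, not the contents of tar-extract.cwl: the function name and arguments are hypothetical, and it assumes the archive comes from a trusted source, since extracting untrusted archives can write outside the target folder on older Python versions.

```python
import tarfile
from pathlib import Path


def extract_tar(archive_path, output_dir):
    """Extract every member of a TAR archive into output_dir and return that folder."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect gzip, bzip2, or xz compression on its own.
    with tarfile.open(archive_path, mode="r:*") as archive:
        archive.extractall(path=output_dir)
    return output_dir
```

The returned folder would then play the role of the extracted feature-barcode matrix directory that the filtering step reads.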

Outputs

ID Type Label Doc
elbow_plot_png File (Optional)

Elbow plot (from cells PCA). PNG format

seurat_data_rds File

Processed Seurat data in RDS format

gene_markers_tsv File (Optional)

Differentially expressed genes between each pair of clusters for all resolutions. TSV format

slh_res_plot_png File[] (Optional)

Silhouette scores. Downsampled to max 500 cells per cluster. PNG format

ucsc_cb_html_data Directory (Optional)

Directory with UCSC Cellbrowser html data.

umap_res_plot_png File[] (Optional)

Clustered cells UMAP. PNG format

qc_dim_corr_plot_png File (Optional)

Correlation plots between QC metrics and cells PCA components. PNG format

xpr_avg_res_plot_png File[] (Optional)

Log normalized scaled average gene expression per cluster. PNG format

raw_umi_dnst_plot_png File (Optional)

UMI per cell density (not filtered). PNG format

umap_spl_umi_plot_png File (Optional)

Split by the UMI per cell counts cells UMAP. PNG format

xpr_dnst_res_plot_png File[] (Optional)

Log normalized gene expression density per cluster. PNG format

fltr_umi_dnst_plot_png File (Optional)

UMI per cell density (filtered). PNG format

raw_gene_dnst_plot_png File (Optional)

Genes per cell density (not filtered). PNG format

raw_mito_dnst_plot_png File (Optional)

Percentage of transcripts mapped to mitochondrial genes per cell density (not filtered). PNG format

raw_nvlt_dnst_plot_png File (Optional)

Novelty score per cell density (not filtered). PNG format

umap_qc_mtrcs_plot_png File (Optional)

QC metrics on cells UMAP. PNG format

umap_spl_gene_plot_png File (Optional)

Split by the genes per cell counts cells UMAP. PNG format

umap_spl_mito_plot_png File (Optional)

Split by the percentage of transcripts mapped to mitochondrial genes cells UMAP. PNG format

fltr_gene_dnst_plot_png File (Optional)

Genes per cell density (filtered). PNG format

fltr_mito_dnst_plot_png File (Optional)

Percentage of transcripts mapped to mitochondrial genes per cell density (filtered). PNG format

fltr_nvlt_dnst_plot_png File (Optional)

Novelty score per cell density (filtered). PNG format

raw_cells_count_plot_png File (Optional)

Number of cells per dataset (not filtered). PNG format

sc_rna_filter_stderr_log File

stderr log generated by sc_rna_filter step

sc_rna_filter_stdout_log File

stdout log generated by sc_rna_filter step

sc_rna_reduce_stderr_log File

stderr log generated by sc_rna_reduce step

sc_rna_reduce_stdout_log File

stdout log generated by sc_rna_reduce step

umap_spl_ph_res_plot_png File[] (Optional)

Split by cell cycle phase clustered cells UMAP. PNG format

fltr_cells_count_plot_png File (Optional)

Number of cells per dataset (filtered). PNG format

sc_rna_cluster_stderr_log File

stderr log generated by sc_rna_cluster step

sc_rna_cluster_stdout_log File

stdout log generated by sc_rna_cluster step

umap_spl_cnd_res_plot_png File[] (Optional)

Split by grouping condition clustered cells UMAP. PNG format

xpr_per_cell_res_plot_png File[] (Optional)

Log normalized gene expression on cells UMAP. PNG format

raw_gene_umi_corr_plot_png File (Optional)

Genes vs UMI per cell correlation (not filtered). PNG format

raw_qc_mtrcs_dnst_plot_png File (Optional)

QC metrics per cell density (not filtered). PNG format

umap_spl_idnt_res_plot_png File[] (Optional)

Split by dataset clustered cells UMAP. PNG format

fltr_gene_umi_corr_plot_png File (Optional)

Genes vs UMI per cell correlation (filtered). PNG format

fltr_qc_mtrcs_dnst_plot_png File (Optional)

QC metrics per cell density (filtered). PNG format

umap_gr_cnd_spl_ph_plot_png File (Optional)

Grouped by condition split by cell cycle cells UMAP. PNG format

umap_gr_cnd_spl_umi_plot_png File (Optional)

Grouped by condition split by the UMI per cell counts cells UMAP. PNG format

raw_1_2_qc_mtrcs_pca_plot_png File (Optional)

PC1 and PC2 from the QC metrics PCA (not filtered). PNG format

raw_2_3_qc_mtrcs_pca_plot_png File (Optional)

PC2 and PC3 from the QC metrics PCA (not filtered). PNG format

raw_umi_dnst_spl_cnd_plot_png File (Optional)

Split by grouping condition UMI per cell density (not filtered). PNG format

umap_gr_cnd_spl_gene_plot_png File (Optional)

Grouped by condition split by the genes per cell counts cells UMAP. PNG format

umap_gr_cnd_spl_mito_plot_png File (Optional)

Grouped by condition split by the percentage of transcripts mapped to mitochondrial genes cells UMAP. PNG format

fltr_1_2_qc_mtrcs_pca_plot_png File (Optional)

PC1 and PC2 from the QC metrics PCA (filtered). PNG format

fltr_2_3_qc_mtrcs_pca_plot_png File (Optional)

PC2 and PC3 from the QC metrics PCA (filtered). PNG format

fltr_umi_dnst_spl_cnd_plot_png File (Optional)

Split by grouping condition UMI per cell density (filtered). PNG format

raw_gene_dnst_spl_cnd_plot_png File (Optional)

Split by grouping condition genes per cell density (not filtered). PNG format

raw_mito_dnst_spl_cnd_plot_png File (Optional)

Split by grouping condition the percentage of transcripts mapped to mitochondrial genes per cell density (not filtered). PNG format

raw_nvlt_dnst_spl_cnd_plot_png File (Optional)

Split by grouping condition the novelty score per cell density (not filtered). PNG format

cmp_gr_ph_spl_clst_res_plot_png File[] (Optional)

Grouped by cell cycle phase split by cluster cells composition plot. Downsampled. PNG format

cmp_gr_ph_spl_idnt_res_plot_png File[] (Optional)

Grouped by cell cycle phase split by dataset cells composition plot. Downsampled. PNG format

fltr_gene_dnst_spl_cnd_plot_png File (Optional)

Split by grouping condition genes per cell density (filtered). PNG format

fltr_mito_dnst_spl_cnd_plot_png File (Optional)

Split by grouping condition the percentage of transcripts mapped to mitochondrial genes per cell density (filtered). PNG format

fltr_nvlt_dnst_spl_cnd_plot_png File (Optional)

Split by grouping condition the novelty score per cell density (filtered). PNG format

cmp_gr_clst_spl_cnd_res_plot_png File[] (Optional)

Grouped by cluster split by condition cells composition plot. Downsampled. PNG format

cmp_gr_cnd_spl_clst_res_plot_png File[] (Optional)

Grouped by condition split by cluster cells composition plot. Downsampled. PNG format

cmp_gr_clst_spl_idnt_res_plot_png File[] (Optional)

Grouped by cluster split by dataset cells composition plot. Downsampled. PNG format

cmp_gr_idnt_spl_clst_res_plot_png File[] (Optional)

Grouped by dataset split by cluster cells composition plot. Downsampled. PNG format

Permalink: https://w3id.org/cwl/view/git/280cad66c2a5b2e1b66e4f8a5469942e88df5b74/workflows/sc-rna-analyze-wf.cwl