CWL Workflow: Single-cell RNA-Seq Analyze

Fetched 2023-01-09 13:04:16 GMT

Verified with cwltool version 3.1.20221201130942

Single-cell RNA-Seq Analyze: runs filtering, normalization, scaling, optional integration, and clustering for a single or aggregated single-cell RNA-Seq dataset.

This workflow is Open Source and may be reused according to the terms of: Apache License 2.0

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Doc
threads	Integer (Optional)	Number of cores/cpus to use. Default: 1
dimensions	Integer[] (Optional)	Dimensionality to use in UMAP projection and when constructing nearest-neighbor graph before clustering (from 1 to 50). If single value N is provided, use from 1 to N dimensions. If multiple values are provided, subset to only selected dimensions. Default: from 1 to 10
resolution	Float[] (Optional)	Clustering resolution applied to the constructed nearest-neighbor graph. Can be set as an array. Default: 0.3, 0.5, 1.0
minimum_pct	Float (Optional)	For putative gene markers identification include only those genes that are detected in not lower than this fraction of cells in either of the two tested clusters. Ignored if '--diffgenes' is not set. Default: 0.1
test_to_use		Statistical test to use for putative gene markers identification. Ignored if '--diffgenes' is not set. Default: wilcox
umap_method		UMAP implementation to run. If set to 'umap-learn', use --umetric 'correlation'. Default: uwot
umap_metric		The metric to use to compute distances in high dimensional space for UMAP. Default: cosine
umap_spread	Float (Optional)	The effective scale of embedded points on UMAP. In combination with '--mindist' it determines how clustered/clumped the embedded points are. Default: 1
mito_pattern	String (Optional)	Regex pattern to identify mitochondrial genes. Default: '^Mt-'
umap_mindist	Float (Optional)	Controls how tightly the embedding is allowed to compress points together on UMAP. Larger values ensure embedded points are more evenly distributed, while smaller values allow the algorithm to optimise more accurately with regard to local structure. Sensible values are in the range 0.001 to 0.5. Default: 0.3
barcodes_data	File (Optional)	Path to the headerless TSV/CSV file with the list of barcodes to select cells of interest (one barcode per line). Prefilters input feature-barcode matrix to include only selected cells. Default: use all cells.
grouping_data	File (Optional)	Path to the TSV/CSV file to define datasets grouping. First column - 'library_id' with the values and order that correspond to the 'library_id' column from the '--identity' file, second column 'condition'. Default: each dataset is assigned to its own group.
maximum_genes	Integer[] (Optional)	Include cells with the number of genes no greater than this value. If multiple values are provided, each of them will be applied to the corresponding dataset from the '--mex' input based on the '--identity' file. Default: 5000 (applied to all datasets)
minimum_genes	Integer[] (Optional)	Include cells where at least this many genes are detected. If multiple values are provided, each of them will be applied to the corresponding dataset from the '--mex' input based on the '--identity' file. Default: 250 (applied to all datasets)
minimum_logfc	Float (Optional)	For putative gene markers identification include only those genes that on average have log fold change difference in expression between every tested pair of clusters not lower than this value. Ignored if '--diffgenes' is not set. Default: 0.25
regress_genes	Boolean (Optional)	Regress genes per cell counts as a confounding source of variation. Default: false
cluster_metric		Distance metric used when constructing nearest-neighbor graph before clustering. Default: euclidean
umap_neighbors	Integer (Optional)	Determines the number of neighboring points used in UMAP. Larger values will result in more global structure being preserved at the loss of detailed local structure. In general, this parameter should be in the range 5 to 50. Default: 30
cell_cycle_data	File (Optional)	Path to the TSV/CSV file with the information for cell cycle score assignment. First column - 'phase', second column 'gene_id'. If the loaded Seurat object already includes cell cycle scores in the 'S.Score' and 'G2M.Score' metadata columns, they will be removed. Default: skip cell cycle score assignment.
regress_rna_umi	Boolean (Optional)	Regress UMI per cell counts as a confounding source of variation. Default: false
rna_minimum_umi	Integer[] (Optional)	Include cells where at least this many UMI (transcripts) are detected. If multiple values are provided, each of them will be applied to the corresponding dataset from the '--mex' input based on the '--identity' file. Default: 500 (applied to all datasets)
genes_of_interest	String[] (Optional)	Genes of interest to build genes expression plots. Default: None
maximum_mito_perc	Float (Optional)	Include cells with the percentage of transcripts mapped to mitochondrial genes no greater than this value. Default: 5 (applied to all datasets)
regress_cellcycle	Boolean (Optional)	Regress cell cycle scores as a confounding source of variation. Ignored if --cellcycle is not provided. Default: false
regress_mito_perc	Boolean (Optional)	Regress the percentage of transcripts mapped to mitochondrial genes as a confounding source of variation. Default: false
rna_minimum_cells	Integer (Optional)	Include only genes detected in at least this many cells. Default: 5 (applied to all datasets)
integration_method		Integration method used for joint analysis of multiple datasets. Automatically set to 'none' if the loaded Seurat object includes only one dataset. Default: seurat
identify_diff_genes	Boolean (Optional)	Identify differentially expressed genes (putative gene markers) between each pair of clusters for all resolutions. Default: false
vector_memory_limit	Integer (Optional)	Maximum vector memory in GB allowed to be used by R. Default: 128
aggregation_metadata	File (Optional)	Path to the metadata TSV/CSV file to set the datasets identities. If '--mex' points to the Cell Ranger Aggregate outputs, the aggregation.csv file can be used. If input is not provided, the default dummy_metadata.csv will be used instead.
normalization_method		Normalization method applied to genes expression counts. If the loaded Seurat object includes multiple datasets, normalization will be run independently for each of them, unless integration is disabled with --ntgr set to 'none'. Default: sct
minimum_novelty_score	Float[] (Optional)	Include cells with the novelty score not lower than this value, calculated as log10(genes)/log10(UMI). If multiple values are provided, each of them will be applied to the corresponding dataset from the '--mex' input based on the '--identity' file. Default: 0.8 (applied to all datasets)
parallel_memory_limit	Integer (Optional)	Maximum memory in GB allowed to be shared between the workers when using multiple --cpus. Default: 32
highly_var_genes_count	Integer (Optional)	Number of highly variable genes used in datasets integration, scaling and dimensionality reduction. Default: 3000
only_positive_diff_genes	Boolean (Optional)	For putative gene markers identification return only positive markers. Ignored if '--diffgenes' is not set. Default: false
feature_bc_matrices_folder	File	Path to the compressed folder with feature-barcode matrix from Cell Ranger Count/Aggregate experiment in MEX format.
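
The inputs above can be supplied to a CWL runner as a job file. The following is a minimal sketch in Python that writes such a file as JSON; the input IDs are taken from the table, while the archive path, the output file name, and the chosen values are hypothetical placeholders, not recommendations.

```python
import json

# Input IDs come from the Inputs table above.
# The archive path below is a hypothetical placeholder.
job = {
    "feature_bc_matrices_folder": {
        "class": "File",
        "path": "feature_bc_matrix.tar.gz",  # hypothetical path
    },
    "threads": 4,
    "dimensions": [10],            # single value N: use dimensions 1..N
    "resolution": [0.3, 0.5, 1.0],
    "minimum_genes": [250],        # one value: applied to all datasets
    "maximum_genes": [5000],
    "rna_minimum_umi": [500],
    "maximum_mito_perc": 5.0,
    "identify_diff_genes": True,
}


def write_job(path, inputs):
    """Serialize the job inputs to a JSON file and return its path."""
    with open(path, "w") as handle:
        json.dump(inputs, handle, indent=2)
    return path


if __name__ == "__main__":
    # Hypothetical job file name.
    write_job("sc-rna-analyze-job.json", job)
```

The resulting file could then be passed to cwltool together with the workflow, for example: cwltool sc-rna-analyze-wf.cwl sc-rna-analyze-job.json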

Steps

ID	Runs	Label	Doc
sc_rna_filter	../tools/sc-rna-filter.cwl (CommandLineTool)	Single-cell RNA-Seq Filtering Analysis	Filters single-cell RNA-Seq datasets based on the common QC metrics.
sc_rna_reduce	../tools/sc-rna-reduce.cwl (CommandLineTool)	Single-cell RNA-Seq Dimensionality Reduction Analysis	Integrates multiple single-cell RNA-Seq datasets, reduces dimensionality using PCA.
sc_rna_cluster	../tools/sc-rna-cluster.cwl (CommandLineTool)	Single-cell RNA-Seq Cluster Analysis	Clusters single-cell RNA-Seq datasets, identifies gene markers.
uncompress_feature_bc_matrices	../tools/tar-extract.cwl (CommandLineTool)	TAR extract	Extracts the content of TAR file into a folder.
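
The QC thresholds that the sc_rna_filter step receives (minimum_genes, maximum_genes, rna_minimum_umi, minimum_novelty_score, maximum_mito_perc) and the dimensions input can be read as the per-cell test sketched below. This is only an illustration of the rules stated in the Inputs table, written in Python; it is not the tool's actual implementation, and all function names are hypothetical. Raising an error when the number of thresholds does not match the number of datasets is an assumption of this sketch.

```python
import math

# Defaults as documented in the Inputs table.
DEFAULTS = {
    "minimum_genes": 250,
    "maximum_genes": 5000,
    "rna_minimum_umi": 500,
    "minimum_novelty_score": 0.8,
    "maximum_mito_perc": 5.0,
}


def novelty_score(genes, umi):
    """Novelty score as documented: log10(genes) / log10(UMI)."""
    if genes < 1 or umi <= 1:
        return 0.0  # guard against log10(0) and division by zero
    return math.log10(genes) / math.log10(umi)


def per_dataset(values, n_datasets):
    """Apply a single threshold to all datasets, or use one per dataset."""
    if len(values) == 1:
        return list(values) * n_datasets
    if len(values) != n_datasets:
        # Assumption of this sketch: a mismatch is treated as an error.
        raise ValueError("expected 1 or %d values" % n_datasets)
    return list(values)


def expand_dimensions(values):
    """Single value N means dimensions 1..N; several values are used as given."""
    if len(values) == 1:
        return list(range(1, values[0] + 1))
    return list(values)


def passes_qc(genes, umi, mito_perc, thresholds=DEFAULTS):
    """Return True if a cell satisfies every documented QC threshold."""
    t = thresholds
    return (
        t["minimum_genes"] <= genes <= t["maximum_genes"]
        and umi >= t["rna_minimum_umi"]
        and mito_perc <= t["maximum_mito_perc"]
        and novelty_score(genes, umi) >= t["minimum_novelty_score"]
    )
```

For instance, a cell with 1000 genes and 10000 UMI has a novelty score of 0.75 and would be excluded under the default threshold of 0.8, even though its gene and UMI counts are within range.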

Outputs

ID	Type	Doc
elbow_plot_png	File (Optional)	Elbow plot (from cells PCA). PNG format
seurat_data_rds	File	Processed Seurat data in RDS format
gene_markers_tsv	File (Optional)	Differentially expressed genes between each pair of clusters for all resolutions. TSV format
slh_res_plot_png	File[] (Optional)	Silhouette scores. Downsampled to max 500 cells per cluster. PNG format
ucsc_cb_html_data	Directory (Optional)	Directory with UCSC Cellbrowser html data.
umap_res_plot_png	File[] (Optional)	Clustered cells UMAP. PNG format
qc_dim_corr_plot_png	File (Optional)	Correlation plots between QC metrics and cells PCA components. PNG format
xpr_avg_res_plot_png	File[] (Optional)	Log normalized scaled average gene expression per cluster. PNG format
raw_umi_dnst_plot_png	File (Optional)	UMI per cell density (not filtered). PNG format
umap_spl_umi_plot_png	File (Optional)	Cells UMAP split by the UMI per cell counts. PNG format
xpr_dnst_res_plot_png	File[] (Optional)	Log normalized gene expression density per cluster. PNG format
fltr_umi_dnst_plot_png	File (Optional)	UMI per cell density (filtered). PNG format
raw_gene_dnst_plot_png	File (Optional)	Genes per cell density (not filtered). PNG format
raw_mito_dnst_plot_png	File (Optional)	Percentage of transcripts mapped to mitochondrial genes per cell density (not filtered). PNG format
raw_nvlt_dnst_plot_png	File (Optional)	Novelty score per cell density (not filtered). PNG format
umap_qc_mtrcs_plot_png	File (Optional)	QC metrics on cells UMAP. PNG format
umap_spl_gene_plot_png	File (Optional)	Cells UMAP split by the genes per cell counts. PNG format
umap_spl_mito_plot_png	File (Optional)	Cells UMAP split by the percentage of transcripts mapped to mitochondrial genes. PNG format
fltr_gene_dnst_plot_png	File (Optional)	Genes per cell density (filtered). PNG format
fltr_mito_dnst_plot_png	File (Optional)	Percentage of transcripts mapped to mitochondrial genes per cell density (filtered). PNG format
fltr_nvlt_dnst_plot_png	File (Optional)	Novelty score per cell density (filtered). PNG format
raw_cells_count_plot_png	File (Optional)	Number of cells per dataset (not filtered). PNG format
sc_rna_filter_stderr_log	File	stderr log generated by sc_rna_filter step
sc_rna_filter_stdout_log	File	stdout log generated by sc_rna_filter step
sc_rna_reduce_stderr_log	File	stderr log generated by sc_rna_reduce step
sc_rna_reduce_stdout_log	File	stdout log generated by sc_rna_reduce step
umap_spl_ph_res_plot_png	File[] (Optional)	Clustered cells UMAP split by cell cycle phase. PNG format
fltr_cells_count_plot_png	File (Optional)	Number of cells per dataset (filtered). PNG format
sc_rna_cluster_stderr_log	File	stderr log generated by sc_rna_cluster step
sc_rna_cluster_stdout_log	File	stdout log generated by sc_rna_cluster step
umap_spl_cnd_res_plot_png	File[] (Optional)	Clustered cells UMAP split by grouping condition. PNG format
xpr_per_cell_res_plot_png	File[] (Optional)	Log normalized gene expression on cells UMAP. PNG format
raw_gene_umi_corr_plot_png	File (Optional)	Genes vs UMI per cell correlation (not filtered). PNG format
raw_qc_mtrcs_dnst_plot_png	File (Optional)	QC metrics per cell density (not filtered). PNG format
umap_spl_idnt_res_plot_png	File[] (Optional)	Clustered cells UMAP split by dataset. PNG format
fltr_gene_umi_corr_plot_png	File (Optional)	Genes vs UMI per cell correlation (filtered). PNG format
fltr_qc_mtrcs_dnst_plot_png	File (Optional)	QC metrics per cell density (filtered). PNG format
umap_gr_cnd_spl_ph_plot_png	File (Optional)	Cells UMAP grouped by condition and split by cell cycle. PNG format
umap_gr_cnd_spl_umi_plot_png	File (Optional)	Cells UMAP grouped by condition and split by the UMI per cell counts. PNG format
raw_1_2_qc_mtrcs_pca_plot_png	File (Optional)	PC1 and PC2 from the QC metrics PCA (not filtered). PNG format
raw_2_3_qc_mtrcs_pca_plot_png	File (Optional)	PC2 and PC3 from the QC metrics PCA (not filtered). PNG format
raw_umi_dnst_spl_cnd_plot_png	File (Optional)	UMI per cell density split by grouping condition (not filtered). PNG format
umap_gr_cnd_spl_gene_plot_png	File (Optional)	Cells UMAP grouped by condition and split by the genes per cell counts. PNG format
umap_gr_cnd_spl_mito_plot_png	File (Optional)	Cells UMAP grouped by condition and split by the percentage of transcripts mapped to mitochondrial genes. PNG format
fltr_1_2_qc_mtrcs_pca_plot_png	File (Optional)	PC1 and PC2 from the QC metrics PCA (filtered). PNG format
fltr_2_3_qc_mtrcs_pca_plot_png	File (Optional)	PC2 and PC3 from the QC metrics PCA (filtered). PNG format
fltr_umi_dnst_spl_cnd_plot_png	File (Optional)	UMI per cell density split by grouping condition (filtered). PNG format
raw_gene_dnst_spl_cnd_plot_png	File (Optional)	Genes per cell density split by grouping condition (not filtered). PNG format
raw_mito_dnst_spl_cnd_plot_png	File (Optional)	Percentage of transcripts mapped to mitochondrial genes per cell density split by grouping condition (not filtered). PNG format
raw_nvlt_dnst_spl_cnd_plot_png	File (Optional)	Novelty score per cell density split by grouping condition (not filtered). PNG format
cmp_gr_ph_spl_clst_res_plot_png	File[] (Optional)	Cells composition plot grouped by cell cycle phase and split by cluster. Downsampled. PNG format
cmp_gr_ph_spl_idnt_res_plot_png	File[] (Optional)	Cells composition plot grouped by cell cycle phase and split by dataset. Downsampled. PNG format
fltr_gene_dnst_spl_cnd_plot_png	File (Optional)	Genes per cell density split by grouping condition (filtered). PNG format
fltr_mito_dnst_spl_cnd_plot_png	File (Optional)	Percentage of transcripts mapped to mitochondrial genes per cell density split by grouping condition (filtered). PNG format
fltr_nvlt_dnst_spl_cnd_plot_png	File (Optional)	Novelty score per cell density split by grouping condition (filtered). PNG format
cmp_gr_clst_spl_cnd_res_plot_png	File[] (Optional)	Cells composition plot grouped by cluster and split by condition. Downsampled. PNG format
cmp_gr_cnd_spl_clst_res_plot_png	File[] (Optional)	Cells composition plot grouped by condition and split by cluster. Downsampled. PNG format
cmp_gr_clst_spl_idnt_res_plot_png	File[] (Optional)	Cells composition plot grouped by cluster and split by dataset. Downsampled. PNG format
cmp_gr_idnt_spl_clst_res_plot_png	File[] (Optional)	Cells composition plot grouped by dataset and split by cluster. Downsampled. PNG format
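
Because most outputs are optional and several are arrays (one file per clustering resolution), it can help to normalize the runner's output object before use. The sketch below assumes the output object is the JSON that cwltool prints on completion, in which File and Directory entries carry "path" and/or "location" fields, optional outputs that were not produced are null, and array outputs are lists. The function name and the sample paths are hypothetical.

```python
import json


def collect_paths(outputs):
    """Map each output ID to a list of paths; missing optional outputs give []."""
    collected = {}
    for output_id, value in outputs.items():
        # Treat single File/Directory objects and arrays uniformly.
        items = value if isinstance(value, list) else [value]
        collected[output_id] = [
            item.get("path") or item.get("location")
            for item in items
            if isinstance(item, dict)  # skips null (not produced) entries
        ]
    return collected


if __name__ == "__main__":
    # Illustrative output object with hypothetical paths.
    sample = json.loads("""
    {
      "seurat_data_rds": {"class": "File", "path": "/tmp/out/seurat.rds"},
      "elbow_plot_png": null,
      "umap_res_plot_png": [
        {"class": "File", "location": "file:///tmp/out/umap_res_0.3.png"},
        {"class": "File", "location": "file:///tmp/out/umap_res_0.5.png"}
      ]
    }
    """)
    for output_id, paths in collect_paths(sample).items():
        print(output_id, paths)
```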

Permalink: https://w3id.org/cwl/view/git/7bbe737324bb6f21f244dea09a926dc2774ed731/workflows/sc-rna-analyze-wf.cwl