Workflow: Single-cell RNA-Seq Analyze
Runs filtering, normalization, scaling, optional integration, and clustering for a single or aggregated single-cell RNA-Seq dataset.
Inputs
ID | Type | Title | Doc |
---|---|---|---|
threads | Integer (Optional) |
Number of cores/cpus to use. Default: 1 |
|
dimensions | Integer[] (Optional) |
Dimensionality to use in UMAP projection and when constructing the nearest-neighbor graph before clustering (from 1 to 50). If a single value N is provided, use dimensions 1 to N. If multiple values are provided, use only the selected dimensions. Default: from 1 to 10 |
|
resolution | Float[] (Optional) |
Clustering resolution applied to the constructed nearest-neighbor graph. Can be set as an array. Default: 0.3, 0.5, 1.0 |
|
minimum_pct | Float (Optional) |
For putative gene markers identification, include only those genes that are detected in at least this fraction of cells in either of the two tested clusters. Ignored if '--diffgenes' is not set. Default: 0.1 |
|
test_to_use |
Statistical test to use for putative gene markers identification. Ignored if '--diffgenes' is not set. Default: wilcox |
||
umap_method |
UMAP implementation to run. If set to 'umap-learn', use --umetric 'correlation'. Default: uwot |
||
umap_metric |
The metric to use to compute distances in high dimensional space for UMAP. Default: cosine |
||
umap_spread | Float (Optional) |
The effective scale of embedded points on UMAP. In combination with '--mindist' it determines how clustered/clumped the embedded points are. Default: 1 |
|
mito_pattern | String (Optional) |
Regex pattern to identify mitochondrial genes. Default: '^Mt-' |
|
umap_mindist | Float (Optional) |
Controls how tightly the embedding is allowed to compress points together on UMAP. Larger values ensure embedded points are more evenly distributed, while smaller values allow the algorithm to optimise more accurately with regard to local structure. Sensible values are in the range 0.001 to 0.5. Default: 0.3 |
|
barcodes_data | File (Optional) |
Path to the headerless TSV/CSV file with the list of barcodes to select cells of interest (one barcode per line). Prefilters input feature-barcode matrix to include only selected cells. Default: use all cells. |
|
grouping_data | File (Optional) |
Path to the TSV/CSV file to define datasets grouping. First column - 'library_id' with the values and order that correspond to the 'library_id' column from the '--identity' file, second column 'condition'. Default: each dataset is assigned to its own group. |
|
maximum_genes | Integer[] (Optional) |
Include cells where no more than this many genes are detected. If multiple values are provided, each of them will be applied to the corresponding dataset from the '--mex' input based on the '--identity' file. Default: 5000 (applied to all datasets) |
|
minimum_genes | Integer[] (Optional) |
Include cells where at least this many genes are detected. If multiple values are provided, each of them will be applied to the corresponding dataset from the '--mex' input based on the '--identity' file. Default: 250 (applied to all datasets) |
|
minimum_logfc | Float (Optional) |
For putative gene markers identification, include only those genes whose average log fold change in expression between every tested pair of clusters is not lower than this value. Ignored if '--diffgenes' is not set. Default: 0.25 |
|
regress_genes | Boolean (Optional) |
Regress genes per cell counts as a confounding source of variation. Default: false |
|
cluster_metric |
Distance metric used when constructing nearest-neighbor graph before clustering. Default: euclidean |
||
umap_neighbors | Integer (Optional) |
Determines the number of neighboring points used in UMAP. Larger values preserve more global structure at the cost of detailed local structure. Typical values are in the range 5 to 50. Default: 30 |
|
cell_cycle_data | File (Optional) |
Path to the TSV/CSV file with the information for cell cycle score assignment. First column - 'phase', second column 'gene_id'. If the loaded Seurat object already includes cell cycle scores in the 'S.Score' and 'G2M.Score' metadata columns, they will be removed. Default: skip cell cycle score assignment. |
|
regress_rna_umi | Boolean (Optional) |
Regress UMI per cell counts as a confounding source of variation. Default: false |
|
rna_minimum_umi | Integer[] (Optional) |
Include cells where at least this many UMI (transcripts) are detected. If multiple values are provided, each of them will be applied to the corresponding dataset from the '--mex' input based on the '--identity' file. Default: 500 (applied to all datasets) |
|
genes_of_interest | String[] (Optional) |
Genes of interest for which to build gene expression plots. Default: None |
|
maximum_mito_perc | Float (Optional) |
Include cells with the percentage of transcripts mapped to mitochondrial genes not higher than this value. Default: 5 (applied to all datasets) |
|
regress_cellcycle | Boolean (Optional) |
Regress cell cycle scores as a confounding source of variation. Ignored if --cellcycle is not provided. Default: false |
|
regress_mito_perc | Boolean (Optional) |
Regress the percentage of transcripts mapped to mitochondrial genes as a confounding source of variation. Default: false |
|
rna_minimum_cells | Integer (Optional) |
Include only genes detected in at least this many cells. Default: 5 (applied to all datasets) |
|
integration_method |
Integration method used for joint analysis of multiple datasets. Automatically set to 'none' if the loaded Seurat object includes only one dataset. Default: seurat |
||
identify_diff_genes | Boolean (Optional) |
Identify differentially expressed genes (putative gene markers) between each pair of clusters for all resolutions. Default: false |
|
vector_memory_limit | Integer (Optional) |
Maximum vector memory in GB allowed to be used by R. Default: 128 |
|
aggregation_metadata | File (Optional) |
Path to the metadata TSV/CSV file to set the datasets identities. If '--mex' points to the Cell Ranger Aggregate outputs, the aggregation.csv file can be used. If input is not provided, the default dummy_metadata.csv will be used instead. |
|
normalization_method |
Normalization method applied to gene expression counts. If the loaded Seurat object includes multiple datasets, normalization will be run independently for each of them, unless integration is disabled with --ntgr set to 'none'. Default: sct |
||
minimum_novelty_score | Float[] (Optional) |
Include cells with the novelty score not lower than this value, calculated as log10(genes)/log10(UMI). If multiple values are provided, each of them will be applied to the corresponding dataset from the '--mex' input based on the '--identity' file. Default: 0.8 (applied to all datasets) |
|
parallel_memory_limit | Integer (Optional) |
Maximum memory in GB allowed to be shared between the workers when using multiple --cpus. Default: 32 |
|
highly_var_genes_count | Integer (Optional) |
Number of highly variable genes used in datasets integration, scaling and dimensionality reduction. Default: 3000 |
|
only_positive_diff_genes | Boolean (Optional) |
For putative gene markers identification return only positive markers. Ignored if '--diffgenes' is not set. Default: false |
|
feature_bc_matrices_folder | File |
Path to the compressed folder with feature-barcode matrix from Cell Ranger Count/Aggregate experiment in MEX format. |
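The filtering and dimensionality rules described in the inputs above can be sketched in a few lines of Python. This is only an illustration of the documented semantics, not the code the workflow runs: the function names are hypothetical, the default thresholds are copied from the table, and treating an undefined novelty score (fewer than 1 gene or at most 1 UMI) as 0 is an assumption made here to avoid a division by zero.

```python
import math
from typing import List, Sequence


def expand_dimensions(dimensions: Sequence[int]) -> List[int]:
    """A single value N means dimensions 1..N; several values are used as given."""
    if len(dimensions) == 1:
        return list(range(1, dimensions[0] + 1))
    return list(dimensions)


def per_dataset(values: Sequence[float], n_datasets: int) -> List[float]:
    """One value applies to all datasets; otherwise one value per dataset."""
    if len(values) == 1:
        return [values[0]] * n_datasets
    if len(values) != n_datasets:
        raise ValueError("expected one value or one value per dataset")
    return list(values)


def novelty_score(genes: int, umi: int) -> float:
    """log10(genes) / log10(UMI), as stated for minimum_novelty_score."""
    if genes < 1 or umi <= 1:
        # Assumption: an undefined score is treated as 0 (the cell fails the filter).
        return 0.0
    return math.log10(genes) / math.log10(umi)


def passes_qc(
    genes: int,
    umi: int,
    mito_perc: float,
    *,
    minimum_genes: int = 250,
    maximum_genes: int = 5000,
    rna_minimum_umi: int = 500,
    maximum_mito_perc: float = 5.0,
    minimum_novelty_score: float = 0.8,
) -> bool:
    """Return True if a cell satisfies every per-cell threshold from the table."""
    return (
        minimum_genes <= genes <= maximum_genes
        and umi >= rna_minimum_umi
        and mito_perc <= maximum_mito_perc
        and novelty_score(genes, umi) >= minimum_novelty_score
    )


if __name__ == "__main__":
    print(expand_dimensions([10]))        # dimensions 1 through 10
    print(per_dataset([250], 3))          # same threshold for three datasets
    print(passes_qc(2000, 8000, 2.0))     # a cell evaluated with default thresholds
```

All comparisons are inclusive, matching the "at least" and "not higher than" wording of the parameter descriptions.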
Steps
ID | Runs | Label | Doc |
---|---|---|---|
sc_rna_filter |
../tools/sc-rna-filter.cwl
(CommandLineTool)
|
Single-cell RNA-Seq Filtering Analysis |
Single-cell RNA-Seq Filtering Analysis |
sc_rna_reduce |
../tools/sc-rna-reduce.cwl
(CommandLineTool)
|
Single-cell RNA-Seq Dimensionality Reduction Analysis |
Single-cell RNA-Seq Dimensionality Reduction Analysis |
sc_rna_cluster |
../tools/sc-rna-cluster.cwl
(CommandLineTool)
|
Single-cell RNA-Seq Cluster Analysis |
Single-cell RNA-Seq Cluster Analysis |
uncompress_feature_bc_matrices |
../tools/tar-extract.cwl
(CommandLineTool)
|
TAR extract |
TAR extract |
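To run the steps above, a CWL runner needs an input object whose keys are the input IDs listed in the Inputs table. The sketch below builds such an object in Python and serializes it as JSON, which CWL runners accept alongside YAML. The helper names and the archive file name are hypothetical; only `feature_bc_matrices_folder` is required, and every other key is optional and falls back to its documented default when omitted.

```python
import json
from typing import Any, Dict


def file_input(path: str) -> Dict[str, str]:
    """Describe a File input the way a CWL input object expects it."""
    return {"class": "File", "path": path}


def build_job(matrices_archive: str, **overrides: Any) -> Dict[str, Any]:
    """Build an input object; keyword arguments must be input IDs from the table."""
    job: Dict[str, Any] = {"feature_bc_matrices_folder": file_input(matrices_archive)}
    job.update(overrides)
    return job


# Hypothetical archive name; the remaining values are examples, not recommendations.
job = build_job(
    "filtered_feature_bc_matrix.tar.gz",
    dimensions=[10],                 # single value N: use dimensions 1..N
    resolution=[0.3, 0.5, 1.0],      # one clustering per resolution
    minimum_genes=[250],             # one value: applied to all datasets
    identify_diff_genes=True,
    threads=4,
)

job_json = json.dumps(job, indent=2)
print(job_json)
```

Saved as, for example, `job.json`, the object could then be passed to a runner such as cwltool: `cwltool sc-rna-analyze-wf.cwl job.json`.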
Outputs
ID | Type | Label | Doc |
---|---|---|---|
elbow_plot_png | File (Optional) |
Elbow plot (from cells PCA). PNG format |
|
seurat_data_rds | File |
Processed Seurat data in RDS format |
|
gene_markers_tsv | File (Optional) |
Differentially expressed genes between each pair of clusters for all resolutions. TSV format |
|
slh_res_plot_png | File[] (Optional) |
Silhouette scores. Downsampled to max 500 cells per cluster. PNG format |
|
ucsc_cb_html_data | Directory (Optional) |
Directory with UCSC Cell Browser HTML data. |
|
umap_res_plot_png | File[] (Optional) |
Clustered cells UMAP. PNG format |
|
qc_dim_corr_plot_png | File (Optional) |
Correlation plots between QC metrics and cells PCA components. PNG format |
|
xpr_avg_res_plot_png | File[] (Optional) |
Log normalized scaled average gene expression per cluster. PNG format |
|
raw_umi_dnst_plot_png | File (Optional) |
UMI per cell density (not filtered). PNG format |
|
umap_spl_umi_plot_png | File (Optional) |
Split by the UMI per cell counts cells UMAP. PNG format |
|
xpr_dnst_res_plot_png | File[] (Optional) |
Log normalized gene expression density per cluster. PNG format |
|
fltr_umi_dnst_plot_png | File (Optional) |
UMI per cell density (filtered). PNG format |
|
raw_gene_dnst_plot_png | File (Optional) |
Genes per cell density (not filtered). PNG format |
|
raw_mito_dnst_plot_png | File (Optional) |
Percentage of transcripts mapped to mitochondrial genes per cell density (not filtered). PNG format |
|
raw_nvlt_dnst_plot_png | File (Optional) |
Novelty score per cell density (not filtered). PNG format |
|
umap_qc_mtrcs_plot_png | File (Optional) |
QC metrics on cells UMAP. PNG format |
|
umap_spl_gene_plot_png | File (Optional) |
Split by the genes per cell counts cells UMAP. PNG format |
|
umap_spl_mito_plot_png | File (Optional) |
Split by the percentage of transcripts mapped to mitochondrial genes cells UMAP. PNG format |
|
fltr_gene_dnst_plot_png | File (Optional) |
Genes per cell density (filtered). PNG format |
|
fltr_mito_dnst_plot_png | File (Optional) |
Percentage of transcripts mapped to mitochondrial genes per cell density (filtered). PNG format |
|
fltr_nvlt_dnst_plot_png | File (Optional) |
Novelty score per cell density (filtered). PNG format |
|
raw_cells_count_plot_png | File (Optional) |
Number of cells per dataset (not filtered). PNG format |
|
sc_rna_filter_stderr_log | File |
stderr log generated by sc_rna_filter step |
|
sc_rna_filter_stdout_log | File |
stdout log generated by sc_rna_filter step |
|
sc_rna_reduce_stderr_log | File |
stderr log generated by sc_rna_reduce step |
|
sc_rna_reduce_stdout_log | File |
stdout log generated by sc_rna_reduce step |
|
umap_spl_ph_res_plot_png | File[] (Optional) |
Split by cell cycle phase clustered cells UMAP. PNG format |
|
fltr_cells_count_plot_png | File (Optional) |
Number of cells per dataset (filtered). PNG format |
|
sc_rna_cluster_stderr_log | File |
stderr log generated by sc_rna_cluster step |
|
sc_rna_cluster_stdout_log | File |
stdout log generated by sc_rna_cluster step |
|
umap_spl_cnd_res_plot_png | File[] (Optional) |
Split by grouping condition clustered cells UMAP. PNG format |
|
xpr_per_cell_res_plot_png | File[] (Optional) |
Log normalized gene expression on cells UMAP. PNG format |
|
raw_gene_umi_corr_plot_png | File (Optional) |
Genes vs UMI per cell correlation (not filtered). PNG format |
|
raw_qc_mtrcs_dnst_plot_png | File (Optional) |
QC metrics per cell density (not filtered). PNG format |
|
umap_spl_idnt_res_plot_png | File[] (Optional) |
Split by dataset clustered cells UMAP. PNG format |
|
fltr_gene_umi_corr_plot_png | File (Optional) |
Genes vs UMI per cell correlation (filtered). PNG format |
|
fltr_qc_mtrcs_dnst_plot_png | File (Optional) |
QC metrics per cell density (filtered). PNG format |
|
umap_gr_cnd_spl_ph_plot_png | File (Optional) |
Grouped by condition split by cell cycle cells UMAP. PNG format |
|
umap_gr_cnd_spl_umi_plot_png | File (Optional) |
Grouped by condition split by the UMI per cell counts cells UMAP. PNG format |
|
raw_1_2_qc_mtrcs_pca_plot_png | File (Optional) |
PC1 and PC2 from the QC metrics PCA (not filtered). PNG format |
|
raw_2_3_qc_mtrcs_pca_plot_png | File (Optional) |
PC2 and PC3 from the QC metrics PCA (not filtered). PNG format |
|
raw_umi_dnst_spl_cnd_plot_png | File (Optional) |
Split by grouping condition UMI per cell density (not filtered). PNG format |
|
umap_gr_cnd_spl_gene_plot_png | File (Optional) |
Grouped by condition split by the genes per cell counts cells UMAP. PNG format |
|
umap_gr_cnd_spl_mito_plot_png | File (Optional) |
Grouped by condition split by the percentage of transcripts mapped to mitochondrial genes cells UMAP. PNG format |
|
fltr_1_2_qc_mtrcs_pca_plot_png | File (Optional) |
PC1 and PC2 from the QC metrics PCA (filtered). PNG format |
|
fltr_2_3_qc_mtrcs_pca_plot_png | File (Optional) |
PC2 and PC3 from the QC metrics PCA (filtered). PNG format |
|
fltr_umi_dnst_spl_cnd_plot_png | File (Optional) |
Split by grouping condition UMI per cell density (filtered). PNG format |
|
raw_gene_dnst_spl_cnd_plot_png | File (Optional) |
Split by grouping condition genes per cell density (not filtered). PNG format |
|
raw_mito_dnst_spl_cnd_plot_png | File (Optional) |
Split by grouping condition the percentage of transcripts mapped to mitochondrial genes per cell density (not filtered). PNG format |
|
raw_nvlt_dnst_spl_cnd_plot_png | File (Optional) |
Split by grouping condition the novelty score per cell density (not filtered). PNG format |
|
cmp_gr_ph_spl_clst_res_plot_png | File[] (Optional) |
Grouped by cell cycle phase split by cluster cells composition plot. Downsampled. PNG format |
|
cmp_gr_ph_spl_idnt_res_plot_png | File[] (Optional) |
Grouped by cell cycle phase split by dataset cells composition plot. Downsampled. PNG format |
|
fltr_gene_dnst_spl_cnd_plot_png | File (Optional) |
Split by grouping condition genes per cell density (filtered). PNG format |
|
fltr_mito_dnst_spl_cnd_plot_png | File (Optional) |
Split by grouping condition the percentage of transcripts mapped to mitochondrial genes per cell density (filtered). PNG format |
|
fltr_nvlt_dnst_spl_cnd_plot_png | File (Optional) |
Split by grouping condition the novelty score per cell density (filtered). PNG format |
|
cmp_gr_clst_spl_cnd_res_plot_png | File[] (Optional) |
Grouped by cluster split by condition cells composition plot. Downsampled. PNG format |
|
cmp_gr_cnd_spl_clst_res_plot_png | File[] (Optional) |
Grouped by condition split by cluster cells composition plot. Downsampled. PNG format |
|
cmp_gr_clst_spl_idnt_res_plot_png | File[] (Optional) |
Grouped by cluster split by dataset cells composition plot. Downsampled. PNG format |
|
cmp_gr_idnt_spl_clst_res_plot_png | File[] (Optional) |
Grouped by dataset split by cluster cells composition plot. Downsampled. PNG format |
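Most outputs above are optional, and those typed `File[]` hold one file per clustering resolution. A CWL runner reports them as an output object in which an optional output that was not produced is `null`, a `File` is a single record, and a `File[]` is a list of records. The sketch below flattens such an object into a mapping from output ID to file locations. The function name and the sample locations are hypothetical and do not come from a real run.

```python
from typing import Any, Dict, List


def collect_locations(outputs: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each produced output ID to the list of its File/Directory locations."""
    collected: Dict[str, List[str]] = {}
    for output_id, value in outputs.items():
        if value is None:
            # Optional output that was not produced.
            continue
        items = value if isinstance(value, list) else [value]
        locations = [
            item.get("location") or item.get("path")
            for item in items
            if isinstance(item, dict) and item.get("class") in ("File", "Directory")
        ]
        if locations:
            collected[output_id] = locations
    return collected


# Hand-written sample in the shape of a CWL output object (not real results).
sample_outputs = {
    "seurat_data_rds": {"class": "File", "location": "file:///tmp/out/seurat.rds"},
    "umap_res_plot_png": [
        {"class": "File", "location": "file:///tmp/out/umap_res_0.3.png"},
        {"class": "File", "location": "file:///tmp/out/umap_res_0.5.png"},
    ],
    "gene_markers_tsv": None,
}

collected = collect_locations(sample_outputs)
for output_id, locations in collected.items():
    print(output_id, locations)
```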
https://w3id.org/cwl/view/git/7bbe737324bb6f21f244dea09a926dc2774ed731/workflows/sc-rna-analyze-wf.cwl