Workflow: Seurat Cluster

Fetched 2023-01-09 12:42:26 GMT

Seurat Cluster
==============

Runs filtering, integration, and clustering analyses for Cell Ranger Count Gene Expression or Cell Ranger Aggregate experiments.


Inputs

ID Type Title Doc
alias String Experiment short name/Alias
no_sct Boolean (Optional) Use LogNormalize instead of SCTransform when integrating datasets

Do not use SCTransform when running datasets integration. Use LogNormalize instead.

species Species for gene name conversion when running cell type prediction

Select species for gene name conversion when running cell type prediction with Garnett classifier. If "none", do not convert gene names.

threads Integer (Optional) Number of threads to use

Number of threads to use.

test_use Statistical test to use for gene markers identification

Statistical test to use for gene markers identification.

resolution String (Optional) Comma or space separated list of clustering resolutions

Comma or space separated list of clustering resolutions
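
Because resolution is passed as a single string, the values have to be split before use. The following is a minimal sketch of one way to do that; the helper name parse_resolutions is hypothetical and the tool's own parsing is not shown in this document.

```python
import re


def parse_resolutions(value: str) -> list[float]:
    """Split a comma- or space-separated string into a list of floats."""
    # Treat any run of commas and/or whitespace as a single separator.
    tokens = [t for t in re.split(r"[,\s]+", value.strip()) if t]
    return [float(t) for t in tokens]


# Illustrative values only; they are not the workflow defaults.
print(parse_resolutions("0.1, 0.3 0.5"))
```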

minimum_pct Float (Optional) Include only those genes that are detected in not lower than this fraction of cells in either of the two tested clusters

Include only those genes that are detected in not lower than this fraction of cells in either of the two tested clusters.

umap_method UMAP implementation to run

UMAP implementation to run.

umap_metric The metric to use to compute distances in high dimensional space for UMAP

The metric to use to compute distances in high dimensional space for UMAP.

umap_spread Float (Optional) Effective scale of embedded points on UMAP. Determines how clustered/clumped the embedded points are.

The effective scale of embedded points on UMAP. In combination with umap_mindist, this determines how clustered/clumped the embedded points are.

minimum_umis String (Optional) Include cells where at least this many UMIs are detected

Include cells where at least this many UMIs are detected. If multiple values are provided, each of them will be applied to the corresponding dataset.
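
The per-dataset behaviour can be pictured as pairing each value with one dataset. The sketch below is illustrative only: it assumes the values are comma or space separated and that a single value is reused for every dataset, neither of which is stated in this document. The function name expand_thresholds is hypothetical.

```python
def expand_thresholds(value: str, n_datasets: int) -> list[float]:
    """Return one threshold per dataset from a string of one or more values."""
    values = [float(t) for t in value.replace(",", " ").split()]
    if len(values) == 1:
        # Assumption: a single value applies to all datasets.
        return values * n_datasets
    if len(values) != n_datasets:
        raise ValueError(
            f"expected 1 or {n_datasets} values, got {len(values)}"
        )
    return values


# Three datasets, one threshold each (illustrative numbers).
print(expand_thresholds("500 300 400", 3))
```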

mito_pattern String (Optional) Pattern to identify mitochondrial genes

Pattern to identify mitochondrial genes.
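
As a rough illustration of how such a pattern is used, the sketch below computes the percentage of counts from genes whose names match a regular expression. The pattern ^MT- is an assumption (it is commonly used for human gene symbols) and is not confirmed here as this workflow's default; the function name percent_mito is hypothetical.

```python
import re


def percent_mito(counts: dict[str, int], pattern: str = r"^MT-") -> float:
    """Percentage of a cell's counts that come from genes matching `pattern`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in counts.items() if re.search(pattern, gene))
    return 100.0 * mito / total


# Toy per-cell counts: 10 of 100 counts come from MT- genes.
cell = {"MT-CO1": 5, "MT-ND1": 5, "ACTB": 90}
print(percent_mito(cell))
```

A cell would then be kept when this percentage does not exceed maximum_mito_perc.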

umap_mindist Float (Optional) Controls how tightly the embedding is allowed to compress points together on UMAP. Sensible values are in the range 0.001 to 0.5

Controls how tightly the embedding is allowed to compress points together on UMAP. Larger values ensure embedded points are more evenly distributed, while smaller values allow the algorithm to optimise more accurately with regard to local structure. Sensible values are in the range 0.001 to 0.5.

barcodes_data File (Optional) Headerless TSV/CSV file with cell barcodes (one barcode per line) to prefilter input data

Path to the headerless TSV/CSV file with selected barcodes (one per line) to prefilter input feature-barcode matrices. If not provided, use all cells
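
A minimal sketch of the prefiltering idea, assuming the barcode is the first field on each line and that the matrix barcodes are available as a list. The helper names and the example barcodes are hypothetical.

```python
import io
import re


def read_barcodes(handle) -> set[str]:
    """Read a headerless TSV/CSV file and return the first field of each line."""
    barcodes = set()
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        barcodes.add(re.split(r"[,\t]", line)[0].strip())
    return barcodes


def prefilter(matrix_barcodes: list[str], selected: set[str]) -> list[str]:
    """Keep only the matrix barcodes that are present in the selection."""
    return [b for b in matrix_barcodes if b in selected]


# In-memory stand-in for the barcodes file (made-up barcodes).
selected = read_barcodes(io.StringIO("AAACCCAAGAAACACT-1\nAAACGAATCGTTCTAT-1\n"))
kept = prefilter(
    ["AAACCCAAGAAACACT-1", "TTTGTTGTCTGAGTCA-1", "AAACGAATCGTTCTAT-1"],
    selected,
)
print(kept)
```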

minimum_cells Integer (Optional) Include genes detected in at least this many cells

Include genes detected in at least this many cells (applied across all datasets together).

minimum_logfc Float (Optional) Include only those genes that on average have log fold change difference in expression between every tested pair of clusters not lower than this value

Include only those genes that on average have log fold change difference in expression between every tested pair of clusters not lower than this value.
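
The marker-filtering inputs (minimum_pct, minimum_logfc, only_positive_markers) can be read together as a per-gene test. The sketch below is one possible interpretation, not the tool's implementation: it assumes the log fold change is compared by absolute value unless only positive markers are requested. The function name and the threshold values are illustrative.

```python
def keep_gene(
    pct_1: float,
    pct_2: float,
    avg_logfc: float,
    minimum_pct: float,
    minimum_logfc: float,
    only_positive: bool = False,
) -> bool:
    """Decide whether a gene passes the detection and fold-change thresholds."""
    # The gene must be detected in enough cells of at least one cluster.
    if max(pct_1, pct_2) < minimum_pct:
        return False
    if only_positive:
        return avg_logfc >= minimum_logfc
    # Assumption: negative markers are judged by the size of the change.
    return abs(avg_logfc) >= minimum_logfc


# Illustrative thresholds, not the workflow defaults.
print(keep_gene(0.30, 0.05, 0.5, minimum_pct=0.1, minimum_logfc=0.25))
```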

classifier_rds File (Optional) Garnett classifier rds file for cell type prediction

Path to the Garnett classifier rds file for cell type prediction. If not provided, skip cell type prediction

cluster_metric Distance metric used by the nearest neighbors algorithm when running clustering

Distance metric used by the nearest neighbors algorithm when running clustering.

dimensionality Integer (Optional) Number of principal components to use in UMAP projection and clustering (from 1 to 50)

Number of principal components to use in UMAP projection and clustering (from 1 to 50). Use Elbow plot to adjust this parameter.

cell_cycle_data File (Optional) TSV/CSV file with cell cycle data with 'phase' and 'gene_id' columns

TSV/CSV file with cell cycle data. First column - 'phase', second column 'gene_id'. If not provided, skip cell cycle score assignment

conditions_data File (Optional) TSV/CSV file to define datasets conditions with 'library_id' and 'condition' columns. Rows order should correspond to the aggregation metadata.

Path to the TSV/CSV file to define datasets grouping. First column - 'library_id' with the values provided in the same order as in the corresponding column of the --identity file, second column 'condition'. If not provided, each dataset is assigned to its own biological condition.
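
To make the expected layout concrete, the sketch below parses a small CSV with 'library_id' and 'condition' columns and checks that the rows follow the same order as the aggregation metadata. The sample names, conditions, and the function name load_conditions are made up for illustration, and only the comma-separated variant is handled.

```python
import csv
import io


def load_conditions(text: str, expected_ids: list[str]) -> dict[str, str]:
    """Map each library_id to its condition, enforcing the expected row order."""
    rows = list(csv.DictReader(io.StringIO(text)))
    ids = [row["library_id"] for row in rows]
    if ids != expected_ids:
        raise ValueError(f"library_id order {ids} does not match {expected_ids}")
    return {row["library_id"]: row["condition"] for row in rows}


# Hypothetical file content: two control datasets and one treated dataset.
conditions_csv = (
    "library_id,condition\n"
    "sample_a,control\n"
    "sample_b,control\n"
    "sample_c,treated\n"
)
mapping = load_conditions(conditions_csv, ["sample_a", "sample_b", "sample_c"])
print(mapping)
```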

umap_nneighbors Integer (Optional) Number of neighboring points used in UMAP. Larger values result in loss of detailed local structure.

Determines the number of neighboring points used in UMAP. Larger values will result in more global structure being preserved at the loss of detailed local structure. In general this parameter should often be in the range 5 to 50.

maximum_features String (Optional) Include cells with the number of genes not bigger than this value

Include cells with the number of genes not bigger than this value. If multiple values are provided, each of them will be applied to the corresponding dataset.

minimum_features String (Optional) Include cells where at least this many genes are detected

Include cells where at least this many genes are detected. If multiple values are provided, each of them will be applied to the corresponding dataset.

maximum_mito_perc Float (Optional) Include cells with the percentage of transcripts mapped to mitochondrial genes not bigger than this value

Include cells with the percentage of transcripts mapped to mitochondrial genes not bigger than this value.

regress_cellcycle Boolean (Optional) Regress cell cycle as a confounding source of variation

Regress cell cycle as a confounding source of variation.

regress_mito_perc Boolean (Optional) Regress mitochondrial gene expression as a confounding source of variation

Regress mitochondrial gene expression as a confounding source of variation.

selected_features String (Optional) Comma or space separated list of genes of interest

Comma or space separated list of genes of interest. Default: do not highlight any features

aggregation_metadata File scRNA-Seq Cellranger Aggregate Experiment

Aggregation metadata in CSV format

minimum_novelty_score String (Optional) Include cells with the novelty score (the ratio of genes per cell over UMIs per cell) not lower than this value

Include cells with the novelty score (the ratio of genes per cell over UMIs per cell, calculated as log10(genes)/log10(UMIs)) not lower than this value. If multiple values are provided, each of them will be applied to the corresponding dataset.
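
The formula above can be written out directly. This is a small sketch of the log10(genes)/log10(UMIs) calculation; the function name and the input guard are additions for illustration.

```python
import math


def novelty_score(n_genes: int, n_umis: int) -> float:
    """Novelty score of a cell: log10(genes) / log10(UMIs)."""
    if n_genes < 1 or n_umis < 2:
        # log10(1) is 0, so a single UMI would cause a division by zero.
        raise ValueError("need at least 1 gene and 2 UMIs")
    return math.log10(n_genes) / math.log10(n_umis)


# A cell with 1,000 detected genes and 10,000 UMIs scores about 0.75.
score = novelty_score(1000, 10000)
print(round(score, 2))
```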

only_positive_markers Boolean (Optional) Report only positive gene markers

Report only positive gene markers.

high_var_features_count Integer (Optional) Number of highly variable genes to detect (used for dataset integration and dimensional reduction)

Number of highly variable genes to detect (used for dataset integration and dimensional reduction).

filtered_feature_bc_matrix_folder File scRNA-Seq Cellranger Aggregate Experiment

Compressed folder with aggregated filtered feature-barcode matrices in MEX format
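
As a sketch of how these inputs might be supplied, the Python snippet below builds a job object and serialises it to JSON, a format that CWL runners generally accept for job files. All file paths and values are hypothetical. Inputs whose type is not shown in the table above (for example species or test_use) are left out because their allowed values are not listed in this document.

```python
import json

# Hypothetical job object; keys follow the input IDs listed above.
job = {
    "alias": "example_experiment",
    "aggregation_metadata": {"class": "File", "path": "aggregation.csv"},
    "filtered_feature_bc_matrix_folder": {
        "class": "File",
        "path": "filtered_feature_bc_matrix.tar.gz",
    },
    # String inputs that may carry several values are passed as one string.
    "resolution": "0.1 0.3 0.5",
    "minimum_umis": "500",
    # Illustrative numbers, not the workflow defaults.
    "dimensionality": 10,
    "maximum_mito_perc": 5.0,
    "threads": 4,
}

job_json = json.dumps(job, indent=2)
print(job_json)
```

The printed JSON could be saved to a file and passed to a CWL runner together with the workflow file.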

Steps

ID Runs Label Doc
seurat_cluster
../tools/seurat-cluster.cwl (CommandLineTool)
Seurat cluster

Seurat cluster
==============

The joint analysis of multiple scRNA-Seq datasets with [Seurat](https://satijalab.org/seurat/) starts with the evaluation of common single-cell quality control (QC) metrics: gene and UMI counts and the percentage of mitochondrial genes expressed. QC gives a general overview of dataset quality and helps define filtering thresholds for removing dead or low-quality cells. The filtered, merged datasets are then processed with the integration algorithm. Its main goal is to identify integration anchors, that is, pairs of cells that can “pull together” the same cell type populations from different datasets. The integration algorithm can also address batch correction by regressing out unwanted sources of variation. The integrated data then undergo dimensionality reduction, which starts with principal component analysis (PCA). Based on the PCA results, uniform manifold approximation and projection (UMAP) and clustering analysis are run using the principal components with the highest variance. The clustered data are then used to identify gene markers. These genes are differentially expressed between clusters and can be used for cell type assignment. More details about scRNA-Seq integration analysis with Seurat can be found in the official [documentation](https://satijalab.org/seurat/articles/integration_introduction.html).

uncompress_feature_bc_matrices
seurat-cluster.cwl#uncompress_feature_bc_matrices/e1eb561b-f42c-4dae-b083-bd589b529e92 (CommandLineTool)
compress_cellbrowser_config_data
../tools/tar-compress.cwl (CommandLineTool)

Compresses input directory to tar.gz

Outputs

ID Type Label Doc
ntgr_pca_plot_pdf File (Optional) PCA of filtered integrated/scaled datasets

PCA of filtered integrated/scaled datasets. PDF format

ntgr_pca_plot_png File (Optional) PCA of filtered integrated/scaled datasets

PCA of filtered integrated/scaled datasets. PNG format

ntgr_elbow_plot_pdf File (Optional) Elbow plot from PCA of filtered integrated/scaled datasets

Elbow plot from PCA of filtered integrated/scaled datasets. PDF format

ntgr_elbow_plot_png File (Optional) Elbow plot from PCA of filtered integrated/scaled datasets

Elbow plot from PCA of filtered integrated/scaled datasets. PNG format

ntgr_pca_heatmap_pdf File (Optional) Genes per cells expression heatmap sorted by their PC scores from PCA of filtered integrated/scaled datasets

Genes per cells expression heatmap sorted by their PC scores from PCA of filtered integrated/scaled datasets. PDF format

ntgr_pca_heatmap_png File (Optional) Genes per cells expression heatmap sorted by their PC scores from PCA of filtered integrated/scaled datasets

Genes per cells expression heatmap sorted by their PC scores from PCA of filtered integrated/scaled datasets. PNG format

seurat_clst_data_rds File Clustered filtered integrated/scaled Seurat data

Clustered filtered integrated Seurat data. RDS format

cellbrowser_html_data Directory Directory with UCSC Cellbrowser formatted html data

Directory with UCSC Cellbrowser formatted html data

cellbrowser_html_file File Open in UCSC Cell Browser

HTML index file from the directory with UCSC Cellbrowser formatted html data

raw_qc_mtrcs_plot_pdf File (Optional) QC metrics densities per cell (not filtered)

QC metrics densities per cell (not filtered). PDF format

raw_qc_mtrcs_plot_png File (Optional) QC metrics densities per cell (not filtered)

QC metrics densities per cell (not filtered). PNG format

clst_pttv_gene_markers File Putative gene markers file for all clusters and all resolutions

Putative gene markers file for all clusters and all resolutions. TSV format

clst_umap_res_plot_pdf File[] (Optional) Clustered UMAP projected PCA of filtered integrated/scaled datasets

Clustered UMAP projected PCA of filtered integrated/scaled datasets. PDF format

clst_umap_res_plot_png File[] (Optional) Clustered UMAP projected PCA of filtered integrated/scaled datasets

Clustered UMAP projected PCA of filtered integrated/scaled datasets. PNG format

fltr_qc_mtrcs_plot_pdf File (Optional) QC metrics densities per cell (filtered)

QC metrics densities per cell (filtered). PDF format

fltr_qc_mtrcs_plot_png File (Optional) QC metrics densities per cell (filtered)

QC metrics densities per cell (filtered). PNG format

clst_csrvd_gene_markers File Conserved gene markers file for all clusters and all resolutions

Conserved gene markers file for all clusters and all resolutions. TSV format

raw_cell_count_plot_pdf File (Optional) Number of cells per dataset (not filtered)

Number of cells per dataset (not filtered). PDF format

raw_cell_count_plot_png File (Optional) Number of cells per dataset (not filtered)

Number of cells per dataset (not filtered). PNG format

fltr_cell_count_plot_pdf File (Optional) Number of cells per dataset (filtered)

Number of cells per dataset (filtered). PDF format

fltr_cell_count_plot_png File (Optional) Number of cells per dataset (filtered)

Number of cells per dataset (filtered). PNG format

seurat_cluster_stderr_log File stderr log generated by Seurat

stderr log generated by Seurat

seurat_cluster_stdout_log File stdout log generated by Seurat

stdout log generated by Seurat

clst_qc_mtrcs_res_plot_pdf File[] (Optional) QC metrics for clustered UMAP projected PCA of filtered integrated/scaled datasets

QC metrics for clustered UMAP projected PCA of filtered integrated/scaled datasets. PDF format

clst_qc_mtrcs_res_plot_png File[] (Optional) QC metrics for clustered UMAP projected PCA of filtered integrated/scaled datasets

QC metrics for clustered UMAP projected PCA of filtered integrated/scaled datasets. PNG format

ntgr_pca_loadings_plot_pdf File (Optional) PC scores of the most variant genes from PCA of filtered integrated/scaled datasets

PC scores of the most variant genes from PCA of filtered integrated/scaled datasets. PDF format

ntgr_pca_loadings_plot_png File (Optional) PC scores of the most variant genes from PCA of filtered integrated/scaled datasets

PC scores of the most variant genes from PCA of filtered integrated/scaled datasets. PNG format

fltr_pca_spl_by_ph_plot_pdf File (Optional) Split by cell cycle phase PCA of filtered unintegrated/scaled datasets

Split by cell cycle phase PCA of filtered unintegrated/scaled datasets. PDF format

fltr_pca_spl_by_ph_plot_png File (Optional) Split by cell cycle phase PCA of filtered unintegrated/scaled datasets

Split by cell cycle phase PCA of filtered unintegrated/scaled datasets. PNG format

clst_umap_ctype_res_plot_pdf File[] (Optional) Grouped by predicted cell types UMAP projected PCA of filtered integrated/scaled datasets

Grouped by predicted cell types UMAP projected PCA of filtered integrated/scaled datasets. PDF format

clst_umap_ctype_res_plot_png File[] (Optional) Grouped by predicted cell types UMAP projected PCA of filtered integrated/scaled datasets

Grouped by predicted cell types UMAP projected PCA of filtered integrated/scaled datasets. PNG format

expr_avg_per_clst_res_plot_pdf File[] (Optional) Scaled average log normalized gene expression per cluster of filtered integrated/scaled datasets

Scaled average log normalized gene expression per cluster of filtered integrated/scaled datasets. PDF format

expr_avg_per_clst_res_plot_png File[] (Optional) Scaled average log normalized gene expression per cluster of filtered integrated/scaled datasets

Scaled average log normalized gene expression per cluster of filtered integrated/scaled datasets. PNG format

expr_clst_heatmap_res_plot_pdf File[] (Optional) Log normalized gene expression heatmap of clustered filtered integrated/scaled datasets

Log normalized gene expression heatmap of clustered filtered integrated/scaled datasets. PDF format

expr_clst_heatmap_res_plot_png File[] (Optional) Log normalized gene expression heatmap of clustered filtered integrated/scaled datasets

Log normalized gene expression heatmap of clustered filtered integrated/scaled datasets. PNG format

fltr_umap_spl_by_idnt_plot_pdf File (Optional) Split by identity UMAP projected PCA of filtered unintegrated/scaled datasets

Split by identity UMAP projected PCA of filtered unintegrated/scaled datasets. PDF format

fltr_umap_spl_by_idnt_plot_png File (Optional) Split by identity UMAP projected PCA of filtered unintegrated/scaled datasets

Split by identity UMAP projected PCA of filtered unintegrated/scaled datasets. PNG format

ntgr_umap_spl_by_idnt_plot_pdf File (Optional) Split by identity UMAP projected PCA of filtered integrated/scaled datasets

Split by identity UMAP projected PCA of filtered integrated/scaled datasets. PDF format

ntgr_umap_spl_by_idnt_plot_png File (Optional) Split by identity UMAP projected PCA of filtered integrated/scaled datasets

Split by identity UMAP projected PCA of filtered integrated/scaled datasets. PNG format

expr_avg_per_ctype_res_plot_pdf File[] (Optional) Scaled average log normalized gene expression per predicted cell type of filtered integrated/scaled datasets

Scaled average log normalized gene expression per predicted cell type of filtered integrated/scaled datasets. PDF format

expr_avg_per_ctype_res_plot_png File[] (Optional) Scaled average log normalized gene expression per predicted cell type of filtered integrated/scaled datasets

Scaled average log normalized gene expression per predicted cell type of filtered integrated/scaled datasets. PNG format

expr_ctype_heatmap_res_plot_pdf File[] (Optional) Log normalized gene expression heatmap of clustered filtered integrated/scaled datasets with predicted cell types

Log normalized gene expression heatmap of clustered filtered integrated/scaled datasets with predicted cell types. PDF format

expr_ctype_heatmap_res_plot_png File[] (Optional) Log normalized gene expression heatmap of clustered filtered integrated/scaled datasets with predicted cell types

Log normalized gene expression heatmap of clustered filtered integrated/scaled datasets with predicted cell types. PNG format

expr_dnst_per_clst_res_plot_pdf File[] (Optional) Log normalized gene expression densities per cluster of filtered integrated/scaled datasets

Log normalized gene expression densities per cluster of filtered integrated/scaled datasets. PDF format

expr_dnst_per_clst_res_plot_png File[] (Optional) Log normalized gene expression densities per cluster of filtered integrated/scaled datasets

Log normalized gene expression densities per cluster of filtered integrated/scaled datasets. PNG format

expr_per_clst_cell_res_plot_pdf File[] (Optional) Log normalized gene expression per cell of clustered filtered integrated/scaled datasets

Log normalized gene expression per cell of clustered filtered integrated/scaled datasets. PDF format

expr_per_clst_cell_res_plot_png File[] (Optional) Log normalized gene expression per cell of clustered filtered integrated/scaled datasets

Log normalized gene expression per cell of clustered filtered integrated/scaled datasets. PNG format

clst_umap_spl_by_ph_res_plot_pdf File[] (Optional) Split by cell cycle phase clustered UMAP projected PCA of filtered integrated/scaled datasets

Split by cell cycle phase clustered UMAP projected PCA of filtered integrated/scaled datasets. PDF format

clst_umap_spl_by_ph_res_plot_png File[] (Optional) Split by cell cycle phase clustered UMAP projected PCA of filtered integrated/scaled datasets

Split by cell cycle phase clustered UMAP projected PCA of filtered integrated/scaled datasets. PNG format

expr_dnst_per_ctype_res_plot_pdf File[] (Optional) Log normalized gene expression densities per predicted cell type of filtered integrated/scaled datasets

Log normalized gene expression densities per predicted cell type of filtered integrated/scaled datasets. PDF format

expr_dnst_per_ctype_res_plot_png File[] (Optional) Log normalized gene expression densities per predicted cell type of filtered integrated/scaled datasets

Log normalized gene expression densities per predicted cell type of filtered integrated/scaled datasets. PNG format

expr_per_ctype_cell_res_plot_pdf File[] (Optional) Log normalized gene expression per cell of clustered filtered integrated/scaled datasets with predicted cell types

Log normalized gene expression per cell of clustered filtered integrated/scaled datasets with predicted cell types. PDF format

expr_per_ctype_cell_res_plot_png File[] (Optional) Log normalized gene expression per cell of clustered filtered integrated/scaled datasets with predicted cell types

Log normalized gene expression per cell of clustered filtered integrated/scaled datasets with predicted cell types. PNG format

raw_qc_mtrcs_gr_by_cond_plot_pdf File (Optional) Grouped by condition QC metrics densities per cell (not filtered)

Grouped by condition QC metrics densities per cell (not filtered). PDF format

raw_qc_mtrcs_gr_by_cond_plot_png File (Optional) Grouped by condition QC metrics densities per cell (not filtered)

Grouped by condition QC metrics densities per cell (not filtered). PNG format

fltr_qc_mtrcs_gr_by_cond_plot_pdf File (Optional) Grouped by condition QC metrics densities per cell (filtered)

Grouped by condition QC metrics densities per cell (filtered). PDF format

fltr_qc_mtrcs_gr_by_cond_plot_png File (Optional) Grouped by condition QC metrics densities per cell (filtered)

Grouped by condition QC metrics densities per cell (filtered). PNG format

raw_umi_dnst_spl_by_cond_plot_pdf File (Optional) Split by condition UMI density per cell (not filtered)

Split by condition UMI density per cell (not filtered). PDF format

raw_umi_dnst_spl_by_cond_plot_png File (Optional) Split by condition UMI density per cell (not filtered)

Split by condition UMI density per cell (not filtered). PNG format

clst_umap_spl_by_cond_res_plot_pdf File[] (Optional) Split by condition clustered UMAP projected PCA of filtered integrated/scaled datasets

Split by condition clustered UMAP projected PCA of filtered integrated/scaled datasets. PDF format

clst_umap_spl_by_cond_res_plot_png File[] (Optional) Split by condition clustered UMAP projected PCA of filtered integrated/scaled datasets

Split by condition clustered UMAP projected PCA of filtered integrated/scaled datasets. PNG format

compressed_cellbrowser_config_data File Compressed directory with UCSC Cellbrowser configuration data

Compressed directory with UCSC Cellbrowser configuration data

fltr_pca_spl_by_mito_perc_plot_pdf File (Optional) Split by level of transcripts mapped to mitochondrial genes PCA of filtered unintegrated/scaled datasets

Split by level of transcripts mapped to mitochondrial genes PCA of filtered unintegrated/scaled datasets. PDF format

fltr_pca_spl_by_mito_perc_plot_png File (Optional) Split by level of transcripts mapped to mitochondrial genes PCA of filtered unintegrated/scaled datasets

Split by level of transcripts mapped to mitochondrial genes PCA of filtered unintegrated/scaled datasets. PNG format

fltr_umi_dnst_spl_by_cond_plot_pdf File (Optional) Split by condition UMI density per cell (filtered)

Split by condition UMI density per cell (filtered). PDF format

fltr_umi_dnst_spl_by_cond_plot_png File (Optional) Split by condition UMI density per cell (filtered)

Split by condition UMI density per cell (filtered). PNG format

raw_gene_dnst_spl_by_cond_plot_pdf File (Optional) Split by condition gene density per cell (not filtered)

Split by condition gene density per cell (not filtered). PDF format

raw_gene_dnst_spl_by_cond_plot_png File (Optional) Split by condition gene density per cell (not filtered)

Split by condition gene density per cell (not filtered). PNG format

fltr_gene_dnst_spl_by_cond_plot_pdf File (Optional) Split by condition gene density per cell (filtered)

Split by condition gene density per cell (filtered). PDF format

fltr_gene_dnst_spl_by_cond_plot_png File (Optional) Split by condition gene density per cell (filtered)

Split by condition gene density per cell (filtered). PNG format

raw_gene_umi_corr_spl_by_ident_plot_pdf File (Optional) Split by identity genes vs UMIs per cell correlation (not filtered)

Split by identity genes vs UMIs per cell correlation (not filtered). PDF format

raw_gene_umi_corr_spl_by_ident_plot_png File (Optional) Split by identity genes vs UMIs per cell correlation (not filtered)

Split by identity genes vs UMIs per cell correlation (not filtered). PNG format

raw_mito_perc_dnst_spl_by_cond_plot_pdf File (Optional) Split by condition density of transcripts mapped to mitochondrial genes per cell (not filtered)

Split by condition density of transcripts mapped to mitochondrial genes per cell (not filtered). PDF format

raw_mito_perc_dnst_spl_by_cond_plot_png File (Optional) Split by condition density of transcripts mapped to mitochondrial genes per cell (not filtered)

Split by condition density of transcripts mapped to mitochondrial genes per cell (not filtered). PNG format

fltr_gene_umi_corr_spl_by_ident_plot_pdf File (Optional) Split by identity genes vs UMIs per cell correlation (filtered)

Split by identity genes vs UMIs per cell correlation (filtered). PDF format

fltr_gene_umi_corr_spl_by_ident_plot_png File (Optional) Split by identity genes vs UMIs per cell correlation (filtered)

Split by identity genes vs UMIs per cell correlation (filtered). PNG format

fltr_mito_perc_dnst_spl_by_cond_plot_pdf File (Optional) Split by condition density of transcripts mapped to mitochondrial genes per cell (filtered)

Split by condition density of transcripts mapped to mitochondrial genes per cell (filtered). PDF format

fltr_mito_perc_dnst_spl_by_cond_plot_png File (Optional) Split by condition density of transcripts mapped to mitochondrial genes per cell (filtered)

Split by condition density of transcripts mapped to mitochondrial genes per cell (filtered). PNG format

raw_nvlt_score_dnst_spl_by_cond_plot_pdf File (Optional) Split by condition novelty score density per cell (not filtered)

Split by condition novelty score density per cell (not filtered). PDF format

raw_nvlt_score_dnst_spl_by_cond_plot_png File (Optional) Split by condition novelty score density per cell (not filtered)

Split by condition novelty score density per cell (not filtered). PNG format

fltr_nvlt_score_dnst_spl_by_cond_plot_pdf File (Optional) Split by condition novelty score density per cell (filtered)

Split by condition novelty score density per cell (filtered). PDF format

fltr_nvlt_score_dnst_spl_by_cond_plot_png File (Optional) Split by condition novelty score density per cell (filtered)

Split by condition novelty score density per cell (filtered). PNG format

Permalink: https://w3id.org/cwl/view/git/480e99a4bb3046e0565113d9dca294e0895d3b0c/workflows/seurat-cluster.cwl