Workflow: Cellranger reanalyze - reruns secondary analysis performed on the feature-barcode matrix

Fetched 2023-01-03 19:44:57 GMT

Devel version of Single-Cell Cell Ranger Reanalyze
==================================================

Workflow calls the "cellranger reanalyze" command to rerun secondary analysis performed on the feature-barcode matrix (dimensionality reduction, clustering and visualization) using different parameter settings. As input it takes filtered feature-barcode matrices in HDF5 format from cellranger count or aggr experiments. Note that aggregation_metadata from the upstream cellranger aggr step is not passed; this needs to be addressed when required.

Inputs

ID Type Title Doc
alias String Experiment short name/Alias
cbc_knn Integer (Optional) Specify the number of nearest neighbors used to identify mutual nearest neighbors. Setting this too high will increase runtime

Specify the number of nearest neighbors used to identify mutual nearest neighbors. Setting this too high will increase runtime and may cause out of memory error. See Chemistry Batch Correction page for more details. Ranges from 5 to 20. Default: 10

threads Integer (Optional) Number of threads

Number of threads for those steps that support multithreading

cbc_alpha Float (Optional) Specify the threshold of the percentage of matched cells between two batches, which is used to determine if the batch pair will be merged

Specify the threshold of the percentage of matched cells between two batches, which is used to determine if the batch pair will be merged. See Chemistry Batch Correction page for more details. Ranges from 0.05 to 0.5. Default: 0.1

cbc_sigma Float (Optional) Specify the bandwidth of the Gaussian smoothing kernel used to compute the correction vector for each cell

Specify the bandwidth of the Gaussian smoothing kernel used to compute the correction vector for each cell. See Chemistry Batch Correction page for more details. Ranges from 10 to 500. Default: 150

neighbor_a Float (Optional) neighbor_a parameter for the number of nearest neighbors k = neighbor_a + neighbor_b * log10(n_cells)

The number of nearest neighbors, k, used in the graph-based clustering is computed as follows: k = neighbor_a + neighbor_b * log10(n_cells). The actual number of neighbors used is the maximum of this value and graphclust_neighbors. Determines how clustering granularity scales with cell count. Default: -230.0

neighbor_b Float (Optional) neighbor_b parameter for the number of nearest neighbors k = neighbor_a + neighbor_b * log10(n_cells)

The number of nearest neighbors, k, used in the graph-based clustering is computed as follows: k = neighbor_a + neighbor_b * log10(n_cells). The actual number of neighbors used is the maximum of this value and graphclust_neighbors. Determines how clustering granularity scales with cell count. Default: 120.0
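The neighbor formula above can be illustrated with a short sketch. The function name `graphclust_k` is hypothetical, and how Cell Ranger rounds the result internally is not stated here, so the value is returned unrounded.

```python
import math


def graphclust_k(n_cells, neighbor_a=-230.0, neighbor_b=120.0, graphclust_neighbors=0):
    """Number of nearest neighbors as described for neighbor_a / neighbor_b.

    Illustrative only: any rounding Cell Ranger applies is not modeled.
    """
    # k scales with the logarithm of the cell count.
    scaled = neighbor_a + neighbor_b * math.log10(n_cells)
    # The value actually used is the larger of the scaled k and graphclust_neighbors.
    return max(scaled, graphclust_neighbors)


for n in (100, 1_000, 10_000, 100_000):
    print(n, graphclust_k(n))
```

With the defaults, small datasets yield a low or even negative scaled value, which is where a non-zero graphclust_neighbors takes over as the floor.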

tsne_theta Float (Optional) TSNE theta parameter. Higher values yield faster, more approximate results (and vice versa)

TSNE theta parameter (see the TSNE FAQ for more details). Higher values yield faster, more approximate results (and vice versa). The runtime and memory usage of TSNE will increase dramatically if you set this below 0.25. Ranges from 0 to 1. Default: 0.5

num_pca_bcs Integer (Optional) Randomly subset data to N barcodes when computing PCA projection. Try reducing this parameter if your analysis is running out of memory

Randomly subset data to N barcodes when computing PCA projection (the most memory-intensive step). The PCA projection will still be applied to the full dataset, i.e. your final results will still reflect all the data. Try reducing this parameter if your analysis is running out of memory. Cannot be set higher than the available number of cells. Default: null

random_seed Integer (Optional) Random seed

Random seed. Due to the randomized nature of the algorithms, changing this will produce slightly different results. If the TSNE or UMAP results don't look good, try running multiple times with different seeds and pick the TSNE or UMAP that looks best. Default: 0

umap_metric Determines how the distance is computed in the input space

Determines how the distance is computed in the input space. Default: "correlation"

max_clusters Integer (Optional) Compute K-means clustering using K values of 2 to N. Setting this too high may cause spurious clusters to be called

Compute K-means clustering using K values of 2 to N. Setting this too high may cause spurious clusters to be called. Ranges from 10 to 50, depending on the number of cell populations / clusters you expect to see. Default: 10

memory_limit Integer (Optional) Maximum memory used (GB)

Maximum memory used (GB). The same will be applied to virtual memory

num_pca_genes Integer (Optional) Subset data to the top N genes when computing PCA. Try reducing this parameter if your analysis is running out of memory

Subset data to the top N genes (ranked by normalized dispersion) when computing PCA. Differential expression will still reflect all genes. Try reducing this parameter if your analysis is running out of memory. Cannot be set higher than the number of genes in the reference transcriptome. Default: null

tsne_max_dims Integer (Optional) Maximum number of TSNE output dimensions. Set this to 3 to produce both 2D and 3D TSNE projections

Maximum number of TSNE output dimensions. Set this to 3 to produce both 2D and 3D TSNE projections (note: runtime will increase significantly). Ranges from 2 to 3. Default: 2

tsne_max_iter Integer (Optional) Number of total TSNE iterations. Try increasing this if TSNE results do not look good on larger numbers of cells

Number of total TSNE iterations. Try increasing this if TSNE results do not look good on larger numbers of cells. Runtime increases linearly with number of iterations. Ranges from 1000 to 10000. Default: 1000

umap_max_dims Integer (Optional) Maximum number of UMAP output dimensions. Set this to 3 to produce both 2D and 3D UMAP projections

Maximum number of UMAP output dimensions. Set this to 3 to produce both 2D and 3D UMAP projections. Ranges from 2 to 3. Default: 2

umap_min_dist Float (Optional) Controls how tightly the embedding is allowed to pack points together

Controls how tightly the embedding is allowed to pack points together. Larger values make the embedded points more evenly distributed, while smaller values make the embedding follow the local structure more accurately. Ranges from 0.001 to 0.5. Default: 0.3

excluded_genes File (Optional) A CSV file containing a list of gene IDs to exclude for reanalysis. Applied after setting selected genes

A CSV file containing a list of gene IDs to exclude for reanalysis (corresponding to the gene_id field of the reference GTF). All gene IDs must be present in the matrix. The exclusion is applied after setting the gene list with --genes. Note that only gene features are used in secondary analysis. Feature Barcode features are ignored.

selected_genes File (Optional) A CSV file containing a list of gene IDs to use for reanalysis

A CSV file containing a list of gene IDs to use for reanalysis (corresponding to the gene_id field of the reference GTF). All gene IDs must be present in the matrix. Note that only gene features are used in secondary analysis. Feature Barcode features are ignored.
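The interaction between selected_genes and excluded_genes (selection first, exclusion second, all IDs required to be in the matrix) can be sketched as plain set logic. This is not Cell Ranger's code: the function name and the gene IDs are hypothetical, and reading the CSV files is left out.

```python
def effective_gene_ids(matrix_gene_ids, selected=None, excluded=None):
    """Return the gene IDs left after selection, then exclusion, in matrix order."""
    matrix = set(matrix_gene_ids)
    # Without a selected list, every gene in the matrix is a candidate.
    selected_set = matrix if selected is None else set(selected)
    excluded_set = set() if excluded is None else set(excluded)

    # Both lists may only refer to gene IDs that are present in the matrix.
    for label, ids in (("selected", selected_set), ("excluded", excluded_set)):
        missing = sorted(ids - matrix)
        if missing:
            raise ValueError(f"{label} gene IDs not in matrix: {missing}")

    # Selection is applied first, exclusion second.
    return [g for g in matrix_gene_ids if g in selected_set and g not in excluded_set]


genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # hypothetical IDs
print(effective_gene_ids(genes, selected=["GENE_A", "GENE_B", "GENE_C"], excluded=["GENE_B"]))
```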

tsne_input_pcs Integer (Optional) Subset to top N principal components for TSNE. Change this parameter if you want to see how the TSNE plot changes when using fewer PCs

Subset to top N principal components for TSNE. Change this parameter if you want to see how the TSNE plot changes when using fewer PCs, independent of the clustering / differential expression. You may find that TSNE is faster and/or the output looks better when using fewer PCs. Cannot be set higher than the num_principal_comps parameter. Default: null

umap_input_pcs Integer (Optional) Subset to top N principal components for UMAP. Change this parameter if you want to see how the UMAP plot changes when using fewer PCs

Subset to top N principal components for UMAP. Change this parameter if you want to see how the UMAP plot changes when using fewer PCs, independent of the clustering / differential expression. You may find that UMAP is faster and/or the output looks better when using fewer PCs. Cannot be set higher than the num_principal_comps parameter. Default: null

force_cells_num Integer (Optional) Force pipeline to use this number of cells, bypassing the cell detection algorithm

Force pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger is not consistent with the barcode rank plot. If specifying a value that exceeds the original cell count, you must use the raw_gene_bc_matrices_h5.h5

tsne_perplexity Integer (Optional) TSNE perplexity parameter. When analyzing 100k+ cells, increasing this parameter may improve TSNE results

TSNE perplexity parameter (see the TSNE FAQ for more details). When analyzing 100k+ cells, increasing this parameter may improve TSNE results, but the algorithm will be slower. Ranges from 30 to 50. Default: 30

num_analysis_bcs Integer (Optional) Randomly subset data to N barcodes for all analysis. Reduce this parameter if you want to improve performance or simulate results from lower cell counts

Randomly subset data to N barcodes for all analysis. Reduce this parameter if you want to improve performance or simulate results from lower cell counts. Cannot be set higher than the available number of cells. Default: null

umap_n_neighbors Integer (Optional) Determines the number of neighboring points used in local approximations of manifold structure

Determines the number of neighboring points used in local approximations of manifold structure. Larger values will usually result in more global structure at the loss of detailed local structure. Ranges from 5 to 50. Default: 30

selected_barcodes File (Optional) A CSV file containing a list of cell barcodes to use for reanalysis

A CSV file containing a list of cell barcodes to use for reanalysis, e.g. barcodes exported from Loupe Browser. All barcodes must be present in the matrix.

num_principal_comps Integer (Optional) Compute N principal components for PCA. Setting this too high may cause spurious clusters to be called

Compute N principal components for PCA. Setting this too high may cause spurious clusters to be called. The default value is 100 when the chemistry batch correction is enabled. Set from 10 to 100, depending on the number of cell populations/clusters you expect to see. Default: 10

cbc_realign_panorama Boolean (Optional) Specify if two batches will be merged if they are already in the same panorama. Setting this to True will usually improve the performance

Specify if two batches will be merged if they are already in the same panorama. Setting this to True will usually improve the performance, but will also increase runtime and memory usage. See Chemistry Batch Correction page for more details. One of true or false. Default: false

graphclust_neighbors Integer (Optional) Number of nearest-neighbors to use in the graph-based clustering. Lower values result in higher-granularity clustering

Number of nearest-neighbors to use in the graph-based clustering. Lower values result in higher-granularity clustering. The actual number of neighbors used is the maximum of this value and that determined by neighbor_a and neighbor_b. Set this value to zero to use those values instead. Ranges from 10 to 500, depending on desired granularity. Default: 0

tsne_mom_switch_iter Integer (Optional) Iteration at which TSNE momentum is reduced. Try increasing this if TSNE results do not look good on larger numbers of cells

Iteration at which TSNE momentum is reduced. Try increasing this if TSNE results do not look good on larger numbers of cells. Cannot be set higher than tsne_max_iter. Default: 250

tsne_stop_lying_iter Integer (Optional) Iteration at which TSNE learning rate is reduced. Try increasing this if TSNE results do not look good on larger numbers of cells

Iteration at which TSNE learning rate is reduced. Try increasing this if TSNE results do not look good on larger numbers of cells. Cannot be set higher than tsne_max_iter. Default: 250
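Since both tsne_mom_switch_iter and tsne_stop_lying_iter are bounded by tsne_max_iter, it may help to check a parameter set before launching a long run. The following is a minimal sketch using only the constraints documented above; the function name is hypothetical and Cell Ranger's own validation may differ.

```python
def check_tsne_iters(tsne_max_iter=1000, tsne_mom_switch_iter=250, tsne_stop_lying_iter=250):
    """Return a list of human-readable problems (empty if the settings are consistent)."""
    problems = []
    # Documented range for the total number of iterations.
    if not 1000 <= tsne_max_iter <= 10000:
        problems.append(f"tsne_max_iter={tsne_max_iter} is outside 1000..10000")
    # Neither switch point may exceed the total iteration count.
    for name, value in (
        ("tsne_mom_switch_iter", tsne_mom_switch_iter),
        ("tsne_stop_lying_iter", tsne_stop_lying_iter),
    ):
        if value > tsne_max_iter:
            problems.append(f"{name}={value} exceeds tsne_max_iter={tsne_max_iter}")
    return problems


print(check_tsne_iters())                           # defaults are consistent
print(check_tsne_iters(tsne_mom_switch_iter=2000))  # switch point beyond the last iteration
```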

filtered_feature_bc_matrix_h5 File scRNA-Seq Cell Ranger Experiment

Filtered feature-barcode matrices in HDF5 format from cellranger count or aggr results
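As an illustration of how these inputs fit together, the sketch below assembles a CWL job object and serializes it as JSON. The input IDs are taken from the table above; the alias, the file path, and the chosen values are hypothetical, and optional inputs that are left out fall back to their defaults.

```python
import json

# Hypothetical job description for this workflow; only the keys come from the Inputs table.
job = {
    "alias": "pbmc_reanalyzed",  # hypothetical experiment name
    "filtered_feature_bc_matrix_h5": {
        "class": "File",  # CWL File object
        "path": "filtered_feature_bc_matrix.h5",  # hypothetical path
    },
    "num_principal_comps": 20,
    "max_clusters": 12,
    "umap_min_dist": 0.3,
    "random_seed": 0,
    "threads": 4,
    "memory_limit": 32,
}

job_json = json.dumps(job, indent=2)
print(job_json)
```

The resulting JSON could be saved to a file and handed to a CWL runner together with the workflow, for example `cwltool cellranger-reanalyze.cwl job.json`, assuming the tool's Docker requirements are available locally.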

Steps

ID Runs Label Doc
reanalyze
../tools/cellranger-reanalyze.cwl (CommandLineTool)
Cellranger reanalyze - reruns secondary analysis performed on the feature-barcode matrix

Tool runs cellranger reanalyze command to rerun secondary analysis performed on the feature-barcode matrix (dimensionality reduction, clustering and visualization) using different parameter settings.

Parameters set by default:
- --disable-ui: no UI is needed when running in a Docker container
- --id: hardcoded to `reanalyzed`, because the content of the output folder is returned as separate outputs

Skipped parameters: --dry, --noexit, --nopreflight, --description, --jobmode, --mempercore, --maxjobs, --jobinterval, --overrides, --uiport

Skipped outputs, as they are identical to inputs:
- Filtered feature-barcode matrices MEX
- Filtered feature-barcode matrices HDF5
- Copy of the input aggregation CSV

Notes:
- Passing `aggregation_metadata` might not work, because it would require additional inputs for all files listed in that CSV file; otherwise cellranger will fail to parse it. Address this when needed.

compress_secondary_analysis_report_folder
../tools/tar-compress.cwl (CommandLineTool)

Compresses input directory to tar.gz

Outputs

ID Type Label Doc
reanalyze_params File Reanalyze params in CSV format

Reanalyze params in CSV format

web_summary_report File Reanalyzed run summary metrics and charts in HTML format

Reanalyzed run summary metrics and charts in HTML format

loupe_browser_track File Loupe Browser visualization and analysis file for reanalyzed results

Loupe Browser visualization and analysis file for reanalyzed results

reanalyze_stderr_log File stderr log generated by cellranger reanalyze

stderr log generated by cellranger reanalyze

reanalyze_stdout_log File stdout log generated by cellranger reanalyze

stdout log generated by cellranger reanalyze

secondary_analysis_report_folder File Compressed folder with reanalyzed secondary analysis results

Compressed folder with secondary analysis results including dimensionality reduction, cell clustering, and differential expression of reanalyzed results
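The reanalyze_params output listed above is a CSV of the parameters used. A plausible way to produce such a file is sketched below, assuming one `key,value` row per parameter with no header and with unset (null) parameters omitted. That layout is an assumption and should be checked against the Cell Ranger documentation; the helper name is hypothetical.

```python
import csv
import io


def params_csv(params):
    """Render a parameter mapping as `key,value` rows, skipping unset (None) values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for key, value in params.items():
        if value is None:
            # Parameters left at null are not written at all.
            continue
        writer.writerow([key, value])
    return buffer.getvalue()


print(params_csv({"num_principal_comps": 20, "max_clusters": 12, "num_pca_bcs": None}))
```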

Permalink: https://w3id.org/cwl/view/git/8a92669a566589d80fde9d151054ffc220ed4ddd/workflows/cellranger-reanalyze.cwl