Cellranger reanalyze - reruns secondary analysis performed on the feature-barcode matrix

Fetched 2023-01-03 19:44:57 GMT

Verified with cwltool version 3.1.20221201130942

Devel version of Single-Cell Cell Ranger Reanalyze
==================================================

The workflow calls the "cellranger reanalyze" command to rerun the secondary analysis performed on the feature-barcode matrix (dimensionality reduction, clustering and visualization) with different parameter settings. The input is a filtered feature-barcode matrix in HDF5 format from a cellranger count or aggr experiment. Note that aggregation_metadata from the upstream cellranger aggr step is not passed to the tool; this limitation will be addressed when needed.
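
As a rough illustration of how the workflow might be parameterized, the sketch below builds a CWL job object in Python and serializes it as JSON. The input names (alias, filtered_feature_bc_matrix_h5, num_principal_comps, max_clusters, tsne_perplexity) are taken from the inputs table of this page; the file path, the alias value and the helper name build_job are hypothetical.

```python
import json


def build_job(matrix_path, alias, **optional):
    """Build a CWL job object for the reanalyze workflow (illustrative helper)."""
    job = {
        "alias": alias,
        # CWL File inputs are objects with a class and a path
        "filtered_feature_bc_matrix_h5": {"class": "File", "path": matrix_path},
    }
    # Optional inputs left as None are omitted so the workflow defaults apply
    job.update({name: value for name, value in optional.items() if value is not None})
    return job


job = build_job(
    "filtered_feature_bc_matrix.h5",  # hypothetical path to the count/aggr matrix
    "pbmc_reanalyzed",                # hypothetical experiment alias
    num_principal_comps=20,
    max_clusters=15,
    tsne_perplexity=None,             # not set, so it is dropped from the job
)
job_json = json.dumps(job, indent=2, sort_keys=True)
print(job_json)
```

The printed JSON can be saved to a file and passed to cwltool together with the workflow file. Optional File inputs such as selected_genes would need the same class/path object form as the matrix.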

This workflow is Open Source and may be reused according to the terms of: Apache License 2.0

Note that the tools invoked by the workflow may have separate licenses.

Inputs

Steps

ID	Runs	Label	Doc
reanalyze	../tools/cellranger-reanalyze.cwl (CommandLineTool)	Cellranger reanalyze - reruns secondary analysis performed on the feature-barcode matrix	Tool runs the cellranger reanalyze command to rerun secondary analysis performed on the feature-barcode matrix (dimensionality reduction, clustering and visualization) using different parameter settings. Parameters set by default: --disable-ui (no UI is needed when running in a Docker container); --id (hardcoded to `reanalyzed` because the content of the output folder is returned as separate outputs). Skipped parameters: --dry, --noexit, --nopreflight, --description, --jobmode, --mempercore, --maxjobs, --jobinterval, --overrides, --uiport. Skipped outputs, as they are identical to the inputs: filtered feature-barcode matrices MEX; filtered feature-barcode matrices HDF5; copy of the input aggregation CSV. Notes: passing `aggregation_metadata` might not work, because it would require additional inputs for all files listed in that CSV file; otherwise cellranger will fail to parse it. Address this when needed.
compress_secondary_analysis_report_folder	../tools/tar-compress.cwl (CommandLineTool)		Compresses input directory to tar.gz
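
The second step packs the secondary analysis folder into a single tar.gz file. The page does not show how tar-compress.cwl does this, so the following is only an equivalent sketch using Python's standard tarfile module; the function name compress_folder and the demo folder and file names are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def compress_folder(folder, archive_path):
    """Pack `folder` into a gzip-compressed tar archive and return the archive path."""
    folder = Path(folder)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores only the folder name instead of its absolute path
        tar.add(folder, arcname=folder.name)
    return Path(archive_path)


# Demo on a throwaway directory standing in for the analysis output folder
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "analysis"
    (report / "pca").mkdir(parents=True)
    (report / "pca" / "projection.csv").write_text("Barcode,PC-1\n")
    archive = compress_folder(report, Path(tmp) / "analysis.tar.gz")
    with tarfile.open(archive, "r:gz") as tar:
        members = sorted(tar.getnames())
print(members)
```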

Outputs

ID	Type	Label	Doc
reanalyze_params	File	Reanalyze params in CSV format	Reanalyze params in CSV format
web_summary_report	File	Reanalyzed run summary metrics and charts in HTML format	Reanalyzed run summary metrics and charts in HTML format
loupe_browser_track	File	Loupe Browser visualization and analysis file for reanalyzed results	Loupe Browser visualization and analysis file for reanalyzed results
reanalyze_stderr_log	File	stderr log generated by cellranger reanalyze	stderr log generated by cellranger reanalyze
reanalyze_stdout_log	File	stdout log generated by cellranger reanalyze	stdout log generated by cellranger reanalyze
secondary_analysis_report_folder	File	Compressed folder with reanalyzed secondary analysis results	Compressed folder with secondary analysis results including dimensionality reduction, cell clustering, and differential expression of reanalyzed results

Permalink: https://w3id.org/cwl/view/git/8a92669a566589d80fde9d151054ffc220ed4ddd/workflows/cellranger-reanalyze.cwl

ID	Type	Title	Doc
alias	String	Experiment short name/Alias
cbc_knn	Integer (Optional)	Specify the number of nearest neighbors used to identify mutual nearest neighbors. Setting this too high will increase runtime	Specify the number of nearest neighbors used to identify mutual nearest neighbors. Setting this too high will increase runtime and may cause out of memory error. See Chemistry Batch Correction page for more details. Ranges from 5 to 20. Default: 10
threads	Integer (Optional)	Number of threads	Number of threads for those steps that support multithreading
cbc_alpha	Float (Optional)	Specify the threshold of the percentage of matched cells between two batches, which is used to determine if the batch pair will be merged	Specify the threshold of the percentage of matched cells between two batches, which is used to determine if the batch pair will be merged. See Chemistry Batch Correction page for more details. Ranges from 0.05 to 0.5. Default: 0.1
cbc_sigma	Float (Optional)	Specify the bandwidth of the Gaussian smoothing kernel used to compute the correction vector for each cell	Specify the bandwidth of the Gaussian smoothing kernel used to compute the correction vector for each cell. See Chemistry Batch Correction page for more details. Ranges from 10 to 500. Default: 150
neighbor_a	Float (Optional)	neighbor_a parameter for the number of nearest neighbors k = neighbor_a + neighbor_b * log10(n_cells)	The number of nearest neighbors, k, used in the graph-based clustering is computed as follows: k = neighbor_a + neighbor_b * log10(n_cells). The actual number of neighbors used is the maximum of this value and graphclust_neighbors. Determines how clustering granularity scales with cell count. Default: -230.0
neighbor_b	Float (Optional)	neighbor_b parameter for the number of nearest neighbors k = neighbor_a + neighbor_b * log10(n_cells)	The number of nearest neighbors, k, used in the graph-based clustering is computed as follows: k = neighbor_a + neighbor_b * log10(n_cells). The actual number of neighbors used is the maximum of this value and graphclust_neighbors. Determines how clustering granularity scales with cell count. Default: 120.0
tsne_theta	Float (Optional)	TSNE theta parameter. Higher values yield faster, more approximate results (and vice versa)	TSNE theta parameter (see the TSNE FAQ for more details). Higher values yield faster, more approximate results (and vice versa). The runtime and memory usage of TSNE will increase dramatically if you set this below 0.25. Ranges from 0 to 1. Default: 0.5
num_pca_bcs	Integer (Optional)	Randomly subset data to N barcodes when computing PCA projection. Try reducing this parameter if your analysis is running out of memory	Randomly subset data to N barcodes when computing PCA projection (the most memory-intensive step). The PCA projection will still be applied to the full dataset, i.e. your final results will still reflect all the data. Try reducing this parameter if your analysis is running out of memory. Cannot be set higher than the available number of cells. Default: null
random_seed	Integer (Optional)	Random seed	Random seed. Due to the randomized nature of the algorithms, changing this will produce slightly different results. If the TSNE or UMAP results don't look good, try running multiple times with different seeds and pick the TSNE or UMAP that looks best. Default: 0
umap_metric		Determines how the distance is computed in the input space	Determines how the distance is computed in the input space. Default: "correlation"
max_clusters	Integer (Optional)	Compute K-means clustering using K values of 2 to N. Setting this too high may cause spurious clusters to be called	Compute K-means clustering using K values of 2 to N. Setting this too high may cause spurious clusters to be called. Ranges from 10 to 50, depending on the number of cell populations / clusters you expect to see. Default: 10
memory_limit	Integer (Optional)	Maximum memory used (GB)	Maximum memory used (GB). The same will be applied to virtual memory
num_pca_genes	Integer (Optional)	Subset data to the top N genes when computing PCA. Try reducing this parameter if your analysis is running out of memory	Subset data to the top N genes (ranked by normalized dispersion) when computing PCA. Differential expression will still reflect all genes. Try reducing this parameter if your analysis is running out of memory. Cannot be set higher than the number of genes in the reference transcriptome. Default: null
tsne_max_dims	Integer (Optional)	Maximum number of TSNE output dimensions. Set this to 3 to produce both 2D and 3D TSNE projections	Maximum number of TSNE output dimensions. Set this to 3 to produce both 2D and 3D TSNE projections (note: runtime will increase significantly). Ranges from 2 to 3. Default: 2
tsne_max_iter	Integer (Optional)	Number of total TSNE iterations. Try increasing this if TSNE results do not look good on larger numbers of cells	Number of total TSNE iterations. Try increasing this if TSNE results do not look good on larger numbers of cells. Runtime increases linearly with number of iterations. Ranges from 1000 to 10000. Default: 1000
umap_max_dims	Integer (Optional)	Maximum number of UMAP output dimensions. Set this to 3 to produce both 2D and 3D UMAP projections	Maximum number of UMAP output dimensions. Set this to 3 to produce both 2D and 3D UMAP projections. Ranges from 2 to 3. Default: 2
umap_min_dist	Float (Optional)	Controls how tightly the embedding is allowed to pack points together	Controls how tightly the embedding is allowed to pack points together. Larger values make embedded points more evenly distributed, while smaller values make the embedding more accurately reflect the local structure. Ranges from 0.001 to 0.5. Default: 0.3
excluded_genes	File (Optional)	A CSV file containing a list of gene IDs to exclude for reanalysis. Applied after setting selected genes	A CSV file containing a list of gene IDs to exclude for reanalysis (corresponding to the gene_id field of the reference GTF). All gene IDs must be present in the matrix. The exclusion is applied after setting the gene list with --genes. Note that only gene features are used in secondary analysis. Feature Barcode features are ignored.
selected_genes	File (Optional)	A CSV file containing a list of gene IDs to use for reanalysis	A CSV file containing a list of gene IDs to use for reanalysis (corresponding to the gene_id field of the reference GTF). All gene IDs must be present in the matrix. Note that only gene features are used in secondary analysis. Feature Barcode features are ignored.
tsne_input_pcs	Integer (Optional)	Subset to top N principal components for TSNE. Change this parameter if you want to see how the TSNE plot changes when using fewer PCs	Subset to top N principal components for TSNE. Change this parameter if you want to see how the TSNE plot changes when using fewer PCs, independent of the clustering / differential expression. You may find that TSNE is faster and/or the output looks better when using fewer PCs. Cannot be set higher than the num_principal_comps parameter. Default: null
umap_input_pcs	Integer (Optional)	Subset to top N principal components for UMAP. Change this parameter if you want to see how the UMAP plot changes when using fewer PCs	Subset to top N principal components for UMAP. Change this parameter if you want to see how the UMAP plot changes when using fewer PCs, independent of the clustering / differential expression. You may find that UMAP is faster and/or the output looks better when using fewer PCs. Cannot be set higher than the num_principal_comps parameter. Default: null
force_cells_num	Integer (Optional)	Force pipeline to use this number of cells, bypassing the cell detection algorithm	Force pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger is not consistent with the barcode rank plot. If specifying a value that exceeds the original cell count, you must use the raw_gene_bc_matrices_h5.h5
tsne_perplexity	Integer (Optional)	TSNE perplexity parameter. When analyzing 100k+ cells, increasing this parameter may improve TSNE results	TSNE perplexity parameter (see the TSNE FAQ for more details). When analyzing 100k+ cells, increasing this parameter may improve TSNE results, but the algorithm will be slower. Ranges from 30 to 50. Default: 30
num_analysis_bcs	Integer (Optional)	Randomly subset data to N barcodes for all analysis. Reduce this parameter if you want to improve performance or simulate results from lower cell counts	Randomly subset data to N barcodes for all analysis. Reduce this parameter if you want to improve performance or simulate results from lower cell counts. Cannot be set higher than the available number of cells. Default: null
umap_n_neighbors	Integer (Optional)	Determines the number of neighboring points used in local approximations of manifold structure	Determines the number of neighboring points used in local approximations of manifold structure. Larger values will usually result in more global structure at the loss of detailed local structure. Ranges from 5 to 50. Default: 30
selected_barcodes	File (Optional)	A CSV file containing a list of cell barcodes to use for reanalysis	A CSV file containing a list of cell barcodes to use for reanalysis, e.g. barcodes exported from Loupe Browser. All barcodes must be present in the matrix.
num_principal_comps	Integer (Optional)	Compute N principal components for PCA. Setting this too high may cause spurious clusters to be called	Compute N principal components for PCA. Setting this too high may cause spurious clusters to be called. The default value is 100 when the chemistry batch correction is enabled. Set from 10 to 100, depending on the number of cell populations/clusters you expect to see. Default: 10
cbc_realign_panorama	Boolean (Optional)	Specify if two batches will be merged if they are already in the same panorama. Setting this to True will usually improve the performance	Specify if two batches will be merged if they are already in the same panorama. Setting this to True will usually improve the performance, but will also increase runtime and memory usage. See Chemistry Batch Correction page for more details. One of true or false. Default: false
graphclust_neighbors	Integer (Optional)	Number of nearest-neighbors to use in the graph-based clustering. Lower values result in higher-granularity clustering	Number of nearest-neighbors to use in the graph-based clustering. Lower values result in higher-granularity clustering. The actual number of neighbors used is the maximum of this value and that determined by neighbor_a and neighbor_b. Set this value to zero to use those values instead. Ranges from 10 to 500, depending on desired granularity. Default: 0
tsne_mom_switch_iter	Integer (Optional)	Iteration at which TSNE momentum is reduced. Try increasing this if TSNE results do not look good on larger numbers of cells	Iteration at which TSNE momentum is reduced. Try increasing this if TSNE results do not look good on larger numbers of cells. Cannot be set higher than tsne_max_iter. Default: 250
tsne_stop_lying_iter	Integer (Optional)	Iteration at which TSNE learning rate is reduced. Try increasing this if TSNE results do not look good on larger numbers of cells	Iteration at which TSNE learning rate is reduced. Try increasing this if TSNE results do not look good on larger numbers of cells. Cannot be set higher than tsne_max_iter. Default: 250
filtered_feature_bc_matrix_h5	File	scRNA-Seq Cell Ranger Experiment	Filtered feature-barcode matrices in HDF5 format from cellranger count or aggr results
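
The neighbor_a, neighbor_b and graphclust_neighbors descriptions in the table combine into one rule for the number of nearest neighbors used in graph-based clustering. The sketch below works through that rule with the documented defaults. The function name effective_neighbors is hypothetical, and the page does not say how Cell Ranger rounds the result, so the value is returned unrounded.

```python
import math


def effective_neighbors(n_cells, neighbor_a=-230.0, neighbor_b=120.0,
                        graphclust_neighbors=0):
    """Return k = max(neighbor_a + neighbor_b * log10(n_cells), graphclust_neighbors)."""
    k_scaled = neighbor_a + neighbor_b * math.log10(n_cells)
    # The larger of the scaled value and the fixed graphclust_neighbors is used
    return max(k_scaled, graphclust_neighbors)


# With the defaults, k grows by neighbor_b for every tenfold increase in cell count
for n_cells in (1_000, 10_000, 100_000):
    print(n_cells, effective_neighbors(n_cells))

# A non-zero graphclust_neighbors acts as a lower bound on k
k_floor = effective_neighbors(1_000, graphclust_neighbors=300)
print(k_floor)
```

Under these defaults, leaving graphclust_neighbors at 0 lets the log-scaled value decide k for typical cell counts, while a larger graphclust_neighbors takes over whenever it exceeds the scaled value.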