Workflow: Cellranger reanalyze - reruns secondary analysis performed on the feature-barcode matrix
Devel version of Single-Cell Cell Ranger Reanalyze
==================================================
The workflow calls the "cellranger reanalyze" command to rerun the secondary analysis performed on the feature-barcode matrix (dimensionality reduction, clustering and visualization) using different parameter settings. The input is a filtered feature-barcode matrix in HDF5 format from a cellranger count or aggr experiment. Note that aggregation_metadata from the upstream cellranger aggr step is not passed to the tool; this will need to be addressed if it becomes necessary.
Inputs
ID | Type | Title | Doc |
---|---|---|---|
alias | String | Experiment short name/Alias | |
cbc_knn | Integer (Optional) | Specify the number of nearest neighbors used to identify mutual nearest neighbors. Setting this too high will increase runtime |
Specify the number of nearest neighbors used to identify mutual nearest neighbors. Setting this too high will increase runtime and may cause out of memory error. See Chemistry Batch Correction page for more details. Ranges from 5 to 20. Default: 10 |
threads | Integer (Optional) | Number of threads |
Number of threads for those steps that support multithreading |
cbc_alpha | Float (Optional) | Specify the threshold of the percentage of matched cells between two batches, which is used to determine if the batch pair will be merged |
Specify the threshold of the percentage of matched cells between two batches, which is used to determine if the batch pair will be merged. See Chemistry Batch Correction page for more details. Ranges from 0.05 to 0.5. Default: 0.1 |
cbc_sigma | Float (Optional) | Specify the bandwidth of the Gaussian smoothing kernel used to compute the correction vector for each cell |
Specify the bandwidth of the Gaussian smoothing kernel used to compute the correction vector for each cell. See Chemistry Batch Correction page for more details. Ranges from 10 to 500. Default: 150 |
neighbor_a | Float (Optional) | neighbor_a parameter for the number of nearest neighbors k = neighbor_a + neighbor_b * log10(n_cells) |
The number of nearest neighbors, k, used in the graph-based clustering is computed as follows: k = neighbor_a + neighbor_b * log10(n_cells). The actual number of neighbors used is the maximum of this value and graphclust_neighbors. Determines how clustering granularity scales with cell count. Default: -230.0 |
neighbor_b | Float (Optional) | neighbor_b parameter for the number of nearest neighbors k = neighbor_a + neighbor_b * log10(n_cells) |
The number of nearest neighbors, k, used in the graph-based clustering is computed as follows: k = neighbor_a + neighbor_b * log10(n_cells). The actual number of neighbors used is the maximum of this value and graphclust_neighbors. Determines how clustering granularity scales with cell count. Default: 120.0 |
tsne_theta | Float (Optional) | TSNE theta parameter. Higher values yield faster, more approximate results (and vice versa) |
TSNE theta parameter (see the TSNE FAQ for more details). Higher values yield faster, more approximate results (and vice versa). The runtime and memory usage of TSNE will increase dramatically if you set this below 0.25. Ranges from 0 to 1. Default: 0.5 |
num_pca_bcs | Integer (Optional) | Randomly subset data to N barcodes when computing PCA projection. Try reducing this parameter if your analysis is running out of memory |
Randomly subset data to N barcodes when computing PCA projection (the most memory-intensive step). The PCA projection will still be applied to the full dataset, i.e. your final results will still reflect all the data. Try reducing this parameter if your analysis is running out of memory. Cannot be set higher than the available number of cells. Default: null |
random_seed | Integer (Optional) | Random seed |
Random seed. Due to the randomized nature of the algorithms, changing this will produce slightly different results. If the TSNE or UMAP results don't look good, try running multiple times with different seeds and pick the TSNE or UMAP that looks best. Default: 0 |
umap_metric | | Determines how the distance is computed in the input space |
Determines how the distance is computed in the input space. Default: "correlation" |
max_clusters | Integer (Optional) | Compute K-means clustering using K values of 2 to N. Setting this too high may cause spurious clusters to be called |
Compute K-means clustering using K values of 2 to N. Setting this too high may cause spurious clusters to be called. Ranges from 10 to 50, depending on the number of cell populations / clusters you expect to see. Default: 10 |
memory_limit | Integer (Optional) | Maximum memory used (GB) |
Maximum memory used (GB). The same will be applied to virtual memory |
num_pca_genes | Integer (Optional) | Subset data to the top N genes when computing PCA. Try reducing this parameter if your analysis is running out of memory |
Subset data to the top N genes (ranked by normalized dispersion) when computing PCA. Differential expression will still reflect all genes. Try reducing this parameter if your analysis is running out of memory. Cannot be set higher than the number of genes in the reference transcriptome. Default: null |
tsne_max_dims | Integer (Optional) | Maximum number of TSNE output dimensions. Set this to 3 to produce both 2D and 3D TSNE projections |
Maximum number of TSNE output dimensions. Set this to 3 to produce both 2D and 3D TSNE projections (note: runtime will increase significantly). Ranges from 2 to 3. Default: 2 |
tsne_max_iter | Integer (Optional) | Number of total TSNE iterations. Try increasing this if TSNE results do not look good on larger numbers of cells |
Number of total TSNE iterations. Try increasing this if TSNE results do not look good on larger numbers of cells. Runtime increases linearly with number of iterations. Ranges from 1000 to 10000. Default: 1000 |
umap_max_dims | Integer (Optional) | Maximum number of UMAP output dimensions. Set this to 3 to produce both 2D and 3D UMAP projections |
Maximum number of UMAP output dimensions. Set this to 3 to produce both 2D and 3D UMAP projections. Ranges from 2 to 3. Default: 2 |
umap_min_dist | Float (Optional) | Controls how tightly the embedding is allowed to pack points together |
Controls how tightly the embedding is allowed to pack points together. Larger values make the embedded points more evenly distributed, while smaller values make the embedding reflect the local structure more accurately. Ranges from 0.001 to 0.5. Default: 0.3 |
excluded_genes | File (Optional) | A CSV file containing a list of gene IDs to exclude for reanalysis. Applied after setting selected genes |
A CSV file containing a list of gene IDs to exclude for reanalysis (corresponding to the gene_id field of the reference GTF). All gene IDs must be present in the matrix. The exclusion is applied after setting the gene list with --genes. Note that only gene features are used in secondary analysis. Feature Barcode features are ignored. |
selected_genes | File (Optional) | A CSV file containing a list of gene IDs to use for reanalysis |
A CSV file containing a list of gene IDs to use for reanalysis (corresponding to the gene_id field of the reference GTF). All gene IDs must be present in the matrix. Note that only gene features are used in secondary analysis. Feature Barcode features are ignored. |
tsne_input_pcs | Integer (Optional) | Subset to top N principal components for TSNE. Change this parameter if you want to see how the TSNE plot changes when using fewer PCs |
Subset to top N principal components for TSNE. Change this parameter if you want to see how the TSNE plot changes when using fewer PCs, independent of the clustering / differential expression. You may find that TSNE is faster and/or the output looks better when using fewer PCs. Cannot be set higher than the num_principal_comps parameter. Default: null |
umap_input_pcs | Integer (Optional) | Subset to top N principal components for UMAP. Change this parameter if you want to see how the UMAP plot changes when using fewer PCs |
Subset to top N principal components for UMAP. Change this parameter if you want to see how the UMAP plot changes when using fewer PCs, independent of the clustering / differential expression. You may find that UMAP is faster and/or the output looks better when using fewer PCs. Cannot be set higher than the num_principal_comps parameter. Default: null |
force_cells_num | Integer (Optional) | Force pipeline to use this number of cells, bypassing the cell detection algorithm |
Force pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger is not consistent with the barcode rank plot. If specifying a value that exceeds the original cell count, you must use the raw_gene_bc_matrices_h5.h5 file. |
tsne_perplexity | Integer (Optional) | TSNE perplexity parameter. When analyzing 100k+ cells, increasing this parameter may improve TSNE results |
TSNE perplexity parameter (see the TSNE FAQ for more details). When analyzing 100k+ cells, increasing this parameter may improve TSNE results, but the algorithm will be slower. Ranges from 30 to 50. Default: 30 |
num_analysis_bcs | Integer (Optional) | Randomly subset data to N barcodes for all analysis. Reduce this parameter if you want to improve performance or simulate results from lower cell counts |
Randomly subset data to N barcodes for all analysis. Reduce this parameter if you want to improve performance or simulate results from lower cell counts. Cannot be set higher than the available number of cells. Default: null |
umap_n_neighbors | Integer (Optional) | Determines the number of neighboring points used in local approximations of manifold structure |
Determines the number of neighboring points used in local approximations of manifold structure. Larger values will usually result in more global structure at the loss of detailed local structure. Ranges from 5 to 50. Default: 30 |
selected_barcodes | File (Optional) | A CSV file containing a list of cell barcodes to use for reanalysis |
A CSV file containing a list of cell barcodes to use for reanalysis, e.g. barcodes exported from Loupe Browser. All barcodes must be present in the matrix. |
num_principal_comps | Integer (Optional) | Compute N principal components for PCA. Setting this too high may cause spurious clusters to be called |
Compute N principal components for PCA. Setting this too high may cause spurious clusters to be called. The default value is 100 when the chemistry batch correction is enabled. Set from 10 to 100, depending on the number of cell populations/clusters you expect to see. Default: 10 |
cbc_realign_panorama | Boolean (Optional) | Specify if two batches will be merged if they are already in the same panorama. Setting this to True will usually improve the performance |
Specify if two batches will be merged if they are already in the same panorama. Setting this to True will usually improve the performance, but will also increase runtime and memory usage. See Chemistry Batch Correction page for more details. One of true or false. Default: false |
graphclust_neighbors | Integer (Optional) | Number of nearest-neighbors to use in the graph-based clustering. Lower values result in higher-granularity clustering |
Number of nearest-neighbors to use in the graph-based clustering. Lower values result in higher-granularity clustering. The actual number of neighbors used is the maximum of this value and that determined by neighbor_a and neighbor_b. Set this value to zero to use those values instead. Ranges from 10 to 500, depending on desired granularity. Default: 0 |
tsne_mom_switch_iter | Integer (Optional) | Iteration at which TSNE momentum is reduced. Try increasing this if TSNE results do not look good on larger numbers of cells |
Iteration at which TSNE momentum is reduced. Try increasing this if TSNE results do not look good on larger numbers of cells. Cannot be set higher than tsne_max_iter. Default: 250 |
tsne_stop_lying_iter | Integer (Optional) | Iteration at which TSNE learning rate is reduced. Try increasing this if TSNE results do not look good on larger numbers of cells |
Iteration at which TSNE learning rate is reduced. Try increasing this if TSNE results do not look good on larger numbers of cells. Cannot be set higher than tsne_max_iter. Default: 250 |
filtered_feature_bc_matrix_h5 | File | scRNA-Seq Cell Ranger Experiment |
Filtered feature-barcode matrices in HDF5 format from cellranger count or aggr results |
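The neighbor_a, neighbor_b and graphclust_neighbors rows above describe how the number of nearest neighbors for graph-based clustering is derived. A minimal Python sketch of that rule follows, using the documented defaults. The function name is hypothetical, and because the table does not say how the value is rounded, the result is left unrounded here.

```python
import math


def graph_clustering_neighbors(n_cells, neighbor_a=-230.0, neighbor_b=120.0,
                               graphclust_neighbors=0):
    """Return k as described in the Inputs table (illustrative helper).

    k = neighbor_a + neighbor_b * log10(n_cells), and the value actually
    used is the maximum of that and graphclust_neighbors.
    """
    if n_cells < 1:
        raise ValueError("n_cells must be a positive integer")
    k_scaled = neighbor_a + neighbor_b * math.log10(n_cells)
    # Rounding is not specified in the source, so no rounding is applied.
    return max(k_scaled, graphclust_neighbors)


# Show how k grows with the number of cells under the default settings.
for n_cells in (1_000, 10_000, 100_000):
    print(n_cells, graph_clustering_neighbors(n_cells))
```

With the defaults, the formula goes negative for very small datasets, in which case graphclust_neighbors becomes the effective value.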
Steps
ID | Runs | Label | Doc |
---|---|---|---|
reanalyze | ../tools/cellranger-reanalyze.cwl (CommandLineTool) | Cellranger reanalyze - reruns secondary analysis performed on the feature-barcode matrix |
Tool runs cellranger reanalyze command to rerun secondary analysis performed on the feature-barcode matrix (dimensionality reduction, clustering and visualization) using different parameter settings. |
compress_secondary_analysis_report_folder | ../tools/tar-compress.cwl (CommandLineTool) | Compresses input directory to tar.gz | |
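The second step packs the secondary analysis folder into a tar.gz archive. The sketch below shows an equivalent operation with Python's standard tarfile module. It is only an illustration of the step's effect, not the contents of tar-compress.cwl; the function name and the demo folder layout ("analysis/pca/projection.csv") are hypothetical.

```python
import os
import tarfile
import tempfile


def compress_folder(folder_path, archive_path):
    """Pack folder_path into a gzip-compressed tar archive (illustrative)."""
    folder_name = os.path.basename(os.path.normpath(folder_path))
    with tarfile.open(archive_path, "w:gz") as archive:
        # Store entries relative to the folder name, not the absolute path.
        archive.add(folder_path, arcname=folder_name)
    return archive_path


# Demo on a throwaway directory with a made-up layout.
with tempfile.TemporaryDirectory() as workdir:
    analysis_dir = os.path.join(workdir, "analysis")
    os.makedirs(os.path.join(analysis_dir, "pca"))
    with open(os.path.join(analysis_dir, "pca", "projection.csv"), "w") as handle:
        handle.write("Barcode,PC-1\n")

    archive_file = compress_folder(analysis_dir,
                                   os.path.join(workdir, "analysis.tar.gz"))
    with tarfile.open(archive_file, "r:gz") as check:
        archived_names = sorted(check.getnames())

print(archived_names)
```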
Outputs
ID | Type | Label | Doc |
---|---|---|---|
reanalyze_params | File | Reanalyze params in CSV format |
Reanalyze params in CSV format |
web_summary_report | File | Reanalyzed run summary metrics and charts in HTML format |
Reanalyzed run summary metrics and charts in HTML format |
loupe_browser_track | File | Loupe Browser visualization and analysis file for reanalyzed results |
Loupe Browser visualization and analysis file for reanalyzed results |
reanalyze_stderr_log | File | stderr log generated by cellranger reanalyze |
stderr log generated by cellranger reanalyze |
reanalyze_stdout_log | File | stdout log generated by cellranger reanalyze |
stdout log generated by cellranger reanalyze |
secondary_analysis_report_folder | File | Compressed folder with reanalyzed secondary analysis results |
Compressed folder with secondary analysis results including dimensionality reduction, cell clustering, and differential expression of reanalyzed results |
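The reanalyze_params output is a CSV file holding the parameter settings used for the run. As a rough sketch, such a file could be assembled from the optional workflow inputs as shown below. The two-column "name,value" layout without a header row is an assumption, as are the function name and the example values; unset inputs (None) are skipped so that the tool defaults apply.

```python
import csv
import io


def write_reanalyze_params(params, handle):
    """Write non-null parameters as name,value rows (assumed layout)."""
    writer = csv.writer(handle, lineterminator="\n")
    for name, value in params.items():
        if value is None:
            continue  # leave unset inputs to the tool's own defaults
        if isinstance(value, bool):
            # The Inputs table documents booleans as "true" or "false".
            value = "true" if value else "false"
        writer.writerow([name, value])


# Example values only; the parameter names come from the Inputs table.
settings = {
    "num_principal_comps": 20,
    "tsne_perplexity": 40,
    "cbc_realign_panorama": False,
    "num_pca_genes": None,
    "random_seed": 0,
}

buffer = io.StringIO()
write_reanalyze_params(settings, buffer)
params_csv = buffer.getvalue()
print(params_csv)
```

Note that a value of 0 (for example random_seed) is written out, since only None is treated as unset.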
https://w3id.org/cwl/view/git/8a92669a566589d80fde9d151054ffc220ed4ddd/workflows/cellranger-reanalyze.cwl