Explore Workflows

View already parsed workflows here or click here to add your own

Graph Name Retrieved From View
workflow graph scRNA-seq pipeline using Salmon and Alevin

https://github.com/hubmapconsortium/salmon-rnaseq.git

Path: pipeline.cwl

Branch/Commit ID: 8af5a1c9c99b06e7024e4ddbf45a15cf07ea9410

workflow graph workflow_with_facets.cwl

CWL workflow for generating Roslin / Argos post pipeline analysis files and cBioPortal data and metadata files.

This workflow includes Facets and Facets Suite usages.

Inputs
------

The following parameters are required:

- project_id
- project_pi
- request_pi
- project_short_name
- project_name
- project_description
- cancer_type
- cancer_study_identifier
- argos_version_string
- helix_filter_version
- is_impact
- extra_pi_groups
- pairs

The following filenames are required:

- analysis_mutations_filename
- analysis_gene_cna_filename
- analysis_sv_filename
- analysis_segment_cna_filename
- cbio_segment_data_filename
- cbio_meta_cna_segments_filename

The following filenames have default values and are optional:

- cbio_mutation_data_filename
- cbio_cna_data_filename
- cbio_fusion_data_filename
- cbio_clinical_patient_data_filename
- cbio_clinical_sample_data_filename
- cbio_clinical_sample_meta_filename
- cbio_clinical_patient_meta_filename
- cbio_meta_study_filename
- cbio_meta_cna_filename
- cbio_meta_fusions_filename
- cbio_meta_mutations_filename
- cbio_cases_all_filename
- cbio_cases_cnaseq_filename
- cbio_cases_cna_filename
- cbio_cases_sequenced_filename

Output
------

Workflow output should look like this:

    output
    ├── analysis
    │   ├── <project_id>.gene.cna.txt
    │   ├── <project_id>.muts.maf
    │   ├── <project_id>.seg.cna.txt
    │   └── <project_id>.svs.maf
    ├── facets
    │   ├── <tumor_id>.<normal_id> (passed)
    │   │   └── <facets_files>
    │   └── <tumor_id>.<normal_id> (failed)
    │       └── <log_files>
    └── portal
        ├── case_list
        │   ├── cases_all.txt
        │   ├── cases_cnaseq.txt
        │   ├── cases_cna.txt
        │   └── cases_sequenced.txt
        ├── data_clinical_patient.txt
        ├── data_clinical_sample.txt
        ├── data_CNA.ascna.txt
        ├── data_CNA.scna.txt
        ├── data_CNA.txt
        ├── data_fusions.txt
        ├── data_mutations_extended.txt
        ├── meta_clinical_patient.txt
        ├── meta_clinical_sample.txt
        ├── meta_CNA.txt
        ├── meta_fusions.txt
        ├── meta_mutations_extended.txt
        ├── meta_study.txt
        ├── <project_id>_data_cna_hg19.seg
        └── <project_id>_meta_cna_hg19_seg.txt
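The `<project_id>` placeholders in the output listing can be expanded programmatically, for example to check a finished run. The following is a minimal sketch, not part of the workflow itself: the function names and the example project ID `Proj_1` are hypothetical, and it assumes the file names follow the pattern shown in the listing (the actual names depend on the filename inputs passed to the workflow).

```python
from pathlib import PurePosixPath
from typing import List


def expected_analysis_files(project_id: str, root: str = "output") -> List[str]:
    """Build the analysis file paths shown in the output listing."""
    analysis = PurePosixPath(root) / "analysis"
    suffixes = ["gene.cna.txt", "muts.maf", "seg.cna.txt", "svs.maf"]
    return [str(analysis / f"{project_id}.{suffix}") for suffix in suffixes]


def expected_segment_files(project_id: str, root: str = "output") -> List[str]:
    """Build the two project-specific segment file paths under portal/."""
    portal = PurePosixPath(root) / "portal"
    return [
        str(portal / f"{project_id}_data_cna_hg19.seg"),
        str(portal / f"{project_id}_meta_cna_hg19_seg.txt"),
    ]


# "Proj_1" is a made-up project ID used only for illustration.
analysis_files = expected_analysis_files("Proj_1")
segment_files = expected_segment_files("Proj_1")
for path in analysis_files + segment_files:
    print(path)
```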

https://github.com/mskcc/pluto-cwl.git

Path: cwl/workflow_with_facets.cwl

Branch/Commit ID: 342e6f1f4f7a3839e579fbe96ccc8d6f7a61ac77

workflow graph Trim Galore SMARTer RNA-Seq pipeline paired-end strand specific

https://chipster.csc.fi/manual/library-type-summary.html

Modified original [BioWardrobe's](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) **RNA-Seq** basic analysis for a **paired-end** experiment. Corresponding input [FASTQ](http://maq.sourceforge.net/fastq.shtml) files have to be provided. The current workflow should be used only with paired-end RNA-Seq data. It performs the following steps:

1. Trim adapters from input FASTQ files
2. Use STAR to align reads from input FASTQ files according to the predefined reference indices; generate an unsorted BAM file and an alignment statistics file
3. Use fastx_quality_stats to analyze input FASTQ files and generate quality statistics files
4. Use samtools sort to generate a coordinate-sorted BAM(+BAI) file pair from the unsorted BAM file obtained in step 2 (after running STAR)
5. Generate a BigWig file from the sorted BAM file
6. Map input FASTQ files to predefined rRNA reference indices using Bowtie to determine the level of rRNA contamination; export the resulting statistics to a file
7. Calculate isoform expression levels for the sorted BAM file and the GTF/TAB annotation file using the GEEP reads-counting utility; export results to a file

https://github.com/datirium/workflows.git

Path: workflows/trim-rnaseq-pe-smarter-dutp.cwl

Branch/Commit ID: aebf2355539fdf81fd9082616f8b21440d2691c6

workflow graph Cellranger aggr - aggregates data from multiple Cellranger runs

Devel version of Single-Cell Cell Ranger Aggregate
==================================================

Workflow calls "cellranger aggr" command to combine output files from "cellranger count" (the molecule_info.h5 file from each run) into a single feature-barcode matrix containing all the data.

When combining multiple GEM wells, the barcode sequences for each channel are distinguished by a GEM well suffix appended to the barcode sequence. Each GEM well is a physically distinct set of GEM partitions, but draws barcode sequences randomly from the pool of valid barcodes, known as the barcode whitelist. To keep the barcodes unique when aggregating multiple libraries, we append a small integer identifying the GEM well to the barcode nucleotide sequence, and use that nucleotide sequence plus ID as the unique identifier in the feature-barcode matrix. For example, AGACCATTGAGACTTA-1 and AGACCATTGAGACTTA-2 are distinct cell barcodes from different GEM wells, despite having the same barcode nucleotide sequence.

This number, which tells us which GEM well this barcode sequence came from, is called the GEM well suffix. The numbering of the GEM wells will reflect the order that the GEM wells were provided in the "molecule_info_h5" and "gem_well_labels" inputs.

When combining data from multiple GEM wells, the "cellranger aggr" pipeline automatically equalizes the average read depth per cell between groups before merging. This approach avoids artifacts that may be introduced due to differences in sequencing depth. It is possible to turn off normalization or change the way normalization is done through the "normalization_mode" input. The "none" value may be appropriate if you want to maximize sensitivity and plan to deal with depth normalization in a downstream step.
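The GEM well suffix scheme described above can be illustrated with a short sketch. This is not Cell Ranger's implementation: the function name `suffix_barcodes` is hypothetical, the second barcode sequence is made up, and the sketch assumes each input barcode carries at most one "-N" suffix from an earlier run, which it replaces with the 1-based position of the GEM well in the input order.

```python
from typing import List


def suffix_barcodes(runs: List[List[str]]) -> List[str]:
    """Append a 1-based GEM well suffix to every barcode, in input order."""
    merged = []
    for well_index, barcodes in enumerate(runs, start=1):
        for barcode in barcodes:
            # Drop any suffix from the per-run output, keep the nucleotides.
            sequence = barcode.split("-")[0]
            merged.append(f"{sequence}-{well_index}")
    return merged


runs = [
    ["AGACCATTGAGACTTA-1", "TTTGTCATCTGTCAAG-1"],  # first GEM well
    ["AGACCATTGAGACTTA-1"],                        # second GEM well
]
merged = suffix_barcodes(runs)
print(merged)
```

The same nucleotide sequence from two GEM wells ends up as two distinct identifiers, so the merged list has no duplicates.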

https://github.com/datirium/workflows.git

Path: workflows/cellranger-aggr.cwl

Branch/Commit ID: 57437c1e9f881411b65f79acd64b7cf14df5b901

workflow graph paramref_arguments_self.cwl

https://github.com/common-workflow-language/cwltool.git

Path: tests/wf/paramref_arguments_self.cwl

Branch/Commit ID: e1a9100dff381ebd59b2a74806f705b7c68a8584

workflow graph rnaseq-se-dutp.cwl

RNA-Seq basic analysis workflow for strand specific single-read experiment.

https://github.com/datirium/workflows.git

Path: workflows/rnaseq-se-dutp.cwl

Branch/Commit ID: e284e3f6dff25037b209895c52f2abd37a1ce1bf

workflow graph DESeq - differential gene expression analysis

# Differential gene expression analysis

This differential gene expression (DGE) analysis takes as input samples from two experimental conditions that have been processed with an RNA-Seq workflow (see list of "Upstream workflows" below). DESeq estimates variance-mean dependence in count data from high-throughput sequencing assays, then tests for DGE based on a model which assumes a negative binomial distribution of gene expression (aligned read count per gene).

### Experimental Setup and Results Interpretation

The workflow design uses as its fold change (FC) calculation: condition 1 (c1, e.g. treatment) over condition 2 (c2, e.g. control). In other words: `FC == (c1/c2)`

Therefore:

- if FC<1 the log2(FC) is <0 (negative), meaning expression in condition1<condition2 (gene is downregulated in c1)
- if FC>1 the log2(FC) is >0 (positive), meaning expression in condition1>condition2 (gene is upregulated in c1)

In other words, if you have input TREATMENT samples as condition 1, and CONTROL samples as condition 2, a positive L2FC for a gene indicates that expression of the gene in TREATMENT is greater (or upregulated) compared to CONTROL.

Next, threshold the p-adjusted values with your FDR (false discovery rate) cutoff to determine if the change may be considered significant or not.

It is important to note when DESeq1 or DESeq2 is used in our DGE analysis workflow. If a user inputs only a single sample per condition, DESeq1 is used for calculating DGE. In this experimental setup, there are no repeated measurements per gene per condition, therefore biological variability in each condition cannot be captured, so the output p-values are assumed to be purely "technical". On the other hand, if more than one sample is input per condition, DESeq2 is used. In this case, biological variability per gene within each condition is available to be incorporated into the model, and the resulting p-values are assumed to be "biological".

Additionally, DESeq2 fold change is "shrunk" to account for sample variability, and as Michael Love (DESeq maintainer) puts it, "it looks at the largest fold changes that are not due to low counts and uses these to inform a prior distribution. So the large fold changes from genes with lots of statistical information are not shrunk, while the imprecise fold changes are shrunk. This allows you to compare all estimated LFC across experiments, for example, which is not really feasible without the use of a prior".

In either case, the null hypothesis (H0) tested for each gene is that its expression does not differ between conditions. A smaller p-value means the observed difference would be less likely if H0 were true, so below a chosen threshold (traditionally <0.05) H0 is rejected. Additionally, due to the many thousands of independent hypotheses being tested (each gene representing an independent test), the p-values attained by the Wald test are adjusted using the Benjamini and Hochberg method by default. These "padj" values should be used for determination of significance (a reasonable value here would be <0.10, i.e. below a 10% FDR).

Further Analysis: Output from the DESeq workflow may be used as input to the GSEA (Gene Set Enrichment Analysis) workflow for identifying enriched marker gene sets between conditions.

### DESeq1

High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. Simon Anders and Wolfgang Huber propose a method based on the negative binomial distribution, with variance and mean linked by local regression, and present an implementation, [DESeq](http://www.bioconductor.org/packages/3.8/bioc/html/DESeq.html), as an R/Bioconductor package.

### DESeq2

In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. [DESeq2](http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) is a method for differential analysis of count data that uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

### __References__

- Anders S, Huber W (2010). “Differential expression analysis for sequence count data.” Genome Biology, 11, R106. doi: 10.1186/gb-2010-11-10-r106, http://genomebiology.com/2010/11/10/R106/.
- Love MI, Huber W, Anders S (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome Biology, 15, 550. doi: 10.1186/s13059-014-0550-8.
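The fold change sign convention and the Benjamini and Hochberg adjustment described in the DESeq entry can be sketched in a few lines of plain Python. This is an illustration only, not the DESeq or DESeq2 implementation: it omits dispersion estimation, fold change shrinkage, and any filtering those packages apply, and the function names and example numbers are made up.

```python
import math
from typing import List


def log2_fold_change(c1_mean: float, c2_mean: float) -> float:
    """log2(c1/c2): positive means higher expression in condition 1."""
    return math.log2(c1_mean / c2_mean)


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical mean expression values: treatment (c1) vs control (c2).
up = log2_fold_change(8.0, 2.0)    # upregulated in c1
down = log2_fold_change(2.0, 8.0)  # downregulated in c1

# Hypothetical raw p-values for four genes.
p_values = [0.001, 0.02, 0.03, 0.5]
padj = benjamini_hochberg(p_values)
significant = [p < 0.10 for p in padj]  # 10% FDR cutoff
print(up, down, padj, significant)
```

Thresholding `padj` rather than the raw p-values controls the expected fraction of false discoveries among the genes called significant.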

https://github.com/datirium/workflows.git

Path: workflows/deseq.cwl

Branch/Commit ID: 7ae3b75bbe614e59cdeaba06047234a6c40c0fe9

workflow graph id_to_json_workflow.cwl

https://github.com/sfu-ireceptor/AIRR-seqAA.git

Path: cwl/id_to_json_workflow.cwl

Branch/Commit ID: 4b98ab2f65e2d4f68bd05d5719eadbbad14e94e1

workflow graph dfast-filelist-outputdir.cwl

https://github.com/nigyta/bact_genome.git

Path: cwl/workflow/dfast-filelist-outputdir.cwl

Branch/Commit ID: e316f37f502005165ebd7f22b5257900c7c712ac

workflow graph cond-wf-004_nojs.cwl

https://github.com/common-workflow-language/cwl-v1.2.git

Path: tests/conditionals/cond-wf-004_nojs.cwl

Branch/Commit ID: 707ebcd2173889604459c5f4ffb55173c508abb3