Explore Workflows

View already parsed workflows here or click here to add your own

Graph Name Retrieved From View
workflow graph repliseq-parta.cwl

https://github.com/4dn-dcic/pipelines-cwl.git

Path: cwl_awsem_v1/repliseq/repliseq-parta.cwl

Branch/Commit ID: dev2

workflow graph GATK Co-Cleaning Workflow

The PCAWG GATK Co-cleaning workflow was developed by the Broad Institute (https://www.broadinstitute.org). It consists of two pre-processing steps for tumor/normal BAM files: indel realignment and base quality score recalibration (BQSR). The workflow has been dockerized and packaged using CWL; the source code is available on GitHub at https://github.com/ICGC-TCGA-PanCancer/pcawg-gatk-cocleaning.

## Run the workflow with your own data

### Prepare compute environment and install software packages

The workflow has been tested in an Ubuntu 16.04 Linux environment with the following hardware and software settings.

#### Hardware requirement (assuming 30X coverage whole genome sequence)

- CPU core: 16
- Memory: 64GB
- Disk space: 1TB

#### Software installation

- Docker (1.12.6): follow instructions to install Docker https://docs.docker.com/engine/installation
- CWL tool

```
pip install cwltool==1.0.20170217172322
```

### Prepare input data

#### Input aligned tumor / normal BAM files

The workflow uses a pair of aligned BAM files as input: one BAM for tumor, the other for normal, both from the same donor. Here we assume the file names are *tumor_sample.bam* and *normal_sample.bam*, and that they are under the *bams* subfolder.

#### Reference data files

The workflow also uses the following files as reference. They can be downloaded from the ICGC Data Portal:

- Under https://dcc.icgc.org/releases/PCAWG/reference_data/pcawg-bwa-mem
  - genome.fa.gz
  - genome.dict
- Under https://dcc.icgc.org/releases/PCAWG/reference_data/pcawg-gatk-cocleaning
  - 1000G_phase1.indels.hg19.sites.fixed.vcf.gz
  - Mills_and_1000G_gold_standard.indels.hg19.sites.fixed.vcf.gz
  - dbsnp_132_b37.leftAligned.vcf.gz

We assume the reference files are under the *reference* subfolder.

#### Job JSON file for CWL

Finally, we need to prepare a JSON file that specifies the input and reference files. Please replace the *tumor_bam* and *normal_bam* parameters with your real BAM files.

Name the JSON file: *pcawg-gatk-cocleaning.job.json*

```
{
  "tumor_bam": {
    "class": "File",
    "location": "bams/tumor_sample.bam"
  },
  "normal_bam": {
    "class": "File",
    "location": "bams/normal_sample.bam"
  },
  "reference": {
    "class": "File",
    "location": "reference/genome.fa"
  },
  "knownIndels": [
    {
      "class": "File",
      "location": "reference/1000G_phase1.indels.hg19.sites.fixed.vcf.gz"
    },
    {
      "class": "File",
      "location": "reference/Mills_and_1000G_gold_standard.indels.hg19.sites.fixed.vcf.gz"
    }
  ],
  "knownSites": [
    {
      "class": "File",
      "location": "reference/dbsnp_132_b37.leftAligned.vcf.gz"
    }
  ]
}
```

### Run the workflow

#### Option 1: Run with CWL tool

- Download CWL workflow definition files

```
wget -O pcawg-gatk-cocleaning-0.1.1.tar.gz https://github.com/ICGC-TCGA-PanCancer/pcawg-gatk-cocleaning/archive/0.1.1.tar.gz
tar xvf pcawg-gatk-cocleaning-0.1.1.tar.gz
```

- Run `cwltool` to execute the workflow

```
nohup cwltool --debug --non-strict pcawg-gatk-cocleaning-0.1.1/gatk-cocleaning-workflow.cwl pcawg-gatk-cocleaning.job.json > pcawg-gatk-cocleaning.log 2>&1 &
```

#### Option 2: Run with the Dockstore CLI

See the *Launch with* section below for details.
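The job file described above could also be generated programmatically, which may help when running the workflow on many tumor/normal pairs. The following is a minimal sketch, not part of the workflow's own tooling: the helper names `cwl_file` and `build_job` are hypothetical, and the paths assume the *bams* and *reference* subfolder layout used in the description.

```python
import json
from pathlib import PurePosixPath


def cwl_file(path):
    """Return a CWL File object (as a plain dict) for the given path."""
    return {"class": "File", "location": str(path)}


def build_job(tumor_bam, normal_bam, reference_dir="reference"):
    """Build the job dictionary with the same keys as the JSON shown in the description.

    Hypothetical helper: reference file names are taken from the description,
    and `reference_dir` is assumed to contain all of them.
    """
    ref = PurePosixPath(reference_dir)
    return {
        "tumor_bam": cwl_file(tumor_bam),
        "normal_bam": cwl_file(normal_bam),
        "reference": cwl_file(ref / "genome.fa"),
        "knownIndels": [
            cwl_file(ref / "1000G_phase1.indels.hg19.sites.fixed.vcf.gz"),
            cwl_file(ref / "Mills_and_1000G_gold_standard.indels.hg19.sites.fixed.vcf.gz"),
        ],
        "knownSites": [
            cwl_file(ref / "dbsnp_132_b37.leftAligned.vcf.gz"),
        ],
    }


# Build the job for the example file names and serialize it to JSON text.
job = build_job("bams/tumor_sample.bam", "bams/normal_sample.bam")
job_json = json.dumps(job, indent=2)
```

Writing `job_json` to *pcawg-gatk-cocleaning.job.json* would then give a file that can be passed to `cwltool` as in Option 1.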

https://github.com/ICGC-TCGA-PanCancer/pcawg-gatk-cocleaning.git

Path: gatk-cocleaning-workflow.cwl

Branch/Commit ID: 0.1.1

workflow graph Gathered Downsample and HaplotypeCaller

https://github.com/tmooney/cancer-genomics-workflow.git

Path: definitions/pipelines/gathered_downsample_and_recall.cwl

Branch/Commit ID: downsample_and_recall

workflow graph exomeseq-gatk4-01-preprocessing.cwl

https://github.com/bespin-workflows/exomeseq-gatk4.git

Path: subworkflows/exomeseq-gatk4-01-preprocessing.cwl

Branch/Commit ID: v2.0.3

workflow graph samtools_view_sam2bam

https://gitlab.bsc.es/lrodrig1/structuralvariants_poc.git

Path: structuralvariants/cwl/subworkflows/samtools_view_sam2bam.cwl

Branch/Commit ID: 1.0.7

workflow graph canine_annotation_module.cwl

https://github.com/d3b-center/canine-dev.git

Path: subworkflows/canine_annotation_module.cwl

Branch/Commit ID: master

workflow graph samples_fillout_index_batch_workflow.cwl

Wrapper to run BAM indexing on all .bam files before submitting them for samples fillout. Also includes steps to pre-filter some .maf input files.

NOTE: each sample in a sample_group must have a .bam file, and there must be a minimum of one .maf file among the samples in the same sample_group. In other words, every sample in the sample_group requires a .bam, but a .maf is optional as long as at least one sample in the group has one. This also means that a singleton sample group (a sample group with only one sample) MUST include a .maf file; singletons cannot lack a .maf.

NOTE: all .maf files must be valid; at a minimum they must have a header and at least one variant. If a sample has no variants in its .maf file, or has an empty .maf file, then it should NOT have a maf_file entry associated with it.
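The sample_group rules above can be checked before submitting a job. The sketch below is only an illustration under stated assumptions: it represents each sample as a dict with hypothetical `sample_id` and `bam_file` keys plus an optional `maf_file` key, which may not match the workflow's actual input schema. It does not open the .maf files, so it cannot verify that they contain a header and at least one variant.

```python
def validate_sample_groups(sample_groups):
    """Return a list of error messages; an empty list means the rules are met.

    Assumed (hypothetical) input shape: a list of groups, where each group is a
    list of dicts with "sample_id", "bam_file", and optionally "maf_file".
    """
    errors = []
    for index, group in enumerate(sample_groups):
        # Rule 1: every sample in the group needs a .bam file.
        for sample in group:
            if not sample.get("bam_file"):
                errors.append(
                    f"group {index}: sample {sample.get('sample_id')} has no .bam file"
                )
        # Rule 2: at least one sample in the group needs a .maf file.
        # For a singleton group this means its only sample must have one.
        if not any(sample.get("maf_file") for sample in group):
            errors.append(f"group {index}: no sample has a .maf file")
    return errors


# Example: the first group is valid, the second is a singleton without a .maf.
example_groups = [
    [
        {"sample_id": "S1", "bam_file": "S1.bam", "maf_file": "S1.maf"},
        {"sample_id": "S2", "bam_file": "S2.bam"},
    ],
    [
        {"sample_id": "S3", "bam_file": "S3.bam"},
    ],
]
example_errors = validate_sample_groups(example_groups)
```

Returning all errors at once, rather than raising on the first one, lets a caller report every problem in a sample sheet in a single pass.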

https://github.com/mskcc/pluto-cwl.git

Path: cwl/samples_fillout_index_batch_workflow.cwl

Branch/Commit ID: master

workflow graph bulk-atac-seq-pipeline.cwl

https://github.com/hubmapconsortium/sc-atac-seq-pipeline.git

Path: bulk-atac-seq-pipeline.cwl

Branch/Commit ID: 302f1f3

workflow graph functional analysis prediction with InterProScan

https://github.com/EBI-Metagenomics/ebi-metagenomics-cwl.git

Path: workflows/functional_analysis.cwl

Branch/Commit ID: 3f85843

workflow graph TransDecoder 2 step workflow, running TransDecoder.LongOrfs (step 1) followed by TransDecoder.Predict (step2)

https://github.com/stain/workflow-is-cwl.git

Path: workflows/TransDecoder-v5-wf-2steps.cwl

Branch/Commit ID: avoid-spaces