Workflow: RNA-seq (VCF) allele-specific pipeline for paired-end data

Fetched 2023-01-08 22:11:27 GMT

Allele-specific RNA-Seq (using VCF) paired-end workflow

Inputs

ID | Type | Title | Doc
strain1 | String | I strain name | First strain name
strain2 | String | II strain name | Second strain name
threads | Integer (Optional) | Number of threads | Number of threads for the steps that support multithreading
chrom_length_file | File [Textual format] | Chromosome length file for reference genome | Chromosome length file for the reference genome
strain1_chain_file | File [Textual format] | I strain chain file | Chain file to project strain I to the reference genome
strain2_chain_file | File [Textual format] | II strain chain file | Chain file to project strain II to the reference genome
fastq_file_upstream | File [FASTQ] | FASTQ upstream input file | Upstream reads in FASTQ format, obtained from paired-end sequencing
star_indices_folder | Directory | STAR indices folder for reference genome | Path to STAR-generated indices for the reference genome
fastq_file_downstream | File [FASTQ] | FASTQ downstream input file | Downstream reads in FASTQ format, obtained from paired-end sequencing
insilico_star_indices_folder | Directory | STAR indices folder for insilico genome | Path to STAR-generated indices for the insilico genome
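The `strain1_chain_file` and `strain2_chain_file` inputs are described only as chain files that project each strain onto the reference genome. Assuming they use the UCSC chain format and follow the liftOver convention (the chain maps from the target `tName` side to the query `qName` side, so the strain would be the target and the reference the query; this direction is an assumption, not stated by the workflow), projecting a single 0-based position can be sketched as below. `parse_chains` and `project` are hypothetical helpers, and reverse-strand query chains are ignored for brevity.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class Chain:
    """One chain: selected header fields plus its ungapped alignment blocks."""
    t_name: str
    t_start: int
    q_name: str
    q_strand: str
    q_start: int
    # Each entry is (size, dt, dq); the last block of a chain has dt = dq = 0.
    blocks: List[Tuple[int, int, int]] = field(default_factory=list)


def parse_chains(lines: Iterable[str]) -> List[Chain]:
    """Parse chain-format text (hypothetical helper, assumes well-formed input)."""
    chains: List[Chain] = []
    current: Optional[Chain] = None
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # blank lines separate chains
        if parts[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            current = Chain(t_name=parts[2], t_start=int(parts[5]),
                            q_name=parts[7], q_strand=parts[9], q_start=int(parts[10]))
            chains.append(current)
        elif current is not None:
            size = int(parts[0])
            dt = int(parts[1]) if len(parts) == 3 else 0
            dq = int(parts[2]) if len(parts) == 3 else 0
            current.blocks.append((size, dt, dq))
    return chains


def project(chains: List[Chain], chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 0-based target position to the query side; None if it falls in a gap."""
    for chain in chains:
        if chain.t_name != chrom or chain.q_strand != "+":
            continue  # this sketch skips reverse-strand query chains
        t, q = chain.t_start, chain.q_start
        for size, dt, dq in chain.blocks:
            if t <= pos < t + size:
                return chain.q_name, q + (pos - t)
            t += size + dt  # jump over the gap on the target side
            q += size + dq  # and the matching gap on the query side
    return None
```

For a chain whose target carries a 2 bp insertion after position 50, positions before the insertion map unchanged, positions inside it return `None`, and later positions shift by 2.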

Steps

ID Runs Label Doc
extract_fastq_upstream
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress an input FASTQ file. Logic of the Bash script:
- disable case-sensitive glob checks
- check whether the root name of the input file already includes a '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to ""
- check the file type and decompress if needed
- return 1 if the file type is not recognized
This script also works if the input file doesn't have any extension at all.

extract_fastq_downstream
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress an input FASTQ file. Logic of the Bash script:
- disable case-sensitive glob checks
- check whether the root name of the input file already includes a '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to ""
- check the file type and decompress if needed
- return 1 if the file type is not recognized
This script also works if the input file doesn't have any extension at all.

allele_vcf_alignreads_se_pe

Workflow maps FASTQ files from the `fastq_files` input to the reference genome (`reference_star_indices_folder`) and to the insilico-generated genome (`insilico_star_indices_folder`), which concatenates the genomes of both the `strain1` and `strain2` strains. For both genomes, STAR is run with the `outFilterMultimapNmax` parameter set to 1 to discard all multimapped reads. For the insilico genome, a SAM file is generated; it is then split into two SAM files based on strain names, and each is sorted by coordinate into BAM format. For the reference genome, the output BAM file from the STAR alignment is also coordinate sorted.
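The decompression logic listed for `extract-fastq.cwl` can be approximated in Python. This is not the tool's Bash script: the magic-byte check below stands in for whatever file-type detection the script actually uses, only gzip, bzip2 and plain FASTQ are handled, and the names `output_name` and `extract_fastq` as well as the `.fastq` default extension are assumptions.

```python
import bz2
import gzip
import re
import shutil
from pathlib import Path

# Leading "magic" bytes used here to recognize the file type.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def output_name(path: Path) -> str:
    """Drop a compression suffix, then add '.fastq' unless the root already ends in .fastq/.fq."""
    root = re.sub(r"\.(gz|bz2)$", "", path.name, flags=re.IGNORECASE)
    # Case-insensitive match, mirroring the script's disabled case-sensitive glob.
    default_ext = "" if re.search(r"\.(fastq|fq)$", root, flags=re.IGNORECASE) else ".fastq"
    return root + default_ext


def extract_fastq(path: Path, out_dir: Path) -> int:
    """Write a decompressed copy of `path` into `out_dir`; return 1 for unknown types.

    `out_dir` should differ from the input's directory, otherwise an
    uncompressed input would be overwritten by its own copy.
    """
    with path.open("rb") as handle:
        head = handle.read(3)
    if head.startswith(GZIP_MAGIC):
        opener = gzip.open
    elif head.startswith(BZIP2_MAGIC):
        opener = bz2.open
    elif head.startswith(b"@"):
        opener = open  # plain FASTQ: records start with '@'
    else:
        return 1  # file type not recognized
    with opener(path, "rb") as src, (out_dir / output_name(path)).open("wb") as dst:
        shutil.copyfileobj(src, dst)
    return 0
```

With this naming rule, `reads.FQ.GZ` becomes `reads.FQ`, while `sample.gz` and an extensionless `sample` both become `sample.fastq`.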
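To make the unique-mapping setting concrete, the sketch below builds a STAR command line for one paired-end run using real STAR options (`--runThreadN`, `--genomeDir`, `--readFilesIn`, `--outFilterMultimapNmax`, `--outFileNamePrefix`, `--outSAMtype`). It does not reproduce the workflow's exact command: the function name `star_args` is hypothetical, and writing unsorted BAM for the reference run (with sorting done in a later step) is an assumption.

```python
from typing import List


def star_args(genome_dir: str, fastq_1: str, fastq_2: str, prefix: str,
              threads: int = 1, sam_output: bool = False) -> List[str]:
    """Build a STAR command line that keeps only uniquely mapped reads."""
    args = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_1, fastq_2,  # paired-end: upstream then downstream
        "--outFilterMultimapNmax", "1",      # drop reads mapping to more than one locus
        "--outFileNamePrefix", prefix,
    ]
    # SAM for the insilico genome (split by strain afterwards); unsorted BAM
    # otherwise, leaving coordinate sorting to a later step (assumption).
    args += ["--outSAMtype", "SAM"] if sam_output else ["--outSAMtype", "BAM", "Unsorted"]
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)`; building it as a list avoids shell-quoting issues with file paths.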
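The strain-splitting step is described only as splitting the insilico SAM file "based on strain names". A minimal Python sketch, assuming the insilico genome's chromosome names end in `_<strain>` (for example `chr1_strainA`; this naming scheme is hypothetical), routes each record by its RNAME column and strips the suffix so that both outputs use reference chromosome names. The function `split_sam_by_strain` is illustrative, and the subsequent coordinate sort into BAM is left out.

```python
from typing import Dict, Iterable, List


def split_sam_by_strain(lines: Iterable[str], strains: List[str]) -> Dict[str, List[str]]:
    """Split insilico-genome SAM lines into one list of SAM lines per strain."""
    out: Dict[str, List[str]] = {s: [] for s in strains}

    def owner(name: str) -> str:
        # Strain whose '_<strain>' suffix ends this reference name, or '' if none.
        for s in strains:
            if name.endswith("_" + s):
                return s
        return ""

    def strip(name: str, strain: str) -> str:
        return name[: -len("_" + strain)]

    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@SQ"):
            fields = line.split("\t")
            sn = next(f for f in fields if f.startswith("SN:"))[3:]
            s = owner(sn)
            if s:  # keep each @SQ only in its own strain's header, renamed
                out[s].append("\t".join(
                    f"SN:{strip(sn, s)}" if f.startswith("SN:") else f for f in fields))
        elif line.startswith("@"):
            for s in strains:  # other header lines (@HD, @PG, ...) go to every output
                out[s].append(line)
        else:
            fields = line.split("\t")
            s = owner(fields[2])  # RNAME column
            if not s:
                continue  # unmapped ('*') or unrecognized reference name
            fields[2] = strip(fields[2], s)
            if owner(fields[6]) == s:
                fields[6] = strip(fields[6], s)  # RNEXT column, unless '=' or '*'
            out[s].append("\t".join(fields))
    return out
```

Because STAR was run with `outFilterMultimapNmax` set to 1, each retained read aligns to a single locus, so every record lands in exactly one of the two outputs.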

Outputs

ID | Type | Label | Doc
strain1_bigwig | File [bigWig] | I strain bigWig file | Generated bigWig file for the first strain
strain2_bigwig | File [bigWig] | II strain bigWig file | Generated bigWig file for the second strain
reference_bigwig | File [bigWig] | Reference bigWig file | Generated bigWig file for the reference genome
strain1_bambai_pair | File [BAM] | Strain I coordinate sorted BAM alignment file (+index BAI) | Coordinate-sorted BAM file and BAI index file for strain I
strain2_bambai_pair | File [BAM] | Strain II coordinate sorted BAM alignment file (+index BAI) | Coordinate-sorted BAM file and BAI index file for strain II
insilico_star_out_log | File (Optional) [Textual format] | STAR log out for insilico genome | STAR Log.out for the insilico genome
reference_bambai_pair | File [BAM] | Reference coordinate sorted BAM alignment file (+index BAI) | Coordinate-sorted BAM file and BAI index file for the reference genome
reference_star_out_log | File (Optional) [Textual format] | STAR log out for reference genome | STAR Log.out for the reference genome
insilico_star_final_log | File [Textual format] | STAR final log for insilico genome | STAR Log.final.out for the insilico genome
insilico_star_stdout_log | File (Optional) [Textual format] | STAR stdout log for insilico genome | STAR Log.std.out for the insilico genome
reference_star_final_log | File [Textual format] | STAR final log for reference genome | STAR Log.final.out for the reference genome
reference_star_stdout_log | File (Optional) [Textual format] | STAR stdout log for reference genome | STAR Log.std.out for the reference genome
insilico_star_progress_log | File (Optional) [Textual format] | STAR progress log for insilico genome | STAR Log.progress.out for the insilico genome
reference_star_progress_log | File (Optional) [Textual format] | STAR progress log for reference genome | STAR Log.progress.out for the reference genome
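The `insilico_star_final_log` and `reference_star_final_log` outputs are STAR `Log.final.out` summaries, so comparing fields such as the uniquely mapped read percentage between the two runs can serve as a quick sanity check. A minimal sketch, assuming the usual `key |<TAB>value` layout of that file; the helper names `parse_star_final_log` and `percent` are hypothetical.

```python
from typing import Dict, Iterable


def parse_star_final_log(lines: Iterable[str]) -> Dict[str, str]:
    """Collect 'key | value' pairs from a STAR Log.final.out file."""
    stats: Dict[str, str] = {}
    for line in lines:
        if "|" not in line:
            continue  # blank lines and section headers such as 'UNIQUE READS:'
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats: Dict[str, str], key: str) -> float:
    """Convert a percentage field such as '90.00%' to a float."""
    return float(stats[key].rstrip("%"))
```

Keeping values as strings in the parsed dictionary avoids guessing each field's type; conversion happens only for the fields a caller actually needs.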

Permalink: https://w3id.org/cwl/view/git/7518b100d8cbc80c8be32e9e939dfbb27d6b4361/workflows/allele-vcf-rnaseq-pe.cwl