Workflow: CLIP-Seq pipeline for single-read experiment NNNNG
Cross-Linking ImmunoPrecipitation
=================================

`CLIP` (`cross-linking immunoprecipitation`) is a method used in molecular biology that combines UV cross-linking with immunoprecipitation in order to analyse protein interactions with RNA or to precisely locate RNA modifications (e.g. m6A). (Uhl|Houwaart|Corrado|Wright|Backofen|2017)(Ule|Jensen|Ruggiu|Mele|2003)(Sugimoto|König|Hussain|Zupan|2012)(Zhang|Darnell|2011)(Ke|Alemu|Mertens|Gantman|2015) CLIP-based techniques can be used to map RNA-binding protein binding sites or RNA modification sites (Ke|Alemu|Mertens|Gantman|2015)(Ke|Pandya-Jones|Saito|Fak|2017) of interest on a genome-wide scale, thereby increasing the understanding of post-transcriptional regulatory networks.

The identification of sites where RNA-binding proteins (RNABPs) interact with target RNAs opens the door to understanding the vast complexity of RNA regulation. UV cross-linking and immunoprecipitation (CLIP) is a transformative technology in which RNAs purified from _in vivo_ cross-linked RNA-protein complexes are sequenced to reveal footprints of RNABP:RNA contacts. CLIP combined with high-throughput sequencing (HITS-CLIP) is a generalizable strategy to produce transcriptome-wide maps of RNA binding with higher accuracy and resolution than standard RNA immunoprecipitation (RIP) profiling or purely computational approaches. The application of CLIP to Argonaute proteins has expanded the utility of this approach to mapping binding sites for microRNAs and other small regulatory RNAs. Finally, recent advances in data analysis take advantage of cross-link–induced mutation sites (CIMS) to refine RNA-binding maps to single-nucleotide resolution.

Once IP conditions are established, HITS-CLIP takes ~8 d to prepare RNA for sequencing. Established pipelines for data analysis, including those for CIMS, take 3–4 d.

Workflow
--------

CLIP begins with the _in vivo_ cross-linking of RNA-protein complexes using ultraviolet (UV) light. Upon UV exposure, covalent bonds are formed between proteins and nucleic acids that are in close proximity. (Darnell|2012) The cross-linked cells are then lysed, and the protein of interest is isolated via immunoprecipitation. To allow for sequence-specific priming of reverse transcription, RNA adapters are ligated to the 3' ends, while radiolabeled phosphates are transferred to the 5' ends of the RNA fragments. The RNA-protein complexes are then separated from free RNA using gel electrophoresis and membrane transfer. Proteinase K digestion is then performed to remove protein from the RNA-protein complexes. This step leaves a peptide at the cross-link site, allowing for the identification of the cross-linked nucleotide. (König|McGlincy|Ule|2012) After ligating RNA linkers to the RNA 5' ends, cDNA is synthesized via RT-PCR. High-throughput sequencing is then used to generate reads containing distinct barcodes that identify the last cDNA nucleotide. Interaction sites can be identified by mapping the reads back to the transcriptome.
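The idea that a mapped read points at the cross-linked nucleotide can be illustrated with a small sketch. It assumes, as a simplification not stated for this workflow, that each cDNA stopped at the cross-link, so the candidate site is the nucleotide immediately upstream of the read's 5' end. The `Alignment` type, the `crosslink_position` helper and the example coordinates are hypothetical; the workflow itself calls peaks with CLIPper and tag2peak.pl and does not run this code.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """A mapped read as a BED-style interval (0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def crosslink_position(aln: Alignment) -> int:
    """Return the 0-based coordinate one nucleotide upstream of the read's 5' end.

    Assumption (hypothetical for this workflow): the cDNA stopped at the
    cross-linked nucleotide, so the first sequenced base lies immediately
    downstream of the cross-link on the RNA strand.
    """
    if aln.strand == "+":
        # The 5' end of a plus-strand read is its start coordinate.
        return aln.start - 1
    if aln.strand == "-":
        # The 5' end of a minus-strand read is at end - 1; upstream is end.
        return aln.end
    raise ValueError(f"unknown strand: {aln.strand!r}")


# Made-up alignments for illustration only.
reads = [
    Alignment("chr1", 100, 135, "+"),
    Alignment("chr1", 100, 140, "+"),
    Alignment("chr1", 60, 90, "-"),
]
sites = [(r.chrom, crosslink_position(r), r.strand) for r in reads]
```

Reads that share a start coordinate collapse onto the same candidate site regardless of their length, which is why positions derived this way are usually counted per nucleotide before any peak calling.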
Inputs
ID | Type | Title | Doc |
---|---|---|---|
adapter | String (Optional) | Adapter sequence to be trimmed | Adapter sequence to be trimmed. If not specified explicitly, Trim Galore will try to auto-detect whether the Illumina universal, Nextera transposase or Illumina small RNA adapter sequence was used. Also see '--illumina', '--nextera' and '--small_rna'. If no adapter can be detected within the first 1 million sequences of the first file specified, Trim Galore defaults to '--illumina'. |
species | String (Optional) | Species string for clipper (hg38, mm10) | species: one of ce10 ce11 dm3 hg19 GRCh38 mm9 mm10 |
threads | Integer (Optional) | Number of threads | Number of threads for those steps that support multithreading |
bc_pattern | String (Optional) | Barcode pattern | |
fastq_file | File [FASTQ] | FASTQ input file | Reads data in FASTQ format, obtained from single-end sequencing |
clip_3p_end | Integer (Optional) | Clip from 3p end | Number of bases to clip from the 3p end |
clip_5p_end | Integer (Optional) | Clip from 5p end | Number of bases to clip from the 5p end |
exclude_chr | String (Optional) | Chromosome to be excluded in RPKM calculation | Chromosome to be excluded in RPKM calculation |
extract_method | | UMI extract method 'string' or 'regex' | How to extract the UMI +/- cell barcodes; choose from 'string' or 'regex' |
annotation_file | File [TSV] | Annotation file | Tab-separated annotation file |
chrom_length_file | File [Textual format] | Chromosomes length file | Chromosomes length file |
star_indices_folder | Directory | STAR indices folder | Path to STAR generated indices |
bowtie_indices_folder | Directory | BowTie Ribosomal Indices | Path to Bowtie generated indices |
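The `bc_pattern` and `extract_method` inputs are passed to UMI-tools. As a rough illustration of the 'string' method, the sketch below mimics how a pattern of `N` (UMI base), `C` (cell barcode base) and `X` (base left on the read) is applied to the start of a read. The function names, the example read and the pattern `NNNNX` are hypothetical and chosen only for illustration; this is not the UMI-tools implementation, and the pattern this workflow uses by default is not stated here.

```python
from typing import Tuple


def extract_umi(sequence: str, quality: str, bc_pattern: str) -> Tuple[str, str, str, str]:
    """Split a read into (umi, cell_barcode, remaining_sequence, remaining_quality).

    Pattern characters: N = UMI base, C = cell barcode base,
    X = base that stays on the read. The pattern is anchored at the 5' end.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    if len(sequence) < len(bc_pattern):
        raise ValueError("read is shorter than the barcode pattern")

    umi, cell, kept_seq, kept_qual = [], [], [], []
    for base, qual, code in zip(sequence, quality, bc_pattern):
        if code == "N":
            umi.append(base)
        elif code == "C":
            cell.append(base)
        elif code == "X":
            kept_seq.append(base)
            kept_qual.append(qual)
        else:
            raise ValueError(f"unsupported pattern character: {code!r}")

    tail = len(bc_pattern)
    return (
        "".join(umi),
        "".join(cell),
        "".join(kept_seq) + sequence[tail:],
        "".join(kept_qual) + quality[tail:],
    )


def tag_read_name(name: str, umi: str) -> str:
    """Append the UMI to the read identifier, before any space-separated comment."""
    head, sep, comment = name.partition(" ")
    return f"{head}_{umi}{sep}{comment}"


# Hypothetical read and pattern: four UMI bases followed by one retained base.
umi, cell, seq, qual = extract_umi("ACGTGTTTTCCC", "IIIIFFFFFFFF", "NNNNX")
new_name = tag_read_name("@READ1 1:N:0", umi)
```

Moving the UMI into the read name keeps it attached to the read through trimming and alignment, so a later deduplication step can recover it from the BAM record.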
Steps
ID | Runs | Label | Doc |
---|---|---|---|
clipper | ../tools/clipper.cwl (CommandLineTool) | | CLIPper is a tool to define peaks in your CLIP-seq dataset. CLIPper was developed in the Yeo Lab at the University of California, San Diego. Usage: `clipper --bam CLIP-seq_reads.srt.bam --species hg19 --outfile CLIP-seq_reads.srt.peaks.bed` |
bamtobed | ../tools/bedtools-bamtobed.cwl (CommandLineTool) | | Converts BAM to BED. Not all options are implemented. |
dedup_umi | ../tools/umi-tools-dedup.cwl (CommandLineTool) | | Deduplicates BAM files based on the first mapping coordinate and the UMI attached to the read. Only the -I, --paired and -S parameters are implemented. |
tagstopeak | ../tools/clip-toolkit-tag2peak.cwl (CommandLineTool) | | Detects peaks from CLIP data. Usage: `tag2peak.pl [options] <tag.bed> <peak.bed>`, where `<tag.bed>` is the input BED file of unique CLIP tags and `<peak.bed>` is the output BED file of called peaks. Options: `-big`: big input file; `-ss`: separate the two strands; `--valley-seeking`: find candidate peaks by valley seeking; `--valley-depth [float]`: depth of valley if valley seeking (0.9); `--out-boundary [string]`: output cluster boundaries; `--out-half-PH [string]`: output half peak height boundaries; `--dbkey [string]`: species to retrieve the default gene bed file (mm10\|hg19); `--gene [string]`: custom gene bed file for scan statistics (will override --dbkey); `--use-expr`: use expression levels given in the score column in the custom gene bed file for normalization; `-p [float]`: threshold of p-value to call peak (0.01); `--multi-test`: do Bonferroni multiple test correction; `-minPH [int]`: min peak height (2); `-maxPH [int]`: max peak height to calculate p-value (-1, no limit if < 0); `--skip-out-of-range-peaks`: skip peaks with PH > maxPH; `-gap [int]`: merge cluster peaks closer than the gap (-1, no merge if < 0); `--prefix [string]`: prefix of peak id (Peak); `-c [dir]`: cache dir; `--keep-cache`: keep cache when the job is done; `-v`: verbose |
trim_fastq | ../tools/trimgalore.cwl (CommandLineTool) | | Runs Trim Galore, a wrapper around Cutadapt and FastQC that consistently applies adapter and quality trimming to FASTQ files. |
extract_umi | ../tools/umi-tools-extract.cwl (CommandLineTool) | | Extracts the UMI barcode from a read and adds it to the read name, leaving any sample barcode in place. Can deal with paired-end reads and UMIs split across the paired ends. Can also optionally extract cell barcodes and append these to the read name. See the section below for an explanation of how to encode the barcode pattern(s) to specify the position of the UMI +/- cell barcode. |
star_aligner | ../tools/star-alignreads.cwl (CommandLineTool) | | Tool runs STAR alignReads. |
bam_to_bigwig | ../tools/bam-bedgraph-bigwig.cwl (Workflow) | | Workflow converts input BAM file into bigWig and bedGraph files. |
extract_fastq | ../tools/extract-fastq.cwl (CommandLineTool) | | Tool to decompress input FASTQ file(s). If several FASTQ files are provided, they are concatenated in the order in which they appear in the input. The Bash script's logic: (1) disable the case-sensitive glob check; (2) check whether the root name of the input file already includes a '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to "", otherwise use '.fastq'; (3) check the file type and decompress if needed; (4) return 1 if the file type is not recognized. The script also works if the input file has no extension at all. |
island_intersect | ../tools/iaintersect.cwl (CommandLineTool) | | Tool assigns each peak obtained from MACS2 to a gene and region (upstream, promoter, exon, intron, intergenic) |
samtools_sort_index1 | ../tools/samtools-sort-index.cwl (CommandLineTool) | | Tool to sort and index input BAM/SAM/CRAM. If input `trigger` is set to `true` or isn't set at all (`true` is used by default), run `samtools sort` and `samtools index`, and return the sorted BAM and BAI/CSI index file. If input `trigger` is set to `false`, return the unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files, previously staged into the output directory. |
samtools_sort_index2 | ../tools/samtools-sort-index.cwl (CommandLineTool) | | Tool to sort and index input BAM/SAM/CRAM. If input `trigger` is set to `true` or isn't set at all (`true` is used by default), run `samtools sort` and `samtools index`, and return the sorted BAM and BAI/CSI index file. If input `trigger` is set to `false`, return the unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files, previously staged into the output directory. |
ribosomal_bowtie_aligner | ../tools/bowtie-alignreads.cwl (CommandLineTool) | | Tool maps input raw reads files to reference genome using Bowtie. |
fastx_quality_stats_after | ../tools/fastx-quality-stats.cwl (CommandLineTool) | | Tool calculates statistics based on FASTQ file quality scores. If `output_filename` is not provided, the function `default_output_filename` is called to return a default output file name, generated as the `input_file` basename + `.fastxstat` extension. |
stats_and_transformations | clipseq-se.cwl#stats_and_transformations/25a21e16-f9ef-4e9f-92cb-f1d3383c2dcd (CommandLineTool) | | |
tagstopeak_transformations | clipseq-se.cwl#tagstopeak_transformations/a9bd962c-4370-4f8b-a2a2-29bb70cf51f0 (CommandLineTool) | | |
fastx_quality_stats_original | ../tools/fastx-quality-stats.cwl (CommandLineTool) | | Tool calculates statistics based on FASTQ file quality scores. If `output_filename` is not provided, the function `default_output_filename` is called to return a default output file name, generated as the `input_file` basename + `.fastxstat` extension. |
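The `dedup_umi` step is described as collapsing reads that share a first mapping coordinate and a UMI. A minimal sketch of that idea, using exact UMI matching only, could look like the following. The `Read` type, its fields and the example records are hypothetical, and the UMI is assumed to be the last underscore-separated token of the read name. UMI-tools itself defaults to a more elaborate "directional" method that also merges UMIs likely to differ by sequencing errors, which this sketch does not attempt.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Minimal stand-in for an aligned read (hypothetical fields)."""
    name: str    # read name with the UMI appended after the last underscore
    chrom: str
    pos: int     # first mapping coordinate of the read
    strand: str  # "+" or "-"
    mapq: int    # mapping quality


def umi_of(read: Read) -> str:
    """Recover the UMI from the read name (assumed format: <id>_<UMI>)."""
    return read.name.rsplit("_", 1)[-1]


def dedup_exact(reads: Iterable[Read]) -> List[Read]:
    """Keep one read per (chrom, strand, pos, UMI); prefer the highest MAPQ."""
    best: Dict[Tuple[str, str, int, str], Read] = {}
    for read in reads:
        key = (read.chrom, read.strand, read.pos, umi_of(read))
        kept = best.get(key)
        if kept is None or read.mapq > kept.mapq:
            best[key] = read
    # Dicts preserve insertion order, so groups come out in first-seen order.
    return list(best.values())


# Made-up reads: r1 and r2 are duplicates, r3 differs by UMI, r4 by strand.
example = [
    Read("r1_ACGT", "chr1", 100, "+", 30),
    Read("r2_ACGT", "chr1", 100, "+", 60),
    Read("r3_TTTT", "chr1", 100, "+", 30),
    Read("r4_ACGT", "chr1", 100, "-", 30),
]
unique_reads = dedup_exact(example)
```

Grouping on strand as well as position keeps reads from opposite strands apart, which matters for CLIP data because binding sites are strand-specific.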
Outputs
ID | Type | Label | Doc |
---|---|---|---|
bigwig | File [bigWig] | BigWig file | Generated BigWig file |
dedup_log | File | deduped CLIP log file | deduped CLIP log file |
error_log | File | clipped error log file | clipped error log file |
peaks_bed | File | | |
output_bed | File | | |
atdp_result | File [TSV] | Fake ATDP results for BioWardrobe | Average Tag Density generated results |
bambai_pair | File [BAM] | Deduped BAM alignment file | Coordinate-sorted BAM file with its BAI index file |
clipper_bed | File | | |
extract_log | File | clipped extract log file | clipped extract log file |
star_sj_log | File (Optional) [Textual format] | STAR sj log | STAR SJ.out.tab |
trim_report | File [Textual format] | Trim report | TrimGalore generated log |
dedup_output | File | deduped CLIP file | |
get_stat_log | File (Optional) [Textual format] | Old Bowtie, STAR and GEEP combined log | Processed and combined Bowtie & STAR aligner and GEEP logs |
star_out_log | File (Optional) [Textual format] | STAR log out | STAR Log.out |
clipper_pickle | File | | |
star_final_log | File [Textual format] | STAR final log | STAR Log.final.out |
dedup_error_log | File | deduped CLIP error log file | deduped CLIP error log file |
star_stdout_log | File (Optional) [Textual format] | STAR stdout log | STAR Log.std.out |
star_progress_log | File (Optional) [Textual format] | STAR progress log | STAR Log.progress.out |
transformed_peaks | File [TSV] | Transformed peaks (mimics MACS2) | |
iaintersect_result | File [TSV] | Island intersect results | Iaintersect generated results |
get_formatted_stats | File (Optional) [Textual format] | Bowtie, STAR and GEEP mapping stats | Processed and combined Bowtie & STAR aligner and GEEP logs |
rebosomal_bowtie_log | File [Textual format] | Bowtie alignment log | Bowtie alignment log file |
fastx_statistics_after | File [Textual format] | FASTQ statistics | fastx_quality_stats generated FASTQ file quality statistics file |
fastx_statistics_original | File [Textual format] | FASTQ statistics | fastx_quality_stats generated FASTQ file quality statistics file |
https://w3id.org/cwl/view/git/7fb8a1ebf8145791440bc2fed9c5f2d78a19d04c/workflows/clipseq-se.cwl