nf-core/methylseq
Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel
Version 2.7.0. The latest stable release is 2.7.1.
Define where the pipeline should find input data and save output data.
Path to comma-separated file containing information about the samples in the experiment.
string
^\S+\.csv$
You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See usage docs.
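As a rough illustration of the path pattern quoted above, the following Python sketch checks candidate sample sheet paths before launching a run. The function name and the example file names are hypothetical; only the regular expression comes from the parameter description.

```python
import re

# Pattern quoted from the parameter description above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path contains no whitespace and ends in .csv."""
    return SAMPLESHEET_PATTERN.match(path) is not None


# Hypothetical example paths.
for candidate in ["samplesheet.csv", "my samples.csv", "samplesheet.tsv"]:
    print(candidate, is_valid_samplesheet_path(candidate))
```

Note that the pattern rejects any path containing whitespace, so directories with spaces in their names would need to be renamed or avoided.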
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
string
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
string
Options for saving a variety of intermediate files
Save reference(s) to results directory
boolean
Save aligned intermediates to results directory
boolean
Bismark only - Save unmapped reads to FastQ files
boolean
Use the --unmapped pipeline flag to pass the --unmapped flag to Bismark align and save the unmapped reads to FastQ files.
Save trimmed reads to results directory.
boolean
By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete.
Options for the reference genome indices used to align reads.
Name of iGenomes reference.
string
If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files, e.g. --genome GRCh38.
See the nf-core website docs for more details.
Path to FASTA genome file
string
^\S+\.fn?a(sta)?(\.gz)?$
If you have no genome reference available, the pipeline can build one using a FASTA file. This requires additional time and resources, so it's better to use a pre-built index if possible. You can use the command line option --save_reference to keep the generated references so that they can be added to your config and used again in the future. If aligner is Bismark and bismark_index is specified, this parameter is ignored.
Path to Fasta index file.
string
^\S+\.fn?a(sta)?.fai$
The FASTA index file (.fa.fai) is only needed when using the bwa-meth aligner. It is used by MethylDackel. If using Bismark, this parameter is ignored.
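To see which file names the two patterns above accept, here is a small Python sketch. The helper name and the example file names are illustrative assumptions; the patterns are copied as quoted.

```python
import re

# Patterns quoted from the FASTA and FASTA index parameters above.
FASTA_PATTERN = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")
FAI_PATTERN = re.compile(r"^\S+\.fn?a(sta)?.fai$")


def accepts(pattern: re.Pattern, path: str) -> bool:
    """Return True if the whole path matches the given pattern."""
    return pattern.match(path) is not None


# Hypothetical file names: .fa, .fna and .fasta are accepted, optionally gzipped.
for name in ["genome.fa", "genome.fna.gz", "genome.fasta", "genome.fastq"]:
    print(name, accepts(FASTA_PATTERN, name))

# The dot before "fai" is unescaped in the quoted pattern, so it matches
# any single character, not only a literal ".".
for name in ["genome.fa.fai", "genome.fa_fai"]:
    print(name, accepts(FAI_PATTERN, name))
```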
Path to a directory containing a Bismark reference index.
string
bwameth index filename base
string
Directory for a bwa-meth genome reference index. Only used when using the bwa-meth aligner.
Note that this is not a complete path, but the directory containing the reference. For example, if you have file paths such as /path/to/ref/genome.fa.bwameth.c2t.bwt, you should specify /path/to/ref/.
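The directory rule above can be sketched in Python: given the path of one index file, the value to supply is its parent directory. The function name is a hypothetical helper, not part of the pipeline.

```python
from pathlib import PurePosixPath


def bwameth_index_dir(index_file: str) -> str:
    """Return the directory containing a bwa-meth index file, with a trailing slash."""
    return str(PurePosixPath(index_file).parent) + "/"


# Example path taken from the description above.
print(bwameth_index_dir("/path/to/ref/genome.fa.bwameth.c2t.bwt"))
```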
Do not load the iGenomes reference config.
boolean
Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config.
The base path to the igenomes reference files
string
s3://ngi-igenomes/igenomes/
Alignment tool to use.
string
The nf-core/methylseq package is actually two pipelines in one. The default workflow uses Bismark with Bowtie2 as alignment tool: unless specified otherwise, nf-core/methylseq will run this pipeline.
Since Bismark v0.21.0 it is also possible to use HISAT2 as the alignment tool. To run this workflow, invoke the pipeline with the command line flag --aligner bismark_hisat. HISAT2 also supports splice-aware alignment if analysis of RNA is desired (e.g. SLAMseq experiments). In that case, a file containing a list of known splice sites can be provided with --known_splices.
The second workflow uses BWA-Meth and MethylDackel instead of Bismark. To run this workflow, run the pipeline with the command line flag --aligner bwameth.
Output information for all cytosine contexts.
boolean
By default, the pipeline only produces data for cytosine methylation states in CpG context. Specifying --comprehensive makes the pipeline give results for all cytosine contexts. Note that for large genomes (e.g. Human), these can be massive files. This is only recommended for small genomes (especially those that don't exhibit strong CpG context methylation specificity).
If specified, this flag instructs the Bismark methylation extractor to use the --comprehensive and --merge_non_CpG flags. This produces coverage files with information about all strands and cytosine contexts merged into two files: one for CpG context and one for non-CpG context.
If using the bwa-meth workflow, the flag makes MethylDackel report CHG and CHH contexts as well.
Presets for working with specific bisulfite library preparation methods.
Preset for working with PBAT libraries.
boolean
Specify this parameter when working with PBAT (Post Bisulfite Adapter Tagging) libraries.
Using this parameter sets the --pbat flag when aligning with Bismark. This tells Bismark to align complementary strands (the opposite of --directional).
Additionally, this is a trimming preset equivalent to --clip_r1 6 --clip_r2 9 --three_prime_clip_r1 6 --three_prime_clip_r2 9.
Turn on if dealing with MspI digested material.
boolean
Use this parameter when working with RRBS (Reduced Representation Bisulfite Sequencing) data that has been digested using MspI.
Specifying --rrbs will pass on the --rrbs parameter to TrimGalore! See the TrimGalore! documentation to read more about the effects of this option.
This parameter also makes the pipeline skip the deduplication step.
Run bismark in SLAM-seq mode.
boolean
Specify to run Bismark with the --slam flag, which runs Bismark in SLAM-seq mode.
NB: This only works when using the bismark_hisat aligner (--aligner bismark_hisat).
Preset for EM-seq libraries.
boolean
Equivalent to --clip_r1 10 --clip_r2 10 --three_prime_clip_r1 10 --three_prime_clip_r2 10.
Also sets the --maxins flag to 1000 for Bismark.
Trimming preset for single-cell bisulfite libraries.
boolean
Equivalent to --clip_r1 6 --clip_r2 6 --three_prime_clip_r1 6 --three_prime_clip_r2 6.
Also sets the --non_directional flag for Bismark.
Trimming preset for the Accel kit.
boolean
Equivalent to --clip_r1 10 --clip_r2 15 --three_prime_clip_r1 10 --three_prime_clip_r2 10.
Trimming preset for the CEGX bisulfite kit.
boolean
Equivalent to --clip_r1 6 --clip_r2 6 --three_prime_clip_r1 2 --three_prime_clip_r2 2.
Trimming preset for the Epignome kit.
boolean
Equivalent to --clip_r1 8 --clip_r2 8 --three_prime_clip_r1 8 --three_prime_clip_r2 8.
Trimming preset for the Zymo kit.
boolean
Equivalent to --clip_r1 10 --clip_r2 10 --three_prime_clip_r1 10 --three_prime_clip_r2 10.
Also sets the --non_directional flag for Bismark.
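The trimming presets listed above can be summarised as a lookup table. The Python sketch below collects the clip values stated for each preset; the dictionary keys are descriptive labels chosen for this example and are not necessarily the pipeline's parameter names.

```python
# Clip values as stated for each preset above.
# Keys are illustrative labels, not confirmed parameter names.
PRESETS = {
    "PBAT": {"clip_r1": 6, "clip_r2": 9, "three_prime_clip_r1": 6, "three_prime_clip_r2": 9},
    "EM-seq": {"clip_r1": 10, "clip_r2": 10, "three_prime_clip_r1": 10, "three_prime_clip_r2": 10},
    "single-cell": {"clip_r1": 6, "clip_r2": 6, "three_prime_clip_r1": 6, "three_prime_clip_r2": 6},
    "Accel": {"clip_r1": 10, "clip_r2": 15, "three_prime_clip_r1": 10, "three_prime_clip_r2": 10},
    "CEGX": {"clip_r1": 6, "clip_r2": 6, "three_prime_clip_r1": 2, "three_prime_clip_r2": 2},
    "Epignome": {"clip_r1": 8, "clip_r2": 8, "three_prime_clip_r1": 8, "three_prime_clip_r2": 8},
    "Zymo": {"clip_r1": 10, "clip_r2": 10, "three_prime_clip_r1": 10, "three_prime_clip_r2": 10},
}

# Extra Bismark settings that the descriptions above attach to some presets.
EXTRA_BISMARK_FLAGS = {
    "PBAT": "--pbat",
    "EM-seq": "--maxins 1000",
    "single-cell": "--non_directional",
    "Zymo": "--non_directional",
}


def preset_flags(name: str) -> str:
    """Expand a preset label into the equivalent explicit clipping flags."""
    return " ".join(f"--{key} {value}" for key, value in PRESETS[name].items())


for name in PRESETS:
    print(f"{name}: {preset_flags(name)}")
```

Laying the presets out this way makes it easier to spot that EM-seq and Zymo share the same clip values and differ only in the extra Bismark setting.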
Bisulfite libraries often require additional base pairs to be removed from the ends of the reads before alignment.
Trim bases from the 5' end of read 1 (or single-end reads).
integer
Trim bases from the 5' end of read 2 (paired-end only).
integer
Trim bases from the 3' end of read 1 AFTER adapter/quality trimming.
integer
Trim bases from the 3' end of read 2 AFTER adapter/quality trimming
integer
Trim bases below this quality value from the 3' end of the read, ignoring high-quality G bases
integer
Discard reads that become shorter than INT because of either quality or adapter trimming.
integer
Parameters specific to the Bismark workflow
Run alignment against all four possible strands.
boolean
By default, Bismark assumes that libraries are directional and does not align against complementary strands. If your library prep was not directional, use --non_directional to align against all four possible strands.
Note that the --single_cell and --zymo parameters both set the --non_directional workflow flag automatically.
Output stranded cytosine report, following Bismark's bismark_methylation_extractor step.
boolean
By default, Bismark does not produce stranded calls. With this option the output considers all Cs on both forward and reverse strands and reports their position, strand, trinucleotide context and methylation state.
Turn on to relax stringency for alignment (set allowed penalty with --num_mismatches).
boolean
By default, Bismark is pretty strict about which alignments it accepts as valid. If you have good reason to believe that your reads will contain more mismatches than normal, this flag can be used to relax the stringency that Bismark uses when accepting alignments. This can greatly improve the number of aligned reads you get back, but may negatively impact the quality of your data.
Bismark uses the Bowtie alignment scoring mechanism to filter reads. Mismatches cost -6, gap opening -5 and gap extension -2. So, a threshold of -60 would allow 10 mismatches or ~8 x 1-2bp indels. The threshold depends on the read length, so a penalty value is used where penalty * bp read length = threshold.
The penalty value used by Bismark by default is 0.2, so for 100bp reads this would be a threshold of -20.
If you specify the --relax_mismatches pipeline flag, Bismark instead uses 0.6, or a threshold of -60. This adds the Bismark flag --score_min L,0,-0.6 to the alignment command.
The penalty value can be modified using the --num_mismatches pipeline option.
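The arithmetic above can be worked through in a short Python sketch. It assumes only the costs stated in the text (a mismatch costs 6) and the linear threshold penalty * read length; the function names are hypothetical.

```python
MISMATCH_COST = 6  # per-mismatch cost, as stated in the text above


def min_score(penalty: float, read_length: int) -> float:
    """Minimum accepted alignment score: -(penalty * read length)."""
    return -round(penalty * read_length, 9)


def max_mismatches(penalty: float, read_length: int) -> int:
    """Number of whole mismatches that fit under the threshold (no gaps)."""
    return int(round(penalty * read_length, 9) // MISMATCH_COST)


def score_min_flag(penalty: float) -> str:
    """Format the Bismark flag in the form shown in the text above."""
    return f"--score_min L,0,-{penalty}"


for penalty in (0.2, 0.6):
    print(penalty, min_score(penalty, 100), max_mismatches(penalty, 100), score_min_flag(penalty))
```

For 100bp reads this gives a threshold of -20 with the default penalty of 0.2 and -60 with the relaxed penalty of 0.6, the latter corresponding to the 10 mismatches mentioned above.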
0.6 will allow a penalty of bp * -0.6, i.e. -60 for 100bp reads (Bismark default is 0.2)
number
0.6
Customise the penalty in the function used to filter reads based on mismatches. The parameter --relax_mismatches must also be specified.
See the parameter documentation for --relax_mismatches for an explanation.
Specify a minimum read coverage to report a methylation call
integer
Use to discard any methylation calls with less than a given read coverage depth (in fold coverage) during Bismark's bismark_methylation_extractor step.
Ignore read 2 methylation when it overlaps read 1
boolean
true
For paired-end reads it is theoretically possible that read_1 and read_2 overlap. To avoid scoring overlapping methylation calls twice, this is set to true by default. (Only methylation calls of read 1 are used, since read 1 has historically had higher-quality basecalls than read 2.) Whilst this option removes a bias towards more methylation calls in the center of sequenced fragments, it may de facto remove a sizable proportion of the data. To count methylation data from both reads in overlapping regions, set this to false.
Ignore methylation in first n bases of 5' end of R1
integer
Ignore the first <int> bp from the 5' end of Read 1 (or single-end alignment files) when processing the methylation call string. This can remove e.g. a restriction enzyme site at the start of each read or any other source of bias (such as PBAT-Seq data).
Ignore methylation in first n bases of 5' end of R2
integer
2
Ignore the first <int> bp from the 5' end of Read 2 of paired-end sequencing results only. Since the first couple of bases in Read 2 of BS-Seq experiments show a severe bias towards non-methylation as a result of end-repairing sonicated fragments with unmethylated cytosines (see M-bias plot), it is recommended that the first couple of bp of Read 2 are removed before starting downstream analysis. Please see the section on M-bias plots in the Bismark User Guide for more details.
Ignore methylation in last n bases of 3' end of R1
integer
Ignore the last <int> bp from the 3' end of Read 1 (or single-end alignment files) when processing the methylation call string. This can remove unwanted biases from the end of reads.
Ignore methylation in last n bases of 3' end of R2
integer
2
Ignore the last <int> bp from the 3' end of Read 2 of paired-end sequencing results only. This can remove unwanted biases from the end of reads.
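The four options above drop positions from either end of a read's methylation call string. The Python sketch below illustrates the idea on a toy string; the function name and the example call string are assumptions made for illustration and do not reproduce Bismark's implementation.

```python
def trim_call_string(calls: str, ignore_5prime: int = 0, ignore_3prime: int = 0) -> str:
    """Drop the first ignore_5prime and last ignore_3prime positions of a call string."""
    end = len(calls) - ignore_3prime
    # Guard against the two ignored regions overlapping on short reads.
    return calls[ignore_5prime:max(end, ignore_5prime)]


# Toy call string (hypothetical): letters mark cytosine calls, dots other bases.
read_calls = "Zz.hH.xZ"
print(trim_call_string(read_calls, ignore_5prime=2, ignore_3prime=2))
```

Only the positions left after trimming would contribute to methylation counts, which is why large values can discard a noticeable share of calls on short reads.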
Supply a .gtf file containing known splice sites (bismark_hisat only).
string
^\S+\.gtf(\.gz)?$
Specify to run Bismark with the --known-splicesite-infile flag to run splice-aware alignment using HISAT2. A .gtf file has to be provided, from which the pipeline creates a list of known splice sites.
NB: This only works when using the bismark_hisat aligner with --align
Allow soft-clipping of reads (potentially useful for single-cell experiments).
boolean
Specify to run Bismark with the --local flag to allow soft-clipping of reads. This should only be used with care in certain single-cell applications or PBAT libraries, which may produce chimeric read pairs. (See Wu et al.).
The minimum insert size for valid paired-end alignments.
integer
For example, if --minins 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as --maxins is also satisfied). A 19-bp gap would not be valid in that case.
Default: no flag (Bismark default: 0).
The maximum insert size for valid paired-end alignments.
integer
For example, if --maxins 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as --minins is also satisfied). A 61-bp gap would not be valid in that case.
Default: not specified. Bismark default: 500.
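The two insert size examples above follow the same rule: the two alignment lengths plus the gap between them must fall between the minimum and maximum insert size. A minimal Python sketch, with hypothetical function names and the Bismark defaults quoted above (0 and 500):

```python
def insert_size(aln1_len: int, gap: int, aln2_len: int) -> int:
    """Total span of a read pair: both alignments plus the gap between them."""
    return aln1_len + gap + aln2_len


def is_valid_pair(aln1_len: int, gap: int, aln2_len: int,
                  minins: int = 0, maxins: int = 500) -> bool:
    """Return True if the pair's insert size lies within [minins, maxins]."""
    return minins <= insert_size(aln1_len, gap, aln2_len) <= maxins


# The worked examples from the text above.
print(is_valid_pair(20, 20, 20, minins=60))   # 60-bp insert, minimum 60
print(is_valid_pair(20, 19, 20, minins=60))   # 59-bp insert, minimum 60
print(is_valid_pair(20, 60, 20, maxins=100))  # 100-bp insert, maximum 100
print(is_valid_pair(20, 61, 20, maxins=100))  # 101-bp insert, maximum 100
```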
Sample is NOMe-seq or NMT-seq. Runs coverage2cytosine.
boolean
Sets --CX during methylation extraction and --nome during the coverage2cytosine step.
Will also force the coverage2cytosine step to run.
Specify a minimum read coverage for MethylDackel to report a methylation call.
integer
MethylDackel - ignore SAM flags
boolean
Run MethylDackel with the --ignore_flags option, to ignore SAM flags.
Save files for use with methylKit
boolean
Run MethylDackel with the --methyl_kit option, to produce files suitable for use with the methylKit R package.
Qualimap configurations
A GFF or BED file containing the target regions which will be passed to Qualimap/Bamqc.
string
^\S+\.gff|\.bed(\.gz)?$
Setting this option could be useful if you want to calculate coverage stats over a list of regions, e.g. for targeted methylation sequencing data.
Skip read trimming.
boolean
Skip deduplication step after alignment.
boolean
Deduplication removes PCR duplicate reads after alignment. Specifying this option will skip this step, leaving duplicate reads in your data.
Note that this is turned on automatically if --rrbs is specified.
Skip MultiQC
boolean
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string
Less common options for the pipeline, typically set in a config file.
Display version and exit.
boolean
Method used to save pipeline results to output directory.
string
The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.
Send plain-text email instead of HTML.
boolean
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
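To illustrate which size strings the pattern above accepts, here is a brief Python sketch. The function name and the test values other than the default 25.MB are assumptions for illustration.

```python
import re

# Pattern quoted from the parameter description above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_size(value: str) -> bool:
    """Return True if the value looks like a number followed by B, KB, MB, GB or TB."""
    return SIZE_PATTERN.match(value) is not None


# The default value plus a few hypothetical alternatives.
for value in ["25.MB", "1.5 GB", "500B", "25", "25 mb"]:
    print(repr(value), is_valid_size(value))
```

The unit is case-sensitive and mandatory under this pattern, so a bare number or a lowercase unit would be rejected.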
Do not use coloured log outputs.
boolean
Incoming hook URL for messaging service
string
Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.
Custom config file to supply to MultiQC.
string
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
string
Custom MultiQC yaml file containing HTML including a methods description.
string
Boolean whether to validate parameters against the schema at runtime
boolean
true
Base URL or local path to location of pipeline test dataset files
string
https://raw.githubusercontent.com/nf-core/test-datasets/methylseq