Define where the pipeline should find input data and save output data.

Path to comma-separated file containing information about the samples in the experiment.

required
type: string
pattern: ^\S+\.csv$

You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See usage docs.
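A quick pre-flight check can catch a malformed design file before the pipeline starts. The sketch below is only an illustration: it applies the schema pattern shown above to the path and checks that every row has three comma-separated fields. The helper name, the requirement of at least one sample row, and the header names in the example are assumptions made for illustration, not the names the pipeline expects; the usage docs define the real header.

```python
import csv
import io
import re

INPUT_PATTERN = re.compile(r"^\S+\.csv$")  # pattern quoted in the schema above


def check_design(path, text):
    """Return a list of problems found in a design file (hypothetical helper)."""
    problems = []
    if not INPUT_PATTERN.match(path):
        problems.append("path must contain no whitespace and end in .csv")
    rows = list(csv.reader(io.StringIO(text)))
    # Assumption: a usable design file has a header plus at least one sample.
    if len(rows) < 2:
        problems.append("expected a header row plus at least one sample row")
    for number, row in enumerate(rows, start=1):
        if len(row) != 3:
            problems.append(f"line {number}: expected 3 columns, found {len(row)}")
    return problems


# Placeholder header and values, for illustration only.
example = "col_a,col_b,col_c\ns1,x,y\n"
print(check_design("design.csv", example))
```

An empty list means the file passed these structural checks; it says nothing about whether the column names or values are what the pipeline requires.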

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

required
type: string

Email address for completion summary.

type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
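The pattern above is stricter than it may look. As a rough illustration, the sketch below tests a few addresses against it using Python's `re` module; the helper name and the sample addresses are made up for this example. Note that the final group only allows 2 to 5 letters, and that `+` is not in the allowed character set.

```python
import re

# Pattern copied from the schema above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_accepted(address):
    """True if the address satisfies the schema pattern (hypothetical helper)."""
    return EMAIL_PATTERN.match(address) is not None


for address in (
    "user@example.com",
    "first.last@sub.example.org",
    "user@example.museum",   # six-letter ending, longer than {2,5} allows
    "user+tag@example.com",  # '+' is not an allowed character
    "user@localhost",        # no dot-separated ending
):
    print(address, is_accepted(address))
```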

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Options for saving a variety of intermediate files

Save reference(s) to results directory

type: boolean

Save aligned intermediates to results directory

type: boolean

Bismark only - Save unmapped reads to FastQ files

type: boolean

Use the --unmapped flag to pass --unmapped to the Bismark alignment step and save the unmapped reads to FastQ files.

Save trimmed reads to results directory.

type: boolean

By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete.

Options for the reference genome indices used to align reads.

Name of iGenomes reference.

type: string

If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. --genome GRCh38.

See the nf-core website docs for more details.

Path to FASTA genome file

type: string
pattern: ^\S+\.fn?a(sta)?(\.gz)?$

If you have no genome reference available, the pipeline can build one using a FASTA file. This requires additional time and resources, so it's better to use a pre-built index if possible. You can use the command line option --save_reference to keep the generated references so that they can be added to your config and used again in the future. If aligner is Bismark and bismark_index is specified, this parameter is ignored.
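To see which file names the FASTA pattern above accepts, it can be tried against a handful of candidates. This is a small sketch with made-up file names and a hypothetical helper, not part of the pipeline.

```python
import re

# Pattern copied from the schema above.
FASTA_PATTERN = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")


def is_fasta_name(path):
    """True if the path satisfies the schema pattern (hypothetical helper)."""
    return FASTA_PATTERN.match(path) is not None


for name in ("genome.fa", "genome.fna", "genome.fasta", "genome.fasta.gz",
             "genome.fas", "genome.fastq", "genome.fa.fai"):
    print(name, is_fasta_name(name))
```

Extensions such as `.fas` are not covered by the pattern, so a file named that way would need renaming or a symlink before it passes validation.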

Path to Fasta index file.

type: string
pattern: ^\S+\.fn?a(sta)?.fai$

The FASTA index file (.fa.fai) is only needed when using the bwa-meth aligner, where it is used by MethylDackel. If using Bismark, this parameter is ignored.

Path to a directory containing a Bismark reference index.

type: string

bwameth index filename base

type: string

Directory for a bwa-meth genome reference index. Only used when using the bwa-meth aligner.

Note that this is not a complete path, but the directory containing the reference. For example, if you have file paths such as /path/to/ref/genome.fa.bwameth.c2t.bwt, you should specify /path/to/ref/.
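The rule above amounts to taking the parent directory of any one index file. A minimal sketch, using the example path from the text and a hypothetical helper name:

```python
import posixpath


def bwameth_index_dir(index_file):
    """Return the directory to pass for a given bwa-meth index file."""
    return posixpath.dirname(index_file) + "/"


print(bwameth_index_dir("/path/to/ref/genome.fa.bwameth.c2t.bwt"))
```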

Do not load the iGenomes reference config.

hidden
type: boolean

Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config.

The base path to the igenomes reference files

hidden
type: string
default: s3://ngi-igenomes/igenomes/

Alignment tool to use.

required
type: string

The nf-core/methylseq package is actually two pipelines in one. The default workflow uses Bismark with Bowtie2 as alignment tool: unless specified otherwise, nf-core/methylseq will run this pipeline.

Since Bismark v0.21.0 it is also possible to use HISAT2 as the alignment tool. To run this workflow, invoke the pipeline with the command line flag --aligner bismark_hisat. HISAT2 also supports splice-aware alignment if analysis of RNA is desired (e.g. SLAMseq experiments); a file containing a list of known splice sites can be provided with --known_splices.

The second workflow uses BWA-Meth and MethylDackel instead of Bismark. To run this workflow, run the pipeline with the command line flag --aligner bwameth.
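The three workflows described above can be summarised as a simple lookup. This is a sketch for orientation only: `bismark_hisat` and `bwameth` are the values named in the text, while `bismark` as the name of the default value is an assumption, as is the helper itself.

```python
# Aligner value -> tools used by that workflow, as described above.
WORKFLOWS = {
    "bismark": "Bismark with Bowtie2",        # assumed name of the default value
    "bismark_hisat": "Bismark with HISAT2",
    "bwameth": "BWA-Meth and MethylDackel",
}


def describe_aligner(aligner="bismark"):
    """Return the tool combination for an --aligner value (hypothetical helper)."""
    if aligner not in WORKFLOWS:
        raise ValueError(f"unknown aligner: {aligner!r}")
    return WORKFLOWS[aligner]


for value in WORKFLOWS:
    print(f"--aligner {value}: {describe_aligner(value)}")
```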

Output information for all cytosine contexts.

type: boolean

By default, the pipeline only produces data for cytosine methylation states in CpG context. Specifying --comprehensive makes the pipeline give results for all cytosine contexts. Note that for large genomes (e.g. Human), these can be massive files. This is only recommended for small genomes (especially those that don't exhibit strong CpG context methylation specificity).

If specified, this flag instructs the Bismark methylation extractor to use the --comprehensive and --merge_non_CpG flags. This produces coverage files with information from all strands and cytosine contexts merged into two files: one for CpG context and one for non-CpG context.

If using the bwa-meth workflow, the flag makes MethylDackel report CHG and CHH contexts as well.

Presets for working with specific bisulfite library preparation methods.

Preset for working with PBAT libraries.

type: boolean

Specify this parameter when working with PBAT (Post Bisulfite Adapter Tagging) libraries.

Using this parameter sets the --pbat flag when aligning with Bismark. This tells Bismark to align complementary strands (the opposite of --directional).

Additionally, this is a trimming preset equivalent to --clip_r1 6 --clip_r2 9 --three_prime_clip_r1 6 --three_prime_clip_r2 9

Turn on if dealing with MspI digested material.

type: boolean

Use this parameter when working with RRBS (Reduced Representation Bisulfite Sequencing) data from material digested with MspI.

Specifying --rrbs will pass on the --rrbs parameter to TrimGalore! See the TrimGalore! documentation to read more about the effects of this option.

This parameter also makes the pipeline skip the deduplication step.

Run bismark in SLAM-seq mode.

type: boolean

Specify to run Bismark with the --slam flag, which runs Bismark in SLAM-seq mode.

NB: Only works when using the bismark_hisat aligner (--aligner bismark_hisat).

Preset for EM-seq libraries.

type: boolean

Equivalent to --clip_r1 10 --clip_r2 10 --three_prime_clip_r1 10 --three_prime_clip_r2 10.

Also sets the --maxins flag to 1000 for Bismark.

Trimming preset for single-cell bisulfite libraries.

type: boolean

Equivalent to --clip_r1 6 --clip_r2 6 --three_prime_clip_r1 6 --three_prime_clip_r2 6.

Also sets the --non_directional flag for Bismark.

Trimming preset for the Accel kit.

type: boolean

Equivalent to --clip_r1 10 --clip_r2 15 --three_prime_clip_r1 10 --three_prime_clip_r2 10

Trimming preset for the CEGX bisulfite kit.

type: boolean

Equivalent to --clip_r1 6 --clip_r2 6 --three_prime_clip_r1 2 --three_prime_clip_r2 2

Trimming preset for the Epignome kit.

type: boolean

Equivalent to --clip_r1 8 --clip_r2 8 --three_prime_clip_r1 8 --three_prime_clip_r2 8

Trimming preset for the Zymo kit.

type: boolean

Equivalent to --clip_r1 10 --clip_r2 10 --three_prime_clip_r1 10 --three_prime_clip_r2 10.

Also sets the --non_directional flag for Bismark.
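The trimming values of the presets above are easier to compare side by side. The sketch below collects them in one table and expands a preset into its equivalent flags. The clip values are taken from the descriptions above; the dictionary keys are labels chosen for this example and are not guaranteed to match the pipeline's parameter names.

```python
# (clip_r1, clip_r2, three_prime_clip_r1, three_prime_clip_r2) per preset.
# Keys are illustrative labels, not confirmed parameter names.
PRESETS = {
    "pbat": (6, 9, 6, 9),
    "em_seq": (10, 10, 10, 10),
    "single_cell": (6, 6, 6, 6),
    "accel": (10, 15, 10, 10),
    "cegx": (6, 6, 2, 2),
    "epignome": (8, 8, 8, 8),
    "zymo": (10, 10, 10, 10),
}

# Presets that, per the text above, also set --non_directional for Bismark.
NON_DIRECTIONAL = {"single_cell", "zymo"}

FLAGS = ("--clip_r1", "--clip_r2", "--three_prime_clip_r1", "--three_prime_clip_r2")


def expand_preset(name):
    """Return the trimming flags a preset is equivalent to."""
    return " ".join(f"{flag} {value}" for flag, value in zip(FLAGS, PRESETS[name]))


for name in PRESETS:
    extra = " (+ --non_directional)" if name in NON_DIRECTIONAL else ""
    print(f"{name}: {expand_preset(name)}{extra}")
```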

Bisulfite libraries often require additional base pairs to be removed from the ends of the reads before alignment.

Trim bases from the 5' end of read 1 (or single-end reads).

type: integer

Trim bases from the 5' end of read 2 (paired-end only).

type: integer

Trim bases from the 3' end of read 1 AFTER adapter/quality trimming.

type: integer

Trim bases from the 3' end of read 2 AFTER adapter/quality trimming

type: integer

Trim bases below this quality value from the 3' end of the read, ignoring high-quality G bases

type: integer

Discard reads that become shorter than INT because of either quality or adapter trimming.

type: integer

Parameters specific to the Bismark workflow

Run alignment against all four possible strands.

type: boolean

By default, Bismark assumes that libraries are directional and does not align against complementary strands. If your library prep was not directional, use --non_directional to align against all four possible strands.

Note that the --single_cell and --zymo parameters both set the --non_directional workflow flag automatically.

Output stranded cytosine report, following Bismark's bismark_methylation_extractor step.

type: boolean

By default, Bismark does not produce stranded calls. With this option the output considers all Cs on both forward and reverse strands and reports their position, strand, trinucleotide context and methylation state.

Turn on to relax stringency for alignment (set allowed penalty with --num_mismatches).

type: boolean

By default, Bismark is pretty strict about which alignments it accepts as valid. If you have good reason to believe that your reads will contain more mismatches than normal, this flag can be used to relax the stringency that Bismark uses when accepting alignments. This can greatly improve the number of aligned reads you get back, but may negatively impact the quality of your data.

Bismark uses the Bowtie alignment scoring mechanism to filter reads. Mismatches cost -6, gap opening -5 and gap extension -2. So, a threshold of -60 would allow 10 mismatches or ~8 x 1-2bp indels. The threshold is dependent on the length of reads, so a penalty value is used where penalty * bp read length = threshold.

The penalty value used by Bismark by default is 0.2, so for 100bp reads this would be a threshold of -20.

If you specify the --relax_mismatches pipeline flag, Bismark instead uses 0.6, or a threshold of -60. This adds the Bismark flag --score_min L,0,-0.6 to the alignment command.

The penalty value can be modified using the --num_mismatches pipeline option.
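The arithmetic above can be reproduced in a few lines. This is a sketch of the calculation as described in the text, not of Bismark's internals: it only considers mismatches (cost 6 each) and ignores gaps, and the function names are made up for this example.

```python
MISMATCH_COST = 6  # per-mismatch cost quoted above


def score_threshold(penalty, read_length):
    """Minimum accepted alignment score: -(penalty * read length)."""
    return -penalty * read_length


def max_mismatches(penalty, read_length):
    """How many mismatches fit within the threshold, ignoring gaps."""
    budget = round(abs(score_threshold(penalty, read_length)), 6)
    return int(budget // MISMATCH_COST)


def score_min_flag(penalty):
    """Bismark flag in the form quoted above."""
    return f"--score_min L,0,-{penalty}"


for penalty in (0.2, 0.6):
    print(penalty, score_threshold(penalty, 100), max_mismatches(penalty, 100),
          score_min_flag(penalty))
```

For 100bp reads this gives room for 3 mismatches at the default penalty of 0.2 and 10 mismatches at 0.6, matching the figures in the text.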

Penalty value for relaxed mismatch filtering. A value of 0.6 allows a penalty of bp * -0.6, i.e. a threshold of -60 for 100bp reads (the Bismark default is 0.2).

type: number
default: 0.6

Customise the penalty in the function used to filter reads based on mismatches. The parameter --relax_mismatches must also be specified.

See the parameter documentation for --relax_mismatches for an explanation.

Specify a minimum read coverage to report a methylation call

type: integer

Use to discard any methylation calls with less than a given read coverage depth (in fold coverage) during Bismark's bismark_methylation_extractor step.

Ignore read 2 methylation when it overlaps read 1

type: boolean
default: true

For paired-end reads it is theoretically possible that read_1 and read_2 overlap. To avoid scoring overlapping methylation calls twice, this is set to true by default. (Only methylation calls of read 1 are used since read 1 has historically higher quality basecalls than read 2). Whilst this option removes a bias towards more methylation calls in the center of sequenced fragments it may de facto remove a sizable proportion of the data. To count methylation data from both reads in overlapping regions, set this to false.

Ignore methylation in first n bases of 5' end of R1

type: integer

Ignore the first <int> bp from the 5' end of Read 1 (or single-end alignment files) when processing the methylation call string. This can remove e.g. a restriction enzyme site at the start of each read or any other source of bias (such as PBAT-Seq data).

Ignore methylation in first n bases of 5' end of R2

type: integer
default: 2

Ignore the first <int> bp from the 5' end of Read 2 of paired-end sequencing results only. Since the first couple of bases in Read 2 of BS-Seq experiments show a severe bias towards non-methylation as a result of end-repairing sonicated fragments with unmethylated cytosines (see M-bias plot), it is recommended that the first couple of bp of Read 2 are removed before starting downstream analysis. Please see the section on M-bias plots in the Bismark User Guide for more details.

Ignore methylation in last n bases of 3' end of R1

type: integer

Ignore the last <int> bp from the 3' end of Read 1 (or single-end alignment files) when processing the methylation call string. This can remove unwanted biases from the end of reads.

Ignore methylation in last n bases of 3' end of R2

type: integer
default: 2

Ignore the last <int> bp from the 3' end of Read 2 of paired-end sequencing results only. This can remove unwanted biases from the end of reads.

Supply a .gtf file containing known splice sites (bismark_hisat only).

type: string
pattern: ^\S+\.gtf(\.gz)?$

Specify to run Bismark with the --known-splicesite-infile flag to run splice-aware alignment using HISAT2. A .gtf file has to be provided, from which the pipeline creates a list of known splice sites.

NB: This only works when using the bismark_hisat aligner (--aligner bismark_hisat).

Allow soft-clipping of reads (potentially useful for single-cell experiments).

type: boolean

Specify to run Bismark with the --local flag to allow soft-clipping of reads. This should only be used with care in certain single-cell applications or PBAT libraries, which may produce chimeric read pairs. (See Wu et al.).

The minimum insert size for valid paired-end alignments.

type: integer

For example, if --minins 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as --maxins is also satisfied). A 19-bp gap would not be valid in that case.

Default: no flag (Bismark default: 0).

The maximum insert size for valid paired-end alignments.

type: integer

For example, if --maxins 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as --minins is also satisfied). A 61-bp gap would not be valid in that case.

Default: not specified. Bismark default: 500.
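Both insert-size examples above follow the same rule: the insert is measured across the whole fragment, i.e. both alignments plus the gap between them. A minimal sketch of that check, with hypothetical function names and the Bismark defaults quoted above (0 and 500) as default arguments:

```python
def fragment_length(len_r1, gap, len_r2):
    """Total span of a read pair: both alignments plus the gap between them."""
    return len_r1 + gap + len_r2


def insert_size_ok(len_r1, gap, len_r2, minins=0, maxins=500):
    """True if the pair satisfies both --minins and --maxins."""
    return minins <= fragment_length(len_r1, gap, len_r2) <= maxins


# The worked examples from the text.
print(insert_size_ok(20, 20, 20, minins=60))   # 60 bp fragment
print(insert_size_ok(20, 19, 20, minins=60))   # 59 bp fragment
print(insert_size_ok(20, 60, 20, maxins=100))  # 100 bp fragment
print(insert_size_ok(20, 61, 20, maxins=100))  # 101 bp fragment
```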

Sample is NOMe-seq or NMT-seq. Runs coverage2cytosine.

type: boolean

Sets --CX during methylation extraction and --nome during the coverage2cytosine step.

Will also force the coverage2cytosine step to run.

Specify a minimum read coverage for MethylDackel to report a methylation call.

type: integer

MethylDackel - ignore SAM flags

type: boolean

Run MethylDackel with the --ignore_flags option, to ignore SAM flags.

Save files for use with methylKit

type: boolean

Run MethylDackel with the --methyl_kit option, to produce files suitable for use with the methylKit R package.

Qualimap configurations

A GFF or BED file containing the target regions which will be passed to Qualimap/Bamqc.

type: string
pattern: ^\S+\.gff|\.bed(\.gz)?$

Setting this option can be useful if you want to calculate coverage stats over a list of regions, e.g. for targeted methylation sequencing data.
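The pattern listed for this parameter contains an unparenthesised alternation, so it may accept more than a quick reading suggests. The sketch below evaluates it with Python's `re.search`, on the assumption that schema patterns are applied with search semantics; the file names and the helper are made up for this example.

```python
import re

# Pattern copied from the schema above. Without parentheses, it reads as
# "starts with <non-space>.gff" OR "ends with .bed or .bed.gz".
REGION_PATTERN = re.compile(r"^\S+\.gff|\.bed(\.gz)?$")


def is_accepted(path):
    """True if the pattern matches anywhere in the path (hypothetical helper)."""
    return REGION_PATTERN.search(path) is not None


for name in ("targets.gff", "targets.bed", "targets.bed.gz",
             "targets.gff3", "my targets.bed", "targets.txt"):
    print(name, is_accepted(name))
```

Under these assumptions, names such as `targets.gff3` or a `.bed` path containing a space also pass, so the pattern should not be relied on as a strict file-type check.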

Skip read trimming.

type: boolean

Skip deduplication step after alignment.

type: boolean

Deduplication removes PCR duplicate reads after alignment. Specifying this option will skip this step, leaving duplicate reads in your data.

Note that this is turned on automatically if --rrbs is specified.

Skip MultiQC

type: boolean

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden
type: string
default: master

Base directory for Institutional configs.

hidden
type: string
default: https://raw.githubusercontent.com/nf-core/configs/master

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.

Institutional config name.

hidden
type: string

Institutional config description.

hidden
type: string

Institutional config contact information.

hidden
type: string

Institutional config URL link.

hidden
type: string

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden
type: boolean

Method used to save pipeline results to output directory.

hidden
type: string

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Email address for completion summary, only when pipeline fails.

hidden
type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.

Send plain-text email instead of HTML.

hidden
type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden
type: string
default: 25.MB
pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
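The size pattern above accepts several spellings, including the Nextflow-style default `25.MB`. As an illustration, the following sketch tries a few made-up values against it; the helper name is hypothetical.

```python
import re

# Pattern copied from the schema above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_size(value):
    """True if the value satisfies the schema pattern (hypothetical helper)."""
    return SIZE_PATTERN.match(value) is not None


for value in ("25.MB", "25MB", "1.5 GB", "512B", "25", "25 mb"):
    print(repr(value), is_valid_size(value))
```

The unit is case-sensitive and the trailing `B` is mandatory, so `25` and `25 mb` are rejected.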

Do not use coloured log outputs.

hidden
type: boolean

Incoming hook URL for messaging service

hidden
type: string

Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.

Custom config file to supply to MultiQC.

hidden
type: string

Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file

hidden
type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Boolean whether to validate parameters against the schema at runtime

hidden
type: boolean
default: true

Base URL or local path to location of pipeline test dataset files

hidden
type: string
default: https://raw.githubusercontent.com/nf-core/test-datasets/methylseq