nf_core.download
Downloads an nf-core pipeline to the local file system.
exception
nf_core.pipelines.download.ContainerError(container, registry, address, absolute_URI, out_path, singularity_command, error_msg)
Bases: Exception
A class of errors related to pulling containers with Singularity/Apptainer
exception
ImageExistsError(error_log)
Bases: FileExistsError
Image already exists in cache/output directory.
exception
ImageNotFoundError(error_log)
Bases: FileNotFoundError
The image cannot be found in the registry.
exception
InvalidTagError(error_log)
Bases: AttributeError
Image and registry are valid, but the (version) tag is not
exception
OtherError(error_log)
Bases: RuntimeError
Undefined error with the container
exception
RegistryNotFoundError(error_log)
Bases: ConnectionRefusedError
The specified registry does not resolve to a valid IP address
exception
nf_core.pipelines.download.DownloadError
Bases: RuntimeError
A custom exception that is raised when nf-core pipelines download encounters a problem that we already took into consideration. In this case, we do not want to print the traceback, but give the user some concise, helpful feedback instead.
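A minimal sketch of how a caller might handle such an error: anticipated problems are reduced to a short message, while any other exception still propagates with its traceback. The `DownloadError` class below is a local stand-in that shares only the base class with the real one, and `run` and `fetch_pipeline` are hypothetical names used for illustration; none of this is taken from nf-core itself.

```python
import sys
from typing import Callable


class DownloadError(RuntimeError):
    """Local stand-in for nf_core.pipelines.download.DownloadError (same base class)."""


def run(task: Callable[[], None]) -> int:
    """Run a task and turn anticipated download problems into a short message."""
    try:
        task()
    except DownloadError as err:
        # Anticipated problem: give concise feedback instead of a traceback.
        print(f"Download failed: {err}", file=sys.stderr)
        return 1
    return 0


def fetch_pipeline() -> None:
    # Hypothetical task that fails in an anticipated way.
    raise DownloadError("revision '9.9.9' was not found")
```

Calling `run(fetch_pipeline)` prints one line to stderr and returns a non-zero exit code.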
class
nf_core.pipelines.download.DownloadProgress(*columns: str | ProgressColumn, console: Console | None = None, auto_refresh: bool = True, refresh_per_second: float = 10, speed_estimate_period: float = 30.0, transient: bool = False, redirect_stdout: bool = True, redirect_stderr: bool = True, get_time: Callable[[], float] | None = None, disable: bool = False, expand: bool = False)
Bases: Progress
Custom Progress bar class, allowing us to have two progress bars with different columns / layouts.
get_renderables()
Get a number of renderables for the progress display.
class
nf_core.pipelines.download.DownloadWorkflow(pipeline=None, revision=None, outdir=None, compress_type=None, force=False, platform=False, download_configuration=None, additional_tags=None, container_system=None, container_library=None, container_cache_utilisation=None, container_cache_index=None, parallel_downloads=4)
Bases: object
Downloads an nf-core workflow from GitHub to the local file system.
Can also download its Singularity container images if required.
- Parameters:
- pipeline (str) – An nf-core pipeline name.
- revision (List[str]) – The workflow revision(s) to download, like 1.0 or dev. Defaults to None.
- outdir (str) – Path to the local download directory. Defaults to None.
- compress_type (str) – Type of compression for the downloaded files. Defaults to None.
- force (bool) – Flag to force download even if files already exist (overwrite existing files). Defaults to False.
- platform (bool) – Flag to customize the download for Seqera Platform (convert to git bare repo). Defaults to False.
- download_configuration (str) – Download the configuration files from nf-core/configs. Defaults to None.
- additional_tags (List[str]) – Specify additional tags to add to the downloaded pipeline. Defaults to None.
- container_system (str) – The container system to use (e.g., “singularity”). Defaults to None.
- container_library (List[str]) – The container libraries (registries) to use. Defaults to None.
- container_cache_utilisation (str) – If a local or remote cache of already existing container images should be considered. Defaults to None.
- container_cache_index (str) – An index for the remote container cache. Defaults to None.
- parallel_downloads (int) – The number of parallel downloads to use. Defaults to 4.
compress_download() → None
Take the downloaded files and make a compressed .tar.gz archive.
download_configs()
Downloads the centralised config profiles from nf-core/configs to self.outdir.
download_wf_files(revision, wf_sha, download_url)
Downloads workflow files from GitHub to self.outdir.
download_workflow()
Starts an nf-core workflow download.
download_workflow_platform(location=None)
Create a bare-cloned git repository of the workflow, so it can be launched with tw launch as file:/ pipeline
download_workflow_static()
Downloads an nf-core workflow from GitHub to the local file system in a self-contained manner.
find_container_images(workflow_directory: str) → None
Find container image names for workflow.
Starts by using nextflow config to pull out any process.container declarations. This works for DSL1. It should return a simple string with resolved logic, but does not always do so, e.g. not for differentialabundance 1.2.0.
Second, we look for DSL2 containers. These can’t be found with nextflow config at the time of writing, so we scrape the pipeline files. This returns raw matches that will likely need to be cleaned.
gather_registries(workflow_directory: str) → None
Fetch the registries from the pipeline config and CLI arguments and store them in a set. This is needed to symlink downloaded container images so Nextflow will find them.
get_revision_hash()
Find specified revision / branch hash
get_singularity_images(current_revision: str = '') → None
Loop through container names and download Singularity images
prioritize_direct_download(container_list: List[str]) → List[str]
Helper function that takes a list of container images (URLs and Docker URIs), eliminates every Docker URI for which a URL is also present, and returns the cleaned, deduplicated list.
Conceptually, this works like so:
Everything after the last slash should be identical, e.g. "scanpy:1.7.2--pyhdfd78af_0" in ['https://depot.galaxyproject.org/singularity/scanpy:1.7.2--pyhdfd78af_0', 'biocontainers/scanpy:1.7.2--pyhdfd78af_0']
re.sub('.*/(.*)', r'\1', c) will drop everything up to the last slash from c (container_id).
d.get(k := re.sub('.*/(.*)', r'\1', c), '') assigns the truncated string to k (key) and gets the corresponding value from the dict if present, or else defaults to "".
If the regex pattern matches, the original container_id will be assigned to the dict with the k key. r"^$|(?!^http)" matches an empty string (we didn’t have it in the dict yet and want to keep it in either case) or any string that does not start with http. If the current dict value already starts with http, we want to keep it and not replace it with whatever we have now (which might be the Docker URI).
Conversely, a regex that matches http, r"^$|^http", could be used to prioritize the Docker URIs over http downloads.
We also need to handle a special case: the https:// Singularity downloads from Seqera Containers all end in 'data', although they are not equivalent, so they cannot be deduplicated by the part after the last slash, e.g.:
'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/63/6397750e9730a3fbcc5b4c43f14bd141c64c723fd7dad80e47921a68a7c3cd21/data'
'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/c2/c262fc09eca59edb5a724080eeceb00fb06396f510aefb229c2d2c6897e63975/data'
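The logic described for prioritize_direct_download can be sketched as a standalone function. This is an illustrative reimplementation based only on the description given here, not the library's code: the variable names, the handling of the Seqera special case via a separate list, and the sorted return order are all assumptions.

```python
import re
from typing import Dict, List


def prioritize_direct_download(container_list: List[str]) -> List[str]:
    """Keep one entry per image name, preferring http(s) URLs over Docker URIs."""
    by_name: Dict[str, str] = {}
    seqera_blobs: List[str] = []

    for c in container_list:
        # Special case: Seqera Containers blob URLs all end in "/data", so the
        # text after the last slash cannot serve as a unique key.
        if c.startswith("http") and c.endswith("/data"):
            if c not in seqera_blobs:
                seqera_blobs.append(c)
            continue

        # Drop everything up to the last slash to obtain the key.
        k = re.sub(r".*/(.*)", r"\1", c)

        # Store c only if nothing is stored yet or the stored value is not a URL.
        if re.match(r"^$|(?!^http)", by_name.get(k, "")):
            by_name[k] = c

    return sorted(by_name.values()) + sorted(seqera_blobs)
```

Because a stored URL is never overwritten, the result does not depend on whether the URL or the Docker URI appears first in the input.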
prompt_compression_type()
Ask user if we should compress the downloaded files
prompt_config_inclusion()
Prompt for inclusion of institutional configurations
prompt_container_download()
Prompt whether to download container images or not
prompt_pipeline_name()
Prompt for the pipeline name if not set with a flag
prompt_revision() → None
Prompt for pipeline revision / branch.
Prompts the user for a revision tag if '--revision' was not set. If --platform is specified, multiple revisions can be selected. The static download also allows multiple revisions, but this option is not offered interactively.
prompt_singularity_cachedir_creation()
Prompt about using $NXF_SINGULARITY_CACHEDIR if not already set
prompt_singularity_cachedir_remote()
Prompt about the index of a remote $NXF_SINGULARITY_CACHEDIR
prompt_singularity_cachedir_utilization()
Ask if we should only use $NXF_SINGULARITY_CACHEDIR without copying into target
read_remote_containers()
Reads the file specified as index for the remote Singularity cache dir
rectify_raw_container_matches(raw_findings)
Helper function to rectify the raw extracted container matches into fully qualified container names. If multiple containers are found, any prefixed with http for direct download is prioritized
Example syntax:
Early DSL2:
Later DSL2:
Later DSL2, variable is being used:
DSL1 / Special case DSL2:
singularity_copy_cache_image(container: str, out_path: str, cache_path: str | None) → None
Copy Singularity image from NXF_SINGULARITY_CACHEDIR to target folder.
singularity_download_image(container: str, out_path: str, cache_path: str | None, progress: DownloadProgress) → None
Download a singularity image from the web.
Use native Python to download the file.
- Parameters:
- container (str) – A pipeline’s container name. Usually it is of similar format to https://depot.galaxyproject.org/singularity/name:version
- out_path (str) – The final target output path
- cache_path (str , None) – The NXF_SINGULARITY_CACHEDIR path if set, None if not
- progress (Progress) – Rich progress bar instance to add tasks to.
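A minimal sketch of downloading a file with the Python standard library. It streams to a temporary file and moves it into place only once the transfer has completed, so an interrupted download does not leave a partial image behind. The function name `download_file`, the `.partial` suffix, and the chunk size are assumptions for illustration; the library's own implementation may differ, for example in how it reports progress or populates the cache.

```python
import os
import urllib.request


def download_file(url: str, out_path: str, chunk_size: int = 1024 * 1024) -> int:
    """Stream `url` to `out_path` and return the number of bytes written."""
    tmp_path = out_path + ".partial"
    written = 0
    try:
        with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as fh:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                fh.write(chunk)
                written += len(chunk)
        # Move into place only after the whole file has arrived.
        os.replace(tmp_path, out_path)
    finally:
        # Remove the leftover if the transfer failed part-way.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return written
```

The running byte count is the kind of value a progress bar task could be updated with after each chunk.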
singularity_image_filenames(container: str) → Tuple[str, str | None]
Check Singularity cache for image, copy to destination folder if found.
- Parameters: container (str) – A pipeline’s container name. Can be direct download URL or a Docker Hub repository ID.
- Returns: A tuple of (out_path, cache_path). out_path is the final target output path. It may point to the NXF_SINGULARITY_CACHEDIR if cache utilisation was set to ‘amend’. If cache utilisation was set to ‘copy’, it points to the target folder, a subdirectory of the output directory. In the latter case, cache_path is either None (the image is not yet cached locally) or points to the image in the NXF_SINGULARITY_CACHEDIR, so that it is copied from there rather than downloaded from the web again. See get_singularity_images() for implementation.
- Return type: tuple (str, str | None)
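The relationship between the two return values can be sketched as follows. This is a simplified illustration of the behaviour described for singularity_image_filenames, not the library's code: the function `resolve_image_paths`, its parameters, and the premise that the image file name has already been derived from the container name are all hypothetical.

```python
import os
from typing import Optional, Tuple


def resolve_image_paths(
    image_name: str,
    target_dir: str,
    cache_dir: Optional[str],
    cache_utilisation: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Return (out_path, cache_path) for an image file name."""
    target_path = os.path.join(target_dir, image_name)
    if cache_dir is None:
        # No cache configured: the image goes straight into the target folder.
        return target_path, None

    cached_path = os.path.join(cache_dir, image_name)
    if cache_utilisation == "amend":
        # The cache itself is the final destination.
        return cached_path, None

    # "copy": the target folder is the final destination. Point to the cached
    # image if it already exists, otherwise signal that it must be fetched.
    return target_path, cached_path if os.path.exists(cached_path) else None
```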
singularity_pull_image(container: str, out_path: str, cache_path: str | None, library: List[str], progress: DownloadProgress) → None
Pull a singularity image using singularity pull
Attempt to use a local installation of singularity to pull the image.
- Parameters:
- container (str) – A pipeline’s container name. Usually it is of similar format to nfcore/name:version.
- library (list of str) – A list of libraries to try for pulling the image.
- Raises: Various exceptions possible from subprocess execution of Singularity.
symlink_singularity_images(image_out_path: str) → None
Create a symlink for each registry in the registry set that points to the image. We have dropped the explicit registries from the modules in favor of the configurable registries. Unfortunately, Nextflow still expects the registry to be part of the file name, so a symlink is needed.
The base image, e.g. ./nf-core-gatk-4.4.0.0.img will thus be symlinked as for example ./quay.io-nf-core-gatk-4.4.0.0.img by prepending all registries in self.registry_set to the image name.
Unfortunately, our output image name may itself contain a registry definition (a Singularity image pulled from depot.galaxyproject.org, or an older pipeline version where the Docker registry was part of the image name in the modules). Hence, the registry must be stripped first to ensure that we really start from the base name.
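The naming scheme can be sketched as follows: strip a leading registry from the image file name, then prepend each registry in turn and link the result to the image. This is an illustration under stated assumptions, not the library's code. The helper names `symlink_names` and `symlink_image` are hypothetical, only registries contained in the given set are stripped, and relative symlinks are one possible choice.

```python
import os
import re
from typing import Iterable, List


def symlink_names(image_name: str, registries: Iterable[str]) -> List[str]:
    """Return the registry-prefixed names under which an image should be linked."""
    regs = sorted(set(registries))
    if not regs:
        return []
    # Strip a leading "<registry>-" so that we start from the true base name.
    prefix = re.compile("^(?:" + "|".join(re.escape(r) for r in regs) + ")-")
    base_name = prefix.sub("", image_name)
    return [f"{r}-{base_name}" for r in regs]


def symlink_image(image_out_path: str, registries: Iterable[str]) -> List[str]:
    """Create the symlinks next to the image and return the newly created paths."""
    directory, image_name = os.path.split(image_out_path)
    created = []
    for name in symlink_names(image_name, registries):
        link_path = os.path.join(directory, name)
        if name == image_name or os.path.lexists(link_path):
            continue  # the image itself, or a link created earlier
        os.symlink(image_name, link_path)  # relative link within the same folder
        created.append(link_path)
    return created
```

With the registry set `{"quay.io"}`, the image `nf-core-gatk-4.4.0.0.img` is linked as `quay.io-nf-core-gatk-4.4.0.0.img`, matching the example in the description.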
wf_use_local_configs(revision_dirname)
Edit the downloaded nextflow.config file to use the local config files
class
nf_core.pipelines.download.WorkflowRepo(remote_url, revision, commit, additional_tags, location=None, hide_progress=False, in_cache=True)
Bases: SyncedRepo
An object to store details about a locally cached workflow repository.
Important Attributes:
fullname: The full name of the repository, nf-core/{self.pipelinename}.
local_repo_dir (str): The local directory, where the workflow is cloned into. Defaults to $HOME/.cache/nf-core/nf-core/{self.pipeline}.
__add_additional_tags() → None
access()
bare_clone(destination)
checkout(commit)
Checks out the repository at the requested commit
- Parameters: commit (str) – Git SHA of the commit
get_remote_branches(remote_url)
Get all branches from a remote repository
- Parameters: remote_url (str) – The git url to the remote repository
- Returns: All branches found in the remote
- Return type: (set[str])
property
heads
retry_setup_local_repo(skip_confirm=False)
setup_local_repo(remote, location=None, in_cache=True)
Sets up the local git repository. If the repository has been cloned previously, it returns a git.Repo object of that clone. Otherwise it tries to clone the repository from the provided remote URL and returns a git.Repo of the new clone.
- Parameters:
- remote (str) – git url of remote
- location (Path) – location where the clone should be created/cached.
- in_cache (bool , optional) – Whether to clone the repository from the cache. Defaults to True.
Sets self.repo
property
tags
tidy_tags_and_branches()
Function to delete all tags and branches that are not of interest to the downloader. This allows a clutter-free experience in Seqera Platform. The untagged commits are evidently still available.
However, due to local caching, the downloader might also want access to revisions that had been deleted before. In that case, it does not bother with re-adding the tags and instead downloads anew from GitHub.