Nextflow¶
Nextflow is a bioinformatics workflow management system that enables the development of scalable and reproducible scientific workflows. It supports deploying workflows on a variety of execution platforms. It allows the adaptation of pipelines written in the most common scripting languages.
For more information, check here and here.
Select Input Parameters¶
To run a pipeline, the user must set two parameters:

- Input folder: mounts the folder containing the source code and input files.
- Pipeline script: selects the Nextflow pipeline script, which is a `.nf` file containing the workflow instructions.
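As a sketch of what such a script contains, the following is a minimal, illustrative `.nf` pipeline (the process name and command are placeholders, not part of the app):

```groovy
// main.nf — minimal illustrative Nextflow pipeline (DSL2)
process sayHello {
    output:
    stdout

    script:
    """
    echo 'Hello from Nextflow'
    """
}

workflow {
    // run the process and print its standard output
    sayHello() | view
}
```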
Initialization¶
For information on how to use the Initialization parameter, please refer to the Initialization: Bash Script, Initialization: Conda Packages, and Initialization: PyPI Packages section of the documentation.
Create a Conda environment¶
The user can also install the required software dependencies via Conda, by specifying the packages, or the path(s) to the configuration YAML file(s), directly in the pipeline script. In this case, the user must use the option `-with-conda`.
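As an illustration, a process can declare its Conda dependencies with the `conda` directive; the tool names and versions below are placeholders:

```groovy
// Illustrative process declaring Conda packages inline;
// dependencies are resolved at runtime when -with-conda is used.
process ALIGN {
    conda 'bioconda::bwa=0.7.17 bioconda::samtools=1.17'

    script:
    """
    bwa index --help
    """
}
```

The directive can instead point to an environment YAML file (e.g., `conda 'envs/align.yml'`, an illustrative path). The pipeline is then launched with `nextflow run main.nf -with-conda`.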
Configure SSH Access¶
The app provides optional support for SSH access from an external client. An SSH public key must be uploaded using the corresponding panel in the Resources section of the UCloud side menu.
By checking Enable SSH server, a random port is opened for connection. The connection command is shown in the job progress view page.
Import a Configuration File¶
The parameter Configuration is used to upload a Nextflow configuration file. The latter is a simple text file containing a set of properties defined using the syntax:
name = value
More information about configuration settings in Nextflow can be found in the official documentation.
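For example, a minimal configuration file using this syntax might look as follows (all values are illustrative):

```groovy
// nextflow.config — illustrative name = value settings
params.outdir = 'results'

process {
    cpus = 2
    memory = '4 GB'
}
```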
Using Slurm¶
In multi-node Nextflow jobs on UCloud, Slurm is used as the default executor. It is configured in a dedicated configuration profile which is automatically updated to make full use of the resources allocated to the UCloud job.
A Slurm cluster is started by default in Nextflow jobs. If this cluster is not needed, the user can prevent it from starting by setting the optional parameter Start Slurm cluster to `false`.
If a Slurm cluster is started, but not used, it will remain idle throughout the job.
Note
The Slurm configuration profile is not used in single-node jobs. Instead, the local executor is used unless otherwise specified by the user. Therefore, users can always set the optional parameter Start Slurm cluster to `false` in single-node jobs.
Slurm configuration profile¶
The Slurm configuration profile is specified with the following information:

- `cpus`: set to the total number of logical CPUs (per node) for the chosen machine type.
- `memory`: set to the total memory (per node) of the chosen machine type.
- `time`: set effectively to infinity (99999h).
- `queue`: set to `CLOUD`, which is the default Slurm partition.
Furthermore, the Slurm configuration profile contains the following two `clusterOptions`:

- `--nodes=1-<num-nodes>`, where `<num-nodes>` is the number of nodes allocated to the UCloud job.
- `--gres=gpu:<gpu-type>:<num-gpu>`, where `<gpu-type>` is the type of GPU (e.g., `h100`) and `<num-gpu>` is the number of GPUs (per node) allocated to the UCloud job. This option is only included if the UCloud job runs on a GPU machine type.
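Putting these settings together, the auto-generated profile is roughly equivalent to the following sketch; the node, CPU, memory, and GPU values are placeholders standing in for those of the actual UCloud job, and the `--gres` option appears only on GPU machine types:

```groovy
// Illustrative sketch of the auto-generated Slurm profile
profiles {
    slurm {
        process {
            executor = 'slurm'
            cpus = 64              // logical CPUs per node (machine-type dependent)
            memory = '256 GB'      // memory per node (machine-type dependent)
            time = '99999h'        // effectively unlimited
            queue = 'CLOUD'        // default Slurm partition
            clusterOptions = '--nodes=1-4 --gres=gpu:h100:2'
        }
    }
}
```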
Adding cluster options¶
The user can customize the Slurm configuration profile with any native configuration option. This can be done using the Cluster options parameter.
Options passed via this parameter are appended to the pre-defined `clusterOptions` in the Slurm configuration profile (see above).
The user must specify the additional cluster options as a single string, with each option separated by a single space:
--option1=value1 --option2=value2 --option3=value3
In case of duplicate options, Slurm only uses the value from the last occurrence of the given option. This means that the user can override the default `clusterOptions` (i.e., `--nodes` and `--gres`) by adding these options with the desired values using the Cluster options optional parameter.
Warning
Users should not use the Cluster options parameter to override `cpus`, `memory`, `time`, or `queue`, since doing so can lead to undefined behavior. Use the Configuration optional parameter for this instead.
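For instance, to lower the per-task CPU and memory requests, a configuration file like the following (values illustrative) can be uploaded via the Configuration parameter rather than passing cluster options:

```groovy
// Uploaded via the Configuration parameter;
// overrides per-task resource requests safely.
process {
    cpus = 8
    memory = '32 GB'
}
```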
Interactive Mode¶
The Interactive mode parameter is used to start an interactive job session where the user can open a terminal window from the job progress page and execute shell commands.