Specification list

This page describes BioFlow-Insight’s functionalities and guidelines.



BioFlow-Insight’s Functionalities


Workflow structure reconstruction

Description

Nextflow workflows are based on the dataflow programming model, in which processes, each encapsulating a specific bioinformatics task using scripts or tools, communicate through channels, which are either non-blocking unidirectional FIFO queues or single values. Nextflow operators are methods that allow users to manipulate channels, for example filtering, forking or maths operators. Multiple operators can be chained together; we define such a chain as a single operation. For more information on Nextflow workflows, please refer to Nextflow’s official documentation.

BioFlow-Insight generates multiple representations of the workflow, each at a different level of granularity.

Here are the three structures generated by BioFlow-Insight from the https://github.com/George-Marchment/hackathon workflow: the specification graph (left), the dependency graph (middle) and the process dependency graph (right).

Labelled and unlabelled versions of the graphs are also available, with variants in which orphan operations (operations without any inputs or outputs) are either represented or removed.
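As an illustration of the orphan-operation variants, here is a minimal Python sketch. It is not BioFlow-Insight's code: the function name, the node names and the dictionary layout are hypothetical, and it assumes that an orphan operation is an operation node that appears in no edge at all.

```python
def remove_orphan_operations(kinds, edges):
    """Return the nodes of `kinds` without orphan operations.

    kinds: dict mapping a node name to "process" or "operation" (hypothetical layout).
    edges: list of (source_node, target_node) pairs.
    Assumption: an orphan operation is an operation that appears in no edge.
    """
    connected = {node for edge in edges for node in edge}
    return {
        name: kind
        for name, kind in kinds.items()
        if kind != "operation" or name in connected
    }


# Hypothetical toy graph: "ch.view" is an operation with no inputs or outputs.
kinds = {"fastqc": "process", "ch.map": "operation", "ch.view": "operation"}
edges = [("ch.map", "fastqc")]
without_orphans = remove_orphan_operations(kinds, edges)
```

In this toy graph only `ch.view` is dropped; processes are always kept, even when they are isolated.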

For large workflows, it can be useful to represent only a subset of the processes. BioFlow-Insight therefore provides the option to exclude a list of specified processes from the representations. By default, this list is empty, meaning that all processes are represented.
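The exclusion option can be pictured with the following Python sketch. It is an illustration only: the function and process names are hypothetical, and it assumes that excluding a process simply drops its node together with every edge touching it. How BioFlow-Insight actually handles the neighbours of an excluded process is not specified here.

```python
def exclude_processes(nodes, edges, excluded=()):
    """Drop the excluded processes and every edge touching them.

    nodes: list of node names; edges: list of (source_node, target_node) pairs.
    excluded: names of processes to hide; empty by default, so nothing is removed.
    """
    excluded = set(excluded)
    kept_nodes = [n for n in nodes if n not in excluded]
    kept_edges = [
        (u, v) for u, v in edges if u not in excluded and v not in excluded
    ]
    return kept_nodes, kept_edges


# Hypothetical toy workflow.
nodes = ["fastqc", "trim", "align", "multiqc"]
edges = [("trim", "align"), ("fastqc", "multiqc"), ("align", "multiqc")]

all_nodes, all_edges = exclude_processes(nodes, edges)             # default: keep everything
some_nodes, some_edges = exclude_processes(nodes, edges, ["multiqc"])
```

With the default empty list the graph is returned unchanged, which mirrors the default behaviour described above.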

BioFlow-Insight can analyse both DSL1 and DSL2 Nextflow workflows.

Examples

To test BioFlow-Insight’s structure reconstruction, try these example workflows with the 'Submit from a git repository' functionality:



Workflow error detection and handling

BioFlow-Insight analyses Nextflow workflows and code. It is by no means a Nextflow "checker" or "validator": the fact that BioFlow-Insight can analyse a workflow does not guarantee that the workflow will execute correctly. However, it is a helpful tool for Nextflow workflow developers to identify some errors in their workflow’s code.

BioFlow-Insight can fail to analyse a workflow and generate its structures for two reasons: errors or ambiguities detected in the workflow’s code, or BioFlow-Insight’s limited scope.

Both of these cases are elaborated upon below.


Errors or ambiguities detected in the workflow’s code

Description

Here is an extensive list of errors and ambiguities in a Nextflow workflow detected by BioFlow-Insight:

Examples

To test BioFlow-Insight’s error detection, try these example workflows with the 'Submit from a git repository' functionality:


BioFlow-Insight’s limited scope

Due to Nextflow's highly flexible workflow definition, there are some limited cases where BioFlow-Insight cannot analyse the workflow’s code. Below is a list of these cases:

Important: If BioFlow-Insight fails to analyse a workflow due to its limited scope, the workflow can easily be rewritten in a different way to enable successful analysis. When possible, BioFlow-Insight also reports the reason it failed.



Metadata extraction

After extracting the graphs, BioFlow-Insight analyses each graph and computes a set of metadata attributes. Below are the attributes that are calculated:

These metadata are saved in dedicated JSON files.

To obtain the rooted graph, we add two nodes: the source and the sink. The source is connected to all nodes that have no incoming edges (\(\text{indegree}=0\)). The sink is connected to all nodes that have no outgoing edges (\(\text{outdegree}=0\)).
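The rooting step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BioFlow-Insight's implementation: the function name, the node labels "source" and "sink", the edge-list representation and the toy workflow are all hypothetical.

```python
def add_source_and_sink(nodes, edges, source="source", sink="sink"):
    """Return a new edge list rooted with a source and a sink node.

    nodes: list of node names; edges: list of (source_node, target_node) pairs.
    The labels "source" and "sink" are placeholders chosen for this sketch.
    """
    indegree = {n: 0 for n in nodes}
    outdegree = {n: 0 for n in nodes}
    for u, v in edges:
        outdegree[u] += 1
        indegree[v] += 1

    rooted = list(edges)
    # The source points to every node without incoming edges.
    rooted += [(source, n) for n in nodes if indegree[n] == 0]
    # Every node without outgoing edges points to the sink.
    rooted += [(n, sink) for n in nodes if outdegree[n] == 0]
    return rooted


# Hypothetical toy workflow.
nodes = ["fastqc", "trim", "align", "multiqc"]
edges = [("trim", "align"), ("fastqc", "multiqc"), ("align", "multiqc")]
rooted_edges = add_source_and_sink(nodes, edges)
```

Here `fastqc` and `trim` have no incoming edges and are attached to the source, while `multiqc` has no outgoing edges and is attached to the sink. Under this rule, an isolated node would be linked to both the source and the sink.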



RO-Crate generation

BioFlow-Insight generates a description of the workflow in the RO-Crate format. RO-Crate serves as a standard for aggregating and describing research data, including associated metadata for workflows and scripts. However, the current RO-Crate framework does not yet fully accommodate Nextflow workflows. For instance, in the current RO-Crate format, one subworkflow corresponds to one file. To address this limitation, BioFlow-Insight extends the RO-Crate framework. This extension also enables a comprehensive description of Snakemake workflows. For a description of this extended profile, see https://gitlab.liris.cnrs.fr/sharefair/posters/swat4hcls-2024.

When analysing a workflow from a GitHub repository, relevant metadata such as the authors, keywords and date of the last update are automatically extracted.





BioFlow-Insight’s Guidelines

Due to the highly flexible nature of Nextflow's workflow definition, BioFlow-Insight may not handle certain syntax. This section provides guidelines for defining your workflow so that BioFlow-Insight can analyse it. For cases not handled by BioFlow-Insight, easy alternatives are provided.

For full information on how to define a Nextflow workflow, please refer to the Nextflow documentation. We recommend following the syntax recommended by Nextflow.

General guidelines:

Syntax and functional guidelines are provided below.


Syntax Guidelines


Functional Guidelines