Training Session: Reproducible analysis pipelines in the cloud with NextflowWorkbench

NB: The last session of the academic year is scheduled for Tuesday, June 7. Note that we have greatly simplified the installation instructions and reduced the computer requirements. Any Mac or Linux laptop with 4GB of memory is sufficient to take the course. The course will demonstrate how to run workflows in the cloud.

The biomedical informatics core of the CTSC is offering the NextflowWorkbench language to help write data analysis pipelines/workflows. NextflowWorkbench takes advantage of the Nextflow middleware and makes it possible for beginners in bioinformatics to quickly assemble efficient, parallel and reproducible workflows.

The developed workflows are portable: they run either on a personal computer, on institutional Linux clusters or on a commercial cloud (new since April 2016).

The training session (2 hours) will provide an introduction to the development of workflows with NextflowWorkbench. For training, we will use a cluster running in the cloud. In this session, trainees will create a workflow for analyzing RNA-Seq data, which includes the following steps:

1.    Download read files from the Sequence Read Archive (SRA)

2.    Compute quality control metrics (with FastQC)

3.    Estimate counts against the human transcriptome (with Kallisto and an Ensembl Transcript sequence database)

4.    Combine these counts into one matrix, a prerequisite for using them in differential expression analysis (e.g., with the MetaR Limma Voom analysis protocol)
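
To give a sense of what step 4 involves, here is a minimal Python sketch of combining per-sample counts into one matrix. It is not the NextflowWorkbench implementation and is not how the training workflow is written. It only assumes the standard column layout of Kallisto's abundance.tsv output (target_id, length, eff_length, est_counts, tpm). The sample names, transcript identifiers and the helper functions read_est_counts and combine_counts are hypothetical and used for illustration only.

```python
import csv
import io


def read_est_counts(handle):
    """Return {target_id: est_counts} from one Kallisto abundance.tsv stream."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["target_id"]: float(row["est_counts"]) for row in reader}


def combine_counts(samples):
    """Merge per-sample counts into one matrix.

    samples maps a sample name to its {target_id: count} dictionary.
    Returns (header, rows): one row per transcript, one column per sample.
    A transcript missing from a sample is given a count of 0.0.
    """
    names = sorted(samples)
    transcripts = sorted({t for counts in samples.values() for t in counts})
    header = ["target_id"] + names
    rows = [[t] + [samples[n].get(t, 0.0) for n in names] for t in transcripts]
    return header, rows


# Tiny in-memory stand-ins for two abundance.tsv files (made-up values).
ABUNDANCE_A = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "TX_1\t1000\t800\t10\t5.0\n"
    "TX_2\t2000\t1800\t0\t0.0\n"
)
ABUNDANCE_B = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "TX_1\t1000\t800\t3\t1.5\n"
    "TX_2\t2000\t1800\t7.5\t2.0\n"
)

samples = {
    "sample_a": read_est_counts(io.StringIO(ABUNDANCE_A)),
    "sample_b": read_est_counts(io.StringIO(ABUNDANCE_B)),
}
header, rows = combine_counts(samples)

# Print the matrix as tab-separated text, one transcript per line.
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

In practice each sample's abundance.tsv would be opened from disk rather than from a string, and the resulting matrix would be written to a file for the differential expression step.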

See for more details.

NextflowWorkbench is part of the Data Analysis Workbench and is being developed to facilitate data analysis for biomedical scientists with minimal computational skills. The software is fully functional, open-source and provided free of charge.

The software runs as a desktop application with an interactive user interface, on Mac OS X (10.8.3+) and Linux (with Java 8).

Training and assistance in the use of the software are offered to investigators who hold an appointment in one of our CTSC institutions (i.e., Weill Cornell, MSKCC, HSS, and Hunter College).

See for software and video tutorials.

Training Sessions:

Users interested in learning how to use the software are encouraged to attend one of the monthly training sessions. Training sessions are held on select Tuesdays at 10:30 AM.

The sessions are limited to a maximum of 10 participants and pre-registration is required. Please use the registration form to reserve a seat.


You must have a MacOS or Linux laptop with at least 4GB of memory. No programming or UNIX skills are required. Trainees will be requested to follow the installation instructions to download and install the software on their laptop before attending the training session.

This software is provided by the Biomedical Informatics Core of the Clinical and Translational Science Center and by the Campagne laboratory. Please contact Dr. Fabien Campagne at 646-962-5613 if you have any questions or comments.
