GCB 2024 - Workshops

The workshops will take place on Monday, 30th September 2024.

Overview:

WS1) Bioinformatic metagenome and metaproteome analysis for improved microbiome understanding

WS2) Computational analysis of circular RNAs (circRNAs) from RNA sequencing data using circtools

WS3) Computational Pangenomics Workshop

WS4) Learn the essentials of research data management and data management plans

WS5) Just-in-Time Compiled Python for Bioinformatics Research

WS6) Bioinformatics education

WS7) Standardizing and harmonizing NGS analysis workflows

Detailed Workshop Programme:

WS1) Bioinformatic metagenome and metaproteome analysis for improved microbiome understanding

Organizers: Robert Heyer (ISAS, Bielefeld University); Alexander Sczyrba (Forschungszentrum Jülich, Bielefeld University); Kay Schallert (ISAS); Emanuel Lang (ISAS)

Participants: max. 30

Description: Understanding the taxonomic and functional makeup of microbiomes and their activity is crucial for comprehending various diseases like inflammatory bowel disease, environmental processes such as soil dynamics, and biotechnological applications like biogas production. This understanding can be achieved through the analysis of microbial genes (metagenomics), transcripts (metatranscriptomics), proteins (metaproteomics), or metabolites (metabolomics). Researchers, in addition to experimental expertise, require bioinformatics skills to analyze and integrate data pertaining to these microbial features.

This workshop aims to illustrate a combined bioinformatics workflow for whole-genome sequencing [1] and metaproteomics analysis [2,3] using a microbiome as an example. Additionally, we will demonstrate how to map omics features to metabolomics pathways using the MPA_Pathway_Tool [4] and conduct flux balance analysis.

Provisional schedule:
● Introduction to microbiomes
● Introduction to metagenome data analysis
● Introduction to metaproteome data analysis
● Hands-on: Bioinformatics analysis with the MetaProteomeAnalyzer

WS2) Computational analysis of circular RNAs (circRNAs) from RNA sequencing data using circtools

Organizers: Christoph Dieterich (University Hospital Heidelberg, Department of Internal Medicine III); Tobias Jakobi (University of Arizona, College of Medicine - Phoenix); Shubhada Kulkarni (University Hospital Heidelberg, Department of Internal Medicine III)

Participants: max. 15 ● Participants are required to bring a Linux- or macOS-based laptop

Description: Circular RNAs (circRNAs) are a class of RNA molecules that were discovered relatively recently and are widely expressed in eukaryotic cells. Unlike canonical linear RNA molecules, circRNAs form a covalently closed continuous loop structure without a 5′ or 3′ end. They are generated by a process called back-splicing, in which a downstream splice donor site is joined to an upstream splice acceptor site. CircRNAs play important roles in various biological processes, including gene regulation, alternative splicing, and protein translation, and have been shown to be involved in different diseases, including cancer as well as neurological and cardiovascular disease.

This workshop covers the biological background of circRNAs and general computational approaches to detect circRNAs from sequencing data. The hands-on part targets the computational detection and analysis of circular RNAs (circRNAs) from RNA-sequencing data employing our circtools software suite.

Provisional schedule:
● Introduction to circular RNAs & computational circRNA analysis
● Preparation of data for circRNA detection
● CircRNA detection using circtools
● Analysis of results and design of follow-up experiments

WS3) Computational Pangenomics Workshop

Organizers: Tizian Schulz; Jens Stoye; Andreas Rempel; Luca Parmigiani; Roland Wittler (Bielefeld University)

Participants: max. 20 ● Participants are required to bring a laptop

Description: Computational pangenomics deals with the joint analysis of all genomic sequences of a species. As DNA sequencing technologies advance, more and more genomic sequences become available for many species, which makes pangenomic studies increasingly attractive. Pangenomics approaches have already been successfully applied to various tasks in many research areas.

The focus of this workshop is to give participants an overview and understanding of commonly used pangenomics tools. Besides an introduction into the motivation and theory behind questions from the field of pangenomics, we will look at specific tools (such as panacus, Corer, PLAST, and SANS) and let the participants explore their usage in hands-on sessions. Interested participants should have a basic understanding of Linux operating systems to participate in hands-on sessions of the workshop.

Provisional schedule:
● Introduction to computational pangenomics
● Investigating a pangenome's diversity with panacus and hands-on
● Pangenomic core detection with Corer and hands-on
● Querying a graphical pangenome with PLAST and hands-on
● Phylogenomic reconstruction with SANS and hands-on

WS4) Learn the essentials of research data management and data management plans

Organizers: Helena Schnitzer (Forschungszentrum Jülich GmbH); Daniel Wibberg (Forschungszentrum Jülich GmbH)

Participants: max. 15

Description: Have you ever wondered what research data management really is?
Why is it so important? Then ELIXIR-DE has created the perfect training for you!

More and more funders require a research data management plan as a condition for awarding grants. Over the course of just three hours, you get the chance to learn how to turn research project proposals into proper Data Management Plans (DMPs).
Data management experts will guide you through the basics of research data management, the dos and don’ts and how to improve the management of the data produced in research projects. After a short introduction of the research data life cycle and the FAIR data principles, we will explore in multiple hands-on sessions what a data management plan (DMP) is.
We will sink our teeth into components, language, software and examples of DMPs.
In light of the FAIR principles, we will evaluate possible problems and solutions and self-assess a drafted DMP for our own projects.

Provisional schedule:
● Know the research life cycle
● Know the FAIR principles and understand their importance
● Know the expectations for a DMP
● Be able to sketch a DMP

WS5) Just-in-Time Compiled Python for Bioinformatics Research

Organizers: Sven Rahmann; Johanna Schmitz; Jens Zentgraf (Universität des Saarlandes, Saarbrücken)

Participants: max. 20 ● participants must bring their own laptops

Description: Python has a reputation for being a clean and easy-to-learn language, but slow when it comes to execution, and difficult concerning multi-threaded execution. Nonetheless, it is one of the most popular languages in science, including bioinformatics, because for many tasks, efficient libraries exist, and Python acts as a glue language. In this workshop, we explore hands-on how to write efficient multi-threaded applications in Python using the numba just-in-time compiler. In this way, we can use Python’s flexibility and the existing packages to handle high-level functionality (e.g., design the user interface, run machine learning models), and then use compiled Python for additional custom compute-heavy tasks; these parts can even run in parallel. We use a small but still interesting and relevant problem as an example: efficient search for bipartite DNA motifs. We develop an efficient tool that outputs every match in a reference genome in a matter of seconds. Starting with an introduction to the problem and a (slow) pure Python implementation, we discuss how to write more jit-compiler-friendly code, transition towards a compiled version and observe speed increases until we obtain C-like speed. We parallelize the tool to make it even faster, and add more options for more flexible searching.

Provisional schedule:
09:00 - 10:00
Introduction to the numba just-in-time compiler for Python: small examples, possibilities, limitations, and how compilation works. 30 minutes are reserved for short hands-on exercises (timing iterated execution of a small function in pure vs. compiled Python).

10:00 - 11:00
Introduction to DNA motif search and a “motif description” mini-language, with examples from the literature. Automaton-based pattern search and a bit-parallel algorithm. Hands-on implementation in pure Python (30 min, 15-20 lines).

11:00 – 11:30 Coffee break

11:30 – 12:30
Transforming a Python implementation into a numba-compiled implementation: separation of high-level and low-level code parts; managing memory allocations; introduction of type annotations (30 min principles, 30 min supervised coding).
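
To give a flavour of the transition from pure to compiled Python, here is a minimal sketch of a bipartite motif search. It is not the workshop's actual code: the function names, the example sequence and the gap range are made up for illustration, and the motif is simplified to two fixed words separated by a gap of variable length. The sketch assumes that numba's `njit` decorator accepts `bytes` arguments; if numba is not installed, a stand-in decorator keeps the example runnable as plain Python.

```python
try:
    # numba is optional for this sketch.
    from numba import njit
except ImportError:
    def njit(func):
        # Stand-in decorator: leaves the function uncompiled.
        return func


def find_bipartite_py(seq, left, right, min_gap, max_gap):
    """Slow reference version: relies on slicing, which creates new objects."""
    hits = []
    for i in range(len(seq)):
        if seq[i:i + len(left)] != left:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + len(left) + gap
            if seq[j:j + len(right)] == right:
                hits.append((i, gap))
    return hits


@njit
def find_bipartite_jit(seq, left, right, min_gap, max_gap):
    """JIT-friendly version: explicit loops over byte values, no slicing."""
    n, nl, nr = len(seq), len(left), len(right)
    hits = []
    for i in range(n - nl - min_gap - nr + 1):
        # Compare the left word byte by byte.
        ok = True
        for k in range(nl):
            if seq[i + k] != left[k]:
                ok = False
                break
        if not ok:
            continue
        # Try every allowed gap length for the right word.
        for gap in range(min_gap, max_gap + 1):
            j = i + nl + gap
            if j + nr > n:
                break
            hit = True
            for k in range(nr):
                if seq[j + k] != right[k]:
                    hit = False
                    break
            if hit:
                hits.append((i, gap))
    return hits


# Hypothetical toy input: left word, 3 spacer bases, right word.
genome = b"GGTTGACACCCTATAATGG"
print(find_bipartite_jit(genome, b"TTGACA", b"TATAAT", 2, 4))  # → [(2, 3)]
```

Each hit is reported as a pair (start position of the left word, gap length). Both functions return the same hits; the second avoids temporary objects inside the loop, which is the kind of rewrite that tends to suit a just-in-time compiler.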

WS6) Bioinformatics education

Organizers: Jan Grau (MLU Halle); Stefan Kurtz (Universität Hamburg); Kay Nieselt (Eberhard Karls Universität Tübingen); Sven Rahmann (Universität des Saarlandes, Saarbrücken); Ralf Zimmer (Ludwig-Maximilians-Universität München)

Participants: max. 30

Description: This workshop aims to bring together people involved in bioinformatics education. Currently, bioinformatik.de lists 38 B.Sc. programs with prominent bioinformatics content (14 with bioinformatics as the major topic) and 35 M.Sc. programs (17 with a bioinformatics major) in Germany. These programs put varying emphasis on certain bioinformatics topics and skills, and have different access requirements. In a previous workshop during GCB 2023, we compiled an overview of bioinformatics B.Sc. and M.Sc. programs in Germany, summarized in a synopsis that will be provided to participants in preparation for this workshop.

This year, we focus on common perspectives on bioinformatics curricula on the B.Sc. level. Specifically, we would like to initiate a discussion among the participants along the following questions:
● Which topics and skills should be covered by a B.Sc. in bioinformatics?
- Which are part of programs mostly for historical reasons, but might not be considered "essential" from a current perspective?
- Which should be included in a modern program, but are often lacking (due to limited ECTS CPs, etc.)?
● More generally, how should ECTS CPs be distributed among mathematics, computer science, biology, (bio-)chemistry, bioinformatics?
● Which courses can be imported from other institutes/faculties (mathematics, biology, etc.) and which should be devised specifically for bioinformatics programs?
● What is a reasonable balance between theoretical and practical (wet lab, programming, etc.) courses?
● Vision: How would we design "the perfect" B.Sc. bioinformatics (if we had unlimited ECTS CPs and resources)?

Provisional schedule:
09.00 am – 09.30 am Introduction, overview of B.Sc. bioinformatics programs (Jan Grau, Stefan Kurtz, Kay Nieselt, Sven Rahmann, Ralf Zimmer, participants)
09.30 am – 11.00 am Joint discussion:
- essential, dispensable, desirable topics and skills
- distribution among disciplines
- theoretical vs practical courses
(all participants)
11.00 am – 11.30 am Coffee break
11.30 am – 12.30 pm Joint discussion:
- "the perfect" B.Sc. bioinformatics program
- common perspective on a modern B.Sc. bioinformatics curriculum
(all participants)

WS7) Standardizing and harmonizing NGS analysis workflows

Organizers: Sameesh Kehr (DKFZ / GHGA); Julia Leimeister (Uni Tübingen / GHGA); Kübra Narci (DKFZ / GHGA); Vanessa Gonzalez Ribao (DKFZ / GHGA); Zehra Hazal Sezer (Eberhard Karls University of Tübingen / GHGA)

Participants: max. 30 ● participants must bring their own laptops

Description: With the growing volume of human omics data, there is an urgent need for adequate resources for data sharing, as well as for standardized and harmonized processing of the data.

In this tutorial, we will explore how FAIR principles enable the standardization and harmonization of nf-core-based NGS analysis workflows within the German Human Genome-Phenome Archive (GHGA). We will start with a short introduction of what GHGA is, its needs for harmonized workflows in the international context, and an overview of the FAIR data principles. We will then demonstrate the adaptability of nf-core workflows and discuss the importance of standardization of workflows. Finally, we will demonstrate how to make workflows scalable, robust, and automated for continuous benchmarks with hands-on exercises using a subset of a public dataset with a variety of configurations like local and cloud settings.

Provisional schedule:
● Introduction to the tutorial: What is GHGA? What are our workflow objectives? What is FAIR data?
● Reproducibility, adaptability, and portability of workflows
● Standardization and benchmarking of workflows using workflow managers
● Hands-on: managing workflows with Nextflow and nf-core

Supported by

BIBI (Bielefeld Institute for Bioinformatics Infrastructure)

German Network for Bioinformatics Infrastructure - de.NBI

Gesellschaft für Biochemie und Molekularbiologie (GBM)