
"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem." John Tukey

Genome Data Science

A deluge of genomic, transcriptomic and phenomic data presents opportunities to learn about the properties of living systems, but it also presents challenges. To address outstanding questions in biology and medicine, researchers need to discover meaningful and robust patterns in these data.

In the aGENDAS -- a GENome Data Science -- group, we strive to elucidate the links between mutational processes, natural selection, gene function and phenotype by means of statistical genome analyses. In particular, we use cutting-edge computational techniques and machine learning methodologies to explore massive genomic data sets.

We aim to answer important biological questions by insightful analysis of data originating from human cancers (somatic mutations, chromosomal alterations, transcriptomes), human populations (germline variants), metagenomics (including human microbiomes) and also fully sequenced microbial genomes.


Research interests of the aGENDAS group are organized into four themes:


Unraveling mutational processes. Cancer is a disease of the genome. Mutations are therefore the fuel of carcinogenesis, and it is imperative to learn what causes them and how they drive evolution in general, and cancer evolution in particular. Our past work has discovered that mutations are unevenly distributed across the human genome due to differential activity of DNA mismatch repair, which preferentially protects early replicating, gene-rich regions (Supek & Lehner 2015 Nature). Moreover, motivated by recent discoveries of APOBEC3 mutagenesis in tumors, we found another prevalent process that creates clustered mutations in many cancer types -- error-prone mismatch repair (MMR), which employs the low-fidelity DNA polymerase eta (POLH). The histone mark H3K36me3 is an important genomic determinant of MMR activity, recruiting both the standard, error-free MMR and the non-canonical, error-prone MMR (Supek & Lehner 2017 Cell).


Genomic signatures of natural selection. Most somatic mutations found in cancer cells are ‘passenger’ mutations, with little phenotypic consequence. Detecting the few ‘driver’ mutations among them is challenging, yet crucial to understanding carcinogenic transformation. We have previously discovered that synonymous mutations, i.e. those that occur in gene coding regions but do not change the amino acid sequence, commonly drive cancer by affecting splicing patterns of oncogenes (Supek et al. 2014 Cell). Moreover, we have learnt how the quality control pathway of nonsense-mediated mRNA decay (NMD) decides which mRNAs to degrade (Lindeboom et al. 2016 Nat Genet), and used these rules of NMD to reveal patterns of positive and negative selection on tumor suppressors.


Automated inference of gene function. Genome sequencing technologies are rapidly advancing, providing an abundance of genomes of prokaryotic and eukaryotic species, and also of populations thereof. This presents an opportunity to learn about the function of the ~1/3 of genes for which, remarkably, a biological role is still not known. We have devised a methodology to infer gene function from the evolution of codon biases, and experimentally validated tens of predictions in E. coli (Krisko et al. 2014 Genome Biol). We have also investigated how best to combine heterogeneous genomic predictors, finding that it often pays off to simply trust the single most confident call, even if it is not supported by multiple methods (Vidulin et al. 2016 Bioinformatics).
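As a rough illustration of the "trust the single most confident call" idea, the sketch below ranks gene-function candidates by either the maximum or the mean confidence across predictors. The genes, function label, scores and function names are all invented for illustration, and this is not the procedure used in Vidulin et al. 2016; it only shows how the two combination rules can disagree.

```python
from statistics import mean

# Hypothetical confidence scores (0 to 1) from three independent predictors
# for each (gene, function) candidate. All values are made up.
scores = {
    ("geneA", "DNA repair"): [0.95, 0.10, 0.20],  # one very confident call
    ("geneB", "DNA repair"): [0.55, 0.60, 0.50],  # moderate agreement
}

def combine_max(confidences):
    """Trust the single most confident predictor."""
    return max(confidences)

def combine_mean(confidences):
    """Require support across predictors by averaging their confidences."""
    return mean(confidences)

def rank(candidates, combine):
    """Return candidates sorted from highest to lowest combined score."""
    return sorted(candidates, key=lambda c: combine(candidates[c]), reverse=True)

by_max = rank(scores, combine_max)
by_mean = rank(scores, combine_mean)
print("max rule: ", by_max)
print("mean rule:", by_mean)
```

Under the max rule the candidate with one strong call ranks first, whereas averaging favors the candidate with moderate support from all predictors.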


Genetic basis of phenotypes. Various kinds of -omics data accumulate rapidly and are increasingly organized into tidy, structured repositories. In contrast, phenomics data, while very valuable, are less often collected in a systematic manner and encoded in computable formats. This hampers the discovery of genes that underlie various phenotypes. We have used machine learning to text-mine the scientific literature, annotate microbial species with >400 phenotypic traits (Brbić et al. 2016 Nucl Acids Res) and suggest their genetic basis (including prevalent epistasis in gene repertoires). One example is the genomes of pathogenic bacteria, which tend to encode proteomes resistant to unfolding, thereby protecting the microbes from oxidative stress (Vidović et al. 2014 Cell Rep).

M Brbić, M Piškorec, V Vidulin, A Kriško, T Šmuc, F Supek
Nucl Acids Res (2016)
RGH Lindeboom, F Supek*, B Lehner*
Nature Genetics (2016)
A Krisko, T Copic, T Gabaldón, B Lehner, F Supek
Genome Biology (2014)
F Supek, B Miñana, J Valcárcel, T Gabaldón, B Lehner
Cell (2014)

* shared senior authorship

This group receives financial support from the following sources:

  • Ramón y Cajal Fellowship
  • European Research Council (ERC) 

Group news & mentions

Fran Supek has been awarded an ERC Starting Grant
28 Nov 2017

The Biocat platform has published a feature on three of the latest researchers to be awarded an ERC Grant from the European Research Council (ERC).

The IRB Barcelona group leader will focus on the genomes of hypermutated tumours to detect cancer vulnerabilities
8 Sep 2017

JutarnjiLife, among other media outlets in Croatia and Bosnia and Herzegovina, has published an article on Fran Supek following his award of a Starting Grant from the European Research Council (ER...

In February 2018, Supek will start a five-year project called “HYPER-INSIGHT”, which has received 1.5 M€ of funding
7 Sep 2017

Diario Médico has reported that Fran Supek, head of the Genome Data Science group at IRB Barcelona, has been awarded a Starting Grant by the European Research Council...

Fran Supek has been awarded an ERC Starting Grant
6 Sep 2017

The European Research Council (ERC) today announced the award of ERC Starting Grants (early-career grants) to 406 young researchers...
