- Developed by researchers at IRB Barcelona, the tool aims to harmonise and integrate the enormous (and growing) amount of biological data available.
- The Bioteque considers relations between 12 types of biological entities (such as genes, diseases, and cells) and uses artificial intelligence algorithms to generate descriptors of these entities that capture distinct levels of functional complexity.
- The work, which is open access, has been published in the journal Nature Communications.
The rapid development of disciplines across biological and biomedical research (such as genomics, proteomics, and transcriptomics) in recent decades has led to exponential growth in the amount of biological data available. For example, the volume of data managed by the European Bioinformatics Institute (EMBL-EBI) has grown from 40 petabytes to 250 petabytes in just 6 years.
Scientists led by Dr. Patrick Aloy, ICREA researcher and head of the Structural Bioinformatics and Network Biology laboratory at IRB Barcelona, have developed a computational tool to harmonise, integrate and simplify these data. The result is a knowledge graph that provides information on how different biological entities are related to each other, including more than 30 million functional interactions.
The Bioteque integrates different levels of biological complexity. For any two related genes, for example, it can report whether they physically interact, whether they are active in the same cell types, and whether they are associated with the same disease. It can also predict the sensitivity or resistance of a cell type to a specific drug.
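To make the idea of a knowledge graph concrete, the following is a minimal sketch, not the Bioteque's actual data model or interface. It assumes that relations are stored as (source, relation, target) triples; the identifiers (`gene:A`, `cell:X`, and so on), the relation names, and the function names are all invented for illustration.

```python
from collections import defaultdict

# Toy knowledge graph: each edge is a (source, relation, target) triple.
# Every identifier and relation name here is hypothetical.
TRIPLES = [
    ("gene:A", "interacts_with", "gene:B"),
    ("gene:A", "expressed_in", "cell:X"),
    ("gene:B", "expressed_in", "cell:X"),
    ("gene:A", "associated_with", "disease:D1"),
    ("gene:B", "associated_with", "disease:D2"),
]


def build_index(triples):
    """Map each (source, relation) pair to the set of its targets."""
    index = defaultdict(set)
    for source, relation, target in triples:
        index[(source, relation)].add(target)
    return index


def compare_genes(index, gene_1, gene_2):
    """Summarise how two genes are related at several levels."""
    # Physical interaction is treated as symmetric, so check both directions.
    physically_interact = (
        gene_2 in index[(gene_1, "interacts_with")]
        or gene_1 in index[(gene_2, "interacts_with")]
    )
    shared_cells = index[(gene_1, "expressed_in")] & index[(gene_2, "expressed_in")]
    shared_diseases = (
        index[(gene_1, "associated_with")] & index[(gene_2, "associated_with")]
    )
    return {
        "physically_interact": physically_interact,
        "shared_cells": shared_cells,
        "shared_diseases": shared_diseases,
    }


report = compare_genes(build_index(TRIPLES), "gene:A", "gene:B")
print(report)
```

In this toy example the two genes interact and are active in the same cell type, but they are not linked to the same disease. Each question is answered by a different type of edge in the same graph.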
“This computational resource that we've developed is one of the first aimed at unifying biological information and it's the only one to address such diversity and amount of data. It allows access, in an easy and harmonised way, to practically all the biological knowledge currently available, and it has enormous potential to accelerate biomedical research,” explains Dr. Patrick Aloy.
Almost 1,000 descriptors for 12 biological entities
The information held in the Bioteque is structured into 12 types of biological entities, such as genes, diseases, tissues, and cells. For each entity, the tool considers a series of descriptors or characteristics: for a gene, for example, its pattern of mutations, the profile of physical interactions of the proteins it encodes, its expression in different cell types, or its relationship with different diseases. Across the 12 biological entities, the system covers around 1,000 types of descriptors.
“We have worked with information from 150 different databases, so first we had to integrate them, that is, put them all in the same ‘language’. And then we converted that knowledge into numerical descriptors that could be interpreted by algorithms, and that way we could computationally exploit these networks and connections,” concludes Adrià Fernández, the first author of the article and a doctoral student in the same laboratory.
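As a rough illustration of why numerical descriptors are convenient for algorithms, the sketch below compares entities by the cosine similarity of their vectors. It is a simplified example under stated assumptions: the three-dimensional vectors are hand-written placeholders rather than descriptors produced by the Bioteque, and the entity names and function names are hypothetical.

```python
import math

# Hypothetical numerical descriptors (embeddings), one vector per entity.
# Real descriptors would be derived from the knowledge graph and have many
# more dimensions; these values are invented.
EMBEDDINGS = {
    "gene:A": [0.9, 0.1, 0.0],
    "gene:B": [0.8, 0.2, 0.1],
    "gene:C": [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings):
    """Return the entity whose vector is closest to the query's vector."""
    scores = {
        name: cosine_similarity(embeddings[query], vector)
        for name, vector in embeddings.items()
        if name != query
    }
    return max(scores, key=scores.get)


print(most_similar("gene:A", EMBEDDINGS))
```

Once entities are represented as vectors, a question such as "which gene behaves most like this one?" reduces to simple arithmetic, which is what allows networks of this size to be exploited computationally.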
The Bioteque will be expanded periodically with new databases as they are made public. The tool, the databases and the algorithms are all open access and available at https://bioteque.irbbarcelona.org/.
Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque
Adrià Fernández-Torras, Miquel Duran-Frigola, Martino Bertoni, Martina Locatelli and Patrick Aloy
Nature Communications (2022) DOI: 10.1038/s41467-022-33026-0
About IRB Barcelona
Created in 2005 by the Generalitat de Catalunya (Government of Catalonia) and the University of Barcelona, IRB Barcelona is a Severo Ochoa Centre of Excellence, a seal that was awarded in 2011. The institute is devoted to conducting research of excellence in biomedicine and to transferring results to clinical practice, thus improving people’s quality of life, while simultaneously promoting the training of outstanding researchers, technology transfer, and public communication of science. Its 27 laboratories and eight core facilities address basic questions in biology and are orientated to diseases such as cancer, metastasis, Alzheimer’s, diabetes, and rare conditions. IRB Barcelona is an international centre with 400 employees of more than 30 nationalities. It is located in the Barcelona Science Park. IRB Barcelona is a CERCA centre and a member of the Barcelona Institute of Science and Technology (BIST).