Research

Welcome to the part of the site that probably interests you most. Here I trace the journey that has led me to where I stand today and discuss the questions that occupy most of our time. You will find not only the results we have achieved, as publications and software tools, but also the open questions that drive our work.

My journey in research began at the dawn of the new millennium in the field of Software Engineering. I worked on software maintenance and evolution, empirical software engineering, and mining software repositories. It was an intense period in which I took my first steps in doing research and teaching students, in a prolific and intellectually rich environment then known as RCOST (Research Centre on Software Technology), led by Gerardo Canfora, my PhD supervisor, who passed on to me his passion for research. From 2006 to 2010, I worked on Mining Software Repositories, which I consider one of the most exciting and pioneering topics in software maintenance and empirical software engineering. The availability of extensive repositories of software changes and bug reports fueled quantitative studies, allowing researchers to glean insights and validate hypotheses. With Massimiliano Di Penta, I developed novel approaches that, for example, suggest which part of a software system needs to be fixed and which developer is best suited to the task. These and similar approaches were well received by the community, at least judging by the citations and awards they obtained.

Around 2008, fate led me to Michele Ceccarelli, who was embarking on a journey into bioinformatics, determined to infuse quantitative methods into a traditional biology department. He invited me to join this audacious endeavor, which aimed to establish and nurture a bioinformatics research group at the University of Sannio. The decision may have seemed risky at the time, but it ultimately proved rewarding. Biology has always fascinated me, and computer science has been my main interest since childhood, so the opportunity to do both was perhaps what justified such an irrational choice. At that time, I was particularly fascinated by the connection between software engineering and bioinformatics. Originally, bioinformatics, as intended by Paulien Hogeweg, who coined the term, was the study of informatic processes in biotic systems such as cells, by analogy with other bio-prefixed disciplines such as biochemistry (the study of chemical processes within and related to living organisms) and biophysics (the study of the physical principles of living things and biological processes). Software engineering researchers, for their part, question software quality, asking, for example, to what extent a software system is maintainable, evolvable, or faithful to its original design intent. Can such software engineering questions also make sense for the programs and genetic circuits encoded inside living organisms? Can we test whether genetic programs meet a cell's requirements to survive and function? Questions like these shaped my approach to bioinformatics, but over time they slowly faded away, and as the availability of biological data started to grow exponentially, bioinformatics also shifted from its original focus and became a data-science-oriented discipline. Now, bioinformatics supports and, in many cases, guides the big questions in biology and biomedicine. Several subfields have emerged, and the original intent of bioinformatics has probably dissolved into subfields like Systems Biology, which strongly overlaps with information engineering disciplines such as automatic control.

Currently, my focus lies in harnessing the potential of cutting-edge machine learning algorithms, particularly deep learning, in fields such as cancer bioinformatics and computational genomics. My primary objective is to develop and validate computational and statistical methods - specifically machine learning algorithms and architectures - to tackle pertinent biological challenges and formulate novel research hypotheses. Here, you'll find an overview of my ongoing research interests, collaborations, and the significant topics driving my current endeavors.


Research topics

Description Work Tools
Reverse engineering of gene regulatory networks
The regulation of gene activity, which allows living organisms to function and survive, is quite complex and operates at different levels. The main mechanism takes place at the level of transcription, which controls when and how often a given gene is transcribed. The information encoded in a gene is carried by messenger RNAs to other places in the cell and further processed to perform its function. The products of particular genes, known as transcription regulators, bind directly to cis-regulatory DNA sequences and trigger the transcription of other genes, with a logic that resembles wired logic connections in digital electronics. More complexity, especially in eukaryotes, is added by other complexes acting as co-activators, co-repressors, and mediators. From a mathematical point of view, such complexity can be effectively modeled with a system of differential equations describing how the concentrations of the complexes vary over time. However, the network of gene-gene interactions is not known in advance, which makes such mathematical modeling infeasible, and the network is also dynamic and strongly dependent on the functioning context of the cell. Reverse engineering approaches aim to reconstruct the gene-gene control interactions from a collection of transcriptomic profiles representing the context of interest. Methods have been proposed in the literature for almost 20 years. We are currently working on novel architectures based on deep graph neural networks to learn gene regulatory interactions and trajectories from single-cell transcriptomic data.
Work:
  • (Chang et al., Genome Biology, 2020)
  • (Mall et al., Nucleic Acids Research, 2018)
  • (Mall et al., BMC Systems Biology, 2017)
  • (Ceccarelli et al., FormaliSE, 2015)
  • (Ceccarelli et al., Methods, 2014)
  • (Cerulo et al., BMC Bioinformatics, 2013)
  • (Cerulo et al., BMC Bioinformatics, 2010)
Tools:
  • RGBM
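As a rough illustration of the reverse engineering idea described for this topic, the sketch below ranks candidate regulators of a target gene by the absolute correlation of their expression profiles. It is a deliberately naive baseline, not RGBM and not the graph neural network architectures mentioned above; the gene names, the synthetic data, and the function names are assumptions made only for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_regulators(profiles, target, regulators):
    """Rank candidate regulators of `target` by absolute correlation.

    `profiles` maps a gene name to its expression values across samples.
    Returns (regulator, score) pairs, strongest first.
    """
    scores = {
        reg: abs(pearson(profiles[reg], profiles[target]))
        for reg in regulators
        if reg != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic profiles (hypothetical genes): G3 is driven by TF1, not by TF2.
rng = random.Random(0)
tf1 = [rng.gauss(0, 1) for _ in range(200)]
tf2 = [rng.gauss(0, 1) for _ in range(200)]
g3 = [2.0 * a + rng.gauss(0, 0.5) for a in tf1]
profiles = {"TF1": tf1, "TF2": tf2, "G3": g3}

ranking = rank_regulators(profiles, "G3", ["TF1", "TF2"])
print(ranking)
```

Correlation cannot tell direct from indirect regulation, nor the direction of control, which is one reason more elaborate methods are needed in practice.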
Learning features and functions from sequences
It is known that the genetic information encoded in DNA genomes is shared and transmitted among cells, where it is appropriately adopted to implement and actualize all the functions necessary for life. A long-standing question is whether a cell function can be predicted just from the nucleotide sequence carried by DNA molecules. Is the nucleobase language sufficient to encode all the information needed for an organism to survive? Are there higher-level languages encoding higher-level cell functions? We are currently trying to answer such questions using recent deep learning advances in feature learning and self-supervised learning.
Work:
  • (Noviello et al., Plos CB, 2020)
  • (Noviello et al., BMC Bioinformatics, 2018)
  • (Noviello et al., BMC Bioinformatics, 2017)
Tools:
  • ncrna-homologs
  • ncrna-deep
Ligand-receptor interaction prediction
In addition to the self-controlling mechanisms administered by gene regulatory networks, cells communicate with other cells and with the environment they live in by receiving and sending signals. It is relevant to know, in a particular context of interest, which kinds of signals are exchanged between cells, which kinds of cells are sending them, and which kinds of cells are receiving them through their receptors. Transcriptomic profiles, usually at single-cell resolution, are the starting point for reconstructing this kind of extracellular communication. We are adopting unsupervised graph autoencoder approaches to learn which ligand-receptor interactions are active in a context of interest.
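To make the question of "who sends what to whom" concrete, here is a minimal sketch of a simple expression-product score: for each sender and receiver cell type, a ligand-receptor pair is scored by the ligand's mean expression in the sender times the receptor's mean expression in the receiver. This is a plain baseline, not the graph autoencoder approach mentioned above; the cell types, gene names, and values are hypothetical.

```python
def lr_scores(mean_expr, lr_pairs):
    """Score every (sender, receiver, ligand, receptor) combination.

    `mean_expr[cell_type][gene]` is the mean expression of a gene in a
    cell type. The score is the ligand's mean expression in the sender
    multiplied by the receptor's mean expression in the receiver.
    Combinations with a zero score are dropped.
    """
    scores = {}
    for sender, sender_expr in mean_expr.items():
        for receiver, receiver_expr in mean_expr.items():
            for ligand, receptor in lr_pairs:
                score = sender_expr.get(ligand, 0.0) * receiver_expr.get(receptor, 0.0)
                if score > 0:
                    scores[(sender, receiver, ligand, receptor)] = score
    return scores


# Toy mean expression per cell type (hypothetical names and values).
mean_expr = {
    "tumor": {"LIG_A": 4.0, "REC_A": 0.0},
    "T_cell": {"LIG_A": 0.5, "REC_A": 3.0},
}
lr_pairs = [("LIG_A", "REC_A")]

scores = lr_scores(mean_expr, lr_pairs)
best = max(scores, key=scores.get)
print(best, scores[best])
```

In real analyses such scores are usually compared against a null distribution, for example by permuting cell type labels, before an interaction is called active.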
Drug synergy prediction

Current and past collaborators

Software tools

Name Language Description References
TDMDfinder (Web tool)
TDMDfinder is an integrative online platform for in silico predicted high-confidence Target-Directed miRNA Degradation in mammalian genomes (Homo sapiens and Mus musculus).
  • (Simeone et al., Nucleic Acids Research, 2022)
MGST (Web tool)
MGST (Massive Gene Set Test) is an improved implementation of gene set enrichment analysis based on the Mann-Whitney-Wilcoxon test with additional scoring parameters.
  • (Frattini et al., Nature, 2018)
  • (Cerulo and Pagnotta, Entropy, 2022)
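For readers unfamiliar with the test MGST builds on, the sketch below shows a bare-bones Mann-Whitney (rank-sum) comparison of the genes in a set against all remaining genes. It is not the MGST implementation and omits its additional scoring parameters; the function names and the toy scores are assumptions, and the z-score uses the normal approximation without tie correction.

```python
import math


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def gene_set_u_test(scores, gene_set):
    """Mann-Whitney U test of the genes in `gene_set` against all others.

    `scores` maps gene -> statistic (e.g. a differential expression score).
    Returns (U, z); z > 0 means the set tends to rank above the rest.
    """
    genes = list(scores)
    ranks = midranks([scores[g] for g in genes])
    in_set = [g in gene_set for g in genes]
    n1 = sum(in_set)
    n2 = len(genes) - n1
    rank_sum = sum(r for r, member in zip(ranks, in_set) if member)
    u = rank_sum - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mean_u) / sd_u


# Toy example: ten hypothetical genes scored 1..10; the set holds the top three.
toy_scores = {f"g{i}": float(i) for i in range(1, 11)}
u, z = gene_set_u_test(toy_scores, {"g8", "g9", "g10"})
print(u, round(z, 2))
```

Here U reaches its maximum, n1 * n2, because every gene in the set outranks every gene outside it.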
CovidLiterature (R scripts)
CovidLiterature is a collection of scripts to perform literature analysis based on topic models. In particular, it shows an analysis of COVID-related articles to review the literature on tools and resources.
  • (Caruso et al., Briefings in Bioinformatics, 2021)
ncrna-deep (Python scripts)
ncrna-deep is a deep learning method based on convolutional neural networks to predict the Rfam classes of short non-coding RNAs.
  • (Noviello et al., Plos CB, 2020)
ncrna-homologs (R scripts)
ncrna-homologs is a collection of scripts to search for ncRNA homologs in different species based on alignment-free text similarity metrics.
  • (Noviello et al., BMC Bioinformatics, 2018)
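To give a flavor of what an alignment-free text similarity metric looks like, the sketch below compares sequences by the cosine similarity of their k-mer count profiles. This is one common choice, assumed here for illustration; it is not necessarily among the metrics used by ncrna-homologs, and the sequences are invented.

```python
import math
from collections import Counter


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented sequences: `candidate` differs from `query` by one base,
# `unrelated` shares no 3-mer with it.
query = "ACGUACGUACGU"
candidate = "ACGUACGUACGA"
unrelated = "GGGGCCCCGGGG"

sim_candidate = cosine_similarity(kmer_counts(query), kmer_counts(candidate))
sim_unrelated = cosine_similarity(kmer_counts(query), kmer_counts(unrelated))
print(sim_candidate, sim_unrelated)
```

Because no alignment is computed, the comparison scales to large sets of candidate sequences, at the cost of ignoring the order in which k-mers occur.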
RGBM (R package)
RGBM is a Regularized TreeBoost algorithm for Gene Regulatory Network inference from transcriptomic data.
  • (Mall et al., Nucleic Acids Research, 2018)
ldiff (Perl script)
LDIFF is an enhanced language-independent line differencing tool built upon the Unix diff. It is able to distinguish text lines that have been changed by manual editing from those that have simply been added and/or deleted.
  • (Canfora et al., MSR 2007)
  • (Canfora et al., IEEE Software, 2008)
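The distinction LDIFF draws between changed lines and lines that were only added or deleted can be illustrated with Python's standard `difflib`. The sketch below is not LDIFF's algorithm: it simply treats a `replace` opcode as "changed", which is a much cruder heuristic, and the function name and sample lines are assumptions.

```python
import difflib


def classify_lines(old_lines, new_lines):
    """Label line-level differences as changed, added, or deleted.

    Returns a list of (label, old_chunk, new_chunk) tuples, skipping
    unchanged regions.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            result.append(("changed", old_lines[i1:i2], new_lines[j1:j2]))
        elif tag == "delete":
            result.append(("deleted", old_lines[i1:i2], []))
        elif tag == "insert":
            result.append(("added", [], new_lines[j1:j2]))
    return result


old = ["int x = 0;", "print(x);", "return;"]
new = ["int x = 1;", "print(x);", "log(x);", "return;"]

changes = classify_lines(old, new)
for label, before, after in changes:
    print(label, before, after)
```

A `replace` region pairs whole chunks rather than individual lines, so a chunk where one line was edited and another inserted is reported as a single change; telling those cases apart is the kind of problem a dedicated tool addresses.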
Research grants

What lets us do research.

Strategic infrastructure for translational research in genomics to fight cancer and for protection and improvement of human health (GENOMA E SALUTE)
Years: 2017-2020
Funded by: European Union
Amount: 800,000 Euro
Partners:
  • Unisa
  • Unisannio
  • Biogem
  • GRGS

Non-Coding RNA Explosion: Novel Implications in Neurotrophin Biology
Years: 2013-2018
Funded by: FIRB 2012, Ministry of Research and University (FIRB2012-RBFR12QW4I)
Amount: 700,000 Euro
Partners:
  • Stazione Zoologica
  • Unisannio
  • Unina
  • Unisapienza

Metodi e strumenti per l'integrazione di dati e conoscenze nella biologia dei sistemi (Methods and tools for the integration of data and knowledge in systems biology)
Years: 2012-2017
Funded by: Regione Campania, Assessorato alla Ricerca, Legge 5
Amount: 12,500 Euro
Partners:
  • Unisannio
  • Biogem
High performance computational infrastructure

Who does the hard work.

Superdome Flex server (224 CPU cores, Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz, 1.5 TB RAM, 150 TB storage, 1 NVIDIA Tesla V100 GPU) hosted at the Department of Science and Technology (University of Sannio) in Benevento
Cluster of 7 CPU servers and 1 GPU unit (400 CPU cores, 4 TB RAM, 600 TB storage, and 6 NVIDIA GPUs in total) hosted at Biogem - Molecular Biology and Genetics Research Institute in Ariano Irpino (AV)

    © Copyright Luigi Cerulo