Hicberg: Reconstruction of genomic signals from repeated elements

Published in Pending, 2025

Increasingly detailed investigations of the spatial organization of genomes reveal that chromosome folding influences or regulates dynamic processes such as transcription, DNA repair and segregation. The Hi-C approach is commonly used to characterize genome architecture by quantifying the frequency of physical contacts between pairs of loci through high-throughput sequencing. Repeated sequences, however, pose a challenge during the alignment step of the analysis, because reads originating from them can be assigned to multiple plausible positions. The corresponding parts of the genome architecture, which may contain biological information, therefore remain hidden throughout downstream functional analysis. To overcome these limitations, we have developed Hicberg, a method that combines statistical inference with input from DNA polymer behavior characteristics and features of the Hi-C protocol to assign reads from repeated sequences to genomic positions with robust confidence and "fill in" empty vectors in contact maps. Hicberg is intended to be applicable to different types of organisms. We present the program and key validation tests, then apply it to unveil hidden parts of the genomes of E. coli, S. cerevisiae and P. falciparum. Hicberg shows that repeated sequences may be involved in singular genomic architectures. Our method can provide an alternative visualization of genomic contacts under a wide variety of biological conditions, allowing a more complete view of genome plasticity.