Thesis director: Gaël CRISTOFARI
Abstract: A large fraction of gene introns contain cryptic splicing and termination signals, as well as abundant binding sites for DNA- and RNA-binding proteins. These sequences are often found in mobile DNA units known as transposable elements, which are embedded in long genes. They could interfere with transcription elongation of the genes in which they reside, leading to premature transcription termination, a process known as attenuation. Since attenuation appears to be uncommon under normal conditions, we hypothesized that safeguarding mechanisms protect long human genes from it. To study such mechanisms, we developed an R package, called tepr, to analyze datasets from nascent transcript RNA sequencing (e.g., TT-seq, PRO-seq, mNET-seq). The software can identify potentially attenuated genes, measure the extent of attenuation, and compare different experimental conditions. Applying this strategy to human fibroblast and hepatocarcinoma cells subjected to heat shock (HS), a condition known to induce attenuation in long genes, we identified genes specifically attenuated under HS. Moreover, we showed that one group of genes is attenuated in both cell types, while another group undergoes cell-type-specific attenuation. To find cellular factors potentially involved in promoting or preventing attenuation, we further screened publicly available eCLIP datasets and discovered RNA-binding proteins enriched on transcripts from HS-attenuated genes, suggesting that these factors could regulate attenuation. Altogether, this work provides a quantitative framework for studying attenuation and sheds light on the regulatory mechanisms that protect transcription elongation through long human introns.