Lenhard group

Bioinformatics of transcription and transcriptional gene regulation

Transcriptional regulation has been identified as the top-priority research subject of advanced post-genome bioinformatics. Binding site specificities of individual eukaryotic transcription factors are notoriously low, which precludes their direct application to genome-wide analysis. The only available biologically meaningful transcription factor binding site data come from relatively rare experimental analyses of inferred regulatory regions. Some progress was made with the observation that some tissue-specific genes are regulated by cis-regulatory modules (CRMs), i.e. clusters of binding sites for tissue-specific regulatory elements. Another major leap was facilitated by cross-species comparisons of regulatory sequences (phylogenetic footprinting). This step has been successfully combined with other methods such as CRM detection, resulting in a significant increase in the specificity of predictions.

We are working on several new approaches to harness the new discoveries and newly available data into a next-generation gene regulation bioinformatics platform. We assembled JASPAR, the world’s first open-access database of transcription factor binding site profiles from higher eukaryotes. In addition, we developed a computational framework for transcription factor binding site analysis (TFBS) and applied it to demonstrate quantitatively that cross-species comparisons drastically improve the detection rate of transcription factor binding sites. ConSite is our web-based application for detecting transcription factor binding sites with the help of phylogenetic footprinting.
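The idea of combining a binding site profile with a cross-species filter can be illustrated with a minimal Python sketch. It is not the TFBS framework or the ConSite implementation: the count matrix, the two sequences, the threshold and all function names are hypothetical, and the two sequences are assumed to be already aligned without gaps. A profile is converted to log-odds scores, each sequence is scanned, and only hits found at the same position in both species are kept.

```python
import math

# Hypothetical count matrix for a 4-bp motif (consensus AGCT).
# This is a toy profile for illustration, not a real JASPAR entry.
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 1],
    "G": [1, 10, 0, 0],
    "T": [1, 0, 1, 8],
}

# Toy, gap-free "alignment" of two orthologous sequences (assumed, not real data).
HUMAN = "TTAGCTGGCCAGCTAA"
MOUSE = "TTAGCTGGCCAGGTAA"


def log_odds(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into a log2-odds weight matrix."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((counts[b][i] + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (start, site, score) for every window scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


def conserved_hits(seq_a, seq_b, pwm, threshold):
    """Keep hits in seq_a that also occur at the same offset in seq_b."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    starts_b = {start for start, _, _ in scan(seq_b, pwm, threshold)}
    return [hit for hit in scan(seq_a, pwm, threshold) if hit[0] in starts_b]
```

With the toy inputs above and a threshold of 4.0, scanning HUMAN alone reports two sites (offsets 2 and 10), while conserved_hits(HUMAN, MOUSE, log_odds(COUNTS), 4.0) keeps only the site at offset 2, because the second site is not preserved in MOUSE. Real analyses work on gapped alignments and score conservation over windows, which this sketch deliberately leaves out.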

Based on a conceptual framework inherited from studies in bacteria and yeast, previous methods primarily focused on regions 5' upstream of the inferred transcription start sites. New incoming data are starting to invalidate this as a general approach: many genes are regulated by non-coding elements distributed along the length of the entire gene. Our analysis of the genomic context and organization of 3583 ultra-conserved non-coding regions (UCRs) in the human genome revealed that they tend to cluster in and around genes involved in fundamental developmental processes in vertebrates, genes that most often have known homologs in invertebrates (e.g. in Drosophila). The genomic organization of UCR clusters revealed a striking array of long-range enhancers around key genes, sometimes spanning areas of more than 1 Mb. This discovery provides an argument against focusing on proximal promoter regions in the search for key regulatory elements, and implies the existence of long-range, chromatin-level regulatory mechanisms. We continue to explore long-range regulatory elements across higher eukaryotes.

Another exciting area of research we are involved in is the analysis of the mammalian transcriptome. In collaboration with the RIKEN Genome Science Center (Japan), we are dissecting loci with demonstrated complex transcription patterns, including occurrences of natural antisense transcription, bidirectional promoters and cis-regulatory chains.

Future projects and goals:

  • Elucidation of the genomic organization of long-range regulatory elements in metazoan genomes and inference of their physiological function from hints provided by their sequence, genomic organization and evolution
  • Building predictive models for regulatory determinants of context-specific gene expression that include long-range regulatory elements
  • Bioinformatics of vertebrate development - transcriptional regulatory network approach to vertebrate embryonic development circuitry
  • Exploring the structure of vertebrate core promoters and transcription start sites and establishing a classification scheme for them
  • Development of methods for predicting the effects of regulatory variation in the genome

Selected publications:

  • Pang KC, Stephen S, Dinger ME, Engström PG, Lenhard B, Mattick JS (2007) RNAdb 2.0 - an expanded database of mammalian non-coding RNAs. Nucleic Acids Res. 35: Advance Access published on December 1, 2006; doi:10.1093/nar/gkl926
  • Ponjavic J, Lenhard B, Kai C, Kawai J, Carninci P, Hayashizaki Y, Sandelin A (2006) Transcriptional and structural impact of TATA-initiation site spacing in mammalian core promoters. Genome Biol. 7:R78
  • Bailey PJ, Klos JM, Andersson E, Karlén M, Källström M, Ponjavic J, Muhr J, Lenhard B, Sandelin A, Ericson J (2006) A Global Genomic Transcriptional Code Associated with CNS Expressed Genes. Exp. Cell Res. 312:3108-19
  • Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, Semple CAM, Taylor MA, Engström PG, Frith MC, Forrest ARR, Alkema WB, Tan SL, Plessy C, Kodzius R, Ravasi T, Kasukawa T, Fukuda S, Kanamori-Katayama M, Kitazume Y, Kawaji H, Kai C, Nakamura M, Konno H, Nakano K, Mottagui-Tabar S, Arner P, Chesi A, Gustincich S, Persichetti F, Suzuki H, Grimmond SM, Wells CA, Orlando V, Wahlestedt C, Liu ET, Harbers M, Kawai J, Bajic VB, Hume DA, Hayashizaki Y (2006) Genome-wide analysis of mammalian promoter architecture and evolution based upon human and mouse CAGE data. Nat. Genet. 38:626-35.
  • Engström PG, Suzuki H, Ninomiya N, Brozzi NA, Luzi L, Sessa L, Lavorgna G, Tan SL, Yang L, Kunarso G, Lian-Chong Ng E, Batalov S, Wahlestedt C, Kai C, Kawai J, Carninci P, Hayashizaki Y, Wells C, Bajic VB, Orlando V, Reid JF, Lenhard B, Lipovich L (2006) Complex loci in human and mouse genomes. PLoS Genet. 2:e62.
  • Gómez-Skarmeta JL, Lenhard B, Becker TS (2006) New technologies, new findings and new concepts in the study of vertebrate cis-regulatory sequences. Review, Dev. Dyn. 235:870-85.
  • Vlieghe D, Sandelin A, De Bleser PJ, Vleminckx K, Wasserman WW, van Roy F, Lenhard B (2006) A new generation of JASPAR, the open-access repository for transcription factor binding site profiles. Nucleic Acids Res. 34:D95-7
  • Katayama S, Tomaru Y, Kasukawa T, Waki K, Nakanishi M, Nakamura M, Nishida H, Yap CC, Suzuki M, Kawai J, Suzuki H, Carninci P, Hayashizaki Y, Wells C, Frith M, Ravasi T, Pang KC, Hallinan J, Mattick J, Hume DA, Lipovich L, Batalov S, Engström PG, Mizuno Y, Faghihi MA, Sandelin A, Chalk AM, Mottagui-Tabar S, Liang Z, Lenhard B, Wahlestedt C (2005) Antisense transcription in the mammalian transcriptome. Science 309:1564-6.
  • Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, Kodzius R, Shimokawa K, Bajic VB, Brenner SE, Batalov S, Forrest AR, Zavolan M, ... , Engström PG, ..., Sheng Y, ... , Kawai J, Hayashizaki Y; FANTOM Consortium; RIKEN Genome Exploration Research Group and Genome Science Group (Genome Network Project Core Group) (2005) The transcriptional landscape of the mammalian genome. Science 309:1559-63.
  • Pang K, Stephen S, Engström PG, Tajul-Arifin K, Chen W, Wahlestedt C, Lenhard B, Hayashizaki Y, Mattick JS (2005) RNAdb - a comprehensive mammalian noncoding RNA database. Nucleic Acids Res. 33:D125-30
  • Sandelin A, Bailey P, Bruce S, Engström PG, Klos JM, Wasserman WW, Ericson J, Lenhard B (2004) Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics 5:99
  • Sandelin A, Wasserman WW, Lenhard B (2004) ConSite: web-based prediction of regulatory elements using cross-species comparison. Nucleic Acids Res. 32:W249-52
  • Alkema WB, Lenhard B, Wasserman WW (2004) Regulog analysis: detection of conserved regulatory networks across bacteria. Genome Res. 14:1362-73
  • Sandelin A, Alkema W, Engström PG, Wasserman WW, Lenhard B (2004) JASPAR: An Open-Access Database for Eukaryotic Transcription Factor Binding Profiles. Nucleic Acids Res. 32:D91-4
  • Lenhard B, Sandelin A, Mendoza L, Engström P, Jareborg N, Wasserman WW (2003) Identification of conserved regulatory regions by comparative genome analysis. J. Biol. 2:13
  • Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, Kondo S, Nikaido I, Osato, ..., Lenhard B, ..., Lander ES, Rogers J, Birney E, Hayashizaki Y [150 authors total] (2002) Analysis of the Mouse Transcriptome Based Upon Functional Annotation of 60770 full-length cDNAs. Nature 420:563-73
  • Lenhard B, Wasserman WW (2002) TFBS: Computational framework for transcription factor binding site analysis. Bioinformatics 18:1135-6
  • Lenhard B, Hayes WS, Wasserman WW (2001) GeneLynx: a gene-centric portal to the human genome. Genome Res. 11:2151-7

Boris Lenhard

Group Leader (CBU)

Lenhard research group page

  • Ph.D. from University of Zagreb (Croatia), 1999
  • Postdoctoral Researcher, Karolinska Institutet (Sweden), 2000-2002
  • Group Leader/Assistant Professor, Karolinska Institutet (Sweden), 2002-2005

Group members:

  • David Fredman (Postdoc)
  • Pär Engström (Ph.D. student)
  • Ying Sheng (Ph.D. student)
  • Kairi Tammoja (Ph.D. student)
  • Altuna Akalin (Ph.D. student)
  • Xianjun Dong (Ph.D. student)