Topic: Statistical methods for inferring gene regulatory modules and networks
Speaker: Prof. Jun Xie (Department of Statistics, Purdue University)
Time: 2007-06-04, 4:00 PM
Venue: Room 1303, Science Building No. 1 (理科一号楼)
  
 
  
    This talk is about probabilistic and statistical methods for the
analysis of genomic data. Our focus is the specific problem of inferring
gene regulatory modules, where a module is defined as a set of coexpressed
genes that are regulated by a common set of transcription factors
(proteins).

    We propose a series of statistical methods that combine information from
multiple types of genomic data, including DNA sequences, genome-wide
location analysis (ChIP-chip experiments), and mRNA gene expression
microarrays. More specifically, we have developed a hidden Markov model
that models combinations of transcription factor binding sites in DNA
sequences (strings of the nucleotides A, C, G, T). The predictions are
refined by regression analysis on mRNA gene expression microarray data
and/or ChIP-chip binding data. In the regression analysis, we formulate a
variable selection problem and show that all available methods, including
standard stepwise selection and LASSO/LARS, fail to select the right set
of covariates because of the complicated interdependence among genes. This
biological application poses a challenge for probability and statistics.
Beyond our own attempt, other new methodologies will be of great interest.