Identifying the nucleotides that cause gene expression variation is a critical step in dissecting the genetic basis of complex traits. algorithms regarding (Erb and van Nimwegen 2006). The present study builds on our previous work by incorporating a large set of new TF motifs from recent protein binding microarray experiments (Badis et al. 2008; Zhu et al. 2009), allowing us to considerably increase the scope of the network while maintaining a high degree of specificity. We anticipate that our annotations will be of independent interest to the community, and they are freely available online (http://www.swissregulon.unibas.ch/).

Our second major result is the identification of a subset of genes for which we can significantly correlate changes in the predicted TFBSs with gene expression divergence. The problem of predicting gene expression from sequence alone is well known to be difficult because of the complexity of (Yuan et al. 2007). For example, the effect of a mutation at a given TFBS may depend on the constellation of other TFBSs in the promoter. Several authors have examined the correlation between differences in TFBSs and gene expression divergence between different yeast species (Doniger and Fay 2007; Tirosh et al. 2008) or duplicated genes within (Zhang et al. 2004; Leach et al. 2007). These studies had only limited success in correlating expression with sequence, which we hypothesize is partly because of the large evolutionary distances used in the comparisons. For example, the sequence divergence between two commonly studied strains, S288c and RM11-1a, is 0.5%, whereas the divergence between and is as much as 12% for coding sequence and 18% for noncoding sequence (Cliften et al. 2001). Likewise, most gene duplications in yeast are ancient, with the majority of the duplication events occurring around the time of the eukaryote-prokaryote split (Gu et al. 2005).
Therefore, at these larger evolutionary distances promoters typically differ at multiple positions. We reasoned that fewer complex changes would occur between strains, allowing us to more readily correlate sequence and expression divergence.

Taken together, our evolutionary and gene expression analyses demonstrate that our new TFBS predictions significantly improve on the previous annotations and that, for a subset of genes, changes in predicted TFBSs correlate significantly with gene expression divergence. Our fine-scale sequence-based computational approach thus complements the classical phenotype-based approach, in which quantitative trait locus (QTL) mapping methods are used to identify genomic loci associated with the phenotype. Ultimately, we expect that a combination of the two approaches will be necessary for elucidating the mapping of genotype to phenotype.

Materials and Methods

TF Binding Site Predictions

We combined 89 position-specific weight matrices (PWMs) from Zhu et al. (2009), 112 PWMs from Badis et al. (2008), and 72 PWMs from Erb and van Nimwegen (2006). Overall, visual inspection of TF motifs inferred by more than one method suggests good agreement among the three data sets. A single PWM for each TF was obtained using a Bayesian procedure that takes a set of PWMs as input, determines the relative alignment of the PWMs that maximizes the probability that the entire set derives from a single underlying PWM, and infers this underlying PWM (FANTOM Consortium 2009). This method also determines whether the data are consistent with all PWMs deriving from one common PWM. For 12 TFs, two of the methods agreed while the third was an outlier, so for each of these TFs the outlier was manually removed and the two remaining PWMs were aligned. For two TFs, the protein binding microarray methods disagreed between a dimer and a monomer motif, so we resolved these cases manually.
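To make the idea of aligning two PWMs concrete, the following minimal sketch slides one matrix along the other, scores each ungapped offset, and averages the columns at the best offset. It is a simplified stand-in and not the Bayesian procedure cited above: the scoring rule (0.5 minus the total variation distance per column), the `min_overlap` parameter, and all function names are assumptions introduced purely for illustration.

```python
from typing import List, Tuple

Column = List[float]  # base probabilities in the order A, C, G, T
PWM = List[Column]


def column_score(p: Column, q: Column) -> float:
    """Illustrative similarity: positive when two columns are close.

    Uses 0.5 minus the total variation distance, so identical columns
    score 0.5 and very different columns score below zero.
    """
    total_variation = 0.5 * sum(abs(x - y) for x, y in zip(p, q))
    return 0.5 - total_variation


def best_offset(a: PWM, b: PWM, min_overlap: int = 3) -> Tuple[int, float]:
    """Return (offset, score) so that column j of b aligns to column j + offset of a."""
    best = None
    for offset in range(min_overlap - len(b), len(a) - min_overlap + 1):
        start = max(0, offset)
        end = min(len(a), offset + len(b))
        if end - start < min_overlap:
            continue  # overlap too short to be informative
        score = sum(column_score(a[i], b[i - offset]) for i in range(start, end))
        if best is None or score > best[1]:
            best = (offset, score)
    if best is None:
        raise ValueError("PWMs are too short for the requested minimum overlap")
    return best


def merge_aligned(a: PWM, b: PWM, offset: int) -> PWM:
    """Average overlapping columns; keep single columns where only one PWM covers a position.

    Assumes the offset yields an overlap, as any offset from best_offset does.
    """
    merged = []
    for pos in range(min(0, offset), max(len(a), offset + len(b))):
        cols = []
        if 0 <= pos < len(a):
            cols.append(a[pos])
        if 0 <= pos - offset < len(b):
            cols.append(b[pos - offset])
        merged.append([sum(c[k] for c in cols) / len(cols) for k in range(4)])
    return merged


# Toy data (hypothetical): motif_b is the central three columns of motif_a.
motif_a = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.7, 0.1, 0.1, 0.1],
]
motif_b = [motif_a[1], motif_a[2], motif_a[3]]

offset, score = best_offset(motif_a, motif_b)
consensus = merge_aligned(motif_a, motif_b, offset)
print("best offset:", offset)
```

A heuristic of this kind only picks the best ungapped offset. Unlike the probabilistic procedure described in the text, it does not assess whether the two matrices are consistent with a single underlying PWM.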
For the other TFs, we first aligned the two protein binding microarray PWMs and then aligned the resulting average protein binding microarray PWM with the ChIP-chip PWM. Finally, we manually trimmed the motif boundaries to exclude positions with low information content and discarded the motif for FHL, a forkhead-like TF that, based.