Transcription by RNA polymerase (RNAP) is interrupted by pauses that play diverse regulatory roles. ... transcript (Fig. 1A), allowing us to define RNAP locations along ~2 0 genes with single-nucleotide resolution (table S2).

Fig. 1. Bacterial NET-seq provides a genome-wide view of transcription dynamics.

The number of mapped reads at each genomic position is proportional to the number of RNAP molecules at that position. We observed well-defined single-nucleotide peaks within transcribed regions at known regulatory pause sites, including sites that synchronize transcription with translation, mediate RNA folding, or recruit transcription factors (Fig. 1B and fig. S4A-E). NET-seq profiles also revealed a large number of other highly reproducible peaks in RNAP density throughout the genome (example gene in Fig. 1C). In total, we identified ~20 0 previously undocumented pause sites across well-transcribed genes, representing an average frequency of 1 1 per 100 bp (Fig. 1D). Thus, known regulatory pause sites represent a tiny fraction of actual pause positions. We found that pause propensity depended strongly on the sequence identity at the 3′ end of the transcript (87% of paused transcripts end with either cytosine or uracil) as well as on the identity of the incoming NTP substrate (70% of pause sites occur prior to GTP addition) (Fig. 2A). Sequence dependence extends outside the RNAP active site to 11 nucleotides (nt) upstream and 5 nt downstream of the pause position, consistent with the extent of core nucleic-acid contacts made within the elongation complex (8). To determine the contribution of each base to pause duration, we used the density of reads in the NET-seq profile to calculate the relative dwell time of RNAP at each well-transcribed position in the genome.
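The idea of reading a relative dwell time off the read density can be illustrated with a minimal Python sketch. It assumes that the relative dwell time at a position is the read count divided by the mean count over the gene, and that a pause is any position exceeding a fold cutoff. Both the normalization and the cutoff value (`fold=4.0`) are illustrative assumptions, as are the function names; the actual pause-calling criteria are given in the supplementary materials and are not reproduced here.

```python
from statistics import mean


def relative_dwell_times(read_counts):
    """Divide each per-position read count by the gene's mean read count.

    Assumption: read count is proportional to RNAP occupancy, so the ratio
    approximates dwell time relative to the gene average.
    """
    avg = mean(read_counts)
    if avg == 0:
        raise ValueError("gene has no mapped reads")
    return [count / avg for count in read_counts]


def call_pauses(read_counts, fold=4.0):
    """Return the indices whose relative dwell time is at least `fold`.

    `fold` is a hypothetical cutoff chosen for illustration only.
    """
    rel = relative_dwell_times(read_counts)
    return [i for i, r in enumerate(rel) if r >= fold]


# Toy profile: one sharp single-nucleotide peak at index 3.
toy_counts = [2, 3, 2, 40, 3, 2, 2, 2]
pause_positions = call_pauses(toy_counts)
```

In the toy profile the mean count is 7, so only the position with 40 reads (about 5.7 times the mean) passes the cutoff.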
Modeling the addition of the next nucleotide as a process with a single activation barrier, we calculated the effective energetic barrier to nucleotide addition as the logarithm of the RNAP occupancy signal (supplementary materials). We used these values to determine the sequence dependence of this barrier for all positions within 15 bases of the transcript 3′ end. The resulting plot provides an energetic view of sequence-dependent pausing in which peaks indicate bases that increase the relative RNAP dwell time (Fig. 2B). These observations implicate a 16-nt consensus pause sequence whose prominent features include GG at the upstream edge of the RNA:DNA hybrid and TG or CG at the location of the 3′ end of the nascent transcript and incoming NTP (Fig. 2A).

Fig. 2. Transcriptional pauses are driven by RNAP-nucleic acid interactions.

We used the energetic profile as a metric to determine whether most pauses could be explained by the consensus pause sequence. The energetics of nucleotide addition (Fig. 2B) allowed us to compute the propensity for pausing at every well-transcribed position by summing the energetic contribution of each base from position −1 to −11. The predicted energies were grouped into two categories: sequences for which pausing was observed and sequences for which pausing was undetectable. A cumulative histogram of the energetics for the two populations shows that pause-associated sequences were well separated in sequence space from non-pause sequences (Fig. 2C). Using a receiver-operating characteristic (ROC) analysis, we determined the optimal threshold for distinguishing these two populations (fig. S5) and found that the majority of pause sequences lay above the threshold (Fig. 2C).
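The scoring scheme described above (per-base energetic contributions, summed over a sequence window, then thresholded by ROC analysis) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the published procedure: the per-base contribution is taken to be the mean log dwell time of windows carrying that base at that offset, relative to the overall mean, and the "optimal" threshold is taken to be the one maximizing Youden's J (true-positive rate minus false-positive rate). The text does not specify which ROC criterion was used, and all function names here are hypothetical.

```python
import math
from collections import defaultdict


def energy_matrix(windows, dwell_times):
    """Estimate a per-(offset, base) energetic contribution.

    Each window is a fixed-length sequence around a transcript 3' end.
    The contribution is the mean log dwell time of windows with that base
    at that offset, minus the overall mean log dwell time (an assumption).
    """
    logs = [math.log(t) for t in dwell_times]
    overall = sum(logs) / len(logs)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, log_t in zip(windows, logs):
        for offset, base in enumerate(seq):
            sums[(offset, base)] += log_t
            counts[(offset, base)] += 1
    return {key: sums[key] / counts[key] - overall for key in sums}


def score(seq, matrix):
    """Sum the contributions of each base in the window; unseen bases add 0."""
    return sum(matrix.get((i, b), 0.0) for i, b in enumerate(seq))


def best_threshold(pause_scores, nonpause_scores):
    """Return the score cutoff maximizing Youden's J = TPR - FPR."""
    best_t, best_j = None, -1.0
    for t in sorted(set(pause_scores) | set(nonpause_scores)):
        tpr = sum(s >= t for s in pause_scores) / len(pause_scores)
        fpr = sum(s >= t for s in nonpause_scores) / len(nonpause_scores)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t


# Toy example with two-base windows and made-up dwell times.
toy_matrix = energy_matrix(["GG", "AT"], [math.exp(2.0), 1.0])
toy_threshold = best_threshold([3.0, 4.0, 5.0], [0.0, 1.0, 2.0])
```

With perfectly separated toy scores, the chosen cutoff is the lowest pause score; with overlapping populations, as in real data, the cutoff trades sensitivity against false positives.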
Furthermore, the same threshold correctly classified the group of “canonical” regulatory pauses previously identified in (10, 11 ... By limiting the concentration of GTP, which is the nucleotide most frequently associated with pausing, ... RNAP at over 300 unique positions in a segment of the gene (Fig. 2D). These position-specific rates, which ranged over ~2-3 orders of magnitude, yielded activation energy barriers well correlated with those computed from NET-seq (Fig. 2E-F). Moreover, they are qualitatively consistent with a consensus proposed previously from a small set of pause-inducing elements.
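The link between position-specific rates, dwell times, and activation barriers can be made explicit with a short worked relation. It assumes an Arrhenius-type single-barrier model with a position-independent prefactor $k_0$; the prefactor and the symbols $k$, $\tau$, and $\Delta G^{\ddagger}$ are introduced here for illustration and do not appear in the text.

```latex
% Single-barrier model: rate of nucleotide addition k, dwell time tau = 1/k.
% The prefactor k_0 is an assumption and is taken to be the same at every position.
\begin{align}
  k &= k_0 \exp\!\left(-\frac{\Delta G^{\ddagger}}{k_B T}\right),
  \qquad \tau = \frac{1}{k}, \\
  \frac{\Delta G^{\ddagger}}{k_B T} &= \ln \tau + \ln k_0 , \\
  \frac{\Delta\Delta G^{\ddagger}}{k_B T}
    &= \ln\frac{k_{\text{fast}}}{k_{\text{slow}}}
     = \ln\frac{\tau_{\text{slow}}}{\tau_{\text{fast}}} .
\end{align}
```

Under this model the barrier is, up to an additive constant, the logarithm of the dwell time, which is why the logarithm of the occupancy signal can serve as an effective barrier. A spread of 2 to 3 orders of magnitude in rate corresponds to a barrier difference of roughly $\ln 10^{2} \approx 4.6$ to $\ln 10^{3} \approx 6.9$ in units of $k_B T$.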