"On its way..."
To having a reduced genome. So, it turns out Hflu is barreling headlong into a future with a reduced genome, including all the benefits: low GC content, few transcription factors, more pseudogenes, etc. And what's to stop it there? Eventually, if all goes according to Hflu's cunning plan, it will end up a small pile of adenine and thymine - forever dispensing with all those useless Gs and Cs.
This Douglas Adamsesque view of how the universal tape will play out for our favourite little bug has evolved out of a small wager in our lab. An all-knowing postdoc (the one who foresees Hflu eventually shedding its too-large genome) predicted that Hflu's intergenic regions have lower GC content than its coding sequence. An over-confident grad student who felt he had his finger on the pulse of Hflu's intergenic regions figured that the GC content would be fairly constant between genic and intergenic regions. He reasoned that because Hflu's average GC content (38.14%) is about the same as that of E. coli's intergenic regions (~40%), and as E. coli is THE model organism, whatever works for E. coli should work for Hflu. This would allow Hflu to maintain a constant GC content throughout genic and intergenic regions. Because E. coli and Hflu have very similar transcriptional regulatory networks and employ many of the same transcription factors, it seemed a fair assumption that they would have intergenic regions of comparable composition. However, I've just done the math (i.e., used Word to count the number of As, Ts, Gs, and Cs in Hflu's 220,505 bp of intergenic sequence), and it turns out that Hflu has only 33.2% GC in its intergenic regions.
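For anyone who would rather not count bases in Word, the same tally can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how the 33.2% figure above was obtained: the function name `gc_content` and the toy sequence are made up for the example, and the sketch assumes the intergenic sequence has already been pulled out into a plain string.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the A, C, G, and T bases in seq.

    Case is ignored, and any other character (N, gaps, whitespace) is
    left out of both the numerator and the denominator.
    """
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * gc / total


# Toy sequence for illustration only; this is NOT real Hflu intergenic DNA.
toy_intergenic = "ATTTAGCATTAAATGC"
print(f"{gc_content(toy_intergenic):.1f}% GC")  # prints "25.0% GC"
```

Running the same function separately over the concatenated coding and intergenic sequences would give the two numbers the wager was about.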
So, what pressures select for low-GC intergenic regions, and would E. coli go lower if it could? Melting temperature doesn't seem like a big deal; RNA polymerase melts DNA at AT-rich -10 regions, but these account for only a small portion of intergenic sequence. More likely, DNA flexibility is an advantage. Many transcription factors bend DNA, and many repressors form DNA loops, while larger nucleoprotein complexes (in which multiple proteins bind in close proximity) involve DNA bending and kinking. As A-T runs are more flexible than G-C runs, AT-rich intergenic regions may be advantageous because they allow for DNA deformation by regulatory factors.