Using PAML to detect pairwise dNdS ratios

Caveat: Many of the intricacies of molecular evolution are fuzzy to me!

Intro: Because of the triplet code, changes at the third codon position are mostly silent (synonymous): they do not change the encoded amino acid. In contrast, changes at the first and second codon positions nearly always result in an amino acid substitution (non-synonymous). Non-synonymous changes are comparatively rare because most are deleterious and are removed from the population by purifying selection. However, when positive selection acts on a protein sequence, the changes that produce an adaptive shift in protein function accumulate, leading to an excess of non-synonymous changes. This means the ratio of non-silent to silent changes, each expressed as a proportion of the sites of that kind in the sequence (the dNdS ratio, i.e. dN/dS, often written ω), can be used as a measure of adaptive evolution. Specifically, a dNdS ratio greater than 1 is taken to indicate that a protein is, or has been, subject to positive selection. Here I’ll describe some of the important parameters for arguably the most popular method for detecting selection from dNdS in coding genes: the PAML package.
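To make “as a proportion of each kind of site” concrete, here is a simplified Python sketch of how the synonymous and non-synonymous sites of a codon can be counted under the standard (universal) genetic code. For each codon position it asks what fraction of the three possible point mutations leave the amino acid unchanged. The optional `kappa` argument weights transitions relative to transversions, in the spirit of how likelihood models account for transition bias; with `kappa=1` this reduces to plain Nei-Gojobori-style site counting. The function names are my own, changes to stop codons are simply counted as non-synonymous, and this is an illustration rather than what codeml computes internally.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
# Transitions: purine <-> purine (A/G) and pyrimidine <-> pyrimidine (C/T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def codon_sites(codon: str, kappa: float = 1.0) -> tuple[float, float]:
    """Return (synonymous, non-synonymous) sites for one sense codon."""
    codon = codon.upper()
    aa = CODE[codon]
    if aa == "*":
        raise ValueError(f"{codon} is a stop codon")
    syn = 0.0
    for pos in range(3):
        syn_weight = total_weight = 0.0
        for base in BASES:
            if base == codon[pos]:
                continue
            # Transitions are weighted by kappa, transversions by 1.
            weight = kappa if frozenset((base, codon[pos])) in TRANSITIONS else 1.0
            mutant = codon[:pos] + base + codon[pos + 1:]
            total_weight += weight
            if CODE[mutant] == aa:  # a change to a stop codon never matches
                syn_weight += weight
        syn += syn_weight / total_weight
    return syn, 3.0 - syn


def sequence_sites(seq: str, kappa: float = 1.0) -> tuple[float, float]:
    """Sum codon_sites over an in-frame coding sequence (hypothetical helper)."""
    total_syn = total_non = 0.0
    for i in range(0, len(seq) - 2, 3):
        s, n = codon_sites(seq[i:i + 3], kappa)
        total_syn += s
        total_non += n
    return total_syn, total_non


print(codon_sites("TTT"))             # only TTC (1 of 9 changes) is synonymous
print(codon_sites("TTT", kappa=2.0))  # TTT -> TTC is a transition, so it weighs more
```

Raising `kappa` increases the synonymous sites of TTT because its only synonymous change is a transition. This is the mechanism behind the point made below about ignoring transition bias.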

Preliminaries: The main program we will use is codeml, which (for the most part) takes an alignment of homologous coding regions. The most important aspect of the alignment is that codon positions are correctly aligned. Codeml makes its calculations based on nucleotide positions within each coding triplet, so the open reading frame must be maintained if the alignment contains indels or missing bases. The best way to guarantee this is to constrain the DNA alignment by a protein alignment (i.e. each gap ‘-’ in the protein alignment is inserted into the DNA as ‘---’, so as not to disrupt the reading frame). In a sense, any phylogeny based on coding DNA sequences should always be built from such an alignment (evolution acts at the level of the protein, after all). A tool for doing this is called transalign, but be warned: (1) your protein and DNA sequences need to correspond exactly (the DNA must translate to the protein, with no partial or extra triplets), (2) they need to share the same names, and (3) they must be in the same order.
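The idea behind transalign can be sketched in a few lines of Python. This is not transalign’s code or interface; the helpers `thread_dna_onto_protein` and `align_codons` are hypothetical names. The sketch walks along an aligned protein, copies the matching codon for each residue, inserts ‘---’ for each gap, and refuses to continue if the DNA does not translate exactly to the protein (standard genetic code assumed).

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def thread_dna_onto_protein(aligned_protein: str, dna: str) -> str:
    """Insert '---' into unaligned DNA wherever the aligned protein has a '-'."""
    dna = dna.upper()
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(dna) != 3 * n_residues:
        raise ValueError(
            f"DNA length {len(dna)} != 3 x {n_residues} residues (partial or extra triplets?)"
        )
    codons, pos = [], 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")  # one protein gap = one whole codon gap
            continue
        codon = dna[pos:pos + 3]
        if CODE.get(codon) != aa.upper():
            raise ValueError(f"codon {codon} at DNA position {pos + 1} does not encode {aa}")
        codons.append(codon)
        pos += 3
    return "".join(codons)


def align_codons(protein_aln: dict[str, str], dna: dict[str, str]) -> dict[str, str]:
    """Pair protein and DNA sequences by name, then thread each one."""
    if set(protein_aln) != set(dna):
        raise ValueError(f"names differ: {sorted(set(protein_aln) ^ set(dna))}")
    return {name: thread_dna_onto_protein(prot, dna[name]) for name, prot in protein_aln.items()}


print(align_codons({"a": "M-K", "b": "MRK"}, {"b": "ATGCGTAAA", "a": "ATGAAG"}))
# {'a': 'ATG---AAG', 'b': 'ATGCGTAAA'}
```

Matching sequences by name rather than by position in the file turns a naming or ordering mismatch into an explicit error instead of a silently misaligned codon alignment.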

Pairwise dNdS analysis: Codeml reads a control file containing the run parameters; this file is typically called codeml.ctl. A basic file for pairwise dNdS is below.

seqfile = seqfile.txt    * sequence data filename
outfile = results.txt    * main result file name
treefile = treefile.txt  * tree file
noisy = 9                * 0,1,2,3,9: how much rubbish on the screen
verbose = 1              * 1: detailed output
runmode = -2             * -2: pairwise
seqtype = 1              * 1: codons
CodonFreq = 0            * 0: equal, 1: F1X4, 2: F3X4, 3: F61
model = 0                * 0: one omega for all branches
NSsites = 0              * 0: one omega for all sites
icode = 0                * 0: universal code
fix_kappa = 1            * 1: kappa fixed, 0: kappa to be estimated
kappa = 1                * initial or fixed kappa
fix_omega = 0            * 1: omega fixed, 0: omega to be estimated
omega = 0.5              * initial omega value
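If you run many comparisons, it can help to generate the control file from a script. Below is a minimal Python sketch, assuming codeml is on your PATH and takes the control file name as its first argument. The helper names (`render_ctl`, `run_codeml`, `PAIRWISE_SETTINGS`) are hypothetical, and the settings dictionary simply mirrors the file above.

```python
import shutil
import subprocess
from pathlib import Path

# Values copied from the pairwise control file above.
PAIRWISE_SETTINGS = {
    "seqfile": "seqfile.txt",
    "outfile": "results.txt",
    "treefile": "treefile.txt",
    "noisy": 9,
    "verbose": 1,
    "runmode": -2,
    "seqtype": 1,
    "CodonFreq": 0,
    "model": 0,
    "NSsites": 0,
    "icode": 0,
    "fix_kappa": 1,
    "kappa": 1,
    "fix_omega": 0,
    "omega": 0.5,
}


def render_ctl(settings: dict) -> str:
    """Format settings as right-aligned 'name = value' lines, one per variable."""
    width = max(len(name) for name in settings)
    lines = [f"{name.rjust(width)} = {value}" for name, value in settings.items()]
    return "\n".join(lines) + "\n"


def run_codeml(settings: dict, workdir: str = ".") -> None:
    """Write codeml.ctl into workdir and run codeml on it (assumes codeml is on PATH)."""
    exe = shutil.which("codeml")
    if exe is None:
        raise FileNotFoundError("codeml not found on PATH")
    ctl = Path(workdir) / "codeml.ctl"
    ctl.write_text(render_ctl(settings))
    subprocess.run([exe, ctl.name], cwd=workdir, check=True)


print(render_ctl(PAIRWISE_SETTINGS))
```

For a more parameter-rich run you could override a few keys, e.g. `{**PAIRWISE_SETTINGS, "CodonFreq": 2, "fix_kappa": 0}` to use F3X4 codon frequencies and estimate kappa from the data.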

A few of these parameters deserve comment. “seqfile” and “outfile” are self-explanatory, except to say that the input file needs to be in a “typical” format; I use fasta because it is what I use (-;. “runmode” is set to -2 to indicate that you are doing pairwise dNdS.

“CodonFreq” and “kappa” add parameters to the model that account for biologically relevant aspects of protein evolution. CodonFreq controls the codon frequency parameters: 0 treats all codons as equally likely; 1 (F1X4) derives codon frequencies from the average nucleotide frequencies; 2 (F3X4) derives them from the nucleotide frequencies at each of the three codon positions; and 3 (F61) allows each codon frequency to be a free parameter. Codons are generally not used equally (for example, because of differing tRNA availability), and ignoring this can bias the estimated dNdS.

Kappa is the transition/transversion ratio. The chemistry of DNA mutation creates a bias toward transitions, and because transitions at third codon positions are more likely to be silent than transversions, ignoring this bias typically underestimates the number of synonymous sites. This in turn leads to an overestimate of dS and a corresponding underestimate of dNdS. Setting fix_kappa = 1 with kappa = 1 ignores the transition bias, while setting fix_kappa = 0 allows kappa to be estimated from the data. With reference to these parameters, counting methods such as NG86 (Nei and Gojobori, 1986) can be mirrored by fixing kappa = 1 and setting CodonFreq = 0. Alternative settings come at a cost, in that they increase the number of parameters to be estimated; however, incorporating them makes for a more biologically realistic estimate of dNdS.
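To see what the F1X4 and F3X4 options mean in practice, the Python sketch below computes expected codon frequencies from nucleotide composition. F3X4 multiplies the frequencies of the three nucleotides at their respective codon positions, F1X4 uses the pooled nucleotide frequencies for all three positions, and in both cases the products are renormalised over the 61 sense codons of the standard code. This illustrates the models and is not codeml’s code. In particular, skipping any codon that contains a gap or ambiguous base is my own assumption about how to handle missing data, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code; only the 61 sense codons get a frequency.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SENSE = ["".join(c) for c, aa in zip(product(BASES, repeat=3), AMINO) if aa != "*"]


def position_freqs(seqs: list[str]) -> list[dict[str, float]]:
    """Nucleotide frequencies at codon positions 1, 2 and 3."""
    counts = [Counter(), Counter(), Counter()]
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if all(b in BASES for b in codon):  # skip gaps/ambiguous bases (assumption)
                for pos, base in enumerate(codon):
                    counts[pos][base] += 1
    freqs = []
    for c in counts:
        total = sum(c.values()) or 1
        freqs.append({b: c[b] / total for b in BASES})
    return freqs


def expected_codon_freqs(seqs: list[str], model: str = "F3X4") -> dict[str, float]:
    """Expected frequency of each sense codon under F1X4 or F3X4."""
    pos = position_freqs(seqs)
    if model == "F1X4":
        # Each counted codon adds one base to every position, so this mean is the pooled frequency.
        pooled = {b: sum(p[b] for p in pos) / 3 for b in BASES}
        pos = [pooled, pooled, pooled]
    elif model != "F3X4":
        raise ValueError("model must be 'F1X4' or 'F3X4'")
    raw = {c: pos[0][c[0]] * pos[1][c[1]] * pos[2][c[2]] for c in SENSE}
    total = sum(raw.values())
    return {c: f / total for c, f in raw.items()}  # renormalise over sense codons


freqs = expected_codon_freqs(["ATGGCTGCA", "ATGGCAGCT"], model="F3X4")
print(max(freqs, key=freqs.get))
```

F3X4 needs nine free nucleotide frequencies and F1X4 only three, which is the parameter cost mentioned above.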

Final notes: It should be noted that these estimates of dNdS are not very sensitive for detecting positive selection, for two main reasons. Firstly, many amino acids are structurally important for the protein’s core function and are correspondingly nearly invariant, so positive selection tends to act on a relatively small number of codons. Secondly, selection tends to be episodic (it occurs in spurts). This makes it difficult to detect positive selection by simply taking the average dNdS across the entire protein from a single pairwise comparison. To overcome this, methods have been developed that detect changes in dNdS across a phylogeny derived from a multiple sequence alignment, and I’ll focus on these methods in the next edition!
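A rough illustration of the first point, under a strong simplifying assumption: if every codon had the same number of synonymous and non-synonymous sites and the same dS, the gene-wide dNdS would simply be the mean of the per-codon ratios. The gene below (300 codons, 10 of them at ω = 5) is hypothetical, as is the function name.

```python
def gene_wide_omega(site_omegas: list[float]) -> float:
    """Gene-wide dN/dS, assuming equal sites and equal dS at every codon."""
    return sum(site_omegas) / len(site_omegas)


# Hypothetical gene: 10 of 300 codons under positive selection (omega = 5),
# the other 290 under purifying selection (omega = 0.1).
omegas = [5.0] * 10 + [0.1] * 290
print(round(gene_wide_omega(omegas), 3))  # 0.263
```

Even with 10 codons evolving at ω = 5, the gene-wide ratio sits well below 1, so a whole-gene pairwise estimate would not flag this gene.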

I’ve been very slack with references in this post because I wanted to focus on the core considerations of using codeml; however, most of the above is based on reading the excellent coverage of this topic in:

Bielawski, J.P. and Yang, Z. (2005) Maximum likelihood methods for detecting adaptive protein evolution. In: Nielsen, R. (ed.) Statistical Methods in Molecular Evolution, pp. 103-124. Springer-Verlag, New York.


A week with Samsung’s and Google’s new Chromebook.

In one line: I’m sold. It will only get better, and I’m shelving my plans for an ultrabook.

First impressions. Firstly, it’s small, thin and light! Certainly smaller than I was expecting: think larger netbook rather than ultrabook/notebook. That said, the keyboard is full size and the 11.6-inch screen is perfectly reasonable for most tasks. It looks pretty nice; however, it does feel a little cheap due to its all-plastic body. Does it feel like a $250 netbook? No way. I doubt a casual observer would guess you were using anything other than the latest and greatest little ultrabook. So far the keyboard has been great; the keys respond well, so touch typing is a breeze. The trackpad is also above average in my experience; in fact, it’s the first laptop I have owned where I haven’t automatically reached for a USB mouse. It should be noted that the keyboard lacks dedicated caps lock and delete keys; however, easy shortcuts are available in their place ([alt][backspace] = delete, [alt][search] = caps lock). As for additional keys, the dedicated back and forward browser keys, as well as the brightness and sound keys, are a nice touch. All in all, using this little machine is a pleasant experience, and I doubt the casual user will have much to complain about with regard to typing, the touchpad, or the screen. I have noticed some negative reaction on the twittersphere regarding the screen’s black/white contrast; however, I certainly have no complaints, and the matte finish should work well for outdoor use. Battery life for me has easily been 6-7 hours with normal light internet use (it’s rated for 5-6 hours with heavier use). Overall, I’m very happy with the look and feel.

ChromeOS. Well, what can I say: it’s pretty much like using the Chrome web browser. Startup times are ridiculously quick, pretty much instant when coming out of sleep and less than 10 seconds for a full reboot. It does have a desktop and a launch bar at the bottom of the screen, but most of the time you’ll find yourself working in tabs within the browser window. Perhaps to highlight this best, outside the typical Chrome browser settings there is only a small handful of settings that actually relate to ChromeOS itself (available by clicking the clock in the bottom right corner). Basically, this means that setup is nearly zero work; in fact, after you provide your Google details, all your browser history and settings are automatically synced onto the new machine. Everything is backed up to the cloud too, so performing a factory reset is painless: you are back up and running in a matter of minutes. As long as you trust Google, this means no more backups!

Google apps and the cloud. This machine is designed to work well with Google’s ecosystem, and they give you 100GB of free storage for two years. As I’m already a Google user for my music, most docs, and email, for me it obviously just works. Importantly, offline access is available via the Google Drive app, as is email and many other apps in the offline category of the Chrome Web Store. For some reason I had problems getting offline Docs to work; in retrospect, I probably didn’t allow enough time for the online files to sync to the machine (the main offline settings are available at the bottom of the cog options on your Google Drive homepage). Anyway, I can now happily edit docs away from wifi, and they automatically sync with Drive once I am back online.

Offline Drive setup can be found under the cog

Apart from Google apps, I have also installed a simple calculator app, Angry Birds (works well), the WeatherBug app, and a light image editor (sorry, no links; I haven’t really tried these so can’t really recommend them, but they’re all in the popular downloads section of the web app store). Coding is a bit more of a challenge. There’s a nice little Python shell app that is really only good for testing; perhaps a better long-term solution will be a cloud-based IDE like cloud9IDE, and I’m sure more solutions will come online as the popularity of cloud computing grows. Perhaps the best app I have discovered is Chrome Remote Desktop (CRD), which allows you to access all your computers through the browser using PIN codes. Despite the awesomeness of CRD, the big drawback for me was that there is no way of adding a Linux machine to your “computers”, so my plans to run my work computer as a code slave for my Chromebook went up in smoke.

Chrome Remote Desktop is brilliant

Hopefully, future updates will add better Linux support to this awesome app (you can access a Linux computer, but only through the “share” function, which requires you or someone else to be sitting at the slave computer). That said, if you give granny this computer, helping her out in real time via CRD will save your life and sanity.

Issues. Besides the ones related to using Google Docs (please, please integrate Docs with Google Scholar as a citation manager), perhaps the biggest is the constant page reloading that seems to occur when you have “too many” tabs open (I’m talking 10ish). The reloading slows everything down and tends to cause Docs and especially music (via Google Music) to hang for a frustrating number of seconds. Hopefully this “feature” will work better in future ChromeOS updates.

Bottom line. This thing would be awesome for the road warrior or as a general-use house laptop, especially for people already part of the Google Docs ecosystem. With an external monitor it might even be the perfect starter computer for your granny. With the SSD it’s pretty robust and fast to start up, and its small footprint makes it a computer I’ll happily drop into my backpack for those “I need a browser now” moments.