Tips For Codon Matrices

From Biowiki

Codon matrices in xrate

(See also known issues with DART)

When using XRATE to estimate large rate matrices (e.g. 61×61 codon matrices), the following tips are recommended:

  1. Decrease the --mininc parameter. This command-line parameter sets the minimum fractional increase in log-likelihood that EM requires in order to keep iterating. For models with many parameters, a small change in the log-likelihood can correspond to a substantial change in the less frequently used parameters, so you may want to set --mininc lower than its default. For example:
    xrate --mininc .00001  [...other arguments...]
  2. Increase the --forgive parameter. This command-line parameter sets the number of "bad" EM iterations that will be forgiven, i.e. iterations where the likelihood decreases instead of increasing. In theory this should never happen, because EM never decreases the likelihood and should approach its maximum asymptotically; in practice, precision error can occasionally cause a decrease, especially with these big rate matrices. I usually set "--forgive 2" for codon matrices, which means that at worst xrate will perform 2 unnecessary EM iterations.
  3. Start from a uniform (flat) seed, i.e. give all the model's initial rates and probabilities the same value. Additionally, make them under-estimates: start with rates somewhat lower than the eventual values you anticipate. Both measures help convergence and reduce the chance of the EM algorithm getting stuck in a local maximum.
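
To make the interaction of tips 1 and 2 concrete, here is a minimal Python sketch of an EM stopping rule driven by a minimum fractional increase in log-likelihood and a quota of forgiven "bad" iterations. It is not xrate's actual code. The function name `em_stop_iteration`, measuring each increase against the best log-likelihood seen so far, and counting forgiven iterations cumulatively (rather than consecutively) are all assumptions made for illustration.

```python
def em_stop_iteration(loglikes, mininc, forgive):
    """Return the index of the EM iteration at which this hypothetical
    stopping rule halts, or None if it never halts within `loglikes`.

    loglikes[0] is the log-likelihood of the seed; loglikes[i] is the
    log-likelihood after EM iteration i. `mininc` and `forgive` mimic the
    meaning of xrate's --mininc and --forgive, not their implementation.
    """
    best = loglikes[0]
    bad = 0  # number of "bad" (likelihood-decreasing) iterations so far
    for i in range(1, len(loglikes)):
        ll = loglikes[i]
        if ll < best:
            # Likelihood went down (e.g. through rounding error):
            # forgive it unless the quota is already used up.
            bad += 1
            if bad > forgive:
                return i
            continue
        # Fractional increase relative to the best log-likelihood so far.
        scale = abs(best) if best != 0 else 1.0
        frac_inc = (ll - best) / scale
        best = ll
        if frac_inc < mininc:
            return i  # improvement too small: treat EM as converged
    return None


trace = [-1000.0, -900.0, -899.0, -898.99, -898.9899]
print(em_stop_iteration(trace, mininc=1e-3, forgive=0))  # 3
print(em_stop_iteration(trace, mininc=1e-5, forgive=0))  # 4
```

With the smaller threshold the run continues past iteration 3, because the late improvement (about 1.1e-5 as a fraction) is still judged worth pursuing; this is the effect tip 1 is after. Likewise, a larger `forgive` lets a single rounding-induced dip pass without ending the run.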
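
The flat seed from tip 3 can be sketched in Python as below, assuming the standard genetic code (the 64 codons minus the stop codons TAA, TAG and TGA, leaving the 61 sense codons). The helper names `sense_codons` and `flat_seed` are hypothetical, and the sketch only builds the numbers: equal initial probabilities and equal off-diagonal rates, with each diagonal entry set so that every row of the rate matrix sums to zero. It does not write xrate's grammar file format, which is where the seed would actually be specified.

```python
from itertools import product

# Stop codons of the standard genetic code; removing them leaves 61 sense codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def sense_codons():
    """Return the 61 sense codons of the standard genetic code, in ACGT order."""
    codons = ("".join(triplet) for triplet in product("ACGT", repeat=3))
    return [c for c in codons if c not in STOP_CODONS]


def flat_seed(initial_rate):
    """Build a uniform ("flat") seed: equal initial probabilities and equal
    off-diagonal rates. `initial_rate` should be a bit LOWER than the rates
    you expect EM to converge to."""
    codons = sense_codons()
    n = len(codons)
    # Uniform initial probabilities.
    pi = [1.0 / n] * n
    # Every off-diagonal entry gets the same rate; diagonal starts at zero.
    Q = [[initial_rate if i != j else 0.0 for j in range(n)] for i in range(n)]
    # Diagonal = minus the row's total outflow, so each row sums to zero.
    for i in range(n):
        Q[i][i] = -sum(Q[i])
    return codons, pi, Q


codons, pi, Q = flat_seed(0.01)
print(len(codons), Q[0][1], pi[0] == pi[-1])  # 61 0.01 True
```

The value 0.01 is only a placeholder; per tip 3, choose a starting rate somewhat below the rates you expect the fitted model to reach.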

-- Ian Holmes - 30 Apr 2010