Tips For Codon Matrices
Codon matrices in xrate
(See also known issues with DART)
When using XRATE to estimate large rate matrices (e.g. 61*61 codon matrices), the following tips are recommended:
- Decrease the --mininc parameter. This command-line parameter determines the minimum fractional increase in log-likelihood that is required for EM to continue. For models with lots of parameters, small changes in the log-likelihood can represent substantial changes in the lesser-used parameters. You may therefore want to set --mininc to a lower value than its default. For example:
xrate --mininc .00001 [...other arguments...]
- Increase the --forgive parameter. This command-line parameter sets the number of "bad" iterations of EM that will be forgiven. A bad iteration is one where the likelihood decreases instead of increasing. In theory this should never happen, because EM guarantees that the likelihood does not decrease from one iteration to the next; in practice, precision error can occasionally cause it, especially with these big rate matrices. I usually set "--forgive 2" for codon matrices, which means that at worst xrate will do 2 unnecessary iterations of EM.
- Start from a uniform (flat) seed. In other words, all the initial rates and probabilities for the model should be the same. Additionally, they should be under-estimates (i.e. start with rates a bit lower than the eventual values you anticipate). Both choices help convergence and reduce the chances of the EM algorithm getting stuck in a local maximum.
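To make the tips above concrete, here is a toy Python sketch of how a stopping rule with a minimum fractional increase and a "forgive" count might behave, plus a flat seed for a 61-state rate matrix. This is not xrate's code. The function names (em_stop_iteration, flat_seed), the choice to count bad iterations cumulatively rather than consecutively, and the example log-likelihoods are all assumptions made for illustration; only the meaning of --mininc and --forgive is taken from the descriptions above.

```python
def em_stop_iteration(loglikes, mininc, forgive):
    """Return the index of the iteration at which a toy EM loop would stop.

    loglikes: log-likelihood after each iteration (made-up numbers here).
    mininc:   minimum fractional increase needed to count as progress.
    forgive:  number of "bad" iterations tolerated before giving up.
    ASSUMPTION: bad iterations are counted cumulatively; xrate may differ.
    """
    best = loglikes[0]
    bad = 0
    for i, ll in enumerate(loglikes[1:], start=1):
        frac_inc = (ll - best) / abs(best)
        if frac_inc >= mininc:
            best = ll               # real progress: accept and continue
        else:
            bad += 1                # too small an increase, or a decrease
            if bad > forgive:
                return i            # out of forgiveness: stop here
            best = max(best, ll)
    return len(loglikes) - 1        # ran through every iteration


def flat_seed(n_states, rate):
    """Rate matrix with all off-diagonal rates equal and rows summing to zero."""
    diag = -(n_states - 1) * rate
    return [[diag if i == j else rate for j in range(n_states)]
            for i in range(n_states)]


# Hypothetical log-likelihood trace with one tiny precision-style dip.
trace = [-1000.0, -900.0, -899.5, -899.49, -899.495, -899.48]

print(em_stop_iteration(trace, mininc=1e-3, forgive=0))  # stops early
print(em_stop_iteration(trace, mininc=1e-5, forgive=0))  # stops at the dip
print(em_stop_iteration(trace, mininc=1e-5, forgive=2))  # rides out the dip

# Flat, deliberately low seed for a 61-state codon model.
Q = flat_seed(61, 0.01)
```

With the looser threshold the toy loop gives up while the log-likelihood is still creeping upward; with the smaller threshold and a couple of forgiven iterations it carries on past the dip. The flat seed simply gives every substitution the same low starting rate.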
-- Ian Holmes - 30 Apr 2010