Biologists often face the following problem: Given a set of taxa and
information about themwhich can be morphological characters, corresponding
segments of biomolecular sequences, gene order in their whole genomes, etc.  ,
find the true phylogenetic tree.
Note that not every physical process allows to reconstruct its history,
therefore it is good luck if phylogeny can be reconstructed at all. The choice
of data and methods for phylogeny reconstruction determine the output, although
reasonable choices give similar results. I will discuss the sources of
disagreement about the available methods. If data type and method are agreed on,
we face a largescale computing problem that has two complexity issues: the
amount of data it needs and the usual computational complexity.
Analytic investigation of the methods is made possible by models of
biomolecular sequence evolution. These models involve randomness.
A widelystudied model for generating binary sequences is to `evolve'
them on a tree according to a symmetric Markov process. This model is
known as the CavenderFarrisNeyman model in phylogeny, and the
`symmetric binary channel' or the `symmetric 2state Poisson model'
in other areas. The CFN model provides a simple model for the evolution
of purinepyrimidine sequences. The significance of this simple model is,
that phenomena shown for the CFN model often extend to more realistic
models of sequence evolution.
The abstract phylogeny reconstruction problem is telling the true
(model) CFN tree from the generated sequences with high probability (whp).
We show that under the CFN model distinguishing the true tree from a false one whp,
using sequences generated on the true tree, is substantially `easier'
(in terms of the sequence length needed) than determining the true tree whp.
