The treatment of conflict within data sets is among the most debated questions in
phylogenetics, and it has become especially pressing as large data sets that combine
data from many sources accumulate. Indeed, recent genomic-scale phylogenetic
analyses have drawn attention to how much the phylogenetic signal varies among
loci. This variation has several causes, including differing relative branch lengths,
recombination, and horizontal gene transfer. Whatever its source, identifying
phylogenetic conflict is an important step in building reliable evolutionary
trees.

We describe a method to partition phylogenetic data sets of discrete characters
based on the pairwise compatibility of characters. Unlike previous approaches, our
method requires no knowledge of the phylogeny, model of evolution, or characteristics
of the data. The method is based on a similarity scoring scheme that measures
how close each pair of characters is to being compatible. The goal is to partition the characters
into clusters so that characters within a cluster are more compatible with each
other than they are with characters in other clusters. While partitioning according to
these criteria is computationally intractable, we show that spectral methods quickly
provide high-quality solutions. We demonstrate that our partitioning method effectively
identifies conflicting phylogenetic signals in simulated and empirical data sets.
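The abstract does not specify the similarity score or the spectral algorithm. The sketch below illustrates the general idea under stated assumptions. It assumes binary characters written as 0/1 strings over the same taxa. It uses a hypothetical score built on the four-gamete test: two binary characters are compatible exactly when one of the state pairs (0,0), (0,1), (1,0), (1,1) is absent, so the smallest pair count m is the number of taxa that must be dropped to reach compatibility. This is not necessarily the authors' scoring scheme. The clustering step is standard spectral bisection, which splits characters by the sign of the Fiedler vector of the similarity graph's Laplacian. All function names are hypothetical.

```python
import math
import random


def incompatibility_count(a: str, b: str) -> int:
    """Fewest taxa to delete so binary characters a and b become compatible.

    Four-gamete test: the pair is compatible iff at least one of the four
    state pairs is absent, so the smallest pair count is the deletion cost.
    Assumes '0'/'1' strings of equal length (hypothetical encoding).
    """
    counts = {("0", "0"): 0, ("0", "1"): 0, ("1", "0"): 0, ("1", "1"): 0}
    for x, y in zip(a, b):
        counts[(x, y)] += 1
    return min(counts.values())


def similarity(a: str, b: str) -> float:
    """Hypothetical closeness-to-compatibility score in [0, 1].

    Four counts summing to n have a minimum of at most n / 4, so
    1 - 4m/n is 1 for compatible pairs and 0 for maximally conflicting ones.
    """
    return 1.0 - 4.0 * incompatibility_count(a, b) / len(a)


def fiedler_vector(w, iters=500, seed=0):
    """Eigenvector of the second-smallest eigenvalue of L = D - W.

    Power iteration on c*I - L, where c = 2 * max degree bounds the largest
    Laplacian eigenvalue; the constant eigenvector (eigenvalue 0 of L) is
    projected out at every step so the iteration converges to the Fiedler vector.
    """
    n = len(w)
    deg = [sum(row) for row in w]
    c = 2.0 * max(deg)
    if c == 0.0:
        raise ValueError("similarity graph has no edges")
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        # (c*I - L) v = c*v - D v + W v
        v = [c * v[i] - deg[i] * v[i] + sum(w[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def spectral_bipartition(chars):
    """Label each character 0 or 1 by the sign of the Fiedler vector."""
    n = len(chars)
    w = [[0.0 if i == j else similarity(chars[i], chars[j]) for j in range(n)]
         for i in range(n)]
    v = fiedler_vector(w)
    if v[0] < 0:  # fix the arbitrary eigenvector sign: character 0 -> cluster 0
        v = [-x for x in v]
    return [0 if x >= 0 else 1 for x in v]


if __name__ == "__main__":
    # Toy data: the first three characters are mutually compatible, the last
    # two are mutually compatible, and every cross pair conflicts.
    chars = ["111000", "110000", "000111", "100100", "010010"]
    print(spectral_bipartition(chars))  # [0, 0, 0, 1, 1]
```

On the toy data, within-group pairs score 1 and every cross pair scores 1/3. The Fiedler vector therefore takes one sign on the first three characters and the opposite sign on the last two. Applying the bisection recursively would give more than two clusters. A practical version would also need to handle multistate characters and missing data, which this sketch does not.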
This is joint work with Duhong Chen, of the Department of Computer Science,
Iowa State University, and J. Gordon Burleigh, of the Section of Evolution and Ecology,
University of California, Davis.