Monday, 5 June 2017

So maybe a web of life makes more sense?

by Nicholas Galuszynski

In my previous post I attempted to outline some of the methods used to build phylogenetic trees. While these are useful tools for exploring the evolutionary history of life, they all share one fundamental problem.
They fail to address reticulate evolutionary processes such as hybridization and horizontal gene transfer. While these processes may be rare in most animals, they play an important role in the formation of new plant species. Phylogenetic trees therefore present an oversimplified view of evolution, and non-tree topologies are required to express such evolutionary histories accurately. These non-tree topologies generally take the form of phylogenetic networks. Phylogenetic networks, like their branching counterparts (trees), use either distance data (much like neighbour-joining trees) or discrete data (as with maximum likelihood and parsimony trees) to produce a visual representation of the evolutionary relationships between taxa. As with trees, the branch lengths reflect the amount of evolutionary change between taxa, or genetic distance.
Ideally, a network display would consist of both tree-like and reticulated portions. The tree-like areas would represent sections of the phylogeny where there is no conflict among characters. Areas with a lot of reticulation, on the other hand, would represent portions of the phylogeny where there is either insufficient data to construct a phylogeny accurately or where character-state patterns conflict. Thus, phylogenetic networks provide a visual cue as to the number of potential phylogenetic trees that can fit the data, with more reticulate networks corresponding to a greater number of potential trees because of increased conflict within the data.
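To make "conflict among characters" a little more concrete, here is a minimal sketch in Python. It assumes binary characters and uses the standard pairwise compatibility check (sometimes called the four-gamete test): two binary characters can sit on the same tree without conflict unless all four state combinations (00, 01, 10, 11) occur among the taxa. The data matrix and function names are made up for illustration and are not taken from any of the studies mentioned here.

```python
from itertools import combinations

def compatible(col_a, col_b):
    """Two binary characters are compatible unless all four
    state pairs (00, 01, 10, 11) are observed across the taxa."""
    return len(set(zip(col_a, col_b))) < 4

def conflicting_pairs(matrix):
    """Return the index pairs of characters (columns) that conflict."""
    columns = list(zip(*matrix))
    return [
        (i, j)
        for i, j in combinations(range(len(columns)), 2)
        if not compatible(columns[i], columns[j])
    ]

# Hypothetical data: rows are taxa, columns are binary characters.
taxa_by_characters = [
    (0, 0, 0),
    (0, 0, 1),
    (1, 0, 1),
    (1, 1, 0),
]

# Characters 0 and 2 conflict; every other pair is tree-like.
print(conflicting_pairs(taxa_by_characters))
```

In a network, it is pairs like this one that would show up as reticulation (boxes) rather than as a clean branch.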

Neighbour-net (quality testing)

Neighbour-net constructs split networks from distance-based data sets. Since it is a distance-based method, based partly on the neighbour-joining algorithm, it has the advantage of being faster than methods based on discrete data. The method attempts to generalise the tree-building technique of neighbour-joining by slowing the rate at which connections are made, in a similar manner to pyramid clustering. Further similarities between neighbour-net and neighbour-joining are that both are agglomerative algorithms, both have similar selection criteria and both are considered to be consistent. The method is straightforward to apply, as there are few choices to be made other than the distance measure. The display of conflicts has also been reported to respond well to increased complexity, making the method well suited to the analysis of complex and ambiguous phylogenies.
There are, however, a number of issues that have been raised regarding neighbour-net. Firstly, its use in phylogenetics has been criticised because it lacks an obvious tree interpretation, an issue further complicated by the lack of informative theorems about neighbour-net and the need to understand T-theory. This has resulted in a sense that many interpretations of neighbour-nets have been affected by some degree of subjectivity. Furthermore, neighbour-net has been recognised as a greedy algorithm, meaning that it follows a heuristic construction path that selects the shortest branch length at each step, finding a local minimum and not necessarily the global minimum. Despite these limitations, neighbour-net provides a powerful means of visually inspecting conflicts between probable trees produced from large data sets that are prone to systematic errors.
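Since neighbour-net borrows its selection criterion from neighbour-joining, the greedy step can be sketched with the neighbour-joining version. The Python below is only that: one selection step of plain neighbour-joining using its usual Q-criterion, not an implementation of neighbour-net itself. The distance matrix is a made-up example and the function name is my own.

```python
def nj_pick_pair(dist):
    """One greedy neighbour-joining step: return (Q, i, j) for the
    pair of taxa that minimises
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best

# Hypothetical symmetric distance matrix for five taxa (a, b, c, d, e).
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

q, i, j = nj_pick_pair(dist)
print(q, i, j)  # taxa 0 and 1 (a and b) are chosen first
```

The choice is made locally at each step and never revisited, which is why an algorithm of this kind can settle on a local rather than the global optimum.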

Neighbour-net for AFLP data (Dice distance) for the genus Baldellia. Neighbour-net plots can be quite overwhelming when large data sets are analysed, such as those covering entire genera. Labels make cluster recognition simpler, although interpretation can still be subjective at times. Neighbour-nets are normally interpreted in conjunction with additional analyses such as parsimony-based networks or phylogenetic trees. Image from Arrigo et al., 2011.
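For readers unfamiliar with the Dice distance mentioned in the caption, here is a small Python sketch of how it is usually defined for presence/absence data such as AFLP bands: one minus twice the number of shared bands divided by the total number of bands in the two profiles. The band profiles are invented, and this is not necessarily how the distances in the figure were computed.

```python
def dice_distance(profile_a, profile_b):
    """Dice distance between two presence/absence (1/0) band profiles:
    1 - 2 * shared / (bands in a + bands in b)."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    total = sum(profile_a) + sum(profile_b)
    if total == 0:
        # Assumption: two empty profiles are treated as identical.
        return 0.0
    return 1.0 - 2.0 * shared / total

# Hypothetical AFLP profiles for two individuals (1 = band present).
individual_1 = [1, 1, 0, 1, 0]
individual_2 = [1, 0, 0, 1, 1]

# Two shared bands out of three bands each: 1 - 4/6.
print(dice_distance(individual_1, individual_2))
```

Note that shared absences (a 0 in both profiles) do not count towards similarity, which is one reason this measure is popular for dominant markers.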

Statistical parsimony (data displaying)

Statistical parsimony builds a network by sequentially connecting taxa in order of increasing character-state differences until the parsimony connection limit is reached, that is, the limit up to which parsimony can be considered a reliable method for phylogenetic inference. The analysis is able to process both discrete sequence data and distance values. The method is straightforward, requires no parameter selection and displays evolutionary change in a similar manner to a parsimony tree: the distances between taxa reflect evolutionary change. Since it is based on parsimony, this method faces some of the same issues affecting parsimony tree construction, such as the inability to process large, complex data sets. Under these conditions of increased complexity, statistical parsimony disconnects the network, producing an array of separate networks rather than a single diagram and making the analysis more prone to false negatives. Furthermore, since the network is built sequentially, the most parsimonious connections are made at each step, possibly resulting in character conflicts being displayed incompletely, owing to a lack of indirect links, and in overall networks that do not achieve maximum parsimony.
The value of parsimony networks is that they provide additional insight into the phylogenetic history of the haplotypes under investigation (from which population or species history can be inferred) as well as the relative abundance of each haplotype.
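The sequential connection idea can be illustrated with a much simplified Python sketch. It assumes aligned sequences of equal length, counts differences between them, and joins haplotypes in order of increasing difference until a connection limit is reached. Two things are assumptions on my part: the limit is simply passed in as a number rather than estimated, and the sketch only joins groups that are still separate, so it never draws loops or inserts unsampled intermediate haplotypes the way a real network program would. The sequences are invented.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def connect_within_limit(haplotypes, limit):
    """Join haplotypes in order of increasing difference, stopping at
    the (assumed, user-supplied) connection limit.
    Returns the edges made and the resulting separate groups."""
    names = list(haplotypes)
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = sorted(
        (hamming(haplotypes[a], haplotypes[b]), a, b)
        for a, b in combinations(names, 2)
    )
    edges = []
    for diff, a, b in pairs:
        if diff > limit:
            break  # beyond the connection limit: leave unconnected
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            edges.append((a, b, diff))

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return edges, sorted(groups.values())

# Hypothetical aligned haplotype sequences.
haplotypes = {"A": "AAGT", "B": "AAGC", "C": "ATGC", "D": "CTCA"}

edges, groups = connect_within_limit(haplotypes, limit=2)
print(edges)   # A-B and B-C are joined, each by one difference
print(groups)  # D stays in a separate network of its own
```

With the limit set to 2, haplotype D is three or more differences away from everything else and ends up as a separate network, which is the disconnection behaviour described above.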

Parsimony network representing the genealogical relationships of haplotypes within the Little Karoo endemic Berkheya cuneata (Asteraceae). The nodes (small dots) indicate sequence changes between haplotypes A-E, and the size of the coloured circles (haplotypes) indicates the relative frequency with which these haplotypes occur in the samples. The large circles are the most common haplotypes and are therefore generally considered to be older. Outgroup species included B. fruticosa (a), B. coriacea and B. spinosa. Image taken from Potts et al., 2013.

Relevant literature:

Arrigo, N., Buerki, S., Sarr, A., Guadagnuolo, R., Kozlowski, G., 2011. Phylogenetics and phylogeography of the monocot genus Baldellia (Alismataceae): Mediterranean refugia, suture zones and implications for conservation. Molecular Phylogenetics and Evolution 58, 33–42.

Huson, D.H., Bryant, D., 2006. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23, 254–267.

Levy, D., Pachter, L., 2011. The neighbor-net algorithm. Advances in Applied Mathematics 47, 240–258.

Moret, B.M.E., Nakhleh, L., Warnow, T., Linder, C.R., Tholse, A., Padolina, A., Sun, J., Timme, R., 2004. Phylogenetic networks: Modeling, reconstructibility, and accuracy. IEEE/ACM Transactions on Computational Biology and Bioinformatics 1, 13–23.

Morrison, D.A., 2005. Networks in phylogenetic analysis: New tools for population biology. International Journal for Parasitology 35, 567–582.

Potts, A.J., Hedderson, T.A., Vlok, J.H.J., Cowling, R.M., 2013. Pleistocene range dynamics in the eastern Greater Cape Floristic Region: A case study of the Little Karoo endemic Berkheya cuneata (Asteraceae). South African Journal of Botany 88, 401–413.
