Thursday, 21 September 2017

Spekboom rooting — testing the stem-damage hypothesis.

In a previous post (To the root of the problem...), I highlighted that the roots of planted truncheons invariably grow only from the very base of the stem — thus, rooting takes place 15-20 cm below the soil surface, where the stem has been cut. This was based on a few field observations that have since been bolstered by digging up many more plants (it must be over 50 by now, across very different environments). In that same post, I suggested that damage to the stem was required for roots to develop. Nicholas Galuszynski and I had an Honours student conduct a series of treatments to determine if this stem-damage hypothesis was correct. Unfortunately, the Honours student left with all the data (another lesson on data management for me), so I'm left with a few photos — but the results are, nonetheless, quite clear.

We had five treatments:

  1. Control, where the truncheon was cut and planted with no additional damage.
  2. Vertical slits, where two shallow vertical slits (<0.5 cm deep) were cut length-ways along the stem for ~10 cm, ending at the base, using the sharp blade of a pair of secateurs.
  3. Horizontal slits, where a number of shallow horizontal slits were cut across the stem. 
  4. Peeling, where the sharp secateur-blade was used to shallowly peel (or skin) the outer bark, and,  
  5. Deep gouging, where deep cuts were gouged into the stem.

And the overall results are summarised below with photos of single replicates.
1. Control: no roots along the stem

2. Vertical (or longitudinal cuts): no roots along the stem
However, there was some swelling at the base with more (visually assessed) root biomass.

3. Horizontal (or perpendicular cuts): no roots along the stem

4. Peeling (or scraping): no roots along the stem

5. Deep gouging: hooray! roots along the stem
Rooting was consistent across stems with gouges, but not every gouge developed roots. Nonetheless, those that did not develop roots still healed over.

So, the deep gouging makes a substantial contribution to stimulating rooting. The great thing about these results is that they've been replicated by an independent experiment!

Thankfully, Yondela Norman — an Honours student at Rhodes University — conducted a completely independent and different experiment on rooting (at the same time as we were!) and also found that the deep gouges substantially promoted root growth. So, my predictions in this earlier blog were correct. It's great when the data actually supports a hypothesis.  

Wednesday, 20 September 2017

Thicket Restoration: Is Spekboom the only answer?

Restoring thicket has largely focused on using Portulacaria afra (spekboom), and to a much lesser extent on "late-successional" tree species, as the agents for returning ecosystem functioning. I write "late-successional" because we are a long way off from understanding the successional patterns in thicket; nonetheless, the case is quite well established that, in arid thicket, the presence of many tree species — such as Pappea capensis — is due to the presence of spekboom (e.g. van der Vyver et al., 2013). This is largely due to the increase in available soil moisture that is a by-product of a spekboom-dominated layer (van Luijk et al., 2013; Wilman et al., 2014).

However, does this necessarily mean that spekboom is the best pioneer species to re-introduce or bolster in degraded lands? Recent field observations have led me to think that we might have overlooked a pioneer candidate: Crassula tetragona.

While investigating a donga (a dry gulley formed by erosion) in Addo that was surrounded by dense intact thicket, it struck me that although there were plenty of spekboom plants in the vicinity (e.g. the orange arrow in the picture below), I could only find one small individual in the donga — although, I argue elsewhere (The number one invasive species in arid lands globally: Spekboom?) that this is due to a lack of an adequate dispersal agent. What had colonised this inhospitable environment was Crassula tetragona (blue arrow in the picture below). Moreover, where it had colonised, colonies of clonelets were forming wherever leaf-clusters had broken off the parent plant. Each clump acts as a silt and litter trap, with the soil already at least 5 cm above the surrounding surface. This is an example of spontaneous rehabilitation: C. tetragona is changing the microclimate of this eroded donga.

However, this is in a herbivore-exclusion area and along a drainage line where there will be some moisture and soil (the donga had not hit bedrock yet). How will this species fare in a harsher environment?

Well, the images below were taken at Kaboega Private Game Reserve on a heavily eroded north-facing (and so hot!) slope. There is no topsoil: it is right down to the eroding shale bedrock. The light green plants in the image below are C. tetragona. These plants are filling up the small rills on a slope where all the topsoil has been stripped off — and they are doing it via vegetative growth: a crown piece gets knocked off (or falls off?) and readily takes root. This is an environment where a range of herbivores is present, including kudu and impala. Why is it not being eaten? This species is likely to be unpalatable (however, Curtis & Perrin, 1979, do list it as a preferred species for rodents).

In discussing this as an option with Jan Vlok, he highlighted that this is a species that does not do well in areas with high herbivore activity due to trampling. So some activity is okay, but not too much.

Given that this species is readily self-propagating across a range of arid and hot environments, it should certainly be considered as part of the rehabilitation suite. Coupling this with erosion control measures, such as mini-ponds, could provide the boost needed to create vegetated hotspots. 

  • Curtis BA & Perrin MR (1979) Food preferences of the vlei rat (Otomys irroratus) and the four-striped mouse (Rhabdomys pumilio). South African Journal of Zoology 14:224-229. DOI: 10.1080/02541858.1979.11447675
  • van der Vyver ML, Cowling RM, Mills AJ, Difford M (2013) Spontaneous return of biodiversity in restored subtropical thicket: Portulacaria afra as an ecosystem engineer. Restoration Ecology 21:736-744.
  • van Luijk G, Cowling RM, Riksen MJPM & Glenday J (2013) Hydrological implications of desertification: Degradation of South African semi-arid subtropical thicket. Journal of Arid Environments 91:14-21.
  • Wilman V, Campbell EE, Potts AJ & Cowling RM (2014) A mismatch between germination requirements and environmental conditions: Niche conservatism in xeric subtropical thicket canopy species? South African Journal of Botany 92:1-6.

Dr John Kani: why science and universities have failed South Africa

by Alastair Potts

I had the unusual pleasure of attending an awards ceremony for researchers and lecturers where the guest speaker, Dr John Kani, berated us in an unstinting barrage of uncomfortable truths. His voice, as unique as that of Morgan Freeman, captivated us as he poured his scorn and derision upon the state of the country, and upon how the universities are, in part, to blame. An interesting choice of topic for an event aimed at celebrating research, teaching and engagement, but an entirely necessary call to arms.

So what did John Kani say? It was a wide-ranging talk, but below are two of the highlights that stuck with me. (Crucial note: I was not prepared for note-taking, it was an awards event after all, and although I've remembered these points, they are [largely] in my own words.)

  • Scientists — what is it that you actually do? Dr Kani highlighted that scientists get so wrapped up in their private worlds [and usually only hang out with other scientists] that they forget to explain what it is they do and why it is important. But worse, we don't give much thought to how to make our research accessible to the lay person, or to school children. We publish in rarefied journals that even the most ardent of non-scientist science enthusiasts will (a) battle to understand and, even worse, (b) not have access to [locked behind publisher paywalls]. He shared the story of the selection committee for the prestigious Order of Mapungubwe battling to understand what any given scientist has done that may warrant the bestowing of such an award. If we cannot (do not?) share our research findings with the public at large, then what is the point of science? I feel that this point is spot on: we cannot expect non-specialists to rise to the level of specialists (who have spent years learning the theory and jargon). It is the specialists who need to rise to the level of simplicity [Einstein had much to say about simplicity!].   

  • Universities — why are you so quiet? Another point that he raised was that universities have, in his view, shied away from politics and from advising government. He makes a good point: where are the qualifications specially designed for councillors and others in public service? Surely it is the universities who have the responsibility to create the diploma and degree benchmarks that ensure those in public service are qualified to serve the public. Dr Kani related the historical narrative of the NATS gaining power, based in large part on the argument that Afrikaans culture was under threat of erosion or extinction, and within a few years (six?) it was possible to study from pre-primary all the way through to university in Afrikaans. The NATS did this by pouring money into this agenda — i.e. into universities, the arts etc. Dr Kani believes that no such effort, nor commitment, has been made by the current government, and that universities have accepted this. Universities have not made a strong enough call to demand to be the agents of change for the country. [Rather, the funding for universities has declined without much active outcry, which has led to the burden being passed onto students.] 

Dr Kani left us with a challenge: be the driving agents of change in South Africa. We have the most powerful tool available: education. Given the new name of Nelson Mandela University and the university's slogan, "Change the World", this message could not have come at a more appropriate time.

It is time to turn the ivory towers into the pillars that support a better nation.  

Wednesday, 7 June 2017

The joys of receiving DNA sequences.

by Nicholas Galuszynski

The struggle is real! Image courtesy Sketching Science.
So you've done it!
After hours (that felt like years) battling with DNA extraction methods and PCR protocols you have finally received the sacred e-mail, "please see attached sequences..." and it all starts seeming worthwhile, maybe.
But now what? You've done the lab work, but what are you supposed to do with these .ab1 files? What are .ab1 files anyway? I WANT MY SEQUENCES!!!

So firstly, .ab1 files are the output format for raw DNA data produced by Applied Biosystems' Sequencing Analysis Software. These files contain an electropherogram as well as the DNA base sequence, which can be viewed using a DNA viewer program. Secondly, there are many ways of opening these files; however, the freely available software options for handling your .ab1 files normally have limited capabilities compared to their expensive counterparts, such as CodonCode Aligner.

So I've recently been using the trial version of CodonCode Aligner to check out some Cyclopia sequences, and damn, this program is easy to use once you know where the different functions are and what they do. After checking out a few tutorial videos on YouTube, I was able to do the basics, such as import my sequences, create contigs based on names, and visually inspect and correct these contigs. If you are starting out with CodonCode, watch those videos! From that base you can start troubleshooting and finding your own ways of doing things, which is made much easier by the wide range of tutorials available from the developers.

Oddly enough, one of the most useful features of the software, comparing all the contigs you've assembled, needs to be added to the toolbar. But this is easily explained by CodonCode.

Before you get too far into creating contigs and aligning them to each other, it is often really useful to run a base call on your freshly received sequences. This can improve the quality of your sequence, resulting in better alignment of the forward and reverse sequences. I tended to have a lot of strange indels pop up in the sequences for a cDNA region, possibly due to sequencing errors in highly repetitive A- and T-rich sections. These made contig comparisons difficult, and I ended up selecting the best quality sequence as a reference sequence (make a reference sequence by selecting the sample and pressing ctrl+alt+m). I then selected the Clustal Omega method of aligning the contigs, as apparently this option handles these gaps a bit better than Muscle, which tends to delete them. But maybe deleting them isn't the worst option if all you're interested in is detecting SNPs; and if that is your desire, you can then use CodonCode to find primer sites around those SNPs without even having to open another program. That is, however, until the end of the 30-day trial, after which you'll have to buy a licence or, like, reinstall windows or something.

So CodonCode is a pretty useful tool for the molecular biologist, offering a lot of functionality in one neat package. This, however, comes at a price, so make the best use of your trial and check out the helpful links in this post before you activate your copy. 

Monday, 5 June 2017

So maybe a web of life makes more sense?

by Nicholas Galuszynski

In my previous post I attempted to outline some of the methods used to build phylogenetic trees. While these are useful tools for explorers of the evolutionary history of life, they all have one fundamental problem. 
They fail to address reticulate evolutionary processes such as hybridization and horizontal gene transfer. While these processes may be rare for most animals, they play an important role in the formation of new plant species. Phylogenetic trees, therefore, evoke an oversimplified view of evolution and require non-tree-based topologies to accurately express evolutionary histories. These non-tree-based topologies are generally found in the form of phylogenetic networks. Phylogenetic networks, like their branching counterparts (trees), incorporate either distance (much like neighbour-joining trees) or discrete (as with maximum likelihood and parsimony trees) data sets to produce a visual representation of the evolutionary relationships between taxa. As with trees, the branch lengths reflect the amount of evolutionary change between taxa, or genetic distance.
Ideally, a network display would consist of both tree-like and reticulated portions. The tree-like areas would represent sections of the phylogeny that have no conflict among characters. Areas with a lot of reticulation, on the other hand, would represent those portions of the phylogeny where there is either insufficient data to accurately construct a phylogeny, or conflicting character-state patterns. Thus, phylogenetic networks provide a visual cue as to the number of potential phylogenetic trees that can fit the data, with more reticulate networks having a greater number of potential trees due to increased conflict within the data.
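Both the trees and the networks discussed here start from the same ingredient: a pairwise distance matrix. As a minimal sketch (the sequences below are invented toy data, not from any study mentioned in this post), here is how aligned sequences are converted into uncorrected p-distances:

```python
# Sketch: building a pairwise distance matrix from aligned sequences,
# the input shared by distance-based trees and split networks.
# The taxon names and sequences are hypothetical toy data.

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    assert len(a) == len(b), "sequences must be aligned to equal length"
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)

seqs = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTACGTTC",
    "taxonC": "ACGAACGTTC",
}

taxa = sorted(seqs)
matrix = {(i, j): p_distance(seqs[i], seqs[j]) for i in taxa for j in taxa}

print(matrix[("taxonA", "taxonB")])  # 0.1 (1 of 10 sites differ)
```

A network method such as neighbour-net would take exactly this kind of matrix as its input; the conflicts it draws as reticulations arise when no single tree can reproduce all the pairwise distances at once.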

Neighbour-net (quality testing)

Neighbour-net phylogenetic networks construct split networks from distance-based data sets. Since it is a distance-based method, based partly on the neighbour-joining algorithm, it has the advantage of being faster than discrete-data-based methods. The method attempts to generalise the tree-building technique of neighbour-joining by slowing the rate at which connections are made, in a similar manner to pyramid clustering. Further similarities between neighbour-net and neighbour-joining are that both are agglomerative algorithms, have similar selection criteria and are both considered to be consistent. The method is straightforward to apply, as there are few choices to be made other than the distance measure, and the display of conflicts has been reported to respond well to increased complexity, making it ideal for the analysis of complex and ambiguous phylogenies.
There are, however, a number of issues that have been raised regarding neighbour-net. Firstly, there has been criticism of its use in phylogenetics due to its lack of an obvious tree interpretation, an issue further complicated by the lack of informative theorems about neighbour-net and the need to understand T-theory. This has resulted in a sense that many interpretations of neighbour-nets have been affected by some degree of subjectivity. Furthermore, neighbour-net has been recognised as a greedy algorithm, meaning that it follows a heuristic construction path that selects the shortest branch length at each step, finding a local minimum and not necessarily the global minimum. Even with these limitations, neighbour-net provides a powerful means to visually inspect conflicts between probable trees produced from large data sets that are prone to fall victim to systematic errors. 

Neighbour-net for AFLP data (Dice distance computed) for the genus Baldellia. Neighbour-net plots can be quite overwhelming when large data sets are analysed, such as those associated with entire genera. Labels make cluster recognition simpler, although this can lead to subjective interpretations at times. Neighbour-nets are normally interpreted in conjunction with additional analyses, such as parsimony-based networks or phylogenetic trees. Image from Arrigo et al., 2011.

Statistical parsimony (data displaying)

Statistical parsimony builds a network by sequentially connecting taxa in order of increasing character-state differences until the parsimony connection limit is reached. That is, the limit at which parsimony can be considered a reliable method for phylogenetic inference. The analysis is able to process both discrete sequence data and distance values. The method is straightforward, requiring no parameter selection, and displays evolutionary change in a similar manner to a parsimony tree: the distances between taxa reflect evolutionary change. Since it is based on parsimony, this method faces some of the same issues affecting parsimony tree construction, such as the inability to process large, complex data sets. Under these conditions of increased complexity, statistical parsimony disconnects the network, producing an array of separate networks rather than a single diagram, making the analysis more prone to false negatives. Furthermore, since the network is built sequentially, the most parsimonious connections are made at each step, possibly resulting in incomplete character conflicts due to a lack of indirect links, and producing overall networks that do not achieve maximum parsimony. 
The value of parsimony networks is that they provide additional insight into the phylogenetic history of the haplotypes under investigation (inferring population or species history) as well as the relative abundance of the haplotypes.

Parsimony network representing the genealogical relationships of haplotypes within the Little Karoo endemic Berkheya cuneata (Asteraceae). The nodes (small dots) indicate sequence changes between haplotypes A-E, and the size of the coloured circles (haplotypes) indicates the relative frequency of these haplotypes occurring within the samples. The largest circles are the most common haplotypes and are therefore generally considered to be older. Outgroup species included B. fruticosa (a), B. coriacea and B. spinosa. Image taken from Potts et al., 2013.
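The "connect pairs in order of increasing differences, until a limit" idea behind statistical parsimony can be sketched in a few lines. This is a toy illustration only: the haplotypes below are invented, and real implementations (e.g. TCS) compute the connection limit probabilistically rather than taking it as a fixed argument.

```python
# Toy sketch of the statistical-parsimony idea: connect haplotypes in
# order of increasing character differences, stopping at a connection
# limit. Haplotype names and sequences are invented toy data.

from itertools import combinations

def hamming(a, b):
    """Number of differing characters between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def parsimony_network(haplotypes, limit):
    """Return (name, name, distance) edges connected at <= `limit` steps."""
    pairs = sorted(combinations(haplotypes, 2),
                   key=lambda p: hamming(haplotypes[p[0]], haplotypes[p[1]]))
    edges = []
    for a, b in pairs:
        d = hamming(haplotypes[a], haplotypes[b])
        if d <= limit:
            edges.append((a, b, d))
    return edges

haps = {"A": "AAGT", "B": "AAGA", "C": "ATGA", "D": "TTCA"}
print(parsimony_network(haps, limit=1))
# A-B and B-C connect (one step each); D stays disconnected, illustrating
# the "array of separate networks" behaviour noted above.
```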

Relevant literature:

Arrigo, N., Buerki, S., Sarr, A., Guadagnuolo, R., Kozlowski, G., 2011. Phylogenetics and phylogeography of the monocot genus Baldellia (Alismataceae): Mediterranean refugia, suture zones and implications for conservation. Molecular Phylogenetics and Evolution 58, 33–42.

Huson, D.H., Bryant, D., 2006. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23, 254–267.

Levy, D., Pachter, L., 2011. The neighbor-net algorithm. Advances in Applied Mathematics 47, 240–258.

Moret, B.M.E., Nakhleh, L., Warnow, T., Linder, C.R., Tholse, A., Padolina, A., Sun, J., Timme, R., 2004. Phylogenetic networks: Modeling, reconstructibility, and accuracy. IEEE/ACM Transactions on Computational Biology and Bioinformatics 1, 13–23.

Morrison, D.A., 2005. Networks in phylogenetic analysis: New tools for population biology. International Journal for Parasitology 35, 567–582.

Potts, A.J., Hedderson, T.A., Vlok, J.H.J., Cowling, R.M., 2013. Pleistocene range dynamics in the eastern Greater Cape Floristic Region: A case study of the Little Karoo endemic Berkheya cuneata (Asteraceae). South African Journal of Botany 88, 401–413.

Friday, 14 April 2017

Balancing drought versus herbivory survival in Spekboom restoration

by Alastair Potts

There are three major stresses that spekboom cuttings need to survive: drought, herbivory and frost. The last stress has been discussed in previous posts (see Rob Duker's "Thicket restoration and frost: the forgotten enemies") and can be overcome quite simply by avoiding areas that experience frosts.

Drought survival was the focus of initial spekboom restoration research. This research suggested that smaller cuttings (so-called "fingerlings") had extremely low survival rates, and that survival increased with the size of the cutting. This led to the current recommendation of truncheons of >4 cm stem diameter as part of the restoration protocol. 

These truncheons are planted to a depth of ~10-20 cm (sometimes deeper), sticking upright out of the ground. In a previous post ("To the root of the problem with Spekboom restoration"), I highlighted that the roots tend to grow only from the cut base, which has been buried deep in the soil, limiting the plant's access to soil moisture after light rainfall events. However, an additional problem is that these cuttings take a long time to develop a root-to-shoot ratio that is sufficient to survive drought and, more importantly, to withstand herbivory. 

A root-to-shoot ratio describes the amount of plant tissue that has supportive functions (i.e. the roots) relative to the amount of tissue that has growth functions (i.e. above-ground stems and leaves). Roots allow a plant to absorb water and nutrients from the surrounding soil, and thus root-to-shoot ratios are usually discussed in terms of plant health and water/nutrient absorption. This is certainly an important consideration for spekboom truncheons, but what is of greater importance is the effect that this low root-to-shoot ratio has on herbivory survival.

Without an established root system (relative to the above-ground plant), a spekboom truncheon is more susceptible to browsing and catastrophic herbivory-related uprooting. Browsers eating spekboom generally include a pulling motion after the bite; this pulling motion can easily uproot a truncheon. (Note that baboons have also been observed pulling up truncheons). 

Spekboom has evolved to withstand heavy browsing from indigenous game. Even in an area with arguably the highest elephant-browsing density in the world — at Hapoor Dam in Addo Elephant National Park, where the elephants have decimated the thicket vegetation surrounding the dam (Landman et al. 2012) — spekboom is one of the first species to return to canopy dominance as you move away from the dam. Also, in relation to its annual rainfall, AENP supports the highest biomass of herbivores anywhere in Africa (Mills et al. 2014), and likely the world. As an aside, an untested hypothesis is that the primary dispersal agents of spekboom are elephants, as their habit of walking while munching on a spekboom branch results in a rain of vegetative parts that can propagate new individuals.  

So, why are spekboom truncheons susceptible to indigenous herbivory? The root-to-shoot ratio is wrong. There is not enough going on under the ground to support what is going on above the ground.

We need the truncheons to be large to withstand drought-stress, but being large makes them more apparent in the landscape, means they take a long time to develop the necessary roots, and ultimately leaves them highly susceptible to herbivory.

How can we deal with this catch-22?  

My suggestion is to return to smaller truncheon sizes (the "fingerlings"). The fingerlings:

  • are less apparent in the landscape, and
  • can develop the right root-to-shoot ratio in a shorter time period.
Also, this is how I imagine spekboom has been distributed through the landscape: via propagule-rain driven by elephant browsing. But this rain would have occurred throughout the year (not as a once-off planting, as in restoration), so fingerlings would have been rained down both during windows of opportunity and at times when establishment would not be possible (e.g. in droughts).     

This still leaves the problem of drought survival, though. A lone spekboom fingerling out in the open has little chance of survival. However, there are a few things that can be done to improve these chances:

  • Planting spekboom close together in a contoured band has been shown to have outstanding results at the Camdeboo National Park restoration sites established by Peter Burdett (results shared by Bruce Taplin at the Thicket Forum 2015). 
  • The pure mechanical effect of creating a contour line, by rip-ploughing, soil bags, car tyres etc. creates drainage lines of increased soil moisture. If positioned correctly, these lines can also create an ameliorated microclimate for a small plant.
  • Loosening the soil prior to planting has also been reported (reference needed) to improve truncheon survival and growth. 
  • Hydromulch, a woodfibre mulch applied with water and a tackifier (natural glue), traps water and dew and creates an excellent medium for encouraging root development in fingerlings (see the images in my post about the rooting window hypothesis).

Thus, the combination of mechanical contour ploughing that loosens the soil (or some other water-trapping technique) and creates microclimates (on south-facing micro-slopes in the ditch and in the soil), hydromulch, and a high density of spekboom fingerlings could very well be the recipe for seeding the landscape with spekboom resilient to drought and herbivory. As the mechanical structure of the contour breaks down from erosion, the spekboom will likely take over as a biological contour, trapping sediment and increasing water infiltration. 

Saturday, 1 April 2017

Proteinase K: can it improve the quality of DNA extracted from sulfhydryl-rich leaves?

By Timothy Macqueen

I have recently begun work on attempting to extract DNA from the leaves of Protea susannae and P. eximia hybrids using the Doyle & Doyle CTAB method. There have been several setbacks in the quality of the DNA I have extracted. In many of the DNA gels I have run, only one or two of my samples have shown successful amplification. 
An early gel run. The only amplified DNA sample was the second from the right. The rest were PCR and gel positives.

The 260/230 ratios of my samples have been consistently low. I wanted to determine the reason for this, so I posted my problem on the scientific community website ResearchGate. There, H. Ayyez of the University of Al-Qadisiyah suggested that I was experiencing organic contamination due to not adding Proteinase K as part of my extraction. 

Before I continue, I'd better explain the importance of the 260/280 and 260/230 values to DNA amplification and analysis. A NanoDrop is used to measure these values. The 260/280 value of a DNA sample indicates the purity of the DNA extracted; a value around 1.8 is considered pure. 

Low 260/280 ratios may be caused by: 

  • The presence of residual phenol or other reagents associated with the extraction protocol.
  • A very low concentration (<10 ng/µl) of nucleic acid.

The 260/230 ratio is important as a second measure of DNA purity after the 260/280 ratio. An optimum value for this ratio is between 2.0-2.2. 

Low 260/230 ratios may be the result of:

  • Carbohydrate carryover (often a problem with plants).
  • Residual phenol from nucleic acid extraction.
  • Or, most importantly, the presence of organic contaminants, such as (but not limited to): phenol, TRIzol, chaotropic salts and other aromatic compounds.

Samples with 260/230 ratios below 1.8 are considered to have a significant amount of these contaminants that will interfere with downstream applications. This is especially true for reverse transcription.

Protea hybrid at Van Staadens nature reserve. It shows many of the distinct morphological traits of P. susannae, including the sulphurous odor distinctive of leaves containing sulfhydryl compounds. Could these organic compounds be the cause of all my troubles trying to amplify the DNA of my hybrid proteas?    

One such compound is a thiol: an organosulfur compound that contains a carbon-bonded sulfhydryl group. Thiols are known to occur in my samples because the leaves of P. susannae and its hybrids, when crushed, produce a distinct sulfurous odor that is characteristic of thiols. 

The 260/280 values in my samples were very close to or at 'pure' ratio values, while my 260/230 ratios (yellow highlighting) were very low. 
So what can Proteinase K do to improve my 260/230 ratios? Well, Proteinase K is a serine proteinase that is able to digest a wide variety of contaminating proteins. It is also used to inhibit nucleases that would otherwise degrade the nucleic acids being extracted. 

Will it work? Well, the only way I can know for certain is by testing this hypothesis. Currently our lab is out of stock of Proteinase K. We have just ordered a new batch though. Once it is received I will use it in my extraction method and report back to this blog on whether or not it improved the quality and purity of my DNA extractions. 
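In the meantime, the ratio guidelines described above can be captured in a small helper function. This is just a sketch: the absorbance readings below are hypothetical, and the thresholds simply follow the values quoted in this post (~1.8 for 260/280; 2.0-2.2 optimal and <1.8 problematic for 260/230).

```python
# A small helper reflecting the purity-ratio guidelines in the text.
# Absorbance values are hypothetical; thresholds follow the post
# (~1.8 for 260/280, below 1.8 flagged as contaminated for 260/230).

def purity_check(a260, a280, a230):
    """Return (260/280, 260/230, warning flags) for a spectrophotometer reading."""
    r280 = a260 / a280
    r230 = a260 / a230
    flags = []
    if r280 < 1.8:
        flags.append("possible protein/phenol contamination (260/280 low)")
    if r230 < 1.8:
        flags.append("possible organic/salt contamination (260/230 low)")
    return round(r280, 2), round(r230, 2), flags

# Example: good 260/280 but poor 260/230 -- the pattern seen in my samples.
print(purity_check(a260=1.0, a280=0.55, a230=0.9))
```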


Cremlyn, R.J. 1996. An Introduction to Organosulfur Chemistry. Chichester: John Wiley and Sons. ISBN 0-471-95512-4.

Patai, S. 1974. The Chemistry of the Thiol Group. London: Wiley. ISBN 0-471-66949-0.

Links to website references (All accessed on 01/04/2017)

Tuesday, 14 March 2017

The trees of life: what are they?

Evolution has become synonymous with the image of the tree of life, where each branch depicts the diversification of life on our planet. While these images are often nothing more than a reflection of an evolutionary narrative, their validity is rooted in the mathematically derived structures produced from data sets painstakingly extracted from hours of lab work. But not all of these trees tell you the same things: some have branches whose lengths represent evolutionary time between taxa (ultrametric trees), while others (cladograms) only give you the relative pattern of common ancestry.
The various approaches to constructing a tree can be separated into two general groups: distance methods (e.g. neighbour-joining) and discrete methods (e.g. maximum likelihood, maximum parsimony and Bayesian inference). Distance methods have the advantage that they can analyse both sequence data and banding patterns, as these formats are easily converted into pair-wise distance matrices, which are then used in the construction of a tree. Discrete methods are limited to sequence data, where each character is processed as information. While this limits the type of data that can be analysed, the explicit functions that relate the tree to the data allow for the analysis and comparison of different evolutionary hypotheses against the observed data. These different methods are discussed below.
While some trees may look alike, they are telling us different stories. A) Neighbour joining trees and C) Maximum-likelihood trees both have scales representing evolutionary distance, while B) Maximum-parsimony does not. Trees taken from Potts et al., 2004, doi:10.1093/sysbio/syt052  

Neighbour joining (shortest distance wins)


Neighbour joining involves the calculation of evolutionary distances between each pair of taxa. These values are placed into a pairwise distance matrix, and the relationships between these distance values are used to construct a tree. Construction starts with a completely unresolved tree (a star with arms of equal length). The pair of taxa with the lowest rate-corrected distance score (each pairwise distance adjusted for the two taxa's average divergence from all other taxa) is identified and connected to form a new node, and the branch lengths (distances) from these paired taxa to the node are calculated. New distance values are then calculated between the remaining taxa and the node (which replaces the pair of taxa involved in its formation). This reduced distance matrix is then evaluated and the process begins again: the next closest pair is joined to form the next new node, until the tree is fully resolved.
The main benefit of a neighbour joining approach is that it is much faster than other, computationally demanding, options. This allows the analysis of large data sets (with more than 100 taxa) as well as validity tests such as bootstrapping and jack-knifing. Furthermore, the tree produced is generally good, and the method maintains its statistical consistency when the matrix is additive; because it does not assume a constant rate of evolution, a variety of evolutionary hypotheses can be evaluated. However, this relatively straightforward approach follows a greedy problem-solving heuristic, making the best possible choice at each step by reassessing the distance matrix based on newly produced nodes. Such an approach does not always identify the shortest tree overall, resulting in tree topologies that do not achieve Balanced Minimum Evolution (BME). Neighbour joining approaches have generally been superseded by more accurate discrete-data methods, at the cost of speed and computational ease.
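The joining procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not production code: the four-taxon distance matrix is hypothetical (chosen to be additive, so the method recovers the generating tree exactly), and ties in the pair-selection score are broken arbitrarily.

```python
# Minimal neighbour-joining sketch over a hypothetical 4-taxon distance matrix.
# Uses the standard rate-corrected (Saitou & Nei) criterion to pick each pair.

def neighbour_joining(names, d):
    """names: list of labels; d: dict with d[(a, b)] = distance (symmetric)."""
    def dist(a, b):
        if a == b:
            return 0.0
        return d[(a, b)] if (a, b) in d else d[(b, a)]

    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all others.
        r = {a: sum(dist(a, b) for b in nodes) for a in nodes}
        # Rate-corrected criterion: minimise (n-2)*d(i,j) - r(i) - r(j).
        i, j = min(((a, b) for a in nodes for b in nodes if a < b),
                   key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to the new node.
        li = 0.5 * dist(i, j) + (r[i] - r[j]) / (2 * (n - 2))
        lj = dist(i, j) - li
        new = f"({i}:{li:.3f},{j}:{lj:.3f})"
        # Distances from every remaining node to the new node.
        for k in nodes:
            if k not in (i, j):
                d[(new, k)] = 0.5 * (dist(i, k) + dist(j, k) - dist(i, j))
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    a, b = nodes                       # join the last two nodes
    return f"({a},{b}:{dist(a, b):.3f})"

# Additive example matrix, so NJ recovers the true branch lengths exactly.
d = {("A", "B"): 5, ("A", "C"): 9, ("A", "D"): 9,
     ("B", "C"): 10, ("B", "D"): 10, ("C", "D"): 8}
print(neighbour_joining(["A", "B", "C", "D"], dict(d)))
# → ((A:2.000,B:3.000),(C:4.000,D:4.000):3.000)
```

Note how each pass shrinks the matrix by one: the joined pair is replaced by its new node, exactly as described above.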

Maximum-parsimony (simplest tree wins)

Maximum parsimony attempts to construct the tree that requires the fewest steps to explain the observed variability in the data, thus achieving a BME. Evolutionarily speaking, this means the most likely phylogeny is the one that requires the least evolutionary change. In principle, every possible tree topology must be run and scored on how parsimonious a distribution of the data it implies; once every potential tree has been scored, the tree producing the most parsimonious distribution is selected. This requires a great deal of computational power, limiting the number of samples that can be analysed in this way. Heuristic methods have been developed to overcome this challenge. These come with a trade-off, though: the most parsimonious tree is not guaranteed when a hill-climbing algorithm is adopted.
Further issues with this method include its tendency to underestimate evolutionary change, since homoplasies are likely to be overlooked in order to produce the most parsimonious tree. A second problem is that the method has been reported to be statistically inconsistent: under some conditions there is no guarantee that the true evolutionary tree will be recovered, even with sufficient data available. Maximum parsimony does offer value for intraspecific phylogenies, where relatively small numbers of taxa with relatively recent evolutionary divergence histories reduce the potential for long-branch attraction. Long-branch attraction is likely to occur when there is a high level of divergence between sequences, or when rates of evolution vary between sequences.
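To make "fewest steps" concrete, here is a sketch of Fitch's small-parsimony algorithm, which counts the minimum number of character changes a single alignment column requires on one fixed tree. Scoring every candidate topology this way, summed over columns, is what the full parsimony search does. The tree shape and observed bases are hypothetical.

```python
# Fitch's small-parsimony algorithm: minimum number of character changes
# needed to explain one alignment column on a fixed tree. The tree is a
# nested tuple; leaves are observed bases. (Illustrative sketch only.)

def fitch(tree):
    """Return (possible_states, min_changes) for a nested-tuple tree."""
    if isinstance(tree, str):              # leaf: the observed base
        return {tree}, 0
    (ls, lc), (rs, rc) = fitch(tree[0]), fitch(tree[1])
    common = ls & rs
    if common:                             # intersection: no change needed here
        return common, lc + rc
    return ls | rs, lc + rc + 1            # union: one substitution required

# One site for five taxa on the tree (((A,B),C),(D,E)), observed bases below:
tree = ((("C", "C"), "T"), ("T", "T"))
states, changes = fitch(tree)
print(changes)  # → 1 (a single C/T substitution explains this column)
```

A topology that grouped the two C-bearing taxa far apart would score worse at this column, which is exactly how parsimony discriminates between trees.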

Maximum-likelihood (most probable tree wins)

Maximum likelihood is used to estimate some unknown descriptor of a probability model. In phylogenetic analyses there may be many parameters to address in a model, and maximum likelihood selects the parameter values that result in the maximum probability of the observed data. This results in an evolutionary tree that makes the observed data most probable. Following its introduction to phylogenetics, maximum likelihood underwent steady development: the computational challenges associated with the method were overcome and the models became more biologically realistic, until a general maximum likelihood approach was produced. While sequence data are the most practical for maximum likelihood analysis, since the rate of genetic divergence is associated with nucleotide changes, restriction site data have also been used where an appropriate model was available.
Because maximum likelihood methods can accommodate differences in evolutionary rate between sites and lineages, they are well suited to the analysis of distantly related taxa. However, sample size is often limited by computational power, as the method evaluates combinations of tree topology and branch length. This limitation has, however, been addressed by pruning algorithms, which avoid processing all the data independently by calculating the likelihood of subtrees. The advantages of using maximum likelihood to produce a phylogenetic tree are that the statistics behind the selection of the most likely tree topology are well understood, the explicit model of evolution can be made to fit the data, and branch lengths are better accounted for, resulting in more realistic branch lengths that reflect evolutionary rates.
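A stripped-down illustration of the likelihood principle, using the simplest possible case: two sequences under the Jukes-Cantor model, where the probability that a site differs is a known function of the evolutionary distance t. The data (10 differing sites out of 100) are made up for the sketch; a grid search over t recovers the same estimate as the closed-form Jukes-Cantor distance.

```python
# Maximum likelihood in miniature: under Jukes-Cantor, the probability that
# a site differs after distance t is p(t) = 3/4 * (1 - exp(-4t/3)). We pick
# the t that maximises the likelihood of the observed (hypothetical) data.
import math

diff, total = 10, 100            # hypothetical: 10 differing sites out of 100

def log_likelihood(t):
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))   # P(site differs | t)
    return diff * math.log(p) + (total - diff) * math.log(1.0 - p)

# Grid search over candidate branch lengths.
t_hat = max((t / 10000.0 for t in range(1, 10000)), key=log_likelihood)

# Closed-form Jukes-Cantor estimate for comparison.
t_closed = -0.75 * math.log(1.0 - 4.0 * (diff / total) / 3.0)
print(round(t_hat, 3), round(t_closed, 3))  # → 0.107 0.107
```

Real phylogenetic software does the same thing in many dimensions at once, searching over topologies and all branch lengths simultaneously rather than a single t.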

Bayesian inference (most visited tree wins)

Bayesian inference of phylogeny applies a likelihood function to determine the posterior probability of trees: that is, the probability of a tree after new evidence or background information has been taken into account. Initially, all possible trees are considered equally probable, and the likelihood of each is calculated under a Markov model of character evolution. Doing this exactly would involve evaluating every potential tree and, for each tree, integrating over every combination of branch length and model parameter, which would be exceptionally computationally taxing. Fortunately, numerical methods such as Markov chain Monte Carlo (MCMC) have allowed Bayesian inference to be applied to determining evolutionary tree topologies. This involves two steps. First, a new tree is proposed through a stochastic modification of the existing tree. This new tree is then accepted or rejected based on its probability; if accepted, it is subjected to further modifications and retested. If the Markov chain is correctly constructed and run, the number of times a specific tree is visited is proportional to its posterior probability.
Bayesian inference of phylogeny offers the opportunity to analyse large data sets and produce tree topologies correlated to an evolutionary model of your choice, allowing selection of the model that best fits the data. It is, however, important to select the correct model, as it has been shown that oversimplified models are likely to produce inflated posterior probability values. Furthermore, posterior probability values tend to be higher than bootstrap values calculated for maximum likelihood and parsimony phylogenies.
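The propose-accept-revisit loop can be sketched with a toy Metropolis sampler. Instead of tree topologies it samples a single branch length t, using a Jukes-Cantor likelihood for hypothetical data (10 differing sites out of 100) and a flat prior; the chain visits values of t in proportion to their posterior probability, so the sample mean approximates the posterior mean. This is a sketch of the MCMC idea, not of any particular phylogenetics package.

```python
# "Most visited wins" in miniature: a Metropolis sampler over one branch
# length t, with a flat prior and a Jukes-Cantor likelihood for made-up data.
import math
import random

diff, total = 10, 100            # hypothetical: 10 differing sites out of 100

def log_posterior(t):
    if not 0.0 < t < 5.0:                        # flat prior on (0, 5)
        return float("-inf")
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))  # P(site differs | t)
    return diff * math.log(p) + (total - diff) * math.log(1.0 - p)

random.seed(1)
t, samples = 0.5, []
for step in range(20000):
    prop = t + random.gauss(0.0, 0.05)           # stochastic modification
    # Metropolis rule: always accept uphill moves, sometimes downhill ones.
    if math.log(random.random()) < log_posterior(prop) - log_posterior(t):
        t = prop
    if step >= 2000:                             # discard burn-in
        samples.append(t)

print(round(sum(samples) / len(samples), 2))     # posterior mean of t
```

In real Bayesian phylogenetics the "stochastic modification" also rearranges the topology itself, but the accept/reject logic is the same.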

So there you have it, the trees of life. 

Pretty straightforward, right? But what happens when using a tree doesn't make sense, like when a new species results from hybridization? No two branches of a real tree grow into each other and then split again into three or maybe even five new branches, yet this is the case for some species' phylogenies. In my next post I will discuss gene networks, a solution to these issues of reticulate evolution.