Wednesday, 7 June 2017

The joys of receiving DNA sequences.

by Nicholas Galuszynski

The struggle is real! Image courtesy Sketching Science.
So you've done it!
After hours (that felt like years) battling with DNA extraction methods and PCR protocols you have finally received the sacred e-mail, "please see attached sequences..." and it all starts seeming worthwhile, maybe.
But now what? You've done the lab work, but what are you supposed to do with these .ab1 files? What are .ab1 files anyway? I WANT MY SEQUENCES!!!

So firstly, .ab1 files are the output format for raw DNA data produced by Applied Biosystems' Sequencing Analysis Software. These files contain an electropherogram as well as the DNA base sequence, which can be viewed using a DNA viewer program. Secondly, there are many ways of opening these files. However, freely available software options for handling your .ab1 files normally have limited capabilities compared to their expensive counterparts, such as CodonCode Aligner.

So I've recently been using the trial version of Codon to check out some Cyclopia sequences, and damn, this program is easy to use once you know where the different functions are and what they do. After checking out a few tutorial videos on YouTube, I was able to do the basics, such as importing my sequences, creating contigs based on names, and visually inspecting and correcting these contigs. If you are starting out with Codon, watch those videos! From that base you can start troubleshooting and finding your own ways to do things, which is made much easier by the wide range of tutorials available from the manufacturers.

Oddly enough, one of the most useful features of the software, comparing all the contigs you've assembled, needs to be added to the toolbar. But this is easily explained by CodonCode. Before you get too into creating contigs and aligning them all to each other, it is often really useful to run a base call on your freshly received sequences. This can improve the quality of your sequences, resulting in better alignment of the forward and reverse sequences. I tended to have a lot of strange indels pop up in the sequences for a cDNA region, possibly due to sequencing errors in highly repetitive A and T rich sections. These made contig comparisons difficult, and I ended up selecting the best quality sequence as a reference sequence (make a reference seq by selecting the sample and pressing Ctrl+Alt+M). I then selected the Clustal-Omega method of aligning the contigs, as apparently this option handles these gaps a bit better than Muscle, which tends to delete them. But maybe deleting them isn't the worst option if all you're interested in is detecting SNPs. And if that is your desire, you can then use Codon to find primer sites around those SNPs without even having to open another program. That is, however, only until the end of the 30 day trial, after which you'll have to buy a licence or, like, reinstall Windows or something.
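
If you export your aligned contigs and want to double check SNP positions outside of Codon, the basic idea of "detecting SNPs" can be sketched in a few lines of Python. This is only an illustration and not how CodonCode does it: the function name find_snp_columns and the toy sequences are made up, and it assumes the sequences are already aligned to the same length, with "-" for gaps and "N" for ambiguous calls.

```python
def find_snp_columns(aligned_seqs):
    """Return 0-based alignment columns where at least two different
    unambiguous bases (A, C, G, T) occur. Gaps and N are ignored."""
    lengths = {len(seq) for seq in aligned_seqs}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")

    snp_columns = []
    for col, column in enumerate(zip(*aligned_seqs)):
        # Keep only the real bases seen in this column.
        bases = {base.upper() for base in column} & set("ACGT")
        if len(bases) > 1:
            snp_columns.append(col)
    return snp_columns


# Toy alignment (hypothetical data): one indel column and two SNP columns.
alignment = [
    "ACGTTA-GA",
    "ACGCTAAGA",
    "ACGTTA-GG",
]
print(find_snp_columns(alignment))  # [3, 8]
```

Note that the indel in column 6 is skipped rather than reported, which mirrors the point above: if SNPs are all you care about, losing the gaps is not necessarily a problem.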

So CodonCode is a pretty useful tool for the molecular biologist, offering a lot of functionality in one neat package. This, however, comes at a price, so make the best use of your trial and check out the helpful links in this post before you activate your copy.

Monday, 5 June 2017

So maybe a web of life makes more sense?

by Nicholas Galuszynski

In my previous post I attempted to outline some of the methods used to build phylogenetic trees. While these are useful tools for explorers of the evolutionary history of life, they all have one fundamental problem. 
They fail to address reticulate evolutionary processes such as hybridization and horizontal gene transfer. While these processes may be rare for most animals, they play an important role in the formation of new plant species. Phylogenetic trees, therefore, evoke an oversimplified view of evolution, and non-tree-based topologies are needed to accurately express evolutionary histories. These non-tree-based topologies are generally found in the form of phylogenetic networks. Phylogenetic networks, like their branching counterparts (trees), incorporate either distance (much like neighbour-joining trees) or discrete (as with maximum likelihood and parsimony trees) data sets to produce a visual representation of the evolutionary relationships between taxa. As with trees, the branch lengths reflect the amount of evolutionary change between taxa, or genetic distance.
Ideally, a network display would consist of both tree-like and reticulated portions. The tree-like areas would represent sections of the phylogeny that have no conflict among characters. Areas with a lot of reticulation, on the other hand, would represent those portions of the phylogeny where there is either insufficient data to accurately construct a phylogeny or where there are conflicting character-state patterns. Thus, phylogenetic networks provide a visual cue as to the number of potential phylogenetic trees that can fit the data, with more reticulate networks having the greater number of potential trees due to increased conflicts within the data.
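
To make "conflicting character-state patterns" a little more concrete: for two binary characters, a standard check is that they can only sit on the same tree without any extra changes if the four state combinations 00, 01, 10 and 11 do not all occur among the taxa. Below is a rough sketch of that check in Python. The function name characters_conflict and the example characters are made up for illustration, and each character is assumed to be a string of "0" and "1" with one state per taxon.

```python
def characters_conflict(char_a, char_b):
    """Return True if two binary characters show all four state
    combinations (00, 01, 10, 11) across the taxa, meaning they
    cannot both fit one tree without extra changes."""
    if len(char_a) != len(char_b):
        raise ValueError("characters must be scored for the same taxa")
    if set(char_a) | set(char_b) > {"0", "1"}:
        raise ValueError("only binary states '0' and '1' are supported")
    combinations_seen = set(zip(char_a, char_b))
    return len(combinations_seen) == 4


# Hypothetical characters scored for four taxa (one state per taxon).
char_1 = "0011"
char_2 = "0101"
char_3 = "0001"

print(characters_conflict(char_1, char_2))  # True: reticulation expected
print(characters_conflict(char_1, char_3))  # False: tree-like
```

The more pairs of characters that conflict like char_1 and char_2, the more boxes and loops you would expect to see in the network.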

Neighbour-net (quality testing)

Neighbour-net phylogenetic networks construct split networks from distance-based data sets. Since it is a distance-based method, based partly on the neighbour-joining algorithm, it has the advantage of being faster than discrete-data-based methods. The method attempts to generalise the tree building technique of neighbour-joining by slowing the rate at which connections are made, in a similar manner to pyramid clustering. Further similarities between neighbour-net and neighbour-joining are that both are agglomerative algorithms, have similar selection criteria and are both considered to be consistent. The method is straightforward to apply, as there are few choices to be made other than the distance measure. The display of conflicts has also been reported to respond well to increased complexity, making it ideal for the analysis of complex and ambiguous phylogenies.
There are, however, a number of issues that have been raised regarding neighbour-net. Firstly, there has been criticism of its use in phylogenetics due to its lack of an obvious tree interpretation, an issue further complicated by the lack of informative theorems about neighbour-net and the need to understand T-theory. This has resulted in a sense that many of the interpretations of neighbour-nets have been affected by some degree of subjectivity. Furthermore, neighbour-net has been recognised as a greedy algorithm, meaning that it follows a heuristic construction path that selects for the shortest branch length at each step, finding the local minimum and not necessarily the global minimum. Despite these limitations, neighbour-net provides a powerful means to visually inspect conflicts between probable trees produced from large data sets that are prone to fall victim to systematic errors.

Neighbour-Net for AFLP data (Dice distances computed) for the genus Baldellia. Neighbour-Net plots can be quite overwhelming when large data sets are analysed, such as those associated with entire genera. Fortunately, labels make cluster recognition simple, although this can lead to subjective interpretations at times. Neighbour-Nets are normally interpreted in conjunction with additional analyses such as parsimony based networks or phylogenetic trees. Image from Arrigo et al., 2011.
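
Since the distance measure is the main choice you make with neighbour-net, here is a small sketch of the Dice distance mentioned in the caption, using the usual definition for band presence/absence data: one minus twice the number of shared bands divided by the total number of bands in the two profiles. This is not taken from Arrigo et al.; the function name dice_distance and the two AFLP profiles are made up, and returning 0.0 when neither sample has any bands is my own assumption.

```python
def dice_distance(profile_a, profile_b):
    """Dice distance between two band profiles scored as
    1 (band present) or 0 (band absent) at each locus."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    total_bands = sum(profile_a) + sum(profile_b)
    if total_bands == 0:
        # Assumption: two empty profiles are treated as identical.
        return 0.0
    return 1.0 - (2.0 * shared) / total_bands


# Hypothetical AFLP profiles for two samples scored at five loci.
sample_1 = [1, 1, 0, 1, 0]
sample_2 = [1, 0, 0, 1, 1]

# Two shared bands out of six bands in total: 1 - 4/6.
print(round(dice_distance(sample_1, sample_2), 3))  # 0.333
```

Shared absences (the 0/0 locus above) do not count towards similarity, which is the usual reason given for preferring this kind of measure with dominant markers such as AFLPs.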

Statistical parsimony (data displaying)

Statistical parsimony builds a network by sequentially connecting taxa in order of increasing character state differences until the parsimony connecting limit is reached. This is the limit up to which parsimony can be considered a reliable method for phylogenetic inference. The analysis is able to process both discrete sequence data and distance values. The method is straightforward, requires no parameter selection and displays evolutionary change in a similar manner to a parsimony tree: the distances between taxa reflect evolutionary change. Since it is based on parsimony, this method faces some of the same issues affecting parsimony tree construction, such as the inability to process large, complex data sets. Under these conditions of increased complexity, statistical parsimony disconnects the network, producing an array of separate networks rather than a single diagram, which makes the analysis more prone to false negatives. Furthermore, since the network is built sequentially, the most parsimonious connections are made at each step, possibly resulting in an incomplete picture of character conflicts due to a lack of indirect links, and in overall networks that do not achieve maximum parsimony.
The value of parsimony networks is that they provide additional insight into the phylogenetic history of the haplotype under investigation (inferring populations or species history) as well as the relative abundance of the haplotype.
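
The "connect in order of increasing differences until a limit is reached" idea can be sketched very roughly in Python. This is a heavily simplified illustration and not the actual statistical parsimony procedure: the limit is simply passed in as a number rather than calculated, links that would close a loop are skipped, no missing intermediate haplotypes are inferred, and the function names and haplotype sequences are made up. It assumes all sequences are aligned to the same length.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def connect_by_increasing_difference(haplotypes, limit):
    """Link haplotypes (dict of name -> aligned sequence) in order of
    increasing differences, never exceeding `limit` differences.
    Returns (edges, components); more than one component means the
    data fell apart into separate networks."""
    names = sorted(haplotypes)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the representative of this group.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    pairs = sorted(
        (count_differences(haplotypes[a], haplotypes[b]), a, b)
        for a, b in combinations(names, 2)
    )
    edges = []
    for diff, a, b in pairs:
        if diff > limit:
            break  # everything after this is even more different
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
            edges.append((a, b, diff))

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return edges, sorted(groups.values())


# Hypothetical haplotypes: A, B and C are close, D is very different.
haps = {"A": "AAAAAA", "B": "AAAAAT", "C": "AAAATT", "D": "GGGGGG"}
edges, components = connect_by_increasing_difference(haps, limit=2)
print(edges)       # [('A', 'B', 1), ('B', 'C', 1)]
print(components)  # [['A', 'B', 'C'], ['D']]
```

Haplotype D ends up on its own because it is further than the limit from everything else, which is the same "array of separate networks" behaviour described above.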

Parsimony network representing the genealogical relationships of haplotypes within the Little Karoo endemic Berkheya cuneata (Asteraceae). The nodes (small dots) indicate sequence changes between haplotypes A-E, and the size of the coloured circles (haplotypes) indicates the relative frequency of these haplotypes occurring within the samples. The large circles are the most common haplotypes and are therefore generally considered to be older. Outgroup species included B. fruticosa (a), B. coriacea and B. spinosa. Image taken from Potts et al., 2013.

Relevant literature:

Arrigo, N., Buerki, S., Sarr, A., Guadagnuolo, R., Kozlowski, G., 2011. Phylogenetics and phylogeography of the monocot genus Baldellia (Alismataceae): Mediterranean refugia, suture zones and implications for conservation. Molecular Phylogenetics and Evolution 58, 33–42.

Huson, D.H., Bryant, D., 2006. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23, 254–267.

Levy, D., Pachter, L., 2011. The neighbor-net algorithm. Advances in Applied Mathematics 47, 240–258.

Moret, B.M.E., Nakhleh, L., Warnow, T., Linder, C.R., Tholse, A., Padolina, A., Sun, J., Timme, R., 2004. Phylogenetic networks: Modeling, reconstructibility, and accuracy. IEEE/ACM Transactions on Computational Biology and Bioinformatics 1, 13–23.

Morrison, D.A., 2005. Networks in phylogenetic analysis: New tools for population biology. International Journal for Parasitology 35, 567–582.

Potts, A.J., Hedderson, T.A., Vlok, J.H.J., Cowling, R.M., 2013. Pleistocene range dynamics in the eastern Greater Cape Floristic Region: A case study of the Little Karoo endemic Berkheya cuneata (Asteraceae). South African Journal of Botany 88, 401–413.