Villegas-Ramirez, Berenice [1], Yant, Levi [1], Xi, Zhenxiang [1], Webb, Campbell [2], Mathews, Sarah [3].

High quality sequence data from poor quality herbarium DNAs through hybridization-based target enrichment.

We sought to obtain a fully resolved phylogeny of Alangium Lam. using DNAs from herbarium specimens only. To do so, we developed protocols for hybridization-based target enrichment and sampled 255 regions in the nuclear genomes of 19 of the 25 accepted species, of some subspecies, and of related genera. To design hybridization probes, we identified broadly conserved regions in transcriptomes from five genera across Cornales and then filtered the resulting sequence clusters by similarity and length. We sampled 74 herbarium specimens ranging in age from 8 to 97 years. DNAs were highly degraded, with fragment sizes mostly below 1 kb and concentrations ranging from 0 to 115 ng/µL. DNA libraries were successfully prepared from low-concentration samples (as low as 13.9 to 20 ng/µL). Up to 11 of these libraries were pooled for hybridization with 80-bp baits, and enriched pools were further pooled for sequencing. The pooled samples were spiked into a paired-end 150-bp lane of whole-genome libraries from an unrelated project, at a dilution of 1 part hybridization pool to 1000 parts genome library. We are testing a range of spiking dilutions from 1:40 to 1:2000. In an initial, low-coverage run, the proportion of missing data per individual ranged from 5% to 92%. In a matrix of the 9 taxa with the best coverage (0-20% missing data compared to the reference), 2184 of the 208,524 nucleotide sites are parsimony informative. Two of the oldest specimens (from 1933) have less than 20% missing data. Hybridization-based target enrichment is an efficient and rapid approach for sampling a large number of loci (255 in this case) from degraded DNA. Potentially high costs are avoided by pooling libraries for enrichment and combining hybridization pools for sequencing, and are further reduced by opportunistic sequencing (spiking whole-genome libraries with small aliquots of our enriched libraries).
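The two summary statistics reported above (the proportion of missing data per individual and the number of parsimony-informative sites) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study. The function names, the toy alignment, and the taxon labels are hypothetical, and the sketch assumes that gaps, '?' and 'N' all count as missing data and that a site is parsimony informative when at least two character states each occur in at least two taxa.

```python
from collections import Counter

# Assumption: gaps, '?' and 'N' are all treated as missing data.
MISSING = set("-?N")

def missing_fraction(seq):
    """Fraction of sites in one aligned sequence that are missing."""
    seq = seq.upper()
    return sum(base in MISSING for base in seq) / len(seq)

def is_parsimony_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(b for b in column.upper() if b not in MISSING)
    return sum(n >= 2 for n in counts.values()) >= 2

def summarize(alignment, max_missing=0.20):
    """Keep taxa at or below max_missing, then count informative sites."""
    kept = {name: seq for name, seq in alignment.items()
            if missing_fraction(seq) <= max_missing}
    columns = zip(*kept.values())
    informative = sum(is_parsimony_informative("".join(col)) for col in columns)
    return sorted(kept), informative

# Toy alignment with hypothetical taxon names (not data from the study).
toy = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGTAC",
    "taxon_C": "ATGTACGAAC",
    "taxon_D": "ATGTAC-AAC",
    "taxon_E": "A???????AC",
}
kept, informative = summarize(toy)
print(kept, informative)
```

In the toy alignment, taxon_E is excluded because 70% of its sites are missing, and two of the ten remaining columns are parsimony informative. For scale, the 2184 informative sites out of 208,524 reported above amount to roughly 1% of the matrix.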

1 - Harvard University, Department of Organismic and Evolutionary Biology, Cambridge, MA, 02138, USA
2 - Harvard University, Arnold Arboretum, Boston, MA, 02131, USA
3 - CSIRO National Research Collections Australia, The Australian National Herbarium, Canberra, ACT, 2601, Australia

species phylogeny
herbarium DNA
sequence capture

Date: Wednesday, July 29th, 2015
