Supplementary Materials for the manuscript "Gene markers for exon capture and phylogenomics in ray-finned fishes"
Basic information on the 4,434 loci
Target sequences of the 4,434 loci for all eight model fishes
These sequences are needed if you want to design baits from the target sequences, or to design your own baits by incorporating a new genome or transcriptome.
Pipeline and scripts for bait design in ray-finned fishes
The above link has pipelines and scripts for retrieving target sequences of the 4,434 loci when new genome sequences or transcriptomes are provided.
Pipeline and scripts for bait refinement
The above link has pipelines and scripts for: I. merging data from different projects; II. selecting loci with less missing data and high phylogenetic decisiveness; III. finding and masking regions with extraordinarily high read depth for bait redesign.
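Steps II and III can be illustrated with a minimal Python sketch. It is not the code behind the link above; the function names, the `max_missing` threshold, and the `fold` cutoff relative to median depth are hypothetical choices made only for this example, and phylogenetic decisiveness is not modelled here.

```python
from statistics import median


def select_loci(presence, max_missing=0.25):
    """Keep loci whose fraction of missing samples is at most max_missing.

    presence maps locus name -> {sample name: True if the locus was recovered}.
    The 0.25 default is an arbitrary illustration, not a recommended value.
    """
    kept = []
    for locus, samples in presence.items():
        if not samples:
            continue  # no samples recorded for this locus, nothing to assess
        missing = sum(1 for ok in samples.values() if not ok) / len(samples)
        if missing <= max_missing:
            kept.append(locus)
    return kept


def high_depth_regions(depths, fold=5.0):
    """Return half-open (start, end) intervals where depth > fold * median depth."""
    cutoff = fold * median(depths)
    regions, start = [], None
    for i, depth in enumerate(depths):
        if depth > cutoff and start is None:
            start = i  # a high-depth run begins here
        elif depth <= cutoff and start is not None:
            regions.append((start, i))  # the run ended just before position i
            start = None
    if start is not None:
        regions.append((start, len(depths)))
    return regions


def mask_sequence(seq, regions):
    """Replace the bases inside each (start, end) interval with N."""
    chars = list(seq)
    for start, end in regions:
        chars[start:end] = "N" * (end - start)
    return "".join(chars)
```

In this sketch, a target whose per-base depths are `[10, 10, 10, 100, 100, 10, 10, 10]` has a median of 10, so positions 3 and 4 exceed the fivefold cutoff and would be masked before bait redesign.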
EvolMarkers
EvolMarkers is a database based on genome
comparison to find conserved single-copy exon (CDS) and intron (EPIC) markers
for phylogenetic and population studies (Li et al., 2010; Li et al., 2007).
Unfortunately, we currently lack the resources to maintain the web server. If
you are interested in searching for useful markers, you can download the
scripts and run them on your own computer.
Scripts for read assembly
The above link has pipelines and scripts for
assembling Illumina sequencing reads into contigs and outputting aligned sequences
for subsequent data analyses. Further introductory and tutorial materials can be
found under Learning.
Finding target loci for gene capture
Perl scripts for finding single-copy loci that are conserved among the species of interest and can be used in gene capture.
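The single-copy criterion can be sketched in Python as follows. This is an illustration only, not a port of the Perl scripts; the function name and the input layout (a count of genome hits per locus per species, for example from a similarity search) are assumptions made for the example.

```python
def single_copy_loci(hit_counts, species):
    """Return loci that have exactly one hit in every listed species.

    hit_counts maps locus name -> {species name: number of genome hits}.
    A species absent from a locus entry is treated as having zero hits.
    """
    return sorted(
        locus
        for locus, counts in hit_counts.items()
        if all(counts.get(sp, 0) == 1 for sp in species)
    )
```

A locus with two hits in any species (a possible paralog) or no hit in some species (not conserved across the set) is excluded under this rule.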
Misc. Perl Scripts
Target markers for tapeworms
We identified 3,641 single-copy nuclear coding loci
by comparing the genomes of Hymenolepis microstoma, Echinococcus granulosus, and Taenia solium.
We designed RNA baits based on the sequence of H. microstoma, and applied target
enrichment and Illumina sequencing to test the utility of those baits to
recover loci useful for phylogenetic analyses. We captured DNA from five
species of tapeworms representing two families of cyclophyllideans. We obtained
an average of 3,284 (90%) of the targets from the test samples and then used
captured sequences (2,181,361 bp in total; fragment size ranging from 301 bp to
6,969 bp) to reconstruct a phylogeny for the five test species plus the three
species for which genomic data are available.