HyDe Data Files

The main data files for running HyDe are (1) the DNA sequences and (2) the map of sampled individuals to their respective taxa/populations. Examples of these files can be found in the test/ and examples/ directories within the main HyDe folder. Below we describe each file in more detail. If you installed HyDe from PyPI and do not have a clone of the GitHub repository, these files can be viewed in the repository on GitHub.

Sequence Data

The DNA sequence data should be in sequential Phylip format, with each individual's name and its sequence data on a single line, separated by a tab. In previous versions of HyDe (<= 0.3.1), the header information that is typically present in a Phylip file (# individuals and # sites) was not included; files without this header are still supported for backwards compatibility. Individual names do not have length restrictions (within reason) and do not all need to be the same number of characters.

Example: data.txt

<NumInd>  <NumSites>
Ind1  AGATTGAGCTAGCAGACGTGACAGACAGATGACAGTGACGA...
Ind2  AGATTGAGCTAGCAGACGTGACAGACAGATGACAGTGACGA...
Ind3  AGATTGAGCTAGCAGACGTGACAGACAGATGACAGTGACGA...
.
.
.
Ind23 AGATTGAGCTAGCAGACGTGACAGACAGATGACAGTGACGA...
Ind24 AGATTGAGCTAGCAGACGTGACAGACAGATGACAGTGACGA...
Ind25 AGATTGAGCTAGCAGACGTGACAGACAGATGACAGTGACGA...
.
.
.
Ind(N-2)  AGATTGAGCTAGCAGACGTGACAGACAGATGACAGTGACGA...
Ind(N-1)  AGATTGAGCTAGCAGACGTGACAGACAGATGACAGTGACGA...
IndN      AGATTGAGCTAGCAGACGTGACAGACAGATGACAGTGACGA...
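
The layout above can be checked before running an analysis. The following is a minimal sketch in Python, not part of HyDe itself: the function name read_hyde_sequences is hypothetical, and it assumes that a first line consisting of exactly two integers is the optional Phylip header. It splits each row on the first run of whitespace, so it accepts a tab as well as spaces.

```python
def read_hyde_sequences(lines):
    """Return a list of (name, sequence) pairs from sequential Phylip-style lines.

    Hypothetical helper, not part of HyDe. Assumes a first non-blank line
    made of two integers is the optional <NumInd> <NumSites> header.
    """
    records = []
    expected = None  # (number of individuals, number of sites) if a header is present
    first = True
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split(None, 1)  # name, then everything after the first whitespace run
        if first:
            first = False
            if len(fields) == 2 and fields[0].isdigit() and fields[1].strip().isdigit():
                expected = (int(fields[0]), int(fields[1]))
                continue
        if len(fields) != 2:
            raise ValueError(f"expected a name and a sequence, got: {line!r}")
        name, seq = fields[0], fields[1].strip()
        records.append((name, seq))

    # All sequences must have the same number of sites.
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError("sequences differ in length")

    # If a header was given, the data must agree with it.
    if expected is not None:
        n_ind, n_sites = expected
        if len(records) != n_ind or (lengths and lengths != {n_sites}):
            raise ValueError("header does not match the data")
    return records
```

With a file on disk, this could be called as read_hyde_sequences(open("data.txt")), which returns the individuals in file order for comparison against the taxon map.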

Taxon Map

The taxon map file is organized similarly to the sequence data file with one individual per row and a tab separating the individual’s name from the name of the taxon/population it belongs to. The individuals should be in the same order as the DNA sequence data file with all individuals in a particular taxon grouped together sequentially.

Example: map.txt

Ind1  Pop1
Ind2  Pop1
Ind3  Pop1
.
.
.
Ind23 Pop3
Ind24 Pop3
Ind25 Pop3
.
.
.
Ind(N-2)  PopM
Ind(N-1)  PopM
IndN  PopM
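
The two requirements on the taxon map (same individual order as the sequence file, and each taxon's individuals grouped together) can be verified with a short script. This is an illustrative sketch only: read_taxon_map and check_taxon_map are hypothetical names, not HyDe functions, and sequence_names is assumed to be the list of individual names in the order they appear in the sequence file.

```python
def read_taxon_map(lines):
    """Return a list of (individual, taxon) pairs, one per non-blank line."""
    pairs = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"expected 'individual<TAB>taxon', got: {line!r}")
        pairs.append((fields[0], fields[1]))
    return pairs


def check_taxon_map(pairs, sequence_names):
    """Raise ValueError if the map breaks either ordering requirement."""
    # Requirement 1: individuals appear in the same order as in the sequence file.
    map_names = [ind for ind, _ in pairs]
    if map_names != list(sequence_names):
        raise ValueError("individuals are not in the same order as the sequence file")

    # Requirement 2: all individuals of a taxon form one contiguous run.
    finished = set()
    previous = None
    for _, taxon in pairs:
        if taxon != previous:
            if taxon in finished:
                raise ValueError(f"taxon {taxon!r} is not grouped together")
            finished.add(taxon)
            previous = taxon
    return True
```

Running this check first gives a clear error message for a misordered map, rather than leaving the problem to surface later in the analysis.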

Triples File

Each of the Python scripts that are part of HyDe takes a file of triples that are to be tested for hybridization. The triples file is a three-column table in which each row is a triple to be tested. Column one is parent one, column two is the putative hybrid, and column three is parent two. You can name the populations anything you like as long as the names do not contain spaces.

Pop1  Pop2  Pop3
Pop1  Pop3  Pop4
Pop2  Pop4  Pop5
.
.
.
Pop4  Pop5  PopM

We have also written the scripts to take previous output files from HyDe as input. They do this by ignoring the first-row header and using the triples specified in the rows that follow.
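
To illustrate the triples format, the sketch below reads a triples file into (parent one, hybrid, parent two) tuples. It is not HyDe's own parser: the function name read_triples and the skip_header flag are hypothetical, and it assumes that in a previous output file the first three columns of each row hold the triple and any further columns can be ignored.

```python
def read_triples(lines, skip_header=False):
    """Return a list of (parent1, hybrid, parent2) tuples.

    Hypothetical helper, not part of HyDe. Set skip_header=True when the
    input is a previous output file whose first row is a header; this
    assumes the first three columns of each later row are the triple.
    """
    triples = []
    rows = (ln.split() for ln in lines if ln.strip())  # drop blank lines
    for i, fields in enumerate(rows):
        if skip_header and i == 0:
            continue  # ignore the first-row header
        if len(fields) < 3:
            raise ValueError(f"expected at least three columns, got: {fields!r}")
        parent1, hybrid, parent2 = fields[:3]
        triples.append((parent1, hybrid, parent2))
    return triples
```

For a plain triples file, call read_triples(open("triples.txt")) (the file name here is only an example); for a previous output file, pass skip_header=True.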