Hooty: a fast DNA distance matrix tool with innovative handling of ambiguous sites

Hooty computes the minimum and maximum Kimura 2-parameter (K2P) distance between predetermined groups of sequences in an alignment and writes the results as a matrix. It also includes options that control how ambiguous bases are interpreted.
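For readers unfamiliar with K2P, the distance between two aligned sequences is d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P is the proportion of transitions and Q the proportion of transversions. The Python sketch below illustrates the formula only; it is not Hooty's implementation. The function name k2p_distance is hypothetical, and skipping every site that is not A, C, G or T is an assumption of this sketch.

```python
import math

# A change within a class is a transition, between classes a transversion.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences.

    Assumption of this sketch: sites where either base is not
    A, C, G or T (gaps, ambiguity codes) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")
    if transitions == 0 and transversions == 0:
        return 0.0

    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    # math.log raises ValueError when the sequences are too divergent
    # for the formula to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

With one transition in ten compared sites, for example, P = 0.1 and Q = 0, so the distance is -1/2 * ln(0.8), roughly 0.112.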

If you are looking for the old Python version, click here.

INSTALLATION

Hooty is distributed as a Rust crate, so you need to install Rust first. Then install Hooty by running the following command in a terminal:

cargo install hooty

After installation, run Hooty by typing hooty in your terminal (see USAGE and OPTIONS below for the required input files and the available options):

hooty <options>

Alternatively, you can clone the repository

git clone https://github.com/Princic-1837592/Hooty.git

and then run Hooty using cargo with the following command from inside the cloned folder:

cargo run --release -- <options>

EXAMPLES

Some examples can be found in the examples folder of the repository.

USAGE

Hooty requires two input files: a FASTA file containing the sequences to be analyzed and a partition file containing the groups of sequences.

hooty <fasta file> <partition file> [OPTIONS]
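To make the relationship between the two inputs concrete, the following Python sketch shows one way a per-group minimum and maximum could be derived once sequences are assigned to groups. It is an illustration under stated assumptions, not Hooty's code: the function group_min_max and its arguments are hypothetical, only pairs of distinct groups are reported, every group is assumed to be non-empty, and the distance function is passed in by the caller.

```python
def group_min_max(groups, distance):
    """Summarise pairwise distances between groups.

    groups:   dict mapping a group label to a list of aligned sequences
    distance: function taking two sequences and returning a number
    Returns a dict mapping (label_1, label_2) to (minimum, maximum).
    """
    labels = list(groups)
    result = {}
    for i, first in enumerate(labels):
        for second in labels[i + 1:]:
            # Every sequence of one group against every sequence of the other.
            values = [
                distance(a, b)
                for a in groups[first]
                for b in groups[second]
            ]
            result[(first, second)] = (min(values), max(values))
    return result
```

Any pairwise measure can be plugged in as distance; a K2P function would give the kind of values Hooty reports.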

INPUT FILES

<fasta file> a file containing the sequences to be analyzed. All sequences must have the same length. Each sequence name should include one of the names listed in the partition file (for example, COD-001_species01 refers to species01).

>COD-001_species01                                             ├── sequence name
CACACTCTATTTAATTTTTGGTATTTGAGCCGGGCTAGTCGGAACCGGACTCRGCCTACT ──┐
AATCCGAGCAGAACTTAGTCAGCCCGGAACTTTACTAGGGGACGATCAACTATATAATGT   ├── DNA sequence
GGTTGTCACTGCTCACGCATTTATCATAATTTTCTTCTTAGTAATACCGTTAATAATTGG   │
CGGATTCGGAAATTGATTAGTCCCATTAATATTAGGTGCTCCAGATATAGCCTTTCCTCG   │
AATAAACAACATAAGATTTTGATTACTACCTCCATCACTAACACTACTATTAACCTCAGC ──┘
>COD-002_species01                                             ├── sequence name
CACACTCTATTTAATTTTTGGTATTTGAGCCGGGCTAGTCGGAACCGGACTCAGCCTACT ──┐
AATCCGAGCAGAACTTAGTCAGCCCGGAACTTTACTAGGGGACGATCAACTATATAATGT   ├── DNA sequence
GGTTGTCACTGCTCACGCATTTATCATAATTTTCTTCTTASTAATACCGTTAATAATTGG   │
CGRATTCGGAAATTGATTAGTCCCATTAATATTAGGTGCTCCAGATATAGCCTTTCCTCG   │
AATAAACAACATAAGATTTTGATTACTACCTCCATCACTAACACTACTATTAACCTCAGC ──┘
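The two requirements above (named records, equal lengths) can be checked before running an analysis. The Python sketch below is a minimal, hypothetical pre-flight check, not Hooty's parser: the names parse_fasta and alignment_length are illustrative, and dropping anything after the first whitespace on a header line is an assumption of this sketch.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {sequence name: sequence}.

    Sequences split over several lines are joined into one string.
    """
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without a name")
            name = fields[0]
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first header")
        else:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def alignment_length(records):
    """Return the common sequence length, or raise if lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"expected one common length, found {sorted(lengths)}")
    return lengths.pop()
```

Calling alignment_length(parse_fasta(open(path).read())) on an input file either returns the alignment length or raises a ValueError describing the problem.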

<partition file> a text file containing one group per line. Please pay attention to the following:

A name can appear in only one group.
Name matching is case-sensitive (species01 is different from Species01).
A name should not be a substring of another (having both species1 and species11 could result in an error).

species01
species02
species03
species04

Multiple names for the same group must be on the same line, separated by commas.

species01
species02a, species02b
species03a, species03b, species03c
species04
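The rules above can be verified before running Hooty. The Python sketch below is a hypothetical validator, not part of Hooty: the function names are illustrative, and matching a sequence name to a group by case-sensitive substring search is an assumption inferred from the rules and the FASTA example, not a documented behavior.

```python
def parse_partition(text):
    """Return a list of groups; each group is a list of names."""
    groups = []
    for line in text.splitlines():
        names = [part.strip() for part in line.split(",") if part.strip()]
        if names:
            groups.append(names)
    return groups


def validate_partition(groups):
    """Raise ValueError if a name is repeated or contained in another."""
    names = [name for group in groups for name in group]
    if len(names) != len(set(names)):
        raise ValueError("a name appears more than once")
    for a in names:
        for b in names:
            if a != b and a in b:
                raise ValueError(f"{a!r} is a substring of {b!r}")


def group_of(sequence_name, groups):
    """Index of the group whose name occurs in sequence_name, else None.

    Assumption of this sketch: case-sensitive substring matching.
    """
    for index, group in enumerate(groups):
        if any(name in sequence_name for name in group):
            return index
    return None
```

For the second example file above, group_of("COD-001_species01", groups) returns 0, while a sequence named COD-001_Species01 would match no group because of the capital S.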

OPTIONS

-h | --help print the list of commands and usage
-f | --full-matrix <string> write a file with the pairwise distances between all sequences
-o | --output <string> name of the output file. By default, the output file is placed in the same folder as the input, with the same name but the extension .csv
-s | --separator <x> separator to use in the output file. x can be tab, comma, or semicolon (default)
-t | --threshold <number> maximum fraction of ambiguous bases allowed within a group, given as a value between 0 and 1. Default is 0.0
-u | --unambiguous always treat ambiguous sites as similarities. When this option is active, the threshold option is ignored
-V | --version print the installed version of Hooty
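The threshold and unambiguous options both concern IUPAC ambiguity codes such as R (A or G) and S (C or G), which appear in the FASTA example above. The Python sketch below shows the two ingredients involved: measuring the fraction of ambiguous bases in a sequence, and deciding whether two bases could represent the same nucleotide. It is an illustration only. How Hooty combines these with the threshold is not described here, and the function names are hypothetical.

```python
# Standard IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def is_ambiguous(base):
    """True if the code stands for more than one nucleotide."""
    return len(IUPAC.get(base.upper(), "")) > 1


def ambiguous_fraction(sequence):
    """Fraction (0 to 1) of positions holding an ambiguity code."""
    if not sequence:
        return 0.0
    return sum(is_ambiguous(base) for base in sequence) / len(sequence)


def compatible(a, b):
    """True if the two codes share at least one possible nucleotide.

    Characters outside the table (such as gaps) match nothing here,
    which is a choice made for this sketch.
    """
    first = set(IUPAC.get(a.upper(), ""))
    second = set(IUPAC.get(b.upper(), ""))
    return bool(first & second)
```

In the FASTA example, the R in COD-001_species01 sits opposite an A in COD-002_species01, so compatible("R", "A") is True and the site can be read as a similarity rather than a difference.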

CONTRIBUTION AND BUG REPORT

If you want to contribute to the project or report a bug, you can open a pull request or an issue on the GitHub repository.

AUTHORS

Written and developed by Andrea Princic and Giacomo Chiappa, Sapienza University of Rome, Italy.