diff --git a/README.md b/README.md
index 67bfe14..e055b5f 100644
--- a/README.md
+++ b/README.md
@@ -130,23 +130,24 @@ When the script finishes, it creates `graph.pkl` in the
 specified output directory. `graph.pkl` is a Python pickle file representing the
 constructed gene matches graph.
 
-The script also creates Python pickles for the pairwise BLAST results. The BLAST
-results can be found in the `od2` subdirectory of the output directory.
+The script also stores HDF5 files (formerly Python pickles) for the pairwise
+BLAST results. The BLAST results can be found in the `od2` subdirectory of the
+output directory.
 
 ### Phase 2: Calculating distances
 
 The `filtered_distance.py` Python script may be used to compute distances or
 similarities from a gene matches graph. Basic usage of the command requires
-only that we provide the pickles for the gene matches graph and the pairwise
-BLAST results.
+only that we provide the pickles for the gene matches graph and the HDF5 files
+for the pairwise BLAST results.
 
 ```bash
-python filtered_distance.py -g GRAPH -c COMPARISONS_DIR/*.pkl
+python filtered_distance.py -g GRAPH -c COMPARISONS_DIR/*.h5
 ```
 
 In the above command, GRAPH should be the path to the `graph.pkl` created in the
 first phase, and COMPARISONS_DIR should be the directory that contains the BLAST
-result pickles. (This will be the `od2` subdirectory of the output directory
+result HDF5 files. (This will be the `od2` subdirectory of the output directory
 from Phase 1 if you used the `typical_filtering_step.sh` script.)
 
 The script outputs a genetic similarity matrix to standard output by default. To
diff --git a/docs/tutorials/reads2tree/README.md b/docs/tutorials/reads2tree/README.md
index 1d800ff..ea1c3af 100644
--- a/docs/tutorials/reads2tree/README.md
+++ b/docs/tutorials/reads2tree/README.md
@@ -264,10 +264,10 @@ ls "$TUTORIAL_DIR/rna_clique_out/graph.pkl"
 
 If you want a tree, you can create one using RNA-clique and Biopython.
 The code below, also found in `docs/tutorials/reads2tree/make_tree.py`, computes the
-distance matrix from the `graph.pkl` and `od2/*.pkl` files and constructs a tree
-using the neighbor-joining algorithm. The tree is also rooted at its
-midpoint. The tree is saved to `nj_tree.tree`, and a visualization is saved to
-`nj_tree.svg` in the `rna_clique_out` directory.
+distance matrix from the `graph.pkl` and `od2/*.h5` (or `od2/*.pkl`) files and
+constructs a tree using the neighbor-joining algorithm. The tree is also rooted
+at its midpoint. The tree is saved to `nj_tree.tree`, and a visualization is
+saved to `nj_tree.svg` in the `rna_clique_out` directory.
 
 ```python
 --8<-- "docs/tutorials/reads2tree/make_tree.py"
diff --git a/docs/usage.md b/docs/usage.md
index 72d35a6..b588259 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -10,11 +10,11 @@ This script builds the gene matches graph from gene matches tables.
 
 ### Options
 
-| Short name | Long name | Description                       | Default | Required |
-|------------|-----------|-----------------------------------|---------|----------|
-| `-h`       | `--help`  | Print a help message and exit.    |         | No       |
-| `-i`       |           | Gene matches table pickles.       |         | Yes      |
-| `-o`       |           | Output gene matches graph pickle. |         | Yes      |
+| Short name | Long name | Description                              | Default | Required |
+|------------|-----------|------------------------------------------|---------|----------|
+| `-h`       | `--help`  | Print a help message and exit.           |         | No       |
+| `-i`       |           | Gene matches table HDF5 or pickle files. |         | Yes      |
+| `-o`       |           | Output gene matches graph pickle.        |         | Yes      |
 
 ## do\_filtering\_step.sh
 