Update documentation for change to HDF5 format.
actapia committed Dec 5, 2024
1 parent c0a79a2 commit 062e7e6
Showing 3 changed files with 16 additions and 15 deletions.
13 changes: 7 additions & 6 deletions README.md
@@ -130,23 +130,24 @@ When the script finishes, it creates `graph.pkl` in the specified output
directory. `graph.pkl` is a Python pickle file representing the constructed
gene matches graph.

-The script also creates Python pickles for the pairwise BLAST results. The BLAST
-results can be found in the `od2` subdirectory of the output directory.
+The script also stores HDF5 files (formerly Python pickles) for the pairwise
+BLAST results. The BLAST results can be found in the `od2` subdirectory of the
+output directory.

### Phase 2: Calculating distances

The `filtered_distance.py` Python script may be used to compute distances or
similarities from a gene matches graph. Basic usage of the command requires
-only that we provide the pickles for the gene matches graph and the pairwise
-BLAST results.
+only that we provide the pickles for the gene matches graph and the HDF5 files
+for the pairwise BLAST results.

```bash
-python filtered_distance.py -g GRAPH -c COMPARISONS_DIR/*.pkl
+python filtered_distance.py -g GRAPH -c COMPARISONS_DIR/*.h5
```

In the above command, GRAPH should be the path to the `graph.pkl` created in the
first phase, and COMPARISONS_DIR should be the directory that contains the BLAST
-result pickles. (This will be the `od2` subdirectory of the output directory
+result HDF5 files. (This will be the `od2` subdirectory of the output directory
from Phase 1 if you used the `typical_filtering_step.sh` script.)
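
Because an output directory may hold results in either the new HDF5 format or the older pickle format, it can help to check which files are present before building the `-c` argument. The sketch below is only an illustration: `find_comparison_files` is a hypothetical helper, not part of RNA-clique, and it assumes the `.h5` and `.pkl` extensions shown in the command above. It only lists files and does not read their contents.

```python
from pathlib import Path


def find_comparison_files(comparisons_dir):
    """List BLAST result files in a directory, preferring HDF5 over pickle.

    Hypothetical helper for illustration; not part of RNA-clique.
    """
    directory = Path(comparisons_dir)
    # Prefer the newer HDF5 files when any are present.
    h5_files = sorted(directory.glob("*.h5"))
    if h5_files:
        return h5_files
    # Otherwise fall back to the older pickle files.
    return sorted(directory.glob("*.pkl"))
```

The returned paths could then be passed after `-c` in place of the shell glob, for example when the command is assembled from another Python script.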

The script outputs a genetic similarity matrix to standard output by default. To
8 changes: 4 additions & 4 deletions docs/tutorials/reads2tree/README.md
@@ -264,10 +264,10 @@ ls "$TUTORIAL_DIR/rna_clique_out/graph.pkl"

If you want a tree, you can create one using RNA-clique and Biopython. The code
below, also found in `docs/tutorials/reads2tree/make_tree.py`, computes the
-distance matrix from the `graph.pkl` and `od2/*.pkl` files and constructs a tree
-using the neighbor-joining algorithm. The tree is also rooted at its
-midpoint. The tree is saved to `nj_tree.tree`, and a visualization is saved to
-`nj_tree.svg` in the `rna_clique_out` directory.
+distance matrix from the `graph.pkl` and `od2/*.h5` (or `od2/*.pkl`) files and
+constructs a tree using the neighbor-joining algorithm. The tree is also rooted
+at its midpoint. The tree is saved to `nj_tree.tree`, and a visualization is
+saved to `nj_tree.svg` in the `rna_clique_out` directory.

```python
--8<-- "docs/tutorials/reads2tree/make_tree.py"
10 changes: 5 additions & 5 deletions docs/usage.md
@@ -10,11 +10,11 @@ This script builds the gene matches graph from gene matches tables.

### Options

-| Short name | Long name | Description                       | Default | Required |
-|------------|-----------|-----------------------------------|---------|----------|
-| `-h`       | `--help`  | Print a help message and exit.    |         | No       |
-| `-i`       |           | Gene matches table pickles.       |         | Yes      |
-| `-o`       |           | Output gene matches graph pickle. |         | Yes      |
+| Short name | Long name | Description                              | Default | Required |
+|------------|-----------|------------------------------------------|---------|----------|
+| `-h`       | `--help`  | Print a help message and exit.           |         | No       |
+| `-i`       |           | Gene matches table HDF5 or pickle files. |         | Yes      |
+| `-o`       |           | Output gene matches graph pickle.        |         | Yes      |


## do\_filtering\_step.sh