# ROAGUE: **R**econstruction **o**f **A**ncestral **G**ene Blocks **U**sing **E**vents
## Purpose
ROAGUE is a tool to reconstruct ancestors of gene blocks in prokaryotic genomes. Gene blocks are genes co-located on the chromosome. In many cases, gene blocks are
conserved between bacterial species, sometimes as operons, when genes are co-transcribed. The conservation is rarely absolute: gene loss, gain, duplication, block
splitting and block fusion are frequently observed.
ROAGUE accepts a set of species and a gene block in a reference species. It then finds all gene blocks orthologous to the reference gene block and reconstructs their
ancestral states.
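As a rough illustration of the events mentioned above, the sketch below compares a reference gene block with the genes found in one target genome and counts losses, duplications, and splits. The data model (ordered lists of gene names, with `None` marking a gap that splits the block) and the function name `count_events` are assumptions made for this example; this is not ROAGUE's implementation, and gene gain and block fusion are not handled.

```python
from collections import Counter


def count_events(reference, target):
    """Count simple differences between a reference gene block and a target.

    reference: list of gene names in the reference block.
    target: list of gene names found in the target genome, in chromosomal
            order, with None marking a gap that splits the block
            (hypothetical encoding chosen for this sketch).
    """
    found = Counter(g for g in target if g is not None)
    ref_genes = set(reference)

    # Loss: a reference gene with no copy in the target.
    deletions = sum(1 for g in ref_genes if found[g] == 0)

    # Duplication: every extra copy of a reference gene in the target.
    duplications = sum(found[g] - 1 for g in ref_genes if found[g] > 1)

    # Split: each gap that separates two non-empty runs of genes.
    runs = 0
    in_run = False
    for g in target:
        if g is None:
            in_run = False
        elif not in_run:
            runs += 1
            in_run = True
    splits = max(runs - 1, 0)

    return {"deletions": deletions, "duplications": duplications, "splits": splits}


reference = ["paaA", "paaB", "paaC", "paaD"]
target = ["paaA", "paaB", None, "paaD", "paaD"]
print(count_events(reference, target))
```

In this toy input, `paaC` is missing, `paaD` appears twice, and one gap separates the two runs of genes, so each count is 1.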
## Requirements
* Python 2.7.6+
* Biopython 1.63+ (python-biopython)
* MUSCLE alignment tool (muscle)
* BLAST2 (blast2)
* BLAST+ (ncbi-blast+)
* ete3 (Python framework for trees)
## Installation
You can either download the repository through the GitHub interface or clone it from the command line:
```bash
git clone https://github.com/nguyenngochuy91/Ancestral-Blocks-Reconstruction
```
All requirements except ete3 can be installed on Debian-based systems with the following command:
```bash
sudo apt-get install python-biopython blast2 ncbi-blast+ muscle
```
For ete3, see the installation instructions at http://etetoolkit.org/download/
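After installation, it can help to confirm that the external programs and Python modules are reachable. The following is a minimal sketch, not part of ROAGUE. The helper name `missing_requirements` is hypothetical, and the executable names checked (`muscle`, and `blastp` and `makeblastdb` from BLAST+) are an assumption about which programs are needed on your `PATH`. The sketch uses `shutil.which` and `importlib.util.find_spec`, so it assumes it is run under Python 3.4 or later rather than Python 2.7.

```python
import importlib.util
import shutil


def missing_requirements(programs, modules):
    """Return the programs not found on PATH and the modules that cannot be located."""
    missing_programs = [p for p in programs if shutil.which(p) is None]
    # Only top-level module names are expected here (e.g. "Bio", "ete3").
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    return missing_programs, missing_modules


if __name__ == "__main__":
    # Assumed executable and module names; adjust to your installation.
    programs = ["muscle", "blastp", "makeblastdb"]
    modules = ["Bio", "ete3"]
    missing_programs, missing_modules = missing_requirements(programs, modules)
    print("Missing programs:", missing_programs or "none")
    print("Missing modules:", missing_modules or "none")
```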
## Usage
The easiest way to run the project is to execute the script [roague](https://github.com/nguyenngochuy91/Ancestral-Blocks-Reconstruction/blob/master/roague.py). You can run it on the two data sets provided in the directories [E.Coli](https://github.com/nguyenngochuy91/Ancestral-Blocks-Reconstruction/tree/master/E.Coli) and [B.Sub](https://github.com/nguyenngochuy91/Ancestral-Blocks-Reconstruction/tree/master/B.Sub). The two commands below run [roague](https://github.com/nguyenngochuy91/Ancestral-Blocks-Reconstruction/blob/master/roague.py) on these directories. The final results (PDF files of the ancestral reconstruction) are stored in the `E.Coli/visualization` and `B.Sub/visualization` directories.
### E.Coli
```bash
./roague.py -g E.Coli/genomes/ -b E.Coli/gene_block_names_and_genes.txt -r NC_000913 -f E.Coli/phylo_order.txt -m global
```
### B.Sub
```bash
./roague.py -g B.Sub/genomes/ -b B.Sub/gene_block_names_and_genes.txt -r NC_000964 -f B.Sub/phylo_order.txt -m global
```
Each accompanying script can also be run on its own. For example, `show_tree.py` provides a visualization of the ancestral reconstruction using ete3, which also provides a grouping theme that depends on the class of the taxa. You can uncomment line 103 to render the tree to an image file; in that case you also need to specify where the rendered file should be written.
* Run the command below for each operon of interest. Here the operon is paaABCDEFGHIJK, which is not well conserved:
```bash
./show_tree.py -g group.txt -i reconstruction_global/paaABCDEFGHIJK
```
![Image of paaABCDEFGHIJK](https://github.com/nguyenngochuy91/Ancestral-Blocks-Reconstruction/blob/master/image/paa_global.jpg)
* Here the operon is rplKAJL-rpoBC, which is highly conserved:
```bash
./show_tree.py -g group.txt -i reconstruction_global/rplKAJL-rpoBC
```
The help for each script is available through the `-h` or `--help` option:
```bash
./roague.py -h
```
```
usage: roague.py [-h] [--genomes_directory GENOMES_DIRECTORY]
                 [--gene_blocks GENE_BLOCKS] [--reference REFERENCE]
                 [--filter FILTER] [--method METHOD]

optional arguments:
  -h, --help            show this help message and exit
  --genomes_directory GENOMES_DIRECTORY, -g GENOMES_DIRECTORY
                        The directory that store all the genomes file in genbank format.
                        (E.Coli/genomes)
  --gene_blocks GENE_BLOCKS, -b GENE_BLOCKS
                        The gene_block_names_and_genes.txt file, this file
                        stores the operon name and its set of genes.
  --reference REFERENCE, -r REFERENCE
                        The ncbi accession number for the reference genome.
                        (NC_000913 for E.Coli and NC_000964 for B.Sub)
  --filter FILTER, -f FILTER
                        The filter file for creating the tree.
                        (E.Coli/phylo_order.txt for E.Coli or
                        B.Sub/phylo_order.txt for B.Sub)
  --method METHOD, -m METHOD
                        The method to reconstruct ancestral gene block, we
                        support either global or local approach.
```
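To show how the options in the help text map to values inside a script, here is a minimal `argparse` sketch that accepts the same flags. It is reconstructed from the help output only and is not the source of `roague.py`; the `build_parser` name and the use of `choices` to restrict `--method` are assumptions of this example.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options listed in the roague.py help text."""
    parser = argparse.ArgumentParser(prog="roague.py")
    parser.add_argument("--genomes_directory", "-g",
                        help="Directory that stores the genome files in GenBank format.")
    parser.add_argument("--gene_blocks", "-b",
                        help="File that stores each operon name and its set of genes.")
    parser.add_argument("--reference", "-r",
                        help="NCBI accession number of the reference genome.")
    parser.add_argument("--filter", "-f",
                        help="Filter file used for creating the tree.")
    # Assumption: this sketch rejects any value other than the two documented methods.
    parser.add_argument("--method", "-m", choices=["global", "local"],
                        help="Reconstruction method: global or local.")
    return parser


# Parse the same arguments as the E.Coli command shown earlier.
args = build_parser().parse_args([
    "-g", "E.Coli/genomes/",
    "-b", "E.Coli/gene_block_names_and_genes.txt",
    "-r", "NC_000913",
    "-f", "E.Coli/phylo_order.txt",
    "-m", "global",
])
print(args.genomes_directory, args.reference, args.method)
```

Each long option name becomes the attribute name on the parsed result, so `--genomes_directory` is read as `args.genomes_directory`.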
## Examples
Here are the reconstructions of two gene blocks generated by our program.
1. Gene block paaABCDEFGHIJK:
This gene block contains genes involved in the catabolism of phenylacetate, and it is not conserved across the group of studied bacteria.
![paaABCDEFGHIJK](https://github.com/nguyenngochuy91/Ancestral-Blocks-Reconstruction/blob/master/image/paa_global.jpg "Gene block paaABCDEFGHIJK")
2. Gene block atpIBEFHAGDC:
This gene block encodes proteins that catalyze the synthesis of ATP from ADP and inorganic phosphate, and it is highly conserved across the group of studied bacteria.
![atpIBEFHAGDC](https://github.com/nguyenngochuy91/Ancestral-Blocks-Reconstruction/blob/master/image/atp_global.jpg "Gene block atpIBEFHAGDC")
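The compact block names used above can be read as a three-letter lowercase prefix followed by one capital letter per gene, with a hyphen joining parts that have different prefixes. The sketch below expands a name under that assumption. The function `expand_block_name` is hypothetical and only illustrates the naming convention; the authoritative gene lists are the ones stored in `gene_block_names_and_genes.txt`, which may differ.

```python
import re


def expand_block_name(name):
    """Expand a compact block name such as 'atpIBEFHAGDC' into individual gene names."""
    genes = []
    for part in name.split("-"):
        # Assumed convention: three lowercase letters, then one capital per gene.
        match = re.fullmatch(r"([a-z]{3})([A-Z]+)", part)
        if match is None:
            raise ValueError("unrecognized gene block part: %r" % part)
        prefix, letters = match.groups()
        genes.extend(prefix + letter for letter in letters)
    return genes


print(expand_block_name("atpIBEFHAGDC"))
```

For `atpIBEFHAGDC` this yields nine names, from `atpI` through `atpC`, in the order the letters appear in the block name.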
## Credits
1. http://bioinformatics.oxfordjournals.org/content/early/2015/04/13/bioinformatics.btv128.full