Ancient Greek Genre Classifier

We are data mining a corpus of ancient texts to train machine learning classifiers that distinguish between different genres.
Replication code for Gianitsos et al., "Stylometric Classification of Ancient Greek Literary Texts by Genre," LaTeCH-CLfL 2019
Link to paper: https://www.aclweb.org/anthology/W19-2507/

Setup (Instructions for non-technical users on Mac)

Open the Terminal app

Check that you have Python 3.6 installed:
```
which python3.6
```
If it is installed, this command prints a path, for example: /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6. If it prints nothing, download Python 3.6 here: https://www.python.org/downloads/release/python-368/
Ensure that you have the Xcode command-line tools installed on your Mac by running the following command. It is harmless if the tools are already installed. This step ensures you have git and svn installed, both of which are necessary to run the code in this project.
```
xcode-select --install
```
Install pipenv¹. If already installed, this command will not do anything harmful.
```
pip install pipenv
```
Clone this repository. Click the green 'clone' button on the right side of the GitHub webpage for this repo to copy the link, then run:
```
git clone <link you just copied>
```
Navigate inside the project folder:
```
cd <the project folder you just cloned>
```
Now that you are in the project directory, run the following command. This will generate a virtual environment called .venv in the current directory² that will contain the Python dependencies for this project.
```
PIPENV_VENV_IN_PROJECT=true pipenv install
```
Activate the virtual environment with the following command. After activation, Python commands ignore the system-level Python version and packages, and use only the packages from the virtual environment.
```
pipenv shell
```

Running exit leaves the virtual environment, i.e. it restores the system-level Python configuration to your shell. You can also simply close the terminal. Whenever you want to resume working on the project, run pipenv shell while in the project directory to activate the virtual environment again.
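
To confirm that the virtual environment is active, a short check like the one below can help. It is a minimal sketch that uses only the Python standard library; the function name in_virtual_environment is made up for illustration and is not part of this project.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    # Environments made by older versions of the virtualenv tool set
    # sys.real_prefix instead, so that attribute is checked as well.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Executable:", sys.executable)
    print("Inside a virtual environment:", in_virtual_environment())
```

After pipenv shell, the executable path would be expected to sit inside the project's .venv directory.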

Analysis

Here are examples of commands you can run:

Run the demo (this extracts features from a small sample of files and analyzes the results in one step):

```
python demo.py
```

Extract features from all files:

```
python run_feature_extraction.py all_data.pickle
```

Extract features from only drama and epic files:

```
python run_feature_extraction.py drama_epic_data.pickle drama epic
```
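
The extraction commands write their results to a pickle file. This README does not describe the internal layout of that file, so the sketch below only loads it and reports the type and size of the top-level object. The helper name summarize_pickle is hypothetical and not part of this project.

```python
import pickle


def summarize_pickle(path):
    """Load a pickle file and describe its top-level object in one line."""
    # Only unpickle files you created yourself: loading an untrusted
    # pickle can execute arbitrary code.
    with open(path, "rb") as handle:
        data = pickle.load(handle)
    summary = type(data).__name__
    # Report a size when the object has one (dict, list, tuple, ...).
    if hasattr(data, "__len__"):
        summary += " with {} top-level entries".format(len(data))
    return summary


# Example (run after a feature extraction has produced the file):
# print(summarize_pickle("all_data.pickle"))
```

Run it from the project directory with the virtual environment active, so that any project classes stored in the pickle can be imported when it is loaded.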

Run all model analyzer functions on the data from all files to classify prose from verse:

```
python run_ml_analyzers.py all_data.pickle labels/prosody_labels.csv all
```

Run all model analyzer functions on the data from only drama and epic files to classify drama from epic:

```
python run_ml_analyzers.py drama_epic_data.pickle labels/genre_labels.csv all
```
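
The extraction and analysis steps can also be chained from a small Python script. The sketch below assumes only the argument order shown in the examples above (output file, then optional genre names; data file, labels file, then the analyzer selection). The helper names are hypothetical, and "all" is the only analyzer value this README confirms.

```python
import subprocess
import sys


def extraction_command(output_file, *genres):
    """Build the argument list for run_feature_extraction.py."""
    # Output file first, then zero or more genre names, as in the examples above.
    return [sys.executable, "run_feature_extraction.py", output_file, *genres]


def analysis_command(data_file, labels_file, analyzer="all"):
    """Build the argument list for run_ml_analyzers.py."""
    return [sys.executable, "run_ml_analyzers.py", data_file, labels_file, analyzer]


def run_pipeline(data_file, labels_file, *genres):
    """Extract features into data_file, then run the analyzers on it."""
    # check=True raises CalledProcessError if a step exits with a nonzero status.
    subprocess.run(extraction_command(data_file, *genres), check=True)
    subprocess.run(analysis_command(data_file, labels_file), check=True)


# Example (run from the project directory inside the virtual environment):
# run_pipeline("drama_epic_data.pickle", "labels/genre_labels.csv", "drama", "epic")
```

Using sys.executable keeps both steps on the same interpreter that runs the script, which is the virtual environment's Python when the script is started from pipenv shell.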

Footnotes

1) The pipenv tool works by making a project-specific directory, called a virtual environment, that holds the dependencies for that project. After a virtual environment is activated, newly installed dependencies automatically go into the virtual environment instead of being placed among your system-level Python packages. This prevents different projects on the same machine from having dependencies that conflict with one another.

2) Setting the PIPENV_VENV_IN_PROJECT variable to true tells pipenv to create the virtual environment within the project directory, so that all the files belonging to a project are in the same place. This is not the default behavior (e.g. on Mac, environments are normally placed in ~/.local/share/virtualenvs/).

QuantitativeCriticismLab/LaTeCH-CLfL-2019-GreekClassification
