This project provides basic tools for creating a custom-domain voice model for use with the PocketSphinx decoder. The voice models it creates can also serve as the basis for a text-to-speech engine.
Note: this tool has only been tested on Linux Mint 17.3 and 18 and on Ubuntu GNOME 17.04. There is no known reason it should fail elsewhere, but use it at your own risk.
Please see the LICENSE file for terms of use.
The installation procedure has been tested on Ubuntu GNOME 17.04; no further testing has been performed.
Install the dependencies first, so that python-dev, PocketSphinx, etc. are
available, then install vmc. Some of the packages must be installed inside
the user's home directory; ~/tools is recommended. Specify this location
when installing the dependencies.
Commands:
cd ~/Downloads
git clone https://github.com/umhau/vmc.git
cd ./vmc
bash install.sh
If the dependencies are already installed, use the following to install only the vmc program files (old versions are removed automatically).
cd ./vmc
sudo bash install.sh -no-deps
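Before skipping the dependency step, it may be worth confirming that the expected commands are actually on the PATH. The following is a minimal sketch, not part of vmc: the function name `missing_tools` is hypothetical, and the list of commands checked (git, python, pocketsphinx_continuous) is an assumption about what the dependency installer provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vmc): print every given command name
# that cannot be found on the PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The command names below are assumptions chosen for illustration;
# adjust them to match what the dependency step installs on your system.
missing="$(missing_tools git python pocketsphinx_continuous)"
if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
else
    echo "All listed commands were found."
fi
```

If anything is reported as missing, running the full installer without -no-deps is probably the safer choice.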
To remove vmc, run the following command:
vmc -remove
See the usage examples in the next section.
Add to a preexisting set of recordings, and adapt an existing acoustic model. Use model name 'model-name' and require 5 recordings of every item in the dictation file.
vmc model-name \
-adapt /extant/model/location \
-addrecordings /audio/files/location /dictation/file/location.txt 5
Create a new model, and create a new set of audio recordings.
vmc model-name \
-create /place/to/put/model \
-newrecordings /place/to/put/audio/files /dictation/file/location.txt 5
Import a previously created set of recordings, and adapt a preexisting model.
vmc model-name \
-adapt /extant/model/location \
-importrecordings /audio/files/location
Two folders are involved: the audio recordings folder and the acoustic model folder. These can be kept in separate places. The acoustic model folder may be part of the python-pocketsphinx installation, in which case it is kept at '/usr/local/lib/python2.7/dist-packages/pocketsphinx/model/en-us'. Some files are generated by vmc.
Note that the model name is only used for files created from audio recordings; the en-us files all keep their default names.
Most files have default names, or are named according to the model name. File structure is as follows (incomplete, only showing commonly-used files):
[audio-recordings]
- [model name].fileids
- [model name].transcription
- mdef
- mdef.txt
[acoustic-model]
- feat.params
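For orientation, the .fileids and .transcription files can be sketched as follows. This assumes vmc follows the usual CMU Sphinx adaptation layout (one recording base name per line in .fileids, and `<s> text </s> (id)` lines in .transcription); the model name, sentences, and file naming scheme below are hypothetical and are not taken from vmc itself.

```shell
#!/usr/bin/env bash
# Sketch: write the two index files for a set of recordings, assuming the
# standard CMU Sphinx adaptation format. All names here are hypothetical.
set -euo pipefail

model_name="demo-model"        # hypothetical model name
workdir="$(mktemp -d)"         # stand-in for the audio recordings folder

# Hypothetical dictation lines; in practice these come from the dictation file.
sentences=("open the door" "close the window")

: > "$workdir/$model_name.fileids"
: > "$workdir/$model_name.transcription"

i=0
for sentence in "${sentences[@]}"; do
    i=$((i + 1))
    id="$(printf '%s_%04d' "$model_name" "$i")"
    # One recording base name (without the .wav extension) per line.
    printf '%s\n' "$id" >> "$workdir/$model_name.fileids"
    # Transcript wrapped in <s> ... </s>, followed by the matching id.
    printf '<s> %s </s> (%s)\n' "$sentence" "$id" >> "$workdir/$model_name.transcription"
done

cat "$workdir/$model_name.fileids"
cat "$workdir/$model_name.transcription"
```

The ids in the two files have to line up entry for entry, which is why both are written inside the same loop.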
This tool brings together a number of disparate data files that are needed for creating a voice model. This graph illustrates the algorithm:
               word domain
                    +
                    |
                    v
    +-------+ sentence list+----------+
    |               +                 |
    |               |                 |
    v               v                 v
dictionary     grammar: LM      voice samples
    +               +                 +
    |               |                 |
    |               v                 |
    +--------> voice model <----------+
                training
                    +
                    |
                    v
               voice model
Each of these steps, starting with the sentence list (given) and ending with the voice model, is handled by this tool.
The 'word domain' is the set of sentences, words and phrases used both in training and in the intended use case. The two must match as closely as possible to enable accurate recognition.
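The dictionary branch of the diagram starts from the set of distinct words in the sentence list. A minimal sketch of that first step is shown here; it is not vmc's actual implementation, and the sample sentences and variable names are hypothetical. Mapping each word to its pronunciation is a separate step that is not shown.

```shell
#!/usr/bin/env bash
# Sketch: derive the vocabulary (unique words) from a sentence list.
set -euo pipefail

# Hypothetical sentence list, one sentence per line.
sentence_list="$(mktemp)"
cat > "$sentence_list" <<'EOF'
open the door
close the door
Open the window
EOF

# Lowercase everything, put one word per line, drop blank lines,
# then sort and remove duplicates.
vocab="$(tr '[:upper:]' '[:lower:]' < "$sentence_list" \
    | tr -s '[:space:]' '\n' \
    | sed '/^$/d' \
    | LC_ALL=C sort -u)"

printf '%s\n' "$vocab"
```

Because the vocabulary is taken directly from the sentence list, any word that will be spoken in actual use but is missing from the list will also be missing from the dictionary.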
- clean up VMC script (add functions, make options tidier, etc.)
- make sure that the process of removing conflicting libs actually works.