Voice model creator for CMU Sphinx

This tool contains basic tools for creating a custom domain voice model for use with the PocketSphinx decoder. It is also possible to use the voice models created by this tool as the basis for a test-to-speech engine.

Note this tool has only been tested with Linux Mint 17.3 & 18 and Ubuntu GNOME 17.04. No good reason it should fail elsewhere, but use at your own risk.

Please see the LICENSE file for terms of use.

installation

This is tested on Ubuntu GNOME 17.04. Further testing has not been performed.

You should install dependencies first; this ensures that python-dev, PocketSphinx, etc. are available. Second, install vmc. Some of the packages need to be installed within the user's home directory; ~/tools is recommended.
This should be specified when installing the dependencies.

Commands:

cd ~/Downloads
git clone https://github.com/umhau/vmc.git
cd ./vmc
bash install.sh

If the dependencies involved were already installed, use the following to install only the vmc program files (old versions will be automatically removed).

cd ./vmc
sudo bash install.sh -no-deps

To remove vmc, run either of the following commands:

vmc -remove

See use examples in the next section.

Usage Examples

Add to a preexisting set of recordings, and adapt an existing acoustic model. Use model name 'model-name' and require 5 recordings of every item in the dictation file.

vmc model-name \
-adapt /extant/model/location \
-addrecordings /audio/files/location /dictation/file/location.txt 5

Create a new model, and create a new set of audio recordings.

vmc model-name \
-create /place/to/put/model \
-newrecordings /place/to/put/audio/files /dictation/file/location.txt 5

Import a previously created set of recordings, and adapt a preexisting model.

vmc model-name \
-adapt /extant/model/location \
-importrecordings /audio/files/location

File Structure

Two folders are involved: the audio recordings folder and the acoustic model folder. These can be kept in separate places. The acoustic model folder may be part of the python-pocketsphinx installation, in which case it is kept at '/usr/local/lib/python2.7/dist-packages/pocketsphinx/model/en-us'. Some files are generated by vmc.

Note the model name is only used with files created from audio recordings. All the en-us files have very default names.

Most files have default names, or are named according to the model name. File structure is as follows (incomplete, only showing commonly-used files):

[audio-recordings]
- [model name].fileids
- [model name].transcription
- mdef
- mdef.txt

[acoustic-model]
- feat.params

Background

This tool brings together a number of disparate data files that are needed for creating a voice model. This graph illustrates the algorithm:

               word domain
                    +
                    |
                    v
    +-------+ sentence list+----------+
    |               +                 |
    |               |                 |
    v               v                 v
dictionary      grammar: LM    voice samples
    +               +                 +
    |               |                 |
    |               v                 |
    +--------> voice model <----------+
                training
                    +
                    |
                    v
               voice model

Each of these steps, starting with the sentence list (given) and ending with the voice model are contained within this tool.

The 'word domain' is the set of sentences, words and phrases used in the training and in the use case scenario. They must be as identical as possible to enable accurate recognition.

Todo

clean up VMC script (add functions, make options tidier, etc.)
make sure that the process of removing conflicting libs actually works.

Name		Name	Last commit message	Last commit date
Latest commit History 96 Commits
lib		lib
LICENSE		LICENSE
README.md		README.md
install.sh		install.sh
lmc		lmc
vmc		vmc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Voice model creator for CMU Sphinx

installation

Usage Examples

File Structure

Background

Todo

About

Releases

Packages

Languages

License

umhau/vmc

Folders and files

Latest commit

History

Repository files navigation

Voice model creator for CMU Sphinx

installation

Usage Examples

File Structure

Background

Todo

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages