The CMU-Sphinx Speech Recognition Toolkit: First Steps
Hi, it's Josh here. I'm writing you this note in 2021: the world of speech technology has changed dramatically since CMU-Sphinx. Before devoting significant time to deploying CMU-Sphinx, take a look at Coqui Speech-to-Text. It takes minutes to deploy an off-the-shelf STT model, and it's open source on GitHub. I'm on the Coqui founding team, so I'm admittedly biased. However, you can tell from this blog that I've spent years working with speech technologies like CMU-Sphinx, so I understand the headaches.
With Coqui STT, we've removed the headaches and streamlined for production settings. You can train and deploy state-of-the-art Speech-to-Text models in just minutes, not weeks. Check out the Model Zoo for open, pre-trained models in different languages. Try it out for yourself, and come join our friendly chatroom.
Some Background
I recently installed Ubuntu 14.04 on my Lenovo Yoga, and it's time to reinstall SPHINX.
When I installed SPHINX for the first time in September 2015, it was not a fun experience. I originally followed the instructions on CMU's website, but I couldn't seem to get it right. I tried a number of different approaches, using different blogs as guides, but I got nowhere. I first tried downloading Pocketsphinx, Sphinxtrain, Sphinxbase and Sphinx4 from CMU's downloads page, but that didn't work. I also tried installing the version hosted on SourceForge, but no luck there either. I finally decided to try cloning and installing the version on GitHub, and that seemed to do the trick. However, at the end of this post I show how to install CMUCLMTK from SourceForge, because they don't have it on GitHub.
So, I'm going to go through the installation process again here.
First, in case it's relevant for others, I'm going to show a little info about my current setup.
You can see the exact kernel on my version of Ubuntu below:
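To see the same details for your own machine, commands along these lines should do it. This is a minimal sketch: `lsb_release` may not be installed on every system, so that call is guarded.

```shell
# Kernel name, release, and machine architecture
kernel_info="$(uname -srm)"
echo "$kernel_info"

# Distribution description, only if lsb_release is available
if command -v lsb_release >/dev/null 2>&1; then
    lsb_release -ds
fi
```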
Installing Dependencies
To install on Ubuntu (or any other Unix-like system), we first need to install a few dependencies. Here's the list:
Bison is a general-purpose parser generator that converts an annotated context-free grammar into a deterministic LR or generalized LR (GLR) parser employing LALR(1) parser tables.
Header files, a static library and development tools for building Python modules, extending the Python interpreter or embedding Python in applications.
Headers and libraries for developing applications that access a PulseAudio sound server via PulseAudio's native interface.
Here's the command to get everything at once:
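A sketch of the install, assuming the Ubuntu 14.04 package names `bison`, `python-dev`, and `libpulse-dev`, which are the packages the three descriptions above appear to correspond to. The function name `install_deps` is only illustrative; wrapping the commands in a function means nothing gets installed until you call it.

```shell
# Assumption: these three package names match the descriptions in the list above.
deps="bison python-dev libpulse-dev"

install_deps() {
    sudo apt-get update
    # $deps is left unquoted on purpose so each package becomes its own argument
    sudo apt-get install -y $deps
}
```

Run `install_deps` when you're ready. If the build later complains about a missing tool, you may need more packages than these three.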
Installing CMU-SPHINX
Installing sphinxbase
Whether you're using pocketsphinx or sphinx4, you're going to need to install sphinxbase first.
The README for the sphinxbase repository says:
This package contains the basic libraries shared by the CMU Sphinx trainer and all the Sphinx decoders (Sphinx-II, Sphinx-III, and PocketSphinx), as well as some common utilities for manipulating acoustic feature and audio files.
To get sphinxbase running, we need to clone the repository from GitHub and then run a few commands to configure and install it in the right spot.
I usually make a folder on my desktop to store the source code, and once everything has been installed I just throw those extra files away.
So, first we need to get to the Desktop, make a new directory and cd into it.
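As a sketch, assuming your desktop lives at `~/Desktop` and using the directory name `sphinx-source` that shows up later in this post:

```shell
# Create the working directory (no error if it already exists) and move into it
src_dir="$HOME/Desktop/sphinx-source"
mkdir -p "$src_dir"
cd "$src_dir"
pwd
```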
Now we can clone the source from GitHub, and you should get something like this:
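A minimal sketch of the clone step, assuming the repositories live under the `cmusphinx` account on GitHub. The helper names `sphinx_repo_url` and `clone_sphinx` are hypothetical, used here only to avoid retyping the URL.

```shell
# Build the GitHub URL for a repository under the cmusphinx account
sphinx_repo_url() {
    echo "https://github.com/cmusphinx/$1.git"
}

# Clone the named repository into the current directory
clone_sphinx() {
    git clone "$(sphinx_repo_url "$1")"
}
```

Calling `clone_sphinx sphinxbase` from inside sphinx-source does the download. The same helper should work for `pocketsphinx` and `sphinxtrain`.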
Now we can see that our once empty dir sphinx-source has a new directory, sphinxbase:
Let's look at what's inside this new dir, sphinxbase:
Now we need to run the autogen.sh shell script you can see in the sphinxbase directory. This will generate our Makefiles and other important scripts for compiling and installing. The output is long, so I only show some of it here:
Before we charge right ahead to compilation with the make command, let's take a look at what new files were generated from running autogen.sh.
You can see that we now have the scripts needed for compiling, configuring, and installing sphinxbase. Now we can run make to do our installation. As nicely summarized on Wikipedia, "Make is a utility that automatically builds executable programs and libraries from source code by reading files called Makefiles which specify how to derive the target program."
When you run the make command without any arguments (still in the local version of the cloned sphinxbase repository), you will get a long output that ends something like this:
The next step is the last step. Run the command sudo make install. Root permission is important: without it the install will fail, and the error you get may not say Permission Denied.
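The three steps described so far (autogen.sh, make, sudo make install) can be sketched as one function. `build_and_install` is a hypothetical name; the function body runs in a subshell so your current directory is left alone, and each step only runs if the previous one succeeded.

```shell
# build_and_install DIR: generate Makefiles, compile, and install from DIR
build_and_install() (
    cd "$1" || exit 1
    ./autogen.sh && make && sudo make install
)
```

From inside sphinx-source, `build_and_install sphinxbase` would run the whole sequence. Running the steps one at a time, as in the text, makes it easier to see which one failed.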
You will see a good amount of output with some sections that look like this:
That's it! You should have successfully installed sphinxbase. To check if you've actually installed it, just go to the terminal and do a tab-completion for sphinx_. You will see all the options of what you've just installed.
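If you want the same list from a script rather than from tab completion, a small scan of PATH does the job. This is a sketch in plain POSIX shell; `list_commands` is a hypothetical helper, not something that ships with Sphinx.

```shell
# list_commands PREFIX: print executables on PATH whose names start with PREFIX
list_commands() {
    prefix=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        for f in "$dir/$prefix"*; do
            # Skip unmatched globs, directories, and non-executable files
            [ -x "$f" ] && [ ! -d "$f" ] && basename "$f"
        done
    done
    IFS=$old_ifs
}

list_commands sphinx_ | sort -u
```

The same call with `pocketsphinx_` as the prefix works once pocketsphinx is installed.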
At this point, if you try to run any one of these by entering it at the command line, you get an error:
This question has already been answered by Nikolay Shmyrev on Stack Overflow, and the reason for the error is the following:
This error means that system fails to find the shared library in the location where it is installed. Most likely you installed it with default prefix /usr/local/lib which is not included into the library search path.
There are a few ways to solve this problem. You may have come across this one, which doesn't work well:
The problem is that this solution only works for as long as you're in the same terminal session. When you log out and log back in, you will have to set the variable again.
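For reference, the session-only workaround usually looks something like the following. This is a sketch assuming the default /usr/local/lib prefix mentioned in the quote above; it keeps any value the variable already had.

```shell
# Prepend /usr/local/lib to the library search path for this shell session only
export LD_LIBRARY_PATH="/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```

Open a new terminal and the variable is back to its old value, which is exactly the drawback described above.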
Rather, we can edit the file /etc/ld.so.conf so we always look into the right directory when we need to. If you take a look at the Linux Programmer's Manual you find the following description:
/etc/ld.so.conf: File containing a list of directories, one per line, in which to search for libraries.
So, this is the right place to make a change.
If you take a look into the config file right now, you will probably just see one line:
We want to add /usr/local/lib to the file. So, you can use nano to open it up, and add a new line that just says /usr/local/lib. That's it. Don't delete anything else or add anything else or you might get some headaches.
If you've added that new line in via nano, you should see something like this:
Now save the modified file (CTRL+o) and exit (CTRL+x).
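To double-check the edit from the command line, a small test for the exact line works. This is only a sketch; `has_line` is a hypothetical helper name.

```shell
# has_line FILE LINE: succeed if FILE contains a line exactly equal to LINE
has_line() {
    grep -qxF -- "$2" "$1"
}

if has_line /etc/ld.so.conf /usr/local/lib; then
    echo "/usr/local/lib is listed"
else
    echo "/usr/local/lib is missing"
fi
```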
Re-configure with the following command:
Now you can check that your computer is finding the shared libraries with the following:
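Both steps can be sketched as follows, assuming the standard `ldconfig` tool that ships with glibc-based systems like Ubuntu. The two function names are illustrative; they only keep the commands from running before you're ready.

```shell
# Rebuild the shared-library cache so the new /etc/ld.so.conf entry takes effect
refresh_linker_cache() {
    sudo ldconfig
}

# Print the cached libraries and keep only the sphinx ones
list_sphinx_libs() {
    ldconfig -p | grep sphinx
}
```

Call `refresh_linker_cache` first, then `list_sphinx_libs`. If the second one prints nothing, the libraries still aren't being found.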
Now you should be able to run the sphinxbase executables, and get a more reasonable error:
Installing pocketsphinx
Now that we've got sphinxbase installed successfully, we can move on to installing pocketsphinx. According to the description on the pocketsphinx GitHub repository:
PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop.
Still using sphinx-source as our current working directory, we can clone pocketsphinx from GitHub with the following command:
If we peek inside the current working directory, we will see we have a new directory:
Now let's take a look at all the stuff we've just cloned:
Looks pretty similar to what we found in our sphinxbase source directory, right?
It basically is, and we can run the same installation procedure as we did above. So now we cd into the dir itself and run autogen.sh. We get some output that looks like the following (again, I've truncated the output here).
Now we've made all our necessary Makefiles, and we can see them in the pocketsphinx directory.
Same as we did above for sphinxbase, we run make now.
And now we can actually do the installation with make install and root privileges.
Let's see if we got something. If you type in pocketsphinx_ and do a tab completion to list all options, you should see something like this:
Now if we try to run one of them, we get a sensible error that says we didn't supply any of the needed arguments.
Huzzah! We now have a functional version of pocketsphinx installed with all its sphinxbase dependencies (if you followed the first section). If you already have a language model, an acoustic model, and a phonetic dictionary, you're good to go!
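If you do have all three, a first decoding run might look like the sketch below. It assumes the `pocketsphinx_continuous` binary from this build and its `-hmm`, `-lm`, `-dict`, and `-infile` options; `decode_wav` is a hypothetical wrapper, and every path you pass it is your own.

```shell
# decode_wav HMM_DIR LM_FILE DICT_FILE WAV_FILE
# Checks that every input exists, then runs the decoder on the audio file.
decode_wav() {
    for path in "$1" "$2" "$3" "$4"; do
        if [ ! -e "$path" ]; then
            echo "missing input: $path" >&2
            return 1
        fi
    done
    pocketsphinx_continuous -hmm "$1" -lm "$2" -dict "$3" -infile "$4"
}
```

The arguments are, in order, the acoustic model directory, the language model file, the phonetic dictionary, and the recording to decode.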
However, if you'd like to train or adapt an acoustic model, you need to install sphinxtrain as shown below.
Installing sphinxtrain
Let's clone sphinxtrain into the temporary directory we've been using to store our source code (sphinx-source):
If we look inside the temporary directory, we see sphinxtrain right where it should be, alongside our other directories of source code.
Now, if we look inside this new source code, we will see something pretty familiar.
Let's cd into sphinxtrain and run the script which generates the Makefiles.
Let's take a look at what we just did.
As with all the other installations, we now compile with make.
Moving right along, we can run make install to seal the deal.
Hopefully now you can try out sphinxtrain and get some sensible output:
You should be ready to go now!
Hopefully this was helpful for you. If you ran into issues or have suggestions on how to make this better, be sure to leave a comment!
Installing cmuclmtk
I can't seem to find the code on CMU-Sphinx's GitHub account, so I just went through SourceForge instead.
NB: A reader recommended trying this link from svn instead: svn://svn.code.sf.net/p/cmusphinx/code/trunk/cmuclmtk
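A sketch of the checkout using the reader-recommended URL, assuming the `svn` command-line client is installed. `checkout_cmuclmtk` is a hypothetical name. Note that this URL fetches only the cmuclmtk directory rather than the whole SourceForge tree.

```shell
cmuclmtk_url="svn://svn.code.sf.net/p/cmusphinx/code/trunk/cmuclmtk"

# Check out the cmuclmtk sources into a local directory named cmuclmtk
checkout_cmuclmtk() {
    svn checkout "$cmuclmtk_url" cmuclmtk
}
```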
As you can see below, we just downloaded pretty much everything they've got. Importantly, cmuclmtk is there, too.
Let's cd into cmuclmtk and take a look:
Familiar setup, right? We do the same steps as before, starting with ./autogen.sh.
Here are all the things we've just generated:
Now we run make.
And finally, sudo make install.
Now we can see a couple of the executables if we do a tab completion as such:
And if we run one without input, it hangs up and runs for a while, but works.