eSpeak NG is an open-source formant speech synthesizer that has been integrated into various open-source projects (e.g. Ubuntu, NVDA). eSpeak NG can also be used as a stand-alone text-to-speech converter to read text out loud on a computer. eSpeak NG is the ‘New Generation’ fork of the older eSpeak.
To add a new language to eSpeak NG, you need to have an understanding of the sounds of the language you’re interested in. Knowledge of programming, while helpful, is not necessarily needed.
Specifically, you need to know two things: (1) how the sounds of the language work, and (2) how the spelling of the language relates to those sounds.
There is already good documentation on how to add a new language to eSpeak NG (and a lot of the information in this post comes directly from the official documentation). I decided to create yet another set of instructions because, in my experience of adding the Kyrgyz language to eSpeak NG, it took a while to digest some of the steps and links, and what I’ve written here is a version that makes sense to me.
To add a new language, these are the files that you will need to edit or create. In the following table, new-language refers to the full name of the language you’re working on (e.g. in my case, kyrgyz). The word family refers to the directory of the language family (e.g. in my case, trk, because Kyrgyz is a Turkic language). Finally, the two letters nl refer to the international two-letter code of the new language you’re adding (e.g. in my case, ky).
path/file                          action
./Makefile.am                      edit
./phsource/phonemes                edit
./phsource/ph_new-language         create
./dictsource/nl_extra              create
./dictsource/nl_list               create
./dictsource/nl_rules              create
./espeak-data/voices/family/nl     create
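As you work through the steps, a small checklist helper can tell you which of the files to be created exist so far. This is only a sketch: the function name check_language_files is my own invention, and the paths are simply the ones from the table with the three placeholders filled in.

```shell
# Report which of the "create" files from the table exist yet.
# Arguments are the three placeholders used in the table:
#   $1 = new-language (e.g. kyrgyz), $2 = family (e.g. trk), $3 = nl (e.g. ky)
# Run from the top of the espeak-ng source tree.
# check_language_files is a hypothetical helper, not part of eSpeak NG.
check_language_files() {
    name=$1
    family=$2
    code=$3
    missing=0
    for path in \
        "phsource/ph_$name" \
        "dictsource/${code}_extra" \
        "dictsource/${code}_list" \
        "dictsource/${code}_rules" \
        "espeak-data/voices/$family/$code"
    do
        if [ -f "$path" ]; then
            echo "found    $path"
        else
            echo "missing  $path"
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of files still missing (0 means all present).
    return "$missing"
}

# Example:
#   check_language_files kyrgyz trk ky
```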
Installing eSpeak NG
First, let’s clone the eSpeak NG repository from GitHub with git clone.
Now that we’ve got it downloaded, let’s take a peek into the folder (for example with ls) and see what we’ve got.
Now that we’ve got everything downloaded and in place, we run the autogen.sh shell script. This is a very short script whose main function is to call three of the GNU Autotools: autoheader (which helps configure work smoothly), automake (generates the Makefiles) and autoconf (generates the configure file).
Now, we’re ready to configure. The ./configure script basically checks to make sure that everything is ready to compile.
The espeak-ng and speak-ng programs, along with the eSpeak NG voices, can then be compiled with make. After this, you should have working, compiled code, but it will only be accessible from the relevant directory (e.g. for me, just a folder on my Desktop).
Finally, we can install eSpeak NG (and make it accessible from anywhere on the computer) with sudo make install.
Huzzah! You should now have a functional installation of eSpeak NG! You should be able to use any existing language at this time. To test out the US English version, for example, run espeak-ng with the option -v en-us followed by some quoted text.
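The whole installation sequence can be collected into one small script. Treat this as a sketch rather than the official procedure: the repository URL is my assumption about where the project lives on GitHub, the step helper is made up for this post, and the script only prints each command unless you set RUN=1, so you can read through the plan before anything touches your system.

```shell
#!/bin/sh
# Sketch of the install sequence described above.
# Assumption: REPO_URL is the project's GitHub location; adjust it if needed.
REPO_URL="https://github.com/espeak-ng/espeak-ng.git"

# Print each step; execute it only when RUN=1 is set in the environment.
# (step is a hypothetical helper, not part of eSpeak NG.)
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@" || exit 1
    fi
}

step git clone "$REPO_URL"
step cd espeak-ng
step ./autogen.sh        # calls the GNU Autotools
step ./configure         # checks that everything is ready to compile
step make                # builds the programs and voices
step sudo make install   # makes espeak-ng available system-wide
step espeak-ng -v en-us "Hello world"   # quick test with US English
```

Run it once as-is to see the list of commands, then again with RUN=1 in the environment to actually execute them.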
If you found this post you probably want to know how to add a new language to eSpeak NG, and that’s what we will start doing now.
Necessary Files for a New Language
The Voice File
The first file we’re going to add is the so-called “Voice File”. In a nutshell, this is a simple file that defines the language and how that language is to be spoken.
In this file you must define the language name and its two-letter code. Optionally, you can also specify a male or female voice, set the pitch, and define other characteristics of the voice.
This file needs to be located in espeak-data/voices/family/ (relative to the top of the source tree), where family is the language family of your new language.
Since Kyrgyz is a Turkic language, I’m going to save the voice file in espeak-data/voices/trk/. The name of the voice file should just be the two-letter code of the language. That’s ky for Kyrgyz.
I’d recommend looking through the other voice files for other languages to get an idea of what kinds of things you may want to define. There are many options, but to get started all you need to do is define the language name and code.
The simplest voice file can be created and saved with nano, a basic Linux text editor.
Open the new file in nano (for Kyrgyz, that is the file ky inside the trk voices directory), type in the two lines that define the language name and code, and then WriteOut the file (Ctrl+O) to save it.
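If you’d rather not type into an editor, the same minimal file can be written straight from the shell. This is a sketch under one assumption: that the two lines in question are the name and language attributes, filled in here with the Kyrgyz values used throughout this post.

```shell
# Non-interactive alternative to typing the two lines into nano.
# Run from the top of the espeak-ng source tree.
# Assumption: "name" and "language" are the two attributes needed;
# the values follow the Kyrgyz example (family trk, code ky).
VOICE_DIR="espeak-data/voices/trk"
mkdir -p "$VOICE_DIR"

cat > "$VOICE_DIR/ky" <<'EOF'
name kyrgyz
language ky
EOF

# Show what was written.
cat "$VOICE_DIR/ky"
```

For another language, change the directory to that language’s family and the file name and language line to its two-letter code.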
Huzzah! We’ve just created one of the five files we need to create.
Phoneme Definition File
We now need to define what the sounds of the new language actually sound like.
That is, if we want to read text out loud on the computer, we need to be able to generate an acoustic output that will come out of our computer speakers.
We accomplish this by creating a file which defines the acoustic output for each sound (aka for each phoneme). This is called the phoneme definition file.
Here’s a section of my ph_kyrgyz phoneme definition file which defines some of the short vowels:
The specifics of this file’s syntax are best explained in the official documentation.
In the beginning, however, you can get a long way by just finding similar sounds in other languages, and copy-and-pasting those sounds into your new phoneme definition file.
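To make that hunting easier, here is a rough helper that prints every definition of a given phoneme from a set of files, ready to copy. It assumes that each definition starts with a line beginning with phoneme followed by the phoneme’s name and ends with a line beginning with endphoneme; show_phoneme is my own name for it, not an eSpeak NG tool.

```shell
# Print every definition of one phoneme found in the given files.
# Assumption: each definition starts with "phoneme NAME" and ends with
# "endphoneme". show_phoneme is a hypothetical helper.
show_phoneme() {
    name=$1
    shift
    awk -v name="$name" '
        $1 == "phoneme" && $2 == name { inside = 1; print "== " FILENAME }
        inside                        { print }
        $1 == "endphoneme"            { inside = 0 }
    ' "$@"
}

# Example (run from the top of the source tree):
#   show_phoneme e phsource/ph_*
```

Each match is preceded by a line naming the file it came from, so you can tell which language a definition belongs to.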
The Dictionary Files
The dictionary files (which should be saved in the dictsource directory) are responsible for converting text into sounds (aka phonemes).
Practically all languages are written in a way that is not exact, but we need something exact if we are telling a computer how to read text out loud.
The dictionary files help us create more precise transcriptions of words. These files take written words and convert them into a phonetic transcription. If these dictionary files work well, we will be able to produce a phonetic transcription for any text. Pretty cool, right?!
The problem is, for almost any ‘regular’ rule we find in a language, there will be ‘exceptions’. This is why there are at least two dictionary files: one for regular rules (ky_rules) and one for exceptions (ky_list).
My rules file for Kyrgyz looks something like this, just showing from lines 33 to 90:
You can see that there are some groups defined at the beginning (e.g. .L01, .L02, etc), and they show up in the rules later on. You can define and use groups like this to make rules about certain contexts.
Some rules can be very simple, like all the rules for vowels seen here.
Every language is going to have different rules, so all I want to do here is give you an idea of what this file does. These rules translate a letter (on the left) to a sound (on the right). In this case, since I’m working on Kyrgyz, all the letters are Cyrillic and all the sounds are represented by Latin letters. Every sound (written in Latin letters) should have a definition in the phoneme definition file.
Now, let’s take a look at my ‘exceptions’ file. In fact, this file includes exceptions as well as definitions of symbols. Numbers have to be defined here as well.
You’ll find a much more detailed explanation of these files in the official documentation.
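Once your voice is installed, a handy way to check the rules is to ask eSpeak NG for the phonetic transcription of some text instead of listening to it. The sketch below wraps that in a function: -q suppresses the audio and -x writes the phoneme mnemonics to standard output. The transcribe name is mine, and the Kyrgyz example in the comment assumes the ky voice has been built and installed.

```shell
# Print the phoneme transcription eSpeak NG produces for a piece of text.
# -q suppresses audio; -x writes the phoneme mnemonics to stdout.
# transcribe is a hypothetical helper, not part of eSpeak NG.
transcribe() {
    voice=$1
    text=$2
    if ! command -v espeak-ng >/dev/null 2>&1; then
        echo "espeak-ng not found on PATH" >&2
        return 127
    fi
    espeak-ng -v "$voice" -q -x "$text"
}

# Example, once the ky voice is installed:
#   transcribe ky "салам"
```

Comparing the output against the transcription you expect makes it much quicker to spot which rule (or exception) is misbehaving.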
Editing the Master Phoneme File
We now have to link up our specific phoneme file for the new language to the master set of all phonemes. This master file is located at phsource/phonemes.
In the beginning, the file defines common vowels and consonants, and then at the end we find references to individual languages. Here, at the end of the file, is where we should include a reference to our new language.
Before we edit anything, the tail of the file looks like this:
After I make the edit to add a reference to the Kyrgyz language, the file looks like this:
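The edit itself can be sketched as a small function that appends the reference if it is not there yet. This rests on an assumption you should check against the neighbouring entries in your copy of the file: that each language is referenced by a phonemetable line (code, then base) followed by an include line naming the phoneme file. The function name add_phoneme_table is made up for this post.

```shell
# Append a language reference to the master phoneme file, once.
# Assumption: entries have the form
#     phonemetable <code> base
#     include <phoneme-file>
# Compare with the existing entries at the end of phsource/phonemes first.
# add_phoneme_table is a hypothetical helper.
add_phoneme_table() {
    file=$1
    code=$2
    phfile=$3
    if grep -q "^phonemetable $code " "$file"; then
        echo "$code is already listed in $file"
        return 0
    fi
    printf '\nphonemetable %s base\ninclude %s\n' "$code" "$phfile" >> "$file"
}

# Example (run from the top of the source tree):
#   add_phoneme_table phsource/phonemes ky ph_kyrgyz
```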
For this file, that’s all we have to do!
Editing Makefile.am
This is the point where I ran into some issues. I couldn’t find any mention of editing Makefile.am in the official documentation, so I left it alone, did all the other steps, and then ran into lots of problems.
Finally I decided to take a longer look at the Makefile, and I realized that there were a few simple things that needed to be added for my Kyrgyz voice to compile and install correctly.
First, there’s a reference to the phoneme file that should be added:
You see all the listings of the ph_language files at the end of this section? That’s where we need to add a line for our new language.
After I add a reference to ph_kyrgyz, the file looks like this:
Second, we can find in Makefile.am references to all the dictionary files for the existing languages. Likewise, we have to add a reference to our dictionary file:
I made the change, adding ky_dict, and I get this:
Third, we can find a list of definitions that put all the language information together to compile the dictionary:
Likewise, we need these instructions for our new language. For Kyrgyz, I added the same definitions and I get this:
With these final three changes to Makefile.am, you should be ready to go!
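Since forgetting one of these edits cost me a lot of time, here is a quick sanity check to run before building. It is a sketch that only looks for the names involved (the phoneme file, the compiled dictionary, and the rules file) somewhere in Makefile.am; it does not verify that they sit in the right place, and treating the rules file as a marker for the third change is my assumption. The check_makefile name is also my own.

```shell
# Check that Makefile.am mentions the names a new language needs.
# This only tests that each name appears somewhere in the file.
# check_makefile is a hypothetical helper, not part of eSpeak NG.
check_makefile() {
    makefile=$1
    phfile=$2
    code=$3
    missing=0
    for needle in "$phfile" "${code}_dict" "${code}_rules"; do
        if grep -q -F "$needle" "$makefile"; then
            echo "ok       $needle"
        else
            echo "MISSING  $needle"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run from the top of the source tree):
#   check_makefile Makefile.am ph_kyrgyz ky
```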
Conclusion
If you’ve gone through all these steps successfully, you should be ready to start making updates to the various language definition files.
Whenever you make an update (for instance, to a file like ph_kyrgyz), you should be able to quickly listen to your changes by re-running make and then sudo make install.
There shouldn’t be a need to re-run ./autogen.sh or ./configure, as far as I can tell.
My workflow has been (1) make a change, (2) run make and sudo make install, (3) listen to change, (4) go back to (1).
I hope you found this post useful!
If you have any questions or comments or find an issue, you can email me or leave a comment!