πŸ‘‹ Hi, it’s Josh here. I’m writing you this note in 2021: the world of speech technology has changed dramatically since Kaldi. Before devoting weeks of your time to deploying Kaldi, take a look at 🐸 Coqui Speech-to-Text. It takes minutes to deploy an off-the-shelf 🐸 STT model, and it’s open source on GitHub. I’m on the Coqui founding team, so I’m admittedly biased. However, you can tell from this blog that I’ve spent years working with Kaldi, so I understand the headaches.

With 🐸 STT, we’ve removed the headaches of Kaldi and streamlined everything for production settings. You can train and deploy state-of-the-art 🐸 Speech-to-Text models in just minutes, not weeks. Check out the 🐸 Model Zoo for open, pre-trained models in different languages. Try it out for yourself, and come join our friendly chatroom πŸ’š






Installation via GitHub

Kaldi is primarily hosted on GitHub (not SourceForge anymore), so I’m going to just clone the official GitHub repository to my Desktop and go from there.

josh@yoga:~/Desktop$ git clone https://github.com/kaldi-asr/kaldi.git
Cloning into 'kaldi'...
remote: Counting objects: 63320, done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 63320 (delta 5), reused 0 (delta 0), pack-reused 63298
Receiving objects: 100% (63320/63320), 74.94 MiB | 8.26 MiB/s, done.
Resolving deltas: 100% (49427/49427), done.
Checking connectivity... done.

Taking a look inside to see what I just cloned (la here is just a shell alias, essentially ls -A):

josh@yoga:~/Desktop$ cd kaldi/
josh@yoga:~/Desktop/kaldi$ la
COPYING  .git            .gitignore  misc       src    .travis.yml
egs      .gitattributes  INSTALL     README.md  tools  windows

Now there’s a lot of good official documentation for Kaldi, but I think the best install info will always be in the INSTALL file that ships with the latest version. So, let’s take a look:

josh@yoga:~/Desktop/kaldi$ cat INSTALL 
This is the official Kaldi INSTALL. Look also at INSTALL.md for the git mirror installation.
[for native Windows install, see windows/INSTALL]

(1)
go to tools/  and follow INSTALL instructions there.

(2) 
go to src/ and follow INSTALL instructions there.

First things first, it says to go to tools/ and follow the instructions there. So, let’s cd into tools/ and see what’s there:

josh@yoga:~/Desktop/kaldi$ cd tools/
josh@yoga:~/Desktop/kaldi/tools$ la
CLAPACK  INSTALL           install_pfile_utils.sh  install_speex.sh  Makefile
extras   install_atlas.sh  install_portaudio.sh    install_srilm.sh

Looking into the INSTALL file, we see:

josh@yoga:~/Desktop/kaldi/tools$ cat INSTALL 

To install the most important prerequisites for Kaldi:

 first do

  extras/check_dependencies.sh

to see if there are any system-level installations or modifications you need to do.
Check the output carefully: there are some things that will make your life a lot
easier if you fix them at this stage.

Then run

  make

If you have multiple CPUs and want to speed things up, you can do a parallel
build by supplying the "-j" option to make, e.g. to use 4 CPUs:

  make -j 4

By default, Kaldi builds against OpenFst-1.3.4. If you want to build against
OpenFst-1.4, edit the Makefile in this folder. Note that this change requires
a relatively new compiler with C++11 support, e.g. gcc >= 4.6, clang >= 3.0.

In extras/, there are also various scripts to install extra bits and pieces that
are used by individual example scripts.  If an example script needs you to run
one of those scripts, it will tell you what to do.

So, first we need to check our dependencies:

josh@yoga:~/Desktop/kaldi/tools$ extras/check_dependencies.sh
extras/check_dependencies.sh: all OK.

I’m OK on this one, but I suspect many people will need to install some dependencies before they move on. I’d recommend re-running that check_dependencies.sh script after you do your installs, to make sure you actually installed what you needed and that it’s in the right spot.
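If check_dependencies.sh does flag something, it prints the exact command you need to run. On Debian/Ubuntu that’s usually an apt-get line; as a rough sketch of what it tends to look like (the package list varies by system, so trust the script’s output over mine):

sudo apt-get install g++ make automake autoconf libtool zlib1g-dev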

Moving along, we need to run make. There’s an option here for parallelizing this step, so I’m going to check how many processors I have:

josh@yoga:~/Desktop$ nproc
4
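If you’d rather not hard-code that number, any POSIX shell lets you splice it straight into the command:

make -j "$(nproc)"

Either way works; below I just type the number.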

So I can run make on all 4 of my processors like this:

josh@yoga:~/Desktop/kaldi/tools$ make -j 4
                .
                .
                .
make[3]: Entering directory `/home/josh/Desktop/kaldi/tools/openfst-1.3.4'
make[3]: Nothing to be done for `install-exec-am'.
make[3]: Nothing to be done for `install-data-am'.
make[3]: Leaving directory `/home/josh/Desktop/kaldi/tools/openfst-1.3.4'
make[2]: Leaving directory `/home/josh/Desktop/kaldi/tools/openfst-1.3.4'
make[1]: Leaving directory `/home/josh/Desktop/kaldi/tools/openfst-1.3.4'
rm -f openfst
ln -s openfst-1.3.4 openfst



Warning: IRSTLM is not installed by default anymore. If you need IRSTLM
Warning: use the script extras/install_irstlm.sh
All done OK.
josh@yoga:~/Desktop/kaldi/tools$ 

Those last lines recommend we install a language modeling toolkit, IRSTLM. I want to make my own language models, so I’m going to install it. If you’re using some pre-existing language model, you can skip these next few steps.

josh@yoga:~/Desktop/kaldi/tools$ extras/install_irstlm.sh
                           .
                           .
                           .
make[1]: Entering directory `/home/josh/Desktop/kaldi/tools/irstlm'
make[2]: Entering directory `/home/josh/Desktop/kaldi/tools/irstlm'
make[2]: Nothing to be done for `install-exec-am'.
make[2]: Nothing to be done for `install-data-am'.
make[2]: Leaving directory `/home/josh/Desktop/kaldi/tools/irstlm'
make[1]: Leaving directory `/home/josh/Desktop/kaldi/tools/irstlm'
readlink: missing operand
Try 'readlink --help' for more information.
***() Installation of IRSTLM finished successfully
***() Please source the tools/env.sh in your path.sh to enable it

Now we should have a working installation of IRSTLM on the computer, and you can verify this by looking in /usr/local:

josh@yoga:~/Desktop/kaldi/tools$ cd /usr/local/
josh@yoga:/usr/local$ ls
bin  etc  games  include  irstlm  lib  libexec  man  MATLAB  sbin  share  src
josh@yoga:/usr/local$ ls irstlm/
bin  include  lib

We don’t have to do anything else with IRSTLM right now because we’re just installing. But it’ll be there when you need it!
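One note on that β€œPlease source the tools/env.sh in your path.sh” message from the installer: env.sh is what puts the IRSTLM binaries on your PATH when a recipe runs. A minimal sketch of the line you’d add to a recipe’s path.sh, assuming the default layout (newer recipes often ship with it already):

[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh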

So, at this point we’ve done part (1) of the kaldi/INSTALL file (i.e. following the steps in the kaldi/tools/INSTALL file).

Now let’s go on to step (2), and follow the instructions in kaldi/src/INSTALL.

josh@yoga:~/Desktop/kaldi/tools$ cd ../src/
josh@yoga:~/Desktop/kaldi/src$ la
base        Doxyfile  gmm         ivector     lm         nnet2     online      sgmm2      tree
bin         feat      gmmbin      ivectorbin  lmbin      nnet2bin  online2     sgmm2bin   util
configure   featbin   gst-plugin  kws         Makefile   nnet3     online2bin  sgmmbin
cudamatrix  fgmmbin   hmm         kwsbin      makefiles  nnet3bin  onlinebin   thread
decoder     fstbin    INSTALL     lat         matrix     nnetbin   probe       TODO
doc         fstext    itf         latbin      nnet       NOTES     sgmm        transform

Looking into the INSTALL file itself:

josh@yoga:~/Desktop/kaldi/src$ cat INSTALL 

These instructions are valid for UNIX-like systems (these steps have
been run on various Linux distributions; Darwin; Cygwin).  For native Windows
compilation, see ../windows/INSTALL.

You must first have completed the installation steps in ../tools/INSTALL
(compiling OpenFst; getting ATLAS and CLAPACK headers).

The installation instructions are:
./configure
make depend
make

Note that "make" takes a long time; you can speed it up by running make
in parallel if you have multiple CPUs, for instance 
 make depend -j 8
 make -j 8
For more information, see documentation at http://kaldi-asr.org/doc/
and click on "The build process (how Kaldi is compiled)".

Like it says, the first step is to run the ./configure script:

josh@yoga:~/Desktop/kaldi/src$ ./configure
Configuring ...
Checking OpenFST library in /home/josh/Desktop/kaldi/tools/openfst ...
Checking OpenFst library was patched.
Doing OS specific configurations ...
On Linux: Checking for linear algebra header files ...
Using ATLAS as the linear algebra library.
Successfully configured for Debian/Ubuntu Linux [dynamic libraries] with ATLASLIBS =/usr/lib/libatlas.so.3  /usr/lib/libf77blas.so.3 /usr/lib/libcblas.so.3  /usr/lib/liblapack_atlas.so.3
CUDA will not be used! If you have already installed cuda drivers 
and cuda toolkit, try using --cudatk-dir=... option.  Note: this is
only relevant for neural net experiments
Static=[false] Speex library not found: You can still build Kaldi without Speex.
SUCCESS

Now we run make depend:

josh@yoga:~/Desktop/kaldi/src$ make depend -j 4
                    .
                    .
                    .
make[1]: Leaving directory `/home/josh/Desktop/kaldi/src/online2'
make -C online2bin/ depend
make[1]: Entering directory `/home/josh/Desktop/kaldi/src/online2bin'
g++ -M -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS -I/home/josh/Desktop/kaldi/tools/ATLAS/include -I/home/josh/Desktop/kaldi/tools/openfst/include  -g  *.cc > .depend.mk
make[1]: Leaving directory `/home/josh/Desktop/kaldi/src/online2bin'
make -C lmbin/ depend
make[1]: Entering directory `/home/josh/Desktop/kaldi/src/lmbin'
g++ -M -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS -I/home/josh/Desktop/kaldi/tools/ATLAS/include -I/home/josh/Desktop/kaldi/tools/openfst/include -Wno-sign-compare -g  *.cc > .depend.mk
make[1]: Leaving directory `/home/josh/Desktop/kaldi/src/lmbin'

And finally, make:

josh@yoga:~/Desktop/kaldi/src$ make -j 4
               .
               .
               .
make -C lmbin 
make[1]: Entering directory `/home/josh/Desktop/kaldi/src/lmbin'
g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS -I/home/josh/Desktop/kaldi/tools/ATLAS/include -I/home/josh/Desktop/kaldi/tools/openfst/include -Wno-sign-compare -g    -c -o arpa-to-const-arpa.o arpa-to-const-arpa.cc
g++ -rdynamic -Wl,-rpath=/home/josh/Desktop/kaldi/tools/openfst/lib  arpa-to-const-arpa.o ../lm/kaldi-lm.a ../util/kaldi-util.a ../base/kaldi-base.a   -L/home/josh/Desktop/kaldi/tools/openfst/lib -lfst /usr/lib/libatlas.so.3 /usr/lib/libf77blas.so.3 /usr/lib/libcblas.so.3 /usr/lib/liblapack_atlas.so.3 -lm -lpthread -ldl -o arpa-to-const-arpa
make[1]: Leaving directory `/home/josh/Desktop/kaldi/src/lmbin'
echo Done
Done

If you’ve gotten to this point without any hiccups, you should now have a working installation of Kaldi!
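Before we test a real recipe, here’s a quick sanity check I like (my own habit, not an official step): every compiled Kaldi binary prints a usage message when run with no arguments, so running any one of them tells you immediately whether the build produced working executables.

cd ~/Desktop/kaldi/src
./featbin/compute-mfcc-feats     # should print a usage message
make test                        # optional: Kaldi's own test suite (slow)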


Testing Kaldi Out

The YESNO Example Recipe

To make sure our install worked well, we can take advantage of the examples provided in the kaldi/egs/ directory:

josh@yoga:~/Desktop/kaldi/src$ cd ../egs/
josh@yoga:~/Desktop/kaldi/egs$ la
ami                chime1                   fisher_english  librispeech  sprakbanken  tidigits      yesno
aspire             chime2                   fisher_swbd     lre          sre08        timit
aurora4            chime3                   gale_arabic     lre07        sre10        voxforge
babel              csj                      gale_mandarin   README.txt   swahili      vystadial_cz
bn_music_speech    farsdat                  gp              reverb       swbd         vystadial_en
callhome_egyptian  fisher_callhome_spanish  hkust           rm           tedlium      wsj

Let’s take a look at the README.txt file:

josh@yoga:~/Desktop/kaldi/egs$ cat README.txt 

This directory contains example scripts that demonstrate how to 
use Kaldi.  Each subdirectory corresponds to a corpus that we have
example scripts for.

Note: we now have some scripts using free data, including voxforge,
vystadial_{cz,en} and yesno.  Most of the others are available from
the Linguistic Data Consortium (LDC), which requires money (unless you
have a membership).

If you have an LDC membership, probably rm/s5 or wsj/s5 should be your first
choice to try out the scripts.

Since we can try out yesno off the shelf (the WAV files are downloaded when you run the run.sh script), we’re going to go with that one.

josh@yoga:~/Desktop/kaldi/egs$ cd yesno/
josh@yoga:~/Desktop/kaldi/egs/yesno$ la
README.txt  s5
josh@yoga:~/Desktop/kaldi/egs/yesno$ cat README.txt 


The "yesno" corpus is a very small dataset of recordings of one individual
saying yes or no multiple times per recording, in Hebrew.  It is available from
http://www.openslr.org/1.
It is mainly included here as an easy way to test out the Kaldi scripts.

The test set is perfectly recognized at the monophone stage, so the dataset is
not exactly challenging.

The scripts are in s5/.


Pre-Training File Structure

To get a clearer picture of the file structure, I like to use the tree command, which displays the file structure as a tree with indented branches. You might have to install tree, but I’d say it’s worth it.
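If you don’t have it, on Debian/Ubuntu it’s typically a one-liner:

sudo apt-get install tree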

josh@yoga:~/Desktop/kaldi/egs/yesno$ tree .
.
β”œβ”€β”€ README.txt
└── s5
    β”œβ”€β”€ conf
    β”‚   β”œβ”€β”€ mfcc.conf
    β”‚   └── topo_orig.proto
    β”œβ”€β”€ input
    β”‚   β”œβ”€β”€ lexicon_nosil.txt
    β”‚   β”œβ”€β”€ lexicon.txt
    β”‚   β”œβ”€β”€ phones.txt
    β”‚   └── task.arpabo
    β”œβ”€β”€ local
    β”‚   β”œβ”€β”€ create_yesno_txt.pl
    β”‚   β”œβ”€β”€ create_yesno_waves_test_train.pl
    β”‚   β”œβ”€β”€ create_yesno_wav_scp.pl
    β”‚   β”œβ”€β”€ prepare_data.sh
    β”‚   β”œβ”€β”€ prepare_dict.sh
    β”‚   β”œβ”€β”€ prepare_lm.sh
    β”‚   └── score.sh
    β”œβ”€β”€ path.sh
    β”œβ”€β”€ run.sh
    β”œβ”€β”€ steps -> ../../wsj/s5/steps
    └── utils -> ../../wsj/s5/utils

6 directories, 16 files

These original directories contain general information about the language (in the input/ dir), instructions for preparing the data and scoring results (in the local/ dir), and information about the kind of model we want to train and test (in the conf/ dir).

More big-picture scripts (e.g. training monophones, extracting MFCCs from WAV files, etc.) live in the steps/ and utils/ dirs. Since these scripts generalize across corpora, Kaldi keeps a single copy of them in the Wall Street Journal example (wsj/s5), and every other example dir (like yesno) just has symbolic links pointing there.
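(Those links are plain symlinks, by the way. If you ever set up your own recipe directory, you’d create the same two links yourself; a sketch, with a hypothetical recipe dir at the usual depth under egs/:)

cd egs/my_corpus/s5               # hypothetical recipe directory
ln -s ../../wsj/s5/steps steps
ln -s ../../wsj/s5/utils utils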


Data Prep & Training & Testing: The run.sh Script

Now let’s cd into the s5/ directory (which holds all the relevant scripts and data for running this example) and run the run.sh script.

josh@yoga:~/Desktop/kaldi/egs/yesno$ cd s5/
josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ ./run.sh 
--2016-02-08 18:42:03--  http://www.openslr.org/resources/1/waves_yesno.tar.gz
Resolving www.openslr.org (www.openslr.org)... 107.178.217.247
Connecting to www.openslr.org (www.openslr.org)|107.178.217.247|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4703754 (4.5M) [application/x-gzip]
Saving to: 'waves_yesno.tar.gz'

100%[================================================================>] 4,703,754    630KB/s   in 6.9s   

2016-02-08 18:42:10 (661 KB/s) - 'waves_yesno.tar.gz' saved [4703754/4703754]

waves_yesno/
waves_yesno/1_0_0_0_0_0_1_1.wav
waves_yesno/1_1_0_0_1_0_1_0.wav
waves_yesno/1_0_1_1_1_1_0_1.wav
waves_yesno/1_1_1_1_0_1_0_0.wav
waves_yesno/0_0_1_1_1_0_0_0.wav
                .
                .
                .
waves_yesno/0_0_0_1_0_1_1_0.wav
waves_yesno/1_1_1_1_1_1_0_0.wav
waves_yesno/0_0_0_0_1_1_1_1.wav
Preparing train and test data
Dictionary preparation succeeded
Checking data/local/dict/silence_phones.txt ...
--> reading data/local/dict/silence_phones.txt
--> data/local/dict/silence_phones.txt is OK

Checking data/local/dict/optional_silence.txt ...
--> reading data/local/dict/optional_silence.txt
--> data/local/dict/optional_silence.txt is OK

Checking data/local/dict/nonsilence_phones.txt ...
--> reading data/local/dict/nonsilence_phones.txt
--> data/local/dict/nonsilence_phones.txt is OK
                 .
                 .
                 .
steps/train_mono.sh: Initializing monophone system.
steps/train_mono.sh: Compiling training graphs
steps/train_mono.sh: Aligning data equally (pass 0)
steps/train_mono.sh: Pass 1
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 2
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 3
                 .
                 .
                 .
0.755859 -0.000430956
HCLGa is not stochastic
add-self-loops --self-loop-scale=0.1 --reorder=true exp/mono0a/final.mdl 
steps/decode.sh --nj 1 --cmd utils/run.pl exp/mono0a/graph_tgpr data/test_yesno exp/mono0a/decode_test_yesno
** split_data.sh: warning, #lines is (utt2spk,feats.scp) is (31,29); you can 
**  use utils/fix_data_dir.sh data/test_yesno to fix this.
decode.sh: feature type is delta
%WER 0.00 [ 0 / 232, 0 ins, 0 del, 0 sub ] [PARTIAL] exp/mono0a/decode_test_yesno/wer_10

You can see from the last line of output that, as the README warned us, this data set is not exactly challenging: we get perfect performance, with a Word Error Rate of exactly 0.00%.
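To unpack that scoring line: the bracketed part reads [ errors / total-words, insertions, deletions, substitutions ], and WER is simply the error counts over the number of reference words:

WER = 100 * (ins + del + sub) / N = 100 * (0 + 0 + 0) / 232 = 0.00%

So: 232 reference words in the test set, zero errors of any kind.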


Post-Training & Testing File Structure

If we take another look at the yesno dir, we will see that our run.sh file generated some more directories and files for us.

I’m going to use the tree command below with the -d flag so we only see directories. Otherwise, all the downloaded WAV files get listed, and it’s a little much.

josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ la
conf  data  exp  input  local  mfcc  path.sh  run.sh  steps  utils  waves_yesno  waves_yesno.tar.gz
josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree -d .
.
|-- conf
|-- data
|   |-- lang
|   |   `-- phones
|   |-- lang_test_tg
|   |   |-- phones
|   |   `-- tmp
|   |-- local
|   |   |-- dict
|   |   `-- lang
|   |-- test_yesno
|   |   `-- split1
|   |       `-- 1
|   `-- train_yesno
|       `-- split1
|           `-- 1
|-- exp
|   |-- make_mfcc
|   |   |-- test_yesno
|   |   `-- train_yesno
|   `-- mono0a
|       |-- decode_test_yesno
|       |   |-- log
|       |   `-- scoring
|       |       `-- log
|       |-- graph_tgpr
|       |   `-- phones
|       `-- log
|-- input
|-- local
|-- mfcc
|-- steps -> ../../wsj/s5/steps
|-- utils -> ../../wsj/s5/utils
`-- waves_yesno

34 directories

Walking down the subdirs, we can see that the three original dirs were left unchanged:

josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree ./conf/
./conf/
|-- mfcc.conf
`-- topo_orig.proto

0 directories, 2 files
josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree ./input/
./input/
|-- lexicon.txt
|-- lexicon_nosil.txt
|-- phones.txt
`-- task.arpabo

0 directories, 4 files
josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree ./local/
./local/
|-- create_yesno_txt.pl
|-- create_yesno_wav_scp.pl
|-- create_yesno_waves_test_train.pl
|-- prepare_data.sh
|-- prepare_dict.sh
|-- prepare_lm.sh
`-- score.sh

0 directories, 7 files

These are unchanged because, as noted above, they house the general information about the language (input/), the data preparation and scoring scripts (local/), and the model configuration (conf/).

Logically, nothing about these files and directories should change after we train and test the model.

However, the newly created data/ directory has a lot of new stuff in it. In general, this directory houses and organizes the files which describe the language (e.g. dictionary, phone lists, etc.) and the data (e.g. WAV file ids and their transcripts) used to train and test the model. I’ll sketch the most important file formats right after the tree.

josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree ./data/
./data/
β”œβ”€β”€ lang
β”‚   β”œβ”€β”€ L_disambig.fst
β”‚   β”œβ”€β”€ L.fst
β”‚   β”œβ”€β”€ oov.int
β”‚   β”œβ”€β”€ oov.txt
β”‚   β”œβ”€β”€ phones
β”‚   β”‚   β”œβ”€β”€ align_lexicon.int
β”‚   β”‚   β”œβ”€β”€ align_lexicon.txt
β”‚   β”‚   β”œβ”€β”€ context_indep.csl
β”‚   β”‚   β”œβ”€β”€ context_indep.int
β”‚   β”‚   β”œβ”€β”€ context_indep.txt
β”‚   β”‚   β”œβ”€β”€ disambig.csl
β”‚   β”‚   β”œβ”€β”€ disambig.int
β”‚   β”‚   β”œβ”€β”€ disambig.txt
β”‚   β”‚   β”œβ”€β”€ extra_questions.int
β”‚   β”‚   β”œβ”€β”€ extra_questions.txt
β”‚   β”‚   β”œβ”€β”€ nonsilence.csl
β”‚   β”‚   β”œβ”€β”€ nonsilence.int
β”‚   β”‚   β”œβ”€β”€ nonsilence.txt
β”‚   β”‚   β”œβ”€β”€ optional_silence.csl
β”‚   β”‚   β”œβ”€β”€ optional_silence.int
β”‚   β”‚   β”œβ”€β”€ optional_silence.txt
β”‚   β”‚   β”œβ”€β”€ roots.int
β”‚   β”‚   β”œβ”€β”€ roots.txt
β”‚   β”‚   β”œβ”€β”€ sets.int
β”‚   β”‚   β”œβ”€β”€ sets.txt
β”‚   β”‚   β”œβ”€β”€ silence.csl
β”‚   β”‚   β”œβ”€β”€ silence.int
β”‚   β”‚   β”œβ”€β”€ silence.txt
β”‚   β”‚   β”œβ”€β”€ wdisambig_phones.int
β”‚   β”‚   β”œβ”€β”€ wdisambig.txt
β”‚   β”‚   └── wdisambig_words.int
β”‚   β”œβ”€β”€ phones.txt
β”‚   β”œβ”€β”€ topo
β”‚   └── words.txt
β”œβ”€β”€ lang_test_tg
β”‚   β”œβ”€β”€ G.fst
β”‚   β”œβ”€β”€ L_disambig.fst
β”‚   β”œβ”€β”€ L.fst
β”‚   β”œβ”€β”€ oov.int
β”‚   β”œβ”€β”€ oov.txt
β”‚   β”œβ”€β”€ phones
β”‚   β”‚   β”œβ”€β”€ align_lexicon.int
β”‚   β”‚   β”œβ”€β”€ align_lexicon.txt
β”‚   β”‚   β”œβ”€β”€ context_indep.csl
β”‚   β”‚   β”œβ”€β”€ context_indep.int
β”‚   β”‚   β”œβ”€β”€ context_indep.txt
β”‚   β”‚   β”œβ”€β”€ disambig.csl
β”‚   β”‚   β”œβ”€β”€ disambig.int
β”‚   β”‚   β”œβ”€β”€ disambig.txt
β”‚   β”‚   β”œβ”€β”€ extra_questions.int
β”‚   β”‚   β”œβ”€β”€ extra_questions.txt
β”‚   β”‚   β”œβ”€β”€ nonsilence.csl
β”‚   β”‚   β”œβ”€β”€ nonsilence.int
β”‚   β”‚   β”œβ”€β”€ nonsilence.txt
β”‚   β”‚   β”œβ”€β”€ optional_silence.csl
β”‚   β”‚   β”œβ”€β”€ optional_silence.int
β”‚   β”‚   β”œβ”€β”€ optional_silence.txt
β”‚   β”‚   β”œβ”€β”€ roots.int
β”‚   β”‚   β”œβ”€β”€ roots.txt
β”‚   β”‚   β”œβ”€β”€ sets.int
β”‚   β”‚   β”œβ”€β”€ sets.txt
β”‚   β”‚   β”œβ”€β”€ silence.csl
β”‚   β”‚   β”œβ”€β”€ silence.int
β”‚   β”‚   β”œβ”€β”€ silence.txt
β”‚   β”‚   β”œβ”€β”€ wdisambig_phones.int
β”‚   β”‚   β”œβ”€β”€ wdisambig.txt
β”‚   β”‚   └── wdisambig_words.int
β”‚   β”œβ”€β”€ phones.txt
β”‚   β”œβ”€β”€ tmp
β”‚   β”‚   β”œβ”€β”€ CLG_1_0.fst
β”‚   β”‚   β”œβ”€β”€ disambig_ilabels_1_0.int
β”‚   β”‚   β”œβ”€β”€ ilabels_1_0
β”‚   β”‚   └── LG.fst
β”‚   β”œβ”€β”€ topo
β”‚   └── words.txt
β”œβ”€β”€ local
β”‚   β”œβ”€β”€ dict
β”‚   β”‚   β”œβ”€β”€ lexiconp.txt
β”‚   β”‚   β”œβ”€β”€ lexicon.txt
β”‚   β”‚   β”œβ”€β”€ lexicon_words.txt
β”‚   β”‚   β”œβ”€β”€ nonsilence_phones.txt
β”‚   β”‚   β”œβ”€β”€ optional_silence.txt
β”‚   β”‚   └── silence_phones.txt
β”‚   β”œβ”€β”€ lang
β”‚   β”‚   β”œβ”€β”€ align_lexicon.txt
β”‚   β”‚   β”œβ”€β”€ lexiconp_disambig.txt
β”‚   β”‚   β”œβ”€β”€ lexiconp.txt
β”‚   β”‚   β”œβ”€β”€ lex_ndisambig
β”‚   β”‚   β”œβ”€β”€ phone_map.txt
β”‚   β”‚   └── phones
β”‚   β”œβ”€β”€ lm_tg.arpa
β”‚   β”œβ”€β”€ test_yesno.txt
β”‚   β”œβ”€β”€ test_yesno_wav.scp
β”‚   β”œβ”€β”€ train_yesno.txt
β”‚   β”œβ”€β”€ train_yesno_wav.scp
β”‚   β”œβ”€β”€ waves_all.list
β”‚   β”œβ”€β”€ waves.test
β”‚   └── waves.train
β”œβ”€β”€ test_yesno
β”‚   β”œβ”€β”€ cmvn.scp
β”‚   β”œβ”€β”€ feats.scp
β”‚   β”œβ”€β”€ spk2utt
β”‚   β”œβ”€β”€ split1
β”‚   β”‚   └── 1
β”‚   β”‚       β”œβ”€β”€ cmvn.scp
β”‚   β”‚       β”œβ”€β”€ feats.scp
β”‚   β”‚       β”œβ”€β”€ spk2utt
β”‚   β”‚       β”œβ”€β”€ text
β”‚   β”‚       β”œβ”€β”€ utt2spk
β”‚   β”‚       └── wav.scp
β”‚   β”œβ”€β”€ text
β”‚   β”œβ”€β”€ utt2spk
β”‚   └── wav.scp
└── train_yesno
    β”œβ”€β”€ cmvn.scp
    β”œβ”€β”€ feats.scp
    β”œβ”€β”€ spk2utt
    β”œβ”€β”€ split1
    β”‚   └── 1
    β”‚       β”œβ”€β”€ cmvn.scp
    β”‚       β”œβ”€β”€ feats.scp
    β”‚       β”œβ”€β”€ spk2utt
    β”‚       β”œβ”€β”€ text
    β”‚       β”œβ”€β”€ utt2spk
    β”‚       └── wav.scp
    β”œβ”€β”€ text
    β”œβ”€β”€ utt2spk
    └── wav.scp

14 directories, 115 files
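The files doing the heavy lifting in test_yesno/ and train_yesno/ are plain text, one utterance per line. Here’s a sketch of the three core formats using a made-up yesno utterance (the filename itself encodes the transcript, with 0 = NO and 1 = YES; the exact path and speaker id are illustrative):

# text: <utterance-id> <transcript>
0_0_1_0_1_0_1_1 NO NO YES NO YES NO YES YES

# wav.scp: <utterance-id> <path to audio, or a command that writes WAV to stdout>
0_0_1_0_1_0_1_1 waves_yesno/0_0_1_0_1_0_1_1.wav

# utt2spk: <utterance-id> <speaker-id> (yesno has one speaker, so this column is constant)
0_0_1_0_1_0_1_1 global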

The next directory created by the run.sh script is the exp/ directory. As far as I can gather, "exp" is short for "experiment", because the exp/ dir holds information about the model you’re training and testing. It has a lot of files, as you can see below, and a lot of them (if not most) are .log files.

I think that Kaldi could have more transparent naming conventions for files and directories, but I will say that the log files are very thorough. There’s a lot of info to be found if you do some digging (I’ll show one example right after the tree).

josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree ./exp/
./exp/
β”œβ”€β”€ make_mfcc
β”‚   β”œβ”€β”€ test_yesno
β”‚   β”‚   β”œβ”€β”€ cmvn_test_yesno.log
β”‚   β”‚   └── make_mfcc_test_yesno.1.log
β”‚   └── train_yesno
β”‚       β”œβ”€β”€ cmvn_train_yesno.log
β”‚       └── make_mfcc_train_yesno.1.log
└── mono0a
    β”œβ”€β”€ 0.mdl
    β”œβ”€β”€ 40.mdl
    β”œβ”€β”€ 40.occs
    β”œβ”€β”€ ali.1.gz
    β”œβ”€β”€ cmvn_opts
    β”œβ”€β”€ decode_test_yesno
    β”‚   β”œβ”€β”€ lat.1.gz
    β”‚   β”œβ”€β”€ log
    β”‚   β”‚   └── decode.1.log
    β”‚   β”œβ”€β”€ num_jobs
    β”‚   β”œβ”€β”€ scoring
    β”‚   β”‚   β”œβ”€β”€ 10.tra
    β”‚   β”‚   β”œβ”€β”€ 11.tra
    β”‚   β”‚   β”œβ”€β”€ 9.tra
    β”‚   β”‚   β”œβ”€β”€ log
    β”‚   β”‚   β”‚   β”œβ”€β”€ best_path.10.log
    β”‚   β”‚   β”‚   β”œβ”€β”€ best_path.11.log
    β”‚   β”‚   β”‚   β”œβ”€β”€ best_path.9.log
    β”‚   β”‚   β”‚   β”œβ”€β”€ score.10.log
    β”‚   β”‚   β”‚   β”œβ”€β”€ score.11.log
    β”‚   β”‚   β”‚   └── score.9.log
    β”‚   β”‚   └── test_filt.txt
    β”‚   β”œβ”€β”€ wer_10
    β”‚   β”œβ”€β”€ wer_11
    β”‚   └── wer_9
    β”œβ”€β”€ final.mdl -> 40.mdl
    β”œβ”€β”€ final.occs -> 40.occs
    β”œβ”€β”€ fsts.1.gz
    β”œβ”€β”€ graph_tgpr
    β”‚   β”œβ”€β”€ disambig_tid.int
    β”‚   β”œβ”€β”€ Ha.fst
    β”‚   β”œβ”€β”€ HCLGa.fst
    β”‚   β”œβ”€β”€ HCLG.fst
    β”‚   β”œβ”€β”€ num_pdfs
    β”‚   β”œβ”€β”€ phones
    β”‚   β”‚   β”œβ”€β”€ align_lexicon.int
    β”‚   β”‚   β”œβ”€β”€ align_lexicon.txt
    β”‚   β”‚   β”œβ”€β”€ disambig.int
    β”‚   β”‚   β”œβ”€β”€ disambig.txt
    β”‚   β”‚   └── silence.csl
    β”‚   β”œβ”€β”€ phones.txt
    β”‚   └── words.txt
    β”œβ”€β”€ log
    β”‚   β”œβ”€β”€ acc.10.1.log
    β”‚   β”œβ”€β”€ acc.11.1.log
    β”‚   β”œβ”€β”€ acc.1.1.log
    β”‚   β”œβ”€β”€ acc.12.1.log
    β”‚   β”œβ”€β”€ acc.13.1.log
    β”‚   β”œβ”€β”€ acc.14.1.log
    β”‚   β”œβ”€β”€ acc.15.1.log
    β”‚   β”œβ”€β”€ acc.16.1.log
    β”‚   β”œβ”€β”€ acc.17.1.log
    β”‚   β”œβ”€β”€ acc.18.1.log
    β”‚   β”œβ”€β”€ acc.19.1.log
    β”‚   β”œβ”€β”€ acc.20.1.log
    β”‚   β”œβ”€β”€ acc.21.1.log
    β”‚   β”œβ”€β”€ acc.2.1.log
    β”‚   β”œβ”€β”€ acc.22.1.log
    β”‚   β”œβ”€β”€ acc.23.1.log
    β”‚   β”œβ”€β”€ acc.24.1.log
    β”‚   β”œβ”€β”€ acc.25.1.log
    β”‚   β”œβ”€β”€ acc.26.1.log
    β”‚   β”œβ”€β”€ acc.27.1.log
    β”‚   β”œβ”€β”€ acc.28.1.log
    β”‚   β”œβ”€β”€ acc.29.1.log
    β”‚   β”œβ”€β”€ acc.30.1.log
    β”‚   β”œβ”€β”€ acc.31.1.log
    β”‚   β”œβ”€β”€ acc.3.1.log
    β”‚   β”œβ”€β”€ acc.32.1.log
    β”‚   β”œβ”€β”€ acc.33.1.log
    β”‚   β”œβ”€β”€ acc.34.1.log
    β”‚   β”œβ”€β”€ acc.35.1.log
    β”‚   β”œβ”€β”€ acc.36.1.log
    β”‚   β”œβ”€β”€ acc.37.1.log
    β”‚   β”œβ”€β”€ acc.38.1.log
    β”‚   β”œβ”€β”€ acc.39.1.log
    β”‚   β”œβ”€β”€ acc.4.1.log
    β”‚   β”œβ”€β”€ acc.5.1.log
    β”‚   β”œβ”€β”€ acc.6.1.log
    β”‚   β”œβ”€β”€ acc.7.1.log
    β”‚   β”œβ”€β”€ acc.8.1.log
    β”‚   β”œβ”€β”€ acc.9.1.log
    β”‚   β”œβ”€β”€ align.0.1.log
    β”‚   β”œβ”€β”€ align.10.1.log
    β”‚   β”œβ”€β”€ align.1.1.log
    β”‚   β”œβ”€β”€ align.12.1.log
    β”‚   β”œβ”€β”€ align.14.1.log
    β”‚   β”œβ”€β”€ align.16.1.log
    β”‚   β”œβ”€β”€ align.18.1.log
    β”‚   β”œβ”€β”€ align.20.1.log
    β”‚   β”œβ”€β”€ align.2.1.log
    β”‚   β”œβ”€β”€ align.23.1.log
    β”‚   β”œβ”€β”€ align.26.1.log
    β”‚   β”œβ”€β”€ align.29.1.log
    β”‚   β”œβ”€β”€ align.3.1.log
    β”‚   β”œβ”€β”€ align.32.1.log
    β”‚   β”œβ”€β”€ align.35.1.log
    β”‚   β”œβ”€β”€ align.38.1.log
    β”‚   β”œβ”€β”€ align.4.1.log
    β”‚   β”œβ”€β”€ align.5.1.log
    β”‚   β”œβ”€β”€ align.6.1.log
    β”‚   β”œβ”€β”€ align.7.1.log
    β”‚   β”œβ”€β”€ align.8.1.log
    β”‚   β”œβ”€β”€ align.9.1.log
    β”‚   β”œβ”€β”€ compile_graphs.1.log
    β”‚   β”œβ”€β”€ init.log
    β”‚   β”œβ”€β”€ update.0.log
    β”‚   β”œβ”€β”€ update.10.log
    β”‚   β”œβ”€β”€ update.11.log
    β”‚   β”œβ”€β”€ update.12.log
    β”‚   β”œβ”€β”€ update.13.log
    β”‚   β”œβ”€β”€ update.14.log
    β”‚   β”œβ”€β”€ update.15.log
    β”‚   β”œβ”€β”€ update.16.log
    β”‚   β”œβ”€β”€ update.17.log
    β”‚   β”œβ”€β”€ update.18.log
    β”‚   β”œβ”€β”€ update.19.log
    β”‚   β”œβ”€β”€ update.1.log
    β”‚   β”œβ”€β”€ update.20.log
    β”‚   β”œβ”€β”€ update.21.log
    β”‚   β”œβ”€β”€ update.22.log
    β”‚   β”œβ”€β”€ update.23.log
    β”‚   β”œβ”€β”€ update.24.log
    β”‚   β”œβ”€β”€ update.25.log
    β”‚   β”œβ”€β”€ update.26.log
    β”‚   β”œβ”€β”€ update.27.log
    β”‚   β”œβ”€β”€ update.28.log
    β”‚   β”œβ”€β”€ update.29.log
    β”‚   β”œβ”€β”€ update.2.log
    β”‚   β”œβ”€β”€ update.30.log
    β”‚   β”œβ”€β”€ update.31.log
    β”‚   β”œβ”€β”€ update.32.log
    β”‚   β”œβ”€β”€ update.33.log
    β”‚   β”œβ”€β”€ update.34.log
    β”‚   β”œβ”€β”€ update.35.log
    β”‚   β”œβ”€β”€ update.36.log
    β”‚   β”œβ”€β”€ update.37.log
    β”‚   β”œβ”€β”€ update.38.log
    β”‚   β”œβ”€β”€ update.39.log
    β”‚   β”œβ”€β”€ update.3.log
    β”‚   β”œβ”€β”€ update.4.log
    β”‚   β”œβ”€β”€ update.5.log
    β”‚   β”œβ”€β”€ update.6.log
    β”‚   β”œβ”€β”€ update.7.log
    β”‚   β”œβ”€β”€ update.8.log
    β”‚   └── update.9.log
    β”œβ”€β”€ num_jobs
    └── tree

11 directories, 145 files
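One example of that digging: the decode directory keeps a wer_N file per language-model weight (here wer_9, wer_10, and wer_11), each holding a %WER line like the one run.sh printed at the end. Comparing them all is a one-liner:

grep WER exp/mono0a/decode_test_yesno/wer_*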

The last directory created by the run.sh script isn’t super interesting, but it’s essential. This is the mfcc/ dir, which holds all the .ark (archive) and .scp (script) files for (1) the MFCC features and (2) the per-speaker cepstral mean and variance statistics.

josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree ./mfcc/
./mfcc/
β”œβ”€β”€ cmvn_test_yesno.ark
β”œβ”€β”€ cmvn_test_yesno.scp
β”œβ”€β”€ cmvn_train_yesno.ark
β”œβ”€β”€ cmvn_train_yesno.scp
β”œβ”€β”€ raw_mfcc_test_yesno.1.ark
β”œβ”€β”€ raw_mfcc_test_yesno.1.scp
β”œβ”€β”€ raw_mfcc_train_yesno.1.ark
└── raw_mfcc_train_yesno.1.scp

0 directories, 8 files
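If you’re curious what’s actually inside those archives, Kaldi’s table tools will print them as text. A sketch, run from the s5/ dir (source path.sh first so the Kaldi binaries are on your PATH):

. ./path.sh
copy-feats scp:mfcc/raw_mfcc_test_yesno.1.scp ark,t:- | head    # MFCC matrices as readable text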


Conclusion

I hope this was helpful!

If you have comments or suggestions, you can always leave them below.

Happy Kaldi-ing!