👋 Hi, it's Josh here. I'm writing you this note in 2021: the world of speech technology has changed dramatically since Kaldi. Before devoting weeks of your time to deploying Kaldi, take a look at 🐸 Coqui Speech-to-Text. It takes minutes to deploy an off-the-shelf 🐸 STT model, and it's open source on GitHub. I'm on the Coqui founding team, so I'm admittedly biased. However, you can tell from this blog that I've spent years working with Kaldi, so I understand the headaches.
With 🐸 STT, we've removed the headaches of Kaldi and streamlined everything for production settings. You can train and deploy state-of-the-art 🐸 Speech-to-Text models in just minutes, not weeks. Check out the 🐸 Model Zoo for open, pre-trained models in different languages. Try it out for yourself, and come join our friendly chatroom!
Installation via GitHub
Kaldi is primarily hosted on GitHub (not SourceForge anymore), so Iβm going to just clone the official GitHub repository to my Desktop and go from there.
josh@yoga:~/Desktop$ git clone https://github.com/kaldi-asr/kaldi.git
Cloning into 'kaldi'...
remote: Counting objects: 63320, done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 63320 (delta 5), reused 0 (delta 0), pack-reused 63298
Receiving objects: 100% (63320/63320), 74.94 MiB | 8.26 MiB/s, done.
Resolving deltas: 100% (49427/49427), done.
Checking connectivity... done.
Taking a look inside to see what I just cloned:
josh@yoga:~/Desktop$ cd kaldi/
josh@yoga:~/Desktop/kaldi$ la
COPYING .git .gitignore misc src .travis.yml
egs .gitattributes INSTALL README.md tools windows
Now, there's a lot of good official documentation for Kaldi, but I think the best install info will always be in the INSTALL file that ships with the latest version. So, let's take a look:
josh@yoga:~/Desktop/kaldi$ cat INSTALL
This is the official Kaldi INSTALL. Look also at INSTALL.md for the git mirror installation.
[for native Windows install, see windows/INSTALL]
(1)
go to tools/ and follow INSTALL instructions there.
(2)
go to src/ and follow INSTALL instructions there.
First things first, it says to go to tools/ and follow those instructions. So, let's cd into tools/ and see what's there:
josh@yoga:~/Desktop/kaldi$ cd tools/
josh@yoga:~/Desktop/kaldi/tools$ la
CLAPACK INSTALL install_pfile_utils.sh install_speex.sh Makefile
extras install_atlas.sh install_portaudio.sh install_srilm.sh
Looking into the INSTALL file, we see:
josh@yoga:~/Desktop/kaldi/tools$ cat INSTALL
To install the most important prerequisites for Kaldi:
first do
extras/check_dependencies.sh
to see if there are any system-level installations or modifications you need to do.
Check the output carefully: there are some things that will make your life a lot
easier if you fix them at this stage.
Then run
make
If you have multiple CPUs and want to speed things up, you can do a parallel
build by supplying the "-j" option to make, e.g. to use 4 CPUs:
make -j 4
By default, Kaldi builds against OpenFst-1.3.4. If you want to build against
OpenFst-1.4, edit the Makefile in this folder. Note that this change requires
a relatively new compiler with C++11 support, e.g. gcc >= 4.6, clang >= 3.0.
In extras/, there are also various scripts to install extra bits and pieces that
are used by individual example scripts. If an example script needs you to run
one of those scripts, it will tell you what to do.
So, first we need to check our dependencies:
josh@yoga:~/Desktop/kaldi/tools$ extras/check_dependencies.sh
extras/check_dependencies.sh: all OK.
I'm OK on this one, but I have a feeling others will need to install some dependencies before they move on. I'd recommend re-running that check_dependencies.sh script after you do your installs, to make sure you actually installed what you needed and that it's in the right spot.
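If the script does complain, it will name the missing tools directly. On a Debian/Ubuntu box, the usual suspects can be pulled in with something along these lines (just a sketch; install exactly what check_dependencies.sh asks for on your system, since the list varies between Kaldi versions):
# Typical build dependencies on Debian/Ubuntu; adjust to whatever the script reports
sudo apt-get update
sudo apt-get install build-essential automake autoconf libtool zlib1g-dev sox subversion wget
# Then re-run the check until it says "all OK."
extras/check_dependencies.sh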
Moving along, we need to run make. There's an option here for parallelizing this step, so I'm going to check how many processors I have:
josh@yoga:~/Desktop$ nproc
4
So I can run make on all 4 of my processors like this:
josh@yoga:~/Desktop/kaldi/tools$ make -j 4
.
.
.
make[3]: Entering directory `/home/josh/Desktop/kaldi/tools/openfst-1.3.4'
make[3]: Nothing to be done for `install-exec-am'.
make[3]: Nothing to be done for `install-data-am'.
make[3]: Leaving directory `/home/josh/Desktop/kaldi/tools/openfst-1.3.4'
make[2]: Leaving directory `/home/josh/Desktop/kaldi/tools/openfst-1.3.4'
make[1]: Leaving directory `/home/josh/Desktop/kaldi/tools/openfst-1.3.4'
rm -f openfst
ln -s openfst-1.3.4 openfst
Warning: IRSTLM is not installed by default anymore. If you need IRSTLM
Warning: use the script extras/install_irstlm.sh
All done OK.
josh@yoga:~/Desktop/kaldi/tools$
Those last lines recommend installing the IRSTLM language modeling toolkit, and I want to make my own language models, so I'm going to install it. If you're using some pre-existing language model, you can skip these next few steps.
josh@yoga:~/Desktop/kaldi/tools$ extras/install_irstlm.sh
.
.
.
make[1]: Entering directory `/home/josh/Desktop/kaldi/tools/irstlm'
make[2]: Entering directory `/home/josh/Desktop/kaldi/tools/irstlm'
make[2]: Nothing to be done for `install-exec-am'.
make[2]: Nothing to be done for `install-data-am'.
make[2]: Leaving directory `/home/josh/Desktop/kaldi/tools/irstlm'
make[1]: Leaving directory `/home/josh/Desktop/kaldi/tools/irstlm'
readlink: missing operand
Try 'readlink --help' for more information.
*** () Installation of IRSTLM finished successfully
*** () Please source the tools/env.sh in your path.sh to enable it
Now we should have a working installation of IRSTLM on the computer, and you can verify by looking into /usr/local:
josh@yoga:~/Desktop/kaldi/tools$ cd /usr/local/
josh@yoga:/usr/local$ ls
bin etc games include irstlm lib libexec man MATLAB sbin share src
josh@yoga:/usr/local$ ls irstlm/
bin include lib
We don't have to do anything else with IRSTLM right now because we're just installing. But it'll be there when you need it!
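When you do need it, remember the installer's note above: the IRSTLM binaries get picked up by sourcing tools/env.sh from your recipe's path.sh. Roughly, it looks like the sketch below; the build-lm.sh call is a hypothetical example, and its flags can differ between IRSTLM versions, so check the IRSTLM documentation before relying on it.
# In egs/<your-recipe>/s5/path.sh: put the IRSTLM tools on your PATH
# (KALDI_ROOT is wherever you cloned Kaldi).
export KALDI_ROOT=/home/josh/Desktop/kaldi
. $KALDI_ROOT/tools/env.sh
# Hypothetical example: estimate a 3-gram LM from a plain-text corpus.
build-lm.sh -i my_corpus.txt -n 3 -o my_lm.ilm.gz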
So, at this point we've done part (1) of the kaldi/INSTALL file (i.e. following the steps in the kaldi/tools/INSTALL file).
Now let's go on to step (2), and follow the instructions in kaldi/src/INSTALL.
josh@yoga:~/Desktop/kaldi/tools$ cd ../src/
josh@yoga:~/Desktop/kaldi/src$ la
base Doxyfile gmm ivector lm nnet2 online sgmm2 tree
bin feat gmmbin ivectorbin lmbin nnet2bin online2 sgmm2bin util
configure featbin gst-plugin kws Makefile nnet3 online2bin sgmmbin
cudamatrix fgmmbin hmm kwsbin makefiles nnet3bin onlinebin thread
decoder fstbin INSTALL lat matrix nnetbin probe TODO
doc fstext itf latbin nnet NOTES sgmm transform
Looking into the INSTALL file itself:
josh@yoga:~/Desktop/kaldi/src$ cat INSTALL
These instructions are valid for UNIX-like systems (these steps have
been run on various Linux distributions; Darwin; Cygwin). For native Windows
compilation, see ../windows/INSTALL.
You must first have completed the installation steps in ../tools/INSTALL
(compiling OpenFst; getting ATLAS and CLAPACK headers).
The installation instructions are:
./configure
make depend
make
Note that "make" takes a long time; you can speed it up by running make
in parallel if you have multiple CPUs, for instance
make depend -j 8
make -j 8
For more information, see documentation at http://kaldi-asr.org/doc/
and click on "The build process (how Kaldi is compiled)".
Like it says, the first step is to run the ./configure script:
josh@yoga:~/Desktop/kaldi/src$ ./configure
Configuring ...
Checking OpenFST library in /home/josh/Desktop/kaldi/tools/openfst ...
Checking OpenFst library was patched.
Doing OS specific configurations ...
On Linux: Checking for linear algebra header files ...
Using ATLAS as the linear algebra library.
Successfully configured for Debian/Ubuntu Linux [dynamic libraries] with ATLASLIBS=/usr/lib/libatlas.so.3 /usr/lib/libf77blas.so.3 /usr/lib/libcblas.so.3 /usr/lib/liblapack_atlas.so.3
CUDA will not be used! If you have already installed cuda drivers
and cuda toolkit, try using --cudatk-dir=... option. Note: this is
only relevant for neural net experiments
Static=[false] Speex library not found: You can still build Kaldi without Speex.
SUCCESS
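That SUCCESS line is what we're after. If you have a CUDA-capable GPU and want Kaldi's neural-net code to use it, this is the point where you'd re-run configure and point it at your CUDA toolkit, along these lines (a sketch; the toolkit path is just an example and depends on your install):
# Hypothetical re-configure with CUDA support; adjust the path to your CUDA install,
# then re-run "make depend" and "make" as below.
./configure --cudatk-dir=/usr/local/cuda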
Now we run make depend:
josh@yoga:~/Desktop/kaldi/src$ make depend -j 4
.
.
.
make[1]: Leaving directory `/home/josh/Desktop/kaldi/src/online2'
make -C online2bin/ depend
make[1]: Entering directory `/home/josh/Desktop/kaldi/src/online2bin'
g++ -M -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS -I/home/josh/Desktop/kaldi/tools/ATLAS/include -I/home/josh/Desktop/kaldi/tools/openfst/include -g *.cc > .depend.mk
make[1]: Leaving directory `/home/josh/Desktop/kaldi/src/online2bin'
make -C lmbin/ depend
make[1]: Entering directory `/home/josh/Desktop/kaldi/src/lmbin'
g++ -M -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS -I/home/josh/Desktop/kaldi/tools/ATLAS/include -I/home/josh/Desktop/kaldi/tools/openfst/include -Wno-sign-compare -g *.cc > .depend.mk
make[1]: Leaving directory `/home/josh/Desktop/kaldi/src/lmbin'
And finally, make:
josh@yoga:~/Desktop/kaldi/src$ make -j 4
.
.
.
make -C lmbin
make[1]: Entering directory `/home/josh/Desktop/kaldi/src/lmbin'
g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Wno-unused-local-typedefs -Winit-self -DHAVE_EXECINFO_H=1 -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS -I/home/josh/Desktop/kaldi/tools/ATLAS/include -I/home/josh/Desktop/kaldi/tools/openfst/include -Wno-sign-compare -g -c -o arpa-to-const-arpa.o arpa-to-const-arpa.cc
g++ -rdynamic -Wl,-rpath=/home/josh/Desktop/kaldi/tools/openfst/lib arpa-to-const-arpa.o ../lm/kaldi-lm.a ../util/kaldi-util.a ../base/kaldi-base.a -L/home/josh/Desktop/kaldi/tools/openfst/lib -lfst /usr/lib/libatlas.so.3 /usr/lib/libf77blas.so.3 /usr/lib/libcblas.so.3 /usr/lib/liblapack_atlas.so.3 -lm -lpthread -ldl -o arpa-to-const-arpa
make[1]: Leaving directory `/home/josh/Desktop/kaldi/src/lmbin'
echo Done
Done
If you've gotten to this point without any hiccups, you should now have a working installation of Kaldi!
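As a quick (optional) sanity check, you can call a couple of the freshly built binaries with no arguments; Kaldi tools print their usage message when run like this, so if you see help text instead of a "command not found" or linker error, the build is in good shape. A small sketch:
# Run from kaldi/src; each tool prints its usage message and exits.
./featbin/compute-mfcc-feats
./bin/compute-wer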
Testing Kaldi Out
The YESNO Example Recipe
To make sure our install worked well, we can take advantage of the examples provided in the kaldi/egs/ directory:
josh@yoga:~/Desktop/kaldi/src$ cd ../egs/
josh@yoga:~/Desktop/kaldi/egs$ la
ami chime1 fisher_english librispeech sprakbanken tidigits yesno
aspire chime2 fisher_swbd lre sre08 timit
aurora4 chime3 gale_arabic lre07 sre10 voxforge
babel csj gale_mandarin README.txt swahili vystadial_cz
bn_music_speech farsdat gp reverb swbd vystadial_en
callhome_egyptian fisher_callhome_spanish hkust rm tedlium wsj
Let's take a look at the README.txt file:
josh@yoga:~/Desktop/kaldi/egs$ cat README.txt
This directory contains example scripts that demonstrate how to
use Kaldi. Each subdirectory corresponds to a corpus that we have
example scripts for.
Note: we now have some scripts using free data, including voxforge,
vystadial_{cz,en} and yesno. Most of the others are available from
the Linguistic Data Consortium (LDC), which requires money (unless you
have a membership).
If you have an LDC membership, probably rm/s5 or wsj/s5 should be your first
choice to try out the scripts.
Since we can try out yesno off the shelf (the WAV files are downloaded when you run the run.sh script), we're going to go with that one.
josh@yoga:~/Desktop/kaldi/egs$ cd yesno/
josh@yoga:~/Desktop/kaldi/egs/yesno$ la
README.txt s5
josh@yoga:~/Desktop/kaldi/egs/yesno$ cat README.txt
The "yesno" corpus is a very small dataset of recordings of one individual
saying yes or no multiple times per recording, in Hebrew. It is available from
http://www.openslr.org/1.
It is mainly included here as an easy way to test out the Kaldi scripts.
The test set is perfectly recognized at the monophone stage, so the dataset is
not exactly challenging.
The scripts are in s5/.
Pre-Training File Structure
To get a clearer picture of the file structure, I like to use the tree command, which displays the file structure as a tree with indented branches. You might have to install tree, but I'd say it's worth it.
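If you don't have it, it's a quick install on most systems (shown here for Debian/Ubuntu; use your distro's package manager otherwise):
sudo apt-get install tree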
josh@yoga:~/Desktop/kaldi/egs/yesno$ tree .
.
├── README.txt
└── s5
    ├── conf
    │   ├── mfcc.conf
    │   └── topo_orig.proto
    ├── input
    │   ├── lexicon_nosil.txt
    │   ├── lexicon.txt
    │   ├── phones.txt
    │   └── task.arpabo
    ├── local
    │   ├── create_yesno_txt.pl
    │   ├── create_yesno_waves_test_train.pl
    │   ├── create_yesno_wav_scp.pl
    │   ├── prepare_data.sh
    │   ├── prepare_dict.sh
    │   ├── prepare_lm.sh
    │   └── score.sh
    ├── path.sh
    ├── run.sh
    ├── steps -> ../../wsj/s5/steps
    └── utils -> ../../wsj/s5/utils
6 directories, 16 files
These original directories contain general information about the language (in the input/ dir), instructions for preparing and scoring the data (in the local/ dir), and information about the kind of model we want to train and test (in the conf/ dir).
More big-picture scripts (e.g. training monophones, extracting MFCCs from WAV files, etc.) live in the steps/ and utils/ dirs. Since these scripts generalize across corpora, Kaldi keeps a single copy of them inside the Wall Street Journal example (wsj/s5), and all other example dirs (like yesno) just have symbolic links to those dirs.
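For what it's worth, those links are plain relative symlinks. If you ever set up your own recipe directory, you'd recreate them with something like this (a sketch, assuming your recipe sits at the same depth under egs/ as yesno/s5):
# From inside your new recipe dir, e.g. kaldi/egs/my_corpus/s5
ln -s ../../wsj/s5/steps steps
ln -s ../../wsj/s5/utils utils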
Data Prep & Training & Testing: The run.sh Script
Now let's cd into the s5/ directory (which holds all the relevant scripts and data for running this example) and run the run.sh script.
josh@yoga:~/Desktop/kaldi/egs/yesno$ cd s5/
josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ ./run.sh
--2016-02-08 18:42:03-- http://www.openslr.org/resources/1/waves_yesno.tar.gz
Resolving www.openslr.org (www.openslr.org)... 107.178.217.247
Connecting to www.openslr.org (www.openslr.org)|107.178.217.247|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4703754 (4.5M) [application/x-gzip]
Saving to: 'waves_yesno.tar.gz'
100%[================================================================>] 4,703,754 630KB/s in 6.9s
2016-02-08 18:42:10 (661 KB/s) - 'waves_yesno.tar.gz' saved [4703754/4703754]
waves_yesno/
waves_yesno/1_0_0_0_0_0_1_1.wav
waves_yesno/1_1_0_0_1_0_1_0.wav
waves_yesno/1_0_1_1_1_1_0_1.wav
waves_yesno/1_1_1_1_0_1_0_0.wav
waves_yesno/0_0_1_1_1_0_0_0.wav
.
.
.
waves_yesno/0_0_0_1_0_1_1_0.wav
waves_yesno/1_1_1_1_1_1_0_0.wav
waves_yesno/0_0_0_0_1_1_1_1.wav
Preparing train and test data
Dictionary preparation succeeded
Checking data/local/dict/silence_phones.txt ...
--> reading data/local/dict/silence_phones.txt
--> data/local/dict/silence_phones.txt is OK
Checking data/local/dict/optional_silence.txt ...
--> reading data/local/dict/optional_silence.txt
--> data/local/dict/optional_silence.txt is OK
Checking data/local/dict/nonsilence_phones.txt ...
--> reading data/local/dict/nonsilence_phones.txt
--> data/local/dict/nonsilence_phones.txt is OK
.
.
.
steps/train_mono.sh: Initializing monophone system.
steps/train_mono.sh: Compiling training graphs
steps/train_mono.sh: Aligning data equally (pass 0)
steps/train_mono.sh: Pass 1
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 2
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 3
.
.
.
0.755859 -0.000430956
HCLGa is not stochastic
add-self-loops --self-loop-scale=0.1 --reorder=true exp/mono0a/final.mdl
steps/decode.sh --nj 1 --cmd utils/run.pl exp/mono0a/graph_tgpr data/test_yesno exp/mono0a/decode_test_yesno
** split_data.sh: warning, #lines is (utt2spk,feats.scp) is (31,29); you can
** use utils/fix_data_dir.sh data/test_yesno to fix this.
decode.sh: feature type is delta
%WER 0.00 [ 0 / 232, 0 ins, 0 del, 0 sub ] [PARTIAL] exp/mono0a/decode_test_yesno/wer_10
You can see from the last line of output that, as we were warned in the README, this data set is not exactly challenging: we get perfect performance, and our Word Error Rate was indeed 0.00%.
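For reference, that number is the standard Word Error Rate: (substitutions + deletions + insertions) divided by the number of words in the reference transcripts, so here it's (0 + 0 + 0) / 232 = 0.00%. The scoring script writes one wer_N file per language-model weight it tries, so you can pull all of them up at once (the exact set of weights, here 9, 10, and 11, depends on the scoring settings):
# Each wer_N file holds the %WER line for one LM-weight setting.
grep WER exp/mono0a/decode_test_yesno/wer_*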
Post-Training & Testing File Structure
If we take another look at the yesno dir, we will see that our run.sh script generated some more directories and files for us.
I'm going to use the tree command below with the -d flag so we only see directories. Otherwise, all the downloaded WAV files get listed, and it's a little much.
josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ la
conf data exp input local mfcc path.sh run.sh steps utils waves_yesno waves_yesno.tar.gz
josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree -d .
.
|-- conf
|-- data
|   |-- lang
|   |   `-- phones
|   |-- lang_test_tg
|   |   |-- phones
|   |   `-- tmp
|   |-- local
|   |   |-- dict
|   |   `-- lang
|   |-- test_yesno
|   |   `-- split1
|   |       `-- 1
|   `-- train_yesno
|       `-- split1
|           `-- 1
|-- exp
|   |-- make_mfcc
|   |   |-- test_yesno
|   |   `-- train_yesno
|   `-- mono0a
|       |-- decode_test_yesno
|       |   |-- log
|       |   `-- scoring
|       |       `-- log
|       |-- graph_tgpr
|       |   `-- phones
|       `-- log
|-- input
|-- local
|-- mfcc
|-- steps -> ../../wsj/s5/steps
|-- utils -> ../../wsj/s5/utils
`-- waves_yesno
34 directories
Walking down the subdirs, we can see that the three original dirs were left unchanged:
josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree ./conf/
./conf/
|-- mfcc.conf
`-- topo_orig.proto
0 directories, 2 files
josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree ./input/
./input/
|-- lexicon.txt
|-- lexicon_nosil.txt
|-- phones.txt
`-- task.arpabo
0 directories, 4 files
josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree ./local/
./local/
|-- create_yesno_txt.pl
|-- create_yesno_wav_scp.pl
|-- create_yesno_waves_test_train.pl
|-- prepare_data.sh
|-- prepare_dict.sh
|-- prepare_lm.sh
`-- score.sh
0 directories, 7 files
These are unchanged because these original directories house general information about the language (in the input/ dir), instructions for preparing and scoring the data (in the local/ dir), and information about the kind of model we want to train and test (in the conf/ dir).
Logically, nothing about these files and directories should change after we train and test the model.
However, the newly created data/ directory has a lot of new stuff in it. In general, this directory, created by the run.sh script, houses and organizes the files that describe the language (e.g. dictionary, phone lists, etc.) and the data (e.g. WAV file ids and their transcripts) used to train and test the model.
josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree ./data/
./data/
├── lang
│   ├── L_disambig.fst
│   ├── L.fst
│   ├── oov.int
│   ├── oov.txt
│   ├── phones
│   │   ├── align_lexicon.int
│   │   ├── align_lexicon.txt
│   │   ├── context_indep.csl
│   │   ├── context_indep.int
│   │   ├── context_indep.txt
│   │   ├── disambig.csl
│   │   ├── disambig.int
│   │   ├── disambig.txt
│   │   ├── extra_questions.int
│   │   ├── extra_questions.txt
│   │   ├── nonsilence.csl
│   │   ├── nonsilence.int
│   │   ├── nonsilence.txt
│   │   ├── optional_silence.csl
│   │   ├── optional_silence.int
│   │   ├── optional_silence.txt
│   │   ├── roots.int
│   │   ├── roots.txt
│   │   ├── sets.int
│   │   ├── sets.txt
│   │   ├── silence.csl
│   │   ├── silence.int
│   │   ├── silence.txt
│   │   ├── wdisambig_phones.int
│   │   ├── wdisambig.txt
│   │   └── wdisambig_words.int
│   ├── phones.txt
│   ├── topo
│   └── words.txt
├── lang_test_tg
│   ├── G.fst
│   ├── L_disambig.fst
│   ├── L.fst
│   ├── oov.int
│   ├── oov.txt
│   ├── phones
│   │   ├── align_lexicon.int
│   │   ├── align_lexicon.txt
│   │   ├── context_indep.csl
│   │   ├── context_indep.int
│   │   ├── context_indep.txt
│   │   ├── disambig.csl
│   │   ├── disambig.int
│   │   ├── disambig.txt
│   │   ├── extra_questions.int
│   │   ├── extra_questions.txt
│   │   ├── nonsilence.csl
│   │   ├── nonsilence.int
│   │   ├── nonsilence.txt
│   │   ├── optional_silence.csl
│   │   ├── optional_silence.int
│   │   ├── optional_silence.txt
│   │   ├── roots.int
│   │   ├── roots.txt
│   │   ├── sets.int
│   │   ├── sets.txt
│   │   ├── silence.csl
│   │   ├── silence.int
│   │   ├── silence.txt
│   │   ├── wdisambig_phones.int
│   │   ├── wdisambig.txt
│   │   └── wdisambig_words.int
│   ├── phones.txt
│   ├── tmp
│   │   ├── CLG_1_0.fst
│   │   ├── disambig_ilabels_1_0.int
│   │   ├── ilabels_1_0
│   │   └── LG.fst
│   ├── topo
│   └── words.txt
├── local
│   ├── dict
│   │   ├── lexiconp.txt
│   │   ├── lexicon.txt
│   │   ├── lexicon_words.txt
│   │   ├── nonsilence_phones.txt
│   │   ├── optional_silence.txt
│   │   └── silence_phones.txt
│   ├── lang
│   │   ├── align_lexicon.txt
│   │   ├── lexiconp_disambig.txt
│   │   ├── lexiconp.txt
│   │   ├── lex_ndisambig
│   │   ├── phone_map.txt
│   │   └── phones
│   ├── lm_tg.arpa
│   ├── test_yesno.txt
│   ├── test_yesno_wav.scp
│   ├── train_yesno.txt
│   ├── train_yesno_wav.scp
│   ├── waves_all.list
│   ├── waves.test
│   └── waves.train
├── test_yesno
│   ├── cmvn.scp
│   ├── feats.scp
│   ├── spk2utt
│   ├── split1
│   │   └── 1
│   │       ├── cmvn.scp
│   │       ├── feats.scp
│   │       ├── spk2utt
│   │       ├── text
│   │       ├── utt2spk
│   │       └── wav.scp
│   ├── text
│   ├── utt2spk
│   └── wav.scp
└── train_yesno
    ├── cmvn.scp
    ├── feats.scp
    ├── spk2utt
    ├── split1
    │   └── 1
    │       ├── cmvn.scp
    │       ├── feats.scp
    │       ├── spk2utt
    │       ├── text
    │       ├── utt2spk
    │       └── wav.scp
    ├── text
    ├── utt2spk
    └── wav.scp
14 directories, 115 files
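To get a feel for what's actually inside those files, it helps to peek at a few of the key ones; the formats are plain text and very simple. A quick sketch (the utterance IDs you'll see are just the names of the yes/no WAV files downloaded earlier):
# text:    <utterance-id> <word transcript>
# wav.scp: <utterance-id> <path to the WAV file (or a command that produces audio)>
# utt2spk: <utterance-id> <speaker-id>
head -3 data/train_yesno/text data/train_yesno/wav.scp data/train_yesno/utt2spk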
The next directory created by the run.sh script is the exp/ directory. As far as I can gather, "exp" is short for "experiment". I think this is the case because the exp/ dir holds information about the model you're training and testing. It has a lot of files, as you see below, and a lot of them (if not most) are .log files.
I think that Kaldi could have more transparent naming conventions for files and directories, but I will say that the log files are very thorough. There's a lot of info to be found if you do some digging.
josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree ./exp/
./exp/
├── make_mfcc
│   ├── test_yesno
│   │   ├── cmvn_test_yesno.log
│   │   └── make_mfcc_test_yesno.1.log
│   └── train_yesno
│       ├── cmvn_train_yesno.log
│       └── make_mfcc_train_yesno.1.log
└── mono0a
    ├── 0.mdl
    ├── 40.mdl
    ├── 40.occs
    ├── ali.1.gz
    ├── cmvn_opts
    ├── decode_test_yesno
    │   ├── lat.1.gz
    │   ├── log
    │   │   └── decode.1.log
    │   ├── num_jobs
    │   ├── scoring
    │   │   ├── 10.tra
    │   │   ├── 11.tra
    │   │   ├── 9.tra
    │   │   ├── log
    │   │   │   ├── best_path.10.log
    │   │   │   ├── best_path.11.log
    │   │   │   ├── best_path.9.log
    │   │   │   ├── score.10.log
    │   │   │   ├── score.11.log
    │   │   │   └── score.9.log
    │   │   └── test_filt.txt
    │   ├── wer_10
    │   ├── wer_11
    │   └── wer_9
    ├── final.mdl -> 40.mdl
    ├── final.occs -> 40.occs
    ├── fsts.1.gz
    ├── graph_tgpr
    │   ├── disambig_tid.int
    │   ├── Ha.fst
    │   ├── HCLGa.fst
    │   ├── HCLG.fst
    │   ├── num_pdfs
    │   ├── phones
    │   │   ├── align_lexicon.int
    │   │   ├── align_lexicon.txt
    │   │   ├── disambig.int
    │   │   ├── disambig.txt
    │   │   └── silence.csl
    │   ├── phones.txt
    │   └── words.txt
    ├── log
    │   ├── acc.10.1.log
    │   ├── acc.11.1.log
    │   ├── acc.1.1.log
    │   ├── acc.12.1.log
    │   ├── acc.13.1.log
    │   ├── acc.14.1.log
    │   ├── acc.15.1.log
    │   ├── acc.16.1.log
    │   ├── acc.17.1.log
    │   ├── acc.18.1.log
    │   ├── acc.19.1.log
    │   ├── acc.20.1.log
    │   ├── acc.21.1.log
    │   ├── acc.2.1.log
    │   ├── acc.22.1.log
    │   ├── acc.23.1.log
    │   ├── acc.24.1.log
    │   ├── acc.25.1.log
    │   ├── acc.26.1.log
    │   ├── acc.27.1.log
    │   ├── acc.28.1.log
    │   ├── acc.29.1.log
    │   ├── acc.30.1.log
    │   ├── acc.31.1.log
    │   ├── acc.3.1.log
    │   ├── acc.32.1.log
    │   ├── acc.33.1.log
    │   ├── acc.34.1.log
    │   ├── acc.35.1.log
    │   ├── acc.36.1.log
    │   ├── acc.37.1.log
    │   ├── acc.38.1.log
    │   ├── acc.39.1.log
    │   ├── acc.4.1.log
    │   ├── acc.5.1.log
    │   ├── acc.6.1.log
    │   ├── acc.7.1.log
    │   ├── acc.8.1.log
    │   ├── acc.9.1.log
    │   ├── align.0.1.log
    │   ├── align.10.1.log
    │   ├── align.1.1.log
    │   ├── align.12.1.log
    │   ├── align.14.1.log
    │   ├── align.16.1.log
    │   ├── align.18.1.log
    │   ├── align.20.1.log
    │   ├── align.2.1.log
    │   ├── align.23.1.log
    │   ├── align.26.1.log
    │   ├── align.29.1.log
    │   ├── align.3.1.log
    │   ├── align.32.1.log
    │   ├── align.35.1.log
    │   ├── align.38.1.log
    │   ├── align.4.1.log
    │   ├── align.5.1.log
    │   ├── align.6.1.log
    │   ├── align.7.1.log
    │   ├── align.8.1.log
    │   ├── align.9.1.log
    │   ├── compile_graphs.1.log
    │   ├── init.log
    │   ├── update.0.log
    │   ├── update.10.log
    │   ├── update.11.log
    │   ├── update.12.log
    │   ├── update.13.log
    │   ├── update.14.log
    │   ├── update.15.log
    │   ├── update.16.log
    │   ├── update.17.log
    │   ├── update.18.log
    │   ├── update.19.log
    │   ├── update.1.log
    │   ├── update.20.log
    │   ├── update.21.log
    │   ├── update.22.log
    │   ├── update.23.log
    │   ├── update.24.log
    │   ├── update.25.log
    │   ├── update.26.log
    │   ├── update.27.log
    │   ├── update.28.log
    │   ├── update.29.log
    │   ├── update.2.log
    │   ├── update.30.log
    │   ├── update.31.log
    │   ├── update.32.log
    │   ├── update.33.log
    │   ├── update.34.log
    │   ├── update.35.log
    │   ├── update.36.log
    │   ├── update.37.log
    │   ├── update.38.log
    │   ├── update.39.log
    │   ├── update.3.log
    │   ├── update.4.log
    │   ├── update.5.log
    │   ├── update.6.log
    │   ├── update.7.log
    │   ├── update.8.log
    │   └── update.9.log
    ├── num_jobs
    └── tree
11 directories, 145 files
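If you want to see what one of these logs actually contains, just cat it; Kaldi logs record the exact command that was run along with its output, which is handy when something breaks. For example:
# The last GMM update pass of monophone training:
cat exp/mono0a/log/update.39.log
# The decoding run over the test set:
cat exp/mono0a/decode_test_yesno/log/decode.1.log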
The last directory created by the run.sh script isn't super interesting, but it's essential. This is the mfcc/ dir. This directory holds all the .ark (archive) and .scp (script) files for (1) the MFCC features as well as (2) the cepstral mean and variance statistics per speaker.
josh@yoga:~/Desktop/kaldi/egs/yesno/s5$ tree ./mfcc/
./mfcc/
├── cmvn_test_yesno.ark
├── cmvn_test_yesno.scp
├── cmvn_train_yesno.ark
├── cmvn_train_yesno.scp
├── raw_mfcc_test_yesno.1.ark
├── raw_mfcc_test_yesno.1.scp
├── raw_mfcc_train_yesno.1.ark
└── raw_mfcc_train_yesno.1.scp
0 directories, 8 files
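The .scp files are plain-text indexes that point into the binary .ark archives, so you can read them directly, and you can dump the features themselves in human-readable form with Kaldi's copy-feats tool. A quick sketch (source path.sh first so the binaries are on your PATH):
# Each .scp line maps an utterance ID to an offset inside an .ark archive.
head -3 mfcc/raw_mfcc_train_yesno.1.scp
# Print a few MFCC feature matrices as text ("ark,t:-" means a text-mode archive on stdout).
. ./path.sh
copy-feats scp:mfcc/raw_mfcc_train_yesno.1.scp ark,t:- | head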
Conclusion
I hope this was helpful!
Let me know if you have comments or suggestions; you can always leave a comment below.
Happy Kaldi-ing!