How to Visualize a Word Lattice with Kaldi
Hi, it's Josh here. I'm writing you this note in 2021: the world of speech technology has changed dramatically since Kaldi. Before devoting weeks of your time to deploying Kaldi, take a look at 🐸 [Coqui Speech-to-Text][coqui-github]. It takes minutes to deploy an off-the-shelf 🐸 STT model, and it's [open source on GitHub][coqui-github]. I'm on the Coqui founding team so I'm admittedly biased. However, you can tell from this blog that I've spent years working with Kaldi, so I understand the headaches.
With 🐸 STT, we've removed the headaches of Kaldi and streamlined everything for production settings. You can train and deploy state-of-the-art 🐸 Speech-to-Text models in just minutes, not weeks. Check out the [🐸 Model Zoo][coqui-model-zoo] for open, pre-trained models in different languages. Try it out for yourself, and come join our [friendly chatroom][coqui-gitter].
Introduction
If you want to take a step back and learn about Kaldi in general, I have a post on how to install Kaldi and some miscellaneous Kaldi notes that contain some documentation.
This is just a very short post on how to visualize a word lattice with Kaldi.
The official Kaldi repository already includes a simple script for this, in the utils directory of the Wall Street Journal example (egs/wsj/s5/utils/show_lattice.sh).
Dependencies
You need to have the dot program, provided by Graphviz, installed.
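A quick way to confirm the dependency before going further is to check whether dot is on your PATH. The snippet below is a minimal sketch; the helper name missing_tools is made up for illustration and is not part of Kaldi or Graphviz.

```shell
# Print the name of every command in the argument list that is not on PATH.
# (missing_tools is a hypothetical helper, not a Kaldi or Graphviz utility.)
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Report whether Graphviz's dot can be found.
missing=$(missing_tools dot)
if [ -n "$missing" ]; then
  echo "Not found: $missing (install Graphviz to get dot)"
else
  echo "dot is available at $(command -v dot)"
fi
```

The same helper accepts several names at once, so it could also be pointed at the other command-line tools you rely on.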
show_lattice.sh
After rearranging the original Kaldi script a little bit and adding my own comments, here's what my version of show_lattice.sh looks like:
Running the Script
When I run this script to display a lattice I generated for the Kyrgyz language, this is the command I use:
All the script takes is:
- the utterance ID of the lattice you want to visualize
- the path to the (compressed) ark file of lattices in which the target utterance is located
- the word list of the graph you used to decode the utterance
It's as simple as that!
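To make those three arguments concrete, here is a minimal sketch of the kind of pipeline a script like show_lattice.sh wraps: pull one utterance out of the lattice archive as an FST, then draw it. It is not the script itself. The function name draw_lattice and the optional fourth argument are hypothetical, and the sketch assumes that Kaldi's lattice-to-fst, OpenFst's fstdraw, and Graphviz's dot are all on your PATH and that the archive is a single gzipped file.

```shell
# Hypothetical helper: render one utterance's lattice to a PDF.
# Usage: draw_lattice <utt-id> <lattice-ark.gz> <words.txt> [out.pdf]
draw_lattice() {
  if [ "$#" -lt 3 ]; then
    echo "usage: draw_lattice <utt-id> <lattice-ark.gz> <words.txt> [out.pdf]" >&2
    return 1
  fi
  local uttid="$1" lat="$2" words="$3"
  local out="${4:-$uttid.pdf}"
  local tmpdir status

  [ -f "$lat" ]   || { echo "lattice archive not found: $lat" >&2; return 1; }
  [ -f "$words" ] || { echo "word list not found: $words" >&2; return 1; }

  tmpdir=$(mktemp -d) || return 1

  # Step 1: decompress the archive and write only the requested utterance
  # out as an FST. Both scales are left at 0.0 here.
  gunzip -c "$lat" |
    lattice-to-fst --lm-scale=0.0 --acoustic-scale=0.0 ark:- \
      "scp,p:echo $uttid $tmpdir/$uttid.fst|"

  if [ ! -s "$tmpdir/$uttid.fst" ]; then
    echo "could not extract a lattice for $uttid (is it in $lat?)" >&2
    rm -rf "$tmpdir"
    return 1
  fi

  # Step 2: draw the FST with words as output labels and let dot render it.
  fstdraw --portrait=true --osymbols="$words" "$tmpdir/$uttid.fst" |
    dot -Tpdf > "$out"
  status=$?   # exit status of dot, the last command in the pipeline
  rm -rf "$tmpdir"
  return "$status"
}
```

Called with your own utterance ID, lattice archive, and word list, this would write a PDF named after the utterance ID into the current directory.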
There are a couple of parameters you can play around with while visualizing the lattice.
- Acoustic model scale: --acoustic-scale
- Language model scale: --lm-scale
The size of the vertices on the graph will change according to the values you set for these two parameters. The default value for both parameters is 0.0, which gives you a plain graph where all the vertices are the same size. Every edge is labeled with the word and the word ID, and every vertex has an ID as well.
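One way to keep track of the four combinations shown below is to generate them in a loop. This is a small dry-run sketch: it only prints the option string for each pair of values along with an output file name, and it does not call any Kaldi tool. The helper names, the utterance ID utt-0001, and the file-name pattern are all hypothetical.

```shell
# Print the (acoustic-scale, lm-scale) pairs, one per line, in the same
# order as the graphs in this post.
scale_pairs() {
  local lm ac
  for lm in 0 10; do
    for ac in 0 0.1; do
      echo "$ac $lm"
    done
  done
}

# Turn one pair into command-line options in --name=value form.
scale_opts() {
  echo "--acoustic-scale=$1 --lm-scale=$2"
}

# Dry run: show the options and a distinct output name for each pair.
uttid="utt-0001"   # hypothetical utterance ID
scale_pairs | while read -r ac lm; do
  echo "$(scale_opts "$ac" "$lm") -> ${uttid}.ac${ac}.lm${lm}.pdf"
done
```

Giving each rendering its own file name makes it easy to open the graphs side by side and compare them.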
Here is a graph with --acoustic-scale=0 and --lm-scale=0:
Here is a graph with --acoustic-scale=0.1 and --lm-scale=0:
Here is a graph with --acoustic-scale=0 and --lm-scale=10:
Here is a graph with --acoustic-scale=0.1 and --lm-scale=10:
Conclusion
I hope this was helpful!
If you have any feedback or questions, don't hesitate to leave a comment!