Ondřej Plátek Blog
PhD candidate@UFAL, Prague. LLM & TTS evaluation. Engineer. Researcher. Speaker. Father.

IPython demo: Pykaldi decoder on a short noisy wav


Pykaldi: Demonstrating a Python extension and a customized Kaldi decoder

Pykaldi, the fork of Kaldi used for building the Python wrapper and the decoder, is available at:

Pykaldi itself depends on a fork of pyfst (https://github.com/UFAL-DSG/pyfst).

The decoder is used in the dialogue system Alex:

https://github.com/UFAL-DSG/alex

For Czech speakers: call 800 899 998 to test it for free!

More about the public transport information service can be found at:

```python
from pykaldi.decoders import PyGmmLatgenWrapper
from pykaldi.utils import load_wav
from IPython.display import display  # display multiple SVGs in one cell
import fst  # fork of pyfst: https://github.com/UFAL-DSG/pyfst

test_wav = '/ha/projects/vystadial/data/asr/cs/voip/test/all-2012-06-08-13-32-40.800581.recorded-0148.75-0150.09.wav'
test_pcm = load_wav(test_wav)
```
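The demo obtains the audio with `pykaldi.utils.load_wav`, whose exact return type is not shown here. A minimal standard-library sketch of what such a helper might do, assuming a mono 16-bit PCM wav and assuming the decoder consumes the raw sample bytes; `read_pcm16` is a hypothetical name, not part of Pykaldi:

```python
import os
import tempfile
import wave


def read_pcm16(path):
    """Hypothetical stand-in for load_wav: return (raw 16-bit mono PCM bytes, sample rate)."""
    w = wave.open(path, 'rb')
    try:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError('expected mono 16-bit PCM, got %d channel(s), %d byte(s)/sample'
                             % (w.getnchannels(), w.getsampwidth()))
        return w.readframes(w.getnframes()), w.getframerate()
    finally:
        w.close()


# Usage: write a tiny 8 kHz mono wav (160 silent samples = 20 ms) and read it back
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.wav')
out = wave.open(demo_path, 'wb')
out.setnchannels(1)
out.setsampwidth(2)
out.setframerate(8000)
out.writeframes(b'\x00\x00' * 160)
out.close()

pcm, rate = read_pcm16(demo_path)
print(len(pcm), rate)  # 160 samples x 2 bytes each, plus the sample rate
```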

```python
d = PyGmmLatgenWrapper()
# Decoder settings: the MFCC feature config and decoding options (beams, lattice LM scale,
# memory limit), followed by positional arguments: the acoustic model (.mdl), the decoding
# graph (HCLG.fst), a colon-separated list of phone ids (presumably the silence phones)
# and the feature transform matrix (.mat)
argv = ['--config=/ha/work/people/oplatek/alex-dsg/alex/resources/asr/voip_cs/kaldi/mfcc.conf',
        '--verbose=0', '--max-mem=10000000000', '--lat-lm-scale=10', '--beam=12.0',
        '--lattice-beam=6.0', '--max-active=5000',
        '/ha/work/people/oplatek/alex-dsg/alex/resources/asr/voip_cs/kaldi/tri2b_bmmi.mdl',
        '/ha/work/people/oplatek/alex-dsg/alex/applications/PublicTransportInfoCS/hclg/models/HCLG_tri2b_bmmi.fst',
        '1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25',
        '/ha/work/people/oplatek/alex-dsg/alex/resources/asr/voip_cs/kaldi/tri2b_bmmi.mat']
d.setup(argv)
```
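The argument list passed to `d.setup` mixes Kaldi-style `--name=value` options with positional arguments and spells out the phone-id list 1 to 25 by hand. A small sketch of assembling such a list programmatically; `build_argv` is a hypothetical helper (not part of Pykaldi) and the paths below are placeholders, not the demo's real files:

```python
def build_argv(options, positional):
    """Hypothetical helper: turn [('beam', 12.0)] into ['--beam=12.0'] and append positional args."""
    argv = ['--%s=%s' % (name, value) for name, value in options]
    return argv + list(positional)


# Options as (name, value) pairs so their order is preserved on any Python version
options = [('verbose', 0), ('beam', 12.0), ('lattice-beam', 6.0), ('max-active', 5000)]
# Colon-separated phone ids 1..25, the same string as in the demo
phones = ':'.join(str(i) for i in range(1, 26))
positional = ['model.mdl', 'HCLG.fst', phones, 'transform.mat']  # placeholder paths
argv = build_argv(options, positional)
print(argv)
```

Keeping the options as an ordered list of pairs makes the generated command line reproducible, which helps when comparing decoder configurations.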

```python
# Usually we feed a few frames at a time and run forward decoding after each chunk.
# Here we buffer the whole test wav at once.
d.frame_in(test_pcm)
# Decode in steps of at most 10 frames until nothing is left in the buffer
total = 0
decoded = d.decode(max_frames=10)
while decoded > 0:
    total += decoded
    decoded = d.decode(max_frames=10)

# Finalize decoding, then fetch the utterance likelihood and the lattice
d.prune_final()
utt_lik, lat = d.get_lattice()
```
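The demo buffers the whole recording before decoding, while the comment mentions the more usual incremental use: feed a few frames, decode what is available, repeat. A sketch of that control flow using a hypothetical `FakeDecoder` that only mimics the `frame_in` and `decode(max_frames=...)` calls from the demo. It counts frames and does no real decoding, and unlike the real wrapper it receives frame counts instead of PCM audio:

```python
class FakeDecoder(object):
    """Hypothetical stand-in mimicking frame_in/decode(max_frames); it only counts frames."""

    def __init__(self):
        self.buffered = 0

    def frame_in(self, n_frames):
        # The real wrapper receives PCM audio; here we pass a frame count directly.
        self.buffered += n_frames

    def decode(self, max_frames):
        # Consume up to max_frames buffered frames and report how many were processed.
        n = min(max_frames, self.buffered)
        self.buffered -= n
        return n


def decode_available(decoder, max_frames=10):
    """Run forward decoding until the decoder reports no buffered frames."""
    total = 0
    decoded = decoder.decode(max_frames=max_frames)
    while decoded > 0:
        total += decoded
        decoded = decoder.decode(max_frames=max_frames)
    return total


# Streaming pattern: feed a chunk, decode what is available, repeat.
dec = FakeDecoder()
chunks = [25, 25, 25, 25]  # arbitrary chunk sizes, in frames
total = 0
for chunk in chunks:
    dec.frame_in(chunk)
    total += decode_available(dec)
print(total)  # 100
```

With the real wrapper, `d.prune_final()` and `d.get_lattice()` would follow once the last chunk has been decoded, as in the demo.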

```python
# Print results
print('The likelihood of the posterior lattice is %f' % utt_lik)
print('Forward decoded frames: %d' % total)
# Map integer ids to (Czech) words
lat.isyms = lat.osyms = fst.read_symbols_text('/ha/work/people/oplatek/alex-dsg/alex/applications/PublicTransportInfoCS/hclg/models/words.txt')
display(lat)
with open(test_wav + '.trn', 'r') as r:
    print('REFERENCE %s' % r.read())
```

The likelihood of the posterior lattice is -6694.570312
Forward decoded frames: 128

[Figure: the posterior lattice rendered by display(lat), a weighted FST whose arcs carry Czech word labels such as TO, NENÍ, MOC, OBCI and _NOISE_]

REFERENCE TO NENÍ V NAŠÍ MOCI
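The lattice holds many competing hypotheses; a single best transcription is its lowest-cost path. A self-contained sketch of that idea on a tiny hand-made acyclic lattice, treating weights as costs combined as in the tropical semiring (add along a path, take the minimum across paths). The states, arcs, costs and the words.txt-style table are all invented for illustration; this is neither pyfst's API nor the demo's real lattice, just a simple dynamic-programming stand-in for a shortest-path routine:

```python
# -*- coding: utf-8 -*-
# Toy symbol table in the "<word> <id>" text format of words.txt (invented ids).
WORDS_TXT = """<eps> 0
TO 1
NENÍ 2
V 3
NAŠÍ 4
MOCI 5
NA 6
"""

# Toy lattice arcs: (source state, destination state, word id, cost).
# State 0 is the start, state 5 the only final state; every arc goes from a
# lower-numbered state to a higher one, so the numbering is topological.
ARCS = [
    (0, 1, 1, 0.5),   # TO
    (1, 2, 2, 1.0),   # NENÍ
    (1, 2, 6, 3.0),   # NA
    (2, 3, 3, 0.75),  # V
    (3, 4, 4, 1.25),  # NAŠÍ
    (2, 4, 6, 2.5),   # NA
    (4, 5, 5, 0.25),  # MOCI
]
FINAL = 5


def read_symbols(text):
    """Parse '<symbol> <id>' lines into an id -> symbol dict."""
    table = {}
    for line in text.splitlines():
        if line.strip():
            symbol, idx = line.split()
            table[int(idx)] = symbol
    return table


def best_path(arcs, start, final):
    """Lowest-cost path in an acyclic lattice whose states are topologically numbered."""
    best = {start: (0.0, None)}  # state -> (best cost so far, back-pointer (src, word))
    # Sorting by source state relaxes all arcs entering a state before any arc leaving it.
    for src, dst, word, cost in sorted(arcs):
        if src not in best:
            continue  # unreachable source state
        new_cost = best[src][0] + cost
        if dst not in best or new_cost < best[dst][0]:
            best[dst] = (new_cost, (src, word))
    # Follow back-pointers from the final state to recover the word ids.
    words, state = [], final
    while best[state][1] is not None:
        src, word = best[state][1]
        words.append(word)
        state = src
    return best[final][0], list(reversed(words))


symbols = read_symbols(WORDS_TXT)
cost, word_ids = best_path(ARCS, start=0, final=FINAL)
# Skip epsilon (id 0) labels when printing the hypothesis
hypothesis = ' '.join(symbols[i] for i in word_ids if i != 0)
print('%.2f %s' % (cost, hypothesis))  # 3.75 TO NENÍ V NAŠÍ MOCI
```

The costs are binary fractions so the sums are exact; in a real lattice they would be negative log-probabilities read from the FST.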