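Training and Decoding with the YESNO Monophone Model
1. Training and Testing
First, change into the recipe directory and run the training script: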
cd /data/zsf/Zsf_WorkSpace/Kaldi_WorkSpace/kaldi/egs/yesno/s5
./run.sh
The end of the log output looks like this:
fsttablecompose exp/mono0a/graph_tgpr/Ha.fst data/lang_test_tg/tmp/CLG_1_0.fst
fstrmsymbols exp/mono0a/graph_tgpr/disambig_tid.int
fstminimizeencoded
fstdeterminizestar --use-log=true
fstrmepslocal
fstisstochastic exp/mono0a/graph_tgpr/HCLGa.fst
0.5342 -0.000422432
HCLGa is not stochastic
add-self-loops --self-loop-scale=0.1 --reorder=true exp/mono0a/final.mdl exp/mono0a/graph_tgpr/HCLGa.fst
steps/decode.sh --nj 1 --cmd utils/run.pl exp/mono0a/graph_tgpr data/test_yesno exp/mono0a/decode_test_yesno
decode.sh: feature type is delta
steps/diagnostic/analyze_lats.sh --cmd utils/run.pl exp/mono0a/graph_tgpr exp/mono0a/decode_test_yesno
steps/diagnostic/analyze_lats.sh: see stats in exp/mono0a/decode_test_yesno/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,1,2) and mean=1.2
steps/diagnostic/analyze_lats.sh: see stats in exp/mono0a/decode_test_yesno/log/analyze_lattice_depth_stats.log
local/score.sh --cmd utils/run.pl data/test_yesno exp/mono0a/graph_tgpr exp/mono0a/decode_test_yesno
local/score.sh: scoring with word insertion penalty=0.0,0.5,1.0
%WER 0.00 [ 0 / 232, 0 ins, 0 del, 0 sub ] exp/mono0a/decode_test_yesno/wer_10_0.0
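The %WER line reports the error count (insertions, deletions, substitutions) over the number of reference words. As a rough sketch, the percentage can be recomputed from those counts. The function name wer_from_line is made up for illustration, and the field positions assume exactly the line format shown in the log above.

```shell
# Hypothetical helper: recompute WER from a Kaldi-style "%WER" line read on stdin.
# Assumes the layout: %WER <pct> [ <errs> / <total>, <ins> ins, <del> del, <sub> sub ] <path>
wer_from_line() {
    awk '{
        total = $6 + 0                 # "232," becomes 232
        errs  = $7 + $9 + $11          # ins + del + sub
        if (total > 0) printf "%.2f\n", 100 * errs / total
    }'
}

# Usage with the line from the log:
echo '%WER 0.00 [ 0 / 232, 0 ins, 0 del, 0 sub ] exp/mono0a/decode_test_yesno/wer_10_0.0' | wer_from_line
```

With 0 errors over 232 reference words this prints 0.00, matching the score reported by local/score.sh.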
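The run completes without errors, provided Kaldi has been installed as described in the earlier blog post.
Next, copy the trained model to the board so that it can be tested there: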
cd exp
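scp -r mono0a/ root@192.168.1.158:/root/yesorno   # copy the trained model to the board
The resulting directory structure is shown below (it is long, so only part of it is listed):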
root@NanoPi-M1:~/yesorno# tree
.
└── mono0a
    ├── 0.mdl
    ├── 40.mdl
    ├── 40.occs
    ├── ali.1.gz
    ├── cmvn_opts
    ├── decode_test_yesno
    │   ├── lat.1.gz
    │   ├── log
    │   │   ├── analyze_alignments.log
    │   │   ├── analyze_lattice_depth_stats.log
    │   │   ├── decode.1.log
...
Next, record a clip containing "yes" and "no", and copy it to the board as well:
(base) zsf@BSP-D01-srv:~$ scp yesorno.wav root@192.168.1.158:/root
The authenticity of host '192.168.1.158 (192.168.1.158)' can't be established.
ECDSA key fingerprint is a2:88:9a:23:d0:bf:f0:f9:3e:af:77:6d:02:86:7b:3a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.158' (ECDSA) to the list of known hosts.
root@192.168.1.158's password:
yesorno.wav 100% 252KB 252.4KB/s 00:00
(base) zsf@BSP-D01-srv:~$
2. Testing on the Board
First, look at the usage of the online-wav-gmm-decode-faster decoder:
root@NanoPi-M1:~# ./online-wav-gmm-decode-faster
./online-wav-gmm-decode-faster
Reads in wav file(s) and simulates online decoding.
Writes integerized-text and .ali files for WER computation. Utterance segmentation is done on-the-fly.
Feature splicing/LDA transform is used, if the optional(last) argument is given.
Otherwise delta/delta-delta(i.e. 2-nd order) features are produced.
Caution: the last few frames of the wav file may not be decoded properly.
Hence, don't use one wav file per utterance, but rather use one wav file per show.
Usage: online-wav-gmm-decode-faster [options] <wav-rspecifier> <model-in> <fst-in> <word-symbol-table> <silence-phones> <transcript-wspecifier> <alignments-wspecifier> [lda-matrix-in]
Example: ./online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:wav.scp model HCLG.fst words.txt '1:2:3:4:5' ark,t:trans.txt ark,t:ali.txt
Options:
--acoustic-scale : Scaling factor for acoustic likelihoods (float, default = 0.1)
--batch-size : Number of feature vectors processed w/o interruption (int, default = 27)
--beam : Decoding beam. Larger->slower, more accurate. (float, default = 16)
--beam-delta : Increment used in decoder [obscure setting] (float, default = 0.5)
--beam-update : Beam update rate (float, default = 0.01)
--channel : Channel to extract (-1 -> expect mono, 0 -> left, 1 -> right) (int, default = -1)
--cmn-window : Number of feat. vectors used in the running average CMN calculation (int, default = 600)
--hash-ratio : Setting used in decoder to control hash behavior (float, default = 2)
--inter-utt-sil : Maximum # of silence frames to trigger new utterance (int, default = 50)
--left-context : Number of frames of left context (int, default = 4)
--max-active : Decoder max active states. Larger->slower; more accurate (int, default = 2147483647)
--max-beam-update : Max beam update rate (float, default = 0.05)
--max-utt-length : If the utterance becomes longer than this number of frames, shorter silence is acceptable as an utterance separator (int, default = 1500)
--min-active : Decoder min active states (don't prune if #active less than this). (int, default = 20)
--min-cmn-window : Minumum CMN window used at start of decoding (adds latency only at start) (int, default = 100)
--num-tries : Number of successive repetitions of timeout before we terminate stream (int, default = 5)
--right-context : Number of frames of right context (int, default = 4)
--rt-max : Approximate maximum decoding run time factor (float, default = 0.75)
--rt-min : Approximate minimum decoding run time factor (float, default = 0.7)
--update-interval : Beam update interval in frames (int, default = 3)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
Next, prepare the data. We take test data directly from the dataset and test it on the board. The data is in the s5 directory:
/data/zsf/Zsf_WorkSpace/Kaldi_WorkSpace/kaldi/egs/yesno/s5/waves_yesno
For convenience, copy it straight to the board:
/yesno/s5$ scp -r waves_yesno root@192.168.1.158:/root
root@192.168.1.158's password:
1_0_1_0_1_0_0_1.wav 100% 77KB 4.3MB/s 00:00
README 100% 833 447.3KB/s 00:00
0_1_1_1_1_0_1_0.wav 100% 94KB 4.4MB/s 00:00
0_1_0_0_1_0_1_0.wav
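...
Then create the scp file, which maps each utterance ID to the path of its WAV file. With this file the decoder can decode several WAV files in one run. We pick five files arbitrarily, since the goal is only to exercise the decoder: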
1_0_0_0_0_0_0_0 /root/waves_yesno/1_0_0_0_0_0_0_0.wav
1_0_0_0_0_0_0_1 /root/waves_yesno/1_0_0_0_0_0_0_1.wav
1_0_0_0_0_0_1_1 /root/waves_yesno/1_0_0_0_0_0_1_1.wav
1_0_0_0_1_0_0_1 /root/waves_yesno/1_0_0_0_1_0_0_1.wav
1_0_0_1_0_1_1_1 /root/waves_yesno/1_0_0_1_0_1_1_1.wav
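Such a file can also be generated instead of typed by hand. The following is a minimal sketch, assuming the utterance ID is simply the file name without the .wav extension. The function name make_wav_scp is made up for illustration, and it takes the first N files in glob order rather than the exact five listed above.

```shell
# Hypothetical helper: print "<utt-id> <path>" lines for the first N .wav files in a directory.
make_wav_scp() {
    local dir=$1 n=$2 count=0 f
    for f in "$dir"/*.wav; do
        [ -e "$f" ] || continue            # the glob matched nothing
        [ "$count" -lt "$n" ] || break     # stop after N entries
        echo "$(basename "$f" .wav) $f"
        count=$((count + 1))
    done
}

# Usage on the board (paths as used in this post):
#   make_wav_scp /root/waves_yesno 5 > test_yesno_wav.scp
```

Passing an absolute directory keeps the paths in the scp file valid regardless of where the decoder is started from.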
Then fetch the following files from the model directory:
final.mdl HCLG.fst words.txt
These correspond to the model, HCLG.fst and words.txt arguments in the usage example. They can be found under s5/exp/mono0a; the exact paths are omitted here.
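Before starting the decoder it can help to confirm that all three files actually made it to the board. This is only a sketch: the function name check_decode_inputs is made up for illustration, and it assumes the three files were copied into a single directory.

```shell
# Hypothetical pre-flight check: succeed only if all three decoder inputs exist and are non-empty.
check_decode_inputs() {
    local dir=$1 f missing=0
    for f in final.mdl HCLG.fst words.txt; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from the directory holding the copied files:
#   check_decode_inputs . && echo "ready to decode"
```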
Now run the decoder:
./online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:test_yesno_wav.scp final.mdl HCLG.fst words.txt '1:2:3:4:5' ark,t:trans.txt ark,t:ali.txt
./online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:test_yesno_wav.scp final.mdl HCLG.fst words.txt 1:2:3:4:5 ark,t:trans.txt ark,t:ali.txt
File: 1_0_0_0_0_0_0_0
ERROR (online-wav-gmm-decode-faster[5.5.120~486-da328]:main():online-wav-gmm-decode-faster.cc:148) Sampling rates other than 16kHz are not supported!
[ Stack-Trace: ]
./online-wav-gmm-decode-faster(kaldi::MessageLogger::LogMessage() const+0x7dc) [0x1c2cd4]
./online-wav-gmm-decode-faster(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x14) [0xbf950]
./online-wav-gmm-decode-faster(main+0xa68) [0xbcad0]
/lib/arm-linux-gnueabihf/libc.so.6(__libc_start_main+0x9d) [0xb6bc28aa]
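It turns out that the original files are not sampled at 16 kHz, so the WAV files have to be converted from 8 kHz to 16 kHz. Any tool will do for this conversion; I used Audacity. After the conversion, run the decoder again: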
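The sample rate can be checked on the command line before decoding, and the conversion can be scripted instead of done in Audacity. The sketch below rests on several assumptions: wav_sample_rate and resample_to_16k are illustrative names, the header read only works for a canonical PCM WAV file whose fmt chunk comes first (sample rate stored as a little-endian 32-bit value at byte offset 24), and the conversion swaps in sox, which must be installed separately.

```shell
# Hypothetical helper: print the sample rate stored in a canonical WAV header.
wav_sample_rate() {
    local b
    b=$(od -An -t u1 -j 24 -N 4 "$1")    # four unsigned bytes at offset 24
    set -- $b                            # split into $1..$4 (least significant byte first)
    echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}

# Hypothetical helper: resample one file to 16 kHz with sox (not the Audacity route used in this post).
resample_to_16k() {
    sox "$1" -r 16000 "$2"
}

# Usage:
#   wav_sample_rate /root/waves_yesno/1_0_0_0_0_0_0_0.wav
#   resample_to_16k in.wav out_16k.wav
```

Resampling only changes the sampling rate of the file. It adds no information above the 4 kHz band of the original 8 kHz recording.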
./online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:test_yesno_wav.scp final.mdl HCLG.fst words.txt '1:2:3:4:5' ark,t:trans.txt ark,t:ali.txt
./online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:test_yesno_wav.scp final.mdl HCLG.fst words.txt 1:2:3:4:5 ark,t:trans.txt ark,t:ali.txt
File: 1_0_0_0_0_0_0_0
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
File: 1_0_0_0_0_0_0_1
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
File: 1_0_0_0_0_0_1_1
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
File: 1_0_0_0_1_0_0_1
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
File: 1_0_0_1_0_1_1_1
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
NO
YES
root@NanoPi-M1:~#
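In the yesno dataset each file name encodes the reference transcript: eight digits separated by underscores, with 1 standing for YES and 0 for NO. To compare the decoder output above against what was actually said, the expected word sequence can be derived from the utterance ID. This is a minimal sketch, and the function name expected_transcript is made up for illustration.

```shell
# Hypothetical helper: turn a yesno utterance ID into its reference word sequence.
expected_transcript() {
    echo "$1" | tr '_' ' ' | sed -e 's/1/YES/g' -e 's/0/NO/g'
}

# Usage:
expected_transcript 1_0_0_0_0_0_0_0
```

For 1_0_0_0_0_0_0_0 this prints "YES NO NO NO NO NO NO NO", which can be set side by side with the words printed under the corresponding "File:" line.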