Model Structures

June 3, 2022

CART models

The CART model is used in the last step of the GMM pipeline. It behaves stably, but generating the CART depends on a given alignment from the GMM system.

CART model trained with GMM-alignment

The BLSTM encoder consists of 6 BLSTM layers of size 512, with a dropout rate of 0.1 and "L2" normalization. The same encoder is used in all models in this chapter. Here, the output of the BLSTM encoder is trained directly against the GMM alignment.

CART-Viterbi.png

CART model trained with soft alignment

The BLSTM encoder stays the same. "Joint Output" is a copy of the encoder output and represents the joint probability of all states, $p(s|x)$. "Accumulate Prior" is a layer that accumulates the probabilities of "Joint Output": $p(s)=\sum\limits_x p(s|x)$. The prior probability is stored in a variable $V$, which is updated during training with the factor $\alpha$: $V=(1-\alpha)\cdot V + \alpha \cdot p(s)$.
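The prior update can be illustrated with a minimal sketch in plain Python. The function names and the toy posteriors are hypothetical, and dividing the accumulated sum by the number of frames is an assumption made here so that the prior sums to one; the text only states the sum and the update rule for $V$.

```python
def accumulate_prior(posteriors):
    """Accumulate per-frame posteriors p(s|x) over all frames x.

    Assumption: the sum is divided by the number of frames so that
    the resulting p(s) is a proper distribution.
    """
    num_frames = len(posteriors)
    num_states = len(posteriors[0])
    return [sum(frame[s] for frame in posteriors) / num_frames
            for s in range(num_states)]


def update_running_prior(v, batch_prior, alpha):
    """Apply V = (1 - alpha) * V + alpha * p(s) element-wise."""
    return [(1.0 - alpha) * v_s + alpha * p_s
            for v_s, p_s in zip(v, batch_prior)]


# Toy example: two frames, three states (hypothetical values).
posteriors = [[0.7, 0.2, 0.1],
              [0.5, 0.3, 0.2]]
batch_prior = accumulate_prior(posteriors)

v = [1.0 / 3.0] * 3  # hypothetical uniform initialization of V
v = update_running_prior(v, batch_prior, alpha=0.1)
```

A small $\alpha$ makes $V$ change slowly, so a single batch with unusual state statistics has little effect on the stored prior.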

CART-BW.png

Factored diphone models

The following models replace the state-tying CART with factored layers. The encoder output is forwarded to two independent MLPs, which model the probability of the left context, $p(left|x)$, and the probability of the center phoneme and state given each left context, $p(center|left, x)$, respectively.

Factored diphone joint output BW

To get the joint output, the two layers Left-Context Output and Center-State Output are combined, which corresponds to the formulation $p(s)=p(center, left| x) = p(left|x)\cdot p(center|left,x)$.
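As a rough illustration of this combination, the sketch below multiplies a left-context distribution with the center-state distributions conditioned on each left context. The function name, the list-of-lists layout and the toy numbers are assumptions made for the example, not the actual layer implementation.

```python
def joint_output(p_left, p_center_given_left):
    """Combine the two factored outputs into a joint distribution.

    p_left[l]                 = p(left = l | x)
    p_center_given_left[l][c] = p(center = c | left = l, x)
    Returns joint[l][c]       = p(center = c, left = l | x).
    """
    return [[p_l * p_c for p_c in row]
            for p_l, row in zip(p_left, p_center_given_left)]


# Toy example with two left contexts and two center states (hypothetical values).
p_left = [0.6, 0.4]
p_center_given_left = [[0.5, 0.5],
                       [0.9, 0.1]]
joint = joint_output(p_left, p_center_given_left)
total = sum(sum(row) for row in joint)
```

Because each row of the conditional output sums to one, the joint output sums to one as well, so it can be treated like the single softmax output of the CART model.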

Factored-joint-BW

Factored diphone joint output Viterbi

This model has the same factored part as the model above, but is trained with the GMM alignment.

Factored-joint-Viterbi.png

Factored diphone model with separate outputs and BW-alignment

In contrast to the model with joint output and BW alignment, this model is finer-grained: the Left Output and the Center Output each have their own prior, and the two priors can be scaled independently.
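One way to picture independent prior scaling is the usual hybrid-model prior correction, applied to each factor separately. The exact form below (subtracting a scaled log prior from each log posterior) is an assumption, since the text only says that the priors can be scaled independently; the function and parameter names are hypothetical.

```python
import math


def scaled_log_score(p_left, prior_left, p_center, prior_center,
                     left_scale, center_scale):
    """Log score with a separately scaled prior for each factor.

    Assumed form:
        log p(left|x)        - left_scale   * log prior(left)
      + log p(center|left,x) - center_scale * log prior(center|left)
    """
    left_term = math.log(p_left) - left_scale * math.log(prior_left)
    center_term = math.log(p_center) - center_scale * math.log(prior_center)
    return left_term + center_term


# With both scales at zero, the score reduces to the log of the joint posterior.
base = scaled_log_score(0.5, 0.25, 0.4, 0.1, 0.0, 0.0)

# Changing only the left scale affects only the left-context term.
shifted = scaled_log_score(0.5, 0.25, 0.4, 0.1, 1.0, 0.0)
```

With a single joint prior there is only one scale to tune; keeping two priors allows, for example, a stronger correction on the left context than on the center state.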

Factored-separate-BW.png

Factored diphone separate output Viterbi

This model has no joint probability. The GMM alignment is factored into a Left Alignment and a Center Alignment, which are aligned with the Left-Context Output and the Center-State Output respectively.
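The factoring of the alignment can be sketched as follows. The label format, a `(left, center, state)` tuple per frame, is a hypothetical representation chosen for illustration; the real alignment format is not described here.

```python
def factor_alignment(alignment):
    """Split a frame-level diphone alignment into two target sequences.

    Assumption: each frame label is a (left, center, state) tuple.
    Returns (left_alignment, center_alignment), where the center
    alignment keeps the (center, state) pair for every frame.
    """
    left_alignment = [left for left, _center, _state in alignment]
    center_alignment = [(center, state) for _left, center, state in alignment]
    return left_alignment, center_alignment


# Toy alignment of three frames; "#" marks a hypothetical boundary context.
alignment = [("#", "a", 0), ("#", "a", 1), ("a", "b", 0)]
left_alignment, center_alignment = factor_alignment(alignment)
```

Each of the two resulting sequences then serves as the frame-wise target for its own output layer.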

Factored-separate-Viterbi.png
