
Double Softmax

June 3, 2022
1 min read
Programming, Math, Statistics

Intuitively, applying Softmax twice might be expected to produce a sharper distribution:

$$\mathrm{softmax}(\mathrm{softmax}(X))$$

But this does not hold if a logarithm is applied in between:

$$P = [p_1, p_2, \ldots, p_n] = \mathrm{softmax}(X)$$

$$P_{\log} = \log \mathrm{softmax}(X) = \log P$$

$$\mathrm{softmax}(P_{\log})_i = \frac{e^{\log p_i}}{\sum_{k=1}^n e^{\log p_k}} = \frac{p_i}{\sum_{k=1}^n p_k} = p_i$$

where the last step uses $\sum_{k=1}^n p_k = 1$, since $P$ is already a probability distribution.

$$\mathrm{softmax}(\log \mathrm{softmax}(X)) = \mathrm{softmax}(P_{\log}) = P = \mathrm{softmax}(X)$$

Therefore: softmax(log(softmax(X))) == softmax(X)
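
A minimal NumPy sketch to check this numerically (the example logits `x` and the `softmax` helper below are illustrative, not from the post):

```python
import numpy as np

def softmax(x):
    # Shift by the max for numerical stability before exponentiating.
    z = np.exp(x - np.max(x))
    return z / z.sum()

x = np.array([2.0, 1.0, 0.1])           # illustrative logits

p        = softmax(x)                    # softmax(X)
p_double = softmax(softmax(x))           # softmax applied twice, for comparison
p_logmid = softmax(np.log(softmax(x)))   # with a logarithm in between

print(p)         # roughly [0.659 0.242 0.099]
print(p_double)  # a genuinely different distribution
print(p_logmid)  # matches p up to floating-point error

assert np.allclose(p, p_logmid)          # softmax(log(softmax(X))) == softmax(X)
```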
