![](https://i0.wp.com/www.elfsong.cn/wp-content/uploads/2020/04/image.png?resize=584%2C206&ssl=1)
In general, such neural networks need to evaluate a posterior distribution of the following form:
![](https://i0.wp.com/www.elfsong.cn/wp-content/uploads/2020/04/math-20200414.png?resize=300%2C99&ssl=1)
However, the integral in the denominator is intractable: no analytic solution can be found in practice. Therefore, we construct an approximate distribution that approaches the original one. The KL divergence can be used to quantify the difference between these two distributions.
![](https://i0.wp.com/www.elfsong.cn/wp-content/uploads/2020/04/image-4.png?resize=584%2C651&ssl=1)
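As a small, hypothetical illustration of the KL divergence (not the author's original code), the sketch below compares two univariate Gaussians, once with the closed-form expression and once with a Monte Carlo estimate of E_q[log q(z) - log p(z)]; the function names and parameter values are assumptions for this example only.

```python
import numpy as np

def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    # Closed-form KL(q || p) for two univariate Gaussians
    # q = N(mu_q, sigma_q^2), p = N(mu_p, sigma_p^2)
    return (np.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2)
            - 0.5)

def kl_monte_carlo(mu_q, sigma_q, mu_p, sigma_p, n=100_000, seed=0):
    # Monte Carlo estimate: draw z ~ q and average log q(z) - log p(z)
    rng = np.random.default_rng(seed)
    z = rng.normal(mu_q, sigma_q, size=n)
    log_q = -0.5 * ((z - mu_q) / sigma_q)**2 - np.log(sigma_q * np.sqrt(2 * np.pi))
    log_p = -0.5 * ((z - mu_p) / sigma_p)**2 - np.log(sigma_p * np.sqrt(2 * np.pi))
    return np.mean(log_q - log_p)

analytic = kl_gaussian(0.0, 1.0, 1.0, 2.0)   # log 2 + 2/8 - 1/2 ≈ 0.4431
estimate = kl_monte_carlo(0.0, 1.0, 1.0, 2.0)
print(analytic, estimate)
```

Note that KL(q || p) is zero exactly when the two distributions coincide, which is why minimizing it drives the approximation toward the true posterior.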