Auto-Encoding Variational Bayes

Diederik P. Kingma, Max Welling

Summary

The paper addresses the problem of performing efficient inference and learning in directed probabilistic models with continuous latent variables and intractable posterior distributions, particularly on large datasets. The authors propose a stochastic variational inference and learning algorithm that scales well under these conditions. Their contributions are twofold: a reparameterization of the variational lower bound that yields a lower bound estimator optimizable with standard stochastic gradient methods, and an efficient method for approximate posterior inference obtained by fitting an approximate inference (recognition) model to the intractable posterior.
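For reference, the variational lower bound on a datapoint's marginal log-likelihood, which the estimator targets, is

\[
\log p_\theta(\mathbf{x}^{(i)}) \;\ge\; \mathcal{L}(\theta, \phi; \mathbf{x}^{(i)}) = -D_{KL}\!\left(q_\phi(\mathbf{z} \mid \mathbf{x}^{(i)}) \,\big\|\, p_\theta(\mathbf{z})\right) + \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x}^{(i)})}\!\left[\log p_\theta(\mathbf{x}^{(i)} \mid \mathbf{z})\right],
\]

where q_phi is the recognition model and p_theta the generative model; the reparameterization described below makes Monte Carlo estimates of this bound differentiable with respect to both theta and phi.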

The proposed method, called Auto-Encoding Variational Bayes (AEVB), uses the Stochastic Gradient Variational Bayes (SGVB) estimator to optimize a recognition model, allowing efficient approximate posterior inference and eliminating the need for expensive iterative inference schemes such as MCMC for each datapoint. When the recognition model is implemented as a neural network, the result is the variational auto-encoder, which can be used for tasks such as recognition, denoising, and visualization.
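As a concrete illustration only (not the authors' code), a minimal variational auto-encoder of this kind might be sketched as follows, assuming PyTorch, a Gaussian recognition model, a Bernoulli decoder, and illustrative layer sizes:

    # Minimal VAE sketch: a Gaussian recognition model q_phi(z|x) and a
    # Bernoulli generative model p_theta(x|z). Sizes and names are illustrative.
    import torch
    import torch.nn as nn

    class VAE(nn.Module):
        def __init__(self, x_dim=784, h_dim=400, z_dim=20):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.Tanh())
            self.enc_mu = nn.Linear(h_dim, z_dim)       # mean of q_phi(z|x)
            self.enc_logvar = nn.Linear(h_dim, z_dim)   # log-variance of q_phi(z|x)
            self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.Tanh(),
                                     nn.Linear(h_dim, x_dim), nn.Sigmoid())

        def forward(self, x):
            h = self.enc(x)
            mu, logvar = self.enc_mu(h), self.enc_logvar(h)
            # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I),
            # so the sample is a differentiable function of (mu, logvar).
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            return self.dec(z), mu, logvar

A single pass through the recognition part of such a network then replaces per-datapoint iterative inference.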

The paper introduces a practical estimator of the variational lower bound and its derivatives, applicable to a broad class of directed graphical models with continuous latent variables. The key device is a reparameterization trick that rewrites sampling from the approximate posterior as a deterministic, differentiable function of the variational parameters and an auxiliary noise variable, so that gradients of the lower bound can be estimated efficiently by Monte Carlo. The approach is exemplified with the Gaussian case, where the reparameterization makes stochastic optimization of the variational lower bound straightforward.
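The Gaussian case can be sketched in a few lines; the snippet below (an illustration assuming PyTorch, not the authors' code) shows how drawing z ~ N(mu, sigma^2) becomes a deterministic function of mu, sigma, and auxiliary noise eps ~ N(0, 1), so that gradients flow back to the variational parameters:

    import torch

    mu = torch.tensor([0.5], requires_grad=True)           # variational mean
    log_sigma = torch.tensor([-1.0], requires_grad=True)   # variational log std-dev

    eps = torch.randn(1)                   # auxiliary noise, eps ~ N(0, 1)
    z = mu + torch.exp(log_sigma) * eps    # z ~ N(mu, sigma^2), differentiable in mu, log_sigma

    z.sum().backward()                     # gradients reach mu and log_sigma
    print(mu.grad, log_sigma.grad)         # dz/dmu = 1, dz/dlog_sigma = sigma * eps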

The authors compare their AEVB method to the wake-sleep algorithm, highlighting AEVB's faster convergence and better optimization of the lower bound, and they outline possible extensions, including hierarchical generative architectures, time-series models, and supervised models with latent variables. They also discuss the connection between their method and auto-encoders: the SGVB objective contains a regularization term, dictated by the variational bound itself, that naturally encourages learning useful representations without additional regularization hyperparameters.
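In the Gaussian case above, this regularization term is the negative KL divergence from the approximate posterior to the prior, which the paper shows can be integrated analytically:

\[
-D_{KL}\!\left(q_\phi(\mathbf{z} \mid \mathbf{x}^{(i)}) \,\big\|\, p_\theta(\mathbf{z})\right) = \frac{1}{2} \sum_{j=1}^{J} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right),
\]

so only the expected reconstruction term needs to be estimated by sampling via the reparameterization.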

Experiments on the MNIST and Frey Face datasets demonstrate the effectiveness of the AEVB algorithm both in optimizing the variational lower bound and in estimating the marginal likelihood. AEVB converges faster than the wake-sleep algorithm and reaches better lower bound values; for marginal likelihood estimation with a low-dimensional latent space, it also outperforms wake-sleep and Monte Carlo EM, the latter of which converges slowly and does not scale well to large datasets.

The paper concludes by emphasizing the broad applicability of the SGVB estimator and the AEVB algorithm to inference and learning problems involving continuous latent variables. Suggested directions for future research include deep neural networks for the encoder and decoder, application to dynamic Bayesian networks for time-series data, and supervised models with latent variables.