variational inference


Traditionally one approximates with MCMC; library support exists, so it is relatively easy. With VI, by contrast, every problem used to require its own hand-derived updates. Automatic variational inference algorithms have since appeared, however, so VI can now also be used straight from a library, e.g. PyMC3.
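As a quick illustration, here is a minimal sketch of automatic variational inference (ADVI) in PyMC3, assuming PyMC3 is installed; the model and data below are made up for the example. The point is that only the model is specified, with no manual derivation of the variational updates:

```python
import numpy as np
import pymc3 as pm

# Toy observations (illustrative only).
data = np.random.normal(loc=1.0, scale=2.0, size=200)

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)       # prior on the mean
    sigma = pm.HalfNormal("sigma", sigma=5.0)      # prior on the scale
    pm.Normal("obs", mu=mu, sigma=sigma, observed=data)

    # Automatic VI: the library derives and optimizes the approximation.
    approx = pm.fit(n=20000, method="advi")
    trace = approx.sample(1000)                    # draws from the fitted approximation

print(trace["mu"].mean(), trace["sigma"].mean())
```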

Euler–Lagrange equation

Find a function $f(x)$ on $[a,b]$ such that the integral $J = \int_a^b F(x, f(x), f'(x))\,dx$ attains its extremum. Let $g(x) = f(x) + \varepsilon \eta(x)$, where $\varepsilon$ is a small positive number; this amounts to adding a small perturbation to the optimal function $f(x)$, with $\eta(a) = \eta(b) = 0$ so the endpoints stay fixed. The extremum is attained at $\varepsilon = 0$, where $g(x) = f(x)$. Substitute $g(x)$ into the functional and differentiate with respect to $\varepsilon$.

$$
J = \int_a^b F(x, g(x), g'(x))\,dx
$$

$$
\frac{\partial J}{\partial \varepsilon}
= \int_a^b \left[ \frac{\partial x}{\partial \varepsilon}\frac{\partial F}{\partial x}
+ \frac{\partial g}{\partial \varepsilon}\frac{\partial F}{\partial g}
+ \frac{\partial g'}{\partial \varepsilon}\frac{\partial F}{\partial g'} \right] dx
= \int_a^b \left[ \eta(x)\frac{\partial F}{\partial g} + \eta'(x)\frac{\partial F}{\partial g'} \right] dx
$$

At $\varepsilon = 0$ the functional attains its extremum, and at that point $g(x) = f(x)$, $g'(x) = f'(x)$, so

$$
\left.\frac{\partial J}{\partial \varepsilon}\right|_{\varepsilon=0}
= \int_a^b \left[ \eta(x)\frac{\partial F}{\partial f} + \eta'(x)\frac{\partial F}{\partial f'} \right] dx = 0
$$

Integrating the second term by parts, where the boundary term vanishes because $\eta(a) = \eta(b) = 0$:

$$
\int_a^b \eta'(x)\frac{\partial F}{\partial f'}\,dx
= \left[ \frac{\partial F}{\partial f'}\,\eta(x) \right]_a^b - \int_a^b \eta(x)\,\frac{d}{dx}\frac{\partial F}{\partial f'}\,dx
= -\int_a^b \eta(x)\,\frac{d}{dx}\frac{\partial F}{\partial f'}\,dx
$$

Hence

$$
\int_a^b \eta(x)\left[ \frac{\partial F}{\partial f} - \frac{d}{dx}\frac{\partial F}{\partial f'} \right] dx = 0
$$

and since $\eta(x)$ is arbitrary,

$$
\frac{\partial F}{\partial f} - \frac{d}{dx}\frac{\partial F}{\partial f'} = 0.
$$
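As a sanity check on the result (a standard textbook example, not from the original notes): take the arc-length functional with $F(x, f, f') = \sqrt{1 + (f')^2}$. Then $\partial F/\partial f = 0$ and $\partial F/\partial f' = f'/\sqrt{1 + (f')^2}$, so the Euler–Lagrange equation reduces to

$$
\frac{d}{dx}\,\frac{f'}{\sqrt{1 + (f')^2}} = 0,
$$

i.e. $f'$ is constant: the shortest curve between two points is the straight line.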

To describe the above procedure compactly, introduce the "$\delta$ operator", defined by $\delta[y(x)] = \tilde{y} - y$. In this example: $\delta y = \tilde{y} - y = \varepsilon\eta$ and $\delta y' = \tilde{y}' - y' = \varepsilon\eta'$.
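The perturbation picture can also be checked numerically. Below is a minimal sketch (illustrative; the functional and perturbation are chosen for the example) that perturbs the straight line by $\varepsilon\eta(x)$, with $\eta$ vanishing at the endpoints, and watches the arc-length functional grow:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 2001)

def arc_length(f_vals):
    """Trapezoidal approximation of J = integral of sqrt(1 + f'^2) dx."""
    fp = np.gradient(f_vals, x)
    return np.trapz(np.sqrt(1.0 + fp**2), x)

f = x                        # candidate extremal: the straight line from (0,0) to (1,1)
eta = np.sin(np.pi * x)      # perturbation vanishing at both endpoints

for eps in [0.0, 0.05, 0.1]:
    print(eps, arc_length(f + eps * eta))
# J grows with |eps|, consistent with the straight line being the minimizer.
```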

Two forms of the Euler–Lagrange equation
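Presumably the two forms meant here are the standard form derived above and the Beltrami identity, which holds when $F$ has no explicit dependence on $x$ (standard results, stated for reference):

$$
\frac{\partial F}{\partial f} - \frac{d}{dx}\frac{\partial F}{\partial f'} = 0,
\qquad
F - f'\,\frac{\partial F}{\partial f'} = \text{const} \quad \left(\text{when } \frac{\partial F}{\partial x} = 0\right).
$$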

Variational EM

Recommended reading

Calculus of variations (mathematics)
Calculus of variations
Xu Yida's machine learning videos
Bayesian deep learning: variational inference with PyMC3
Variational Bayes
The variational principle (main text)
Direct methods of the variational principle
Variational methods