IIS

Most introductions to maximum entropy cover IIS (Improved Iterative Scaling). Convex optimization offers many iterative methods, and the faster-converging ones in common use today are quasi-Newton methods such as L-BFGS, so there is little practical need for IIS anymore — but the idea behind IIS is still worth studying.

The conditional log-likelihood of the model $P_w$ under the empirical distribution $\widetilde P$ is

$$
\begin{aligned}
L_{\widetilde p}(P_w) &= \log \prod_{x,y} P(y|x)^{\widetilde P(x,y)} \\
&= \sum_{x,y} \widetilde P(x,y) \log P(y|x) \\
&= \sum_{x,y} \widetilde P(x,y) \log \frac{\exp\left(\sum_{i=1}^n w_i f_i(x,y)\right)}{Z_w(x)} \\
&= \sum_{x,y} \widetilde P(x,y) \sum_{i=1}^n w_i f_i(x,y) - \sum_x \widetilde P(x) \log Z_w(x)
\end{aligned}
$$
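The equivalence of the first and last forms of this log-likelihood can be checked numerically. A minimal sketch with random binary features — all names here are illustrative, not from any particular library:

```python
import numpy as np

rng = np.random.default_rng(1)
X, Y, n = 2, 3, 6                      # number of x values, y labels, features

# Random binary features f_i(x, y), weights w_i, and an empirical joint P~(x, y).
f = rng.integers(0, 2, size=(n, X, Y)).astype(float)
w = rng.normal(size=n)
p_emp = rng.dirichlet(np.ones(X * Y)).reshape(X, Y)   # P~(x, y)
p_emp_x = p_emp.sum(axis=1)                           # P~(x)

scores = np.einsum('i,ixy->xy', w, f)                 # sum_i w_i f_i(x, y)
log_Z = np.log(np.exp(scores).sum(axis=1))            # log Z_w(x)
log_p = scores - log_Z[:, None]                       # log P_w(y|x)

lhs = (p_emp * log_p).sum()                           # sum P~(x,y) log P(y|x)
rhs = (p_emp * scores).sum() - (p_emp_x * log_Z).sum()
print(abs(lhs - rhs))                                 # ~0
```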
IIS considers the change in log-likelihood produced by an update $w \to w + \delta$ and bounds it from below:

$$
\begin{aligned}
L(w+\delta) - L(w) &= \sum_{x,y} \widetilde P(x,y) \sum_{i=1}^n \delta_i f_i(x,y) - \sum_x \widetilde P(x) \log \frac{Z_{w+\delta}(x)}{Z_w(x)} \\
&\ge \sum_{x,y} \widetilde P(x,y) \sum_{i=1}^n \delta_i f_i(x,y) + \sum_x \widetilde P(x) \left(1 - \frac{Z_{w+\delta}(x)}{Z_w(x)}\right) \qquad (\because -\log x \ge 1 - x) \\
&= \sum_{x,y} \widetilde P(x,y) \sum_{i=1}^n \delta_i f_i(x,y) + 1 - \sum_x \widetilde P(x)\, \frac{Z_{w+\delta}(x)}{Z_w(x)} \\
&= \sum_{x,y} \widetilde P(x,y) \sum_{i=1}^n \delta_i f_i(x,y) + 1 - \sum_x \widetilde P(x) \sum_y P_w(y|x) \exp\left(\sum_{i=1}^n \delta_i f_i(x,y)\right)
\end{aligned}
$$
Let $f^{\#}(x,y) = \sum_{i=1}^n f_i(x,y)$. Since $\exp$ is convex and the coefficients $f_i(x,y)/f^{\#}(x,y)$ are nonnegative and sum to 1, Jensen's inequality gives

$$
\exp\left(\sum_{i=1}^n \delta_i f_i(x,y)\right) = \exp\left(\sum_{i=1}^n \frac{f_i(x,y)}{f^{\#}(x,y)}\, \delta_i f^{\#}(x,y)\right) \le \sum_{i=1}^n \frac{f_i(x,y)}{f^{\#}(x,y)} \exp\left(\delta_i f^{\#}(x,y)\right)
$$
Substituting this back yields the final lower bound:

$$
L(w+\delta) - L(w) \ge \sum_{x,y} \widetilde P(x,y) \sum_{i=1}^n \delta_i f_i(x,y) + 1 - \sum_x \widetilde P(x) \sum_y P_w(y|x) \sum_{i=1}^n \frac{f_i(x,y)}{f^{\#}(x,y)} \exp\left(\delta_i f^{\#}(x,y)\right)
$$
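Setting the partial derivative of this bound with respect to $\delta_i$ to zero gives $\sum_{x,y} \widetilde P(x) P_w(y|x) f_i(x,y) \exp\left(\delta_i f^{\#}(x,y)\right) = E_{\widetilde P}[f_i]$, one equation per coordinate. In the special case where $f^{\#}(x,y)$ equals a constant $M$ for every pair, this solves in closed form as $\delta_i = \frac{1}{M} \log \frac{E_{\widetilde P}[f_i]}{E_{P_w}[f_i]}$. A minimal numeric sketch of that special case, using indicator features $f_{(x,y)}$ so that $f^{\#} \equiv 1$ — all names here are illustrative:

```python
import numpy as np

# IIS sketch for the special case f#(x, y) = M constant, where the coordinate
# update has the closed form delta_i = (1/M) * log(E_emp[f_i] / E_model[f_i]).
# Indicator features f_{(x,y)}(x', y') = 1[(x', y') == (x, y)] give M = 1.

X, Y = 2, 3
rng = np.random.default_rng(0)
counts = rng.integers(1, 10, size=(X, Y)).astype(float)
p_emp = counts / counts.sum()        # empirical joint P~(x, y)
p_emp_x = p_emp.sum(axis=1)          # marginal P~(x)

w = np.zeros((X, Y))                 # one weight per indicator feature

def model_cond(w):
    """P_w(y|x) = exp(w[x, y]) / Z_w(x)."""
    s = np.exp(w)
    return s / s.sum(axis=1, keepdims=True)

for _ in range(50):
    # E_model[f_{(x,y)}] = P~(x) P_w(y|x); E_emp[f_{(x,y)}] = P~(x, y).
    e_model = p_emp_x[:, None] * model_cond(w)
    w += np.log(p_emp / e_model)     # IIS closed-form update with M = 1

p_fit = model_cond(w)
p_emp_cond = p_emp / p_emp_x[:, None]
print(np.abs(p_fit - p_emp_cond).max())   # essentially 0
```

When $f^{\#}(x,y)$ varies across pairs, there is no closed form: each $\delta_i$ is instead found by solving its one-dimensional equation numerically, e.g. with Newton's method.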