Maximum entropy probability distribution


General form:

$$
\begin{array}{lcl}
\int_{-\infty}^\infty p(x)\, dx &=& 1 \\
E(f_j(x)) &=& \int_{-\infty}^\infty f_j(x)\, p(x)\, dx = a_j \\
H(x) &=& -\int_{-\infty}^\infty p(x) \ln p(x)\, dx \\
J(p(x)) &=& -\int_{-\infty}^\infty p(x) \ln p(x)\, dx + \lambda_0 \left( \int_{-\infty}^\infty p(x)\, dx - 1 \right) + \sum_j \lambda_j \left( \int_{-\infty}^\infty f_j(x)\, p(x)\, dx - a_j \right) \\
\frac{\partial J}{\partial p(x)} &=& -\ln p(x) - 1 + \lambda_0 + \sum_j \lambda_j f_j(x) = 0 \\
p(x) &=& e^{-1+\lambda_0}\, e^{\sum_j \lambda_j f_j(x)}
\end{array}
$$

The last line has the form of an exponential-family distribution. So far we only know that, in theory, an exponential-family distribution is guaranteed to have a conjugate prior.
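As a quick numerical illustration of the general form (a minimal sketch; the truncated support [0.05, 12], the grid size, and the target mean 2.0 are arbitrary choices, not from the original), maximizing the entropy on a discretized support under a single mean constraint should land close to the exponential-family member e^{λ_0-1} e^{λ_1 x}, here an exponential density:

```python
import numpy as np
from scipy.optimize import minimize

# Discretized support; constraints: sum(p)*dx = 1 and E(x) = a1.
x = np.linspace(0.05, 12.0, 120)
dx = x[1] - x[0]
a1 = 2.0  # target mean

def neg_entropy(p):
    # maximizing H = -∫ p ln p  <=>  minimizing ∫ p ln p
    return np.sum(p * np.log(p + 1e-300)) * dx

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) * dx - 1.0},      # normalization
    {"type": "eq", "fun": lambda p: np.sum(p * x) * dx - a1},   # mean constraint
]
res = minimize(neg_entropy, np.full(x.size, 1.0 / (x[-1] - x[0])),
               method="SLSQP", constraints=constraints,
               bounds=[(1e-12, None)] * x.size)

# On [0, ∞) with f_1(x) = x the maximizer is the exponential density with
# mean a1; on this truncated grid the match is only approximate.
print(np.abs(res.x - np.exp(-x / a1) / a1).max())
```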

Some special functions that appear below:

$$
\begin{array}{lcl}
\Gamma(x) &=& \int_0^{+\infty} e^{-t} t^{x-1}\, dt \quad (x > 0); \qquad \Gamma(n) = (n-1)! \text{ for integer } n \qquad \text{(this is the Gamma function, not the Gamma distribution)} \\
\psi(x) &=& \frac{d}{dx} \ln \Gamma(x) = \frac{\Gamma'(x)}{\Gamma(x)} \qquad \text{(digamma function)} \\
B(p,q) &=& \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)} \qquad \text{(beta function)} \\
\gamma_E && \text{(Euler's constant)}
\end{array}
$$
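These identities are easy to spot-check with scipy.special (a minimal sketch; the evaluation points 2.3 and (2.0, 3.5) are arbitrary):

```python
import math
from scipy.special import gamma, digamma, beta

print(gamma(5), math.factorial(4))  # Γ(n) = (n-1)! at integer n
h = 1e-6                            # ψ(x) = (ln Γ(x))', checked by central difference
print(digamma(2.3), (math.lgamma(2.3 + h) - math.lgamma(2.3 - h)) / (2 * h))
print(beta(2.0, 3.5), gamma(2.0) * gamma(3.5) / gamma(5.5))  # B(p,q) = Γ(p)Γ(q)/Γ(p+q)
```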

Normal distribution

Fixing the first two moments (i.e., the mean and the variance) and maximizing the entropy over densities on the real line yields the normal distribution.
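A short sketch in the notation of the general form above, taking f_1(x) = x with a_1 = μ and f_2(x) = x^2 with a_2 = σ^2 + μ^2:

$$
p(x) = e^{-1+\lambda_0}\, e^{\lambda_1 x + \lambda_2 x^2} \\
\text{imposing the three constraints gives } \lambda_1 = \frac{\mu}{\sigma^2}, \quad \lambda_2 = -\frac{1}{2\sigma^2} \\
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
$$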

gamma

Constraints (namely the mean and the geometric mean):

$$
\int_{-\infty}^\infty p(x)\, dx = 1 \qquad
E(x) = k\theta \qquad
E(\ln x) = \psi(k) + \ln\theta
$$

Construct a Lagrangian with multipliers and maximize the entropy:

$$
\begin{aligned}
J(p(x)) = & -\int_{-\infty}^\infty p(x) \ln p(x)\, dx \\
& + \lambda_0 \left( \int_{-\infty}^\infty p(x)\, dx - 1 \right) \\
& + \lambda_1 \left( \int_{-\infty}^\infty p(x)\, x\, dx - k\theta \right) \\
& + \lambda_2 \left( \int_{-\infty}^\infty p(x) \ln x\, dx - \psi(k) - \ln\theta \right)
\end{aligned}
$$

$$
\frac{\partial J}{\partial p(x)} = -\ln p(x) - 1 + \lambda_0 + \lambda_1 x + \lambda_2 \ln x = 0
\qquad \Rightarrow \qquad
p(x) = \exp(\lambda_0 - 1 + \lambda_1 x + \lambda_2 \ln x) = e^{\lambda_0 - 1}\, x^{\lambda_2}\, e^{\lambda_1 x}
$$

Alternatively, use the variational principle together with the multipliers:

$$
\delta\big(H(p(x))\big)
+ \delta\lambda_0 \left( \int_{-\infty}^\infty p(x)\, dx \right)
+ \delta\lambda_1 \left( \int_{-\infty}^\infty p(x)\, x\, dx \right)
+ \delta\lambda_2 \left( \int_{-\infty}^\infty p(x) \ln x\, dx \right) = 0 \\
p(x) = \exp(\lambda_0 - 1 + \lambda_1 x + \lambda_2 \ln x) = e^{\lambda_0 - 1}\, x^{\lambda_2}\, e^{\lambda_1 x}
$$

Solve. Using the integral $\int_0^\infty x^n e^{-ax}\, dx = \frac{n!}{a^{n+1}}$, the normalization gives

$$
\int_{-\infty}^\infty p(x)\, dx = 1 \quad \Rightarrow \quad
e^{\lambda_0 - 1} = \frac{(-\lambda_1)^{\lambda_2 + 1}}{\Gamma(\lambda_2 + 1)}
$$

Finally we get:

$$
p(x) = \frac{1}{\theta^k \Gamma(k)}\, x^{k-1} e^{-x/\theta}
$$

The θ does not simply fall out of the algebra here; it enters by matching the constraints: E(x) = kθ and E(ln x) = ψ(k) + ln θ force λ_2 = k - 1 and λ_1 = -1/θ, turning the exponential-family form into exactly the Gamma density above.
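The two constraint identities can be sanity-checked by Monte Carlo (a minimal sketch; the shape k = 2.5 and scale θ = 1.7 are arbitrary):

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
k, theta = 2.5, 1.7
x = rng.gamma(shape=k, scale=theta, size=1_000_000)

print(x.mean(), k * theta)                           # E(x) = kθ
print(np.log(x).mean(), digamma(k) + np.log(theta))  # E(ln x) = ψ(k) + ln θ
```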

power-law

The power law is the maximum-entropy distribution under a geometric-mean constraint. Normalizing p(x) = C x^{-a} on [x_min, ∞):

$$
1 = \int_{x_{min}}^\infty p(x)\, dx = C \int_{x_{min}}^\infty x^{-a}\, dx = \frac{C}{1-a} \left[ x^{-a+1} \right]_{x_{min}}^\infty \qquad (a > 1) \\
C = (a-1)\, x_{min}^{a-1} \\
p(x) = \frac{a-1}{x_{min}} \left( \frac{x}{x_{min}} \right)^{-a}
$$

Geometric mean, discrete case: the geometric mean m of N individuals is the N-th root of the product of their values of x; if the value x_j occurs n_j times, with Σ_j n_j = N:

$$
m^N = x_1^{n_1} \cdot x_2^{n_2} \cdots x_n^{n_n} \\
N \ln m = n_1 \ln x_1 + n_2 \ln x_2 + \cdots + n_n \ln x_n = \sum_j n_j \ln x_j
$$

Continuous case:

$$
\ln m = \int p(x) \ln x\, dx
$$

Construct the Lagrangian:

$$
F = -\int p(x) \ln p(x)\, dx + \lambda_0 \left( \int p(x)\, dx - 1 \right) + \lambda_1 \left( \int p(x) \ln x\, dx - \ln m \right) \\
\frac{\partial F}{\partial p(x)} = -\ln p(x) - 1 + \lambda_0 + \lambda_1 \ln x = 0 \\
p(x) = e^{-1 + \lambda_0}\, x^{\lambda_1}
$$

With λ_1 = -a this is the power law above, and the normalization already computed fixes e^{-1+λ_0} = (a-1) x_min^{a-1}.
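As a check (a minimal sketch; the exponent a = 2.5 and x_min = 1.0 are arbitrary), sample the power law by inverting its CDF and compare the log of the geometric mean with the closed form E(ln x) = ln x_min + 1/(a-1), which follows from the density above:

```python
import numpy as np

rng = np.random.default_rng(0)
a, xmin = 2.5, 1.0
u = rng.random(1_000_000)
x = xmin * (1.0 - u) ** (-1.0 / (a - 1.0))  # inverse-CDF sample of p(x) ∝ x^{-a}

# ln(geometric mean) = E(ln x) = ln(xmin) + 1/(a-1) for this density
print(np.log(x).mean(), np.log(xmin) + 1.0 / (a - 1.0))
```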

Recommended reading

Has anyone considered whether restricting just the first and second moments is actually natural?

The same procedure as above, only carried out with the calculus of variations.

The last part.


Zhang Xuewen (张学文), 《组成论》, p. 190

What is the maximum-entropy distribution given the first k moments?
Why is the normal distribution so common in nature?
Central limit theorem
Can many variables be well described by a normal distribution?
An introduction to the calculus of variations
Power law
Does the power-law distribution of human behavior violate the central limit theorem?
Statistical mechanics of the spatiotemporal characteristics of human behavior (I): understanding power-law distributions
Maximum entropy probability distribution
Exponential family