Maximum entropy probability distribution

General form:

$$
\begin{array}{lcl}
\int_{-\infty}^\infty p(x)\,dx &=& 1 \\
E(f_j(x)) &=& \int_{-\infty}^\infty f_j(x)\,p(x)\,dx = a_j \\
H(x) &=& -\int_{-\infty}^\infty p(x) \ln p(x)\,dx \\
J(p(x)) &=& -\int_{-\infty}^\infty p(x) \ln p(x)\,dx + \lambda_0 \left(\int_{-\infty}^\infty p(x)\,dx - 1\right) + \sum_j \lambda_j \left(\int_{-\infty}^\infty f_j(x)\,p(x)\,dx - a_j\right) \\
\frac{\partial J}{\partial p(x)} &=& -\ln p(x) - 1 + \lambda_0 + \sum_j \lambda_j f_j(x) = 0 \\
p(x) &=& e^{-1+\lambda_0}\, e^{\sum_j \lambda_j f_j(x)}
\end{array}
$$

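Look at the form of the last expression: it is an exponential family. Until now, the only thing I knew about exponential families was that, in theory, they always have a conjugate prior.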
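To make the general form concrete, here is a minimal sketch for a discrete case: a six-sided die whose only constraint (besides normalization) is a prescribed mean, so f(x) = x and p_i ∝ exp(λ x_i). The function name `maxent_die`, the bisection bracket, and the target mean 4.5 are illustrative assumptions, not part of the derivation above.

```python
import math

def maxent_die(target_mean, faces=(1, 2, 3, 4, 5, 6)):
    """Maximum entropy distribution on `faces` with a prescribed mean.

    The solution has the exponential-family form p_i ∝ exp(lam * x_i);
    the multiplier `lam` is found by bisection on the mean constraint.
    `target_mean` must lie strictly between min(faces) and max(faces).
    """
    def mean_for(lam):
        weights = [math.exp(lam * x) for x in faces]
        z = sum(weights)
        return sum(x * w for x, w in zip(faces, weights)) / z

    lo, hi = -50.0, 50.0  # assumed bracket; the mean is increasing in lam
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    weights = [math.exp(lam * x) for x in faces]
    z = sum(weights)
    return lam, [w / z for w in weights]

lam, probs = maxent_die(4.5)
print(lam, probs)
```

With a target mean of 3.5 the multiplier comes out at zero and the distribution is uniform, which is the unconstrained maximum entropy solution.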

$$
\begin{array}{lcll}
\Gamma(x) &=& \int_0^{+\infty} e^{-t} t^{x-1}\,dt \quad (x > 0) & \text{Gamma function (not the Gamma distribution); } \Gamma(n) = (n-1)! \text{ for positive integers } n \\
\psi(x) &=& \frac{d}{dx} \ln \Gamma(x) = \frac{\Gamma'(x)}{\Gamma(x)} & \text{digamma function} \\
B(p,q) &=& \frac{\Gamma(p)\,\Gamma(q)}{\Gamma(p+q)} & \text{beta function} \\
\gamma_E &=& -\psi(1) \approx 0.5772 & \text{Euler's constant}
\end{array}
$$
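These special functions can be sanity-checked with the Python standard library. The sketch below assumes a central finite difference of `math.lgamma` as a stand-in for the digamma function (the standard library has no digamma); the helper names `digamma` and `beta` are illustrative.

```python
import math

def digamma(x, h=1e-5):
    """Central-difference approximation of psi(x) = d/dx ln Gamma(x)."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def beta(p, q):
    """Beta function via B(p, q) = Gamma(p) Gamma(q) / Gamma(p + q)."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)

EULER_GAMMA = 0.5772156649015329  # Euler's constant

gamma_5 = math.gamma(5)   # Gamma(5) = (5 - 1)! = 24
psi_1 = digamma(1.0)      # psi(1) = -EULER_GAMMA
beta_2_3 = beta(2, 3)     # 1! * 2! / 4! = 1/12

print(gamma_5, psi_1, beta_2_3)
```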

What is the maximum entropy distribution given the first k moments?

Has anyone considered whether constraints on the first and second moments are natural? Why is the normal distribution so common in nature? Central limit theorem

Normal distribution

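Can many variables be described well by a normal distribution? The derivation follows the same process, except that it uses the calculus of variations.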
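As a quick numerical illustration of why the normal distribution is singled out by a variance constraint, the sketch below compares closed-form differential entropies of three distributions sharing the same variance. The choice of the uniform and Laplace distributions as comparison cases is an assumption made for illustration; the entropy formulas are the standard closed forms.

```python
import math

def entropy_normal(sigma):
    """Differential entropy of N(mu, sigma^2): 0.5 * ln(2 pi e sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def entropy_uniform(sigma):
    """Uniform distribution with variance sigma^2 has width sigma * sqrt(12)."""
    width = sigma * math.sqrt(12)
    return math.log(width)

def entropy_laplace(sigma):
    """Laplace distribution with variance sigma^2 has scale b = sigma / sqrt(2)."""
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)

sigma = 1.0
entropies = {
    "normal": entropy_normal(sigma),
    "uniform": entropy_uniform(sigma),
    "laplace": entropy_laplace(sigma),
}
print(entropies)
```

For any fixed variance the normal entry is the largest of the three, consistent with it being the maximum entropy distribution under first- and second-moment constraints.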

Introduction to the calculus of variations (last part)

gamma

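Constraints (the mean and, through E(ln x), the geometric mean); p(x) is taken to vanish for x ≤ 0, so the integrals effectively run over x > 0:

$$
\begin{aligned}
\int_{-\infty}^\infty p(x)\,dx &= 1 \\
E(x) &= k \theta \\
E(\ln(x)) &= \psi(k) + \ln(\theta)
\end{aligned}
$$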

Build the Lagrangian with the method of Lagrange multipliers and maximize the entropy:

$$
\begin{aligned}
J(p(x)) = & -\int_{-\infty}^\infty p(x) \ln p(x)\,dx \\
& + \lambda_0 \left(\int_{-\infty}^\infty p(x)\,dx - 1\right) \\
& + \lambda_1 \left(\int_{-\infty}^\infty p(x)\,x\,dx - k\theta\right) \\
& + \lambda_2 \left(\int_{-\infty}^\infty p(x) \ln(x)\,dx - \psi(k) - \ln(\theta)\right)
\end{aligned}
$$

$$
\frac{\partial J}{\partial p(x)} = -\ln p(x) - 1 + \lambda_0 + \lambda_1 x + \lambda_2 \ln(x) = 0
$$

$$
p(x) = \exp(\lambda_0 - 1 + \lambda_1 x + \lambda_2 \ln(x)) = e^{\lambda_0 - 1}\, x^{\lambda_2}\, e^{\lambda_1 x}
$$

Alternatively, use the variational principle together with Lagrange multipliers:

$$
\delta H(p(x)) + \lambda_0\, \delta\!\left(\int_{-\infty}^\infty p(x)\,dx\right) + \lambda_1\, \delta\!\left(\int_{-\infty}^\infty p(x)\,x\,dx\right) + \lambda_2\, \delta\!\left(\int_{-\infty}^\infty p(x) \ln(x)\,dx\right) = 0
$$

$$
p(x) = \exp(\lambda_0 - 1 + \lambda_1 x + \lambda_2 \ln(x)) = e^{\lambda_0 - 1}\, x^{\lambda_2}\, e^{\lambda_1 x}
$$

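Solving: for a > 0 and s > -1,

$$
\int_0^\infty x^{s} e^{-ax}\,dx = \frac{\Gamma(s+1)}{a^{s+1}}
$$

which reduces to n!/a^{n+1} when s = n is a non-negative integer. Applying it to the normalization of p(x) = e^{\lambda_0 - 1} x^{\lambda_2} e^{\lambda_1 x} with a = -\lambda_1 > 0 and s = \lambda_2 gives

$$
\int_{-\infty}^\infty p(x)\,dx = 1 \;\Rightarrow\; e^{\lambda_0 - 1} = \frac{(-\lambda_1)^{\lambda_2 + 1}}{\Gamma(\lambda_2 + 1)}
$$

Finally, identifying k = \lambda_2 + 1 and \theta = -1/\lambda_1:

$$
p(x) = \frac{1}{\theta^k\, \Gamma(k)}\, x^{k-1} e^{-\frac{x}{\theta}}
$$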
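The two moment constraints can be checked by simulation. The sketch below draws samples with the standard library's `random.gammavariate(alpha, beta)` (shape `alpha`, scale `beta`) and compares sample averages with kθ and ψ(k) + ln θ. The values k = 3, θ = 2, the sample size, and the finite-difference `digamma` helper are assumptions chosen for illustration.

```python
import math
import random

def digamma(x, h=1e-5):
    """Central-difference approximation of psi(x) = d/dx ln Gamma(x)."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def check_gamma_constraints(k, theta, n=200_000, seed=0):
    """Return (empirical, theoretical) pairs for E[x] and E[ln x]."""
    rng = random.Random(seed)
    samples = [rng.gammavariate(k, theta) for _ in range(n)]
    mean_x = sum(samples) / n
    mean_log_x = sum(math.log(x) for x in samples) / n
    return {
        "E[x]": (mean_x, k * theta),
        "E[ln x]": (mean_log_x, digamma(k) + math.log(theta)),
    }

result = check_gamma_constraints(k=3.0, theta=2.0)
for name, (empirical, theoretical) in result.items():
    print(name, empirical, theoretical)
```

The agreement is only statistical, so the two numbers in each pair match up to Monte Carlo error rather than exactly.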

power-law

The power law is the maximum entropy distribution under a geometric-mean constraint.

$$
\begin{aligned}
1 &= \int_{x_{min}}^\infty p(x)\,dx = \int_{x_{min}}^\infty C x^{-a}\,dx = \frac{C}{1-a} \left[x^{-a+1}\right]_{x_{min}}^\infty = \frac{C}{a-1}\, x_{min}^{1-a} \qquad (a > 1) \\
C &= (a-1)\, x_{min}^{a-1} \\
p(x) &= \frac{a-1}{x_{min}} \left(\frac{x}{x_{min}}\right)^{-a}
\end{aligned}
$$
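The normalization can be confirmed numerically. This is a minimal sketch that integrates the density with a midpoint rule in t = ln x (which handles the heavy tail better than a uniform grid in x); the parameter values a = 2.5, x_min = 1 and the cutoff x_max = 1e8 are illustrative assumptions.

```python
import math

def power_law_pdf(x, a, x_min):
    """p(x) = (a - 1) / x_min * (x / x_min)^(-a) for x >= x_min, else 0."""
    if x < x_min:
        return 0.0
    return (a - 1) / x_min * (x / x_min) ** (-a)

def integrate_pdf(a, x_min, x_max, steps=100_000):
    """Midpoint rule in t = ln x, using dx = x dt."""
    t0, t1 = math.log(x_min), math.log(x_max)
    h = (t1 - t0) / steps
    total = 0.0
    for i in range(steps):
        x = math.exp(t0 + (i + 0.5) * h)
        total += power_law_pdf(x, a, x_min) * x * h
    return total

mass = integrate_pdf(a=2.5, x_min=1.0, x_max=1e8)
print(mass)
```

The mass left beyond the cutoff is (x_max / x_min)^{-(a-1)}, so the result falls short of 1 only by that truncated tail plus discretization error.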

Geometric mean. Discrete case: the geometric mean m is the N-th root of the product of the variable x over all N individuals.

$$
\begin{aligned}
m^N &= x_1^{n_1} \cdot x_2^{n_2} \cdot \ldots \cdot x_n^{n_n}, \qquad N = \sum_j n_j \\
N \ln m &= n_1 \ln x_1 + n_2 \ln x_2 + \ldots + n_n \ln x_n = \sum_j n_j \ln x_j
\end{aligned}
$$
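The discrete definition translates directly into code. In this small sketch the function name `geometric_mean_from_counts` and the sample values and counts are made up for illustration; the result is cross-checked against the direct N-th root of the product.

```python
import math

def geometric_mean_from_counts(values, counts):
    """Geometric mean via N ln m = sum_j n_j ln x_j, with N = sum_j n_j."""
    n_total = sum(counts)
    log_m = sum(n * math.log(x) for x, n in zip(values, counts)) / n_total
    return math.exp(log_m)

values = [1.0, 2.0, 4.0]   # distinct values x_j
counts = [1, 2, 1]         # how many individuals take each value, n_j

m = geometric_mean_from_counts(values, counts)

# Cross-check: expand the counts and take the N-th root of the product.
expanded = [x for x, n in zip(values, counts) for _ in range(n)]
m_check = math.prod(expanded) ** (1 / len(expanded))

print(m, m_check)
```

Working with logarithms avoids overflow in the product when N is large, which is also why the continuous constraint is stated in terms of ln m.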

Continuous case:

$$
\ln m = \int_a^b p(x) \ln x\,dx
$$

Construct the Lagrangian:

$$
\begin{aligned}
F &= -\int p(x) \ln p(x)\,dx + \lambda_0 \left(\int p(x)\,dx - 1\right) + \lambda_1 \left(\int p(x) \ln x\,dx - \ln m\right) \\
\frac{\partial F}{\partial p(x)} &= -\ln p(x) - 1 + \lambda_0 + \lambda_1 \ln(x) = 0 \\
p(x) &= e^{-1 + \lambda_0}\, x^{\lambda_1}
\end{aligned}
$$

This is a power law with exponent a = -\lambda_1.
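To connect the multiplier solution back to the geometric-mean constraint, the sketch below samples from the normalized power law p(x) = (a-1)/x_min · (x/x_min)^{-a} by inverting its CDF and compares the sample average of ln x with the value ln x_min + 1/(a-1), which follows from integrating p(x) ln x over x ≥ x_min. The parameter values, sample size, and function names are assumptions for illustration.

```python
import math
import random

def sample_power_law(a, x_min, n, seed=0):
    """Inverse-CDF sampling from p(x) = (a-1)/x_min * (x/x_min)^(-a), x >= x_min."""
    rng = random.Random(seed)
    # CDF: F(x) = 1 - (x / x_min)^(-(a - 1)); solving F(x) = u for x gives
    # x = x_min * (1 - u)^(-1 / (a - 1)). rng.random() is in [0, 1), so 1 - u > 0.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (a - 1)) for _ in range(n)]

def log_geometric_mean(samples):
    """Sample estimate of ln m = E[ln x]."""
    return sum(math.log(x) for x in samples) / len(samples)

a, x_min = 2.5, 1.0
samples = sample_power_law(a, x_min, n=200_000)
empirical = log_geometric_mean(samples)
theoretical = math.log(x_min) + 1.0 / (a - 1)

print(empirical, theoretical)
```

Fixing ln m therefore fixes the exponent a once x_min is given, which is how the single geometric-mean constraint determines the distribution.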

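Power law. Does the power-law distribution followed by human behavior violate the central limit theorem? Statistical mechanics of the spatiotemporal characteristics of human behavior (I): understanding the power-law distribution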

Recommended references

Maximum entropy probability distribution; Exponential family; 张学文 (Zhang Xuewen), 《组成论》, p. 190
