Generalized Linear Models

Natural exponential family

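Natural exponential family (Exponential family): if the density of x can be written in the form $$p(x|\eta) = h(x)\exp(\eta^T T(x) - A(\eta))$$, then x is said to follow a distribution from the exponential family; the natural exponential family is the special case T(x) = x. The Wikipedia article is worth reading closely, since the topic touches on a lot of material.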

The exponential family includes:

Normal distribution

$$N(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
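As a worked step that is not part of the original notes, expanding the square shows how this density fits the generic exponential family form h(x) exp(η^T T(x) − A(η)). The symbols h, T, A and η follow the common textbook convention and are assumptions here:

$$
\begin{aligned}
N(x;\mu,\sigma) &= \frac{1}{\sqrt{2\pi}} \exp\left(\frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \frac{\mu^2}{2\sigma^2} - \ln\sigma\right) \\
\eta &= \left(\frac{\mu}{\sigma^2},\; -\frac{1}{2\sigma^2}\right), \quad T(x) = (x,\; x^2), \quad h(x) = \frac{1}{\sqrt{2\pi}}, \quad A(\eta) = \frac{\mu^2}{2\sigma^2} + \ln\sigma
\end{aligned}
$$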

Bernoulli distribution

$$p(y|p) = p^y(1-p)^{1-y} = \exp\left(y \log \frac{p}{1-p} + \log(1-p)\right) \\ \text{link function: } \eta(p) = \log \frac{p}{1-p} \\ \text{response function: } p = \frac{1}{1+e^{-\eta}}$$
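The identity above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (`logit`, `sigmoid`, `bernoulli_pmf`, `bernoulli_exp_form`) are illustrative choices and do not come from the original notes.

```python
import math

def logit(p):
    """Link function: maps a probability p in (0, 1) to the natural parameter eta."""
    return math.log(p / (1.0 - p))

def sigmoid(eta):
    """Response function: maps the natural parameter eta back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def bernoulli_pmf(y, p):
    """Standard form p^y (1 - p)^(1 - y) for y in {0, 1}."""
    return p ** y * (1.0 - p) ** (1 - y)

def bernoulli_exp_form(y, p):
    """Exponential-family form exp(y * eta + log(1 - p)) with eta = logit(p)."""
    return math.exp(y * logit(p) + math.log(1.0 - p))

p = 0.3
for y in (0, 1):
    # Both forms should agree up to floating-point error.
    assert math.isclose(bernoulli_pmf(y, p), bernoulli_exp_form(y, p))

# The link and response functions are inverses of each other.
assert math.isclose(sigmoid(logit(p)), p)
```

The two assertions mirror the two claims in the formula: the exponential rewrite is the same probability mass function, and the response function undoes the link function.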

Poisson distribution

$$p(x|\lambda) = \frac{\lambda^x}{x!} e^{-\lambda} = \exp(x \ln \lambda - \lambda) \frac{1}{x!}$$
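A quick numerical check of this rewrite, as a sketch using only the Python standard library. The function names are hypothetical, and the natural parameter is taken to be η = ln λ, as the exponent above suggests.

```python
import math

def poisson_pmf(x, lam):
    """Standard form lambda^x / x! * exp(-lambda)."""
    return lam ** x / math.factorial(x) * math.exp(-lam)

def poisson_exp_form(x, lam):
    """Exponential-family form exp(x * eta - lambda) / x! with eta = ln(lambda)."""
    eta = math.log(lam)  # natural parameter (assumed naming)
    return math.exp(x * eta - lam) / math.factorial(x)

lam = 2.5
for x in range(6):
    # Both forms should agree up to floating-point error.
    assert math.isclose(poisson_pmf(x, lam), poisson_exp_form(x, lam))

# The mean of the distribution is lambda; truncating the sum at 60 terms
# leaves a negligible tail for lambda = 2.5.
mean = sum(x * poisson_pmf(x, lam) for x in range(60))
```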

Student distribution

$$T(x|\mu,\sigma,\nu) \propto \left[1+\frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right]^{-\frac{\nu+1}{2}}$$

Student's t-distribution

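This distribution is similar in form to the Gaussian and makes up for one of its shortcomings: the Gaussian is very sensitive to outliers, whereas the Student t distribution is more robust. A common choice is ν=4, which performs well on most practical problems. When ν is much larger than 5, the distribution loses its robustness and rapidly converges to the Gaussian. In particular, when ν=1 it is known as the Cauchy distribution. Note that, unlike the other distributions listed here, the Student t distribution is not itself a member of the exponential family.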
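One way to see the robustness claim is to compare how strongly each density penalizes a point far from the mean. The sketch below compares the negative log of the unnormalized densities. The function names and the choice of μ=0, σ=1 are assumptions made for illustration only.

```python
import math

def gaussian_neg_log_kernel(x, mu=0.0, sigma=1.0):
    """Negative log of the unnormalized Gaussian density: (x - mu)^2 / (2 sigma^2)."""
    return (x - mu) ** 2 / (2.0 * sigma ** 2)

def student_neg_log_kernel(x, mu=0.0, sigma=1.0, nu=4.0):
    """Negative log of the unnormalized Student t density with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (nu + 1.0) / 2.0 * math.log(1.0 + z * z / nu)

# The Gaussian penalty grows quadratically in the distance from the mean,
# while the Student t penalty grows only logarithmically.
for x in (1.0, 3.0, 10.0):
    print(x, gaussian_neg_log_kernel(x), student_neg_log_kernel(x))
```

Because the Student t penalty grows so slowly, a single distant observation pulls a fitted location parameter much less than it would under a Gaussian. Passing a very large `nu` makes the two penalties nearly coincide, which matches the convergence to the Gaussian mentioned above.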

Gamma distribution: "How should we understand the gamma distribution?"

Why introduce the exponential family? (MLAPP, page 313)

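  • In theory, every distribution in the exponential family has a conjugate prior.

  • Once the distributions are all written in exponential form, each one is the entropy-maximizing distribution under its given constraints.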

  • It can be shown that, under certain regularity conditions, the exponential family is the only family of distributions with finite-sized sufficient statistics, meaning that we can compress the data into a fixed-sized summary without loss of information. This is particularly useful for online learning, as we will see later.

  • The exponential family is the only family of distributions for which conjugate priors exist, which simplifies the computation of the posterior (see Section 9.2.5).

  • The exponential family can be shown to be the family of distributions that makes the least set of assumptions subject to some user-chosen constraints (see Section 9.2.6).

  • The exponential family is at the core of generalized linear models, as discussed in Section 9.3.

  • The exponential family is at the core of variational inference, as discussed in Section 21.2.

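  • For exponential family distributions, the maximum entropy solution is equivalent to the maximum likelihood solution of the exponential form.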
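The point about fixed-size sufficient statistics can be sketched for a one-dimensional Gaussian: however many observations arrive, the three numbers (n, Σx, Σx²) are enough to recover the maximum likelihood estimates. The class below is a hypothetical illustration, not code from MLAPP, and it uses the naive sum-of-squares formula, which can lose precision on badly scaled data.

```python
class GaussianStats:
    """Fixed-size sufficient statistics (n, sum x, sum x^2) for a 1-D Gaussian."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def update(self, x):
        """Absorb one observation; the stored summary never grows."""
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def mle(self):
        """Return the maximum likelihood estimates (mean, variance)."""
        mean = self.sum_x / self.n
        var = self.sum_x2 / self.n - mean * mean
        return mean, var

stats = GaussianStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)  # data can be processed one point at a time (online)

mean, var = stats.mle()
print(mean, var)
```

For this data the sums are Σx = 40 and Σx² = 232, so the estimates work out to a mean of 5 and a variance of 4.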

Generalized linear models

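Generalized linear models were introduced to overcome the shortcomings of the linear regression model and are a generalization of it. First, the independent variables may be discrete or continuous; a discrete variable may be a 0-1 variable or one that takes several values. Compared with the linear regression model, they generalize it in the following ways: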

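  • The random error term need not be normally distributed. It may follow a binomial, Poisson, negative binomial, normal, gamma, inverse Gaussian, or similar distribution, all of which belong to the exponential family.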

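  • A link function g(⋅) is introduced. The dependent variable is tied to the independent variables through the link function, i.e. $$g(\mu)=X\beta$$ with $$\mu=E[Y]$$, and the link function must be monotonic and differentiable. Common link functions include the identity $$\mu=X\beta$$, the log $$\ln(\mu)=X\beta$$, the power $$\mu^k=X\beta$$, the square root $$\sqrt{\mu}=X\beta$$, and the logit $$\ln(\frac{\mu}{1-\mu})=X\beta$$.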

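Different models can be chosen freely to suit different data. The familiar Logit model is the one obtained by using the logit link with a binomially distributed random error term.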
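As a rough sketch of how link functions plug into a prediction, the snippet below pairs a few common links with their inverses. It assumes the usual convention that the link g maps the mean μ of the response to the linear predictor η = Xβ, so the prediction is g⁻¹(η). The `LINKS` table and `predict_mean` helper are hypothetical names introduced here.

```python
import math

# name -> (link g, inverse link g^{-1}), with g(mu) = eta = X beta (assumed convention)
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    # The square-root link is only invertible this way for eta >= 0.
    "sqrt": (lambda mu: math.sqrt(mu), lambda eta: eta ** 2),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}

def predict_mean(x, beta, link="identity"):
    """Compute the linear predictor eta = x . beta and map it back to the mean."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    _, inverse = LINKS[link]
    return inverse(eta)

# With the logit link, a linear predictor of 0 corresponds to probability 0.5.
print(predict_mean([1.0, 2.0], [0.5, -0.25], link="logit"))
```

Swapping the `link` argument changes the model family without touching the linear part, which is the sense in which the model can be chosen to suit the data.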

Three assumptions

  • y | x; θ ∼ ExponentialFamily(η). I.e., given x and θ, the distribution of y follows some exponential family distribution, with parameter η.

  • Given x, our goal is to predict the expected value of T(y) given x, that is, hθ(x) = E[T(y)|x]. In most of our examples, we will have T(y) = y, so this means we would like the prediction h(x) output by our learned hypothesis h to satisfy h(x) = E[y|x]. (Note that this assumption is satisfied in the choices for hθ(x) for both logistic regression and linear regression. For instance, in logistic regression, we had hθ(x) = p(y = 1|x; θ) = 0 · p(y = 0|x; θ) + 1 · p(y = 1|x; θ) = E[y|x; θ].)

  • The natural parameter η and the inputs x are related linearly: $$\eta = \theta^T x$$. (Or, if η is vector-valued, then $$\eta_i = \theta_i^T x$$.)
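To illustrate how the three assumptions combine, here is a short derivation for a Bernoulli response with mean p. It uses the Bernoulli response function p = 1/(1+e^{-η}) given earlier, and is offered as a sketch rather than as part of the quoted text:

$$
\begin{aligned}
h_\theta(x) &= E[y \mid x; \theta] = p \\
&= \frac{1}{1+e^{-\eta}} \\
&= \frac{1}{1+e^{-\theta^T x}}
\end{aligned}
$$

The first line uses the second assumption with T(y) = y, the second uses the Bernoulli response function, and the third substitutes the linear relation from the third assumption, which gives the logistic regression hypothesis.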

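Which model a generalized linear model yields depends on the distribution chosen. Choosing the normal distribution gives the least squares model; choosing the Bernoulli distribution gives the logistic model. The parameters of the linear part are then fit with gradient descent or a similar method.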
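The following sketch shows how one fitting routine can cover both cases, assuming the canonical link so that the gradient of the log-likelihood with respect to θ is Σ (y − h(x)) x. The function `fit_glm`, its learning rate and epoch count, and the toy data sets are all hypothetical choices made for illustration. The update is written as ascent on the log-likelihood, which is the same as gradient descent on the negative log-likelihood.

```python
import math

def fit_glm(xs, ys, response, lr=0.1, epochs=5000):
    """Batch gradient ascent on the log-likelihood of a GLM with canonical link.

    xs: list of feature vectors (include a leading 1.0 for the intercept)
    ys: list of targets
    response: maps the natural parameter eta = theta . x to the predicted mean
    """
    theta = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            eta = sum(t * xi for t, xi in zip(theta, x))
            err = y - response(eta)  # same residual form for every family
            for j, xi in enumerate(x):
                grad[j] += err * xi
        theta = [t + lr * g / n for t, g in zip(theta, grad)]
    return theta

def identity(eta):
    """Response function for the normal case (least squares)."""
    return eta

def sigmoid(eta):
    """Response function for the Bernoulli case (logistic regression)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Normal case: noiseless toy data generated from y = 1 + 2x.
xs_lin = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys_lin = [1.0, 3.0, 5.0, 7.0]
theta_lin = fit_glm(xs_lin, ys_lin, identity)

# Bernoulli case: toy labels that are not perfectly separable.
xs_bin = [[1.0, float(i)] for i in range(6)]
ys_bin = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0]
theta_bin = fit_glm(xs_bin, ys_bin, sigmoid)

print(theta_lin, theta_bin)
```

Only the `response` argument differs between the two fits, which reflects the statement above that the choice of distribution determines the model while the linear part is fit the same way.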

Modeling

