Generalized Linear Models

Natural exponential family

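Natural exponential family (exponential family): if the density of $x$ can be written in the form

$$p(x|\eta) = h(x) \exp\left(\eta^T T(x) - A(\eta)\right)$$

then $x$ is said to follow an exponential family distribution. Here $\eta$ is the natural parameter, $T(x)$ is the sufficient statistic, and $A(\eta)$ is the log-partition function. The Wikipedia article on the exponential family is worth reading closely, since the topic touches on a lot of material.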

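Common members of the exponential family are listed below, followed by the Student t distribution for contrast (the Student t is not itself in the exponential family):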

Gaussian distribution:

$$N(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Bernoulli distribution:

$$p(y|p) = p^y(1-p)^{1-y} = \exp\left(y \log \frac{p}{1-p} + \log(1-p)\right)$$

Link function: $\eta(p) = \log \frac{p}{1-p}$

Response function: $p = \frac{1}{1+e^{-\eta}}$
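The link and response functions of the Bernoulli distribution are inverses of each other, and the exponential form reproduces the usual probability mass function. A minimal NumPy sketch that checks both numerically; the function names `logit`, `sigmoid`, and `bernoulli_exp_form` are illustrative choices, not part of the original notes:

```python
import numpy as np

def logit(p):
    """Link function: maps a probability p in (0, 1) to the natural parameter eta."""
    return np.log(p / (1.0 - p))

def sigmoid(eta):
    """Response function: maps eta back to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))

def bernoulli_exp_form(y, p):
    """Bernoulli pmf written as exp(y * eta + log(1 - p))."""
    return np.exp(y * logit(p) + np.log(1.0 - p))

p = np.array([0.1, 0.5, 0.9])

# The response function should undo the link function.
roundtrip_ok = np.allclose(sigmoid(logit(p)), p)

# The exponential form should match p^y (1 - p)^(1 - y) for y in {0, 1}.
pmf_ok = all(
    np.allclose(bernoulli_exp_form(y, p), p**y * (1 - p) ** (1 - y))
    for y in (0, 1)
)
print(roundtrip_ok, pmf_ok)
```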

Poisson distribution:

$$p(x|\lambda) = \frac{\lambda^x}{x!} e^{-\lambda} = \frac{1}{x!} \exp(x \ln \lambda - \lambda)$$
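The Poisson rewrite can be checked in the same way. The sketch below assumes the standard identification $\eta = \ln \lambda$, $T(x) = x$, $A(\eta) = e^{\eta}$, and $h(x) = 1/x!$; the function names are hypothetical:

```python
import math

def poisson_pmf(x, lam):
    """Standard form: lam**x * exp(-lam) / x!."""
    return lam**x * math.exp(-lam) / math.factorial(x)

def poisson_exp_form(x, lam):
    """Exponential-family form with eta = ln(lam), T(x) = x, A(eta) = exp(eta), h(x) = 1/x!."""
    eta = math.log(lam)
    return math.exp(x * eta - math.exp(eta)) / math.factorial(x)

lam = 3.5
# Largest absolute difference between the two forms over the first 20 counts.
max_gap = max(abs(poisson_pmf(x, lam) - poisson_exp_form(x, lam)) for x in range(20))
print(max_gap)
```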

Student t distribution (up to a normalizing constant):

$$T(x|\mu,\sigma,\nu) \propto \left[1+\frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right]^{-\frac{\nu+1}{2}}$$

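This distribution resembles the Gaussian in form and makes up for one of its weaknesses: the Gaussian is very sensitive to outliers, whereas the Student t distribution is more robust. A common choice is ν=4, which performs well on most practical problems. When ν is much larger than 5, the distribution loses its robustness and rapidly converges to a Gaussian. In particular, when ν=1 it is known as the Cauchy distribution.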
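One way to see the robustness claim is to compare how strongly each density penalizes a point far from the mean. The sketch below compares unnormalized log-densities at a hypothetical outlier 10 standard deviations away; the helper names and the choice of `x_outlier` are assumptions made for illustration:

```python
import numpy as np

def gaussian_log_kernel(x, mu=0.0, sigma=1.0):
    """Log of the Gaussian density, dropping the normalizing constant."""
    return -0.5 * ((x - mu) / sigma) ** 2

def student_t_log_kernel(x, nu, mu=0.0, sigma=1.0):
    """Log of the Student t density, dropping the normalizing constant."""
    z2 = ((x - mu) / sigma) ** 2
    return -0.5 * (nu + 1.0) * np.log1p(z2 / nu)

x_outlier = 10.0
g = gaussian_log_kernel(x_outlier)                   # quadratic penalty
t4 = student_t_log_kernel(x_outlier, nu=4.0)         # logarithmic penalty, heavy tails
t1000 = student_t_log_kernel(x_outlier, nu=1000.0)   # large nu: close to the Gaussian
print(g, t4, t1000)
```

The Gaussian penalty grows quadratically with the distance from the mean, while the Student t penalty grows only logarithmically, so a single outlier pulls a Gaussian fit much harder.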

Why introduce the exponential family? (MLAPP, page 313)

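In theory, every distribution in the exponential family has a conjugate prior.
When each distribution is written in exponential form, it is the maximum-entropy distribution under its corresponding constraints.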
It can be shown that, under certain regularity conditions, the exponential family is the only
family of distributions with finite-sized sufficient statistics, meaning that we can compress
the data into a fixed-sized summary without loss of information. This is particularly useful
for online learning, as we will see later.
The exponential family is the only family of distributions for which conjugate priors exist,
which simplifies the computation of the posterior (see Section 9.2.5).
The exponential family can be shown to be the family of distributions that makes the least
set of assumptions subject to some user-chosen constraints (see Section 9.2.6).
The exponential family is at the core of generalized linear models, as discussed in Section 9.3.
The exponential family is at the core of variational inference, as discussed in Section 21.2.
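For exponential family distributions, finding the maximum-entropy distribution is equivalent to maximum likelihood estimation in the exponential form.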
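The point about finite-sized sufficient statistics can be made concrete with a 1-D Gaussian: the triple (n, Σx, Σx²) is all that needs to be kept to recover the maximum likelihood estimates, so data can be processed one point at a time. A minimal sketch, where the class name `GaussianSuffStats` and the synthetic data are assumptions for illustration:

```python
import numpy as np

class GaussianSuffStats:
    """Running sufficient statistics (n, sum x, sum x^2) for a 1-D Gaussian."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def update(self, x):
        # Fixed-size summary: three numbers, regardless of how much data arrives.
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def mle(self):
        # Maximum likelihood estimates of the mean and the (biased) variance.
        mean = self.sum_x / self.n
        var = self.sum_x2 / self.n - mean**2
        return mean, var

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=3.0, size=1000)

stats = GaussianSuffStats()
for x in data:
    stats.update(float(x))

online_mean, online_var = stats.mle()
batch_mean, batch_var = data.mean(), data.var()  # np.var defaults to the MLE (ddof=0)
print(online_mean, batch_mean, online_var, batch_var)
```

The sum-of-squares formula is used here for clarity; a numerically safer streaming update (such as Welford's method) would be preferable for data with a large mean relative to its spread.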

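Generalized linear models (GLMs) were introduced to overcome the limitations of the linear regression model and are a generalization of it. First, the independent variables can be either discrete or continuous; discrete variables can be binary (0-1) or take multiple values. Compared with linear regression, GLMs generalize in the following ways: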

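The random error term need not be normally distributed; it can follow a binomial, Poisson, negative binomial, normal, gamma, inverse Gaussian, or similar distribution. These distributions are collectively referred to as the exponential family.
A link function $g(\cdot)$ is introduced. The dependent and independent variables are related through the link function, i.e. $g(\mu) = X\beta$ with $\mu = E[Y]$, where $g$ must be monotonic and differentiable. Common link functions include the identity $\mu = X\beta$, the log $\ln \mu = X\beta$, the power $\mu^k = X\beta$, the square root $\sqrt{\mu} = X\beta$, and the logit $\ln \frac{\mu}{1-\mu} = X\beta$.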
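To make the role of the link function concrete, the sketch below pairs several common links with their inverses and checks that each pair round-trips. It reads the link as acting on the mean, $g(\mu) = X\beta$, which is the usual GLM convention; the dictionary name `LINKS` and the test values are illustrative assumptions:

```python
import numpy as np

# Each entry maps a name to (link g, inverse link g^{-1}).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (np.log, np.exp),
    "sqrt": (np.sqrt, np.square),
    "logit": (
        lambda mu: np.log(mu / (1 - mu)),
        lambda eta: 1 / (1 + np.exp(-eta)),
    ),
}

# Means in (0, 1) are valid inputs for every link above.
mu = np.array([0.2, 0.5, 0.8])

# A monotonic link must be invertible: g^{-1}(g(mu)) should return mu.
roundtrip = {name: bool(np.allclose(inv(g(mu)), mu)) for name, (g, inv) in LINKS.items()}
print(roundtrip)
```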

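Different models can be chosen freely depending on the data. The familiar logit model, for example, is obtained by using the logit link together with a binomially distributed random error term.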

A GLM is built on three assumptions:

$y \mid x; \theta \sim \text{ExponentialFamily}(\eta)$. That is, given $x$ and $\theta$, the distribution of $y$ follows some exponential family distribution with parameter $\eta$.
Given $x$, our goal is to predict the expected value of $T(y)$ given $x$. In most of our examples we will have $T(y) = y$, so this means we would like the prediction $h(x)$ output by our learned hypothesis $h$ to satisfy $h(x) = E[y|x]$. (Note that this assumption is satisfied by the choices of $h_\theta(x)$ for both logistic regression and linear regression. For instance, in logistic regression we had $h_\theta(x) = p(y = 1|x; \theta) = 0 \cdot p(y = 0|x; \theta) + 1 \cdot p(y = 1|x; \theta) = E[y|x; \theta]$.)
The natural parameter $\eta$ and the inputs $x$ are related linearly: $\eta = \theta^T x$. (Or, if $\eta$ is vector-valued, then $\eta_i = \theta_i^T x$.)
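As a worked instance of these three assumptions, take the Bernoulli distribution, whose response function is the logistic function. Combining $h_\theta(x) = E[y|x;\theta]$ with $\eta = \theta^T x$ gives

$$h_\theta(x) = E[y|x;\theta] = p = \frac{1}{1+e^{-\eta}} = \frac{1}{1+e^{-\theta^T x}}$$

which is exactly the logistic regression hypothesis. For a Gaussian with fixed variance (assumed here to be $\sigma^2 = 1$ for simplicity), the natural parameter is the mean itself, $\eta = \mu$, so

$$h_\theta(x) = E[y|x;\theta] = \mu = \eta = \theta^T x$$

which recovers ordinary linear regression.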

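The specific generalized linear model depends on which distribution is chosen. With the normal distribution we obtain the least-squares model; with the Bernoulli distribution we obtain the logistic model. The parameters of the linear part are then fitted with gradient descent or a similar method.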
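The sketch below illustrates this with a single fitting routine: for a canonical link, the log-likelihood gradient has the same form, $X^T (y - h(X\theta))$, and only the response function changes between least squares and logistic regression. The function name `fit_glm`, the learning rate, the iteration count, and the synthetic data are all assumptions chosen for illustration, and the Gaussian case assumes unit variance:

```python
import numpy as np

def fit_glm(X, y, response, lr=0.1, n_iter=2000):
    """Batch gradient ascent on the log-likelihood of a canonical-link GLM.

    `response` is the inverse link, so the prediction is h(x) = response(theta^T x).
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (y - response(X @ theta)) / len(y)
        theta += lr * grad
    return theta

def identity(eta):
    return eta

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])  # intercept + one feature
true_theta = np.array([0.5, -1.5])

# Gaussian response with identity link: least squares.
y_lin = X @ true_theta + 0.1 * rng.normal(size=500)
theta_lin = fit_glm(X, y_lin, identity)

# Bernoulli response with logit link: logistic regression.
y_log = (rng.random(500) < sigmoid(X @ true_theta)).astype(float)
theta_log = fit_glm(X, y_log, sigmoid)

print(theta_lin, theta_log)
```

Plain gradient ascent is used here because the notes mention gradient descent; in practice GLMs are often fitted with Newton-type methods such as iteratively reweighted least squares.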

