Generalized Linear Models


Last updated 4 years ago


Natural exponential family

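If the density of x can be written in the standard exponential family form $$p(x|\eta) = h(x)\exp(\eta^T T(x) - A(\eta))$$, then x is said to follow an exponential family distribution; the natural exponential family is the special case $$T(x) = x$$. The Wikipedia article is worth reading closely, since the topic touches on a lot of related material.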

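The exponential family includes:

  • Normal distribution, multivariate normal distribution

  • Bernoulli distribution and its multinomial generalization (models events with k outcomes)

  • Poisson distribution (models counting processes)

  • Gamma distribution (models intervals between events as positive real numbers)

  • Beta distribution (models fractions in (0, 1))

  • Dirichlet distribution (models probability distributions)

  • Wishart distribution (a distribution over covariance matrices)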

Normal distribution

$$N(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
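As a worked example, the normal density can be rearranged into exponential family form. The sketch below assumes the standard parameterization $$p(x|\eta) = h(x)\exp(\eta^T T(x) - A(\eta))$$; the symbols $$h$$, $$T$$ and $$A$$ are that convention's names, not notation defined elsewhere on this page.

$$
\begin{aligned}
N(x;\mu,\sigma) &= \frac{1}{\sqrt{2\pi}} \exp\left(\frac{\mu}{\sigma^2}x - \frac{1}{2\sigma^2}x^2 - \frac{\mu^2}{2\sigma^2} - \ln\sigma\right) \\
\eta &= \left(\frac{\mu}{\sigma^2},\ -\frac{1}{2\sigma^2}\right), \qquad T(x) = (x,\ x^2), \qquad h(x) = \frac{1}{\sqrt{2\pi}} \\
A(\eta) &= \frac{\mu^2}{2\sigma^2} + \ln\sigma = -\frac{\eta_1^2}{4\eta_2} - \frac{1}{2}\ln(-2\eta_2)
\end{aligned}
$$

Expanding the square $$(x-\mu)^2$$ in the original density and collecting the terms in $$x$$ and $$x^2$$ gives the first line.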

Bernoulli distribution

$$p(y|p) = p^y(1-p)^{1-y} = \exp\left(y \log \frac{p}{1-p} + \log(1-p)\right) \\ \text{link function: } \eta(p) = \log \frac{p}{1-p} \\ \text{response function: } p = \frac{1}{1+e^{-\eta}}$$
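A small numerical check may make the link and response functions concrete. The sketch below uses only the standard library; the function names (`logit`, `sigmoid`, `bernoulli_pmf`, `bernoulli_exp_form`) are illustrative choices, not names from the original notes. It checks that the two functions are inverses and that the exponential form matches the usual Bernoulli pmf.

```python
import math

def logit(p):
    """Link function: maps a probability p in (0, 1) to the natural parameter eta."""
    return math.log(p / (1.0 - p))

def sigmoid(eta):
    """Response function: maps eta back to the probability p."""
    return 1.0 / (1.0 + math.exp(-eta))

def bernoulli_pmf(y, p):
    """Standard form p^y (1-p)^(1-y) for y in {0, 1}."""
    return p ** y * (1.0 - p) ** (1 - y)

def bernoulli_exp_form(y, p):
    """Exponential family form exp(y * eta + log(1 - p)) with eta = logit(p)."""
    return math.exp(y * logit(p) + math.log(1.0 - p))

p = 0.3
print("round trip:", sigmoid(logit(p)))
for y in (0, 1):
    print(y, bernoulli_pmf(y, p), bernoulli_exp_form(y, p))
```

Both columns printed in the loop should agree up to floating-point error.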

Poisson distribution

$$p(x|\lambda) = \frac{\lambda^x}{x!} e^{-\lambda} = \exp(x \ln \lambda - \lambda) \frac{1}{x!}$$
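The same kind of check works for the Poisson pmf $$\lambda^x e^{-\lambda} / x!$$: with natural parameter $$\ln\lambda$$, the exponential form should give identical values. This is a minimal standard-library sketch with hypothetical function names.

```python
import math

def poisson_pmf(x, lam):
    """Standard form: lam^x / x! * exp(-lam)."""
    return lam ** x / math.factorial(x) * math.exp(-lam)

def poisson_exp_form(x, lam):
    """Exponential family form: exp(x * ln(lam) - lam) * (1 / x!)."""
    return math.exp(x * math.log(lam) - lam) / math.factorial(x)

lam = 2.5
for x in range(5):
    print(x, poisson_pmf(x, lam), poisson_exp_form(x, lam))
```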

Student's t-distribution

$$T(x|\mu,\sigma,\nu) \propto \left[1+\frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right]^{-\frac{\nu+1}{2}}$$

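This distribution resembles the Gaussian in form and makes up for one of its weaknesses: the Gaussian is very sensitive to outliers, whereas the Student t-distribution is more robust. A common choice is $$\nu = 4$$, which performs well on most practical problems; for $$\nu \gg 5$$ the distribution rapidly converges to a Gaussian and loses its robustness. In particular, when $$\nu = 1$$ it is known as the Cauchy distribution.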
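The robustness claim can be illustrated by comparing how fast the two densities decay away from the mean. The sketch below compares unnormalized kernels only (each equals 1 at $$x = \mu$$, so the value at $$x$$ is the density relative to the peak); the function names and the test point $$x = 6$$ are illustrative assumptions.

```python
import math

def gaussian_kernel(x, mu=0.0, sigma=1.0):
    """Unnormalized Gaussian density, equal to 1 at x = mu."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z)

def student_kernel(x, mu=0.0, sigma=1.0, nu=4.0):
    """Unnormalized Student t density, equal to 1 at x = mu."""
    z = (x - mu) / sigma
    return (1.0 + z * z / nu) ** (-(nu + 1.0) / 2.0)

outlier = 6.0  # a point six scale units from the mean
print("gaussian       :", gaussian_kernel(outlier))
print("student, nu=4  :", student_kernel(outlier, nu=4.0))
print("cauchy,  nu=1  :", student_kernel(outlier, nu=1.0))
print("student, nu=1e6:", student_kernel(outlier, nu=1e6))
```

The heavy tail means an outlier is penalized far less under the Student t model than under the Gaussian, so it pulls the fitted parameters less. With a very large $$\nu$$ the kernel becomes numerically close to the Gaussian one.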

Why introduce the exponential family? (MLAPP, page 313)

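  • In theory, every exponential family distribution has a conjugate prior.

  • Once each distribution is written in exponential form, it is the maximum-entropy distribution under its given constraints.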

  • It can be shown that, under certain regularity conditions, the exponential family is the only family of distributions with finite-sized sufficient statistics, meaning that we can compress the data into a fixed-sized summary without loss of information. This is particularly useful for online learning, as we will see later.

  • The exponential family is the only family of distributions for which conjugate priors exist, which simplifies the computation of the posterior (see Section 9.2.5).

  • The exponential family can be shown to be the family of distributions that makes the least set of assumptions subject to some user-chosen constraints (see Section 9.2.6).

  • The exponential family is at the core of generalized linear models, as discussed in Section 9.3.

  • The exponential family is at the core of variational inference, as discussed in Section 21.2.

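  • For an exponential family distribution, maximizing entropy is equivalent to maximizing the likelihood of its exponential form.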
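The point about finite-sized sufficient statistics can be sketched for the Gaussian case: the count, the sum, and the sum of squares are enough to recover the maximum-likelihood mean and variance, so data can be processed one point at a time without being stored. The class name `GaussianSuffStats` is hypothetical, and the sum-of-squares formula is used for clarity; it can lose precision on data with a large mean relative to its spread.

```python
class GaussianSuffStats:
    """Fixed-size summary (n, sum x, sum x^2) of a stream of real numbers."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def update(self, x):
        """Absorb one observation; memory use stays constant."""
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def mle(self):
        """Maximum-likelihood mean and (biased) variance from the summary alone."""
        mean = self.sum_x / self.n
        var = self.sum_x2 / self.n - mean * mean
        return mean, var

stats = GaussianSuffStats()
for value in [1.0, 2.0, 3.0, 4.0]:
    stats.update(value)
print(stats.mle())
```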

Generalized linear models

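Generalized linear models (GLMs) were introduced to overcome the shortcomings of the linear regression model, and they generalize it. First, the independent variables may be discrete or continuous; a discrete variable may be a 0-1 variable or may take several values. Compared with linear regression, GLMs generalize in the following ways: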

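  • The random error term need not be normally distributed; it may follow a binomial, Poisson, negative binomial, normal, gamma, inverse Gaussian, or similar distribution. These distributions are collectively called the exponential family.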

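  • A link function $$g(\cdot)$$ is introduced. The dependent and independent variables are related through the link function: writing $$\mu = E[Y]$$, we have $$g(\mu) = X\beta$$. The link function must be monotonic and differentiable. Common link functions include the identity $$\mu = X\beta$$, the log $$\ln(\mu) = X\beta$$, the power $$\mu^k = X\beta$$, the square root $$\sqrt{\mu} = X\beta$$, and the logit $$\ln\left(\frac{\mu}{1-\mu}\right) = X\beta$$.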

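Depending on the data, one is free to choose among different models. The familiar logit model is obtained by using the logit link with a binomially distributed random error term.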
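To make the role of the link concrete, the sketch below pairs several common links with their inverses and checks that each pair round-trips. It assumes the convention that a link maps the mean to the linear predictor (link(mean) = Xβ); the `LINKS` table and `round_trip` helper are illustrative names.

```python
import math

# Each entry is (link, inverse link). The link maps the mean mu to the
# linear predictor eta; the inverse maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "sqrt": (math.sqrt, lambda eta: eta * eta),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}

def round_trip(name, mu):
    """Apply a link and then its inverse; the result should equal mu."""
    link, inverse = LINKS[name]
    return inverse(link(mu))

for name in LINKS:
    print(name, round_trip(name, 0.4))
```

Monotonicity is what guarantees that such an inverse exists, which is why it is required of a link function.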

three assumptions

  • $$y \mid x; \theta \sim \text{ExponentialFamily}(\eta)$$. That is, given x and θ, the distribution of y follows some exponential family distribution with parameter η.

  • Given x, our goal is to predict the expected value of T(y) given x, that is, $$h_\theta(x) = E[T(y) \mid x]$$. In most of our examples we will have T(y) = y, so we would like the prediction h(x) output by our learned hypothesis h to satisfy $$h(x) = E[y \mid x]$$. (Note that this assumption is satisfied by the choices of $$h_\theta(x)$$ for both logistic regression and linear regression. For instance, in logistic regression we had $$h_\theta(x) = p(y = 1 \mid x; \theta) = 0 \cdot p(y = 0 \mid x; \theta) + 1 \cdot p(y = 1 \mid x; \theta) = E[y \mid x; \theta]$$.)

  • The natural parameter η and the inputs x are related linearly: $$\eta = \theta^T x$$. (Or, if η is vector-valued, then $$\eta_i = \theta_i^T x$$.)
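Applying the three assumptions to a Bernoulli response gives logistic regression. The short derivation below is a sketch that takes $$T(y) = y$$ and uses the Bernoulli response function $$p = 1/(1+e^{-\eta})$$:

$$
\begin{aligned}
h_\theta(x) &= E[y \mid x; \theta] && \text{(assumption 2 with } T(y) = y\text{)} \\
&= p && \text{(mean of a Bernoulli variable)} \\
&= \frac{1}{1+e^{-\eta}} && \text{(Bernoulli response function)} \\
&= \frac{1}{1+e^{-\theta^T x}} && \text{(assumption 3: } \eta = \theta^T x\text{)}
\end{aligned}
$$

The same steps with a Gaussian response of fixed variance $$\sigma^2 = 1$$, whose natural parameter is $$\eta = \mu$$, give $$h_\theta(x) = \mu = \theta^T x$$, which is ordinary linear regression.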

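Which model a GLM yields depends on the distribution chosen. Choosing the normal distribution gives the least-squares model; choosing the Bernoulli distribution gives the logistic model. The parameters of the linear part are then fitted with gradient descent or a similar method.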
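As an illustration of fitting the linear part, here is a minimal sketch of batch gradient descent on the negative Bernoulli log-likelihood, i.e. logistic regression. The toy data, learning rate, step count, and function names are all assumptions made for the example; only the standard library is used.

```python
import math

def sigmoid(z):
    """Bernoulli response function."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, row):
    """h_theta(x) = sigmoid(theta^T x)."""
    return sigmoid(sum(t * v for t, v in zip(theta, row)))

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Batch gradient descent on the mean negative log-likelihood.

    X: list of feature rows (with a leading 1.0 for the intercept).
    y: list of 0/1 labels.
    """
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for row, target in zip(X, y):
            error = predict(theta, row) - target  # gradient factor (h - y)
            for j, v in enumerate(row):
                grad[j] += error * v
        theta = [t - lr * g / len(X) for t, g in zip(theta, grad)]
    return theta

# Toy, non-separable data: one feature plus an intercept column.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]]
y = [0, 0, 1, 0, 1, 1]
theta = fit_logistic(X, y)
print("theta:", theta)
print("p(y=1 | x=0.5):", predict(theta, [1.0, 0.5]))
print("p(y=1 | x=3.0):", predict(theta, [1.0, 3.0]))
```

The update uses the gradient factor $$(h_\theta(x) - y)\,x$$. The least-squares gradient has the same form with the identity response in place of the sigmoid, so one update rule covers both models.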

Modeling

Recommended reading

Gamma distribution

Exponential family
Normal distribution
Bernoulli distribution
Poisson distribution
Gamma distribution
Beta distribution
Dirichlet distribution
Wishart distribution
Student's t-distribution
How should the gamma distribution be understood?
Generalized linear models
广义线性模
Unifying distributions: the exponential family of models