gradient boosting

Optimization objective: stack M models so that their sum minimizes the loss function

$$\{\beta_m, \alpha_m\}_{m=1}^M = \arg\min L\Big(y, \sum_{m=1}^M \beta_m h(x, \alpha_m)\Big)$$

Let $F_m(x) = F_{m-1}(x) + \beta_m h(x, \alpha_m) = \sum_{i=1}^m \beta_i h(x, \alpha_i)$; then each stage solves

$$\beta_m, \alpha_m = \arg\min L\big(y, F_{m-1}(x) + \beta_m h(x, \alpha_m)\big)$$

At each step, the new model is chosen so that the loss decreases along the current negative gradient direction:

$$g_m(x) = -\left[ \frac{\partial L(y, F(x))}{\partial F(x)} \right]_{F(x) = F_{m-1}(x)}$$

To make the current base model agree with the negative gradient, fit it by least squares,

$$\beta_m, \alpha_m = \arg\min \sum_i \big( g_m(x_i) - \beta_m h(x_i, \alpha_m) \big)^2,$$

and then add it to the ensemble: $F_m(x) = F_{m-1}(x) + \beta_m h(x, \alpha_m)$.
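A minimal sketch of this loop in Python, for squared-error loss, where the negative gradient is simply the residual $y - F_{m-1}(x)$. The base learner $h(x, \alpha_m)$ is a shallow regression tree, and the names `fit_gbdt`, `predict_gbdt`, and the shrinkage parameter `nu` are illustrative, not from any library:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt(X, y, n_stages=100, nu=0.1, max_depth=3):
    """Gradient boosting with squared-error loss L = (y - F)^2 / 2:
    each stage fits a small regression tree to the negative gradient,
    which here is just the residual y - F_{m-1}(x)."""
    f0 = y.mean()                      # F_0: best constant model
    F = np.full(len(y), f0)
    trees = []
    for m in range(n_stages):
        g = y - F                      # negative gradient at F_{m-1}
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, g)
        F = F + nu * tree.predict(X)   # F_m = F_{m-1} + nu * h_m
        trees.append(tree)
    return f0, trees

def predict_gbdt(f0, trees, X, nu=0.1):
    return f0 + nu * sum(tree.predict(X) for tree in trees)
```

With `nu = 1` this is exactly the additive update above; shrinkage `nu < 1` trades more stages for better generalization.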

GBDT(Gradient Boosting Decision Tree)

Each added model descends along the negative gradient of the loss. Couldn't we descend along the Newton direction instead?

Taylor expansion

$$f(x) = \frac{f(x_0)}{0!} + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \ldots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + R_n(x)$$

Commonly used expansions:

$$\begin{aligned}
e^x &= 1 + \frac{1}{1!}x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + o(x^3) \\
\ln(1+x) &= x - \frac{1}{2}x^2 + \frac{1}{3}x^3 + o(x^3) \\
\sin x &= x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 + o(x^5) \\
\arcsin x &= x + \frac{1}{2}\cdot\frac{x^3}{3} + \frac{1\times 3}{2\times 4}\cdot\frac{x^5}{5} + \frac{1\times 3\times 5}{2\times 4\times 6}\cdot\frac{x^7}{7} + o(x^7) \\
\cos x &= 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 + o(x^4) \\
\frac{1}{1-x} &= 1 + x + x^2 + o(x^2) \\
(1+x)^a &= 1 + ax + \frac{a(a-1)}{2!}x^2 + \frac{a(a-1)(a-2)}{3!}x^3 + o(x^3)
\end{aligned}$$
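Applying the second-order expansion to the loss answers the question above. Expand $L$ around $F_{m-1}(x)$ in the increment $f(x)$:

$$L\big(y, F_{m-1}(x) + f(x)\big) \approx L\big(y, F_{m-1}(x)\big) - g_m(x)\,f(x) + \frac{1}{2}\,h_m(x)\,f(x)^2, \qquad h_m(x) = \left[ \frac{\partial^2 L(y, F(x))}{\partial F(x)^2} \right]_{F(x) = F_{m-1}(x)}$$

with $g_m$ the negative gradient from before. Minimizing this quadratic in $f(x)$ gives $f(x) = g_m(x)/h_m(x)$: the Newton direction, i.e., the gradient step rescaled by the local curvature. Fitting the base model to $g_m/h_m$ (equivalently, to the gradient with sample weights $h_m$) is Newton boosting, the idea behind XGBoost's second-order objective.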

LambdaMART

In ranking, the authors (Burges et al.) simply defined the gradient of each optimization step directly (the "lambdas"), then plugged it into the gradient boosting framework; the result is LambdaMART.
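Concretely, a sketch following Burges's overview paper, with $s_i$ the model's current score for document $i$, $\sigma$ a shape parameter, and pairs $(i, j)$ where $i$ should rank above $j$: each pair contributes

$$\lambda_{ij} = \frac{-\sigma}{1 + e^{\sigma(s_i - s_j)}}\,\big|\Delta \mathrm{NDCG}_{ij}\big|,$$

and each document's accumulated $\lambda_i = \sum_{j:(i,j)} \lambda_{ij} - \sum_{j:(j,i)} \lambda_{ij}$ plays the role of the (negative) gradient $g_m(x_i)$ that the next regression tree is fit to.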

Recommended reading

Lecture notes from National Taiwan University: under the gradient boosting framework, each base model is a decision tree. Prof. Hsuan-Tien Lin's (林轩田) lectures there are very clear and easy to follow.

Gradient Boosting
GBDT (Gradient Boosting Decision Tree): theory only, no implementation
A summary of GBRT within GBM
GBDT: a walkthrough of a Python implementation
Gradient boosting algorithm
Gradient Boosting
GBDT