
# L-BFGS

### L-BFGS

First, a review of the BFGS update.

#### BFGS

$$G_{k+1} = V_k^T G_k V_k + \rho_k s_k s_k^T, \qquad \rho_k = \frac{1}{y_k^T s_k}, \qquad V_k = I - \rho_k y_k s_k^T$$
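As a concrete sketch of this update in NumPy (the helper name is illustrative, not from any library):

```python
import numpy as np

def bfgs_update(G, s, y):
    """One BFGS update of the inverse-Hessian approximation.

    Implements G_{k+1} = V_k^T G_k V_k + rho_k s_k s_k^T, with
    rho_k = 1 / (y_k^T s_k) and V_k = I - rho_k y_k s_k^T,
    where s_k = x_{k+1} - x_k and y_k = grad_{k+1} - grad_k.
    """
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(y, s)
    return V.T @ G @ V + rho * np.outer(s, s)
```

By construction the updated matrix satisfies the secant condition $G_{k+1} y_k = s_k$ and stays symmetric.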

Drawback: the matrix $G_k$ itself must be stored, which is expensive in high dimensions. Unrolling the update back to $G_0$ gives:

$$\begin{aligned}
G_{k+1} &= (V_{k-1}^T \ldots V_0^T) G_0 (V_0 \ldots V_{k-1}) \\
&+ (V_{k-1}^T \ldots V_1^T) s_0 \rho_0 s_0^T (V_1 \ldots V_{k-1}) \\
&+ (V_{k-1}^T \ldots V_2^T) s_1 \rho_1 s_1^T (V_2 \ldots V_{k-1}) \\
&+ (V_{k-1}^T \ldots V_3^T) s_2 \rho_2 s_2^T (V_3 \ldots V_{k-1}) \\
&+ \ldots \\
&+ V_{k-1}^T s_{k-2} \rho_{k-2} s_{k-2}^T V_{k-1} \\
&+ s_{k-1} \rho_{k-1} s_{k-1}^T
\end{aligned}$$

Starting from an easy-to-store initial matrix, $G_k$ can be built iteratively. Better yet, after enough iterations the recursion need not go all the way back to step 0: it suffices to keep only the most recent $m$ steps and pick a $G_k^0$ that roughly matches what the earlier iterations would have produced. The recursion then becomes:

$$\begin{aligned}
G_{k+1} &= (V_{k-1}^T \ldots V_{k-m}^T) G_k^0 (V_{k-m} \ldots V_{k-1}) \\
&+ (V_{k-1}^T \ldots V_{k-m+1}^T) s_{k-m} \rho_{k-m} s_{k-m}^T (V_{k-m+1} \ldots V_{k-1}) \\
&+ (V_{k-1}^T \ldots V_{k-m+2}^T) s_{k-m+1} \rho_{k-m+1} s_{k-m+1}^T (V_{k-m+2} \ldots V_{k-1}) \\
&+ (V_{k-1}^T \ldots V_{k-m+3}^T) s_{k-m+2} \rho_{k-m+2} s_{k-m+2}^T (V_{k-m+3} \ldots V_{k-1}) \\
&+ \ldots \\
&+ V_{k-1}^T s_{k-2} \rho_{k-2} s_{k-2}^T V_{k-1} \\
&+ s_{k-1} \rho_{k-1} s_{k-1}^T
\end{aligned}$$

The resulting search direction:

$$\begin{aligned}
G_{k+1} \nabla f &= (V_{k-1}^T \ldots V_{k-m}^T) G_k^0 (V_{k-m} \ldots V_{k-1}) \nabla f \\
&+ (V_{k-1}^T \ldots V_{k-m+1}^T) s_{k-m} \rho_{k-m} s_{k-m}^T (V_{k-m+1} \ldots V_{k-1}) \nabla f \\
&+ (V_{k-1}^T \ldots V_{k-m+2}^T) s_{k-m+1} \rho_{k-m+1} s_{k-m+1}^T (V_{k-m+2} \ldots V_{k-1}) \nabla f \\
&+ (V_{k-1}^T \ldots V_{k-m+3}^T) s_{k-m+2} \rho_{k-m+2} s_{k-m+2}^T (V_{k-m+3} \ldots V_{k-1}) \nabla f \\
&+ \ldots \\
&+ V_{k-1}^T s_{k-2} \rho_{k-2} s_{k-2}^T V_{k-1} \nabla f \\
&+ s_{k-1} \rho_{k-1} s_{k-1}^T \nabla f
\end{aligned}$$

Notice, however, that this expression contains a great deal of repeated computation:

$$\begin{aligned}
G_{k+1} \nabla f &= \color{RoyalBlue}{(V_{k-1}^T \ldots V_{k-m}^T)} G_k^0 \color{ForestGreen}{(V_{k-m} \ldots V_{k-1}) \nabla f} \\
&+ \color{RoyalBlue}{(V_{k-1}^T \ldots V_{k-m+1}^T)} s_{k-m} \rho_{k-m} s_{k-m}^T \color{ForestGreen}{(V_{k-m+1} \ldots V_{k-1}) \nabla f} \\
&+ \color{RoyalBlue}{(V_{k-1}^T \ldots V_{k-m+2}^T)} s_{k-m+1} \rho_{k-m+1} s_{k-m+1}^T \color{ForestGreen}{(V_{k-m+2} \ldots V_{k-1}) \nabla f} \\
&+ \color{RoyalBlue}{(V_{k-1}^T \ldots V_{k-m+3}^T)} s_{k-m+2} \rho_{k-m+2} s_{k-m+2}^T \color{ForestGreen}{(V_{k-m+3} \ldots V_{k-1}) \nabla f} \\
&+ \ldots \\
&+ \color{RoyalBlue}{V_{k-1}^T} s_{k-2} \rho_{k-2} s_{k-2}^T \color{ForestGreen}{V_{k-1} \nabla f} \\
&+ s_{k-1} \rho_{k-1} s_{k-1}^T \color{ForestGreen}{\nabla f}
\end{aligned}$$

Optimize the right-hand (green) factors first:

- Recursion: $q_i = V_i q_{i+1}$
- Initialization: $q_k = \nabla f$
- This yields:
  $$\begin{aligned}
  q_{k-1} &= V_{k-1} \nabla f \\
  q_{k-2} &= V_{k-2} V_{k-1} \nabla f \\
  &\ldots \\
  q_{k-m} &= V_{k-m} \ldots V_{k-2} V_{k-1} \nabla f
  \end{aligned}$$
- Along the way, collect $\alpha_{k-i} = \rho_{k-i} s_{k-i}^T (V_{k-i+1} V_{k-i+2} \ldots V_{k-2} V_{k-1}) \nabla f$

Then optimize the left-hand (blue) factors:

- Recursion: $r_{i+1} = V_{i+1}^T r_i + s_{i+1} \alpha_{i+1} = r_i + s_{i+1} (\alpha_{i+1} - \rho_{i+1} y_{i+1}^T r_i)$
- Initialization: $r_{k-m} = V_{k-m}^T G_k^0 (V_{k-m} \ldots V_{k-1}) \nabla f + s_{k-m} \alpha_{k-m}$
- This yields:
  $$\begin{aligned}
  r_{k-1} &= (V_{k-1}^T \ldots V_{k-m}^T) G_k^0 (V_{k-m} \ldots V_{k-1}) \nabla f \\
  &+ (V_{k-1}^T \ldots V_{k-m+1}^T) s_{k-m} \alpha_{k-m} \\
  &+ (V_{k-1}^T \ldots V_{k-m+2}^T) s_{k-m+1} \alpha_{k-m+1} \\
  &+ (V_{k-1}^T \ldots V_{k-m+3}^T) s_{k-m+2} \alpha_{k-m+2} \\
  &+ \ldots \\
  &+ V_{k-1}^T s_{k-2} \alpha_{k-2} \\
  &+ s_{k-1} \alpha_{k-1}
  \end{aligned}$$

$r_{k-1}$ is exactly the quantity we need.

The L-BFGS algorithm:

Choice of $G_k^0$: $\frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}} I$.
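The recursions above are exactly the standard L-BFGS two-loop recursion. A minimal NumPy sketch (the function name is illustrative), using the scaled-identity choice of $G_k^0$ just described:

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: computes G_k @ grad from the last m (s, y) pairs.

    s_list and y_list are ordered oldest to newest.
    """
    rho = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    # First loop (newest -> oldest): the green right-hand products q_i,
    # collecting alpha_{k-i} = rho_{k-i} * s_{k-i}^T q along the way.
    q = grad.copy()
    alpha = [0.0] * len(s_list)
    for i in reversed(range(len(s_list))):
        alpha[i] = rho[i] * (s_list[i] @ q)
        q -= alpha[i] * y_list[i]
    # G_k^0 = (s^T y / y^T y) I, built from the most recent pair.
    gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1]) if s_list else 1.0
    r = gamma * q
    # Second loop (oldest -> newest): the blue left-hand products r_i.
    for i in range(len(s_list)):
        beta = rho[i] * (y_list[i] @ r)
        r += (alpha[i] - beta) * s_list[i]
    return r  # = G_k @ grad; the descent direction is -r
```

With $m$ stored pairs this returns the same vector as building $G_k$ densely from $G_k^0$ via the unrolled update, at $O(mn)$ cost instead of $O(n^2)$ storage and multiplication.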

#### A cleaner layout of the L-BFGS direction computation

Look at step 6: there $G_k^0$ is computed directly. Some implementations instead pass $G_k^0$ in as an argument.

#### The full L-BFGS descent procedure
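A minimal sketch of the whole descent loop, combining the two-loop direction with a simple backtracking (Armijo) line search; the function name, constants, and tolerances here are illustrative assumptions, not the original page's code:

```python
import numpy as np

def lbfgs_minimize(f, grad_f, x0, m=10, max_iter=100, tol=1e-8):
    """Minimal L-BFGS loop: two-loop direction + backtracking line search."""
    x = np.asarray(x0, dtype=float)
    g = grad_f(x)
    s_hist, y_hist = [], []
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Two-loop recursion for d = -G_k @ g.
        q = g.copy()
        alphas = []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):
            a = (s @ q) / (y @ s)
            alphas.append(a)
            q -= a * y
        if s_hist:  # G_k^0 = (s^T y / y^T y) I from the latest pair
            q *= (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1])
        for (s, y), a in zip(zip(s_hist, y_hist), reversed(alphas)):
            b = (y @ q) / (y @ s)
            q += (a - b) * s
        d = -q
        # Backtracking (Armijo) line search.
        t, fx, slope = 1.0, f(x), g @ d
        for _ in range(50):
            if f(x + t * d) <= fx + 1e-4 * t * slope:
                break
            t *= 0.5
        x_new = x + t * d
        g_new = grad_f(x_new)
        s_new, y_new = x_new - x, g_new - g
        if s_new @ y_new > 1e-10:  # keep the curvature condition y^T s > 0
            s_hist.append(s_new)
            y_hist.append(y_new)
            if len(s_hist) > m:  # keep only the last m pairs
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x
```

A production implementation would use a Wolfe-condition line search (which guarantees $y^T s > 0$) rather than the plain Armijo backtracking shown here.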

### To be expanded

Baidu has a shooting algorithm, said to be faster than L-BFGS: it partitions the Hessian matrix into blocks so that components with similar eigenvalues are grouped closer together.

### Parallelization

From 《Large-scale L-BFGS using MapReduce》: in the overall procedure (Algorithm 1), steps 2, 3, and 4 can be optimized. Steps 2 and 4 are easy to split across the data; step 3 is split as follows:

#### Recommended reading

- Python L-BFGS
- 理解L-BFGS算法 (Understanding the L-BFGS algorithm)
- 机器学习优化算法L-BFGS及其分布式实现 (The L-BFGS optimization algorithm and its distributed implementation)