L-BFGS

L-BFGS###

先介绍下BFGS算法过程

BFGS ####

Gk+1=VkTGkVk+ρkskskTρk=1ykTsk,Vk=IρkykskTG_{k+1} = V_k^T G_k V_k + \rho_k s_k s_k^T \\ \rho_k = \frac {1}{y_k^T s_k} , V_k = I - \rho_k y_k s_k^T

缺点:矩阵本身的存储大小GkG_k

Gk+1=(Vk1TV0T)G0(V0Vk1)+(Vk1TV1T)s0ρ0s0T(V1Vk1)+(Vk1TV2T)s1ρ1s1T(V2Vk1)+(Vk1TV3T)s2ρ2s2T(V3Vk1)++Vk1Tsk2ρk2sk2TVk1+sk1ρk1sk1T\begin{align} G_{k+1} & = (V_{k-1}^T \ldots V_0^T) G_0 (V_0 \ldots V_{k-1}) \\ & + (V_{k-1}^T \ldots V_1^T) s_0 \rho_0 s_0^T (V_1 \ldots V_{k-1}) \\ & + (V_{k-1}^T \ldots V_2^T) s_1 \rho_1 s_1^T (V_2 \ldots V_{k-1}) \\ & + (V_{k-1}^T \ldots V_3^T) s_2 \rho_2 s_2^T (V_3 \ldots V_{k-1}) \\ & + \ldots \\ & + V_{k-1}^T s_{k-2} \rho_{k-2} s_{k-2}^T V_{k-1} \\ & + s_{k-1} \rho_{k-1} s_{k-1}^T \\ \end{align}

从易存储的初始矩阵出发,可以迭代求出GkG_k。而且计算到一定步数之后,可以不用从第一步开始迭代,比如可以取从前m步开始迭代,找一个Gk0G_k^0跟前m步迭代结果“差不多”就可以近似了。 所以递归公式为:

Gk+1=(Vk1TVkmT)Gk0(VkmVk1)+(Vk1TVkm+1T)skmρkmskmT(Vkm+1Vk1)+(Vk1TVkm+2T)skm+1ρkm+1skm+1T(Vkm+2Vk1)+(Vk1TVkm+3T)skm+2ρkm+2skm+2T(Vkm+3Vk1)++Vk1Tsk2ρk2sk2TVk1+sk1ρk1sk1T\begin{align} G_{k+1} & = (V_{k-1}^T \ldots V_{k-m}^T) G_k^0 (V_{k-m} \ldots V_{k-1}) \\ & + (V_{k-1}^T \ldots V_{k-m+1}^T) s_{k-m} \rho_{k-m} s_{k-m}^T (V_{k-m+1} \ldots V_{k-1}) \\ & + (V_{k-1}^T \ldots V_{k-m+2}^T) s_{k-m+1} \rho_{k-m+1} s_{k-m+1}^T (V_{k-m+2} \ldots V_{k-1}) \\ & + (V_{k-1}^T \ldots V_{k-m+3}^T) s_{k-m+2} \rho_{k-m+2} s_{k-m+2}^T (V_{k-m+3} \ldots V_{k-1}) \\ & + \ldots \\ & + V_{k-1}^T s_{k-2} \rho_{k-2} s_{k-2}^T V_{k-1} \\ & + s_{k-1} \rho_{k-1} s_{k-1}^T \end{align}

所求方向:

Gk+1f=(Vk1TVkmT)Gk0(VkmVk1)f+(Vk1TVkm+1T)skmρkmskmT(Vkm+1Vk1)f+(Vk1TVkm+2T)skm+1ρkm+1skm+1T(Vkm+2Vk1)f+(Vk1TVkm+3T)skm+2ρkm+2skm+2T(Vkm+3Vk1)f++Vk1Tsk2ρk2sk2TVk1f+sk1ρk1sk1Tf\begin{align} G_{k+1} \nabla f & = (V_{k-1}^T \ldots V_{k-m}^T) G_k^0 (V_{k-m} \ldots V_{k-1}) \nabla f \\ & + (V_{k-1}^T \ldots V_{k-m+1}^T) s_{k-m} \rho_{k-m} s_{k-m}^T (V_{k-m+1} \ldots V_{k-1}) \nabla f \\ & + (V_{k-1}^T \ldots V_{k-m+2}^T) s_{k-m+1} \rho_{k-m+1} s_{k-m+1}^T (V_{k-m+2} \ldots V_{k-1}) \nabla f \\ & + (V_{k-1}^T \ldots V_{k-m+3}^T) s_{k-m+2} \rho_{k-m+2} s_{k-m+2}^T (V_{k-m+3} \ldots V_{k-1}) \nabla f \\ & + \ldots \\ & + V_{k-1}^T s_{k-2} \rho_{k-2} s_{k-2}^T V_{k-1} \nabla f \\ & + s_{k-1} \rho_{k-1} s_{k-1}^T \nabla f \end{align}

等一等,这个式子中有大量的重复计算:

Gk+1f=(Vk1TVkmT)Gk0(VkmVk1)f+(Vk1TVkm+1T)skmρkmskmT(Vkm+1Vk1)f+(Vk1TVkm+2T)skm+1ρkm+1skm+1T(Vkm+2Vk1)f+(Vk1TVkm+3T)skm+2ρkm+2skm+2T(Vkm+3Vk1)f++Vk1Tsk2ρk2sk2TVk1f+sk1ρk1sk1Tf\begin{align} G_{k+1} \nabla f &= \color{RoyalBlue}{ (V_{k-1}^T \ldots V_{k-m}^T) } G_k^0 \color{ForestGreen}{ (V_{k-m} \ldots V_{k-1}) \nabla f } \\ & + \color{RoyalBlue}{(V_{k-1}^T \ldots V_{k-m+1}^T)} s_{k-m} \rho_{k-m} s_{k-m}^T \color{ForestGreen}{ (V_{k-m+1} \ldots V_{k-1}) \nabla f }\\ & + \color{RoyalBlue}{(V_{k-1}^T \ldots V_{k-m+2}^T)} s_{k-m+1} \rho_{k-m+1} s_{k-m+1}^T \color{ForestGreen}{ (V_{k-m+2} \ldots V_{k-1}) \nabla f }\\ & + \color{RoyalBlue}{(V_{k-1}^T \ldots V_{k-m+3}^T)} s_{k-m+2} \rho_{k-m+2} s_{k-m+2}^T \color{ForestGreen}{ (V_{k-m+3} \ldots V_{k-1}) \nabla f }\\ & + \ldots \\ & + \color{RoyalBlue}{ V_{k-1}^T } s_{k-2} \rho_{k-2} s_{k-2}^T \color{ForestGreen}{ V_{k-1} \nabla f} \\ & + s_{k-1} \rho_{k-1} s_{k-1}^T \color{ForestGreen}{\nabla f } \end{align}

右侧优化:

  • 递推:qi=Viqi+1q_i = V_i q_{i+1}

  • 初始:qk=fq_k = \nabla f

  • 得到:$$

    \begin{align}

    q{k-1} &= V{k-1} \nabla f \

    q{k-2} &= V{k-2} V_{k-1} \nabla f \

    \ldots \

    q{k-m} &= V{k-m} \ldots V{k-2} V{k-1} \nabla f

    \end{align}

    $$

  • αki=ρkiskiT(Vki+1Vki+2Vk2Vk1)f\alpha_{k-i} = \rho_{k-i} s_{k-i}^T ( V_{k-i+1} V_{k-i+2} \ldots V_{k-2} V_{k-1}) \nabla f

然后左侧优化:

  • 递推:ri+1=Vi+1Tri+si+1αi+1=ri+si+1(αi+1ρi+1yi+1Tri)r_{i+1} = V_{i+1}^T r_i + s_{i+1} \alpha_{i+1} = r_i + s_{i+1} (\alpha_{i+1} - \rho_{i+1} y_{i+1}^T r_i)

  • 初始:rkm=VkmTGk0(VkmVk1)f+skmαkmr_{k-m} = V_{k-m}^T G_k^0 (V_{k-m} \ldots V_{k-1}) \nabla f + s_{k-m} \alpha_{k-m}

  • 得到$$

    r{k-1} = (V{k-1}^T \ldots V{k-m}^T) G_k^0 (V{k-m} \ldots V_{k-1}) \nabla f \

  • (V{k-1}^T \ldots V{k-m+1}^T) s{k-m} \alpha{k-m} \

  • (V{k-1}^T \ldots V{k-m+2}^T) s{k-m+1} \alpha{k-m+1} \

  • (V{k-1}^T \ldots V{k-m+3}^T) s{k-m+2} \alpha{k-m+2} \

  • \ldots \

  • V{k-1}^T s{k-2} \alpha_{k-2} \

  • s{k-1} \alpha{k-1}

    $$

rk1r_{k-1}就是最后要求解的值。

L-BFGS算法过程:

Gk0G_k^0 的选择:sk1Tyk1yk1Tyk1I\frac { s_{k-1}^T y_{k-1} }{ y_{k-1}^T y_{k-1} } I

L-BFGS法计算方向更精美一点的排版####

看第6步,就直接计算出 Gk0G_k^0 了。之前看到有些代码是传入了Gk0G_k^0

整个L-BFGS法下降过程####

待扩展###

百度有个shooting算法 ,比LBFGS更快,对hession矩阵分块,使得特征值差不多的靠的更近。

并行化###

参考佳文####

Python L-BFGS 理解L-BFGS算法 机器学习优化算法L-BFGS及其分布式实现

Last updated