SAG
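Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
Keep the two kinds of $w$ in the formulas apart. $w$ without a tilde is the inner-loop parameter, the one updated by the stochastic (per-sample or mini-batch) steps. $\tilde{w}$ with a tilde is the outer-loop parameter, i.e., the one that is finally returned.
Once per stage, compute the full gradient over all samples at the snapshot: $\tilde{\mu} = \frac{1}{n}\sum_{i=1}^{n}\nabla\psi_i(\tilde{w})$.
Inside a stage, each single update draws $i_t$ uniformly from $\{1, \dots, n\}$ and updates the current parameter with $w_t = w_{t-1} - \eta\left(\nabla\psi_{i_t}(w_{t-1}) - \nabla\psi_{i_t}(\tilde{w}) + \tilde{\mu}\right)$.
At the end of a stage, pick either Option I or Option II to set $\tilde{w}_s$ (Option I: $\tilde{w}_s = w_m$; Option II: $\tilde{w}_s = w_t$ for a randomly chosen $t \in \{0, \dots, m-1\}$). See: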
In practical implementations, it is natural to choose option I, or take $\tilde{w}_s$ to be the average of the past $t$ iterates. However, our analysis depends on option II. Note that each stage $s$ requires $2m + n$ gradient computations (for some convex problems, one may save the intermediate gradients $\nabla\psi_i(\tilde{w})$, and thus only $m + n$ gradient computations are needed). Therefore it is natural to choose $m$ to be the same order of $n$ but slightly larger (for example $m = 2n$ for convex problems and $m = 5n$ for nonconvex problems in our experiments).
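The $2m + n$ versus $m + n$ count can be checked directly. Below is a minimal pure-Python sketch of one SVRG stage (Option I) that wraps the per-sample gradient in a call counter. The toy least-squares data, `CountingGrad`, `svrg_stage`, and the `cache_snapshot` flag are hypothetical, introduced only for illustration. Caching keeps all $n$ snapshot gradients $\nabla\psi_i(\tilde{w})$, which costs $O(nd)$ memory. That is presumably why the paper limits this saving to some convex problems. For a linear model each such gradient is a scalar multiple of $x_i$, so only $n$ scalars would need to be stored.

```python
import random
from typing import Callable, Dict, List

Vector = List[float]


class CountingGrad:
    """Wraps a per-sample gradient (i, w) -> grad psi_i(w) and counts calls."""

    def __init__(self, grad_i: Callable[[int, Vector], Vector]) -> None:
        self.grad_i = grad_i
        self.calls = 0

    def __call__(self, i: int, w: Vector) -> Vector:
        self.calls += 1
        return self.grad_i(i, w)


def svrg_stage(grad: CountingGrad, n: int, w_tilde: Vector, eta: float, m: int,
               cache_snapshot: bool, rng: random.Random) -> Vector:
    """One SVRG stage with Option I (returns w_m). Hypothetical helper."""
    mu = [0.0] * len(w_tilde)
    snap: Dict[int, Vector] = {}
    for i in range(n):                      # n gradient computations
        g = grad(i, w_tilde)
        if cache_snapshot:
            snap[i] = g                     # extra O(n * d) memory
        mu = [a + b / n for a, b in zip(mu, g)]
    w = list(w_tilde)
    for _ in range(m):
        i = rng.randrange(n)
        g_w = grad(i, w)                    # 1 computation per inner step
        # Reuse the stored snapshot gradient, or recompute it (1 more computation).
        g_snap = snap[i] if cache_snapshot else grad(i, w_tilde)
        w = [wj - eta * (a - b + c) for wj, a, b, c in zip(w, g_w, g_snap, mu)]
    return w


# Hypothetical toy least-squares data: psi_i(w) = 0.5 * (x_i . w - y_i)^2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
Y = [2.0, -3.0, -1.0, 7.0]


def ls_grad(i: int, w: Vector) -> Vector:
    residual = sum(xj * wj for xj, wj in zip(X[i], w)) - Y[i]
    return [residual * xj for xj in X[i]]


n = len(X)
m = 2 * n                                   # m = 2n, as suggested for convex problems
plain, cached = CountingGrad(ls_grad), CountingGrad(ls_grad)
w_plain = svrg_stage(plain, n, [0.0, 0.0], 0.02, m, False, random.Random(0))
w_cached = svrg_stage(cached, n, [0.0, 0.0], 0.02, m, True, random.Random(0))
print(plain.calls, cached.calls)            # 20 12, i.e. 2m + n and m + n
```

With the same random seed, both runs draw the same indices and reach the same iterate. Only the number of gradient evaluations differs.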
【Daily Paper Reading】NIPS 2016 Tutorial on optimization: SVRG is arguably the "best" algorithm
StochasticVarianceReducedGradient.py
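The attached StochasticVarianceReducedGradient.py is not reproduced in these notes. As a stand-in, here is a minimal pure-Python sketch of the full SVRG loop described above: the full gradient at the snapshot, the variance-reduced inner updates, and then Option I or Option II for $\tilde{w}_s$. It does not reproduce that file. The function names, the toy least-squares problem, and the step size $\eta = 0.02$ are assumptions chosen for illustration. The choice $m = 2n$ follows the paper's suggestion for convex problems.

```python
import random
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[int, Vector], Vector]  # (i, w) -> gradient of psi_i at w


def svrg(grad_i: GradFn, n: int, w0: Vector, eta: float, m: int,
         stages: int, option: str = "I", seed: int = 0) -> Vector:
    """Minimal SVRG sketch (hypothetical helper); returns the final snapshot w_tilde."""
    rng = random.Random(seed)
    w_tilde = list(w0)
    for _ in range(stages):
        # Full gradient at the snapshot: mu = (1/n) * sum_i grad psi_i(w_tilde).
        mu = [0.0] * len(w_tilde)
        for i in range(n):
            mu = [a + b / n for a, b in zip(mu, grad_i(i, w_tilde))]
        w = list(w_tilde)          # w_0 = w_tilde
        iterates = [list(w)]       # w_0, w_1, ..., w_m
        for _ in range(m):
            i = rng.randrange(n)   # pick i_t uniformly from {0, ..., n-1}
            g_w = grad_i(i, w)
            g_snap = grad_i(i, w_tilde)
            # w_t = w_{t-1} - eta * (grad psi_i(w_{t-1}) - grad psi_i(w_tilde) + mu)
            w = [wj - eta * (a - b + c)
                 for wj, a, b, c in zip(w, g_w, g_snap, mu)]
            iterates.append(w)
        if option == "I":
            w_tilde = w                              # Option I: w_tilde_s = w_m
        elif option == "II":
            w_tilde = iterates[rng.randrange(m)]     # Option II: random t in {0, ..., m-1}
        else:
            raise ValueError("option must be 'I' or 'II'")
    return w_tilde


# Hypothetical toy problem: psi_i(w) = 0.5 * (x_i . w - y_i)^2 with y = 2*x1 - 3*x2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
Y = [2.0 * a - 3.0 * b for a, b in X]


def ls_grad(i: int, w: Vector) -> Vector:
    residual = sum(xj * wj for xj, wj in zip(X[i], w)) - Y[i]
    return [residual * xj for xj in X[i]]


n = len(X)
w_opt1 = svrg(ls_grad, n, [0.0, 0.0], eta=0.02, m=2 * n, stages=200, option="I")
w_opt2 = svrg(ls_grad, n, [0.0, 0.0], eta=0.02, m=2 * n, stages=200, option="II")
print(w_opt1, w_opt2)  # both are expected to be close to [2.0, -3.0]
```

The quoted passage calls Option I the natural practical choice. Option II is the variant the paper's convergence analysis relies on. On this toy problem it typically needs more stages, because the randomly chosen $t$ can be small.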