SAG

Stochastic Variance Reduced Gradient

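Paper: Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

Take care to distinguish the two $w$'s in the formulas: the one without a tilde, $w$, is the parameter of the inner (mini-batch) loop, while the one with a tilde, $\tilde{w}$, belongs to the outer loop and is what the algorithm finally returns.

Every so often (once per stage), the gradient over all samples, $\tilde{\mu}$, is computed at $\tilde{w}$. Each single update inside a stage then uses $\nabla \psi_{i_t}(w_{t-1}) - \nabla \psi_{i_t}(\tilde{w}) + \tilde{\mu}$ in place of the plain stochastic gradient to update the current parameters.

At the end of a stage, one of option I (set $\tilde{w}_s = w_m$) or option II (set $\tilde{w}_s = w_t$ for a randomly chosen $t \in \{0, \ldots, m-1\}$) is used to update $\tilde{w}$. From the paper: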
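A short check of why this update direction is still a valid stochastic gradient. It assumes $i_t$ is drawn uniformly from $\{1, \ldots, n\}$ and writes $P(w) = \frac{1}{n}\sum_{i=1}^{n} \psi_i(w)$ for the full objective (the symbol $P$ is introduced here for the check and is not used in the notes above):

$$
\mathbb{E}_{i_t}\left[\nabla \psi_{i_t}(w_{t-1}) - \nabla \psi_{i_t}(\tilde{w}) + \tilde{\mu}\right]
= \nabla P(w_{t-1}) - \nabla P(\tilde{w}) + \tilde{\mu}
= \nabla P(w_{t-1}),
\qquad
\tilde{\mu} = \nabla P(\tilde{w}) = \frac{1}{n}\sum_{i=1}^{n} \nabla \psi_i(\tilde{w}).
$$

So the direction is unbiased, like plain SGD. The difference is that when $w_{t-1}$ and $\tilde{w}$ are both close to the optimum, $\nabla \psi_{i_t}(w_{t-1}) - \nabla \psi_{i_t}(\tilde{w})$ is small and $\tilde{\mu}$ is close to zero, so the variance of the direction shrinks instead of staying constant.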

In practical implementations, it is natural to choose option I, or take $\tilde{w}_s$ to be the average of the past $t$ iterates. However, our analysis depends on option II. Note that each stage $s$ requires $2m + n$ gradient computations (for some convex problems, one may save the intermediate gradients $\nabla \psi_i(\tilde{w})$, and thus only $m + n$ gradient computations are needed). Therefore it is natural to choose $m$ to be the same order of $n$ but slightly larger (for example $m = 2n$ for convex problems and $m = 5n$ for nonconvex problems in our experiments).
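The two options and the $m = 2n$ choice can be tried on a toy problem. Below is a minimal sketch, not the paper's code, assuming a least-squares objective $\psi_i(w) = \frac{1}{2}(x_i^\top w - y_i)^2$. The function name `svrg_least_squares`, its parameters, and the demo data are all hypothetical and chosen for illustration only.

```python
import numpy as np


def svrg_least_squares(X, y, eta, stages, m, option="I", seed=0):
    """SVRG sketch for P(w) = (1/n) * sum_i 0.5 * (x_i . w - y_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w_tilde = np.zeros(d)                   # outer-loop snapshot
    for _ in range(stages):
        residual = X @ w_tilde - y          # per-sample residuals at the snapshot
        mu_tilde = X.T @ residual / n       # full gradient at the snapshot
        w = w_tilde.copy()                  # w_0 = snapshot
        pick = rng.integers(m)              # option II index t in {0, ..., m-1}
        chosen = w.copy()
        for t in range(m):
            if t == pick:
                chosen = w.copy()           # w_t, taken before the (t+1)-th update
            i = rng.integers(n)             # uniform sample index
            grad_w = (X[i] @ w - y[i]) * X[i]
            grad_snap = residual[i] * X[i]  # reuses the saved residual
            w = w - eta * (grad_w - grad_snap + mu_tilde)
        w_tilde = w if option == "I" else chosen
    return w_tilde


# Usage on a small noiseless problem; all values here are illustrative.
data_rng = np.random.default_rng(1)
X_demo = data_rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y_demo = X_demo @ w_true
n_demo = X_demo.shape[0]
w_hat = svrg_least_squares(X_demo, y_demo, eta=0.02, stages=40, m=2 * n_demo)
print("distance to w_true:", np.linalg.norm(w_hat - w_true))
```

Because the residuals at the snapshot are saved, `grad_snap` costs no extra pass over the model, which is the $m + n$ case mentioned in the quote. Recomputing `X[i] @ w_tilde - y[i]` inside the loop instead would give the $2m + n$ count. The step size `eta=0.02` is a guess that suits this toy data, not a recommended value.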

[Daily Paper Reading] NIPS 2016 Tutorial, optimization: SVRG is probably the "best" algorithm

Code

StochasticVarianceReducedGradient.py

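import numpy as np

class SVRGOptimizer:
    def __init__(self, m, learning_rate, gradientFunc):
        self.m = m                        # number of inner updates per stage
        self.eta = learning_rate
        self.gradientFunc = gradientFunc  # gradientFunc(w, X) -> mean gradient over the rows of X

    def minimize(self, X_train, dim, epochs_n, batch_size):
        w = np.random.normal(size=dim)    # outer-loop parameters (w with a tilde)
        n = X_train.shape[0]
        for epoch in range(epochs_n):
            # Full gradient at the snapshot w, computed once per stage.
            fullGradient = self.gradientFunc(w, X_train)
            wInter = w.copy()             # inner-loop parameters start at the snapshot
            for t in range(self.m):
                # Sample one mini-batch and evaluate its gradient at both points.
                batch = X_train[np.random.randint(0, n, size=batch_size)]
                batchGradientInter = self.gradientFunc(wInter, batch)
                batchGradient = self.gradientFunc(w, batch)
                # Variance-reduced update.
                wInter = wInter - self.eta * (batchGradientInter - batchGradient + fullGradient)
            w = wInter                    # option I: keep the last inner iterate
        return w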

http://cs.nju.edu.cn/lwj/slides/PDSL.pdf

Riemannian stochastic variance reduced gradient
