SAG


Stochastic Variance Reduced Gradient

Be careful to distinguish the two $w$'s in the formulas: the one without a tilde, $w$, is the inner mini-batch parameter, while the tilded $\tilde{w}$ is the outer-stage parameter, i.e. the final one. Once per stage, compute the full gradient over all samples, $\tilde{\mu}$. Within a stage, each single update uses $\nabla \psi_{i_t}(w_{t-1}) - \nabla \psi_{i_t}(\tilde{w}) + \tilde{\mu}$ to update the current parameter. The correction term keeps the step unbiased: $\mathbb{E}_{i_t}[\nabla \psi_{i_t}(\tilde{w})] = \tilde{\mu}$, so the expected step direction is still the full gradient, while the step's variance shrinks as both $w_{t-1}$ and $\tilde{w}$ approach the optimum. At the end of a stage, choose either option I or option II to set the new outer parameter. See:

In practical implementations, it is natural to choose option I, or take $\tilde{w}_s$ to be the average of the past $t$ iterates. However, our analysis depends on option II. Note that each stage $s$ requires $2m + n$ gradient computations (for some convex problems, one may save the intermediate gradients $\nabla \psi_i(\tilde{w})$, and thus only $m + n$ gradient computations are needed). Therefore it is natural to choose $m$ to be the same order of $n$ but slightly larger (for example $m = 2n$ for convex problems and $m = 5n$ for nonconvex problems in our experiments).
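
Written out in full, the SVRG procedure from the quoted paper (Johnson & Zhang, "Accelerating Stochastic Gradient Descent using Predictive Variance Reduction") is:

$$
\begin{aligned}
&\textbf{Parameters: } \text{update frequency } m, \text{ learning rate } \eta \\
&\text{Initialize } \tilde{w}_0 \\
&\textbf{for } s = 1, 2, \dots \\
&\quad \tilde{w} = \tilde{w}_{s-1}, \qquad \tilde{\mu} = \frac{1}{n}\sum_{i=1}^{n} \nabla \psi_i(\tilde{w}), \qquad w_0 = \tilde{w} \\
&\quad \textbf{for } t = 1, 2, \dots, m \\
&\qquad \text{randomly pick } i_t \in \{1, \dots, n\} \\
&\qquad w_t = w_{t-1} - \eta \big( \nabla \psi_{i_t}(w_{t-1}) - \nabla \psi_{i_t}(\tilde{w}) + \tilde{\mu} \big) \\
&\quad \text{option I: set } \tilde{w}_s = w_m \\
&\quad \text{option II: set } \tilde{w}_s = w_t \text{ for randomly chosen } t \in \{0, \dots, m-1\}
\end{aligned}
$$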

Code

StochasticVarianceReducedGradient.py

import numpy as np

class SVRGOptimizer:
    def __init__(self, m, learning_rate, gradient_func):
        self.m = m                          # number of inner-loop updates per stage
        self.eta = learning_rate
        self.gradient_func = gradient_func  # gradient_func(w, data) -> average gradient over the rows of data

    def minimize(self, data, dim, epochs_n, batch_size):
        w = np.random.normal(size=dim)      # outer parameter, i.e. \tilde{w}
        n = data.shape[0]
        for epoch in range(epochs_n):
            # once per stage: full gradient over all samples, i.e. \tilde{\mu}
            full_gradient = self.gradient_func(w, data)
            w_inner = w
            for _ in range(self.m):
                # randomly pick a mini-batch (the role of i_t)
                batch = data[np.random.randint(0, n, size=batch_size)]
                # variance-reduced step: gradient at w_inner, minus gradient at
                # the snapshot \tilde{w}, plus the full gradient \tilde{\mu}
                batch_gradient_inner = self.gradient_func(w_inner, batch)
                batch_gradient_snapshot = self.gradient_func(w, batch)
                w_inner = w_inner - self.eta * (
                    batch_gradient_inner - batch_gradient_snapshot + full_gradient)
            w = w_inner                     # option I: keep the last inner iterate
        return w
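
A minimal usage sketch on a synthetic least-squares problem; the data layout (targets packed as the last column), least_squares_grad, and all hyperparameter values below are illustrative assumptions, not part of the original:

import numpy as np

# synthetic least-squares problem; the target y is packed as the last column
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)
data = np.hstack([X, y[:, None]])

def least_squares_grad(w, batch):
    Xb, yb = batch[:, :-1], batch[:, -1]
    return Xb.T @ (Xb @ w - yb) / len(yb)  # average gradient of 0.5*||Xw - y||^2

# m = 2n inner updates per stage, as the quoted paper suggests for convex problems
opt = SVRGOptimizer(m=2 * n, learning_rate=0.01, gradient_func=least_squares_grad)
w_hat = opt.minimize(data, dim=d, epochs_n=10, batch_size=1)
print(np.linalg.norm(w_hat - w_true))  # should be close to zero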

SVRG is arguably the "best" algorithm in this family.

  • [Daily Paper Reading] NIPS 2016 Tutorial on optimization
  • http://cs.nju.edu.cn/lwj/slides/PDSL.pdf
  • Riemannian stochastic variance reduced gradient
  • Linearly convergent stochastic optimization algorithms: SAG and SVRG
  • Accelerating Stochastic Gradient Descent using Predictive Variance Reduction