# ISTA


### The L1 norm produces sparse solutions

$$\min F(x) = \left\| Ax-b \right\|^2 + \lambda \left\| x \right\|_p^p$$

Suppose the optimal solution is $x^*$. The original problem is then equivalent to the constrained form:

$$\min_x F(x) = \left\| Ax-b \right\|^2 \qquad \text{s.t.} \quad \left\| x \right\|_p^p \le C, \quad C = \left\| x^* \right\|_p^p$$
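A one-dimensional worked example (added here for illustration) shows why the L1 penalty drives coefficients exactly to zero. Consider

$$\min_x \tfrac{1}{2}(x-z)^2 + \alpha \left| x \right|, \qquad \alpha > 0.$$

For $x>0$ stationarity gives $x = z - \alpha$ (valid when $z > \alpha$); for $x<0$ it gives $x = z + \alpha$ (valid when $z < -\alpha$); otherwise the minimizer is $x = 0$. Hence

$$x^* = \operatorname{sign}(z)\max\{0, \left| z \right| - \alpha\},$$

so any $\left| z \right| \le \alpha$ is mapped exactly to zero. This is the same soft-thresholding operator $\tau_{\alpha}$ that reappears in the ISTA update below.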

### Proximal gradient descent

In gradient descent, suppose $\nabla f(x)$ is L-Lipschitz, i.e.

$$\left\| \nabla f(x_{k+1}) - \nabla f(x_k) \right\| \le L \left\| x_{k+1} - x_k \right\|.$$

Expanding around $x_k$ and bounding the second-order term with $L$ gives the quadratic upper bound

$$f(x, x_k) = f(x_k) + \left\langle x - x_k, \nabla f(x_k) \right\rangle + \frac{L}{2} \left\| x - x_k \right\|^2 = \frac{L}{2} \left\| x - \left( x_k - \tfrac{1}{L} \nabla f(x_k) \right) \right\|^2 + \varphi(x_k),$$

where $\varphi(x_k)$ collects the terms that do not depend on $x$. Its minimum is attained at $x_{k+1} = x_k - \frac{1}{L} \nabla f(x_k)$. If a non-smooth penalty such as the L1 norm is added to the objective and this expansion is applied only to the smooth part, the resulting method is called proximal gradient descent (PGD).
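A minimal sketch of the generic PGD iteration (an illustrative addition; `grad_f`, `prox_g`, and the step size `t` are placeholders supplied by the caller, not functions defined in these notes):

```python
import numpy as np

def proximal_gradient_descent(x0, grad_f, prox_g, t, n_iter=500, tol=1e-8):
    """Generic PGD: x_{k+1} = prox_{t*g}(x_k - t * grad_f(x_k)).

    grad_f : gradient of the smooth part f
    prox_g : proximal operator of the non-smooth part g, called as prox_g(v, t)
    t      : step size, e.g. 1/L when grad_f is L-Lipschitz
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x_new = prox_g(x - t * grad_f(x), t)   # gradient step on f, then prox step on g
        if np.linalg.norm(x_new - x) < tol:    # stop once the iterates stabilize
            return x_new
        x = x_new
    return x
```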

### ISTA

ISTA solves non-smooth, non-differentiable convex optimization problems, for example the sparsity-inducing L1-norm problem $\min \{ f(x) + \lambda \left\| x \right\|_1 \}$.

The objective is split, and the smooth part is replaced by a proximal quadratic approximation, so each step minimizes a varying upper bound of the original objective. Within each iteration the optimization variables decouple completely, and the subproblem has a closed-form solution.

Example: $\min F(x) = \left\| Ax-b \right\|^2 + \lambda \left\| x \right\|_p^p$. This problem is equivalent to $\min_x \left\| Ax-b \right\|^2 \; \text{s.t.} \; \left\| x \right\|_p^p \le C$, where $C = \left\| x^* \right\|_p^p$ and $x^*$ is the optimal solution.

Each iteration solves:

$$x_k = \arg\min_x \left\{ f(x_{k-1}) + \left\langle x - x_{k-1}, \nabla f(x_{k-1}) \right\rangle + \frac{1}{2t_k} \left\| x - x_{k-1} \right\|^2 + \lambda \left\| x \right\|_1 \right\} = \arg\min_x \left\{ \frac{1}{2t_k} \left\| x - \left( x_{k-1} - t_k \nabla f(x_{k-1}) \right) \right\|^2 + \lambda \left\| x \right\|_1 \right\}$$

A closed-form solution exists:

$$x_k = \tau_{\lambda t_k}\left( x_{k-1} - t_k \nabla f(x_{k-1}) \right), \qquad \tau_{\alpha}(x) = \operatorname{sign}(x)\max\{0, \left| x \right| - \alpha\}$$

Convergence: the method has converged when $\left\| x_k - x_{k-1} \right\|^2 \to 0$; at that point the upper bound coincides with the original objective.

How should $t_k$ be chosen? When the Lipschitz constant $L$ of $\nabla f$ is known, the constant step $t_k = 1/L$ works; otherwise a backtracking line search is the standard choice.

from MLAPP: Chapter 13. Sparse linear models (page 446)
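A minimal ISTA sketch for the lasso-type problem $\min_x \left\| Ax-b \right\|^2 + \lambda \left\| x \right\|_1$ (an illustrative addition, not code from the original notes; the fixed step $t = 1/L$ uses $L = 2\left\| A \right\|_2^2$, the Lipschitz constant of $\nabla f(x) = 2A^\top(Ax-b)$):

```python
import numpy as np

def soft_threshold(v, alpha):
    """tau_alpha(v) = sign(v) * max(0, |v| - alpha), applied elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - alpha, 0.0)

def ista(A, b, lam, n_iter=1000, tol=1e-8):
    """ISTA for min_x ||Ax - b||^2 + lam * ||x||_1 with a fixed step size."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2            # Lipschitz constant of grad f(x) = 2 A^T (Ax - b)
    t = 1.0 / L                                    # fixed step size t_k = 1/L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - b)             # gradient of the smooth part
        x_new = soft_threshold(x - t * grad, lam * t)   # prox step = soft thresholding
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy usage: recover a sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
x_true = np.zeros(50); x_true[:5] = rng.standard_normal(5)
b = A @ x_true + 0.01 * rng.standard_normal(100)
x_hat = ista(A, b, lam=0.5)
print("nonzeros in estimate:", np.count_nonzero(np.abs(x_hat) > 1e-6))
```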

### FISTA

Reference: "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems" (Beck and Teboulle).

For Nesterov's method, see the Nesterov Momentum discussion in the SGD notes.
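Below is a sketch of the FISTA iteration from that paper (an illustrative addition, not code from the original notes); it reuses `soft_threshold` and `numpy` from the ISTA sketch above, and the only change from ISTA is that the gradient and prox step are taken at an extrapolated point $y_k$:

```python
def fista(A, b, lam, n_iter=1000, tol=1e-8):
    """FISTA for min_x ||Ax - b||^2 + lam * ||x||_1 (Beck & Teboulle).

    Assumes `numpy as np` and `soft_threshold` from the ISTA sketch above.
    """
    L = 2.0 * np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    step = 1.0 / L
    x_prev = np.zeros(A.shape[1])
    y = x_prev.copy()                          # extrapolated point y_1 = x_0
    t = 1.0                                    # momentum parameter t_1 = 1
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ y - b)
        x = soft_threshold(y - step * grad, lam * step)    # prox step taken at y_k
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)        # Nesterov-style extrapolation
        if np.linalg.norm(x - x_prev) < tol:
            return x
        x_prev, t = x, t_next
    return x
```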

Further reading:

  • Proximal Gradient Descent for L1 Regularization
  • Python implementation of the Fast Iterative Shrinkage/Thresholding Algorithm