LambdaRank

LambdaRank starts from RankNet: it defines a quantity $\lambda$ directly as the gradient of the pairwise cost, and then works backwards from that gradient to a new loss function.

RankNet's loss $C_{ij}$ involves the outputs $s_i, s_j$ of the network for the two documents, so its gradient with respect to a weight $W_k$ is:

$$\frac{\partial C_{ij}}{\partial W_k} = \frac{\partial C_{ij}}{\partial s_i}\frac{\partial s_i}{\partial W_k} + \frac{\partial C_{ij}}{\partial s_j}\frac{\partial s_j}{\partial W_k}$$

For the set $D$ of candidate documents retrieved for one query, the total loss is $C = \sum_{(i,j) \in D} C_{ij}$, and its gradient is $\frac{\partial C}{\partial W_k} = \sum_{(i,j) \in D} \frac{\partial C_{ij}}{\partial W_k}$.

Expanding these partial derivatives:

$$\frac{\partial C_{ij}}{\partial s_i} = \frac{\partial}{\partial s_i}\left[\frac{1}{2}(1-S_{ij})\,\sigma(s_i-s_j) + \log\left(1+e^{-\sigma(s_i-s_j)}\right)\right] = \sigma\left(\frac{1}{2}(1-S_{ij}) - \frac{1}{1+e^{\sigma(s_i-s_j)}}\right) = -\frac{\partial C_{ij}}{\partial s_j}$$

$$\frac{\partial C_{ij}}{\partial W_k} = \sigma\left(\frac{1}{2}(1-S_{ij}) - \frac{1}{1+e^{\sigma(s_i-s_j)}}\right)\left(\frac{\partial s_i}{\partial W_k} - \frac{\partial s_j}{\partial W_k}\right) = \lambda_{ij}\left(\frac{\partial s_i}{\partial W_k} - \frac{\partial s_j}{\partial W_k}\right)$$

$$\lambda_{ij} = \sigma\left(\frac{1}{2}(1-S_{ij}) - \frac{1}{1+e^{\sigma(s_i-s_j)}}\right)$$

For a document pair with $d_i \triangleright d_j$ (i.e. $d_i$ should be ranked above $d_j$), $S_{ij} = 1$, so

$$\lambda_{ij} = -\frac{\sigma}{1+e^{\sigma(s_i-s_j)}}$$
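As a concrete check of this factor, here is a minimal NumPy sketch (the function name rank_net_lambda is invented for this example, not taken from any library):

```python
import numpy as np

def rank_net_lambda(s_i, s_j, S_ij, sigma=1.0):
    """Pairwise gradient factor lambda_ij of the RankNet cost.

    s_i, s_j : model scores for documents d_i and d_j
    S_ij     : +1 if d_i should rank above d_j, -1 if below, 0 if tied
    sigma    : shape parameter of the sigmoid in the cost
    """
    return sigma * (0.5 * (1.0 - S_ij) - 1.0 / (1.0 + np.exp(sigma * (s_i - s_j))))

# With S_ij = 1 the factor reduces to -sigma / (1 + exp(sigma * (s_i - s_j))):
print(rank_net_lambda(2.0, 1.0, 1))  # ~ -0.269: pair already ordered, small push
print(rank_net_lambda(1.0, 2.0, 1))  # ~ -0.731: pair inverted, larger push
```

The magnitude of $\lambda_{ij}$ is largest when the pair is mis-ordered, which is exactly the behavior a pairwise gradient should have.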

Summing over pairs, each document $d_i$ then receives

$$\lambda_i = \sum_{j:(i,j) \in D} \lambda_{ij} - \sum_{j:(j,i) \in D} \lambda_{ij}$$

that is, the direction and strength of the next re-ordering move for a document is decided by all the other documents of the same query that carry a different label.

On top of this, LambdaRank brings in an evaluation metric $Z$ (such as NDCG or ERR) and uses the change in that metric caused by swapping the positions of the two documents, $\left|\Delta Z_{ij}\right|$, as an extra factor in $\lambda_{ij}$.

Working backwards from this gradient yields the LambdaRank loss:

$$C_{ij} = \log\left(1+e^{-\sigma(s_i-s_j)}\right)\left|\Delta Z_{ij}\right|$$
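The following sketch puts the pieces together for one query: it aggregates the pairwise $\lambda_{ij}$, each weighted by $|\Delta \text{NDCG}|$, into per-document $\lambda_i$. It is an O(n²) illustration under my own naming (per_document_lambdas), not code from the referenced articles:

```python
import numpy as np

def per_document_lambdas(scores, labels, sigma=1.0):
    """Per-document lambda_i for one query, NDCG-weighted.

    Implements lambda_i = sum_{(i,j)} lambda_ij - sum_{(j,i)} lambda_ij,
    where (i,j) runs over pairs with labels[i] > labels[j] and each
    lambda_ij is scaled by |dNDCG| from swapping the two documents.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = len(scores)
    order = np.argsort(-scores)               # indices sorted by score, descending
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(1, n + 1)        # 1-based rank of each document
    gains = 2.0 ** labels - 1.0
    discounts = 1.0 / np.log2(1.0 + ranks)
    ideal = np.sort(gains)[::-1]
    idcg = np.sum(ideal / np.log2(1.0 + np.arange(1, n + 1)))
    if idcg == 0.0:                           # all labels zero: nothing to order
        return np.zeros(n)
    lambdas = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue                      # keep only pairs with d_i > d_j
            # |dNDCG|: swapping i and j only changes their two DCG terms
            delta = abs((gains[i] - gains[j]) * (discounts[i] - discounts[j])) / idcg
            lam = -sigma / (1.0 + np.exp(sigma * (scores[i] - scores[j]))) * delta
            lambdas[i] += lam                 # pushes d_i up after a descent step
            lambdas[j] -= lam                 # pushes d_j down
    return lambdas

print(per_document_lambdas(scores=[1.2, 0.3, 2.1], labels=[2, 1, 0]))
```

The returned values play the role of $\partial C/\partial s_i$: a gradient-descent step on the network weights then raises the scores of under-ranked relevant documents and lowers the rest.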

Compared with RankNet, LambdaRank's advantage is that the factorized gradient speeds up training, and that the evaluation metric is built into the gradient, so the method optimizes the ranking objective directly and gives visibly better results.

The open-source Ranklib toolkit defines its data format as follows:

label qid:$id $feaid:$feavalue $feaid:$feavalue … #description

Each line is one sample. Samples issued under the same query share a qid; label encodes how relevant the sample is to that query; description records which candidate document the sample comes from, to tell the documents apart.
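A minimal parser sketch for one line of this format; the sample lines, feature values, and doc descriptions below are invented for illustration:

```python
def parse_ranklib_line(line):
    """Parse one line of the Ranklib / SVMlight-style format:
    label qid:$id $feaid:$feavalue ... # description"""
    body, _, desc = line.partition("#")
    fields = body.split()
    label = int(fields[0])
    qid = fields[1].split(":")[1]
    feats = {int(k): float(v) for k, v in (f.split(":") for f in fields[2:])}
    return label, qid, feats, desc.strip()

# Two documents for the same query (qid:1), with made-up feature values:
print(parse_ranklib_line("2 qid:1 1:0.71 2:0.36 # doc-1001"))
print(parse_ranklib_line("0 qid:1 1:0.12 2:0.44 # doc-1002"))
```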

Recommended reading

达观数据帮你揭开搜索引擎排序的神秘面纱

Learning To Rank之LambdaMART的前世今生: http://datayuan.baijia.baidu.com/article/486753