word2vec


(Screenshot)

For the content at the link above, start reading directly from Chapter 5.

Summary

$$L = \log \prod_{u} P(u \mid \mathrm{Context}(w))$$

$$P(u \mid \mathrm{Context}(w)) = \sigma(X_w^\top \theta^u) \quad \text{or} \quad \sigma(-X_w^\top \theta^u)$$

Here $u$ ranges over the internal nodes on the Huffman path to the target word $w$, $X_w$ is the summed context vector, and $\theta^u$ is the parameter vector of node $u$. Which of the two sigmoids applies at a node depends on whether the Huffman code bit there is 0 or 1; this is the hierarchical-softmax form of the objective (see the sketch under the Hierarchical Softmax heading below).

Parallelizing Word2Vec in Shared and Distributed Memory

Two words frequently co-occurring ⟺ the two words are semantically similar in some respect ⟺ their vectors take similar values along certain dimensions.

The article above deserves a careful read.

  1. word2vec exploits only word co-occurrence; syntax and other linguistic structure are ignored.

  2. Interchangeability: "A small, fluffy roosety climbed a tree." From the context we can infer that "roosety" means squirrel, because the two words can be interchanged.

  3. Pointwise mutual information (PMI) measures the association between two points: $PMI(a, b) = \log \left[ \frac{P(a,b)}{P(a)P(b)} \right] = \log \left[ \frac{P(a \mid b)}{P(a)} \right]$, usually approximated as $PMI(a, b) \approx \vec{v}_a \cdot \vec{v}_b$. That is just a vector inner product, i.e. a measure of how close two vectors are (a toy numeric illustration follows this list).

  4. PMI as above can also be used to build recommender systems.
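A toy numeric illustration of the PMI definition from item 3. The four-word vocabulary and the co-occurrence counts are fabricated purely to show the computation:

```python
import numpy as np

words = ["king", "queen", "apple", "fruit"]
# Fabricated co-occurrence counts: entry (i, j) is how often word i and
# word j appear together within some context window.
C = np.array([
    [10.0, 8.0, 1.0, 1.0],
    [ 8.0, 9.0, 1.0, 1.0],
    [ 1.0, 1.0, 7.0, 6.0],
    [ 1.0, 1.0, 6.0, 8.0],
])

total = C.sum()
P_ab = C / total                 # joint probability P(a, b)
P_a = C.sum(axis=1) / total      # marginal probability P(a)

# PMI(a, b) = log[ P(a, b) / (P(a) P(b)) ]
PMI = np.log(P_ab / np.outer(P_a, P_a))

print(PMI[words.index("king"), words.index("queen")])  # frequent co-occurrence -> high PMI
print(PMI[words.index("king"), words.index("apple")])  # rare co-occurrence -> negative PMI
```

In the embedding view, the inner product $\vec{v}_a \cdot \vec{v}_b$ plays the role of this PMI value.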

Hierarchical Softmax
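The summary formulas above are exactly the per-node factors of hierarchical softmax: instead of a flat softmax over the whole vocabulary, the probability of a word is a product of binary decisions along its Huffman path. Below is a minimal toy sketch of that path product. The helper name `hs_log_prob`, the random vectors, the path length, and the 0/1 sign convention are all illustrative assumptions, not word2vec's exact source-code conventions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hs_log_prob(x_w, path_thetas, codes):
    """Log-probability of one word under hierarchical softmax.

    x_w         -- summed context vector (X_w in the formula above)
    path_thetas -- parameter vector theta^u for each internal node u on
                   the word's Huffman path
    codes       -- the Huffman code bit at each node; by the convention
                   assumed here, bit 0 -> sigma(x.theta), bit 1 -> sigma(-x.theta)
    """
    logp = 0.0
    for theta, d in zip(path_thetas, codes):
        s = np.dot(x_w, theta)
        logp += np.log(sigmoid(s if d == 0 else -s))
    return logp

# Toy usage: a 3-node path in a 5-dimensional embedding space.
rng = np.random.default_rng(0)
x_w = rng.normal(size=5)
thetas = [rng.normal(size=5) for _ in range(3)]
print(hs_log_prob(x_w, thetas, codes=[0, 1, 0]))
```

Training maximizes the sum of such log-probabilities over the corpus, which is the objective $L$ above.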

Sentence similarity

After computing a vector for each word in the sentence, use WMD (Word Mover's Distance) to compare sentences; the WMD_tutorial.ipynb notebook that accompanies the "Finding similar documents with Word2Vec and WMD" tutorial linked below walks through this. A sketch follows.
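A minimal sketch of sentence comparison with WMD via gensim, assuming pretrained word2vec-format vectors at a hypothetical path `vectors.bin` (gensim's `wmdistance` also needs an optimal-transport backend such as POT installed):

```python
from gensim.models import KeyedVectors

# Hypothetical path -- any pretrained word2vec-format vectors work here.
vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

s1 = "obama speaks to the media in illinois".split()
s2 = "the president greets the press in chicago".split()

# Word Mover's Distance: the minimum total cost of moving the embedded
# words of one sentence onto the other; lower means more similar.
print(vectors.wmdistance(s1, s2))
```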

What makes word2vec better than earlier word-embedding methods? See "king - man + woman is queen; but why?" linked below; the analogy itself is easy to check with pretrained vectors, as sketched next.
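The analogy can be checked directly with gensim's `most_similar`, again assuming pretrained vectors at a hypothetical path; with typical pretrained embeddings the top neighbor is "queen":

```python
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# king - man + woman: most_similar adds the "positive" vectors and
# subtracts the "negative" ones, then ranks by cosine similarity.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```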


* https://arxiv.org/abs/1604.04661 (Parallelizing Word2Vec in Shared and Distributed Memory)
* https://github.com/IntelLabs/pWord2Vec
* http://www.52cs.org/?p=22
* https://www.zhihu.com/question/53011711/answer/133115595
* http://p.migdal.pl/2017/01/06/king-man-woman-queen-why.html
* GloVe: Global Vectors for Word Representation
* 为何做DL用word2vec比glove多?
* 定制你的个性化词向量:解读+中文实战
* Embedding算法之矩阵分解
* word2vec模型和源码解析
* 一篇通俗易懂的word2vec
* word2vec中关于霍夫曼树的应用原理
* word2vec原理(二) 基于Hierarchical Softmax的模型
* 用綫性規劃去計算句子之間的相似度
* Finding similar documents with Word2Vec and WMD
* 如何用 word2vec 计算两个句子之间的相似度?
* 分布的相似度(距离)用什么模型比较好?
* word2vec 中的数学原理详解