# softmax

The aim of logistic regression is to separate the classes with linear decision surfaces in the log-odds $$\log \frac {Pr(Y=i|x)}{ Pr(Y=j|x)}$$, while keeping the probabilities of all classes summing to 1.

LR follows the **Bayes** decision rule: a sample is assigned to the class with the largest posterior probability:

$$
G(x) = \arg \max\_k Pr(Y=k|x)
$$

The decision boundary between classes $$i$$ and $$j$$ is therefore determined by

$$
Pr(Y=i|x) =  Pr(Y=j|x)
$$
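
With the linear log-odds model introduced below, this boundary is a hyperplane:

$$
\log \frac {Pr(Y=i|x)}{Pr(Y=j|x)} = (w\_i - w\_j) \cdot x = 0
$$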

**Class probabilities** (class $$K$$ is taken as the reference; its probability follows from the constraint that all probabilities sum to 1):

$$
\log \frac{P(Y=1|x)}{P(Y=K|x)} = w\_1 \cdot x  \\
\log \frac{P(Y=2|x)}{P(Y=K|x)} = w\_2 \cdot x  \\
\cdots \\
\log \frac{P(Y=K-1|x)}{P(Y=K|x)} = w\_{K-1} \cdot x  \\
\\
P(Y=1|x) = P(Y=K|x) \exp(w\_1 \cdot x) \\
P(Y=2|x) = P(Y=K|x) \exp(w\_2 \cdot x) \\
\cdots \\
P(Y=K-1|x) = P(Y=K|x) \exp(w\_{K-1} \cdot x) \\
\\
P(Y=K|x) = \frac {1}{1+\sum\_{j=1}^{K-1} \exp(w\_j \cdot x)} \\
P(Y=1|x) = \frac {\exp(w\_1 \cdot x)}{1+\sum\_{j=1}^{K-1} \exp(w\_j \cdot x)} \\
P(Y=2|x) = \frac {\exp(w\_2 \cdot x)}{1+\sum\_{j=1}^{K-1} \exp(w\_j \cdot x)} \\
$$

Every class probability above is a multiple of $$P(Y=K|x)$$. If instead we write

$$
P(Y=k|x) = \frac {\exp(w\_k \cdot x)}{\sum\_{j=1}^{K} \exp(w\_j \cdot x)} \\
\text{then} \\
P(Y=1|x) = \frac {\exp(w\_1 \cdot x)}{\sum\_{j=1}^{K} \exp(w\_j \cdot x)} \\
P(Y=2|x) = \frac {\exp(w\_2 \cdot x)}{\sum\_{j=1}^{K} \exp(w\_j \cdot x)} \\
$$

This is exactly the softmax.
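
As a quick sanity check (my addition), for $$K=2$$ this reduces to the familiar logistic sigmoid:

$$
P(Y=1|x) = \frac {\exp(w\_1 \cdot x)}{\exp(w\_1 \cdot x) + \exp(w\_2 \cdot x)} = \frac {1}{1+\exp(-(w\_1 - w\_2) \cdot x)}
$$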

> This feels like the potential functions typically used in undirected graphical models: assign each class a potential, normalize, and then pick the class with the largest potential.

So

$$
p\_j = \frac {\exp(w\_j \cdot x)} {\sum\_{i=1}^K \exp(w\_i \cdot x)}
$$
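
A minimal NumPy sketch of this formula (subtracting the max is the standard numerical-stability trick; see the Softmax vs. Softmax-Loss link below):

```python
import numpy as np

def softmax(z):
    """p_j = exp(z_j) / sum_i exp(z_i), computed over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)  # shift scores so exp() cannot overflow
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

# scores w_j . x for three classes
print(softmax(np.array([2.0, 1.0, 0.1])))  # -> [0.659 0.242 0.099]
```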

## softmax loss

Classifiers are generally estimated by maximum likelihood, i.e. by minimizing the negative log-likelihood; for binary classification the maximum-likelihood objective is exactly the cross-entropy.

$$
L(\theta) = -\frac {1}{m} \log \prod\_{i=1}^m \prod\_{k=1}^K p\_{i,k}^{1(y\_i = k)} + \lambda \sum\_{k=1}^K |\theta\_k| \\
= -\frac {1}{m} \sum\_{i=1}^m \sum\_{k=1}^K {1(y\_i = k)} \log \frac {\exp(f\_k(x\_i,\theta))} {\sum\_j \exp(f\_j(x\_i,\theta))} + \lambda \sum\_{k=1}^K |\theta\_k| \\
$$

A simplified version, dropping the regularizer and using linear scores $$x\_i \theta\_k$$:

$$
L(\theta) = -\frac {1}{m} \log \prod\_{i=1}^m \prod\_{k=1}^K p\_{i,k}^{1(y\_i = k)}  \\
= -\frac {1}{m} \sum\_{i=1}^m \sum\_{k=1}^K {1(y\_i = k)} \log \frac {\exp(x\_i \theta\_k)} {\sum\_j \exp(x\_i \theta\_j)}  \\
$$
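
A minimal sketch of this loss in NumPy (the names `X`, `y`, `Theta` are my own; the log-sum-exp shift avoids overflow in the naive exp/sum):

```python
import numpy as np

def softmax_nll(Theta, X, y):
    """Average negative log-likelihood.
    X: (m, d) features, y: (m,) integer labels in [0, K), Theta: (d, K) weights."""
    logits = X @ Theta                                   # row i holds x_i . theta_k for each k
    logits = logits - logits.max(axis=1, keepdims=True)  # log-sum-exp stability shift
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()       # pick log p_{i, y_i} for each sample
```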

Taking the derivative gives:

$$
\frac {\partial L}{\partial \theta\_k} = -\frac {1}{m} \sum\_{i=1}^m x\_i \left[ {1(y\_i = k)} - p(y\_i = k \mid x\_i; \theta) \right]
$$
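
In code the gradient is a one-liner once the probabilities are computed; a sketch in the same (assumed) notation, reusing `softmax` from the sketch above:

```python
import numpy as np

def softmax_nll_grad(Theta, X, y):
    """d/d Theta of the average NLL; column k is -1/m * sum_i x_i [1(y_i=k) - p_ik]."""
    m, K = X.shape[0], Theta.shape[1]
    P = softmax(X @ Theta)            # (m, K) predicted probabilities p_ik
    Y = np.zeros((m, K))
    Y[np.arange(m), y] = 1.0          # one-hot indicators 1(y_i = k)
    return -X.T @ (Y - P) / m         # (d, K), matches the formula above
```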

### Vectorized form

$$
h(\eta) = \frac {1}{\sum\_k e^{\eta\_k}} \begin{bmatrix}
e^{\eta\_1}      \\
e^{\eta\_2}      \\
\vdots \\
e^{\eta\_K}
\end{bmatrix} \\
J = -\sum\_{k=1}^K 1(y=k) \log(h\_k) \\
\frac {\partial J}{\partial \eta} = h-e\_y \quad (e\_y \text{ is the column vector with a 1 in position } y \text{ and 0 elsewhere})
$$

Then, over all $$N$$ samples, the objective is:

$$
L = -\sum\_{n=1}^N e\_{y\_n}^\top \log(h(\eta\_n)) \\
\frac {\partial L}{\partial \eta\_n} = h(\eta\_n)-e\_{y\_n} \\
$$
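
Putting this into code: the gradient at the logits is simply $$h - e\_y$$ per sample, so a full gradient-descent step takes a few lines (again a sketch, reusing `softmax` from above; `W` maps features to logits $$\eta\_n = W x\_n$$):

```python
import numpy as np

def gradient_step(W, X, y, lr=0.1):
    """One descent step for softmax regression; X: (N, d), y: (N,), W: (K, d)."""
    H = softmax(X @ W.T)              # row n is h(eta_n)
    E = np.zeros_like(H)
    E[np.arange(len(y)), y] = 1.0     # row n is e_{y_n}
    dEta = H - E                      # dL/d eta_n = h - e_y
    return W - lr * (dEta.T @ X)      # chain rule through eta_n = W x_n
```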

### Further reading

[探究最陌生的老朋友Softmax](https://mp.weixin.qq.com/s?__biz=MzA3Mjk0OTgyMg==\&mid=2651123524\&idx=1\&sn=0546ceca3d88e2ff1e66fbecc99bd6a7)

[Softmax vs. Softmax-Loss: Numerical Stability](http://freemind.pluskid.org/machine-learning/softmax-vs-softmax-loss-numerical-stability/)

[softmax回归](http://blog.csdn.net/acdreamers/article/details/44663305)

[Softmax回归](http://deeplearning.stanford.edu/wiki/index.php/Softmax%E5%9B%9E%E5%BD%92)

[Caffe Softmax层的实现原理](https://www.zhihu.com/question/28927103)

[Softmax与交叉熵的数学意义](https://zhuanlan.zhihu.com/p/78711135)

[ArcFace，CosFace，SphereFace，三种人脸识别算法的损失函数的设计](https://zhuanlan.zhihu.com/p/285598652)

