# understand-transformer

**Repository Path**: goodshred/understand-transformer

## Basic Information

- **Project Name**: understand-transformer
- **Description**: Understanding the Transformer
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-01-08
- **Last Updated**: 2024-01-09

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

### Transformer code and quick-start documentation
[Transformer code and quick-start documentation](https://transformers.run/back/attention/)

### Transformer model architecture
![img](img/transformer_architecture.png)

[The Transformer model explained (the most complete illustrated guide)](https://zhuanlan.zhihu.com/p/338817680)

[A PyTorch implementation of the Transformer](https://wmathor.com/index.php/archives/1455/)

[A PyTorch implementation of the Transformer: Bilibili video walkthrough](https://www.bilibili.com/video/BV1mk4y1q7eK?p=2)

https://zhuanlan.zhihu.com/p/340727654?utm_id=0

https://blog.51cto.com/u_16213599/8761507

https://cloud.tencent.com/developer/article/1888512


- Input processing
- Word vectors
- Position vectors
- Encoder
- Decoder


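- Word embeddings / word vectors (Embedding)
- Positional encoding (Positional Encoding)
- Multi-head attention (Multi-Head Attention)
- Residual connection (Add: lets each layer learn only the difference from its input) and layer normalization (Norm, i.e. Layer Normalization: rescales the inputs of each layer to a common mean and variance, which speeds up convergence). Used together, they help mitigate vanishing and exploding gradients.
- Feed-forward network (Feed Forward)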
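
The Add & Norm step described in the list above can be sketched in plain Python. This is only a minimal illustration: `toy_sublayer` is a hypothetical stand-in for attention or the feed-forward network, and the learnable scale and shift that real layer normalization usually includes are left out.

```python
import math


def layer_norm(x, eps=1e-5):
    """Rescale a vector to (approximately) zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add_and_norm(x, sublayer):
    """Residual connection (x + sublayer(x)) followed by layer normalization."""
    residual = [a + b for a, b in zip(x, sublayer(x))]
    return layer_norm(residual)


def toy_sublayer(x):
    # Hypothetical sublayer, used only for illustration.
    return [0.5 * v for v in x]


x = [1.0, 2.0, 3.0, 4.0]
y = add_and_norm(x, toy_sublayer)
print(y)
```

Whatever scale the sublayer output has, `y` comes out with mean close to 0 and variance close to 1, which is the property the Norm step is meant to provide.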

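### Residual connections, vanishing gradients, and exploding gradients
The residual connections introduced by ResNet can be understood through a game of "telephone", like the charades relay on the variety show 《王牌》. Teng Ge sees the phrase "狗中赤兔" and acts it out for Huahua, Huahua acts it out for Xiaotong, and Xiaotong acts it out for Ling Jie, who ends up completely baffled. Information about the phrase is lost at every hand-off: Teng Ge has the most and Ling Jie the least. In the same way, shallow layers receive a lot of information and deep layers little, until the deep layers can no longer make sense of what arrives, just as Ling Jie cannot guess the answer. (This is the intuition behind "vanishing gradients": the signal shrinks layer by layer until it disappears. Strictly speaking, the term refers to the gradient shrinking as it is backpropagated through the layers.)

How do we fix this? Teng Ge skips Huahua and Xiaotong and acts the phrase out for Ling Jie directly, and she gets it at once: "狗中赤兔"! This corresponds to a shallow layer bypassing the layers in between and passing its information straight to a deep layer. A residual connection does exactly that, so the intermediate layers cannot whittle the information away. (The opposite problem, "exploding gradients", is when the signal grows at every layer until the deep layers are overwhelmed and training becomes unstable.)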
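
To put rough numbers on the analogy, here is a small sketch assuming a chain of scalar layers. A plain layer `y = F(x)` contributes a factor `F'(x)` to the gradient, while a residual layer `y = x + F(x)` contributes `1 + F'(x)`. The depth and the per-layer derivatives below are made-up values chosen only for illustration.

```python
def chain_gradient(per_layer_derivatives, residual=False):
    """Multiply the per-layer derivative factors along a chain of scalar layers."""
    grad = 1.0
    for d in per_layer_derivatives:
        # A residual layer y = x + F(x) has derivative 1 + F'(x).
        grad *= (1.0 + d) if residual else d
    return grad


depth = 20                # hypothetical network depth
small = [0.01] * depth    # every layer shrinks the gradient
large = [2.0] * depth     # every layer amplifies the gradient

plain_vanishing = chain_gradient(small)                # 0.01 ** 20, effectively zero
plain_exploding = chain_gradient(large)                # 2 ** 20
residual_grad = chain_gradient(small, residual=True)   # 1.01 ** 20, stays near 1

print(plain_vanishing, plain_exploding, residual_grad)
```

With the residual path the gradient stays close to 1 even when each `F'(x)` is tiny, which is why the skip connection keeps the signal alive in deep stacks.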

### Transformer key concepts
[The essence of the attention mechanism | Self-Attention | Transformer | QKV matrices (very easy to follow)](https://b23.tv/OoIpVRv)

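Predicting weight from waist size: Q is the waist size being queried, for example 57, and the known data form a dictionary that maps waist size (key) to weight (value):
{51: 40, 56: 43, 58: 48}. Using the softmax formula in the figure below, we can predict the weight that corresponds to a waist size of 57.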
![img](img/transformerQKV.png)
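
The waist-to-weight example can be reproduced in a few lines of Python. The exact scoring function shown in the figure is not repeated in the text, so this sketch assumes a Gaussian-kernel score, `-(q - k)^2 / 2`, passed through softmax; the function names are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(query, table):
    """Attention-style lookup: a softmax-weighted average of the values."""
    keys = list(table.keys())
    values = list(table.values())
    # Assumed similarity score: keys closer to the query score higher.
    scores = [-0.5 * (query - k) ** 2 for k in keys]
    weights = softmax(scores)
    prediction = sum(w * v for w, v in zip(weights, values))
    return prediction, weights


waist_to_weight = {51: 40, 56: 43, 58: 48}
predicted_weight, attention_weights = predict(57, waist_to_weight)
print(round(predicted_weight, 2))  # 45.5
```

The keys 56 and 58 are equally close to the query 57, so they share almost all of the attention weight and the prediction lands midway between 43 and 48.
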
Predicting weight and height from waist and chest measurements (multi-dimensional queries, keys, and values):
![img](img/transform_muti_dim_handle.png)
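To keep the softmax from saturating, which would lead to vanishing gradients, the dot products are also divided by the square root of the feature dimension, sqrt(d_k):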
![img](img/transformerScaledDotModel.png)
If Q, K, and V all come from the same input sequence (in the simplest case Q = K = V), this is self-attention:
![img](img/transformerSelfAttention.png)
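
The scaled dot-product and self-attention ideas above can be sketched in plain Python. This version assumes no learned projection matrices and no masking, so Q, K, and V are literally the same toy matrix `X`; a real Transformer layer would first project the input with separate weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for matrices given as lists of rows."""
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Self-attention: queries, keys, and values are the same three toy token vectors.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = scaled_dot_product_attention(X, X, X)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and says how much one token attends to every token in the sequence, itself included.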

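### Downloading Hugging Face models
#### Background
huggingface.co is blocked in mainland China and cannot be reached directly, but its models can be downloaded from a mirror site:
[异性岛-互联高科 (aliendao.cn mirror)](https://aliendao.cn/#/)

[How to download models from Hugging Face quickly and reliably](https://zhuanlan.zhihu.com/p/647843635)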

#### Download links
Single-file download URL: http://61.133.217.142:20800/download//models/bert-base-uncased/coreml/fill-mask/float32_model.mlpackage/Data/com.apple.CoreML/weights/weight.bin

The model_download.py script supports resumable downloads.
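
For readers curious how resuming a download can work in general, here is a minimal sketch using only the standard library. It is not the actual implementation of model_download.py; the function names are hypothetical, and it assumes the server honours HTTP `Range` requests.

```python
import os
import urllib.request


def build_resume_request(url, dest_path):
    """Build a request that asks only for the bytes not yet on disk."""
    request = urllib.request.Request(url)
    if os.path.exists(dest_path):
        offset = os.path.getsize(dest_path)
        if offset > 0:
            # Ask the server to start from the end of the partial file.
            request.add_header("Range", f"bytes={offset}-")
    return request


def download_with_resume(url, dest_path, chunk_size=1 << 20):
    """Append to a partial file when the server supports ranges, else restart."""
    request = build_resume_request(url, dest_path)
    with urllib.request.urlopen(request) as response:
        # Status 206 means the Range header was honoured; otherwise start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest_path, mode) as f:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                f.write(chunk)
```

In this sketch the partial file itself is the resume record, so deleting a corrupted partial file simply makes the next attempt start from byte 0.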

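#### How to use
1. If the network connection drops, delete the file that failed and download it again. The resume record for the old partial file has probably been lost, so delete the file and start over.
2. The site rate-limits downloads. After one file finishes, wait a while before starting the next one; otherwise the connection will be refused.
3. Try quitting PyCharm, restarting it, and running the command again.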
```shell
pip install huggingface_hub
python model_download.py --repo_id MODEL_ID
# For example:
python model_download.py --repo_id Qwen/Qwen-7B
```

# Python syntax

### FAQ
Problem: urllib3 v2.0 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with LibreSSL 2.8.3

Fix: pip install urllib3==1.26.6


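### Calling an object's method
Python does not choose a method from the class name plus the number and types of the arguments. Calling an instance like a function invokes its `__call__` method, and `nn.Module.__call__` in turn runs `forward`.
```python
import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    def __init__(self, config):
        super().__init__()  # required before nn.Module can track parameters
        self.config = config

    # Taking `self` as the first parameter makes this an instance method.
    def forward(self, query, key, value, mask=None, query_mask=None, key_mask=None):
        return value  # placeholder: the actual attention computation is omitted here


def f4(config, inputs_embeds):
    multihead_attn = MultiHeadAttention(config)
    query = key = value = inputs_embeds
    # Calling the instance goes through nn.Module.__call__, which runs forward().
    attn_output = multihead_attn(query, key, value)
    return attn_output
```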
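
The dispatch from an instance call to `forward` can be imitated without PyTorch. `TinyModule` below is a hypothetical, heavily simplified stand-in for `nn.Module`; the real `nn.Module.__call__` also runs any registered hooks around `forward`.

```python
class TinyModule:
    """Simplified stand-in for nn.Module: calling the instance runs forward()."""

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError("subclasses must implement forward()")


class AddInputs(TinyModule):
    def forward(self, a, b):
        return a + b


layer = AddInputs()
result = layer(2, 3)  # goes through __call__, which calls layer.forward(2, 3)
print(result)  # 5
```

This is why PyTorch code normally writes `module(x)` rather than `module.forward(x)`: the call goes through `__call__` first.
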
### Iterating over a dictionary
```python
import torch
# numpy must also be installed: pip install numpy
if __name__ == '__main__':
    src_text = torch.tensor([1, 2, 3], dtype=torch.float)
    print(src_text)
    # Output: tensor([1., 2., 3.])
```
```python
text = "a b c"
print(text.split())        # ['a', 'b', 'c']
print(text.split(' ', 1))  # ['a', 'b c']

# Look up the index of each word
text = "hello world"
word2idx = {"hello": 0, "world": 1}

indices = [word2idx[word] for word in text.split()]
print(indices)  # [0, 1]
```
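
As a small follow-up that actually iterates over a dictionary, the sketch below builds a `word2idx` vocabulary from a couple of made-up sentences and then walks it with `.items()` to create the reverse mapping. The sentences and the name `idx2word` are illustrative assumptions.

```python
sentences = ["hello world", "hello transformer"]

# Build the vocabulary: each new word gets the next free index.
word2idx = {}
for sentence in sentences:
    for word in sentence.split():
        if word not in word2idx:
            word2idx[word] = len(word2idx)

# Iterate over (key, value) pairs to build the reverse mapping.
idx2word = {idx: word for word, idx in word2idx.items()}

for word, idx in word2idx.items():
    print(word, idx)
```

Dictionaries preserve insertion order (Python 3.7+), so the loop prints the words in the order they were first seen.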