# ERNIE-Pytorch

**Repository Path**: pessoa92/ERNIE-Pytorch

## Basic Information

- **Project Name**: ERNIE-Pytorch
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2020-09-09
- **Last Updated**: 2020-12-19

## README

# ERNIE-Pytorch

This project converts [ERNIE](https://github.com/PaddlePaddle/ERNIE) models to the format used by [Hugging Face's pytorch-transformers](https://github.com/huggingface/pytorch-transformers).

ERNIE is based on the BERT model and performs better than BERT on Chinese NLP tasks.

**Currently, this project only supports converting ERNIE 1.0.**

## How to use
You can use a version I have already converted, or convert the model yourself.

Requirements:

```txt
paddlepaddle-gpu==1.4.0.post87
```

### Directly Download

Download an already converted ERNIE model:

|model|description|
|:---:|:---:|
|[ERNIE 1.0 Base for Chinese(pre-train step max-seq-len-128)](https://drive.google.com/open?id=1k7G41gaQvaqOhmQt-b5KSj27YcHjdSpV)|with params, config and vocabs|
|[ERNIE 1.0 Base for Chinese(pre-train step max-seq-len-512)](https://drive.google.com/open?id=1il88pC5DabgypSYAF8pq_E2cuNrNuUAC)|with params, config and vocabs|

### Convert by yourself

1. Download the PaddlePaddle version of the ERNIE 1.0 model from [here](https://github.com/PaddlePaddle/ERNIE#models) and move it into this project's directory.

2. Check the `add_argument` settings in `convert_ernie_to_pytorch.py`, then run `python convert_ernie_to_pytorch.py`. The log should look like this:

```
===================extract weights start====================
word_embedding -> bert.embeddings.word_embeddings.weight (18000, 768)
pos_embedding -> bert.embeddings.position_embeddings.weight (513, 768)
sent_embedding -> bert.embeddings.token_type_embeddings.weight (2, 768)
pre_encoder_layer_norm_scale -> bert.embeddings.LayerNorm.gamma (768,)
pre_encoder_layer_norm_bias -> bert.embeddings.LayerNorm.beta (768,)
encoder_layer_0_multi_head_att_query_fc.w_0 -> bert.encoder.layer.0.attention.self.query.weight (768, 768)
encoder_layer_0_multi_head_att_query_fc.b_0 -> bert.encoder.layer.0.attention.self.query.bias (768,)
encoder_layer_0_multi_head_att_key_fc.w_0 -> bert.encoder.layer.0.attention.self.key.weight (768, 768)
encoder_layer_0_multi_head_att_key_fc.b_0 -> bert.encoder.layer.0.attention.self.key.bias (768,)
encoder_layer_0_multi_head_att_value_fc.w_0 -> bert.encoder.layer.0.attention.self.value.weight (768, 768)
encoder_layer_0_multi_head_att_value_fc.b_0 -> bert.encoder.layer.0.attention.self.value.bias (768,)
encoder_layer_0_multi_head_att_output_fc.w_0 -> bert.encoder.layer.0.attention.output.dense.weight (768, 768)
encoder_layer_0_multi_head_att_output_fc.b_0 -> bert.encoder.layer.0.attention.output.dense.bias (768,)
encoder_layer_0_post_att_layer_norm_bias -> bert.encoder.layer.0.attention.output.LayerNorm.bias (768,)
encoder_layer_0_post_att_layer_norm_scale -> bert.encoder.layer.0.attention.output.LayerNorm.weight (768,)
encoder_layer_0_ffn_fc_0.w_0 -> bert.encoder.layer.0.intermediate.dense.weight (3072, 768)
encoder_layer_0_ffn_fc_0.b_0 -> bert.encoder.layer.0.intermediate.dense.bias (3072,)
encoder_layer_0_ffn_fc_1.w_0 -> bert.encoder.layer.0.output.dense.weight (768, 3072)
encoder_layer_0_ffn_fc_1.b_0 -> bert.encoder.layer.0.output.dense.bias (768,)
encoder_layer_0_post_ffn_layer_norm_bias -> bert.encoder.layer.0.output.LayerNorm.bias (768,)
encoder_layer_0_post_ffn_layer_norm_scale -> bert.encoder.layer.0.output.LayerNorm.weight (768,)
.......
encoder_layer_11_multi_head_att_query_fc.w_0 -> bert.encoder.layer.11.attention.self.query.weight (768, 768)
encoder_layer_11_multi_head_att_query_fc.b_0 -> bert.encoder.layer.11.attention.self.query.bias (768,)
encoder_layer_11_multi_head_att_key_fc.w_0 -> bert.encoder.layer.11.attention.self.key.weight (768, 768)
encoder_layer_11_multi_head_att_key_fc.b_0 -> bert.encoder.layer.11.attention.self.key.bias (768,)
encoder_layer_11_multi_head_att_value_fc.w_0 -> bert.encoder.layer.11.attention.self.value.weight (768, 768)
encoder_layer_11_multi_head_att_value_fc.b_0 -> bert.encoder.layer.11.attention.self.value.bias (768,)
encoder_layer_11_multi_head_att_output_fc.w_0 -> bert.encoder.layer.11.attention.output.dense.weight (768, 768)
encoder_layer_11_multi_head_att_output_fc.b_0 -> bert.encoder.layer.11.attention.output.dense.bias (768,)
encoder_layer_11_post_att_layer_norm_bias -> bert.encoder.layer.11.attention.output.LayerNorm.bias (768,)
encoder_layer_11_post_att_layer_norm_scale -> bert.encoder.layer.11.attention.output.LayerNorm.weight (768,)
encoder_layer_11_ffn_fc_0.w_0 -> bert.encoder.layer.11.intermediate.dense.weight (3072, 768)
encoder_layer_11_ffn_fc_0.b_0 -> bert.encoder.layer.11.intermediate.dense.bias (3072,)
encoder_layer_11_ffn_fc_1.w_0 -> bert.encoder.layer.11.output.dense.weight (768, 3072)
encoder_layer_11_ffn_fc_1.b_0 -> bert.encoder.layer.11.output.dense.bias (768,)
encoder_layer_11_post_ffn_layer_norm_bias -> bert.encoder.layer.11.output.LayerNorm.bias (768,)
encoder_layer_11_post_ffn_layer_norm_scale -> bert.encoder.layer.11.output.LayerNorm.weight (768,)
pooled_fc.w_0 -> bert.pooler.dense.weight (768, 768)
pooled_fc.b_0 -> bert.pooler.dense.bias (768,)
====================extract weights done!===================
======================save model start======================
finish save model
finish save config
finish save vocab
======================save model done!======================
```
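The renaming pattern visible in the log can be sketched as a small lookup table. This is only an illustration derived from the log lines above, not the code in `convert_ernie_to_pytorch.py`; the function name `build_name_map` is hypothetical, and the sketch assumes that the layers elided by `.......` follow the same pattern as layers 0 and 11.

```python
def build_name_map(num_layers=12):
    """Return {paddle_name: huggingface_name}, following the log above.

    Hypothetical helper for illustration; assumes every encoder layer
    uses the same naming pattern as layers 0 and 11 in the log.
    """
    # Embedding and pooler parameters, copied from the log
    name_map = {
        "word_embedding": "bert.embeddings.word_embeddings.weight",
        "pos_embedding": "bert.embeddings.position_embeddings.weight",
        "sent_embedding": "bert.embeddings.token_type_embeddings.weight",
        "pre_encoder_layer_norm_scale": "bert.embeddings.LayerNorm.gamma",
        "pre_encoder_layer_norm_bias": "bert.embeddings.LayerNorm.beta",
        "pooled_fc.w_0": "bert.pooler.dense.weight",
        "pooled_fc.b_0": "bert.pooler.dense.bias",
    }
    # Fully connected layers: ".w_0" -> ".weight", ".b_0" -> ".bias"
    fc_pairs = [
        ("multi_head_att_query_fc", "attention.self.query"),
        ("multi_head_att_key_fc", "attention.self.key"),
        ("multi_head_att_value_fc", "attention.self.value"),
        ("multi_head_att_output_fc", "attention.output.dense"),
        ("ffn_fc_0", "intermediate.dense"),
        ("ffn_fc_1", "output.dense"),
    ]
    # Layer norms: "_scale" -> ".weight", "_bias" -> ".bias"
    norm_pairs = [
        ("post_att_layer_norm", "attention.output.LayerNorm"),
        ("post_ffn_layer_norm", "output.LayerNorm"),
    ]
    for i in range(num_layers):
        src = f"encoder_layer_{i}_"
        dst = f"bert.encoder.layer.{i}."
        for paddle, hf in fc_pairs:
            name_map[f"{src}{paddle}.w_0"] = f"{dst}{hf}.weight"
            name_map[f"{src}{paddle}.b_0"] = f"{dst}{hf}.bias"
        for paddle, hf in norm_pairs:
            name_map[f"{src}{paddle}_scale"] = f"{dst}{hf}.weight"
            name_map[f"{src}{paddle}_bias"] = f"{dst}{hf}.bias"
    return name_map


name_map = build_name_map()
print(name_map["encoder_layer_11_ffn_fc_1.w_0"])
```

Each encoder layer contributes 16 parameters, which matches the 16 log lines shown for layer 0 and for layer 11.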


## Test

```Python
#!/usr/bin/env python
# encoding: utf-8
import torch
from pytorch_transformers import BertTokenizer, BertModel

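# Load the tokenizer (vocabulary) from the converted model directory
tokenizer = BertTokenizer.from_pretrained('./ERNIE-converted')

# Encode a sample sentence and add a batch dimension
input_ids = torch.tensor([tokenizer.encode("这是百度的ERNIE1.0模型")])

# Load the converted weights
model = BertModel.from_pretrained('./ERNIE-converted')

# Note: with the default config (output_hidden_states and output_attentions are false),
# the model returns (sequence_output, pooled_output), so despite their names these two
# variables hold the last-layer hidden states and the pooled output.
all_hidden_states, all_attentions = model(input_ids)[-2:]

print('all_hidden_states shape', all_hidden_states.shape)
print(all_hidden_states)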
"""
I1207 13:11:57.768735 4573365696 configuration_utils.py:148] loading configuration file ./ERNIE-converted/config.json
I1207 13:11:57.769177 4573365696 configuration_utils.py:168] Model config {
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": null,
  "hidden_act": "relu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 513,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "num_labels": 2,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pruned_heads": {},
  "torchscript": false,
  "type_vocab_size": 2,
  "use_bfloat16": false,
  "vocab_size": 18000
}

I1207 13:11:57.769847 4573365696 modeling_utils.py:334] loading weights file ./ERNIE-converted/pytorch_model.bin
all_hidden_states shape torch.Size([1, 12, 768])
tensor([[[-0.2229, -0.3131,  0.0088,  ...,  0.0199, -1.0507,  0.5315],
         [-0.8425, -0.0086,  0.2039,  ..., -0.1681,  0.0459, -1.1015],
         [ 0.7147,  0.1788,  0.7055,  ...,  0.4651,  0.8798, -0.5982],
         ...,
         [-0.9507, -0.3732, -0.9508,  ...,  0.4992, -0.0545,  1.2238],
         [ 0.2940,  0.0286, -0.2381,  ...,  1.0630,  0.0387, -0.5267],
         [-0.1940,  0.1136,  0.0118,  ...,  0.9859,  0.4807, -1.5650]]],
       grad_fn=<NativeLayerNormBackward>)

"""
```
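As a quick sanity check, the embedding shapes in the conversion log can be derived from the config values printed above. The following is a minimal sketch using only those values; the helper name `expected_embedding_shapes` is hypothetical and not part of this project or of pytorch-transformers.

```python
def expected_embedding_shapes(config):
    """Derive the embedding table shapes implied by a BERT-style config dict."""
    hidden = config["hidden_size"]
    return {
        "bert.embeddings.word_embeddings.weight": (config["vocab_size"], hidden),
        "bert.embeddings.position_embeddings.weight": (config["max_position_embeddings"], hidden),
        "bert.embeddings.token_type_embeddings.weight": (config["type_vocab_size"], hidden),
    }


# Values copied from the config printed in the test output above
config = {
    "hidden_size": 768,
    "vocab_size": 18000,
    "max_position_embeddings": 513,
    "type_vocab_size": 2,
}
shapes = expected_embedding_shapes(config)
for name, shape in shapes.items():
    print(name, shape)
```

The resulting shapes, `(18000, 768)`, `(513, 768)` and `(2, 768)`, agree with the first three lines of the conversion log.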

You can use `BertForMaskedLM` from [pytorch-transformers](https://github.com/huggingface/pytorch-transformers) to test the converted model. An example is shown below, where bert-base is Google's Chinese BERT, and bert-wwm and bert-wwm-ext are downloaded from [Chinese-BERT-wwm](https://github.com/ymcui/Chinese-BERT-wwm).
```
input: [MASK] [MASK] [MASK] 是中国神魔小说的经典之作，与《三国演义》《水浒传》《红楼梦》并称为中国古典四大名著。
output:
{
        "bert-base": "《 神 》",
        "bert-wwm": "天 神 奇",
        "bert-wwm-ext": "西 游 记",
        "ernie": "西 游 记"
}
```
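A comparison like the one above could be produced along the following lines. This is a rough sketch, not the script used to generate those results: the function names are hypothetical, and it assumes that `torch` and `pytorch_transformers` are installed and that the model directory contains the weights `BertForMaskedLM` needs.

```python
def find_mask_positions(tokens):
    """Return the indices of all [MASK] tokens in a token list."""
    return [i for i, token in enumerate(tokens) if token == "[MASK]"]


def predict_masked_tokens(model_dir, text):
    """Fill every [MASK] in `text` with the highest-scoring vocabulary token.

    Hypothetical helper; heavy imports are kept local so the module can be
    imported without torch installed.
    """
    import torch
    from pytorch_transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained(model_dir)
    model = BertForMaskedLM.from_pretrained(model_dir)
    model.eval()

    # Add the BERT special tokens explicitly
    tokens = ["[CLS]"] + tokenizer.tokenize(text) + ["[SEP]"]
    input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

    with torch.no_grad():
        # First output: prediction scores, shape (1, seq_len, vocab_size)
        scores = model(input_ids)[0]

    positions = find_mask_positions(tokens)
    predicted_ids = scores[0, positions].argmax(dim=-1).tolist()
    return tokenizer.convert_ids_to_tokens(predicted_ids)


# Example call (requires a converted model in ./ERNIE-converted):
# predict_masked_tokens("./ERNIE-converted", "[MASK] [MASK] [MASK] 是中国神魔小说的经典之作")
```

Pointing `model_dir` at each model in turn gives one prediction list per model, which can then be collected into a dictionary like the one shown above.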

## Citation

If you use this work in a scientific publication, please cite the following BibTeX entry:

```latex
@misc{nghuyong2019@ERNIE-Pytorch,
  title={ERNIEPytorch},
  author={Yong Hu},
  howpublished={\url{https://github.com/nghuyong/ERNIE-Pytorch}},
  year={2019}
}
```

## Reference

1. https://arxiv.org/abs/1904.09223
2. https://github.com/PaddlePaddle/LARK/issues/37#issuecomment-474203851