Created by: yurakuratov
This PR fixes an issue with DeepPavlov's pre-trained BERT checkpoints. The previous checkpoints were missing some weights that are only used during pre-training.
Checkpoints linked from the DeepPavlov docs http://docs.deeppavlov.ai/en/master/features/pretrained_vectors.html#bert did not include the bias parameter of the NSP head.
Checkpoints on HuggingFace https://huggingface.co/DeepPavlov did not include the MLM head and NSP head parameters at all.
Updated checkpoints:
- RuBERT (DeepPavlov/rubert-base-cased)
- Slavic BERT (DeepPavlov/bert-base-bg-cs-pl-ru-cased)
- Conversational BERT (DeepPavlov/bert-base-cased-conversational)
- Conversational RuBERT (DeepPavlov/rubert-base-cased-conversational)
Also, tokenizer configuration (tokenizer_config.json) was added to every DeepPavlov BERT model.
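As a quick illustration of which weights were affected: in `transformers`, the MLM and NSP heads of a BERT pre-training model live under the `cls.*` parameter prefix. The snippet below is a minimal sketch (using a tiny random config, so nothing is downloaded) that lists those head parameters, including the `cls.seq_relationship.bias` that the NSP head was missing.

```python
from transformers import BertConfig, BertForPreTraining

# Tiny random config so the example runs without fetching a checkpoint.
config = BertConfig(
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    vocab_size=100,
)
model = BertForPreTraining(config)

# MLM head parameters are under cls.predictions.*,
# NSP head parameters under cls.seq_relationship.*
head_params = [name for name in model.state_dict() if name.startswith("cls.")]
print(head_params)
```

To check a real checkpoint, one can instead pass `output_loading_info=True` to `BertForPreTraining.from_pretrained(...)` and inspect the reported missing keys.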
TODO:
- update urls in DeepPavlov docs
- upload fixed models to HF
Related issues and discussions: