How to use UCSYNLP/MyanBERTa with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="UCSYNLP/MyanBERTa")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("UCSYNLP/MyanBERTa")
model = AutoModelForMaskedLM.from_pretrained("UCSYNLP/MyanBERTa")
```
Model description
This model is a BERT-based pre-trained language model for Myanmar. MyanBERTa was pre-trained for 528K steps on a word-segmented Myanmar dataset of 5,992,299 sentences (136M words). The tokenizer is a byte-level BPE tokenizer with 30,522 subword units, learned after word segmentation was applied.
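Because the model was pre-trained on word-segmented text, fill-mask input presumably works best when it is segmented the same way. The following is a minimal sketch under that assumption, not the authors' documented procedure. The helper names `build_masked_sentence` and `top_predictions` are hypothetical, and the sample sentence was segmented by hand for illustration, so the authors' segmenter may split it differently. The mask token is read from the tokenizer rather than hard-coded.

```python
def build_masked_sentence(words, mask_index, mask_token):
    """Join pre-segmented words with spaces, replacing one word with the mask token."""
    if not 0 <= mask_index < len(words):
        raise IndexError("mask_index is outside the word list")
    masked = list(words)  # copy so the caller's list is not modified
    masked[mask_index] = mask_token
    return " ".join(masked)


def top_predictions(words, mask_index, model_id="UCSYNLP/MyanBERTa", top_k=5):
    """Return (token, score) pairs for the masked position (hypothetical helper)."""
    # Imported here so the pure helper above works without transformers installed.
    from transformers import pipeline

    pipe = pipeline("fill-mask", model=model_id)
    # Use the tokenizer's own mask token instead of assuming its spelling.
    text = build_masked_sentence(words, mask_index, pipe.tokenizer.mask_token)
    return [(r["token_str"], r["score"]) for r in pipe(text, top_k=top_k)]


if __name__ == "__main__":
    # Illustrative, hand-segmented sentence (an assumption, not from the model card).
    words = ["ကျွန်တော်", "ကျောင်း", "သို့", "သွား", "သည်"]
    for token, score in top_predictions(words, mask_index=3):
        print(f"{token}\t{score:.4f}")
```

Running the script downloads the model on first use. Keeping the sentence as a list of words makes the segmentation explicit, so a different segmenter can be swapped in before masking.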
Cite this work as:
Aye Mya Hlaing, Win Pa Pa, "MyanBERTa: A Pre-trained Language Model For Myanmar", In Proceedings of 2022 International Conference on Communication and Computer Research (ICCR2022), November 2022, Seoul, Republic of Korea.