---
language: ko
tags:
- korean
---

https://github.com/BM-K/Sentence-Embedding-is-all-you-need

# Korean-Sentence-Embedding
🍭 Korean sentence embedding repository. You can download the pre-trained models and run inference right away. It also provides environments where individuals can train models.

## Quick tour
```python
import torch
from transformers import AutoModel, AutoTokenizer

def cal_score(a, b):
    # Cosine similarity between embeddings, scaled by 100
    if len(a.shape) == 1: a = a.unsqueeze(0)
    if len(b.shape) == 1: b = b.unsqueeze(0)

    a_norm = a / a.norm(dim=1)[:, None]
    b_norm = b / b.norm(dim=1)[:, None]
    return torch.mm(a_norm, b_norm.transpose(0, 1)) * 100

model = AutoModel.from_pretrained('BM-K/KoSimCSE-roberta-multitask')
tokenizer = AutoTokenizer.from_pretrained('BM-K/KoSimCSE-roberta-multitask')

sentences = ['치타가 들판을 가로 질러 먹이를 쫓는다.',      # A cheetah chases its prey across a field.
             '치타 한 마리가 먹이 뒤에서 달리고 있다.',  # A cheetah is running behind its prey.
             '원숭이 한 마리가 드럼을 연주한다.']         # A monkey is playing the drums.

inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
# First output is the last hidden state, shape (batch, seq_len, hidden)
embeddings, _ = model(**inputs, return_dict=False)

# Compare the first-token embedding of sentence 0 with sentences 1 and 2
score01 = cal_score(embeddings[0][0], embeddings[1][0])
score02 = cal_score(embeddings[0][0], embeddings[2][0])
```
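The quick tour scores one sentence pair at a time. As a minimal sketch, the same `cal_score` helper can also score every pair in a batch at once, since it already accepts 2-D inputs. The random tensor below is only a stand-in for `embeddings[:, 0]` so the snippet runs without downloading the model; the hidden size of 768 and the name `cls_vectors` are assumptions made for illustration.

```python
import torch

def cal_score(a, b):
    # Same cosine-similarity helper as in the quick tour, scaled by 100
    if len(a.shape) == 1: a = a.unsqueeze(0)
    if len(b.shape) == 1: b = b.unsqueeze(0)
    a_norm = a / a.norm(dim=1)[:, None]
    b_norm = b / b.norm(dim=1)[:, None]
    return torch.mm(a_norm, b_norm.transpose(0, 1)) * 100

# Stand-in for embeddings[:, 0]: 3 sentences, hidden size 768 (assumed)
torch.manual_seed(0)
cls_vectors = torch.randn(3, 768)

# Pairwise score matrix: scores[i][j] compares sentence i with sentence j
scores = cal_score(cls_vectors, cls_vectors)
print(scores.shape)
```

Each diagonal entry compares a vector with itself, so it should be approximately 100, and the matrix is symmetric.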
## Performance
- Semantic Textual Similarity test set results <br>

| Model | AVG | Cosine Pearson | Cosine Spearman | Euclidean Pearson | Euclidean Spearman | Manhattan Pearson | Manhattan Spearman | Dot Pearson | Dot Spearman |
|------------------------|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|
| KoSBERT<sup>†</sup><sub>SKT</sub> | 77.40 | 78.81 | 78.47 | 77.68 | 77.78 | 77.71 | 77.83 | 75.75 | 75.22 |
| KoSBERT | 80.39 | 82.13 | 82.25 | 80.67 | 80.75 | 80.69 | 80.78 | 77.96 | 77.90 |
| KoSRoBERTa | 81.64 | 81.20 | 82.20 | 81.79 | 82.34 | 81.59 | 82.20 | 80.62 | 81.25 |
| | | | | | | | | | |
| KoSentenceBART | 77.14 | 79.71 | 78.74 | 78.42 | 78.02 | 78.40 | 78.00 | 74.24 | 72.15 |
| KoSentenceT5 | 77.83 | 80.87 | 79.74 | 80.24 | 79.36 | 80.19 | 79.27 | 72.81 | 70.17 |
| | | | | | | | | | |
| KoSimCSE-BERT<sup>†</sup><sub>SKT</sub> | 81.32 | 82.12 | 82.56 | 81.84 | 81.63 | 81.99 | 81.74 | 79.55 | 79.19 |
| KoSimCSE-BERT | 83.37 | 83.22 | 83.58 | 83.24 | 83.60 | 83.15 | 83.54 | 83.13 | 83.49 |
| KoSimCSE-RoBERTa | 83.65 | 83.60 | 83.77 | 83.54 | 83.76 | 83.55 | 83.77 | 83.55 | 83.64 |
| | | | | | | | | | |
| KoSimCSE-BERT-multitask | 85.71 | 85.29 | 86.02 | 85.63 | 86.01 | 85.57 | 85.97 | 85.26 | 85.93 |
| KoSimCSE-RoBERTa-multitask | 85.77 | 85.08 | 86.12 | 85.84 | 86.12 | 85.83 | 86.12 | 85.03 | 85.99 |
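
The source does not say how the AVG column is derived. Judging from the numbers, it appears to be the plain mean of the eight correlation columns; the sketch below checks that assumption for three rows copied from the table. The helper name `mean_of_metrics` is hypothetical.

```python
# Rows copied from the table: reported AVG, then the eight correlation columns
rows = {
    "KoSBERT": (80.39, [82.13, 82.25, 80.67, 80.75, 80.69, 80.78, 77.96, 77.90]),
    "KoSimCSE-BERT-multitask": (85.71, [85.29, 86.02, 85.63, 86.01, 85.57, 85.97, 85.26, 85.93]),
    "KoSimCSE-RoBERTa-multitask": (85.77, [85.08, 86.12, 85.84, 86.12, 85.83, 86.12, 85.03, 85.99]),
}

def mean_of_metrics(values):
    # Assumed definition of AVG: arithmetic mean, rounded to two decimals
    return round(sum(values) / len(values), 2)

for name, (reported, metrics) in rows.items():
    recomputed = mean_of_metrics(metrics)
    print(f"{name}: reported {reported:.2f}, recomputed {recomputed:.2f}")
```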