This is a sentence-transformers model finetuned from intfloat/multilingual-e5-large-instruct on the measuring-embeddings-v3 dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
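Per the `Pooling` configuration, a sentence embedding is the attention-masked mean of the token embeddings, L2-normalized by the `Normalize` module. As a rough sketch of what this pipeline computes, assuming only `torch` and `transformers` (the `SentenceTransformer` class does all of this for you):

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Manual re-implementation of the module stack above:
# XLM-RoBERTa encoder -> masked mean pooling -> L2 normalization.
repo = "Lauther/measuring-embeddings-v3-multilingual-e5-large-instruct-20e"
tokenizer = AutoTokenizer.from_pretrained(repo)
encoder = AutoModel.from_pretrained(repo)

encoded = tokenizer(["What is equipment calibration?"], padding=True,
                    truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**encoded).last_hidden_state  # (batch, seq_len, 1024)

# Mean pooling: average token embeddings, ignoring padding positions.
mask = encoded["attention_mask"].unsqueeze(-1).float()
embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# Normalize(): unit-length vectors, so dot product equals cosine similarity.
embedding = F.normalize(embedding, p=2, dim=1)
print(embedding.shape)  # torch.Size([1, 1024])
```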
First install the Sentence Transformers library:
```
pip install -U sentence-transformers
```
Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Lauther/measuring-embeddings-v3-multilingual-e5-large-instruct-20e")

# Run inference
sentences = [
    'What is the table structure for secondary equipment?',
    'How are flow computers and measurement systems related?\nFlow computers can have multiple systems assigned to them. However, a measurement system can only be assigned to one flow computer.\n\nDatabase terminology:\nIn the database, this relationship is referred to as:\n- Meter streams\n- Meter runs\n- Sections\n\nStorage of the relationship:\nThe relationship between a flow computer and its assigned measurement system is stored in a special table.\n\nUser context:\nWhen a user refers to a "meter stream," they are indicating that they are searching for a measurement system assigned to a specific flow computer.',
    'What kind of data store an equipment?\nEquipments can capture meteorological data, such as pressure, temperature, and volume (magnitudes). This data is essential for users to perform various calculations.\n\nData storage:\n- The measured values are stored in a special table in the database for magnitudes. This table contains the values of the variables captured by the equipments.\n- These values are **direct measurements** from the fluid (e.g., raw pressure, temperature, or volume readings). **They are not calculated values**, such as uncertainty.\n- The values stored in the variable values table are **different** from variable uncertainty values, which are calculated separately and represent the margin of error.\n\nAccessing the data:\n- Users typically access the data by referring to the readings from the measurement system, not directly from the individual equipments.\n- The readings are stored in a "variable values" table within the database.\n\nLinking variable names:\nIf the user needs to know the name of a variable, they must link the data to another table that stores information about the types of variables.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
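Because `Normalize()` makes every embedding unit length, cosine similarity is just a dot product, so semantic search reduces to ranking corpus embeddings by `model.similarity`. Continuing the snippet above (the query string is illustrative):

```python
# Rank the three corpus sentences above against a new query.
query_embedding = model.encode("Which table links flow computers to measurement systems?")

scores = model.similarity(query_embedding, embeddings)  # shape [1, 3]
best = int(scores.argmax())
print(sentences[best][:60], float(scores[0, best]))
```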
Training dataset: measuring-embeddings-v3, with columns sentence1, sentence2, and score.

|  | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |

Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| How can I combine the sub-query with the main query to fetch the last uncertainty report? | What do measurement equipment measure? | 0.1 |
| What is the column name for the calibration date in the calibration table? | How are flow computers and measurement systems related? | 0.1 |
| What is the name of the table that contains the flow computer tags? | What is equipment calibration? | 0.05 |
Loss: CoSENTLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "pairwise_cos_sim"
}
```
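Because each row is a (sentence1, sentence2, score) pair with a float label, it can be fed directly to CoSENTLoss. A minimal fine-tuning sketch, assuming the sentence-transformers v3 trainer API and using the three sample rows above as a toy dataset (the real run used the full measuring-embeddings-v3 training split):

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CoSENTLoss

base = SentenceTransformer("intfloat/multilingual-e5-large-instruct")

# Toy dataset built from the three sample rows shown above.
train_dataset = Dataset.from_dict({
    "sentence1": [
        "How can I combine the sub-query with the main query to fetch the last uncertainty report?",
        "What is the column name for the calibration date in the calibration table?",
        "What is the name of the table that contains the flow computer tags?",
    ],
    "sentence2": [
        "What do measurement equipment measure?",
        "How are flow computers and measurement systems related?",
        "What is equipment calibration?",
    ],
    "score": [0.1, 0.1, 0.05],
})

# scale=20.0 with pairwise cosine similarity matches the parameters listed above.
loss = CoSENTLoss(base, scale=20.0)

trainer = SentenceTransformerTrainer(model=base, train_dataset=train_dataset, loss=loss)
trainer.train()
```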
Evaluation dataset: measuring-embeddings-v3, with columns sentence1, sentence2, and score.

|  | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |

Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| Identify any additional tables or columns that might be needed for the query. | How are flow computers and measurement systems related? | 0.2 |
| What columns in these tables contain the measurement system tag and the flow computer tag? | How does a flow computer generate and store reports? | 0.1 |
| Identify the column that stores the calibration number. | What kind of data store an equipment? | 0.1 |
Loss: CoSENTLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "pairwise_cos_sim"
}
```
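The evaluation split shares the same schema, so tracking how well the model's cosine similarities match the labeled scores can be done with an EmbeddingSimilarityEvaluator. A minimal sketch using the three sample rows above (the evaluator name is a placeholder):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("Lauther/measuring-embeddings-v3-multilingual-e5-large-instruct-20e")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=[
        "Identify any additional tables or columns that might be needed for the query.",
        "What columns in these tables contain the measurement system tag and the flow computer tag?",
        "Identify the column that stores the calibration number.",
    ],
    sentences2=[
        "How are flow computers and measurement systems related?",
        "How does a flow computer generate and store reports?",
        "What kind of data store an equipment?",
    ],
    scores=[0.2, 0.1, 0.1],
    name="measuring-embeddings-v3-dev",  # placeholder name
)
# Reports correlation between predicted and labeled similarity.
print(evaluator(model))
```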
Training hyperparameters (non-default):

- eval_strategy: steps
- per_device_train_batch_size: 7
- per_device_eval_batch_size: 7
- gradient_accumulation_steps: 4
- learning_rate: 3e-05
- num_train_epochs: 20
- warmup_ratio: 0.1

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 7
- per_device_eval_batch_size: 7
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 3e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 20
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 9.5153 | 2560 | 6.782 | - |
| 9.5524 | 2570 | 7.3027 | - |
| 9.5894 | 2580 | 7.3348 | - |
| 9.6265 | 2590 | 7.7864 | - |
| 9.6636 | 2600 | 6.3552 | - |
| 9.7006 | 2610 | 7.151 | - |
| 9.7377 | 2620 | 6.1664 | - |
| 9.7748 | 2630 | 6.0398 | - |
| 9.8119 | 2640 | 7.0452 | - |
| 9.8489 | 2650 | 7.2457 | - |
| 9.8860 | 2660 | 6.7531 | - |
| 9.9231 | 2670 | 6.7149 | - |
| 9.9601 | 2680 | 6.4635 | - |
| 9.9972 | 2690 | 6.2237 | - |
| 10.0371 | 2700 | 6.1798 | 2.9939 |
| 10.0741 | 2710 | 7.2224 | - |
| 10.1112 | 2720 | 6.5327 | - |
| 10.1483 | 2730 | 7.4686 | - |
| 10.1854 | 2740 | 6.1404 | - |
| 10.2224 | 2750 | 7.0005 | - |
| 10.2595 | 2760 | 5.7726 | - |
| 10.2966 | 2770 | 6.5327 | - |
| 10.3336 | 2780 | 7.5015 | - |
| 10.3707 | 2790 | 6.5526 | - |
| 10.4078 | 2800 | 6.2078 | - |
| 10.4449 | 2810 | 6.1 | - |
| 10.4819 | 2820 | 7.1027 | - |
| 10.5190 | 2830 | 8.639 | - |
| 10.5561 | 2840 | 6.9937 | - |
| 10.5931 | 2850 | 7.2734 | 2.8532 |
| 10.6302 | 2860 | 7.6321 | - |
| 10.6673 | 2870 | 7.5788 | - |
| 10.7044 | 2880 | 6.7864 | - |
| 10.7414 | 2890 | 7.4237 | - |
| 10.7785 | 2900 | 6.9813 | - |
| 10.8156 | 2910 | 6.6884 | - |
| 10.8526 | 2920 | 6.7464 | - |
| 10.8897 | 2930 | 7.7989 | - |
| 10.9268 | 2940 | 7.3568 | - |
| 10.9639 | 2950 | 8.6706 | - |
| 11.0 | 2960 | 6.5687 | - |
| 11.0371 | 2970 | 5.8992 | - |
| 11.0741 | 2980 | 6.4543 | - |
| 11.1112 | 2990 | 6.1386 | - |
| 11.1483 | 3000 | 6.9047 | 2.9147 |
| 11.1854 | 3010 | 7.405 | - |
| 11.2224 | 3020 | 7.5441 | - |
| 11.2595 | 3030 | 6.7524 | - |
| 11.2966 | 3040 | 7.698 | - |
| 11.3336 | 3050 | 7.6167 | - |
| 11.3707 | 3060 | 7.1516 | - |
| 11.4078 | 3070 | 6.7458 | - |
| 11.4449 | 3080 | 6.7608 | - |
| 11.4819 | 3090 | 7.1508 | - |
| 11.5190 | 3100 | 6.9155 | - |
| 11.5561 | 3110 | 6.6664 | - |
| 11.5931 | 3120 | 8.3841 | - |
| 11.6302 | 3130 | 7.1934 | - |
| 11.6673 | 3140 | 6.9681 | - |
| 11.7044 | 3150 | 7.2187 | 2.7509 |
| 11.7414 | 3160 | 7.3155 | - |
| 11.7785 | 3170 | 7.3103 | - |
| 11.8156 | 3180 | 7.1959 | - |
| 11.8526 | 3190 | 6.8164 | - |
| 11.8897 | 3200 | 7.5836 | - |
| 11.9268 | 3210 | 5.2671 | - |
| 11.9639 | 3220 | 6.4929 | - |
| 12.0 | 3230 | 7.0892 | - |
| 12.0371 | 3240 | 7.0877 | - |
| 12.0741 | 3250 | 5.8302 | - |
| 12.1112 | 3260 | 5.6145 | - |
| 12.1483 | 3270 | 6.5808 | - |
| 12.1854 | 3280 | 6.6826 | - |
| 12.2224 | 3290 | 5.9819 | - |
| 12.2595 | 3300 | 6.68 | 3.0175 |
| 12.2966 | 3310 | 6.1685 | - |
| 12.3336 | 3320 | 6.4473 | - |
| 12.3707 | 3330 | 6.3965 | - |
| 12.4078 | 3340 | 6.6278 | - |
| 12.4449 | 3350 | 5.4575 | - |
| 12.4819 | 3360 | 7.3019 | - |
| 12.5190 | 3370 | 7.4843 | - |
| 12.5561 | 3380 | 6.709 | - |
| 12.5931 | 3390 | 6.7168 | - |
| 12.6302 | 3400 | 7.0223 | - |
| 12.6673 | 3410 | 6.5089 | - |
| 12.7044 | 3420 | 6.5094 | - |
| 12.7414 | 3430 | 7.2317 | - |
| 12.7785 | 3440 | 6.6885 | - |
| 12.8156 | 3450 | 6.9693 | 2.8462 |
| 12.8526 | 3460 | 6.8242 | - |
| 12.8897 | 3470 | 6.6899 | - |
| 12.9268 | 3480 | 6.9113 | - |
| 12.9639 | 3490 | 7.1903 | - |
| 13.0 | 3500 | 7.3286 | - |
| 13.0371 | 3510 | 6.5465 | - |
| 13.0741 | 3520 | 5.6804 | - |
| 13.1112 | 3530 | 5.6412 | - |
| 13.1483 | 3540 | 6.6161 | - |
| 13.1854 | 3550 | 5.761 | - |
| 13.2224 | 3560 | 5.5669 | - |
| 13.2595 | 3570 | 5.6184 | - |
| 13.2966 | 3580 | 6.2996 | - |
| 13.3336 | 3590 | 4.99 | - |
| 13.3707 | 3600 | 5.9974 | 3.2358 |
| 13.4078 | 3610 | 5.6962 | - |
| 13.4449 | 3620 | 6.3662 | - |
| 13.4819 | 3630 | 7.0398 | - |
| 13.5190 | 3640 | 7.7358 | - |
| 13.5561 | 3650 | 7.9063 | - |
| 13.5931 | 3660 | 5.7823 | - |
| 13.6302 | 3670 | 6.9861 | - |
| 13.6673 | 3680 | 7.2855 | - |
| 13.7044 | 3690 | 5.6785 | - |
| 13.7414 | 3700 | 6.4071 | - |
| 13.7785 | 3710 | 6.4294 | - |
| 13.8156 | 3720 | 6.0842 | - |
| 13.8526 | 3730 | 5.9422 | - |
| 13.8897 | 3740 | 7.0778 | - |
| 13.9268 | 3750 | 8.1597 | 3.0093 |
| 13.9639 | 3760 | 6.3154 | - |
| 14.0 | 3770 | 6.2416 | - |
| 14.0371 | 3780 | 5.9958 | - |
| 14.0741 | 3790 | 5.7032 | - |
| 14.1112 | 3800 | 4.9524 | - |
| 14.1483 | 3810 | 5.386 | - |
| 14.1854 | 3820 | 5.6353 | - |
| 14.2224 | 3830 | 5.0873 | - |
| 14.2595 | 3840 | 4.9255 | - |
| 14.2966 | 3850 | 5.1423 | - |
| 14.3336 | 3860 | 6.0775 | - |
| 14.3707 | 3870 | 4.5073 | - |
| 14.4078 | 3880 | 6.8347 | - |
| 14.4449 | 3890 | 6.5397 | - |
| 14.4819 | 3900 | 7.2143 | 3.3080 |
| 14.5190 | 3910 | 6.1123 | - |
| 14.5561 | 3920 | 6.6048 | - |
| 14.5931 | 3930 | 6.3464 | - |
| 14.6302 | 3940 | 6.3618 | - |
| 14.6673 | 3950 | 6.5718 | - |
| 14.7044 | 3960 | 5.9785 | - |
| 14.7414 | 3970 | 6.5758 | - |
| 14.7785 | 3980 | 6.4308 | - |
| 14.8156 | 3990 | 6.0208 | - |
| 14.8526 | 4000 | 6.0303 | - |
| 14.8897 | 4010 | 6.6396 | - |
| 14.9268 | 4020 | 6.0184 | - |
| 14.9639 | 4030 | 6.6248 | - |
| 15.0 | 4040 | 6.4538 | - |
| 15.0371 | 4050 | 6.4742 | 3.1761 |
| 15.0741 | 4060 | 5.5295 | - |
| 15.1112 | 4070 | 6.8753 | - |
| 15.1483 | 4080 | 5.639 | - |
| 15.1854 | 4090 | 5.6232 | - |
| 15.2224 | 4100 | 6.3026 | - |
| 15.2595 | 4110 | 6.1182 | - |
| 15.2966 | 4120 | 5.4736 | - |
| 15.3336 | 4130 | 6.2961 | - |
| 15.3707 | 4140 | 5.4742 | - |
| 15.4078 | 4150 | 5.4707 | - |
| 15.4449 | 4160 | 4.7272 | - |
| 15.4819 | 4170 | 6.1026 | - |
| 15.5190 | 4180 | 5.0468 | - |
| 15.5561 | 4190 | 5.5796 | - |
| 15.5931 | 4200 | 6.9046 | 3.1433 |
| 15.6302 | 4210 | 5.6123 | - |
| 15.6673 | 4220 | 6.7246 | - |
| 15.7044 | 4230 | 5.7076 | - |
| 15.7414 | 4240 | 6.6772 | - |
| 15.7785 | 4250 | 5.6038 | - |
| 15.8156 | 4260 | 4.9544 | - |
| 15.8526 | 4270 | 5.0661 | - |
| 15.8897 | 4280 | 5.291 | - |
| 15.9268 | 4290 | 6.6652 | - |
| 15.9639 | 4300 | 5.6797 | - |
| 16.0 | 4310 | 5.1129 | - |
| 16.0371 | 4320 | 5.4445 | - |
| 16.0741 | 4330 | 4.8946 | - |
| 16.1112 | 4340 | 6.3929 | - |
| 16.1483 | 4350 | 6.0633 | 3.1426 |
| 16.1854 | 4360 | 5.522 | - |
| 16.2224 | 4370 | 4.7067 | - |
| 16.2595 | 4380 | 5.4688 | - |
| 16.2966 | 4390 | 5.6009 | - |
| 16.3336 | 4400 | 5.1376 | - |
| 16.3707 | 4410 | 4.5196 | - |
| 16.4078 | 4420 | 5.5109 | - |
| 16.4449 | 4430 | 5.1888 | - |
| 16.4819 | 4440 | 6.0305 | - |
| 16.5190 | 4450 | 5.2791 | - |
| 16.5561 | 4460 | 5.4005 | - |
| 16.5931 | 4470 | 5.255 | - |
| 16.6302 | 4480 | 6.2026 | - |
| 16.6673 | 4490 | 6.6388 | - |
| 16.7044 | 4500 | 5.6138 | 3.2812 |
| 16.7414 | 4510 | 4.7913 | - |
| 16.7785 | 4520 | 5.6675 | - |
| 16.8156 | 4530 | 5.8975 | - |
| 16.8526 | 4540 | 5.4597 | - |
| 16.8897 | 4550 | 5.137 | - |
| 16.9268 | 4560 | 4.5395 | - |
| 16.9639 | 4570 | 4.6304 | - |
| 17.0 | 4580 | 5.8098 | - |
| 17.0371 | 4590 | 4.0267 | - |
| 17.0741 | 4600 | 4.9194 | - |
| 17.1112 | 4610 | 4.1852 | - |
| 17.1483 | 4620 | 5.129 | - |
| 17.1854 | 4630 | 4.469 | - |
| 17.2224 | 4640 | 5.4298 | - |
| 17.2595 | 4650 | 4.5234 | 3.3447 |
| 17.2966 | 4660 | 4.6856 | - |
| 17.3336 | 4670 | 6.3431 | - |
| 17.3707 | 4680 | 5.347 | - |
| 17.4078 | 4690 | 4.9223 | - |
| 17.4449 | 4700 | 5.4404 | - |
| 17.4819 | 4710 | 4.916 | - |
| 17.5190 | 4720 | 6.1744 | - |
| 17.5561 | 4730 | 4.8039 | - |
| 17.5931 | 4740 | 5.2276 | - |
| 17.6302 | 4750 | 4.4189 | - |
| 17.6673 | 4760 | 4.1434 | - |
| 17.7044 | 4770 | 4.9443 | - |
| 17.7414 | 4780 | 5.6975 | - |
| 17.7785 | 4790 | 4.6667 | - |
| 17.8156 | 4800 | 4.9876 | 3.2924 |
| 17.8526 | 4810 | 4.4342 | - |
| 17.8897 | 4820 | 5.2595 | - |
| 17.9268 | 4830 | 5.6566 | - |
| 17.9639 | 4840 | 5.5452 | - |
| 18.0 | 4850 | 4.4986 | - |
| 18.0371 | 4860 | 4.8155 | - |
| 18.0741 | 4870 | 4.2278 | - |
| 18.1112 | 4880 | 5.4733 | - |
| 18.1483 | 4890 | 4.2394 | - |
| 18.1854 | 4900 | 5.1253 | - |
| 18.2224 | 4910 | 4.7498 | - |
| 18.2595 | 4920 | 4.9775 | - |
| 18.2966 | 4930 | 4.797 | - |
| 18.3336 | 4940 | 4.5694 | - |
| 18.3707 | 4950 | 4.6192 | 3.6615 |
| 18.4078 | 4960 | 5.8114 | - |
| 18.4449 | 4970 | 4.8035 | - |
| 18.4819 | 4980 | 4.6944 | - |
| 18.5190 | 4990 | 4.8664 | - |
| 18.5561 | 5000 | 4.6916 | - |
| 18.5931 | 5010 | 4.3352 | - |
| 18.6302 | 5020 | 5.9779 | - |
| 18.6673 | 5030 | 4.7813 | - |
| 18.7044 | 5040 | 4.632 | - |
| 18.7414 | 5050 | 4.7411 | - |
| 18.7785 | 5060 | 3.6489 | - |
| 18.8156 | 5070 | 4.5373 | - |
| 18.8526 | 5080 | 5.6129 | - |
| 18.8897 | 5090 | 4.8933 | - |
| 18.9268 | 5100 | 4.27 | 3.6957 |
| 18.9639 | 5110 | 4.5338 | - |
| 19.0 | 5120 | 5.5175 | - |
| 19.0371 | 5130 | 5.0835 | - |
| 19.0741 | 5140 | 4.6826 | - |
| 19.1112 | 5150 | 4.5391 | - |
| 19.1483 | 5160 | 5.3723 | - |
| 19.1854 | 5170 | 4.8095 | - |
| 19.2224 | 5180 | 4.7402 | - |
| 19.2595 | 5190 | 4.0488 | - |
| 19.2966 | 5200 | 3.6424 | - |
| 19.3336 | 5210 | 4.2256 | - |
| 19.3707 | 5220 | 4.4607 | - |
| 19.4078 | 5230 | 3.5702 | - |
| 19.4449 | 5240 | 4.3062 | - |
| 19.4819 | 5250 | 4.2919 | 3.6594 |
| 19.5190 | 5260 | 4.6985 | - |
| 19.5561 | 5270 | 4.6907 | - |
| 19.5931 | 5280 | 4.3865 | - |
| 19.6302 | 5290 | 3.9818 | - |
| 19.6673 | 5300 | 4.3166 | - |
| 19.7044 | 5310 | 4.9131 | - |
| 19.7414 | 5320 | 4.7641 | - |
| 19.7785 | 5330 | 5.419 | - |
| 19.8156 | 5340 | 4.068 | - |
| 19.8526 | 5350 | 4.1094 | - |
| 19.8897 | 5360 | 5.2279 | - |
| 19.9268 | 5370 | 4.4818 | - |
| 19.9639 | 5380 | 4.3103 | - |
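For reproducibility, the non-default hyperparameters above map onto SentenceTransformerTrainingArguments roughly as follows (output_dir is a placeholder):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="measuring-embeddings-v3-output",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=7,
    per_device_eval_batch_size=7,
    gradient_accumulation_steps=4,  # effective batch size of 7 * 4 = 28 per device
    learning_rate=3e-5,
    num_train_epochs=20,
    warmup_ratio=0.1,
)
```

These arguments would then be passed to the SentenceTransformerTrainer shown earlier via its `args` parameter.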
Citation (BibTeX):

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

```bibtex
@online{kexuefm-8847,
    title = {CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author = {Su Jianlin},
    year = {2022},
    month = {Jan},
    url = {https://kexue.fm/archives/8847},
}
```