lightonai's Collections
Embeddings datasets ⚡️
Updated Apr 7
This collection gathers datasets for pre-training and fine-tuning embedding models.
Upvotes: 5
lightonai/embeddings-pre-training • Updated Apr 16 • 1.38B rows • 7.1k downloads • 44 likes
lightonai/nanobeir-multilingual • Updated Sep 16, 2025 • 522k rows • 632 downloads • 11 likes
lightonai/embeddings_supervised • Updated Oct 23, 2025 • 3.43M rows • 329 downloads • 10 likes