🎯 MangaLens - Manga Speech Bubble Segmentation
A high-performance YOLO11n instance segmentation model fine-tuned for detecting and segmenting speech bubbles in manga/comic images.
🖼️ Demo Results
📊 Model Performance
Final Evaluation Results (Epoch 44)
| Metric | Box Detection | Mask Segmentation |
|---|---|---|
| Precision | 97.55% | 97.66% |
| Recall | 97.03% | 97.15% |
| mAP@50 | 99.10% | 99.13% |
| mAP@50-95 | 96.67% | 94.69% |
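The figures above could in principle be re-checked with the Ultralytics validation API. The following is a minimal sketch under assumptions: `data.yaml` is a hypothetical dataset config (the model card does not publish one), and the `f1_score` helper is added purely for illustration, to fold the reported precision and recall into a single number.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(weights: str = "best.pt", data_yaml: str = "data.yaml") -> dict:
    """Run validation and collect box and mask metrics.

    `data_yaml` is a hypothetical path; point it at your own dataset config.
    """
    # Imported lazily so the helper above has no third-party dependency.
    from ultralytics import YOLO

    metrics = YOLO(weights).val(data=data_yaml, imgsz=1600)
    summary = {}
    for name, m in (("box", metrics.box), ("mask", metrics.seg)):
        summary[name] = {
            "precision": m.mp,    # mean precision
            "recall": m.mr,       # mean recall
            "mAP50": m.map50,
            "mAP50-95": m.map,
            "f1": f1_score(m.mp, m.mr),
        }
    return summary


# F1 implied by the reported box metrics (97.55% precision, 97.03% recall).
box_f1 = f1_score(0.9755, 0.9703)
print(f"Box F1: {box_f1:.4f}")
```

Results obtained this way depend on the validation split used, so they may not match the table exactly.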
Training Curves
| Loss Type | Final Value |
|---|---|
| Box Loss | 0.2499 |
| Segmentation Loss | 0.2762 |
| Classification Loss | 0.2109 |
| DFL Loss | 0.8064 |
🎓 Training Configuration
| Parameter | Value |
|---|---|
| Base Model | yolo11n-seg.pt |
| Image Size | 1600×1600 |
| Batch Size | 8 |
| Epochs | 100 (early stopping at epoch 44) |
| Optimizer | Auto (AdamW) |
| Learning Rate | 0.01 |
| Weight Decay | 0.0005 |
| Patience | 10 |
| AMP | Enabled |
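As a rough guide, the settings in this table map onto Ultralytics training arguments as sketched below. This is not the authors' training script: `data.yaml` is a hypothetical dataset config, and the `train` wrapper is an illustrative name.

```python
# Training settings from the table above, expressed as Ultralytics arguments.
TRAIN_ARGS = {
    "imgsz": 1600,
    "batch": 8,
    "epochs": 100,
    "optimizer": "auto",
    "lr0": 0.01,
    "weight_decay": 0.0005,
    "patience": 10,   # stop after 10 epochs without improvement
    "amp": True,      # automatic mixed precision
}


def train(data_yaml: str = "data.yaml", **overrides):
    """Fine-tune yolo11n-seg.pt; `data_yaml` is a hypothetical dataset config."""
    # Imported lazily so TRAIN_ARGS can be inspected without the library.
    from ultralytics import YOLO

    model = YOLO("yolo11n-seg.pt")
    # Note: with optimizer="auto", Ultralytics may choose the optimizer and
    # learning rate itself, in which case lr0 is not used as given.
    return model.train(data=data_yaml, **{**TRAIN_ARGS, **overrides})
```

Calling `train(batch=4)` would, for example, override the batch size for a smaller GPU while keeping the other settings.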
Data Augmentation
- HSV Augmentation: H=0.015, S=0.7, V=0.4
- Mosaic: 1.0
- Flip Left-Right: 0.5
- Scale: 0.5
- Translate: 0.1
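The augmentation settings listed above correspond to Ultralytics training arguments of the same names. The sketch below shows one way they might be passed; the `data.yaml` path and the `train_with_augmentation` wrapper are hypothetical, not taken from the original training setup.

```python
# Augmentation settings from the list above, as Ultralytics training arguments.
AUGMENT_ARGS = {
    "hsv_h": 0.015,    # hue jitter (fraction)
    "hsv_s": 0.7,      # saturation jitter (fraction)
    "hsv_v": 0.4,      # value/brightness jitter (fraction)
    "mosaic": 1.0,     # probability of mosaic augmentation
    "fliplr": 0.5,     # probability of a left-right flip
    "scale": 0.5,      # scale gain (+/-)
    "translate": 0.1,  # translation as a fraction of image size (+/-)
}


def train_with_augmentation(data_yaml: str = "data.yaml", **extra):
    """Train with the augmentations above; `data_yaml` is a hypothetical path."""
    # Imported lazily so AUGMENT_ARGS can be inspected without the library.
    from ultralytics import YOLO

    args = {**AUGMENT_ARGS, **extra}  # caller-supplied values take precedence
    return YOLO("yolo11n-seg.pt").train(data=data_yaml, **args)
```

These values appear to coincide with the library's default augmentation settings, so passing them explicitly mainly serves as documentation.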
📚 Training Data
This model was trained on a combined dataset of:
- MS92/MangaSegmentation - Manga panel and bubble segmentation dataset
- Manga109 - Large-scale manga dataset with speech bubble annotations
🚀 Quick Start
Installation
pip install "ultralytics>=8.3.0"
Inference
from ultralytics import YOLO

# Load the model
model = YOLO("best.pt")

# Run inference on an image
results = model("manga_page.jpg")

# Process results
for result in results:
    # Get bounding boxes
    boxes = result.boxes
    # Get segmentation masks (None if no bubbles were detected)
    masks = result.masks
    # Visualize results
    result.show()
    # Save results
    result.save(filename="output.jpg")
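To work with the detections programmatically rather than just viewing them, the boxes and masks can be converted to plain Python values. The sketch below is one possible helper; `summarize` and its `min_conf` threshold are illustrative names, not part of the model or the library.

```python
def summarize(result, min_conf: float = 0.5) -> list:
    """Return one dict per detected bubble: box, confidence, outline size.

    `result` is expected to look like an Ultralytics Results object.
    """
    # masks is None when nothing was detected
    if result.boxes is None or result.masks is None:
        return []
    boxes = result.boxes.xyxy.tolist()   # [[x1, y1, x2, y2], ...] in pixels
    scores = result.boxes.conf.tolist()  # one confidence per detection
    polygons = result.masks.xy           # one pixel-space outline per detection
    bubbles = []
    for box, score, poly in zip(boxes, scores, polygons):
        if score >= min_conf:
            bubbles.append(
                {"box": box, "conf": score, "outline_points": len(poly)}
            )
    return bubbles
```

With a loaded model this might be used as `summarize(model("manga_page.jpg")[0])`.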
Batch Processing
from pathlib import Path

from ultralytics import YOLO

model = YOLO("best.pt")

# Process multiple images (sorted so the output numbering is reproducible)
image_folder = Path("manga_pages/")
image_paths = sorted(image_folder.glob("*.jpg"))
results = model(image_paths, stream=True)
for i, result in enumerate(results):
    result.save(filename=f"output_{i}.jpg")
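Numbered output names make it hard to tell which page produced which file. One option, sketched below, is to derive each output name from the source path that Ultralytics stores on the result. The `outputs` directory, the `_bubbles` suffix, and both function names are assumptions made for illustration.

```python
from pathlib import Path


def output_path_for(source: str, out_dir: str = "outputs") -> Path:
    """Map e.g. manga_pages/ch1_p03.jpg to outputs/ch1_p03_bubbles.jpg."""
    return Path(out_dir) / f"{Path(source).stem}_bubbles.jpg"


def segment_folder(folder: str = "manga_pages", out_dir: str = "outputs",
                   weights: str = "best.pt") -> int:
    """Segment every .jpg in `folder`; returns the number of pages processed."""
    images = sorted(Path(folder).glob("*.jpg"))
    if not images:
        return 0
    # Imported lazily so output_path_for can be used without the library.
    from ultralytics import YOLO

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    model = YOLO(weights)
    count = 0
    for result in model(images, stream=True):
        # result.path holds the path of the image this result came from
        result.save(filename=str(output_path_for(result.path, out_dir)))
        count += 1
    return count
```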
Extract Bubble Regions
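import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("best.pt")
image = cv2.imread("manga_page.jpg")
if image is None:
    raise FileNotFoundError("Could not read manga_page.jpg")

# retina_masks=True returns masks at the original image resolution
results = model(image, retina_masks=True)[0]

# results.masks is None when no bubbles are detected
masks = results.masks.data if results.masks is not None else []

# Extract each bubble as a separate image
for i, mask in enumerate(masks):
    mask_np = mask.cpu().numpy().astype(np.float32)
    # Guard: resize in case the mask shape differs from the image shape
    mask_resized = cv2.resize(mask_np, (image.shape[1], image.shape[0]))
    # Apply mask: black out everything outside the bubble
    bubble = image.copy()
    bubble[mask_resized < 0.5] = 0
    # Get bounding box and crop (+1 because slice ends are exclusive)
    coords = np.where(mask_resized >= 0.5)
    if len(coords[0]) > 0:
        y_min, y_max = coords[0].min(), coords[0].max()
        x_min, x_max = coords[1].min(), coords[1].max()
        cropped = bubble[y_min:y_max + 1, x_min:x_max + 1]
        cv2.imwrite(f"bubble_{i}.png", cropped)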
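A variation on the cropping step above: instead of blacking out the background, the mask can be stored as an alpha channel so that each bubble is saved with a transparent background. This is a minimal NumPy sketch; `cutout_bgra` is an illustrative helper, and it assumes the mask already has the same height and width as the image.

```python
import numpy as np


def cutout_bgra(image_bgr: np.ndarray, mask: np.ndarray, threshold: float = 0.5):
    """Crop the masked region and return it as a BGRA uint8 array.

    Pixels outside the mask get alpha 0. Returns None if the mask is empty.
    `mask` must have the same height and width as `image_bgr`.
    """
    inside = mask >= threshold
    ys, xs = np.nonzero(inside)
    if ys.size == 0:
        return None
    y0, y1 = ys.min(), ys.max() + 1  # +1 because slice ends are exclusive
    x0, x1 = xs.min(), xs.max() + 1
    alpha = (inside[y0:y1, x0:x1] * 255).astype(np.uint8)
    # Stack the colour crop (h, w, 3) with the alpha plane (h, w) -> (h, w, 4)
    return np.dstack([image_bgr[y0:y1, x0:x1], alpha])
```

The returned array can be written with `cv2.imwrite("bubble_0.png", cutout)`; PNG keeps the alpha channel, whereas JPEG would discard it.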
📁 Model Files
weights/
├── best.pt # Best checkpoint (recommended)
└── last.pt # Last training checkpoint
🎯 Use Cases
- Manga Translation: Automatically detect speech bubbles for text extraction and translation
- Manga Analysis: Study panel layouts and dialogue distribution
- Content Moderation: Identify and process text regions in comics
- Accessibility: Enable text-to-speech for manga readers
- Dataset Creation: Generate annotations for manga datasets
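For the translation use case, detected bubbles usually need to be put into some reading order before text extraction. The sketch below is a naive heuristic and not part of the model: it assumes Japanese-style right-to-left reading, groups boxes into rows by their top edge, and ignores panel layout entirely. `reading_order` and `row_tolerance` are hypothetical names.

```python
def reading_order(boxes, row_tolerance: float = 50.0) -> list:
    """Order (x1, y1, x2, y2) boxes top-to-bottom, right-to-left within a row.

    Boxes whose top edge lies within `row_tolerance` pixels of the first box
    in the current row are treated as belonging to that row.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):  # sort by top edge
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    ordered = []
    for row in rows:
        # Rightmost bubble first, assuming right-to-left reading
        ordered.extend(sorted(row, key=lambda b: -b[2]))
    return ordered


boxes = [(100, 10, 200, 60), (600, 20, 700, 70),
         (300, 400, 400, 450), (650, 390, 750, 440)]
for box in reading_order(boxes):
    print(box)
```

A real pipeline would likely need panel detection as well, since bubbles in different panels can share similar vertical positions.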
⚙️ Technical Details
Model Architecture
- Backbone: YOLO11n (Nano variant)
- Task: Instance Segmentation
- Classes: 1 (Speech Bubble)
- Input: RGB images (any size, recommended 1600×1600)
- Output: Bounding boxes + Instance masks
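To illustrate what "any size" means in practice: detectors of this kind typically resize the input so that its longer side matches the inference size, then pad each side up to a multiple of the network stride. The arithmetic below is a sketch of that idea under assumptions (a stride of 32 and aspect-preserving letterbox resizing); it approximates, rather than reproduces, the library's preprocessing.

```python
import math


def letterbox_shape(width: int, height: int, imgsz: int = 1600, stride: int = 32):
    """Approximate (width, height) of the tensor fed to the network.

    The image is scaled so its longer side equals `imgsz`, then each side is
    rounded up to the next multiple of `stride`.
    """
    scale = min(imgsz / width, imgsz / height)
    new_w, new_h = round(width * scale), round(height * scale)
    padded_w = math.ceil(new_w / stride) * stride
    padded_h = math.ceil(new_h / stride) * stride
    return padded_w, padded_h


# A hypothetical 1170x1654 page scan
print(letterbox_shape(1170, 1654))  # (1152, 1600)
```

Pages much larger than the recommended size are therefore downscaled before inference, which may matter for very small bubbles.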
Inference Speed
| Device | Speed (ms/image) |
|---|---|
| GPU (T4) | ~15-25 ms |
| GPU (V100) | ~8-12 ms |
| CPU | ~200-400 ms |
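Latency depends heavily on hardware, image size, and batch size, so it is worth measuring on your own setup. The helper below is a generic timing sketch; `time_per_call_ms` is an illustrative name, and it measures wall-clock time for the whole call rather than the model's forward pass alone.

```python
import statistics
import time


def time_per_call_ms(fn, inputs, warmup: int = 3) -> float:
    """Median wall-clock time of fn(x) in milliseconds, after warm-up calls."""
    inputs = list(inputs)
    if not inputs:
        raise ValueError("need at least one input")
    for x in inputs[:warmup]:
        fn(x)  # warm-up: the first calls often include one-off initialisation
    timings = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

With a loaded model, this might be called as `time_per_call_ms(lambda p: model(p, verbose=False), image_paths)`. Ultralytics results also expose a `speed` dictionary with per-stage timings, which can serve as a cross-check.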
📝 Citation
If you use this model in your research, please cite:
@misc{mangalens2024,
  title={MangaLens: YOLO11n Speech Bubble Segmentation Model},
  author={MangaLens Team},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/your-username/mangalens-bubble-segmentation}
}
📜 License
This model is released under the Apache 2.0 License.
🙏 Acknowledgements
- Ultralytics for the YOLO framework
- MS92/MangaSegmentation dataset
- Manga109 dataset
Made with ❤️ for the manga community