🎯 MangaLens - Manga Speech Bubble Segmentation

A high-performance YOLO11n instance segmentation model fine-tuned for detecting and segmenting speech bubbles in manga/comic images.

Demo Results

Detection on Various Manga Styles

Speech bubble detection on action manga with multiple bubbles


Detection on slice-of-life manga style

📊 Model Performance

Final Evaluation Results (Epoch 44)

| Metric | Box Detection | Mask Segmentation |
|---|---|---|
| Precision | 97.55% | 97.66% |
| Recall | 97.03% | 97.15% |
| mAP@50 | 99.10% | 99.13% |
| mAP@50-95 | 96.67% | 94.69% |
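The table above can in principle be regenerated with Ultralytics' validation mode. The sketch below assumes a dataset YAML at the hypothetical path `bubbles.yaml`, and the card does not say which split produced the published numbers. For segmentation models, `model.val()` returns an object whose `box` and `seg` groups expose `mp`, `mr`, `map50` and `map` (mean precision, mean recall, mAP@50, mAP@50-95). The helper `summarize` is an illustrative name, not part of Ultralytics.

```python
def summarize(metrics):
    """Turn a segmentation metrics object into the Box/Mask columns of the table (in %)."""
    rows = {}
    for name, group in (("Box Detection", metrics.box), ("Mask Segmentation", metrics.seg)):
        rows[name] = {
            "Precision": 100 * group.mp,
            "Recall": 100 * group.mr,
            "mAP@50": 100 * group.map50,
            "mAP@50-95": 100 * group.map,
        }
    return rows


def evaluate(weights="best.pt", data_yaml="bubbles.yaml"):
    """Validate `weights` on a (hypothetical) dataset YAML and summarize the results."""
    # Imported lazily so summarize() can be reused without ultralytics installed
    from ultralytics import YOLO

    model = YOLO(weights)
    metrics = model.val(data=data_yaml, imgsz=1600)
    return summarize(metrics)
```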

Training Curves

Left: Segmentation Loss (Train vs Val) | Right: Mask mAP Metrics over epochs

| Loss Type | Final Value |
|---|---|
| Box Loss | 0.2499 |
| Segmentation Loss | 0.2762 |
| Classification Loss | 0.2109 |
| DFL Loss | 0.8064 |
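Ultralytics writes per-epoch losses and metrics to a `results.csv` file in the run directory, and its training-curve plots are drawn from that file. The card does not say whether the loss table reports training or validation losses, or which epoch it comes from. The sketch below therefore assumes the last row and lets you pick the `train/` or `val/` columns. The column names follow Ultralytics' segmentation naming. Whitespace is stripped because some versions pad the header. `last_epoch_losses` is a hypothetical helper.

```python
import csv
import io

# Table label -> Ultralytics results.csv column (training split)
LOSS_COLUMNS = {
    "Box Loss": "train/box_loss",
    "Segmentation Loss": "train/seg_loss",
    "Classification Loss": "train/cls_loss",
    "DFL Loss": "train/dfl_loss",
}


def last_epoch_losses(csv_text, prefix="train"):
    """Return the four loss values from the final row of a results.csv string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Strip the padding some Ultralytics versions add around names and values
    rows = [{k.strip(): v.strip() for k, v in row.items()} for row in reader]
    last = rows[-1]
    return {
        label: float(last[column.replace("train", prefix, 1)])
        for label, column in LOSS_COLUMNS.items()
    }
```

For example, `last_epoch_losses(Path("runs/segment/train/results.csv").read_text(), prefix="val")` reads validation losses. The run path here is hypothetical.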

🎓 Training Configuration

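| Parameter | Value |
|---|---|
| Base Model | `yolo11n-seg.pt` |
| Image Size | 1600×1600 |
| Batch Size | 8 |
| Epochs | 100 (early stopped at epoch 44) |
| Optimizer | Auto (AdamW) |
| Initial Learning Rate (`lr0`) | 0.01 (configured; Ultralytics ignores `lr0` when the optimizer is `auto` and picks its own) |
| Weight Decay | 0.0005 |
| Patience | 10 epochs |
| AMP | Enabled |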
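The Patience setting means training stops once 10 consecutive epochs pass without a new best fitness score. Below is a minimal sketch of that rule. Ultralytics' `EarlyStopping` class follows the same idea, though its exact bookkeeping may differ. For example, it treats ties and computes fitness in its own way. `early_stop_epoch` is a hypothetical helper.

```python
def early_stop_epoch(fitness, patience=10):
    """Return the epoch index where patience-based early stopping halts, or None.

    `fitness` holds one score per epoch (higher is better).
    """
    best_epoch, best = 0, float("-inf")
    for epoch, score in enumerate(fitness):
        if score > best:
            # Strict improvement resets the patience counter
            best_epoch, best = epoch, score
        elif epoch - best_epoch >= patience:
            return epoch
    return None
```

Under this rule, a run that stops at epoch 44 with patience 10 last improved no later than epoch 34.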

Data Augmentation

HSV Augmentation: H=0.015, S=0.7, V=0.4
Mosaic: 1.0
Flip Left-Right: 0.5
Scale: 0.5
Translate: 0.1
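For reference, the settings above map onto Ultralytics `train()` arguments roughly as follows. This is a reconstruction from the listed values, not the authors' actual script. The dataset YAML path is hypothetical, and any argument not listed stays at its Ultralytics default.

```python
# Values copied from the Training Configuration and Data Augmentation lists;
# keys are Ultralytics train() argument names.
TRAIN_ARGS = {
    "imgsz": 1600,
    "batch": 8,
    "epochs": 100,
    "patience": 10,        # stop after 10 epochs without improvement
    "optimizer": "auto",   # Ultralytics chooses optimizer and lr itself; lr0 is then ignored
    "lr0": 0.01,
    "weight_decay": 0.0005,
    "amp": True,
    # Augmentation
    "hsv_h": 0.015,
    "hsv_s": 0.7,
    "hsv_v": 0.4,
    "mosaic": 1.0,
    "fliplr": 0.5,
    "scale": 0.5,
    "translate": 0.1,
}


def train(data_yaml="bubbles.yaml"):
    """Fine-tune yolo11n-seg.pt with TRAIN_ARGS; `bubbles.yaml` is a hypothetical dataset file."""
    # Imported lazily so TRAIN_ARGS can be inspected without ultralytics installed
    from ultralytics import YOLO

    model = YOLO("yolo11n-seg.pt")
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

Calling `train("path/to/your_data.yaml")` starts a run. By default, Ultralytics saves `best.pt` and `last.pt` under `runs/segment/<run name>/weights/`.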

📚 Training Data

This model was trained on a combined dataset of:

MS92/MangaSegmentation - Manga panel and bubble segmentation dataset
Manga109 - Large-scale manga dataset with speech bubble annotations
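To merge the two sources, both must be expressed in Ultralytics' YOLO segmentation label format. In that format, each instance is one text line: the class index followed by the polygon's x y pairs, normalized to [0, 1] by image width and height. With a single class, every bubble gets index 0. The card does not describe the conversion the authors used. This sketch assumes you already have each bubble as a polygon in pixel coordinates. `polygon_to_yolo_line` is a hypothetical helper.

```python
def polygon_to_yolo_line(polygon, img_w, img_h, class_id=0):
    """Format one pixel-space polygon [(x, y), ...] as a YOLO segmentation label line."""
    if len(polygon) < 3:
        raise ValueError("a polygon needs at least 3 points")
    coords = []
    for x, y in polygon:
        # Normalize by image size and clamp to [0, 1]
        coords.append(min(max(x / img_w, 0.0), 1.0))
        coords.append(min(max(y / img_h, 0.0), 1.0))
    return " ".join([str(class_id)] + [f"{c:.6f}" for c in coords])
```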

🚀 Quick Start

Installation

```shell
# Quotes keep the shell from treating ">=" as an output redirect
pip install "ultralytics>=8.3.0"
```
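YOLO11 models first shipped in Ultralytics 8.3.0, and older releases lack the YOLO11 modules needed to load this checkpoint. Below is a small standard-library sketch for checking the installed version before loading the model. It assumes a dotted numeric version string such as `8.3.12`, and the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (8, 3, 0)  # first Ultralytics release with YOLO11


def parse_version(text):
    """Extract the first three numeric components, e.g. '8.3.12' -> (8, 3, 12)."""
    parts = [int(p) for p in re.findall(r"\d+", text)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def ultralytics_supports_yolo11():
    """True if an installed ultralytics package is at least MIN_VERSION."""
    try:
        installed = version("ultralytics")
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= MIN_VERSION
```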

Inference

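```python
from ultralytics import YOLO

# Load the fine-tuned weights (weights/best.pt in this repository)
model = YOLO("best.pt")

# Run inference on a single page; imgsz matches the training resolution
results = model("manga_page.jpg", imgsz=1600)

# Process results (one Results object per input image)
for result in results:
    # Bounding boxes: result.boxes.xyxy (pixels), result.boxes.conf (scores)
    boxes = result.boxes

    # Instance masks; None when no bubble was detected
    masks = result.masks
    if masks is not None:
        print(f"Found {len(masks)} speech bubbles")

    # Visualize results
    result.show()

    # Save the annotated image
    result.save(filename="output.jpg")
```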
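The `boxes` object above wraps tensors. A common next step is to convert detections to plain Python records and drop low-confidence ones. The helper below works on NumPy arrays, so it does not depend on a loaded model. The 0.5 threshold is an arbitrary example, and `to_records` is a hypothetical name.

```python
import numpy as np


def to_records(xyxy, conf, min_conf=0.5):
    """Convert (N, 4) xyxy boxes and (N,) confidences to dicts, keeping conf >= min_conf."""
    xyxy = np.asarray(xyxy, dtype=float).reshape(-1, 4)
    conf = np.asarray(conf, dtype=float).reshape(-1)
    records = []
    for (x1, y1, x2, y2), score in zip(xyxy, conf):
        if score >= min_conf:
            records.append({"box": (float(x1), float(y1), float(x2), float(y2)), "conf": float(score)})
    # Most confident bubble first
    records.sort(key=lambda r: r["conf"], reverse=True)
    return records
```

With an Ultralytics result, the inputs would be `result.boxes.xyxy.cpu().numpy()` and `result.boxes.conf.cpu().numpy()`.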

Batch Processing

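```python
from pathlib import Path

from ultralytics import YOLO

model = YOLO("best.pt")

# Pass the folder itself: Ultralytics then reads pages lazily (all supported
# image formats, not only .jpg) and records each page's file path in result.path
out_dir = Path("outputs")
out_dir.mkdir(exist_ok=True)

for result in model("manga_pages/", stream=True):
    # Name each output after its source page instead of a running index
    result.save(filename=str(out_dir / f"{Path(result.path).stem}_pred.jpg"))
```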

Extract Bubble Regions

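```python
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("best.pt")
image = cv2.imread("manga_page.jpg")
if image is None:
    raise FileNotFoundError("manga_page.jpg")
result = model(image)[0]

# masks.xy holds one polygon per bubble in original-image pixel coordinates,
# so no manual resizing (or letterbox-padding correction) is needed
polygons = result.masks.xy if result.masks is not None else []

# Extract each bubble as a separate image
for i, polygon in enumerate(polygons):
    if len(polygon) < 3:
        continue  # degenerate contour, nothing to crop

    # Rasterize the polygon into a full-size binary mask
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [polygon.astype(np.int32)], 255)

    # Black out everything outside the bubble
    bubble = cv2.bitwise_and(image, image, mask=mask)

    # Crop to the mask's bounding box (+1 so the last row/column is kept)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        continue
    cropped = bubble[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    cv2.imwrite(f"bubble_{i}.png", cropped)
```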
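For translation or OCR pipelines, it is often more convenient to fill the area outside the bubble with white rather than black, and to keep a small margin around it. Here is a NumPy-only sketch of that variant. It assumes a mask with the same height and width as the image, where any nonzero pixel counts as bubble. `crop_bubble`, `pad` and `fill` are hypothetical names.

```python
import numpy as np


def crop_bubble(image, mask, pad=4, fill=255):
    """Crop `image` to the mask's bounding box plus `pad` pixels, painting pixels
    outside the mask with `fill`. Returns None for an empty mask."""
    mask = np.asarray(mask) > 0
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    h, w = mask.shape
    # Expand the box by `pad`, clipped to the image borders
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad + 1, h)
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad + 1, w)
    out = image[y0:y1, x0:x1].copy()
    out[~mask[y0:y1, x0:x1]] = fill
    return out
```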

📁 Model Files

```
weights/
├── best.pt      # Best checkpoint (recommended)
└── last.pt      # Last training checkpoint
```

🎯 Use Cases

Manga Translation: Automatically detect speech bubbles for text extraction and translation
Manga Analysis: Study panel layouts and dialogue distribution
Content Moderation: Identify and process text regions in comics
Accessibility: Enable text-to-speech for manga readers
Dataset Creation: Generate annotations for manga datasets

⚙️ Technical Details

Model Architecture

Architecture: YOLO11n-seg (Nano variant)
Task: Instance Segmentation
Classes: 1 (Speech Bubble)
Input: RGB images (any size, recommended 1600×1600)
Output: Bounding boxes + Instance masks
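On "any size": Ultralytics does not require square inputs. In its default prediction settings, it rescales the page so the longer side matches `imgsz` while keeping the aspect ratio. In rectangular mode, it then pads only up to the next multiple of the model stride. Below is a sketch of that shape arithmetic. It assumes `imgsz=1600` and a stride of 32, and `letterbox_shape` is a hypothetical helper rather than an Ultralytics function.

```python
def letterbox_shape(h, w, imgsz=1600, stride=32):
    """Return (scale, resized (h, w), padded (h, w)) for a page of size h x w."""
    r = min(imgsz / h, imgsz / w)  # fit the longer side to imgsz
    new_h, new_w = round(h * r), round(w * r)
    # Pad each dimension up to the next multiple of the stride
    pad_h = (stride - new_h % stride) % stride
    pad_w = (stride - new_w % stride) % stride
    return r, (new_h, new_w), (new_h + pad_h, new_w + pad_w)
```

For a 2400×1700 (height × width) page, this gives a 1600×1133 resize padded to 1600×1152.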

Inference Speed

| Device | Speed (ms/image) |
|---|---|
| GPU (T4) | ~15-25 |
| GPU (V100) | ~8-12 |
| CPU | ~200-400 |
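These figures are approximate, and the card does not state the input size or hardware setup behind them. To measure on your own machine, a rough timing harness like the one below can help. It takes any zero-argument callable, discards a few warm-up runs, and reports the median wall-clock time. Ultralytics also reports per-stage timings in each result's `speed` dictionary (preprocess, inference and postprocess, in milliseconds). `median_ms` is a hypothetical helper.

```python
import statistics
import time


def median_ms(fn, runs=20, warmup=3):
    """Call `fn()` `warmup` times untimed, then `runs` times; return median wall time in ms."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For example, `median_ms(lambda: model("manga_page.jpg", verbose=False))` times end-to-end prediction, including image loading and postprocessing.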

📝 Citation

If you use this model in your research, please cite:

```bibtex
@misc{mangalens2024,
  title={MangaLens: YOLO11n Speech Bubble Segmentation Model},
  author={MangaLens Team},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/your-username/mangalens-bubble-segmentation}
}
```

📜 License

This model is released under the Apache 2.0 License.

🙏 Acknowledgements

Ultralytics for the YOLO framework
MS92/MangaSegmentation dataset
Manga109 dataset

Made with ❤️ for the manga community

huyvux3005/manga109-segmentation-bubble