🎯 MangaLens - Manga Speech Bubble Segmentation

A YOLO11n instance segmentation model fine-tuned to detect and segment speech bubbles in manga and comic images, reaching 99.13% mask mAP@50 in the final evaluation.

🖼️ Demo Results

Detection on Various Manga Styles
Demo 1: Speech bubble detection on action manga with multiple bubbles
Demo 2: Detection on slice-of-life manga style

📊 Model Performance

Final Evaluation Results (Epoch 44)

Metric       Box Detection   Mask Segmentation
Precision    97.55%          97.66%
Recall       97.03%          97.15%
mAP@50       99.10%          99.13%
mAP@50-95    96.67%          94.69%
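
The table reports precision and recall separately. As a small worked example, the F1 score (the harmonic mean of the two) can be derived from those figures. F1 is not reported by the original evaluation; the `f1_score` helper below is a hypothetical name introduced only for this illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the evaluation table above
box_f1 = f1_score(0.9755, 0.9703)
mask_f1 = f1_score(0.9766, 0.9715)

print(f"Box F1:  {box_f1:.4f}")   # Box F1:  0.9729
print(f"Mask F1: {mask_f1:.4f}")  # Mask F1: 0.9740
```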

Training Curves

Figure: training curves. Left: segmentation loss (train vs. val). Right: mask mAP metrics over epochs.

Loss Type             Final Value
Box Loss              0.2499
Segmentation Loss     0.2762
Classification Loss   0.2109
DFL Loss              0.8064

🎓 Training Configuration

Parameter       Value
Base Model      yolo11n-seg.pt
Image Size      1600×1600
Batch Size      8
Epochs          100 (Early stopped at 44)
Optimizer       Auto (AdamW)
Learning Rate   0.01
Weight Decay    0.0005
Patience        10
AMP             Enabled

Data Augmentation

  • HSV Augmentation: H=0.015, S=0.7, V=0.4
  • Mosaic: 1.0
  • Flip Left-Right: 0.5
  • Scale: 0.5
  • Translate: 0.1
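
For readers who want to reproduce a similar run, the configuration and augmentation values above map fairly directly onto Ultralytics training arguments. The following is a minimal sketch, not the authors' actual training script: the dataset file name `manga_bubbles.yaml` is hypothetical, and mapping "Learning Rate" to `lr0` is an assumption.

```python
# Training arguments assembled from the configuration and augmentation values above.
TRAIN_ARGS = {
    "data": "manga_bubbles.yaml",  # hypothetical dataset config, not part of this repo
    "imgsz": 1600,
    "batch": 8,
    "epochs": 100,
    "patience": 10,          # early stopping after 10 epochs without improvement
    "optimizer": "auto",
    "lr0": 0.01,             # assumption: "Learning Rate" refers to the initial LR
    "weight_decay": 0.0005,
    "amp": True,
    # Data augmentation
    "hsv_h": 0.015,
    "hsv_s": 0.7,
    "hsv_v": 0.4,
    "mosaic": 1.0,
    "fliplr": 0.5,
    "scale": 0.5,
    "translate": 0.1,
}


def train(args=None):
    """Fine-tune yolo11n-seg.pt with the arguments above (requires ultralytics)."""
    from ultralytics import YOLO  # imported lazily so the config can be inspected alone

    model = YOLO("yolo11n-seg.pt")
    return model.train(**(args or TRAIN_ARGS))
```

Calling `train()` starts the run. With `optimizer="auto"`, Ultralytics may choose the optimizer and learning rate itself, so the `lr0` value here should be read as the configured value rather than a guaranteed effective one.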

📚 Training Data

This model was trained on a combined dataset of:

  1. MS92/MangaSegmentation - Manga panel and bubble segmentation dataset
  2. Manga109 - Large-scale manga dataset with speech bubble annotations

🚀 Quick Start

Installation

pip install "ultralytics>=8.3.0"

Inference

from ultralytics import YOLO

# Load the model
model = YOLO("best.pt")

# Run inference on an image
results = model("manga_page.jpg")

# Process results
for result in results:
    # Get bounding boxes
    boxes = result.boxes
    
    # Get segmentation masks
    masks = result.masks
    
    # Visualize results
    result.show()
    
    # Save results
    result.save("output.jpg")

Batch Processing

from ultralytics import YOLO
from pathlib import Path

model = YOLO("best.pt")

# Process multiple images (sorted so the output indices are reproducible)
image_folder = Path("manga_pages/")
results = model(sorted(image_folder.glob("*.jpg")), stream=True)

for i, result in enumerate(results):
    result.save(f"output_{i}.jpg")

Extract Bubble Regions

import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("best.pt")
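image = cv2.imread("manga_page.jpg")
if image is None:
    raise FileNotFoundError("Could not read manga_page.jpg")

# retina_masks=True asks Ultralytics for masks at the original image resolution,
# which avoids misalignment from the letterbox padding of low-resolution masks
results = model(image, retina_masks=True)[0]

# results.masks is None when no bubbles are detected
masks = results.masks.data if results.masks is not None else []

# Extract each bubble as a separate image
for i, mask in enumerate(masks):
    mask_np = mask.cpu().numpy().astype(np.float32)
    # Safety resize in case the mask shape still differs from the image shape
    mask_resized = cv2.resize(mask_np, (image.shape[1], image.shape[0]))

    # Apply mask: black out everything outside the bubble
    bubble = image.copy()
    bubble[mask_resized < 0.5] = 0

    # Get bounding box and crop (+1 so the last row and column are included)
    coords = np.where(mask_resized >= 0.5)
    if len(coords[0]) > 0:
        y_min, y_max = coords[0].min(), coords[0].max()
        x_min, x_max = coords[1].min(), coords[1].max()
        cropped = bubble[y_min:y_max + 1, x_min:x_max + 1]
        cv2.imwrite(f"bubble_{i}.png", cropped)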

📁 Model Files

weights/
├── best.pt      # Best checkpoint (recommended)
└── last.pt      # Last training checkpoint

🎯 Use Cases

  • Manga Translation: Automatically detect speech bubbles for text extraction and translation
  • Manga Analysis: Study panel layouts and dialogue distribution
  • Content Moderation: Identify and process text regions in comics
  • Accessibility: Enable text-to-speech for manga readers
  • Dataset Creation: Generate annotations for manga datasets
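
As an illustration of the dataset-creation use case, a predicted mask polygon can be written out as a YOLO segmentation label line (class index followed by normalized x/y pairs). This is a minimal sketch: `polygon_to_yolo_seg` is a hypothetical helper, and it assumes polygons in pixel coordinates such as those exposed by `result.masks.xy`.

```python
def polygon_to_yolo_seg(polygon, image_width, image_height, class_id=0):
    """Convert a pixel-space polygon [(x, y), ...] into one YOLO segmentation label line.

    Coordinates are normalized by the image size and clamped to [0, 1].
    """
    if len(polygon) < 3:
        raise ValueError("A polygon needs at least 3 points")

    coords = []
    for x, y in polygon:
        coords.append(min(max(x / image_width, 0.0), 1.0))
        coords.append(min(max(y / image_height, 0.0), 1.0))

    return f"{class_id} " + " ".join(f"{c:.6f}" for c in coords)


# Example: a triangle on a 200x100 image, class 0 (speech bubble)
line = polygon_to_yolo_seg([(0, 0), (100, 0), (100, 50)], 200, 100)
print(line)  # 0 0.000000 0.000000 0.500000 0.000000 0.500000 0.500000
```

Writing one such line per detected bubble into a `.txt` file named after the image gives pre-annotations that can then be corrected by hand.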

⚙️ Technical Details

Model Architecture

  • Backbone: YOLO11n (Nano variant)
  • Task: Instance Segmentation
  • Classes: 1 (Speech Bubble)
  • Input: RGB images (any size, recommended 1600×1600)
  • Output: Bounding boxes + Instance masks
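
Since the recommended input size matches the training resolution, it may help to pass `imgsz` explicitly at inference time. The sketch below assumes a maximum model stride of 32 and a confidence threshold of 0.25; `round_up_to_stride` and `segment_page` are hypothetical helper names, not part of this repository.

```python
import math

STRIDE = 32  # assumption: maximum stride of the YOLO11 segmentation model


def round_up_to_stride(size: int, stride: int = STRIDE) -> int:
    """Round an image size up to the nearest multiple of the model stride."""
    return max(stride, math.ceil(size / stride) * stride)


def segment_page(weights: str, image_path: str, imgsz: int = 1600, conf: float = 0.25) -> int:
    """Run inference at the given input size and return the number of detected bubbles."""
    from ultralytics import YOLO  # imported lazily; requires the ultralytics package

    model = YOLO(weights)
    result = model.predict(image_path, imgsz=round_up_to_stride(imgsz), conf=conf)[0]
    return len(result.boxes)
```

For example, `segment_page("best.pt", "manga_page.jpg")` runs at 1600 pixels, which is already a multiple of 32; a request for 1500 would be rounded up to 1504.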

Inference Speed

Device        Speed (ms/image)
GPU (T4)      ~15-25 ms
GPU (V100)    ~8-12 ms
CPU           ~200-400 ms
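
The figures above are approximate and will vary with hardware and input size. To measure latency on your own setup, a simple timing loop with warm-up runs is usually enough. This is a rough sketch rather than the benchmark used for the table; `time_per_image_ms` is a hypothetical helper, and on a GPU the numbers are only indicative because execution can be asynchronous.

```python
import statistics
import time


def time_per_image_ms(run_once, warmup: int = 3, runs: int = 20):
    """Time a zero-argument callable and return (mean_ms, median_ms) over `runs` calls."""
    # Warm-up calls are excluded from the statistics (model loading, caches, etc.)
    for _ in range(warmup):
        run_once()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings.append((time.perf_counter() - start) * 1000.0)

    return statistics.mean(timings), statistics.median(timings)
```

With a loaded model, usage could look like `time_per_image_ms(lambda: model("manga_page.jpg", verbose=False))`. Ultralytics results also carry a per-stage `speed` attribute (preprocess, inference, postprocess, in milliseconds) that can be compared against this wall-clock measurement.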

📝 Citation

If you use this model in your research, please cite:

@misc{mangalens2024,
  title={MangaLens: YOLO11n Speech Bubble Segmentation Model},
  author={MangaLens Team},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/your-username/mangalens-bubble-segmentation}
}

📜 License

This model is released under the Apache 2.0 License.

🙏 Acknowledgements


Made with ❤️ for the manga community