Improve model card: Add metadata tags, abstract, links, usage, and benchmarks

by nielsr HF Staff - opened Jul 17, 2025

This PR improves the model card for UGC-VideoCaptioner-3B by:

- Adding pipeline_tag: video-text-to-text and library_name: transformers to the metadata for better discoverability and Hub integration.
- Including additional tags (multimodal, video-captioning, audio-visual, ugc) to provide more descriptive categories.
- Providing a detailed overview of the model based on its paper abstract.
- Adding direct links to the official Hugging Face paper page, the project's homepage, and the GitHub repository.
- Integrating visual elements such as the tiktok_qa_sample.png and benchmark.png from the repository.
- Presenting the model's "Benchmark Results" table for easy reference.
- Adding a "Quick Start" guide with a Python code example for local inference using the transformers library.
- Including "Evaluation" details and a BibTeX "Citation".

With these additions, the model card on the Hugging Face Hub describes what the model does, how to run it locally, and how it was evaluated.
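
For reference, the metadata changes listed above live in the YAML front matter at the top of the model card's README.md. The following is only a minimal sketch of what that block might look like, built from the fields named in this PR; the helper build_front_matter is hypothetical, and the merged card may contain other fields (for example a license or base model) that are not mentioned here.

```python
# Sketch only: assembles a YAML front-matter block from the metadata
# named in this PR. build_front_matter is a hypothetical helper, and the
# real model card may carry additional fields not shown here.

def build_front_matter(pipeline_tag, library_name, tags):
    """Return a YAML front-matter block for a model card README."""
    lines = [
        "---",
        f"pipeline_tag: {pipeline_tag}",
        f"library_name: {library_name}",
        "tags:",
    ]
    # One YAML list item per tag.
    lines += [f"- {tag}" for tag in tags]
    lines.append("---")
    return "\n".join(lines)


front_matter = build_front_matter(
    pipeline_tag="video-text-to-text",
    library_name="transformers",
    tags=["multimodal", "video-captioning", "audio-visual", "ugc"],
)
print(front_matter)
```

The Hub reads pipeline_tag and library_name from this block to place the model under the matching task filter and to suggest a loading snippet, which is the discoverability benefit the PR refers to.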

peiranW changed pull request status to merged Jul 17, 2025
