Video-Text-to-Text
Transformers
Safetensors
qwen2_5_omni
text-to-audio
multimodal
video-captioning
audio-visual
ugc
Instructions to use openinterx/UGC-VideoCaptioner with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openinterx/UGC-VideoCaptioner with Transformers:
# Load model directly from transformers import AutoProcessor, AutoModelForTextToWaveform processor = AutoProcessor.from_pretrained("openinterx/UGC-VideoCaptioner") model = AutoModelForTextToWaveform.from_pretrained("openinterx/UGC-VideoCaptioner") - Notebooks
- Google Colab
- Kaggle
Improve model card: Add metadata tags, abstract, links, usage, and benchmarks
#1
by nielsr HF Staff - opened
This PR significantly enhances the model card for UGC-VideoCaptioner-3B by:
- Adding
pipeline_tag: video-text-to-textandlibrary_name: transformersto the metadata for better discoverability and Hub integration. - Including additional
tags(multimodal,video-captioning,audio-visual,ugc) to provide more descriptive categories. - Providing a detailed overview of the model based on its paper abstract.
- Adding direct links to the official Hugging Face paper page, the project's homepage, and the GitHub repository.
- Integrating visual elements such as the
tiktok_qa_sample.pngandbenchmark.pngfrom the repository. - Presenting the model's "Benchmark Results" table for easy reference.
- Adding a "Quick Start" guide with a Python code example for local inference using the
transformerslibrary. - Including "Evaluation" details and a BibTeX "Citation".
These additions make the model card more informative, user-friendly, and comprehensive on the Hugging Face Hub.
peiranW changed pull request status to merged