---
license: mit
---

<h1>UTMOSv2: UTokyo-SaruLab MOS Prediction System</h1>

<table>
<tr>
<td>
<a href="https://github.com/sarulab-speech/UTMOSv2">
<img src="https://img.shields.io/badge/dynamic/json.svg?label=GitHub&logo=github&style=flat&url=https://api.github.com/repos/sarulab-speech/UTMOSv2&query=$.stargazers_count&prefix=Stars%20&labelColor=181717" alt="GitHub"/>
</a>
</td>
<td>
<a href="https://huggingface.co/spaces/sarulab-speech/UTMOSv2">
<img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue" alt="Hugging Face Spaces"/>
</a>
</td>
</tr>
</table>

<table>
<tr>
<td>
<a href="http://arxiv.org/abs/2409.09305">
<img src="https://img.shields.io/badge/arXiv-2409.09305-b31b1b.svg" alt="arXiv"/>
</a>
</td>
<td>
<a href="https://github.com/sarulab-speech/UTMOSv2/blob/main/poster.pdf">
<img src="https://img.shields.io/badge/IEEE%20SLT%202024-Poster-blue.svg" alt="poster"/>
</a>
</td>
<td>
<a href="https://colab.research.google.com/github/sarulab-speech/UTMOSv2/blob/main/quickstart.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
</td>
</tr>
</table>

For more details, please refer to our GitHub repository: https://github.com/sarulab-speech/UTMOSv2

## 🔖 Citation

```bibtex
@inproceedings{baba2024utmosv2,
  title     = {The T05 System for The {V}oice{MOS} {C}hallenge 2024: Transfer Learning from Deep Image Classifier to Naturalness {MOS} Prediction of High-Quality Synthetic Speech},
  author    = {Baba, Kaito and Nakata, Wataru and Saito, Yuki and Saruwatari, Hiroshi},
  booktitle = {IEEE Spoken Language Technology Workshop (SLT)},
  year      = {2024},
}
```