moonshotai/Kimi-Audio-7B-Instruct

Tags: Text-to-Speech, KimiAudio, Safetensors, English, Chinese, audio, audio-language-model, speech-recognition, audio-understanding, audio-generation, chat, custom_code
Instructions for using moonshotai/Kimi-Audio-7B-Instruct with libraries, inference providers, notebooks, and local apps.

    How to use moonshotai/Kimi-Audio-7B-Instruct with KimiAudio:

    # Example usage for KimiAudio
    # pip install git+https://github.com/MoonshotAI/Kimi-Audio.git
    # pip install soundfile  (used below to save the generated audio)
    
    import soundfile as sf
    from kimia_infer.api.kimia import KimiAudio
    
    model = KimiAudio(model_path="moonshotai/Kimi-Audio-7B-Instruct", load_detokenizer=True)
    
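    # audio_* keys control sampling of generated audio tokens; text_* keys control generated text tokens
    sampling_params = {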
        "audio_temperature": 0.8,
        "audio_top_k": 10,
        "text_temperature": 0.0,
        "text_top_k": 5,
    }
    
    # For ASR
    asr_audio = "asr_example.wav"
    messages_asr = [
        {"role": "user", "message_type": "text", "content": "Please transcribe the following audio:"},
        {"role": "user", "message_type": "audio", "content": asr_audio}
    ]
    _, text = model.generate(messages_asr, **sampling_params, output_type="text")
    print(text)
    
    # For Q&A
    qa_audio = "qa_example.wav"
    messages_conv = [{"role": "user", "message_type": "audio", "content": qa_audio}]
    wav, text = model.generate(messages_conv, **sampling_params, output_type="both")
    # Flatten the generated waveform to 1-D and write it out at a 24 kHz sample rate
    sf.write("output_audio.wav", wav.cpu().view(-1).numpy(), 24000)
    print(text)
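The message list passed to `model.generate` is plain Python data: each entry is a dict with `role`, `message_type` (`"text"` or `"audio"`), and `content` (the text itself, or a path to an audio file). A small helper can keep that format consistent across tasks. The sketch below is a hypothetical `build_messages` function, not part of `kimia_infer`, and it assumes the field values shown in the example above are the only ones you need.

```python
from typing import Dict, List, Optional

# Hypothetical helper (not part of kimia_infer): builds a user turn in the
# same dict format as the ASR and Q&A examples above.
Message = Dict[str, str]


def build_messages(audio_path: str, prompt: Optional[str] = None) -> List[Message]:
    """Return an optional text instruction followed by an audio clip."""
    messages: List[Message] = []
    if prompt is not None:
        # The text instruction comes first, as in the ASR example.
        messages.append({"role": "user", "message_type": "text", "content": prompt})
    # The audio clip is referenced by its file path.
    messages.append({"role": "user", "message_type": "audio", "content": audio_path})
    return messages


# Equivalent to messages_asr and messages_conv in the example above.
asr = build_messages("asr_example.wav", "Please transcribe the following audio:")
qa = build_messages("qa_example.wav")
```

The resulting lists can then be passed to `model.generate` exactly like `messages_asr` and `messages_conv`.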
    
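The `sampling_params` dict sets temperature and top-k separately for the text and audio outputs. As a rough, generic illustration of what these two settings usually mean (this is not KimiAudio's actual sampler, whose internals are not shown here), the hypothetical `sample_top_k` below keeps the `top_k` highest logits, rescales them by the temperature, and samples from the resulting softmax. It assumes, as many decoders do, that a temperature of 0 means greedy (argmax) decoding. Under that assumption, `"text_temperature": 0.0` would make the text output deterministic.

```python
import math
import random
from typing import List, Optional


def sample_top_k(logits: List[float], temperature: float, top_k: int,
                 rng: Optional[random.Random] = None) -> int:
    """Generic temperature + top-k sampling (illustration only, not KimiAudio's code)."""
    if temperature <= 0.0:
        # Assumption: temperature 0 is treated as greedy decoding (highest logit wins).
        return max(range(len(logits)), key=lambda i: logits[i])
    if rng is None:
        rng = random.Random()
    # Keep only the top_k highest-scoring token indices.
    k = max(1, min(top_k, len(logits)))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits after dividing by the temperature;
    # subtracting the max keeps math.exp numerically stable.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]
```

In this sketch, `top_k` has no effect when the temperature is 0, because the argmax is always inside the top-k set. Higher temperatures flatten the distribution over the kept candidates.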
Community discussions:
  • #21 Free studio vocal data for Kimi Audio vocal pipeline benchmarking (opened 25 days ago by MachineAI87)
  • #20 Add Kimi-Audio EOS and pad token ids (opened 3 months ago by tunglinwood)
  • #19 Kaggle code needs update (opened 10 months ago by elijahross)
  • #16 Fix incorrect unk_id assignment (opened 12 months ago by codecho)
  • #14 Request: DOI (opened about 1 year ago by huseyinyolcu)
  • #12 supported languages? (opened about 1 year ago by nononameneeded2001)
  • #11 About the weight files of the Whisper Encoder (opened about 1 year ago by codecho)
  • #10 how can I fine tune this for farsi? (opened about 1 year ago by uncleMehrzad)
  • #9 Cannot Run Model in Hugging Face Spaces: AutoProcessor/Processor Not Found (opened about 1 year ago by ranagame)
  • #8 Will there be support for the Russian language? (opened about 1 year ago by fduches2)
  • #7 A video on how to set up this in a Colab notebook (opened about 1 year ago by ritheshSree)
  • #6 Vocoder Architecture? (opened about 1 year ago by yukiarimo)
  • #4 Base model? (opened about 1 year ago by deltanym)
  • #2 Issue with long audio (~1 min) output, or prompt instruct following (opened about 1 year ago by JosephusCheung)
  • #1 Update correct task tag (opened about 1 year ago by reach-vb)