zhangj1an/kimi_audio_7b_random

Tags: KimiAudio, Safetensors, vllm-omni, test-fixture, custom_code

    How to use zhangj1an/kimi_audio_7b_random with KimiAudio:

    # Example usage for KimiAudio
    # pip install git+https://github.com/MoonshotAI/Kimi-Audio.git
    # pip install soundfile  (needed for sf.write below)

    import soundfile as sf
    from kimia_infer.api.kimia import KimiAudio

    # load_detokenizer=True also loads the audio detokenizer needed for audio output
    model = KimiAudio(model_path="zhangj1an/kimi_audio_7b_random", load_detokenizer=True)

    sampling_params = {
        "audio_temperature": 0.8,
        "audio_top_k": 10,
        "text_temperature": 0.0,
        "text_top_k": 5,
    }

    # For ASR: a text instruction followed by the audio to transcribe
    asr_audio = "asr_example.wav"
    messages_asr = [
        {"role": "user", "message_type": "text", "content": "Please transcribe the following audio:"},
        {"role": "user", "message_type": "audio", "content": asr_audio},
    ]
    # With output_type="text" only the text result is used
    _, text = model.generate(messages_asr, **sampling_params, output_type="text")
    print(text)

    # For Q&A: the audio question is the only message
    qa_audio = "qa_example.wav"
    messages_conv = [{"role": "user", "message_type": "audio", "content": qa_audio}]
    # With output_type="both" the call returns a waveform and the text reply
    wav, text = model.generate(messages_conv, **sampling_params, output_type="both")
    # Flatten the waveform and write it as a 24000 Hz WAV file
    sf.write("output_audio.wav", wav.cpu().view(-1).numpy(), 24000)
    print(text)
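
Both requests in the example use the same message layout: an optional text instruction followed by one audio entry, each a dict with `role`, `message_type`, and `content`. As a minimal sketch, that layout could be factored into a small helper. The name `build_messages` is hypothetical and not part of the KimiAudio package; the helper only builds plain Python lists and does not load the model.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_messages(audio_path: str, instruction: Optional[str] = None) -> List[Message]:
    """Hypothetical helper (not part of kimia_infer): builds the message
    list in the layout used by the usage example above."""
    messages: List[Message] = []
    if instruction:
        # Optional text instruction, e.g. an ASR prompt
        messages.append({"role": "user", "message_type": "text", "content": instruction})
    # The audio entry carries the path to the input file
    messages.append({"role": "user", "message_type": "audio", "content": audio_path})
    return messages


if __name__ == "__main__":
    # ASR-style request: instruction plus audio
    print(build_messages("asr_example.wav", "Please transcribe the following audio:"))
    # Q&A-style request: audio only
    print(build_messages("qa_example.wav"))
```

The returned lists have the same shape as `messages_asr` and `messages_conv` in the example, so they would be passed to `model.generate` in the same way.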
    
kimi_audio_7b_random
615 MB
  • 1 contributor
History: 2 commits
Latest commit: 9bb5d49 (verified) by zhangj1an, 15 days ago: Upload folder using huggingface_hub
  • audio_detokenizer
    Upload folder using huggingface_hub 15 days ago
  • vocoder
    Upload folder using huggingface_hub 15 days ago
  • whisper-large-v3
    Upload folder using huggingface_hub 15 days ago
  • .gitattributes
    1.52 kB
    initial commit 15 days ago
  • .gitignore
    0 Bytes
    Upload folder using huggingface_hub 15 days ago
  • README.md
    1.89 kB
    Upload folder using huggingface_hub 15 days ago
  • config.json
    1.23 kB
    Upload folder using huggingface_hub 15 days ago
  • configuration_moonshot_kimia.py
    2.48 kB
    Upload folder using huggingface_hub 15 days ago
  • generation_config.json
    24 Bytes
    Upload folder using huggingface_hub 15 days ago
  • model.safetensors
    556 MB
    xet
    Upload folder using huggingface_hub 15 days ago
  • model.safetensors.index.json
    5.56 kB
    Upload folder using huggingface_hub 15 days ago
  • modeling_moonshot_kimia.py
    35.6 kB
    Upload folder using huggingface_hub 15 days ago
  • special_tokens_map.json
    13.4 kB
    Upload folder using huggingface_hub 15 days ago
  • tiktoken.model
    2.56 MB
    xet
    Upload folder using huggingface_hub 15 days ago
  • tokenization_kimia.py
    11.3 kB
    Upload folder using huggingface_hub 15 days ago
  • tokenizer_config.json
    113 kB
    Upload folder using huggingface_hub 15 days ago