---
license: mit
base_model:
- moonshotai/Kimi-Audio-7B-Instruct
pipeline_tag: feature-extraction
---

# Kimi-Audio Whisper Encoder

The Whisper encoder fine-tuned as part of Kimi-Audio. It extracts continuous acoustic features from audio.

## Model Info

- **Base**: whisper-large-v3
- **Hidden Size**: 1280
- **Original**: [moonshotai/Kimi-Audio-7B-Instruct](https://huggingface.co/moonshotai/Kimi-Audio-7B-Instruct)

## Installation

```bash
pip install transformers librosa torch
```

## Usage

### Using Transformers (Recommended)

```python
import torch
import librosa
from transformers import WhisperFeatureExtractor, WhisperModel

# Load the model and keep only the encoder
model = WhisperModel.from_pretrained("Atotti/Kimi-Audio-Whisper-Encoder")
model = model.encoder.to("cuda", dtype=torch.bfloat16)
model.eval()

# Load audio, resampled to 16 kHz
audio, sr = librosa.load("audio.wav", sr=16000)

# Convert the waveform to log-mel input features with Whisper's feature extractor
feature_extractor = WhisperFeatureExtractor.from_pretrained("Atotti/Kimi-Audio-Whisper-Encoder")
inputs = feature_extractor(audio, sampling_rate=sr, return_tensors="pt")
input_features = inputs.input_features.to("cuda", dtype=torch.bfloat16)

# Get encoder output
with torch.no_grad():
    encoder_output = model(input_features)
    features = encoder_output.last_hidden_state  # [1, T, 1280]

print(f"Features shape: {features.shape}")
```

### Pooled Features

```python
# Mean pooling for utterance-level embedding
pooled = features.mean(dim=1)  # [1, 1280]
```
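
By default, Whisper's feature extractor pads or truncates each clip to 30 seconds, so a plain mean over the time axis also averages frames that correspond to padding. The following is a minimal sketch of length-aware pooling, not part of this repository. It assumes the encoder emits 50 frames per second of audio (the usual Whisper layout), and the helper name `masked_mean_pool` and the `durations_sec` argument are hypothetical.

```python
import torch

FRAMES_PER_SECOND = 50  # assumption: standard Whisper encoder frame rate


def masked_mean_pool(features: torch.Tensor, durations_sec: list[float]) -> torch.Tensor:
    """Average encoder frames over the real audio only, skipping padded frames.

    features: [batch, time_steps, hidden]
    durations_sec: duration of each clip in seconds, one entry per batch item
    """
    _, time_steps, _ = features.shape
    # Number of valid frames per clip, clamped to [1, time_steps].
    lengths = torch.tensor(
        [min(time_steps, max(1, round(d * FRAMES_PER_SECOND))) for d in durations_sec],
        device=features.device,
    )
    # mask[b, t] is True where frame t of clip b covers real audio.
    mask = torch.arange(time_steps, device=features.device)[None, :] < lengths[:, None]
    # Accumulate in float32 so bfloat16 inputs do not lose precision in the sum.
    summed = (features.float() * mask.unsqueeze(-1)).sum(dim=1)
    return (summed / lengths.unsqueeze(-1)).to(features.dtype)


# Stand-in for encoder output: two clips padded to 1500 frames.
features = torch.randn(2, 1500, 1280)
pooled = masked_mean_pool(features, durations_sec=[3.2, 30.0])
print(pooled.shape)  # torch.Size([2, 1280])
```

Masking only changes which frames are averaged. The encoder still sees the padded input, so frames near the end of the real audio may carry some influence from the padding.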

## Output

- **Sequential features**: `[batch, time_steps, 1280]` - frame-level (time-series) features
- **Pooled features**: `[batch, 1280]` - utterance-level features
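
For a sense of what `time_steps` is in practice, the arithmetic below uses the standard Whisper front-end settings: 16 kHz audio, a hop length of 160 samples, clips padded to 30 seconds, and a stride-2 convolution in the encoder. These constants come from the upstream Whisper design rather than from this model card, so treat them as assumptions. The function names are illustrative only.

```python
SAMPLE_RATE = 16_000   # assumption: Whisper input sampling rate (Hz)
HOP_LENGTH = 160       # assumption: samples between consecutive mel frames
CHUNK_SECONDS = 30     # assumption: default padding/truncation length
CONV_STRIDE = 2        # assumption: temporal stride of the encoder's conv stem


def encoder_time_steps(chunk_seconds: float = CHUNK_SECONDS) -> int:
    """Number of encoder output frames for a clip padded to chunk_seconds."""
    mel_frames = int(chunk_seconds * SAMPLE_RATE) // HOP_LENGTH
    return mel_frames // CONV_STRIDE


def frame_duration_ms() -> float:
    """Audio duration covered by one encoder output frame, in milliseconds."""
    return 1000 * HOP_LENGTH * CONV_STRIDE / SAMPLE_RATE


print(encoder_time_steps())  # 1500
print(frame_duration_ms())   # 20.0
```

Under these assumptions, a padded clip yields a `[batch, 1500, 1280]` tensor, with each frame covering 20 ms of audio.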

## License

See [moonshotai/Kimi-Audio-7B-Instruct](https://huggingface.co/moonshotai/Kimi-Audio-7B-Instruct) for license information.