---
tags:
- whisper
- speech
- audio
- litert
- tflite
- edge
- on-device
license: mit
base_model: openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
---

# whisper-tiny - LiteRT

This is a [LiteRT](https://ai.google.dev/edge/litert) (formerly TensorFlow Lite) conversion of the encoder of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) for efficient on-device inference.

## Model Details

| Property | Value |
|----------|-------|
| Original Model | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) |
| Format | LiteRT (.tflite) |
| File Size | 31.4 MB |
| Task | Speech recognition (encoder only) |
| Input Length | 3000 log-mel frames (30 s of audio, 80 mel bins) |
| Output Dimension | 384 (encoder hidden size) |
| Pooling Mode | N/A (raw encoder hidden states) |
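The input length and output dimension follow from Whisper's fixed audio front end. The sketch below derives both shapes from the standard whisper-tiny constants (16 kHz sampling, 160-sample hop, 80 mel bins, 30 s windows, hidden size 384) and shows the zero-padding applied to short clips. It illustrates the arithmetic only and is not code from this repository; the helper names `pad_or_trim` and `expected_shapes` are illustrative.

```python
import numpy as np

# Whisper front-end constants for whisper-tiny (illustrative restatement).
SAMPLE_RATE = 16_000   # Hz
HOP_LENGTH = 160       # samples between STFT frames (10 ms)
CHUNK_SECONDS = 30     # Whisper always processes 30 s windows
N_MELS = 80            # mel bins for whisper-tiny
D_MODEL = 384          # whisper-tiny encoder hidden size

N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480000 samples per window
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3000 log-mel frames


def pad_or_trim(audio: np.ndarray, length: int = N_SAMPLES) -> np.ndarray:
    """Zero-pad or truncate a mono waveform to exactly `length` samples."""
    if audio.shape[0] >= length:
        return audio[:length]
    return np.pad(audio, (0, length - audio.shape[0]))


def expected_shapes(batch: int = 1) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Input (log-mel) and output (hidden-state) shapes for one 30 s window."""
    input_shape = (batch, N_MELS, N_FRAMES)
    # The encoder's second convolution has stride 2, halving the time axis,
    # so each output position covers about 20 ms of audio.
    output_shape = (batch, N_FRAMES // 2, D_MODEL)
    return input_shape, output_shape
```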

## Performance

Benchmarked on an AMD CPU under WSL2:

| Metric | Value |
|--------|-------|
| Inference Latency | 144.7 ms per 30 s window |
| Throughput | 6.9 inferences/s |
| Cosine Similarity vs Original | 1.0000 ✅ |
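Because every call processes one fixed 30 s window, single-stream throughput is the reciprocal of latency (1000 / 144.7 ms ≈ 6.9 runs/s). To measure comparable numbers on other hardware, a minimal timing harness might look like the following. The warm-up and iteration counts are arbitrary assumptions, not the settings behind the table above, and `benchmark` is a hypothetical helper.

```python
import statistics
import time
from typing import Callable


def benchmark(run_once: Callable[[], object], warmup: int = 3, iters: int = 20) -> dict:
    """Time `run_once` sequentially and report latency (ms) and throughput (runs/s)."""
    for _ in range(warmup):
        run_once()  # warm-up runs are excluded from the measurements
    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_latency_ms": mean_ms,
        "median_latency_ms": statistics.median(latencies_ms),
        # Valid for one call at a time; concurrent runs would need a different measure.
        "throughput_per_s": 1000.0 / mean_ms,
    }
```

With the Quick Start setup, calling `benchmark(interpreter.invoke)` after `set_tensor` times only the encoder forward pass, excluding audio loading and feature extraction.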

## Quick Start

```python
import numpy as np
import librosa
from ai_edge_litert.interpreter import Interpreter
from transformers import WhisperProcessor

# Load the LiteRT encoder and allocate its tensors
interpreter = Interpreter(model_path="openai_whisper-tiny_encoder.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Load the original processor; only its feature extractor is used here
processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")


def encode_audio(audio_path: str) -> np.ndarray:
    """Extract encoder hidden states from an audio file."""
    # Whisper expects 16 kHz mono audio
    audio, _ = librosa.load(audio_path, sr=16000)
    # Log-mel features, padded/truncated to 30 s: shape (1, 80, 3000)
    input_features = processor(audio, sampling_rate=16000, return_tensors="np").input_features

    interpreter.set_tensor(input_details[0]["index"], input_features.astype(np.float32))
    interpreter.invoke()

    # For whisper-tiny the encoder output has shape (1, 1500, 384)
    return interpreter.get_tensor(output_details[0]["index"])


# Example
# features = encode_audio("audio.wav")
```
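`set_tensor` raises a `ValueError` when the array's dtype or shape differs from the model input. A small guard can surface such mismatches with a clearer message before they reach the interpreter. This sketch assumes the input entry exposes the `"shape"` and `"dtype"` keys returned by `get_input_details()`; `prepare_input` is a hypothetical helper, not part of the LiteRT API.

```python
import numpy as np


def prepare_input(features: np.ndarray, detail: dict) -> np.ndarray:
    """Cast features to the input dtype and check they match the input shape.

    `detail` is one entry of `interpreter.get_input_details()`.
    """
    expected_shape = tuple(int(d) for d in detail["shape"])
    # Cast (e.g. float64 -> float32) and make the buffer C-contiguous
    array = np.ascontiguousarray(features, dtype=detail["dtype"])
    if array.shape != expected_shape:
        raise ValueError(
            f"model expects input shape {expected_shape}, got {array.shape}; "
            "Whisper features should be padded/truncated to 3000 frames"
        )
    return array
```

Usage: `interpreter.set_tensor(input_details[0]["index"], prepare_input(input_features, input_details[0]))`.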

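Note: This repository contains only the encoder. Full transcription also requires the Whisper decoder, which is not included here; the decoder attends to these encoder outputs and generates text tokens autoregressively.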

## Files

- `openai_whisper-tiny_encoder.tflite`: the LiteRT model file

## Conversion Details

- Conversion Tool: [ai-edge-torch](https://github.com/google-ai-edge/ai-edge-torch)
- Conversion Date: 2026-01-12
- Source Framework: PyTorch → LiteRT
- Validation: Cosine similarity 1.0000 vs original
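The validation line reports cosine similarity between the LiteRT output and the original PyTorch encoder output. The card does not state exactly how it was computed; the sketch below assumes both tensors are flattened into single vectors. It also reports the maximum absolute difference as a complementary check, since a cosine rounded to four decimals can hide small per-element deviations. `compare_outputs` is a hypothetical helper.

```python
import numpy as np


def compare_outputs(reference: np.ndarray, candidate: np.ndarray) -> dict:
    """Compare two encoder outputs of identical shape (e.g. PyTorch vs LiteRT)."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Flatten and upcast so the comparison itself adds little rounding error
    a = reference.astype(np.float64).ravel()
    b = candidate.astype(np.float64).ravel()
    cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    max_abs_diff = float(np.max(np.abs(a - b)))
    return {"cosine_similarity": cosine, "max_abs_diff": max_abs_diff}
```

One way to obtain a reference is to run the same `input_features` through `WhisperModel.from_pretrained("openai/whisper-tiny").encoder` in PyTorch and take `last_hidden_state` as a NumPy array.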

## Intended Use

- Mobile applications: on-device speech recognition (paired with a decoder) and audio feature extraction
- Edge devices: IoT, embedded systems, Raspberry Pi
- Offline processing: privacy-preserving inference
- Low-latency applications: real-time processing

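## Limitations

- Fixed input length: 3000 log-mel frames (30 s of audio); shorter clips are padded and longer recordings must be split into chunks
- CPU inference by default; using the GPU delegate requires additional setup
- The feature extractor (and, for decoding, the tokenizer) must be loaded separately from the original model
- Float32 precision (no quantization)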
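Because the input window is fixed at 30 s, longer recordings have to be encoded piece by piece. A minimal sketch, assuming non-overlapping 30 s chunks of 16 kHz mono audio with the final chunk zero-padded; `split_into_windows` is a hypothetical helper, and full transcription pipelines often use overlapping windows or timestamp-based seeking instead.

```python
import numpy as np

SAMPLE_RATE = 16_000
WINDOW_SAMPLES = 30 * SAMPLE_RATE  # 480000 samples = one encoder input


def split_into_windows(audio: np.ndarray, window: int = WINDOW_SAMPLES) -> list[np.ndarray]:
    """Split a mono waveform into non-overlapping windows, zero-padding the last one."""
    if audio.ndim != 1:
        raise ValueError("expected a 1-D mono waveform")
    chunks = []
    # max(..., 1) makes empty input yield one all-zero window
    for start in range(0, max(len(audio), 1), window):
        chunk = audio[start:start + window]
        if len(chunk) < window:
            chunk = np.pad(chunk, (0, window - len(chunk)))
        chunks.append(chunk)
    return chunks
```

Each window can then go through the feature extractor and encoder as in Quick Start. The processor would also pad a short final chunk; padding here just makes each window's length explicit.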

## License

This model inherits the license from the original:
- License: MIT ([source](https://huggingface.co/openai/whisper-tiny))

## Citation

```bibtex
@misc{radford2022whisper,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Alec Radford and Jong Wook Kim and others},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv},
}
```

## Acknowledgments

- Original model by [OpenAI](https://huggingface.co/openai)
- Conversion using [ai-edge-torch](https://github.com/google-ai-edge/ai-edge-torch)

---

Converted by [Bombek1](https://huggingface.co/Bombek1)