Lance VOX: Speech Recognition Evolved
Lance VOX is a custom-built speech recognition model designed to transcribe spoken audio into text efficiently. It is optimized for local and cloud inference with transformers and PyTorch.
Key Features
- Custom transformer-based ASR architecture
- Supports Hugging Face's transformers library
- Lightweight and optimized for fast transcription
- Fully trainable on custom datasets
- Designed to evolve with larger datasets and enhanced features
Installation & Setup
Install the dependencies first (pip install transformers torch librosa), then load Lance VOX using transformers:

    from transformers import AutoTokenizer, AutoFeatureExtractor, AutoModelForSeq2SeqLM
    import librosa

    # trust_remote_code=True lets transformers run the custom model code
    # shipped in the model repository
    model_name = "NeuraCraft/LanceVox"
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    feature_extractor = AutoFeatureExtractor.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True)

    # Load the audio as mono and resample it to 16 kHz
    audio_path = "your_audio_file.wav"
    speech_array, sr = librosa.load(audio_path, sr=16000)
    input_features = feature_extractor(speech_array, sampling_rate=sr, return_tensors="pt").input_features

    # Generate token ids and decode them into text
    generated_ids = model.generate(input_features)
    transcription = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(transcription)

How to Use Lance VOX
1. Direct Audio Transcription

    # Reuses model, tokenizer and feature_extractor from the setup snippet above
    audio_path = "sample.wav"
    speech_array, sr = librosa.load(audio_path, sr=16000)
    inputs = feature_extractor(speech_array, sampling_rate=sr, return_tensors="pt").input_features
    output_ids = model.generate(inputs)
    transcription = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
    print(transcription)

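When several recordings need transcribing, the same three calls (feature extraction, generation, decoding) can be wrapped in one helper. The following is a minimal sketch and not part of the Lance VOX repository: `transcribe_batch` is a hypothetical name, and it assumes the feature extractor accepts a list of waveforms and pads them to a common shape, which has not been confirmed for this model.

```python
def transcribe_batch(audio_arrays, feature_extractor, model, tokenizer, sampling_rate=16000):
    """Transcribe several already-loaded waveforms with one generate() call.

    Assumption: feature_extractor can take a list of 1-D waveforms and
    returns an object exposing `.input_features`, as in the snippet above.
    """
    features = feature_extractor(
        audio_arrays, sampling_rate=sampling_rate, return_tensors="pt"
    ).input_features
    generated_ids = model.generate(features)
    # batch_decode returns one string per input waveform
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```

With the objects from the setup snippet, a call might look like `transcribe_batch([speech_array], feature_extractor, model, tokenizer)`. If the extractor does not pad inputs itself, the waveforms would need to be padded or transcribed one at a time instead.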
2. Fine-tuning for Custom Datasets

    from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

    training_args = Seq2SeqTrainingArguments(
        output_dir="./lance_vox_finetuned",
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        num_train_epochs=3,
        save_steps=500,
    )

    # your_dataset and your_eval_dataset are placeholders for your own prepared datasets
    trainer = Seq2SeqTrainer(
        model=model,
        args=training_args,
        train_dataset=your_dataset,
        eval_dataset=your_eval_dataset,
        tokenizer=tokenizer,
    )

    trainer.train()

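The fine-tuning snippet leaves open how transcripts of different lengths are batched together. A common convention with transformers and PyTorch is to pad label sequences with -100, the value that PyTorch's cross-entropy loss ignores by default. The sketch below illustrates only that padding step in plain Python; `pad_labels` is a hypothetical helper, and whether Lance VOX expects labels in this form is an assumption.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pad_labels(label_sequences, ignore_index=IGNORE_INDEX):
    """Right-pad lists of token ids to the length of the longest one.

    Padded positions hold ignore_index so they do not contribute to the loss.
    """
    if not label_sequences:
        return []
    max_len = max(len(seq) for seq in label_sequences)
    return [
        list(seq) + [ignore_index] * (max_len - len(seq))
        for seq in label_sequences
    ]


# Two tokenized transcripts of different lengths (made-up token ids)
batch = pad_labels([[5, 8, 2], [7, 2]])
print(batch)  # [[5, 8, 2], [7, 2, -100]]
```

In a real training run this logic would normally live inside a data collator that also converts the padded lists to tensors and batches the audio features.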
Performance & Evaluation
Lance VOX is in active development. Early evaluations focus on:
- Word Error Rate (WER): the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript
- Real-time transcription latency
- Token prediction accuracy from audio
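The first two metrics can be sketched in a few lines of plain Python. This is an illustration of the standard definitions, not the project's evaluation code, which has not been published. `word_error_rate` and `real_time_factor` are hypothetical names, and expressing latency as a real-time factor (processing time divided by audio duration) is an assumption about how it would be reported.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Values below 1.0 mean transcription runs faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.3f}")  # 0.333
```

In practice, both strings are usually normalized (lowercased, punctuation removed) before scoring so that formatting differences are not counted as errors.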
Planned Enhancements
- Support for multi-language transcription
- Integration with real-time audio streams
- Larger, more diverse training datasets
- Multimodal input with text & speech
Future Roadmap
Lance VOX is designed to evolve alongside Lance AI.
Planned Features:
- Real-time transcription with minimal latency
- Continuous learning from user corrections
- Multilingual support
- Advanced integration with Lance AI for speech-to-text and command execution
Development & Contributions
Lance VOX is developed by NeuraCraft. Contributions, dataset suggestions, and feedback are welcome!
Contact & Updates:
- Developer: NeuraCraft
- Project Status: In Development