Lance VOX – Speech Recognition Evolved 🎀

🚀 Lance VOX is a custom-built speech recognition model designed to transcribe spoken audio into text efficiently. It is optimized for local and cloud inference with the Hugging Face transformers library and PyTorch.

🌟 Key Features

  • ✅ Custom transformer-based ASR architecture
  • ✅ Supports Hugging Face’s transformers library
  • ✅ Lightweight and optimized for fast transcription
  • ✅ Fully trainable on custom datasets
  • ✅ Designed to evolve with larger datasets and enhanced features

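📥 Installation & Setup

Install the dependencies (transformers, torch, and librosa), then load Lance VOX with transformers. Because the model uses a custom architecture, trust_remote_code=True is required: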

```python
from transformers import AutoTokenizer, AutoFeatureExtractor, AutoModelForSeq2SeqLM
import librosa

model_name = "NeuraCraft/LanceVox"

# trust_remote_code=True loads the custom Lance VOX classes shipped with the repository
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True)

# Load the audio, resampled to 16 kHz (librosa also downmixes to mono by default)
audio_path = "your_audio_file.wav"
speech_array, sr = librosa.load(audio_path, sr=16000)

# Convert the raw waveform into model input features
input_features = feature_extractor(speech_array, sampling_rate=sr, return_tensors="pt").input_features

# Generate token IDs and decode them into text
generated_ids = model.generate(input_features)
transcription = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```

🛠 How to Use Lance VOX

1️⃣ Direct Audio Transcription

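With the tokenizer, feature extractor, and model loaded as in the setup example, transcribing a file takes a few steps:

```python
audio_path = "sample.wav"

# Load the audio and resample it to 16 kHz
speech_array, sr = librosa.load(audio_path, sr=16000)
inputs = feature_extractor(speech_array, sampling_rate=sr, return_tensors="pt").input_features

# Generate token IDs and decode them into text
output_ids = model.generate(inputs)
transcription = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
print(transcription)
```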

2️⃣ Fine-tuning for Custom Datasets

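```python
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./lance_vox_finetuned",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    save_steps=500,  # write a checkpoint every 500 optimizer steps
)

# your_dataset and your_eval_dataset are placeholders: replace them with your own
# preprocessed datasets whose examples contain "input_features" and "labels"
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=your_dataset,
    eval_dataset=your_eval_dataset,
    tokenizer=tokenizer,
)

trainer.train()
```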
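Speech fine-tuning usually needs a collator that pads variable-length label sequences, and the trainer's default collator may not handle that correctly for this model. Below is a minimal sketch that assumes each example stores input_features as nested lists of identical shape (the datasets library stores columns as lists by default) and labels as a list of token IDs. This structure is an assumption and is not taken from the Lance VOX repository. Labels are right-padded with -100, the index that PyTorch's CrossEntropyLoss ignores by default. That way padding does not contribute to the loss, provided the custom model computes its loss that way. The names pad_labels, collate, and IGNORE_INDEX are hypothetical and are not part of Lance VOX or transformers.

```python
from typing import Any, Callable, Dict, List, Sequence

# Label value ignored by torch.nn.CrossEntropyLoss by default (ignore_index=-100)
IGNORE_INDEX = -100


def pad_labels(label_seqs: Sequence[Sequence[int]], pad_value: int = IGNORE_INDEX) -> List[List[int]]:
    """Right-pad each label sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in label_seqs)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in label_seqs]


def collate(
    features: List[Dict[str, Any]],
    to_tensor: Callable[[Any], Any] = lambda x: x,
) -> Dict[str, Any]:
    """Batch examples of the (hypothetical) form {"input_features": [[...]], "labels": [ids]}.

    Assumes every example's input_features already has the same shape, so only
    the labels are padded. Pass to_tensor=torch.tensor to get PyTorch tensors.
    """
    input_features = [example["input_features"] for example in features]
    labels = pad_labels([example["labels"] for example in features])
    return {"input_features": to_tensor(input_features), "labels": to_tensor(labels)}
```

In the trainer setup, this could be supplied as data_collator=lambda batch: collate(batch, to_tensor=torch.tensor). The conversion is injected as a parameter so the padding logic itself stays framework-agnostic.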


📊 Performance & Evaluation

Lance VOX is in active development. Early evaluations focus on:

🔹 Word Error Rate (WER) – Measures transcription accuracy

🔹 Real-time transcription latency

🔹 Token prediction accuracy from audio
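All three evaluation criteria can be computed in plain Python. The sketch below uses the standard definition WER = (S + D + I) / N. Here S, D, and I are the word substitutions, deletions, and insertions in a minimum-edit (Levenshtein) alignment, and N is the number of reference words. It also computes token accuracy, assuming predictions are aligned position-by-position with the reference, as in teacher-forced evaluation. Finally, it computes a real-time factor: processing time divided by audio duration, where values below 1 mean faster than real time. The function names are illustrative. This is not the project's own evaluation code, and no text normalization (casing, punctuation) is applied.

```python
import time
from typing import Callable, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions, and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete r
                curr[j - 1] + 1,     # insert h
                prev[j - 1] + cost,  # substitute r -> h (free if they match)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N over whitespace-separated words, without normalization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def token_accuracy(ref_ids: Sequence[int], pred_ids: Sequence[int]) -> float:
    """Fraction of reference positions whose position-aligned predicted token matches."""
    if not ref_ids:
        raise ValueError("reference must contain at least one token")
    matches = sum(r == p for r, p in zip(ref_ids, pred_ids))
    return matches / len(ref_ids)


def real_time_factor(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Wall-clock processing time divided by audio duration (< 1 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    transcribe()
    return (time.perf_counter() - start) / audio_seconds
```

For example, word_error_rate("the cat sat on the mat", "the cat sit on mat") counts one substitution and one deletion over six reference words, giving 2/6. With the objects from the setup example, real_time_factor(lambda: model.generate(input_features), len(speech_array) / sr) would time a single generation call on the loaded audio.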

✅ Planned Enhancements

🔹 Support for multi-language transcription

🔹 Integration with real-time audio streams

🔹 Larger, more diverse training datasets

🔹 Multimodal input with text & speech


🚀 Future Roadmap

Lance VOX is designed to evolve alongside Lance AI:

Planned Features:

🔜 Real-time transcription with minimal latency

🔜 Continuous learning from user corrections

🔜 Multilingual support

🔜 Advanced integration with Lance AI for speech-to-text and command execution


πŸ— Development & Contributions

Lance VOX is developed by NeuraCraft. Contributions, dataset suggestions, and feedback are welcome!

Contact & Updates: Developer: NeuraCraft Project Status: 🚧 In Development
