NeuraCraft/Lance-ASR

NeuraCraft committed 10 days ago

Commit 35d23e6 · 1 Parent(s): e90f847

Create README.md

Files changed (1)

README.md +94 -0

README.md ADDED

	@@ -0,0 +1,94 @@

+---
+library_name: transformers
+model-index:
+- name: Lance ASR
+  results: []
+tags:
+- automatic-speech-recognition
+- asr
+- pytorch
+- transformer
+- lance-ai
+license: apache-2.0
+---
+# Lance ASR – The Foundation of Speech Intelligence
+🚀 **Lance ASR** is a custom-built Automatic Speech Recognition (ASR) model designed for efficient local and cloud inference. It uses a Transformer encoder-decoder architecture with convolutional subsampling of the input acoustic features.
+## 🌟 Key Features
+✅ **Custom Architecture**: Not a Whisper clone; features a bespoke Conv1d-subsampling audio front-end.
+✅ **Hugging Face Compatible**: Fully integrates with `transformers` via `AutoModelForSeq2SeqLM`.
+✅ **Reduced-Precision Compute**: Uses `bfloat16` to lower memory use and speed up inference and training.
+✅ **Scalable Design**: Configured with a hidden size of 768 and 4 layers each in the encoder and decoder, balancing speed and accuracy.
+✅ **Seamless Tokenization**: Uses the `DWDMaiMai/tiktoken_cl100k_base` tokenizer for efficient text representation.
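To make the `bfloat16` trade-off concrete, here is a minimal pure-Python sketch of rounding a float32 value to bfloat16, which keeps the 8 exponent bits of float32 but only 7 explicit mantissa bits. The helper name `to_bfloat16` is hypothetical and is not part of Lance ASR or `transformers`; it only illustrates the number format and is meant for finite inputs that fit in float32.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (returned as a Python float)."""
    # Reinterpret the value as its 32-bit IEEE 754 float32 bit pattern.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # bfloat16 keeps the top 16 bits of float32.
    # Round to nearest, ties to even, on the 16 bits being dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]


print(to_bfloat16(1.0))         # exactly representable: 1.0
print(to_bfloat16(3.14159265))  # rounds to 3.140625
```

Each parameter stored this way takes 2 bytes instead of the 4 bytes of float32, at the cost of keeping only about 2 to 3 significant decimal digits.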
+---
+## 📥 Installation & Setup
+Load Lance ASR directly from your local directory or the Hugging Face Hub:
+```python
+import torch
+from transformers import AutoTokenizer, AutoFeatureExtractor, AutoModelForSeq2SeqLM
+model_name = "NeuraCraft/Lance-ASR"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
+model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
+```
+---
+## 🛠 Usage Example
+Lance ASR can transcribe audio by processing log-mel spectrograms:
+```python
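+import numpy as np
+# Assumes `model`, `tokenizer`, and `feature_extractor` from the setup block above.
+# 1. Prepare audio features. `audio_array` must be a 1-D float waveform sampled at 16 kHz.
+#    Placeholder: one second of silence; replace it with audio loaded from your own file.
+audio_array = np.zeros(16000, dtype=np.float32)
+inputs = feature_extractor(audio_array, sampling_rate=16000, return_tensors="pt")
+# 2. Generate transcription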
+model.eval()
+with torch.no_grad():
+    generated_ids = model.generate(
+        inputs.input_features.to(torch.bfloat16),
+        max_new_tokens=250,
+        pad_token_id=tokenizer.eos_token_id
+    )
+transcription = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
+print(f"Transcription: {transcription}")
+```
+---
+## 📊 Model Architecture
+Lance ASR uses a Transformer encoder-decoder backbone:
+- **Audio Front-end**: Dual `Conv1d` layers with GELU activation and stride-2 subsampling.
+- **Encoder**: 4-layer `TransformerEncoder` with 12 attention heads.
+- **Decoder**: 4-layer `TransformerDecoder` with cross-attention to encoder states.
+- **Hidden Size**: 768
+- **Vocab Size**: ~100k (Tiktoken)
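The effect of the convolutional front-end on sequence length can be estimated with the standard `Conv1d` output-length formula. The sketch below is an illustration under assumptions: the README does not state the kernel size, the padding, or whether both layers use stride 2, so `kernel_size=3`, `padding=1`, and `strides=(2, 2)` are hypothetical defaults, and the helper names are made up for this example.

```python
HIDDEN_SIZE = 768  # from the model card
NUM_HEADS = 12     # from the model card
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # per-head dimension: 64


def conv1d_out_len(length, kernel_size, stride, padding=0, dilation=1):
    """Output length of a 1-D convolution (same formula PyTorch documents for Conv1d)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def subsampled_length(num_frames, strides=(2, 2), kernel_size=3, padding=1):
    """Frames left after a stack of Conv1d layers (assumed kernel size and padding)."""
    for stride in strides:
        num_frames = conv1d_out_len(num_frames, kernel_size, stride, padding)
    return num_frames


# 3000 input frames with two stride-2 layers leave a quarter as many encoder positions.
print(subsampled_length(3000))                  # 750
# If only the second layer used stride 2, the reduction would be 2x instead.
print(subsampled_length(3000, strides=(1, 2)))  # 1500
print(HEAD_DIM)                                 # 64
```

A shorter encoder sequence matters because self-attention cost grows quadratically with sequence length, so halving the number of positions cuts attention compute to roughly a quarter.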
+---
+## 🚀 Training
+The model is trained on the `PolyAI/minds14` dataset (or a custom dataset) with the Hugging Face `Trainer` API. The training script (`main.py`) supports `bf16` training and automatic upload to the Hugging Face Hub.
+```bash
+python main.py
+```
+---
+## 🏗 Development & Contributions
+Lance ASR is developed by **NeuraCraft**. We welcome contributions to improve the efficiency and accuracy of the model!
+**Project Status**: 🚧 In Active Development
+**Developer**: NeuraCraft