---
library_name: transformers
model-index:
- name: Lance ASR
  results: []
tags:
- automatic-speech-recognition
- asr
- pytorch
- transformer
- lance-ai
license: apache-2.0
---

# Lance ASR – The Foundation of Speech Intelligence

πŸš€ **Lance ASR** is a custom-built Automatic Speech Recognition (ASR) model designed for high-efficiency local and cloud inference. It uses a Transformer encoder-decoder architecture with convolutional subsampling to process acoustic features.

## 🌟 Key Features

- βœ… **Custom Architecture**: Not a Whisper clone; features a bespoke Conv1d-subsampling audio front-end.
- βœ… **Hugging Face Compatible**: Fully integrates with `transformers` via `AutoModelForSeq2SeqLM`.
- βœ… **Optimized for Precision**: Uses `bfloat16` for high-performance inference and training.
- βœ… **Scalable Design**: 768 hidden dimensions and 4 layers, balancing speed and accuracy.
- βœ… **Seamless Tokenization**: Uses the `DWDMaiMai/tiktoken_cl100k_base` tokenizer for efficient text representation.

---

## πŸ“₯ Installation & Setup

Load Lance ASR directly from your local directory or the Hugging Face Hub:

```python
import torch
from transformers import AutoTokenizer, AutoFeatureExtractor, AutoModelForSeq2SeqLM

model_name = "NeuraCraft/Lance-ASR"
tokenizer = AutoTokenizer.from_pretrained(model_name)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
```

---

## πŸ›  Usage Example

Lance ASR can transcribe audio by processing log-mel spectrograms:

```python
# 1. Prepare audio features (audio_array: a 16 kHz mono waveform, e.g. loaded from a .wav file)
inputs = feature_extractor(audio_array, sampling_rate=16000, return_tensors="pt")

# 2. Generate transcription
model.eval()
with torch.no_grad():
    generated_ids = model.generate(
        inputs.input_features.to(torch.bfloat16),
        max_new_tokens=250,
        pad_token_id=tokenizer.eos_token_id,
    )

transcription = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Transcription: {transcription}")
```

---

## πŸ“Š Model Architecture

Lance ASR is built on a robust Transformer backbone:
- **Audio Front-end**: Dual `Conv1d` layers with GELU activation and stride-2 subsampling.
- **Encoder**: 4-layer `TransformerEncoder` with 12 attention heads.
- **Decoder**: 4-layer `TransformerDecoder` with cross-attention to encoder states.
- **Hidden Size**: 768
- **Vocab Size**: ~100k (Tiktoken)

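The components above can be sketched in PyTorch. This is a minimal illustration only, not the actual implementation: the class name, parameter names, and layer details are hypothetical, and only the headline numbers (768 hidden dims, 4 layers, 12 heads, dual stride-2 convolutions) come from this card.

```python
import torch
import torch.nn as nn

class LanceASRSketch(nn.Module):
    """Illustrative sketch of the architecture described above (names are hypothetical)."""

    def __init__(self, n_mels=80, d_model=768, n_heads=12, n_layers=4, vocab_size=100_000):
        super().__init__()
        # Audio front-end: two Conv1d layers with GELU and stride-2 subsampling (4x overall)
        self.frontend = nn.Sequential(
            nn.Conv1d(n_mels, d_model, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
        )
        # 4-layer encoder and 4-layer decoder with cross-attention, 12 heads each
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_layers, num_decoder_layers=n_layers,
            batch_first=True,
        )
        self.token_embedding = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, mel, token_ids):
        # mel: (batch, n_mels, time) -> subsampled encoder input (batch, time // 4, d_model)
        src = self.frontend(mel).transpose(1, 2)
        tgt = self.token_embedding(token_ids)
        # Causal mask so each decoder position only attends to earlier tokens
        causal_mask = nn.Transformer.generate_square_subsequent_mask(token_ids.size(1))
        out = self.transformer(src, tgt, tgt_mask=causal_mask)
        return self.lm_head(out)

# Tiny vocab just for this demo; the real tokenizer has ~100k entries
model = LanceASRSketch(vocab_size=1000)
logits = model(torch.randn(1, 80, 200), torch.randint(0, 1000, (1, 10)))
print(logits.shape)  # one logit vector per decoder position
```

Note how the two stride-2 convolutions cut the 200-frame spectrogram down to 50 encoder positions before any attention is computed, which is what keeps the encoder cheap on long audio.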
---

## πŸš€ Training

The model is trained on the `PolyAI/minds14` dataset (or custom datasets) via the Hugging Face `Trainer` API. The training script (`main.py`) supports `bf16` precision and automatic uploading to the Hugging Face Hub.

```bash
python main.py
```

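A training setup with those options might look like the following sketch. Every value here is a hypothetical placeholder for illustration; the real configuration lives in `main.py` and is not documented in this card.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical values for illustration only; see main.py for the actual configuration.
training_args = Seq2SeqTrainingArguments(
    output_dir="lance-asr-checkpoints",
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    num_train_epochs=10,
    bf16=True,                            # bfloat16 mixed-precision training
    push_to_hub=True,                     # automatic upload to the Hugging Face Hub
    hub_model_id="NeuraCraft/Lance-ASR",
    predict_with_generate=True,           # decode with generate() during evaluation
)
```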
---

## πŸ— Development & Contributions

Lance ASR is developed by **NeuraCraft**. We welcome contributions to improve the efficiency and accuracy of the model!

**Project Status**: 🚧 In Active Development
**Developer**: NeuraCraft