--- |
|
|
library_name: transformers |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
|
|
|
|
|
# wasm-32B-Instruct-V1 |
|
|
|
|
|
**wasm-32B-Instruct-V1** is a state-of-the-art instruction-tuned large language model developed by [wasmdashai](https://huggingface.co/wasmdashai). With 32 billion parameters, this model is designed to deliver high-quality performance across a wide range of natural language processing and code-related tasks.
|
|
## 🚀 Introduction
|
|
|
|
|
`wasm-32B-Instruct-V1` is built for instruction-following tasks and general-purpose reasoning. Its transformer architecture is optimized for large-scale generation tasks, including:
|
|
|
|
|
* 🧠 Code generation and debugging
* 📜 Long-context understanding
* 🗣️ Multi-turn dialogue and reasoning
* 🔒 Privacy-conscious edge deployments (e.g., via WebAssembly)
|
|
|
|
|
This model is fine-tuned on diverse instruction datasets and optimized for both human alignment and computational efficiency. |
|
|
|
|
|
## 🏗️ Model Details
|
|
|
|
|
* **Type**: Causal Language Model (Decoder-only) |
|
|
* **Parameters**: 32 Billion |
|
|
* **Training**: Pretraining + Instruction Fine-tuning |
|
|
* **Architecture**: Transformer with:
  * Rotary Position Embeddings (RoPE)
  * SwiGLU activation
  * RMSNorm
  * Attention with QKV bias
|
|
* **Context Length**: Up to **32,768** tokens |
|
|
* **Extended Context Option**: Via `rope_scaling` (supports up to 128K tokens with YaRN)
|
|
* **Format**: Hugging Face Transformers-compatible |
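
These values can be verified against the published checkpoint without downloading the 32B weights; a minimal sketch using `AutoConfig` (the field names assume a standard Transformers config and are worth double-checking against the actual `config.json`):

```python
from transformers import AutoConfig

# Fetches only config.json, not the model weights.
config = AutoConfig.from_pretrained("wasmdashai/wasm-32B-Instruct-V1")

print(config.max_position_embeddings)  # context length, expected 32768
print(config.rope_scaling)             # None until YaRN scaling is enabled
```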
|
|
|
|
|
## ⚙️ Requirements
|
|
|
|
|
To use this model, install the latest version of 🤗 `transformers` (>= 4.37.0 recommended):
|
|
|
|
|
```bash |
|
|
pip install --upgrade transformers |
|
|
``` |
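
To confirm the installed version meets the recommendation:

```python
import transformers

# The examples below assume transformers >= 4.37.0.
print(transformers.__version__)
```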
|
|
|
|
|
## 🧪 Quickstart
|
|
|
|
|
Here is a minimal example to load the model and generate a response: |
|
|
|
|
|
```python |
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "wasmdashai/wasm-32B-Instruct-V1"

# device_map="auto" places the weights across available GPUs;
# torch_dtype="auto" uses the dtype stored in the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "Explain the concept of recursion with Python code."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the echoed prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True
)

print(response)
|
|
``` |
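
For instruction-style use, applying the tokenizer's chat template typically matches the formatting the model was fine-tuned on better than a raw prompt. A minimal sketch, assuming the checkpoint bundles a chat template and reusing `tokenizer` and `model` from above:

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the concept of recursion with Python code."},
]

# add_generation_prompt appends the marker that cues the assistant's turn.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```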
|
|
|
|
|
## 🧩 Processing Long Texts
|
|
|
|
|
This model supports **context lengths up to 32,768 tokens**. For even longer inputs, you can enable **YaRN** scaling by modifying the `config.json` as follows: |
|
|
|
|
|
```json |
|
|
{ |
|
|
"rope_scaling": { |
|
|
"type": "yarn", |
|
|
"factor": 4.0, |
|
|
"original_max_position_embeddings": 32768 |
|
|
} |
|
|
} |
|
|
``` |
|
|
|
|
|
This is ideal for handling documents, logs, or multi-step reasoning tasks that exceed standard limits. |
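
If editing `config.json` on disk is inconvenient, the same override can be passed at load time; a minimal sketch, relying on `from_pretrained` forwarding config-matching keyword arguments to the model config:

```python
from transformers import AutoModelForCausalLM

# Keyword arguments that match config fields override the values stored in
# config.json, so YaRN can be enabled without touching the file on disk.
model = AutoModelForCausalLM.from_pretrained(
    "wasmdashai/wasm-32B-Instruct-V1",
    torch_dtype="auto",
    device_map="auto",
    rope_scaling={
        "type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
)
```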
|
|
|
|
|
## 📦 Deployment Notes
|
|
|
|
|
We recommend using `vLLM` for efficient deployment, especially with large input lengths or real-time serving needs. Please note: |
|
|
|
|
|
* `vLLM` currently supports **static YaRN** only, i.e., a fixed scaling factor applied regardless of input length.
|
|
* Avoid enabling rope scaling unless your workload actually requires long contexts, since a fixed scaling factor can degrade performance on shorter inputs.
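
For reference, a minimal offline-inference sketch with vLLM's Python API (class and argument names reflect recent vLLM releases; adjust to your installed version and hardware):

```python
from vllm import LLM, SamplingParams

# max_model_len caps the context window; raise it only with YaRN configured.
llm = LLM(model="wasmdashai/wasm-32B-Instruct-V1", max_model_len=32768)

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)
outputs = llm.generate(["Explain the concept of recursion."], params)

print(outputs[0].outputs[0].text)
```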
|
|
|
|
|
## 💬 Contact
|
|
|
|
|
For support, feedback, or collaboration inquiries, please contact: |
|
|
|
|
|
📧 **[modelasg@gmail.com](mailto:modelasg@gmail.com)**
|
|
|
|
|
--- |
|
|
|
|
|
|