---
library_name: transformers
pipeline_tag: text-generation
---
# wasm-32B-Instruct-V1
**wasm-32B-Instruct-V1** is a state-of-the-art instruction-tuned large language model developed by [wasmdashai](https://huggingface.co/wasmdashai). With 32 billion parameters, it is designed to deliver high-quality performance across a wide range of natural language processing and code-related tasks.
## πŸš€ Introduction
`wasm-32B-Instruct-V1` is built for instruction-following and general-purpose reasoning. It uses a transformer architecture optimized for large-scale generation tasks, including:
* 🧠 Code generation and debugging
* πŸ“š Long-context understanding
* πŸ—£οΈ Multi-turn dialogue and reasoning
* πŸ” Privacy-conscious edge deployments (e.g., via WebAssembly)
This model is fine-tuned on diverse instruction datasets and optimized for both human alignment and computational efficiency.
## πŸ—οΈ Model Details
* **Type**: Causal Language Model (Decoder-only)
* **Parameters**: 32 Billion
* **Training**: Pretraining + Instruction Fine-tuning
* **Architecture**: Transformer with:
  * Rotary Position Embeddings (RoPE)
  * SwiGLU activation
  * RMSNorm
  * Attention with QKV bias
* **Context Length**: Up to **32,768** tokens
  * **Extended Context Option**: Via `rope_scaling` (supports up to 128K with YaRN)
* **Format**: Hugging Face Transformers-compatible
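These settings can also be checked programmatically; a minimal sketch, assuming the attribute names follow standard 🤗 Transformers config conventions:
```python
from transformers import AutoConfig

# Load only the configuration (no weights) to inspect the model's settings.
config = AutoConfig.from_pretrained("wasmdashai/wasm-32B-Instruct-V1")
print(config.max_position_embeddings)  # expected: 32768
print(config.rope_scaling)             # None unless YaRN extension is enabled
```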
## βš™οΈ Requirements
To use this model, install the latest version of πŸ€— `transformers` (>= 4.37.0 recommended):
```bash
pip install --upgrade transformers
```
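To confirm the installed version meets the recommended minimum, a quick check (using `packaging`, which is installed as a dependency of `transformers`):
```python
from packaging import version
import transformers

# Fail early if the installed transformers is older than the recommended minimum.
assert version.parse(transformers.__version__) >= version.parse("4.37.0"), \
    f"transformers {transformers.__version__} is too old; please upgrade."
```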
## πŸ§ͺ Quickstart
Here is a minimal example to load the model and generate a response:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "wasmdashai/wasm-32B-Instruct-V1"

# device_map="auto" spreads the weights across available devices;
# torch_dtype="auto" uses the dtype stored in the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "Explain the concept of recursion with Python code."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate up to 512 new tokens and decode the result back to text.
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
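Because this is an instruction-tuned model, prompts are typically wrapped in a chat format rather than passed as raw text. A minimal sketch, assuming the repository's tokenizer ships a chat template (defined in its `tokenizer_config.json`):
```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the concept of recursion with Python code."},
]

# apply_chat_template wraps the conversation in the model's expected special
# tokens and appends the marker that cues the assistant's reply.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```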
## 🧩 Processing Long Texts
This model supports **context lengths up to 32,768 tokens**. For even longer inputs, you can enable **YaRN** scaling by modifying the `config.json` as follows:
```json
{
"rope_scaling": {
"type": "yarn",
"factor": 4.0,
"original_max_position_embeddings": 32768
}
}
```
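If you prefer not to edit the file on disk, the same settings can be supplied at load time; a sketch, relying on 🤗 Transformers forwarding unrecognized `from_pretrained` keyword arguments to the model config:
```python
from transformers import AutoModelForCausalLM

# Override rope_scaling at load time instead of editing config.json.
model = AutoModelForCausalLM.from_pretrained(
    "wasmdashai/wasm-32B-Instruct-V1",
    torch_dtype="auto",
    device_map="auto",
    rope_scaling={
        "type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
)
```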
This is ideal for handling documents, logs, or multi-step reasoning tasks that exceed standard limits.
## πŸ“¦ Deployment Notes
We recommend using `vLLM` for efficient deployment, especially for long inputs or real-time serving. Please note:
* `vLLM` currently supports static YaRN only, i.e., a fixed scaling factor regardless of input length.
* Avoid enabling `rope_scaling` unless you need long-context processing, as it may degrade performance on short inputs. An illustrative serving sketch follows below.
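For illustration, offline batch inference with `vLLM` might look like the following sketch; `max_model_len` is an assumption you should set to match your actual context needs:
```python
from vllm import LLM, SamplingParams

# Load the model with a context window sized for your workload.
llm = LLM(model="wasmdashai/wasm-32B-Instruct-V1", max_model_len=32768)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Explain the concept of recursion with Python code."], params)
print(outputs[0].outputs[0].text)
```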
## πŸ“¬ Contact
For support, feedback, or collaboration inquiries, please contact:
πŸ“§ **[modelasg@gmail.com](mailto:modelasg@gmail.com)**
---