--- |
|
|
library_name: transformers |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
|
|
|
|
|
# wasm-32B-Instruct-V1 |
|
|
|
|
|
**wasm-32B-Instruct-V1** is a state-of-the-art instruction-tuned large language model developed by [wasmdashai](https://huggingface.co/wasmdashai). With 32 billion parameters, this model is designed to deliver high-quality performance across a wide range of natural language processing and code-related tasks.
|
|
## 🚀 Introduction
|
|
|
|
|
`wasm-32B-Instruct-V1` is built for instruction-following tasks and general-purpose reasoning. Its transformer architecture is optimized for large-scale generation tasks, including:
|
|
|
|
|
* 🧠 Code generation and debugging
* 📜 Long-context understanding
* 🗣️ Multi-turn dialogue and reasoning
* 🔒 Privacy-conscious edge deployments (e.g., via WebAssembly)
|
|
|
|
|
This model is fine-tuned on diverse instruction datasets and optimized for both human alignment and computational efficiency. |
|
|
|
|
|
## 🏗️ Model Details
|
|
|
|
|
* **Type**: Causal Language Model (Decoder-only) |
|
|
* **Parameters**: 32 Billion |
|
|
* **Training**: Pretraining + Instruction Fine-tuning |
|
|
* **Architecture**: Transformer with:
  * Rotary Position Embeddings (RoPE)
  * SwiGLU activation
  * RMSNorm
  * Attention with QKV bias
|
|
* **Context Length**: Up to **32,768** tokens |
|
|
* **Extended Context Option**: Via `rope_scaling` (supports up to 128K tokens with YaRN)
|
|
* **Format**: Hugging Face Transformers-compatible |
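
These values can be verified against the published checkpoint without downloading the 32B weights; a minimal sketch using `AutoConfig` (the field names assume a standard Transformers config and are worth double-checking against the actual `config.json`):

```python
from transformers import AutoConfig

# Fetches only config.json, not the model weights.
config = AutoConfig.from_pretrained("wasmdashai/wasm-32B-Instruct-V1")

print(config.max_position_embeddings)  # context length, expected 32768
print(config.rope_scaling)             # None until YaRN scaling is enabled
```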
|
|
|
|
|
## ⚙️ Requirements
|
|
|
|
|
To use this model, install the latest version of 🤗 `transformers` (>= 4.37.0 recommended):
|
|
|
|
|
```bash |
|
|
pip install --upgrade transformers |
|
|
``` |
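
To confirm the installed version meets the recommendation:

```python
import transformers

# The examples below assume transformers >= 4.37.0.
print(transformers.__version__)
```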
|
|
|
|
|
## 🧪 Quickstart
|
|
|
|
|
Here is a minimal example to load the model and generate a response: |
|
|
|
|
|
```python |
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "wasmdashai/wasm-32B-Instruct-V1"

# device_map="auto" places the weights across available GPUs;
# torch_dtype="auto" uses the dtype stored in the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "Explain the concept of recursion with Python code."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the echoed prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True
)

print(response)
|
|
``` |
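
For instruction-style use, applying the tokenizer's chat template typically matches the formatting the model was fine-tuned on better than a raw prompt. A minimal sketch, assuming the checkpoint bundles a chat template and reusing `tokenizer` and `model` from above:

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the concept of recursion with Python code."},
]

# add_generation_prompt appends the marker that cues the assistant's turn.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```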
|
|
|
|
|
## 🧩 Processing Long Texts
|
|
|
|
|
This model supports **context lengths up to 32,768 tokens**. For even longer inputs, you can enable **YaRN** scaling by modifying the `config.json` as follows: |
|
|
|
|
|
```json |
|
|
{ |
|
|
"rope_scaling": { |
|
|
"type": "yarn", |
|
|
"factor": 4.0, |
|
|
"original_max_position_embeddings": 32768 |
|
|
} |
|
|
} |
|
|
``` |
|
|
|
|
|
This is ideal for handling documents, logs, or multi-step reasoning tasks that exceed standard limits. |
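
If editing `config.json` on disk is inconvenient, the same override can be passed at load time; a minimal sketch, relying on `from_pretrained` forwarding config-matching keyword arguments to the model config:

```python
from transformers import AutoModelForCausalLM

# Keyword arguments that match config fields override the values stored in
# config.json, so YaRN can be enabled without touching the file on disk.
model = AutoModelForCausalLM.from_pretrained(
    "wasmdashai/wasm-32B-Instruct-V1",
    torch_dtype="auto",
    device_map="auto",
    rope_scaling={
        "type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
)
```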
|
|
|
|
|
## 📦 Deployment Notes
|
|
|
|
|
We recommend using `vLLM` for efficient deployment, especially with large input lengths or real-time serving needs. Please note: |
|
|
|
|
|
* `vLLM` currently supports **static YaRN** only, i.e., a fixed scaling factor applied regardless of input length.
|
|
* Avoid enabling rope scaling unless your workload actually requires long contexts, since a fixed scaling factor can degrade performance on shorter inputs.
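
For reference, a minimal offline-inference sketch with vLLM's Python API (class and argument names reflect recent vLLM releases; adjust to your installed version and hardware):

```python
from vllm import LLM, SamplingParams

# max_model_len caps the context window; raise it only with YaRN configured.
llm = LLM(model="wasmdashai/wasm-32B-Instruct-V1", max_model_len=32768)

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)
outputs = llm.generate(["Explain the concept of recursion."], params)

print(outputs[0].outputs[0].text)
```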
|
|
|
|
|
## 💬 Contact
|
|
|
|
|
For support, feedback, or collaboration inquiries, please contact: |
|
|
|
|
|
📧 **[modelasg@gmail.com](mailto:modelasg@gmail.com)**
|
|
|
|
|
--- |
|
|
|
|
|
|