---
library_name: transformers
pipeline_tag: text-generation
---

# wasm-32B-Instruct-V1

**wasm-32B-Instruct-V1** is a state-of-the-art instruction-tuned large language model developed by [wasmdashai](https://huggingface.co/wasmdashai). With 32 billion parameters, it is designed to deliver high-quality performance across a wide range of natural language processing and code-related tasks.

## 🚀 Introduction

`wasm-32B-Instruct-V1` is built for instruction-following and general-purpose reasoning. It uses a transformer architecture optimized for large-scale generation tasks, including:

* 🧠 Code generation and debugging
* 📚 Long-context understanding
* 🗣️ Multi-turn dialogue and reasoning
* 🔍 Privacy-conscious edge deployments (e.g., via WebAssembly)

The model is fine-tuned on diverse instruction datasets and optimized for both human alignment and computational efficiency.

## 🏗️ Model Details

* **Type**: Causal language model (decoder-only)
* **Parameters**: 32 billion
* **Training**: Pretraining + instruction fine-tuning
* **Architecture**: Transformer with:
  * Rotary Position Embeddings (RoPE)
  * SwiGLU activation
  * RMSNorm
  * Attention with QKV bias
* **Context Length**: Up to **32,768** tokens
* **Extended Context Option**: Via `rope_scaling` (supports up to 128K tokens with YaRN)
* **Format**: Hugging Face Transformers-compatible

## ⚙️ Requirements

To use this model, install a recent version of 🤗 `transformers` (>= 4.37.0 recommended):

```bash
pip install --upgrade transformers
```

## 🧪 Quickstart

Here is a minimal example that loads the model and generates a response:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "wasmdashai/wasm-32B-Instruct-V1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto"    # place weights across available devices automatically
)

prompt = "Explain the concept of recursion with Python code."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## 🧩 Processing Long Texts

This model supports **context lengths up to 32,768 tokens**. For longer inputs, you can enable **YaRN** scaling by modifying the model's `config.json` as follows (a programmatic equivalent is sketched in the appendix at the end of this card):

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

This is useful for documents, logs, or multi-step reasoning tasks that exceed the standard limit.

## 📦 Deployment Notes

We recommend `vLLM` for efficient deployment, especially with long inputs or real-time serving needs (see the vLLM sketch in the appendix at the end of this card). Please note:

* `vLLM` currently supports static YaRN only, i.e., a fixed scaling factor regardless of input length.
* Avoid applying RoPE scaling unless you actually need long-context processing, as it may degrade performance on short inputs.

## 📬 Contact

For support, feedback, or collaboration inquiries, please contact:

📧 **[modelasg@gmail.com](mailto:modelasg@gmail.com)**

---
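## 🛠️ Appendix: Enabling YaRN Programmatically

As an alternative to editing `config.json` on disk, the same `rope_scaling` settings can be attached at load time. This is a minimal sketch, assuming the architecture honors `rope_scaling` through 🤗 Transformers the same way the `config.json` edit above does; a factor of 4.0 over the native 32,768-token window corresponds to roughly 128K tokens.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "wasmdashai/wasm-32B-Instruct-V1"

# Attach the same rope_scaling settings shown in the config.json snippet above,
# without editing the file on disk.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,        # overrides the checkpoint's stored config
    torch_dtype="auto",
    device_map="auto",
)
```

As noted in the deployment section, keep the unscaled configuration for workloads that stay within 32,768 tokens.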
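## ⚡ Appendix: Serving with vLLM

The deployment notes above recommend `vLLM`; the following is a minimal offline-inference sketch, not an official serving recipe. The `tensor_parallel_size=4` value is a hypothetical four-GPU setup, and a 32B checkpoint has substantial memory requirements, so adjust both to your hardware.

```python
from vllm import LLM, SamplingParams

# Offline batch inference with vLLM; assumes `pip install vllm` and enough
# GPU memory for a 32B checkpoint.
llm = LLM(
    model="wasmdashai/wasm-32B-Instruct-V1",
    tensor_parallel_size=4,   # hypothetical 4-GPU setup; adjust as needed
    max_model_len=32768,      # the model's native context window
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Explain the concept of recursion with Python code."], params)
print(outputs[0].outputs[0].text)
```

For real-time serving, recent vLLM releases also ship an OpenAI-compatible server (e.g., `vllm serve wasmdashai/wasm-32B-Instruct-V1`); remember that vLLM supports static YaRN only, as noted above.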