---
library_name: transformers
license: apache-2.0
license_link: LICENSE
pipeline_tag: image-text-to-text
base_model:
- Qwen/Qwen3.5-4B
tags:
- verus
- coding
- multimodal
- vision
- 262k-context
language:
- en
---

# Verus-4B

> [!Note]
> This repository contains the model weights and configuration files for **Verus-4B** in the Hugging Face Transformers format.
>
> The weights are compatible with Hugging Face Transformers, vLLM, SGLang, and other major inference frameworks.
>
> The primary intended use cases are **code generation**, **code review**, **debugging**, and **general coding assistance**.

## Verus-4B Highlights

- **Coding-First**: Fine-tuned specifically on high-quality coding datasets, so it handles everything from simple scripts to complex multi-file implementations.
- **Image + Text Input**: Accepts both images and text, so you can include UI screenshots, diagrams, or mockups alongside code questions.
- **262K Token Context Window**: Processes entire codebases, long specifications, or lengthy conversations in a single pass (up to 262,144 tokens).
- **Strong Instruction Following**: Stays focused, responds clearly, and redirects to the task at hand.
- **Efficient**: At ~4B parameters, the bfloat16 weights occupy roughly 8 GB, so the model fits on a single consumer GPU. Cards with only 8 GB of VRAM will likely need the 4-bit quantized setup described under Quickstart.

## Model Overview

| Property | Value |
|---|---|
| Parameters | ~4B |
| Context Length | **262,144 tokens** |
| Architecture | Qwen3.5 |
| Chat Format | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
| Dtype | bfloat16 |
| License | Apache 2.0 |

## Quickstart

### Installation

```bash
pip install "transformers>=4.52.0" accelerate torch
```

### Code Generation

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_ID = "8F-ai/Verus-4B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

messages = [
    {
        "role": "system",
        "content": "You are Verus, a coding assistant made by 8F-ai. You help with coding tasks and keep responses focused and clean."
    },
    {
        "role": "user",
        "content": "Write a Python async context manager that manages a PostgreSQL connection pool using asyncpg."
    }
]

# Render the conversation with the model's chat template (ChatML).
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.inference_mode():
    # temperature and top_p only apply when sampling is enabled,
    # so do_sample=True is set explicitly.
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=True,
        temperature=0.1,
        top_p=0.95,
    )

# Decode only the newly generated tokens, skipping the prompt.
output = tokenizer.decode(generated_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(output)
```

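
To make the ChatML chat format concrete, here is a minimal sketch of the text layout that the `apply_chat_template` call above is expected to produce for plain-text messages. The helper `to_chatml` is hypothetical and only illustrates the generic ChatML convention; the template bundled with the tokenizer is authoritative and may add details (such as a default system prompt) that this sketch omits.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render text-only role/content messages in the generic ChatML layout.

    Hypothetical helper for illustration; it is not part of transformers
    and does not handle image content.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "You are Verus, a coding assistant."},
        {"role": "user", "content": "Reverse a string in Python."},
    ]
    print(to_chatml(demo))
```

In practice, prefer `tokenizer.apply_chat_template`; a hand-written renderer like this is mainly useful for inspecting or debugging prompts.
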
### Image + Text Input

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

MODEL_ID = "8F-ai/Verus-4B"

# Image inputs need the processor (tokenizer + image preprocessor) rather than
# the bare tokenizer; otherwise the image is never loaded or passed to the model.
# Assumption: the repository ships a processor config and the checkpoint loads
# via AutoModelForImageTextToText, matching its `image-text-to-text` pipeline tag.
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/screenshot.png"},
            {"type": "text", "text": "Convert this UI screenshot into a React component using Tailwind CSS."}
        ]
    }
]

# tokenize=True with return_dict=True loads the image and returns the token ids
# together with the pixel inputs for the vision encoder.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    generated_ids = model.generate(
        **inputs, max_new_tokens=2048, do_sample=True, temperature=0.1, top_p=0.95
    )

# Decode only the newly generated tokens, skipping the prompt.
output = processor.batch_decode(
    generated_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(output)
```

### Quantized Inference (4-bit NF4, ~4 GB VRAM)

```python
# Requires the bitsandbytes package: pip install bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained("8F-ai/Verus-4B")
model = AutoModelForCausalLM.from_pretrained(
    "8F-ai/Verus-4B",
    quantization_config=quantization_config,
    device_map="auto",
)
```

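
As a rough sanity check on the memory figures quoted in this card, the sketch below estimates the size of the weights alone at different precisions. It assumes exactly 4 billion parameters (the card only says "~4B") and ignores everything else that occupies VRAM: activations, the KV cache, layers that stay unquantized, quantization metadata, and CUDA runtime overhead. Treat the results as lower bounds, not as measured usage.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: "~4B" is taken as exactly 4e9 parameters for illustration.
NUM_PARAMS = 4e9

if __name__ == "__main__":
    for label, bits in [("bfloat16", 16), ("int8", 8), ("4-bit", 4)]:
        print(f"{label:>8}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Under these assumptions, the bfloat16 weights come to about 7.5 GiB and the 4-bit weights to a little under 2 GiB. The gap between that and the ~4 GB quoted in the heading is plausibly the overhead listed above, which grows with context length.
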
## Intended Use Cases

| Use Case | Example |
|---|---|
| **Code Generation** | Write functions, classes, and scripts in any language |
| **Debugging** | Identify and fix bugs from error messages or code |
| **Code Review** | Suggest improvements, catch issues, explain code |
| **UI to Code** | Convert screenshots or diagrams into working code |
| **Long-Context Codebase Reasoning** | Reason over entire repositories of up to ~200K tokens |
| **General Q&A** | Answer programming questions clearly and concisely |

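
For long-context use, the prompt and the generated tokens share the same 262,144-token window. The sketch below shows one way to check a request against that budget before calling `generate`. The helper names are hypothetical, and the token count is assumed to come from the model's tokenizer (for example `len(tokenizer(text).input_ids)`).

```python
CONTEXT_WINDOW = 262_144  # context length stated in the Model Overview table


def max_prompt_tokens(max_new_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Largest prompt that still leaves room for max_new_tokens of output."""
    return max(window - max_new_tokens, 0)


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the requested output fits in the window."""
    return prompt_tokens + max_new_tokens <= window


if __name__ == "__main__":
    # A ~200K-token repository dump with a 2,048-token reply fits.
    print(fits_in_context(200_000, 2_048))
    # A prompt near the full window leaves no room for that reply.
    print(fits_in_context(262_000, 2_048))
```

Keeping repository prompts around ~200K tokens, as the table suggests, leaves headroom for the system prompt and the generated answer.
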
## Limitations

- **English-Primary**: Fine-tuning was conducted predominantly on English-language code and documentation.
- **Not for Math/Science**: Not optimized for mathematical proofs or scientific computation.

## Citation

```bibtex
@misc{verus4b2026,
  title        = {Verus-4B: A Coding-Focused Multimodal Language Model with 262K Context},
  author       = {8F-ai},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/8F-ai/Verus-4B}},
  note         = {Apache 2.0 License}
}
```

## License

Verus-4B is released under the **Apache License 2.0**. See [LICENSE](LICENSE) for full terms.

Derived from [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) (Apache 2.0).

---

<div align="center">
<sub>Built with ❤️ by the 8F-ai Team</sub>
</div>