---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- code
- coding-assistant
- arc
- arc-brains
- hackathon
- ppt
- qwen2
- lora
- sft
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# 🌟 Arc: Friendly Coding Expert

> **Created by Arc Brains: Ibrahim Shaikh, Harsh Goswami, Manas Tamore, Ayush Thakur**

Arc is a powerful coding assistant fine-tuned from **Qwen2.5-Coder-7B-Instruct** on **253K+ high-quality coding examples**. It delivers **complete, production-ready solutions**, never patches.

---

## 🎯 What Arc Excels At

| Skill | Description |
|-------|-------------|
| 💻 **Complete Code** | Full runnable solutions with imports, error handling, docs |
| 🏗️ **Hackathon Projects** | Entire apps from scratch: Flask, React, CLI tools |
| 📊 **Presentations** | Generate PowerPoint slides programmatically |
| 🐛 **Debugging** | Root-cause analysis with full fixes, not band-aids |
| 🌐 **Multi-language** | Python, JavaScript, C++, Java, Ruby, Go, Rust, and more |
| 📐 **Architecture** | Full project structure with all files |

---

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base, "ibrahim2806/Arc")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")

# Chat
messages = [
    {"role": "system", "content": "You are Arc, a friendly coding expert by Arc Brains."},
    {"role": "user", "content": "Build a full Flask REST API for a todo app with CRUD, auth, and SQLite"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=4096, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

---

## 📊 Training Details

| Parameter | Value |
|-----------|-------|
| **Base Model** | Qwen/Qwen2.5-Coder-7B-Instruct |
| **Method** | QLoRA (4-bit NF4) |
| **LoRA Rank** | 64 |
| **LoRA Alpha** | 128 |
| **Target Modules** | q/k/v/o_proj, gate/up/down_proj |
| **Learning Rate** | 2e-4 (cosine decay) |
| **Epochs** | 3 |
| **Effective Batch Size** | 32 |
| **Max Sequence Length** | 4096 |
| **Total Training Samples** | ~253,000 |
| **Optimizer** | AdamW |
| **Precision** | BF16 |

---

## 📚 Training Datasets

| Dataset | Samples | What It Teaches |
|---------|---------|-----------------|
| [m-a-p/Code-Feedback](https://huggingface.co/datasets/m-a-p/Code-Feedback) | 68K | Multi-turn debugging, project building, iterative refinement |
| [Magicoder-Evol-Instruct-110K](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K) | 110K | Complex algorithmic & system design problems |
| [Magicoder-OSS-Instruct-75K](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K) | 75K | Real-world code from open-source projects |

---

## 🏗️ Arc Brains Team

Built with ❤️ by **Ibrahim Shaikh**, **Harsh Goswami**, **Manas Tamore**, and **Ayush Thakur**.
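
## 🧮 Training Schedule Sketch

The figures in the Training Details table above (about 253,000 samples, effective batch size 32, 3 epochs, peak learning rate 2e-4 with cosine decay) imply a rough number of optimizer steps and a learning-rate curve. The snippet below is a minimal sketch of that arithmetic, not the actual training script. The helper names `total_optimizer_steps` and `cosine_lr` are hypothetical, and it assumes no warmup, a final learning rate of 0, and one sample per sequence (no packing or filtering), none of which the table states.

```python
import math

# Values copied from the Training Details table.
TOTAL_SAMPLES = 253_000        # approximate, per the table
EFFECTIVE_BATCH_SIZE = 32
EPOCHS = 3
PEAK_LR = 2e-4


def total_optimizer_steps(samples: int, batch_size: int, epochs: int) -> int:
    """Optimizer updates across all epochs, rounding each epoch up to whole batches."""
    steps_per_epoch = math.ceil(samples / batch_size)
    return steps_per_epoch * epochs


def cosine_lr(step: int, total_steps: int, peak_lr: float, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr to min_lr. No warmup phase (an assumption)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


total_steps = total_optimizer_steps(TOTAL_SAMPLES, EFFECTIVE_BATCH_SIZE, EPOCHS)
print(f"estimated optimizer steps: {total_steps}")

# Sample the schedule at the start, middle, and end of training.
for step in (0, total_steps // 2, total_steps):
    print(f"step {step:>6}: lr = {cosine_lr(step, total_steps, PEAK_LR):.2e}")
```

With these inputs the estimate comes to 23,721 optimizer steps. The real run may differ if sequences were packed, truncated at the 4096-token limit, or filtered, or if a warmup phase was used.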