---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
  - code
  - coding-assistant
  - arc
  - arc-brains
  - hackathon
  - ppt
  - qwen2
  - lora
  - sft
language:
  - en
pipeline_tag: text-generation
library_name: transformers
---

🌟 Arc: Friendly Coding Expert

Created by Arc Brains: Ibrahim Shaikh, Harsh Goswami, Manas Tamore, Ayush Thakur

Arc is a LoRA adapter for Qwen2.5-Coder-7B-Instruct, fine-tuned on ~253K coding examples from three public instruction datasets. It is trained to deliver complete, production-ready solutions rather than partial patches.


🎯 What Arc Excels At

| Skill | Description |
|---|---|
| 💻 Complete Code | Full runnable solutions with imports, error handling, and documentation |
| 🏗️ Hackathon Projects | Entire apps from scratch: Flask, React, CLI tools |
| 📊 Presentations | Generate PowerPoint slides programmatically |
| 🐛 Debugging | Root-cause analysis with full fixes, not band-aids |
| 🌐 Multi-language | Python, JavaScript, C++, Java, Ruby, Go, Rust, and more |
| 📐 Architecture | Full project structure with all files |

🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model in bfloat16, then attach the Arc LoRA adapter on top of it
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base, "ibrahim2806/Arc")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")

# Chat: the system prompt sets Arc's persona
messages = [
    {"role": "system", "content": "You are Arc, a friendly coding expert by Arc Brains."},
    {"role": "user", "content": "Build a full Flask REST API for a todo app with CRUD, auth, and SQLite"}
]

# Render the conversation with the model's chat template and open an assistant turn
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=4096, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens (everything after the prompt)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
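For reference, here is a minimal sketch of the prompt string that `apply_chat_template(..., tokenize=False, add_generation_prompt=True)` builds for messages like the ones above. It assumes Qwen2.5's ChatML-style format for plain text turns without tools; `render_chatml` is a hypothetical helper, not part of transformers, and the template shipped with the tokenizer (which also inserts a default system prompt when none is supplied) remains the authority.

```python
# Hypothetical helper: approximates the string Qwen2.5's chat template returns
# for plain system/user/assistant messages (no tools). In practice, use
# tokenizer.apply_chat_template instead.


def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    parts = []
    for msg in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model's output is the reply itself
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Arc, a friendly coding expert by Arc Brains."},
    {"role": "user", "content": "Reverse a string in Python."},
]
prompt = render_chatml(messages)
print(prompt)
```

Because the prompt ends with an open assistant turn, every token the model generates belongs to its reply, which is why the Quick Start slices `output[0]` at `inputs["input_ids"].shape[1]` before decoding.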

📊 Training Details

| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-7B-Instruct |
| Method | QLoRA (4-bit NF4) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Learning Rate | 2e-4 (cosine decay) |
| Epochs | 3 |
| Effective Batch Size | 32 |
| Max Sequence Length | 4096 tokens |
| Total Training Samples | ~253,000 |
| Optimizer | AdamW |
| Precision | BF16 |
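As a rough worked example, the rank and target modules in the table above determine how many parameters the adapter trains. LoRA adds a pair of low-rank matrices A (r × d_in) and B (d_out × r) beside each frozen linear layer and scales their product by alpha / r. The sketch below assumes Qwen2.5-Coder-7B's published dimensions (hidden size 3584, intermediate size 18944, 28 layers, 4 key/value heads of size 128) and no trainable biases; verify these against the base model's `config.json`.

```python
# Estimate trainable LoRA parameters for the configuration in the table above.
# ASSUMPTION: Qwen2.5-Coder-7B dimensions below; check the model's config.json.

HIDDEN = 3584
INTERMEDIATE = 18944
LAYERS = 28
KV_DIM = 4 * 128  # num_key_value_heads * head_dim (grouped-query attention)

RANK = 64
ALPHA = 128

# (in_features, out_features) of each targeted linear layer in one decoder block
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    # A has shape (r, d_in) and B has shape (d_out, r); the base weight stays frozen
    return r * d_in + d_out * r


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGETS.values())
total = per_layer * LAYERS
scaling = ALPHA / RANK  # the low-rank update B @ A is multiplied by alpha / r

print(f"LoRA params per layer: {per_layer:,}")
print(f"LoRA params total:     {total:,}")
print(f"Scaling alpha / r:     {scaling}")
```

Under these assumptions the adapter trains about 161.5M parameters (roughly 2% of the base model), with an update scaling factor of 2.0.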

📚 Training Datasets

| Dataset | Samples | What It Teaches |
|---|---|---|
| m-a-p/Code-Feedback | 68K | Multi-turn debugging, project building, iterative refinement |
| Magicoder-Evol-Instruct-110K | 110K | Complex algorithmic & system design problems |
| Magicoder-OSS-Instruct-75K | 75K | Real-world code from open-source projects |
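The dataset sizes above, combined with the effective batch size, epoch count, and cosine schedule from the training details, give a rough sense of the run's length. This is a back-of-the-envelope sketch: it assumes no sequence packing, keeps the final partial batch, and ignores warmup, none of which the card specifies. The effective batch of 32 is per-device batch × gradient-accumulation steps × GPU count, whose individual values are not stated. `cosine_lr` is a hypothetical helper; Hugging Face's `get_cosine_schedule_with_warmup` produces the same curve with an optional warmup phase.

```python
import math

# Dataset sizes from the table above (rounded, as listed)
DATASETS = {
    "m-a-p/Code-Feedback": 68_000,
    "Magicoder-Evol-Instruct-110K": 110_000,
    "Magicoder-OSS-Instruct-75K": 75_000,
}
EFFECTIVE_BATCH = 32
EPOCHS = 3
PEAK_LR = 2e-4

# ASSUMPTIONS (not stated in the card): no packing, last partial batch kept,
# no warmup steps.
total_samples = sum(DATASETS.values())
steps_per_epoch = math.ceil(total_samples / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS


def cosine_lr(step: int, total: int, peak: float, floor: float = 0.0) -> float:
    """Cosine decay from `peak` at step 0 down to `floor` at step `total`."""
    progress = min(step, total) / total
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))


print(f"samples={total_samples:,} steps/epoch={steps_per_epoch:,} total={total_steps:,}")
for step in (0, total_steps // 2, total_steps):
    print(step, f"{cosine_lr(step, total_steps, PEAK_LR):.2e}")
```

Under these assumptions the run is about 7,907 optimizer steps per epoch (23,721 in total), with the learning rate falling from 2e-4 to roughly 1e-4 at the midpoint and to zero at the end.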

πŸ—οΈ Arc Brains Team

Built with ❀️ by Ibrahim Shaikh, Harsh Goswami, Manas Tamore, and Ayush Thakur.