license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
  - code
  - coding-assistant
  - arc
  - arc-brains
  - hackathon
  - ppt
  - qwen2
  - lora
  - sft
language:
  - en
pipeline_tag: text-generation
library_name: transformers

🌟 Arc — Friendly Coding Expert

Created by Arc Brains: Ibrahim Shaikh, Harsh Goswami, Manas Tamore, Ayush Thakur

Arc is a coding assistant fine-tuned from Qwen2.5-Coder-7B-Instruct with QLoRA on about 253K coding examples. It is trained to return complete, runnable solutions rather than partial patches.

🎯 What Arc Excels At

| Skill | Description |
| --- | --- |
| 💻 Complete Code | Full runnable solutions with imports, error handling, and docs |
| 🏗️ Hackathon Projects | Entire apps from scratch: Flask, React, CLI tools |
| 📊 Presentations | Generate PowerPoint slides programmatically |
| 🐛 Debugging | Root-cause analysis with full fixes, not band-aids |
| 🌐 Multi-language | Python, JavaScript, C++, Java, Ruby, Go, Rust, and more |
| 📐 Architecture | Full project structure with all files |

🚀 Quick Start

```python
# Requires: transformers, peft, torch, and accelerate (needed for device_map="auto")
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model in BF16, then attach the Arc LoRA adapter on top of it
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base, "ibrahim2806/Arc")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")

# Chat: a system prompt plus one user request
messages = [
    {"role": "system", "content": "You are Arc, a friendly coding expert by Arc Brains."},
    {"role": "user", "content": "Build a full Flask REST API for a todo app with CRUD, auth, and SQLite"}
]

# Render the messages with the tokenizer's chat template and open an assistant turn
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=4096, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
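
To make the `apply_chat_template` step less opaque, the sketch below rebuilds the prompt string by hand. It assumes the tokenizer uses the ChatML layout (`<|im_start|>` / `<|im_end|>` markers) that Qwen2.5 models ship with. `render_chatml` is a hypothetical helper written for illustration only, the shortened user message is made up, and the tokenizer's own template remains the source of truth.

```python
# Illustrative only: a hand-rolled approximation of the ChatML prompt layout.
# Assumption: the tokenizer's chat template follows the Qwen2.5 ChatML format.

def render_chatml(messages, add_generation_prompt=True):
    """Join chat messages into a single ChatML-style prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Arc, a friendly coding expert by Arc Brains."},
    {"role": "user", "content": "Reverse a string in Python."},  # hypothetical request
]
prompt = render_chatml(messages)
print(prompt)
```

In real use, prefer `tokenizer.apply_chat_template`, which also covers cases this sketch ignores, such as a missing system message.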

📊 Training Details

| Parameter | Value |
| --- | --- |
| Base Model | Qwen/Qwen2.5-Coder-7B-Instruct |
| Method | QLoRA (4-bit NF4) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Learning Rate | 2e-4 (cosine decay) |
| Epochs | 3 |
| Effective Batch Size | 32 |
| Max Sequence Length | 4096 |
| Total Training Samples | ~253,000 |
| Optimizer | AdamW |
| Precision | BF16 |
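
The numbers in the table imply a few derived quantities, worked out in the rough sketch below. The layer shapes (`HIDDEN`, `INTERMEDIATE`, `KV_DIM`, `NUM_LAYERS`) are assumptions taken from the public Qwen2.5-7B config, not from this card. The step count assumes one optimizer step per effective batch with no sequence packing, and `cosine_lr` assumes decay to zero with no warmup. The actual training run may have differed on any of these points.

```python
import math

# Values taken from the training table.
LORA_RANK = 64
LORA_ALPHA = 128
PEAK_LR = 2e-4
EPOCHS = 3
EFFECTIVE_BATCH = 32
NUM_SAMPLES = 253_000

# Assumed Qwen2.5-7B shapes (from the public model config, not this card).
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 512  # 4 key/value heads x 128 head dim
NUM_LAYERS = 28

# (in_features, out_features) for each targeted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Standard LoRA scales the low-rank update by alpha / rank.
lora_scale = LORA_ALPHA / LORA_RANK

# Each adapted projection adds two matrices: A (rank x in) and B (out x rank).
params_per_layer = sum(LORA_RANK * (i + o) for i, o in TARGET_SHAPES.values())
trainable_params = params_per_layer * NUM_LAYERS

# Assumes one optimizer step per effective batch and no sequence packing.
steps_per_epoch = math.ceil(NUM_SAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS


def cosine_lr(step, total=total_steps, peak=PEAK_LR):
    """Cosine decay from the peak rate to zero; warmup is omitted (assumption)."""
    return 0.5 * peak * (1 + math.cos(math.pi * step / total))


print(f"LoRA scale: {lora_scale}")
print(f"Trainable LoRA parameters: {trainable_params:,}")
print(f"Optimizer steps: {total_steps:,}")
print(f"LR at midpoint: {cosine_lr(total_steps // 2):.2e}")
```

Under these assumptions the adapter holds about 161M trainable parameters and training runs for roughly 23.7K optimizer steps.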

📚 Training Datasets

| Dataset | Samples | What It Teaches |
| --- | --- | --- |
| m-a-p/Code-Feedback | 68K | Multi-turn debugging, project building, iterative refinement |
| Magicoder-Evol-Instruct-110K | 110K | Complex algorithmic & system design problems |
| Magicoder-OSS-Instruct-75K | 75K | Real-world code from open-source projects |
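
As a quick sanity check, the per-dataset counts can be summed and turned into mixing proportions. This short sketch assumes the three datasets were simply concatenated without up- or down-sampling, which the card does not state.

```python
# Sample counts in thousands, copied from the dataset table.
DATASET_SAMPLES_K = {
    "m-a-p/Code-Feedback": 68,
    "Magicoder-Evol-Instruct-110K": 110,
    "Magicoder-OSS-Instruct-75K": 75,
}

# Total size and each dataset's share of the mix (assumes plain concatenation).
total_k = sum(DATASET_SAMPLES_K.values())
mix = {name: count / total_k for name, count in DATASET_SAMPLES_K.items()}

print(f"Total: {total_k}K samples")
for name, share in mix.items():
    print(f"{name}: {share:.1%}")
```

The total matches the ~253,000 training samples listed under Training Details.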

🏗️ Arc Brains Team

Built with ❤️ by Ibrahim Shaikh, Harsh Goswami, Manas Tamore, and Ayush Thakur.