	---
	license: apache-2.0
	base_model: Qwen/Qwen2.5-Coder-7B-Instruct
	tags:
	- code
	- coding-assistant
	- arc
	- arc-brains
	- hackathon
	- ppt
	- qwen2
	- lora
	- sft
	language:
	- en
	pipeline_tag: text-generation
	library_name: transformers
	---

	# 🌟 Arc — Friendly Coding Expert

	> Created by Arc Brains: Ibrahim Shaikh, Harsh Goswami, Manas Tamore, Ayush Thakur

	Arc is a coding assistant fine-tuned from Qwen2.5-Coder-7B-Instruct on roughly 253K coding examples drawn from three public instruction datasets. It is tuned to return complete, runnable solutions rather than partial patches.

	---

	## 🎯 What Arc Excels At

	| Skill | Description |
	|-------|-------------|
	| 💻 Complete Code | Full runnable solutions with imports, error handling, and documentation |
	| 🏗️ Hackathon Projects | Entire apps from scratch: Flask, React, CLI tools |
	| 📊 Presentations | Generate PowerPoint slides programmatically |
	| 🐛 Debugging | Root-cause analysis with full fixes, not band-aids |
	| 🌐 Multi-language | Python, JavaScript, C++, Java, Ruby, Go, Rust, and more |
	| 📐 Architecture | Full project structure with all files |

	---

	## 🚀 Quick Start

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel
	import torch

	# Load the base model, then attach the Arc LoRA adapter on top of it
	base = AutoModelForCausalLM.from_pretrained(
	    "Qwen/Qwen2.5-Coder-7B-Instruct",
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)
	model = PeftModel.from_pretrained(base, "ibrahim2806/Arc")
	tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")

	# Chat
	messages = [
	    {"role": "system", "content": "You are Arc, a friendly coding expert by Arc Brains."},
	    {"role": "user", "content": "Build a full Flask REST API for a todo app with CRUD, auth, and SQLite"},
	]

	# Render the conversation with the tokenizer's chat template and generate a reply
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt").to(model.device)
	output = model.generate(**inputs, max_new_tokens=4096, temperature=0.7, do_sample=True)

	# Decode only the newly generated tokens, skipping the prompt
	print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
	```
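
To make the `apply_chat_template` step less opaque, here is a rough pure-Python approximation of the ChatML-style prompt that Qwen2.5 tokenizers build for plain system and user messages. The helper name `render_chatml` is hypothetical and the example user message is made up for illustration. The tokenizer's bundled template remains authoritative, since it also handles tool calls and default system prompts that this sketch ignores.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate the ChatML layout used by Qwen2.5 chat templates.

    Hypothetical helper for illustration only; use
    tokenizer.apply_chat_template for real inference.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # An open assistant turn signals the model to start its answer.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Arc, a friendly coding expert by Arc Brains."},
    {"role": "user", "content": "Write a function that reverses a string."},
]
prompt = render_chatml(messages)
print(prompt)
```

Printing the rendered string next to the output of `tokenizer.apply_chat_template(..., tokenize=False)` is a quick way to check that the system prompt and the open assistant turn are where you expect them.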

	---

	## 📊 Training Details

	| Parameter | Value |
	|-----------|-------|
	| Base Model | Qwen/Qwen2.5-Coder-7B-Instruct |
	| Method | QLoRA (4-bit NF4) |
	| LoRA Rank | 64 |
	| LoRA Alpha | 128 |
	| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
	| Learning Rate | 2e-4 (cosine decay) |
	| Epochs | 3 |
	| Effective Batch Size | 32 |
	| Max Sequence Length | 4096 |
	| Total Training Samples | ~253,000 |
	| Optimizer | AdamW |
	| Precision | BF16 |
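
As a back-of-the-envelope reading of the table above, the sketch below derives the approximate number of optimizer steps, the LoRA scaling factor, and the shape of the cosine learning-rate schedule. It is not the team's training script. It assumes a final partial batch counts as one step, the conventional LoRA scaling of alpha divided by rank, no warmup, and decay all the way to zero, none of which the card states.

```python
import math

# Values copied from the training table.
TOTAL_SAMPLES = 253_000   # approximate, per the table
EFFECTIVE_BATCH = 32
EPOCHS = 3
PEAK_LR = 2e-4
LORA_RANK = 64
LORA_ALPHA = 128

# Assumption: a final partial batch still counts as one optimizer step.
steps_per_epoch = math.ceil(TOTAL_SAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS

# Conventional LoRA scaling: the adapter update is multiplied by alpha / rank.
lora_scale = LORA_ALPHA / LORA_RANK


def cosine_lr(step, total=total_steps, peak=PEAK_LR):
    """Cosine decay from peak to zero (assumes no warmup and no LR floor)."""
    progress = min(step, total) / total
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"LoRA scale (alpha / rank): {lora_scale}")
for step in (0, total_steps // 2, total_steps):
    print(f"step {step}: lr = {cosine_lr(step):.2e}")
```

If the actual run used warmup or a nonzero minimum learning rate, the early and late parts of the curve would differ from this sketch.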

	---

	## 📚 Training Datasets

	| Dataset | Samples | What It Teaches |
	|---------|---------|-----------------|
	| [m-a-p/Code-Feedback](https://huggingface.co/datasets/m-a-p/Code-Feedback) | 68K | Multi-turn debugging, project building, iterative refinement |
	| [Magicoder-Evol-Instruct-110K](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K) | 110K | Complex algorithmic and system design problems |
	| [Magicoder-OSS-Instruct-75K](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K) | 75K | Real-world code from open-source projects |
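
The sample counts above add up to the ~253K total quoted in the training details. The short sketch below checks that sum and prints each dataset's share of the mix. It assumes the rounded counts from the table and that the three sets were simply concatenated, with no deduplication or reweighting, which the card does not confirm.

```python
# Sample counts in thousands, copied from the table (rounded figures).
dataset_sizes_k = {
    "m-a-p/Code-Feedback": 68,
    "Magicoder-Evol-Instruct-110K": 110,
    "Magicoder-OSS-Instruct-75K": 75,
}

total_k = sum(dataset_sizes_k.values())

# Share of each dataset, assuming plain concatenation without reweighting.
shares = {name: count / total_k for name, count in dataset_sizes_k.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"total: {total_k}K samples")
```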

	---

	## 🏗️ Arc Brains Team

	Built with ❤️ by Ibrahim Shaikh, Harsh Goswami, Manas Tamore, and Ayush Thakur.