---
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- base_model:adapter:Qwen/Qwen2.5-Coder-7B-Instruct
- lora
- sft
- transformers
- trl
- jac
- jaclang
- code-generation
- data-spatial-programming
---

# Jac Coder 7B LoRA

A LoRA adapter fine-tuned on the [Jac programming language](https://www.jac-lang.org/) for code generation, built on top of [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct).

Jac is a data-spatial programming language featuring walkers, nodes, edges, and graph-based computation. This adapter teaches the base model to generate idiomatic Jac backend code, including node/edge definitions, walker APIs, graph traversals, and ability implementations.

## Example Outputs

**Prompt:** "Write a Jac node for a User with name, email, and age fields"
```jac
node User {
    has name: str;
    has email: str;
    has age: int = 0;
}
```

**Prompt:** "Write a Jac walker for a REST API endpoint that creates a new todo item"
```jac
node Todo {
    has title: str;
    has done: bool = False;
}

walker CreateTodo {
    has title: str;

    can create with Root entry {
        here ++> Todo(title=self.title);
        report [-->];
    }
}
```

## Model Details

- **Base model:** Qwen/Qwen2.5-Coder-7B-Instruct
- **Adapter type:** LoRA (rank 64, alpha 128)
- **Trainable params:** 161,480,704 / 7,777,097,216 (2.08%)
- **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Developed by:** [farhan98ahzan](https://huggingface.co/farhan98ahzan)
- **License:** Apache 2.0
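
The trainable-parameter figure above can be reproduced by hand. LoRA on a linear layer with input width `d_in` and output width `d_out` adds `r * (d_in + d_out)` weights. The sketch below assumes the publicly listed Qwen2.5-7B shapes (hidden size 3584, MLP width 18944, 28 decoder layers, 4 key/value heads of dimension 128). Those dimensions are not stated in this card, so treat them as assumptions.

```python
# Assumed Qwen2.5-7B dimensions (not stated in this card).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads x head dimension 128

RANK = 64  # LoRA rank from this card

# (in_features, out_features) of each targeted projection in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, shapes, num_layers):
    """Count LoRA weights: each adapted layer adds A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(RANK, TARGET_SHAPES, NUM_LAYERS)
total = 7_777_097_216  # total reported in this card (base model plus adapter)
print(f"{trainable:,} trainable ({100 * trainable / total:.2f}%)")
```

Under these assumptions the script prints `161,480,704 trainable (2.08%)`, matching the figures listed above.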

## How to Use

### With PEFT (recommended)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-Coder-7B-Instruct"
ADAPTER = "farhan98ahzan/jac-coder-7b-lora"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)

# Load base model in 4-bit (for low VRAM)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Apply LoRA adapter
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

# Generate
messages = [
    {"role": "system", "content": "You are an expert Jac programming language assistant."},
    {"role": "user", "content": "Write a Jac walker that lists all users"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.9, do_sample=True)

generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

### Merging the adapter (for full model export)

To merge LoRA weights into the base model, load the base model in **bf16 (not 4-bit)** to avoid rounding errors:

```python
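import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-Coder-7B-Instruct"
ADAPTER = "farhan98ahzan/jac-coder-7b-lora"

# Load the base model unquantized in bf16 so the merge runs on full-precision weights
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the adapter, then fold the LoRA deltas into the base weights
model = PeftModel.from_pretrained(base, ADAPTER)
merged = model.merge_and_unload()
merged.save_pretrained("jac-coder-7b-merged")

# Save the tokenizer alongside the weights so the export loads on its own
AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained("jac-coder-7b-merged")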
```

> **Warning:** Do not merge into a 4-bit quantized base model -- this produces corrupted weights and gibberish output.

## Training Details

### Training Data

The adapter was trained on 3,200 curated Jac code samples sourced from:

| Source | Description |
|---|---|
| jaseci/jaseci | Core Jac compiler repo -- examples, tests, reference implementations |
| BeaconLens | Full-stack Jac application (review analysis platform) |
| jac-visual-builder | Visual graph schema builder in Jac |
| Jac documentation | 936 code examples extracted from official docs |

All source files were validated with `jac check --parse_only` for syntactic correctness. Only backend Jac code was included (frontend/UI files filtered out).
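
The filtering described above could be scripted roughly as follows. This is a minimal sketch, not the pipeline actually used: the helper names are hypothetical, the argument order passed to `jac check --parse_only` and the meaning of its exit status are assumptions, and the `.cl.jac` suffix for frontend files is taken from the Limitations section.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List


def jac_parses(path: Path) -> bool:
    """Assumption: `jac check --parse_only <file>` exits with status 0 on success."""
    result = subprocess.run(
        ["jac", "check", "--parse_only", str(path)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def select_backend_sources(
    paths: Iterable[str],
    check: Callable[[Path], bool] = jac_parses,
) -> List[Path]:
    """Keep backend .jac files that pass the syntax check."""
    kept = []
    for path in map(Path, paths):
        if path.name.endswith(".cl.jac"):  # frontend/UI file, excluded
            continue
        if path.suffix != ".jac":  # not a Jac source file
            continue
        if check(path):
            kept.append(path)
    return kept
```

The `check` parameter is injectable so the selection logic can be exercised without the Jac toolchain installed, for example `select_backend_sources(files, check=lambda p: True)`.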

**Dataset composition:**

| Type | Count | Description |
|---|---|---|
| full_file | 800 | Complete valid Jac source files |
| construct_completion | 800 | Walker/node/ability signature to body completion |
| completion | 800 | Import + partial code to complete the rest |
| doc_example | 800 | Documentation description to Jac code |

### Training Procedure

- **Method:** QLoRA (4-bit NF4 quantization + LoRA)
- **Framework:** Hugging Face TRL (SFTTrainer)
- **Epochs:** 1
- **Batch size:** 2 per device, gradient accumulation 4 (effective batch 8)
- **Learning rate:** 2e-4 with cosine schedule
- **Max sequence length:** 512 tokens
- **Precision:** bf16
- **Gradient checkpointing:** enabled
- **Packing:** disabled (required for correctness without flash attention)
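
The batch arithmetic above can be checked in a few lines. This sketch only restates what the listed numbers imply. It assumes a single data-parallel process (consistent with the stated effective batch of 8) and uses the 380-step and 3,200-sample figures reported elsewhere in this card. The card does not explain how the two relate, so the sketch draws no conclusion about that.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_processes: int = 1) -> int:
    """Sequences contributing to one optimizer step."""
    return per_device * grad_accum * num_processes


# num_processes=1 is an assumption; it reproduces the effective batch stated above.
effective = effective_batch_size(per_device=2, grad_accum=4)

# Sequences consumed over the reported 380 optimizer steps.
sequences_seen = 380 * effective

# Optimizer steps one full pass over 3,200 samples would take at this batch size.
steps_for_full_pass = math.ceil(3200 / effective)

print(effective, sequences_seen, steps_for_full_pass)
```

With the values above this prints `8 3040 400`.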

### Compute Infrastructure

- **Hardware:** 2x NVIDIA Tesla T4 (15.6 GB VRAM each)
- **Platform:** Kaggle Notebooks (free tier)
- **Training time:** ~5.5 hours
- **Total steps:** 380

## Evaluation

Qualitative evaluation on held-out prompts:

| Prompt | Result |
|---|---|
| Node definition with typed fields | Correct `node` with `has` fields and defaults |
| Walker with graph traversal | Correct `walker` with `[-->]` traversal and `report` |
| REST API endpoint walker | Correct walker with `Root entry`, node creation (`++>`), and response |

In these spot checks, the model generated syntactically valid Jac code with proper use of language-specific constructs: `node`, `walker`, `has`, `can`, `with ... entry`, `++>`, `[-->]`, `report`, and `disengage`.

## Limitations

- Trained for 1 epoch on 3,200 samples -- may not cover all Jac patterns
- Max training sequence length was 512 tokens -- the adapter saw no longer contexts during training, so quality may degrade on longer files
- Backend-only -- does not generate Jac frontend/UI code (`.cl.jac`)
- Based on Jac language version 0.13.5 -- syntax may differ in newer versions

## Citation

```bibtex
@misc{jac-coder-7b-lora,
  title={Jac Coder 7B LoRA},
  author={Farhan Ahzan},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/farhan98ahzan/jac-coder-7b-lora}
}
```

## Framework Versions

- PEFT 0.18.1
- Transformers 4.51.3
- TRL 0.18.1
- PyTorch 2.6.0
- BitsAndBytes 0.45.5