---
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- base_model:adapter:Qwen/Qwen2.5-Coder-7B-Instruct
- lora
- sft
- transformers
- trl
- jac
- jaclang
- code-generation
- data-spatial-programming
---

# Jac Coder 7B LoRA

A LoRA adapter fine-tuned for code generation in the [Jac programming language](https://www.jac-lang.org/), built on top of [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct).

Jac is a data-spatial programming language featuring walkers, nodes, edges, and graph-based computation. This adapter teaches the base model to generate idiomatic Jac backend code, including node/edge definitions, walker APIs, graph traversals, and ability implementations.

## Example Outputs

**Prompt:** "Write a Jac node for a User with name, email, and age fields"

```jac
node User {
    has name: str;
    has email: str;
    has age: int = 0;
}
```

**Prompt:** "Write a Jac walker for a REST API endpoint that creates a new todo item"

```jac
node Todo {
    has title: str;
    has done: bool = False;
}

walker CreateTodo {
    has title: str;

    can create with Root entry {
        here ++> Todo(title=self.title);
        report [-->];
    }
}
```

## Model Details

- **Base model:** Qwen/Qwen2.5-Coder-7B-Instruct
- **Adapter type:** LoRA (rank 64, alpha 128)
- **Trainable params:** 161,480,704 / 7,777,097,216 (2.08%)
- **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Developed by:** [farhan98ahzan](https://huggingface.co/farhan98ahzan)
- **License:** Apache 2.0

## How to Use

### With PEFT (recommended)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-Coder-7B-Instruct"
ADAPTER = "farhan98ahzan/jac-coder-7b-lora"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)

# Load base model in 4-bit (for low VRAM)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Apply LoRA adapter
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

# Generate
messages = [
    {"role": "system", "content": "You are an expert Jac programming language assistant."},
    {"role": "user", "content": "Write a Jac walker that lists all users"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Drop the prompt tokens so only the newly generated text is decoded
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

### Merging the adapter (for full model export)

To merge LoRA weights into the base model, load the base model in **bf16 (not 4-bit)** to avoid rounding errors:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the unquantized base model in bf16
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the adapter, fold its weights into the base model, and save the result
model = PeftModel.from_pretrained(base, "farhan98ahzan/jac-coder-7b-lora")
merged = model.merge_and_unload()
merged.save_pretrained("jac-coder-7b-merged")
```

> **Warning:** Do not merge into a 4-bit quantized base model -- this produces corrupted weights and gibberish output.

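
The warning above can be enforced with a small guard before merging. This is a minimal sketch, not part of the adapter's tooling: `ensure_mergeable` is a hypothetical helper name, and it assumes that Transformers marks bitsandbytes-quantized models with the boolean attributes `is_loaded_in_4bit` and `is_loaded_in_8bit`. If those attributes are absent, the check simply passes.

```python
def ensure_mergeable(model):
    """Raise if `model` looks bitsandbytes-quantized; otherwise return it unchanged.

    Hypothetical helper. Assumes quantized Transformers models expose the
    boolean attributes `is_loaded_in_4bit` / `is_loaded_in_8bit`.
    """
    quantized = getattr(model, "is_loaded_in_4bit", False) or getattr(
        model, "is_loaded_in_8bit", False
    )
    if quantized:
        raise ValueError(
            "Base model is quantized; reload it in bf16 before calling merge_and_unload()."
        )
    return model
```

In the merge snippet, this would be called as `base = ensure_mergeable(base)` just before `PeftModel.from_pretrained(base, ...)`, so an accidental 4-bit load fails early instead of producing a corrupted export.
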
## Training Details

### Training Data

The adapter was trained on 3,200 curated Jac code samples sourced from:

| Source | Description |
|---|---|
| jaseci/jaseci | Core Jac compiler repo -- examples, tests, reference implementations |
| BeaconLens | Full-stack Jac application (review analysis platform) |
| jac-visual-builder | Visual graph schema builder in Jac |
| Jac documentation | 936 code examples extracted from official docs |

All source files were validated with `jac check --parse_only` for syntactic correctness. Only backend Jac code was included (frontend/UI files filtered out).

**Dataset composition:**

| Type | Count | Description |
|---|---|---|
| full_file | 800 | Complete valid Jac source files |
| construct_completion | 800 | Walker/node/ability signature to body completion |
| completion | 800 | Import + partial code to complete the rest |
| doc_example | 800 | Documentation description to Jac code |

### Training Procedure

- **Method:** QLoRA (4-bit NF4 quantization + LoRA)
- **Framework:** Hugging Face TRL (SFTTrainer)
- **Epochs:** 1
- **Batch size:** 2 per device, gradient accumulation 4 (effective batch 8)
- **Learning rate:** 2e-4 with cosine schedule
- **Max sequence length:** 512 tokens
- **Precision:** bf16
- **Gradient checkpointing:** enabled
- **Packing:** disabled (required for correctness without flash attention)

### Compute Infrastructure

- **Hardware:** 2x NVIDIA Tesla T4 (15.6 GB VRAM each)
- **Platform:** Kaggle Notebooks (free tier)
- **Training time:** ~5.5 hours
- **Total steps:** 380

## Evaluation

Qualitative evaluation on held-out prompts:

| Prompt | Result |
|---|---|
| Node definition with typed fields | Correct `node` with `has` fields and defaults |
| Walker with graph traversal | Correct `walker` with `[-->]` traversal and `report` |
| REST API endpoint walker | Correct walker with `Root entry`, node creation (`++>`), and response |

On these prompts, the model generated syntactically valid Jac code with proper use of language-specific constructs: `node`, `walker`, `has`, `can`, `with ... entry`, `++>`, `[-->]`, `report`, and `disengage`.

## Limitations

- Trained on 1 epoch of 3,200 samples -- may not cover all Jac patterns
- Max training sequence length was 512 tokens -- longer code may be truncated
- Backend-only -- does not generate Jac frontend/UI code (`.cl.jac`)
- Based on Jac language version 0.13.5 -- syntax may differ in newer versions

## Citation

```bibtex
@misc{jac-coder-7b-lora,
  title={Jac Coder 7B LoRA},
  author={Farhan Ahzan},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/farhan98ahzan/jac-coder-7b-lora}
}
```

### Framework Versions

- PEFT 0.18.1
- Transformers 4.51.3
- TRL 0.18.1
- PyTorch 2.6.0
- BitsAndBytes 0.45.5
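
To compare a local environment against the framework versions listed above, a short standard-library check can help. This is a sketch under assumptions: `CARD_VERSIONS` and `compare_versions` are hypothetical names, the keys are assumed to be the PyPI distribution names of each library, and a local build tag such as `+cu124` is ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by (assumed) PyPI distribution name.
CARD_VERSIONS = {
    "peft": "0.18.1",
    "transformers": "4.51.3",
    "trl": "0.18.1",
    "torch": "2.6.0",
    "bitsandbytes": "0.45.5",
}


def compare_versions(expected=CARD_VERSIONS):
    """Return {package: (expected, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        # Strip a local build tag (e.g. "2.6.0+cu124") before comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: card={wanted} installed={installed} ({status})")
```

A mismatch does not necessarily mean the adapter will fail to load; it only flags where the environment differs from the one used for training.
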