---
license: apache-2.0
base_model: "Qwen/Qwen2.5-Coder-0.5B-Instruct"
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- transformers
- coding
- code-generation
- peft
---

# ConicAI Coding LLM

## Model Details

### Model Description

ConicAI LLM Model is a parameter-efficient coding assistant fine-tuned with LoRA on top of Qwen2.5-Coder. It is designed to generate, debug, and explain code, and to return its results as structured output.

* **Developed by:** GIRISH KUMAR DEWANGAN
* **Model type:** Causal Language Model (Code LLM)
* **Language(s):** Python, general programming
* **Used for:** Code generation, debugging, error fixing, and reporting evaluation, relevancy, and hallucination scores
* **License:** Apache 2.0
* **Finetuned from model:** Qwen/Qwen2.5-Coder-0.5B-Instruct

---

## Model Sources

* **Repository:** https://huggingface.co/girish00/ConicAI_LLM_model
* **Paper:** [View Paper](./ConicAI_paper.md)

---

## Uses

### Direct Use

* Code generation
* Debugging
* Code explanation
* Learning programming

---

### Downstream Use

* Coding assistants
* AI-based education tools
* Developer productivity tools

---

### Out-of-Scope Use

* Security-critical systems
* Autonomous production systems
* High-risk environments

---

## Bias, Risks, and Limitations

* May generate code with incorrect logic
* Confidence scores are heuristic, not calibrated probabilities of correctness
* Output quality depends on prompt quality
* Generalization is limited by the small (~5K-sample) training dataset

---

## Recommendations

* Always validate generated code before using it
* Use structured prompts
* Avoid ambiguous instructions

---

## Structured Output Framework

The model produces outputs in a structured JSON format:

```json
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.84,
  "relevancy_score": 0.82,
  "hallucination": false
}
```

This enables:

* Easy API integration
* Automated evaluation
* Better interpretability

---

## How to Get Started with the Model

The snippet below is written for a Google Colab notebook: the `!pip` lines are notebook shell commands, and the Hugging Face token is read from Colab's `userdata` store.

```python
# Notebook (Colab) setup: the "!" lines are IPython shell commands.
!pip -q install -U transformers peft accelerate huggingface_hub safetensors
!pip install --upgrade torchao

import json
import sys
import time

import torch
from google.colab import userdata
from huggingface_hub import login, snapshot_download
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

HF_TOKEN = userdata.get("HF_TOKEN")
model = "girish00/ConicAI_LLM_model"
prompt = input("Enter your prompt: ")

# Authenticate and download the adapter repository.
login(token=HF_TOKEN)
repo = snapshot_download(model, token=HF_TOKEN)

# The helper module is imported from the downloaded repository.
sys.path.append(repo)
from infer_local import build_instruction_prompt, build_structured_result

# Load the base model named in the adapter config, then attach the LoRA weights.
cfg = PeftConfig.from_pretrained(repo)
base = cfg.base_model_name_or_path
tokenizer = AutoTokenizer.from_pretrained(base)
base_model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)
llm = PeftModel.from_pretrained(base_model, repo)
llm.eval()

inputs = tokenizer(build_instruction_prompt(prompt), return_tensors="pt").to(llm.device)

# Greedy decoding; keep the per-step scores to compute token confidences.
start = time.perf_counter()
with torch.no_grad():
    out = llm.generate(
        **inputs,
        max_new_tokens=320,
        output_scores=True,
        return_dict_in_generate=True,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
latency = int((time.perf_counter() - start) * 1000)  # milliseconds

# Keep only the newly generated tokens (drop the prompt tokens).
gen_ids = out.sequences[0][inputs["input_ids"].shape[1]:].tolist()
text = tokenizer.decode(gen_ids, skip_special_tokens=True)

# Probability the model assigned to each token it emitted.
conf = []
for tid, score in zip(gen_ids, out.scores):
    probs = torch.softmax(score[0], dim=-1)
    conf.append(float(probs[tid].item()))

print(json.dumps(
    build_structured_result(
        prompt,
        text,
        latency,
        tokenizer=tokenizer,
        generated_ids=gen_ids,
        token_confidences=conf,
    ),
    indent=2,
))
```

---

## 📊 Benchmark Results

![Benchmark](./benchmark.png)

---

## Training Details

### Dataset

* Size: ~5K samples
* Instruction-based coding dataset

### Training Procedure

* Method: LoRA fine-tuning
* Framework: Transformers + PEFT
* Precision: FP16 / Mixed

### Training Hyperparameters

| Parameter           | Value |
| ------------------- | ----- |
| Epochs              | 1–3   |
| Batch Size          | 2     |
| Learning Rate       | 2e-4  |
| Max Sequence Length | 512   |
| LoRA Rank (r)       | 8     |
| LoRA Alpha          | 16    |
| LoRA Dropout        | 0.05  |

---

## Inference Configuration

The settings below describe a sampling configuration. Note that the quick-start example above uses greedy decoding (`do_sample=False`) with `max_new_tokens=320` instead.

```text
max_new_tokens = 200
temperature = 0.2
top_p = 0.9
do_sample = True
```

---

## Evaluation

### Metrics

* Code correctness
* Syntax validity
* Relevancy score
* Hallucination rate
* Confidence score
* Latency

---

### Results Summary

* Higher code correctness than the base model
* Lower hallucination rate
* Better-structured outputs

---

## Technical Specifications

### Architecture

* Transformer-based causal LM
* LoRA adaptation

---

### Hardware

* GPU recommended but not required
* CPU supported

---

### Software

* Transformers
* PEFT
* PyTorch

---

## Environmental Impact

* Low compute due to LoRA
* Efficient fine-tuning

---

## Citation

**BibTeX:**

```bibtex
@misc{conicai_llm,
  author    = {Girish},
  title     = {ConicAI Coding LLM},
  year      = {2026},
  publisher = {Hugging Face}
}
```

---

## Model Card Authors

GIRISH KUMAR DEWANGAN

---

### Framework versions

* PEFT 0.19.0
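
---

## Example: Checking a Structured Result

The Structured Output Framework section says the JSON format enables automated evaluation, and the Recommendations section advises always validating generated code. The sketch below shows one way such a check might look. It rests on assumptions that are not confirmed by the repository: that a result contains exactly the five fields shown in the example (`code`, `explanation`, `confidence`, `relevancy_score`, `hallucination`), that both scores lie in [0, 1], and that the generated code is Python. `validate_result` and `REQUIRED_FIELDS` are hypothetical names introduced here for illustration.

```python
import ast
import json

# Assumed schema, taken from the example JSON in this card (not from the repo code).
REQUIRED_FIELDS = {
    "code": str,
    "explanation": str,
    "confidence": float,
    "relevancy_score": float,
    "hallucination": bool,
}


def validate_result(raw: str) -> list[str]:
    """Return a list of problems found in a JSON result string (empty if none)."""
    try:
        result = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(result, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in result:
            problems.append(f"missing field: {name}")
            continue
        value = result[name]
        if expected is float:
            # Accept ints such as 1 as scores, but reject booleans.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number:
                problems.append(f"{name} is not a number")
            elif not 0.0 <= value <= 1.0:
                problems.append(f"{name} outside [0, 1]: {value}")
        elif not isinstance(value, expected):
            problems.append(f"{name} is not of type {expected.__name__}")

    # Syntax check only (assumes Python output); the code is parsed, never executed.
    code = result.get("code")
    if isinstance(code, str):
        try:
            ast.parse(code)
        except SyntaxError as exc:
            problems.append(f"code does not parse as Python: {exc.msg}")
    return problems


sample = json.dumps({
    "code": "def add(a, b):\n    return a + b",
    "explanation": "Adds two numbers.",
    "confidence": 0.84,
    "relevancy_score": 0.82,
    "hallucination": False,
})
print(validate_result(sample))  # prints []
```

A check like this only covers the shape of the result and the syntax of the code. It says nothing about whether the logic is correct, so generated code should still be reviewed and tested before use.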