---
license: apache-2.0
base_model: "Qwen/Qwen2.5-Coder-0.5B-Instruct"
library_name: peft
pipeline_tag: text-generation

tags:
  - lora
  - transformers
  - coding
  - code-generation
  - peft
---

# ConicAI Coding LLM

## Model Details

### Model Description

ConicAI Coding LLM is a coding assistant fine-tuned with LoRA, a parameter-efficient method, on top of Qwen2.5-Coder-0.5B-Instruct. It is designed to generate, debug, and explain code, returning its results as structured JSON.

* **Developed by:** GIRISH KUMAR DEWANGAN
* **Model type:** Causal Language Model (Code LLM)
* **Language(s):** Python, general programming
* **Used for:** Code generation, debugging, error fixing, and reporting evaluation, hallucination, and relevancy scores
* **License:** Apache 2.0
* **Finetuned from model:** Qwen/Qwen2.5-Coder-0.5B-Instruct

---

## Model Sources

* **Repository:** https://huggingface.co/girish00/ConicAI_LLM_model
* **Paper:** [View Paper](./ConicAI_paper.md)

---

## Uses

### Direct Use

* Code generation
* Debugging
* Code explanation
* Learning programming

---

### Downstream Use

* Coding assistants
* AI-based education tools
* Developer productivity tools

---

### Out-of-Scope Use

* Security-critical systems
* Autonomous production systems
* High-risk environments

---

## Bias, Risks, and Limitations

* May generate incorrect logic
* Confidence scores are heuristic
* Output depends on prompt quality
* Limited generalization beyond the small (~5K-sample) training dataset

---

## Recommendations

* Always validate generated code
* Use structured prompts
* Avoid ambiguous instructions
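
One way to act on the first recommendation is to run generated code against a few assertions before trusting it. The sketch below is an illustration and not part of this repository: `run_with_tests` is a hypothetical helper that executes the code in a separate Python process with a timeout. A subprocess is not a sandbox, so untrusted code still calls for an isolated environment.

```python
import os
import subprocess
import sys
import tempfile


def run_with_tests(generated_code, test_code, timeout_s=5.0):
    """Run generated code followed by assertion-based tests in a child process.

    Returns (passed, stderr_text). Hypothetical helper, not part of the model repo.
    """
    source = generated_code + "\n\n" + test_code + "\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(source)
        path = handle.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        # A non-zero exit code means an exception, e.g. a failed assertion.
        return proc.returncode == 0, proc.stderr
    except subprocess.TimeoutExpired:
        return False, "timed out"
    finally:
        os.remove(path)


# Toy stand-in for a model response and the checks it should satisfy.
candidate = "def add(a, b):\n    return a + b"
passed, errors = run_with_tests(candidate, "assert add(2, 3) == 5")
print("passed" if passed else errors)
```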

---
## Structured Output Framework
The model produces outputs in structured JSON format:

```json
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.84,
  "relevancy_score": 0.82,
  "hallucination": false
}
```
This enables:

* Easy API integration
* Automated evaluation
* Better interpretability

---
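
A consumer of the structured JSON output can check it before use. The following is a minimal sketch using only the standard library: `StructuredResult` and `parse_structured_result` are hypothetical names, and the `[0, 1]` range for the two scores is an assumption based on the example values in this section. Extra keys in the payload are ignored.

```python
import json
from dataclasses import dataclass


@dataclass
class StructuredResult:
    code: str
    explanation: str
    confidence: float
    relevancy_score: float
    hallucination: bool


def parse_structured_result(raw: str) -> StructuredResult:
    """Parse one JSON result string and validate the five documented fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("result must be a JSON object")
    for key in ("code", "explanation"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"{key!r} must be a string")
    for key in ("confidence", "relevancy_score"):
        value = data.get(key)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{key!r} must be a number")
        # Assumption: scores lie in [0, 1], as in the example above.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key!r} must be between 0 and 1")
    if not isinstance(data.get("hallucination"), bool):
        raise ValueError("'hallucination' must be a boolean")
    return StructuredResult(
        code=data["code"],
        explanation=data["explanation"],
        confidence=float(data["confidence"]),
        relevancy_score=float(data["relevancy_score"]),
        hallucination=data["hallucination"],
    )


raw = (
    '{"code": "print(1)", "explanation": "Prints 1.", '
    '"confidence": 0.84, "relevancy_score": 0.82, "hallucination": false}'
)
result = parse_structured_result(raw)
print(result.confidence, result.hallucination)
```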


## How to Get Started with the Model

```python
# Intended for a Google Colab notebook cell: the "!" lines are shell commands.
!pip -q install -U transformers peft accelerate huggingface_hub safetensors
!pip install --upgrade torchao

import json
import sys
import time

import torch
from google.colab import userdata
from huggingface_hub import login, snapshot_download
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Read the Hugging Face token stored as a Colab secret named HF_TOKEN.
HF_TOKEN = userdata.get("HF_TOKEN")

model = "girish00/ConicAI_LLM_model"
prompt = input("Enter your prompt: ")

# Authenticate and download the adapter repository.
login(token=HF_TOKEN)
repo = snapshot_download(model, token=HF_TOKEN)

# infer_local is imported from the downloaded snapshot, so add it to the path first.
sys.path.append(repo)
from infer_local import build_instruction_prompt, build_structured_result

# Resolve the base model from the adapter config.
cfg = PeftConfig.from_pretrained(repo)
base = cfg.base_model_name_or_path

tokenizer = AutoTokenizer.from_pretrained(base)

# Use FP16 on GPU and FP32 on CPU.
base_model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

# Attach the LoRA adapter and switch to inference mode.
llm = PeftModel.from_pretrained(base_model, repo)
llm.eval()

inputs = tokenizer(build_instruction_prompt(prompt), return_tensors="pt").to(llm.device)

start = time.perf_counter()

# Greedy decoding; keep the per-step scores to derive token confidences.
with torch.no_grad():
    out = llm.generate(
        **inputs,
        max_new_tokens=320,
        output_scores=True,
        return_dict_in_generate=True,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

latency = int((time.perf_counter() - start) * 1000)  # milliseconds

# Keep only the newly generated tokens (drop the prompt).
gen_ids = out.sequences[0][inputs["input_ids"].shape[1]:].tolist()
text = tokenizer.decode(gen_ids, skip_special_tokens=True)

# Probability the model assigned to each generated token.
conf = []
for tid, score in zip(gen_ids, out.scores):
    probs = torch.softmax(score[0], dim=-1)
    conf.append(float(probs[tid].item()))

print(json.dumps(
    build_structured_result(
        prompt,
        text,
        latency,
        tokenizer=tokenizer,
        generated_ids=gen_ids,
        token_confidences=conf,
    ),
    indent=2,
))
```
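
The snippet above collects one probability per generated token. How `build_structured_result` turns these into a single confidence value is not documented here; as an illustration only, one common heuristic is the geometric mean of the token probabilities, sketched below with a hypothetical `aggregate_confidence` helper.

```python
import math


def aggregate_confidence(token_probs):
    """Geometric mean of per-token probabilities (illustrative heuristic only).

    This is an assumption about how such scores could be combined, not the
    method used by this repository.
    """
    if not token_probs:
        return 0.0
    eps = 1e-12  # guard against log(0)
    mean_log = sum(math.log(max(p, eps)) for p in token_probs) / len(token_probs)
    return math.exp(mean_log)


# Toy probabilities standing in for the `conf` list built above.
print(round(aggregate_confidence([0.9, 0.8, 0.95]), 3))
```

The geometric mean penalizes a single very unlikely token more than an arithmetic mean would, which is often the desired behavior for a sequence-level score.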

---

## 📊 Benchmark Results

![Benchmark](./benchmark.png)

---

## Training Details

### Dataset

* Size: ~5K samples
* Instruction-based coding dataset

### Training Procedure

* Method: LoRA fine-tuning
* Framework: Transformers + PEFT
* Precision: FP16 / Mixed

### Training Hyperparameters

| Parameter           | Value |
| ------------------- | ----- |
| Epochs              | 1–3   |
| Batch Size          | 2     |
| Learning Rate       | 2e-4  |
| Max Sequence Length | 512   |
| LoRA Rank (r)       | 8     |
| LoRA Alpha          | 16    |
| LoRA Dropout        | 0.05  |
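
To give a sense of what rank 8 and alpha 16 imply, the sketch below computes the LoRA scaling factor (alpha / r) and the number of trainable parameters LoRA adds to a weight matrix, r × (d_in + d_out). The adapted modules (`q_proj`, `v_proj`), their shapes, and the layer count are assumptions for illustration; the actual values should be read from the adapter and base model configs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraHyperparams:
    r: int = 8            # LoRA Rank from the table above
    alpha: int = 16       # LoRA Alpha from the table above
    dropout: float = 0.05  # LoRA Dropout from the table above

    @property
    def scaling(self) -> float:
        # The low-rank update is multiplied by alpha / r.
        return self.alpha / self.r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


hp = LoraHyperparams()

# Assumed (d_in, d_out) shapes and layer count, for illustration only.
assumed_shapes = {"q_proj": (896, 896), "v_proj": (896, 128)}
assumed_num_layers = 24

per_layer = sum(lora_param_count(i, o, hp.r) for i, o in assumed_shapes.values())
total = per_layer * assumed_num_layers
print(f"scaling={hp.scaling}, per_layer={per_layer}, total={total}")
```

In PEFT, these three hyperparameters correspond to the `r`, `lora_alpha`, and `lora_dropout` arguments of `LoraConfig`.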

---

## Inference Configuration

```text
max_new_tokens = 200
temperature = 0.2
top_p = 0.9
do_sample = True
```
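
To illustrate what `temperature` and `top_p` do to the next-token distribution, here is a pure-Python sketch. It is a simplified model of temperature scaling and nucleus (top-p) filtering, not the Transformers implementation, and the logits are toy values.

```python
import math


def temperature_softmax(logits, temperature):
    """Softmax over logits divided by the temperature (lower = sharper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize so the surviving probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.0]
sharp = top_p_filter(temperature_softmax(toy_logits, 0.2), 0.9)
flat = top_p_filter(temperature_softmax(toy_logits, 1.0), 0.9)
print(sorted(sharp), sorted(flat))
```

With a low temperature such as 0.2, most of the probability mass concentrates on the top token, so sampling stays close to greedy decoding while still allowing some variation.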

---

## Evaluation

### Metrics

* Code correctness
* Syntax validity
* Relevancy score
* Hallucination rate
* Confidence score
* Latency
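
As an illustration of the syntax validity metric, the sketch below checks whether a Python snippet parses and reports the share of valid samples. It assumes the generated code is Python and that validity means "parses without a syntax error"; the exact procedure used for this model's evaluation is not specified here, and the function names are hypothetical.

```python
import ast


def is_valid_python(code: str) -> bool:
    """Return True if the snippet parses as Python source."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        # ValueError covers inputs such as source containing null bytes
        # on some Python versions.
        return False
    return True


def syntax_validity_rate(samples) -> float:
    """Fraction of snippets that parse; 0.0 for an empty list."""
    if not samples:
        return 0.0
    return sum(is_valid_python(s) for s in samples) / len(samples)


# Toy samples standing in for model outputs.
samples = ["x = 1", "def f(:\n    pass", "print('ok')"]
print(syntax_validity_rate(samples))
```

Parsing only checks syntax, so it complements rather than replaces a correctness check that actually runs the code.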

---

### Results Summary

* Higher correctness than the base model
* Lower hallucination rate
* Better structured outputs

---

## Technical Specifications

### Architecture

* Transformer-based causal LM
* LoRA adaptation

---

### Hardware

* GPU recommended but not required
* CPU inference supported

---

### Software

* Transformers
* PEFT
* PyTorch

---

## Environmental Impact

* Low compute due to LoRA
* Efficient fine-tuning

---

## Citation

**BibTeX:**

```bibtex
@misc{conicai_llm,
  author = {Dewangan, Girish Kumar},
  title = {ConicAI Coding LLM},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/girish00/ConicAI_LLM_model}}
}
```

---

## Model Card Authors

GIRISH KUMAR DEWANGAN

---


### Framework versions

* PEFT 0.19.0