---
license: apache-2.0
base_model: "Qwen/Qwen2.5-Coder-0.5B-Instruct"
library_name: peft
pipeline_tag: text-generation

tags:
  - lora
  - transformers
  - coding
  - code-generation
  - peft
---

# ConicAI Coding LLM

## Model Details

### Model Description

ConicAI Coding LLM is a coding assistant fine-tuned with LoRA, a parameter-efficient method, on top of Qwen2.5-Coder-0.5B-Instruct. It is designed to generate, debug, and explain code and to return its results in a structured format.

* **Developed by:** GIRISH KUMAR DEWANGAN
* **Model type:** Causal Language Model (Code LLM)
* **Language(s):** Python, general programming
* **Used for:** Code generation, debugging, error fixing, and evaluation scoring (including hallucination and relevancy checks)
* **License:** Apache 2.0
* **Finetuned from model:** Qwen/Qwen2.5-Coder-0.5B-Instruct

---

## Model Sources

* **Repository:** https://huggingface.co/girish00/ConicAI_LLM_model
* **Paper:** [View Paper](./ConicAI_paper.md)

---

## Uses

### Direct Use

* Code generation
* Debugging
* Code explanation
* Learning programming

---

### Downstream Use

* Coding assistants
* AI-based education tools
* Developer productivity tools

---

### Out-of-Scope Use

* Security-critical systems
* Autonomous production systems
* High-risk environments

---

## Bias, Risks, and Limitations

* May generate incorrect logic
* Confidence scores are heuristic
* Output depends on prompt quality
* Fine-tuned on a small dataset (~5K samples), so generalization is limited

---

## Recommendations

* Always validate generated code
* Use structured prompts
* Avoid ambiguous instructions

---
## Structured Output Framework
The model produces outputs in structured JSON format:

```json
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.84,
  "relevancy_score": 0.82,
  "hallucination": false
}
```

This enables:

* Easy API integration
* Automated evaluation
* Better interpretability

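
Downstream code can check a response against this schema before trusting it. The sketch below is one way to do that with the standard library only. The `validate_result` helper and the assumption that both scores lie in [0, 1] are illustrative and are not part of the repository.

```python
import json

# Field names and types follow the example above; the [0, 1] range for the
# two scores is an assumption made for this sketch.
EXPECTED_FIELDS = {
    "code": str,
    "explanation": str,
    "confidence": float,
    "relevancy_score": float,
    "hallucination": bool,
}


def validate_result(raw: str) -> dict:
    """Parse a JSON response; raise ValueError if it does not match the schema."""
    try:
        result = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(result, dict):
        raise ValueError("top-level value must be a JSON object")
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in result:
            raise ValueError(f"missing field: {name}")
        value = result[name]
        if expected_type is float:
            # Accept ints for numeric fields (1 instead of 1.0), but never bools.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected_type)
        if not ok:
            raise ValueError(f"field {name} has type {type(value).__name__}")
    for name in ("confidence", "relevancy_score"):
        if not 0.0 <= result[name] <= 1.0:
            raise ValueError(f"field {name} out of range: {result[name]}")
    return result
```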
---


## How to Get Started with the Model

```python
# Colab/Jupyter cell: lines starting with "!" are shell commands, not Python.
!pip -q install -U transformers peft accelerate huggingface_hub safetensors
!pip install --upgrade torchao

import json
import sys
import time

import torch
from google.colab import userdata  # Colab-only secret store
from huggingface_hub import login, snapshot_download
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

HF_TOKEN = userdata.get("HF_TOKEN")
model_id = "girish00/ConicAI_LLM_model"
prompt = input("Enter your prompt: ")

# Authenticate, then download the adapter repo (it also contains infer_local).
login(token=HF_TOKEN)
repo = snapshot_download(model_id, token=HF_TOKEN)

# Make the helper module from the downloaded repo importable.
sys.path.append(repo)
from infer_local import build_instruction_prompt, build_structured_result

# Read the base model name from the adapter config.
cfg = PeftConfig.from_pretrained(repo)
base = cfg.base_model_name_or_path

tokenizer = AutoTokenizer.from_pretrained(base)

# Use FP16 on GPU and FP32 on CPU.
base_model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

# Attach the LoRA adapter to the base model and switch to inference mode.
llm = PeftModel.from_pretrained(base_model, repo)
llm.eval()

inputs = tokenizer(build_instruction_prompt(prompt), return_tensors="pt").to(llm.device)

start = time.perf_counter()

# Greedy decoding; keep the per-step scores to compute token confidences.
with torch.no_grad():
    out = llm.generate(
        **inputs,
        max_new_tokens=320,
        output_scores=True,
        return_dict_in_generate=True,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

# Generation latency in milliseconds.
latency = int((time.perf_counter() - start) * 1000)

# Drop the prompt tokens and decode only the newly generated ones.
gen_ids = out.sequences[0][inputs["input_ids"].shape[1]:].tolist()
text = tokenizer.decode(gen_ids, skip_special_tokens=True)

# Probability the model assigned to each generated token.
conf = []
for tid, score in zip(gen_ids, out.scores):
    probs = torch.softmax(score[0], dim=-1)
    conf.append(float(probs[tid].item()))

print(json.dumps(
    build_structured_result(
        prompt,
        text,
        latency,
        tokenizer=tokenizer,
        generated_ids=gen_ids,
        token_confidences=conf,
    ),
    indent=2,
))
```
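
The per-token probabilities collected in `conf` can be reduced to a single sequence-level number. How `build_structured_result` does this is not shown in this card, so the sketch below is an assumption rather than the repository's method: it uses the geometric mean, a common heuristic, in a hypothetical helper named `sequence_confidence`.

```python
import math


def sequence_confidence(token_confidences: list[float]) -> float:
    """Geometric mean of per-token probabilities (0.0 for an empty sequence)."""
    if not token_confidences:
        return 0.0
    # Average in log space to avoid underflow on long sequences; clamp zeros
    # so that log() stays defined.
    log_sum = sum(math.log(max(p, 1e-12)) for p in token_confidences)
    return math.exp(log_sum / len(token_confidences))


# Made-up probabilities for three generated tokens.
print(round(sequence_confidence([0.9, 0.8, 0.95]), 3))
```

Unlike an arithmetic mean, the geometric mean is pulled down sharply by a single low-probability token, which may or may not be the behavior you want from a heuristic score.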

---

## 📊 Benchmark Results

![Benchmark](./benchmark.png)

---

## Training Details

### Dataset

* Size: ~5K samples
* Instruction-based coding dataset

### Training Procedure

* Method: LoRA fine-tuning
* Framework: Transformers + PEFT
* Precision: FP16 / Mixed

### Training Hyperparameters

| Parameter           | Value |
| ------------------- | ----- |
| Epochs              | 1–3   |
| Batch Size          | 2     |
| Learning Rate       | 2e-4  |
| Max Sequence Length | 512   |
| LoRA Rank (r)       | 8     |
| LoRA Alpha          | 16    |
| LoRA Dropout        | 0.05  |
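
The LoRA settings in the table determine a scaling factor and a trainable-parameter count for each adapted weight matrix. The sketch below works through that arithmetic for standard LoRA. The 1024 x 1024 layer shape is a hypothetical example, not a dimension taken from this model, and the helper names are illustrative.

```python
def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """A has shape (r, d_in) and B has shape (d_out, r)."""
    return r * d_in + d_out * r


r, alpha = 8, 16            # values from the table above
d_in, d_out = 1024, 1024    # hypothetical layer shape, for illustration only

print(lora_scaling(alpha, r))                             # 2.0
print(lora_trainable_params(d_in, d_out, r), d_in * d_out)  # 16384 1048576
```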

---

## Inference Configuration

```text
max_new_tokens = 200
temperature = 0.2
top_p = 0.9
do_sample = True
```
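
To make these settings concrete, the sketch below applies temperature scaling and nucleus (top-p) filtering to a toy set of logits in plain Python. The logit values and the `sample_distribution` helper are made up for illustration; in practice the same parameters are passed to `generate` and the library performs these steps internally.

```python
import math


def sample_distribution(
    logits: dict[str, float], temperature: float, top_p: float
) -> dict[str, float]:
    """Return the renormalized token distribution after temperature and top-p."""
    # Temperature: divide logits before softmax; values below 1 sharpen the distribution.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Top-p: keep the smallest set of most-likely tokens whose mass reaches top_p.
    kept: dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    mass = sum(kept.values())
    return {tok: p / mass for tok, p in kept.items()}


toy_logits = {"return": 2.0, "print": 1.0, "pass": 0.0}  # made-up values
print(sample_distribution(toy_logits, temperature=1.0, top_p=0.9))
print(sample_distribution(toy_logits, temperature=0.2, top_p=0.9))
```

In this toy example, `temperature=0.2` concentrates nearly all probability on the top token, so only that token survives the top-p cut and sampling behaves much like greedy decoding.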

---

## Evaluation

### Metrics

* Code correctness
* Syntax validity
* Relevancy score
* Hallucination rate
* Confidence score
* Latency
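
Of these metrics, syntax validity is the simplest to automate for Python outputs. The sketch below shows one possible check based on `ast.parse`. It is an illustrative assumption, not the evaluation code used for this model, and `is_valid_python` and `syntax_validity_rate` are hypothetical names.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if the source parses as Python; the code is never executed."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):  # ValueError covers e.g. embedded null bytes
        return False
    return True


def syntax_validity_rate(samples: list[str]) -> float:
    """Fraction of samples that parse; 0.0 for an empty list."""
    if not samples:
        return 0.0
    return sum(is_valid_python(s) for s in samples) / len(samples)


samples = [
    "def add(a, b):\n    return a + b\n",  # parses
    "def broken(:\n    pass\n",            # syntax error
]
print(syntax_validity_rate(samples))  # 0.5
```

A passing parse only shows that the code is well-formed; it says nothing about whether the logic is correct, which still needs tests or review.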

---

### Results Summary

* Higher correctness than the base model
* Lower hallucination rate
* Better-structured outputs

---

## Technical Specifications

### Architecture

* Transformer-based causal LM
* LoRA adaptation

---

### Hardware

* GPU recommended but not required
* CPU supported

---

### Software

* Transformers
* PEFT
* PyTorch

---

## Environmental Impact

* Low compute due to LoRA
* Efficient fine-tuning

---

## Citation

**BibTeX:**

```bibtex
@misc{conicai_llm,
  author = {Girish},
  title = {ConicAI Coding LLM},
  year = {2026},
  publisher = {Hugging Face}
}
```

---

## Model Card Authors

GIRISH KUMAR DEWANGAN

---


### Framework versions

* PEFT 0.19.0