Instructions to use girish00/ConicAI_LLM_model with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use girish00/ConicAI_LLM_model with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "girish00/ConicAI_LLM_model")

Transformers

How to use girish00/ConicAI_LLM_model with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="girish00/ConicAI_LLM_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("girish00/ConicAI_LLM_model")
model = AutoModelForCausalLM.from_pretrained("girish00/ConicAI_LLM_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use girish00/ConicAI_LLM_model with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "girish00/ConicAI_LLM_model"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "girish00/ConicAI_LLM_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/girish00/ConicAI_LLM_model

SGLang

How to use girish00/ConicAI_LLM_model with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "girish00/ConicAI_LLM_model" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "girish00/ConicAI_LLM_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "girish00/ConicAI_LLM_model" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "girish00/ConicAI_LLM_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use girish00/ConicAI_LLM_model with Docker Model Runner:
```
docker model run hf.co/girish00/ConicAI_LLM_model
```

# **ConicAI Coding LLM: A Parameter-Efficient Framework for Structured Code Generation and Explanation**

---

## **Abstract**

Large Language Models (LLMs) have significantly advanced the field of automated code generation and reasoning. However, traditional fine-tuning approaches remain computationally expensive and often produce unstructured outputs that limit their usability in real-world applications.

In this work, we present **ConicAI Coding LLM**, a lightweight and parameter-efficient coding assistant built using Low-Rank Adaptation (LoRA) on top of the Qwen2.5-Coder architecture. The model is designed to generate, debug, and explain code while producing structured outputs that include confidence, relevancy, and hallucination indicators.

Our approach suggests that compact models can deliver competitive performance with improved interpretability and deployment efficiency, which makes them suitable for practical developer tools and educational systems.

---

## **1. Introduction**

The rapid evolution of LLMs has enabled significant improvements in code generation, debugging, and explanation tasks. Models such as Codex and Qwen2.5-Coder have shown strong capabilities but require extensive computational resources for training and deployment.

Additionally, most existing systems produce **unstructured outputs**, making integration into applications difficult. There is a growing need for models that are:

* Computationally efficient
* Structurally interpretable
* Easily deployable

This work introduces a parameter-efficient solution addressing these challenges.

---

## **2. Problem Statement**

Despite advancements, current coding LLMs suffer from:

* High computational cost for full fine-tuning
* Lack of structured outputs
* Difficulty in integration into real-world systems
* Limited interpretability of generated results

---

## **3. Proposed Method**

We propose **ConicAI Coding LLM**, a framework combining:

* **LoRA-based fine-tuning** for efficiency
* **Instruction-based dataset generation**
* **Structured inference output design**

---

## **4. Methodology**

### **4.1 Base Model**

The model is built on:

* **Qwen2.5-Coder-0.5B-Instruct**

---

### **4.2 Fine-Tuning Approach**

We apply **LoRA (Low-Rank Adaptation)**:

* Reduces trainable parameters
* Enables local training
* Maintains performance
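
To make the first bullet concrete, the sketch below counts trainable parameters for a single weight matrix under the standard LoRA formulation, where a frozen `d x k` weight is augmented by the product of a `d x r` and an `r x k` matrix. The dimensions (896 x 896) and the rank (`r = 8`) are illustrative assumptions, not the configuration used to train this model.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one d x k weight.

    Full fine-tuning updates all d*k entries. LoRA freezes the weight and
    trains two low-rank factors, B (d x r) and A (r x k), instead.
    """
    full = d * k
    lora = r * (d + k)
    return full, lora


# Illustrative values only (assumptions, not this model's actual settings).
full, lora = lora_param_counts(d=896, k=896, r=8)
print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (r=8):       {lora:,} parameters ({100 * lora / full:.2f}% of full)")
```

Because the LoRA count grows with `d + k` rather than `d * k`, the saving becomes larger as the adapted matrices get wider.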

---

### **4.3 Dataset Design**

The dataset follows an instruction-driven format:

* Instruction
* Input
* Output
* Explanation

Dataset size: **approximately 5,000 to 10,000 samples**
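
A single record with these four fields might be stored and rendered into a training pair as sketched below. The key names, the example content, and the prompt template are hypothetical; the exact schema and template used for this model are not specified here.

```python
import json

# Hypothetical record; keys mirror the four fields listed above.
record = {
    "instruction": "Write a function that returns the square of a number.",
    "input": "n = 4",
    "output": "def square(n):\n    return n * n",
    "explanation": "Multiplies the number by itself and returns the result.",
}


def format_prompt(rec: dict) -> str:
    """Render the instruction and optional input into a prompt (assumed template)."""
    prompt = f"### Instruction:\n{rec['instruction']}\n"
    if rec.get("input"):
        prompt += f"### Input:\n{rec['input']}\n"
    return prompt + "### Response:\n"


def format_target(rec: dict) -> str:
    """Serialize the expected code and explanation as the training target."""
    return json.dumps({"code": rec["output"], "explanation": rec["explanation"]})


print(format_prompt(record))
print(format_target(record))
```

Keeping the target as JSON means the training signal already matches the structured format the model is expected to emit at inference time.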

---

### **4.4 Structured Output Framework**

The model produces outputs in structured JSON format:

```json
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.84,
  "relevancy_score": 0.82,
  "hallucination": false
}
```

This enables:

* Easy API integration
* Automated evaluation
* Better interpretability
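
On the consuming side, an application would typically validate this JSON before using it. The following is a minimal sketch, assuming the five keys shown in the example output and scores in the range 0 to 1. The function name `parse_model_output` and the range check are assumptions for illustration, not part of the released model.

```python
import json

# Expected keys and types, taken from the example output above.
EXPECTED_FIELDS = {
    "code": str,
    "explanation": str,
    "confidence": (int, float),
    "relevancy_score": (int, float),
    "hallucination": bool,
}


def parse_model_output(text: str) -> dict:
    """Extract the JSON object between the first '{' and last '}' and validate it."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start : end + 1])
    for key, expected_type in EXPECTED_FIELDS.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"wrong type for field: {key}")
    # bool is a subclass of int in Python, so reject it for numeric scores.
    for key in ("confidence", "relevancy_score"):
        if isinstance(data[key], bool) or not 0.0 <= data[key] <= 1.0:
            raise ValueError(f"{key} must be a number in [0, 1]")
    return data


raw = (
    'Result: {"code": "print(1)", "explanation": "Prints 1.", '
    '"confidence": 0.84, "relevancy_score": 0.82, "hallucination": false}'
)
result = parse_model_output(raw)
print(result["confidence"], result["hallucination"])
```

Raising `ValueError` on any malformed response lets the caller decide whether to retry generation or fall back to the raw text.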

---

## **5. Evaluation**

### **5.1 Metrics**

We evaluate the model using:

* Code Correctness (%)
* Syntax Validity (%)
* Relevancy Score
* Hallucination Rate (%)
* Confidence Score
* Latency (ms)
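
Two of these metrics can be computed mechanically from a batch of structured outputs. The sketch below assumes the generated code is Python (so that `ast.parse` can check syntax) and that each output is a dict following the structured format from Section 4.4. It is an illustration, not the evaluation script behind the reported results.

```python
import ast


def syntax_validity(snippets: list[str]) -> float:
    """Percentage of snippets that parse as valid Python."""
    if not snippets:
        return 0.0
    valid = 0
    for code in snippets:
        try:
            ast.parse(code)
            valid += 1
        except SyntaxError:
            pass
    return 100.0 * valid / len(snippets)


def hallucination_rate(outputs: list[dict]) -> float:
    """Percentage of outputs whose 'hallucination' flag is true."""
    if not outputs:
        return 0.0
    flagged = sum(1 for out in outputs if out["hallucination"])
    return 100.0 * flagged / len(outputs)


# Hypothetical outputs for illustration.
outputs = [
    {"code": "x = 1", "hallucination": False},
    {"code": "def f(:", "hallucination": True},
]
print(syntax_validity([o["code"] for o in outputs]))
print(hallucination_rate(outputs))
```

Syntax validity only confirms that the code parses; code correctness would additionally require running the code against test cases.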

---

### **5.2 Results**

Relative to the baseline models shown in the benchmark figure (Section 6), the model demonstrates:

* Improved correctness over baseline models
* Lower hallucination rates
* More stable structured outputs

---

## **6. Benchmark Visualization**

![Benchmark Results](./benchmark.png)

The results indicate that ConicAI achieves higher correctness, syntax validity, and confidence than the baseline models, with lower hallucination rates.

---

## **7. Results Analysis**

* **Higher correctness** due to instruction-based fine-tuning
* **Lower hallucination** from structured output constraints
* **Better usability** due to JSON output format

---

## **8. Limitations**

* Limited dataset diversity
* Heuristic-based confidence estimation
* Lack of standardized benchmark evaluation

---

## **9. Future Work**

Future improvements include:

* Scaling dataset size and diversity
* Benchmarking on datasets like HumanEval and MBPP
* Improving hallucination detection methods
* Building user interfaces and APIs

---

## **10. Conclusion**

This work demonstrates that a compact coding LLM can be effectively enhanced using LoRA to achieve efficient training, structured outputs, and improved usability. The proposed approach bridges the gap between research models and practical deployment systems.

---

## **References**

* Hugging Face Transformers
* PEFT: Parameter-Efficient Fine-Tuning
* Qwen2.5-Coder Model

---