ConicAI Coding LLM: A Parameter-Efficient Framework for Structured Code Generation and Explanation

Abstract

Large Language Models (LLMs) have significantly advanced the field of automated code generation and reasoning. However, traditional fine-tuning approaches remain computationally expensive and often produce unstructured outputs that limit their usability in real-world applications.

In this work, we present ConicAI Coding LLM, a lightweight, parameter-efficient coding assistant built by fine-tuning Qwen2.5-Coder-0.5B-Instruct with Low-Rank Adaptation (LoRA). The model is designed to generate, debug, and explain code, and it returns structured outputs that include confidence, relevancy, and hallucination indicators.

Our results suggest that a compact model can perform competitively while being easier to interpret and deploy, which makes it suitable for practical developer tools and educational systems. These results come from our own evaluation rather than standardized benchmarks (see Limitations).

1. Introduction

The rapid evolution of LLMs has brought significant improvements in code generation, debugging, and explanation. Models such as Codex and Qwen2.5-Coder have shown strong capabilities, but training them, and deploying their larger variants, requires substantial computational resources.

Additionally, most existing systems produce unstructured outputs, making integration into applications difficult. There is a growing need for models that are:

Computationally efficient
Structurally interpretable
Easily deployable

This work introduces a parameter-efficient solution addressing these challenges.

2. Problem Statement

Despite advancements, current coding LLMs suffer from:

High computational cost for full fine-tuning
Lack of structured outputs
Difficulty in integration into real-world systems
Limited interpretability of generated results

3. Proposed Method

We propose ConicAI Coding LLM, a framework combining:

LoRA-based fine-tuning for efficiency
Instruction-based dataset generation
Structured inference output design

4. Methodology

4.1 Base Model

The model is built on:

Qwen2.5-Coder-0.5B-Instruct

4.2 Fine-Tuning Approach

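We apply LoRA (Low-Rank Adaptation), which freezes the base model's weights and trains only small low-rank update matrices. This approach: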

Reduces trainable parameters
Enables local training
Maintains performance
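
To make the parameter saving concrete, here is a minimal sketch that counts trainable parameters for a single weight matrix under full fine-tuning versus a rank-r LoRA update (two factors of shape d_out × r and r × d_in). The matrix size (896 × 896) and the rank (8) are illustrative assumptions, not values reported in this paper.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> dict:
    """Compare trainable parameters: full fine-tuning vs. a rank-`rank` LoRA update."""
    full = d_out * d_in            # every entry of the weight matrix W is trainable
    lora = rank * (d_out + d_in)   # B is d_out x rank, A is rank x d_in
    return {"full": full, "lora": lora, "ratio": lora / full}

# Illustrative (assumed) sizes; not taken from the paper.
counts = lora_param_counts(d_out=896, d_in=896, rank=8)
print(counts)
```

With these assumed sizes, the LoRA update trains 14,336 parameters instead of 802,816, roughly 1.8% of the full matrix.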

4.3 Dataset Design

The dataset follows an instruction-driven format:

Instruction
Input
Output
Explanation

Dataset size: approximately 5,000 to 10,000 samples
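
A record in this format could be represented and rendered into a training string as sketched below. The four keys follow the fields listed above. The sample content, the `format_example` helper, and the prompt template are hypothetical, since the paper does not specify how records are serialized.

```python
import json

# Hypothetical record; the four keys mirror the fields listed above.
record = {
    "instruction": "Write a function that returns the square of a number.",
    "input": "n = 4",
    "output": "def square(n):\n    return n * n",
    "explanation": "Multiplying n by itself gives its square.",
}

def format_example(rec: dict) -> str:
    """Render one record as a single training string (assumed template)."""
    return (
        f"### Instruction:\n{rec['instruction']}\n\n"
        f"### Input:\n{rec['input']}\n\n"
        f"### Output:\n{rec['output']}\n\n"
        f"### Explanation:\n{rec['explanation']}"
    )

# One JSON object per line (JSONL) is a common on-disk layout for such datasets.
jsonl_line = json.dumps(record)
prompt = format_example(record)
print(prompt)
```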

4.4 Structured Output Framework

The model produces outputs in structured JSON format:

{
  "code": "...",
  "explanation": "...",
  "confidence": 0.84,
  "relevancy_score": 0.82,
  "hallucination": false
}

This enables:

Easy API integration
Automated evaluation
Better interpretability
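
Because the output is plain JSON, a consumer can check it before using it. The sketch below parses a response and validates the five fields shown in the example object. The `parse_response` helper and the assumption that both scores lie in [0, 1] are illustrative and are not specified by the paper.

```python
import json

# Expected fields and their types, taken from the example object above.
SCHEMA = {
    "code": str,
    "explanation": str,
    "confidence": float,
    "relevancy_score": float,
    "hallucination": bool,
}

def parse_response(raw: str) -> dict:
    """Parse a model response and raise ValueError if it violates the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    for key, expected in SCHEMA.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        value = data[key]
        if expected is float:
            # Accept ints such as 1, but reject booleans (bool subclasses int).
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected)
        if not ok:
            raise ValueError(f"wrong type for field: {key}")
    for key in ("confidence", "relevancy_score"):
        if not 0.0 <= data[key] <= 1.0:  # assumed range, not stated in the paper
            raise ValueError(f"{key} outside [0, 1]")
    return data

raw = (
    '{"code": "...", "explanation": "...", "confidence": 0.84, '
    '"relevancy_score": 0.82, "hallucination": false}'
)
result = parse_response(raw)
print(result["confidence"], result["hallucination"])
```

Rejecting malformed responses at this boundary keeps downstream code from having to guard every field access.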

5. Evaluation

5.1 Metrics

We evaluate the model using:

Code Correctness (%)
Syntax Validity (%)
Relevancy Score
Hallucination Rate (%)
Confidence Score
Latency (ms)
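
The paper does not describe how these metrics are computed. As one possible reading, the sketch below derives syntax validity, hallucination rate, mean confidence, and mean latency from a list of per-sample evaluation records. It assumes the generated code is Python (checked with `ast.parse`) and uses a hypothetical record layout. Code correctness would additionally require executing tests and is omitted here.

```python
import ast
from statistics import mean

def is_valid_python(code: str) -> bool:
    """Return True if `code` parses as Python source."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def summarize(records: list[dict]) -> dict:
    """Aggregate per-sample records into percentage and mean metrics."""
    n = len(records)
    return {
        "syntax_validity_pct": 100.0 * sum(is_valid_python(r["code"]) for r in records) / n,
        "hallucination_rate_pct": 100.0 * sum(r["hallucination"] for r in records) / n,
        "mean_confidence": mean(r["confidence"] for r in records),
        "mean_latency_ms": mean(r["latency_ms"] for r in records),
    }

# Hypothetical records for illustration only; these are not results from the paper.
records = [
    {"code": "def f(x):\n    return x + 1", "hallucination": False,
     "confidence": 0.9, "latency_ms": 120.0},
    {"code": "def g(:\n    pass", "hallucination": True,
     "confidence": 0.5, "latency_ms": 80.0},
]
summary = summarize(records)
print(summary)
```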

5.2 Results

The model demonstrates:

Improved correctness over baseline models
Lower hallucination rates
More stable structured outputs

6. Benchmark Visualization

The results indicate that ConicAI scores higher than the baseline models on correctness, syntax validity, and confidence, and has a lower hallucination rate.

7. Results Analysis

We attribute the results to three factors:

Instruction-based fine-tuning, which improves correctness
Structured output constraints, which reduce hallucination
The JSON output format, which improves usability

8. Limitations

Limited dataset diversity
Heuristic-based confidence estimation
Lack of standardized benchmark evaluation

9. Future Work

Future improvements include:

Scaling dataset size and diversity
Benchmarking on standard code-generation benchmarks such as HumanEval and MBPP
Improving hallucination detection methods
Building user interfaces and APIs

10. Conclusion

This work shows that a compact coding LLM can be adapted with LoRA to obtain efficient training, structured outputs, and improved usability. The proposed approach helps bridge the gap between research models and practical deployment.

References

Hugging Face Transformers
PEFT: Parameter-Efficient Fine-Tuning
Qwen2.5-Coder Model