- Libraries
- PEFT
How to use girish00/ConicAI_LLM_model with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "girish00/ConicAI_LLM_model")
- Transformers
How to use girish00/ConicAI_LLM_model with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="girish00/ConicAI_LLM_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("girish00/ConicAI_LLM_model")
model = AutoModelForCausalLM.from_pretrained("girish00/ConicAI_LLM_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use girish00/ConicAI_LLM_model with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "girish00/ConicAI_LLM_model"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "girish00/ConicAI_LLM_model",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/girish00/ConicAI_LLM_model
- SGLang
How to use girish00/ConicAI_LLM_model with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "girish00/ConicAI_LLM_model" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "girish00/ConicAI_LLM_model",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "girish00/ConicAI_LLM_model" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "girish00/ConicAI_LLM_model",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use girish00/ConicAI_LLM_model with Docker Model Runner:
docker model run hf.co/girish00/ConicAI_LLM_model
ConicAI Coding LLM: A Parameter-Efficient Framework for Structured Code Generation and Explanation
Abstract
Large Language Models (LLMs) have significantly advanced the field of automated code generation and reasoning. However, traditional fine-tuning approaches remain computationally expensive and often produce unstructured outputs that limit their usability in real-world applications.
In this work, we present ConicAI Coding LLM, a lightweight, parameter-efficient coding assistant built by applying Low-Rank Adaptation (LoRA) to Qwen2.5-Coder-0.5B-Instruct. The model is designed to generate, debug, and explain code while producing structured outputs that include confidence, relevancy, and hallucination indicators.
Our results suggest that compact models can deliver competitive performance with improved interpretability and deployment efficiency, making them suitable for practical developer tools and educational systems.
1. Introduction
The rapid evolution of LLMs has enabled significant improvements in code generation, debugging, and explanation tasks. Models such as Codex and Qwen2.5-Coder have shown strong capabilities, but fully fine-tuning and deploying them requires extensive computational resources.
Additionally, most existing systems produce unstructured outputs, making integration into applications difficult. There is a growing need for models that are:
- Computationally efficient
- Structurally interpretable
- Easily deployable
This work introduces a parameter-efficient solution addressing these challenges.
2. Problem Statement
Despite advancements, current coding LLMs suffer from:
- High computational cost for full fine-tuning
- Lack of structured outputs
- Difficulty in integration into real-world systems
- Limited interpretability of generated results
3. Proposed Method
We propose ConicAI Coding LLM, a framework combining:
- LoRA-based fine-tuning for efficiency
- Instruction-based dataset generation
- Structured inference output design
4. Methodology
4.1 Base Model
The model is built on:
- Qwen2.5-Coder-0.5B-Instruct
4.2 Fine-Tuning Approach
We apply LoRA (Low-Rank Adaptation):
- Reduces trainable parameters
- Enables local training
- Maintains performance
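To make the parameter reduction concrete, the sketch below counts trainable parameters for a single weight matrix under full fine-tuning and under a rank-r LoRA update. The layer dimensions and the rank are illustrative assumptions, not the configuration used to train this model.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    LoRA freezes the d_out x d_in weight W and learns a low-rank update
    B @ A, where B is d_out x rank and A is rank x d_in.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(d_out=1024, d_in=1024, rank=8)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank 8):    {lora:,} trainable parameters ({lora / full:.2%} of full)")
```

Because the LoRA count grows with `rank * (d_out + d_in)` rather than `d_out * d_in`, small ranks keep the trainable fraction low even for wide layers.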
4.3 Dataset Design
The dataset follows an instruction-driven format:
- Instruction
- Input
- Output
- Explanation
Dataset size: approximately 5,000 to 10,000 samples
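As a rough illustration of this four-field format, the sketch below represents one sample and serializes it as a JSON Lines record. The class name, helper names, prompt layout, and example content are hypothetical; the actual dataset schema and prompt template are not documented here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Sample:
    """One training record with the four fields listed above."""
    instruction: str
    input: str
    output: str
    explanation: str


def to_jsonl_line(sample: Sample) -> str:
    """Serialize a sample as a single JSON Lines record."""
    return json.dumps(asdict(sample), ensure_ascii=False)


def build_prompt(sample: Sample) -> str:
    """Join instruction and optional input into a prompt (assumed layout)."""
    parts = [f"Instruction: {sample.instruction}"]
    if sample.input:
        parts.append(f"Input: {sample.input}")
    return "\n".join(parts)


# Hypothetical example record, not taken from the real dataset.
example = Sample(
    instruction="Write a function that returns the square of a number.",
    input="",
    output="def square(x):\n    return x * x",
    explanation="Multiplies the argument by itself.",
)
print(to_jsonl_line(example))
print(build_prompt(example))
```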
4.4 Structured Output Framework
The model produces outputs in structured JSON format:
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.84,
  "relevancy_score": 0.82,
  "hallucination": false
}
This enables:
- Easy API integration
- Automated evaluation
- Better interpretability
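One way a consuming application might check this structure before using it is sketched below. The field names come from the example above; the `parse_model_output` helper and the assumption that both scores lie in the range 0 to 1 are illustrative and not part of the released model.

```python
import json

# Field names taken from the JSON example; the type rules are assumptions.
EXPECTED_TYPES = {
    "code": str,
    "explanation": str,
    "confidence": (int, float),
    "relevancy_score": (int, float),
    "hallucination": bool,
}


def parse_model_output(raw: str) -> dict:
    """Parse a raw model response and validate the expected fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("output is not a JSON object")
    for key, expected in EXPECTED_TYPES.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        value = data[key]
        # bool is a subclass of int, so reject it for non-boolean fields.
        if expected is not bool and isinstance(value, bool):
            raise ValueError(f"wrong type for field: {key}")
        if not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {key}")
    for key in ("confidence", "relevancy_score"):
        # Assumed range; the source does not state the score bounds.
        if not 0.0 <= data[key] <= 1.0:
            raise ValueError(f"{key} outside [0, 1]")
    return data


raw = (
    '{"code": "print(1)", "explanation": "Prints 1.", '
    '"confidence": 0.84, "relevancy_score": 0.82, "hallucination": false}'
)
result = parse_model_output(raw)
print(result["confidence"], result["hallucination"])
```

Rejecting malformed responses at this boundary keeps downstream API code from having to handle partially filled results.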
5. Evaluation
5.1 Metrics
We evaluate the model using:
- Code Correctness (%)
- Syntax Validity (%)
- Relevancy Score
- Hallucination Rate (%)
- Confidence Score
- Latency (ms)
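The source does not specify how these metrics are computed. As a minimal sketch, assuming the generated code is Python and that each response carries the `hallucination` flag from the structured output, two of the percentages could be derived as follows. The function names and sample data are hypothetical.

```python
import ast


def is_valid_python(code: str) -> bool:
    """Return True if the snippet parses as Python source."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


def syntax_validity(snippets: list[str]) -> float:
    """Percentage of snippets that parse without a syntax error."""
    if not snippets:
        return 0.0
    return 100.0 * sum(is_valid_python(s) for s in snippets) / len(snippets)


def hallucination_rate(flags: list[bool]) -> float:
    """Percentage of responses flagged as hallucinated."""
    if not flags:
        return 0.0
    return 100.0 * sum(flags) / len(flags)


# Hypothetical model outputs used only to exercise the functions.
snippets = ["def f(x):\n    return x + 1", "def g(:\n    pass"]
flags = [False, False, True, False]
print(f"syntax validity: {syntax_validity(snippets):.1f}%")
print(f"hallucination rate: {hallucination_rate(flags):.1f}%")
```

Parsing checks syntax only; measuring code correctness would additionally require executing each snippet against test cases.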
5.2 Results
The model demonstrates:
- Improved correctness over baseline models
- Lower hallucination rates
- More stable structured outputs
6. Benchmark Visualization
The results indicate that ConicAI achieves better performance in correctness, syntax validity, and confidence, while maintaining lower hallucination rates compared to baseline models.
7. Results Analysis
- Higher correctness due to instruction-based fine-tuning
- Lower hallucination from structured output constraints
- Better usability due to JSON output format
8. Limitations
- Limited dataset diversity
- Heuristic-based confidence estimation
- Lack of standardized benchmark evaluation
9. Future Work
Future improvements include:
- Scaling dataset size and diversity
- Benchmarking on datasets like HumanEval and MBPP
- Improving hallucination detection methods
- Building user interfaces and APIs
10. Conclusion
This work demonstrates that a compact coding LLM can be effectively enhanced using LoRA to achieve efficient training, structured outputs, and improved usability. The proposed approach bridges the gap between research models and practical deployment systems.
References
- Hugging Face Transformers
- PEFT: Parameter-Efficient Fine-Tuning
- Qwen2.5-Coder Model
