
Project Specification

1. Project Name

Local Advanced Fine-Tuning Pipeline for Coding LLM

2. Purpose

Provide a fully local, modular workflow to fine-tune a compact coding LLM for:

  • code fixing
  • debugging
  • code explanation
  • response confidence and relevancy signals

3. Functional Requirements

FR-1 Dataset Generation

  • System must generate a JSON dataset with fields:
    • instruction
    • input
    • output
    • explanation
    • confidence
    • relevancy
  • Dataset size must be constrained to 5000-10000 samples (a sample record is sketched after this list).
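
A minimal sketch of one record, assuming confidence and relevancy are 0-1 floats (the spec does not pin their scale or the exact field semantics):

```python
# Hypothetical sample record; values are illustrative only.
sample_record = {
    "instruction": "Fix the bug in the following Python function.",
    "input": "def add(a, b):\n    return a - b",
    "output": "def add(a, b):\n    return a + b",
    "explanation": "The function subtracted instead of adding; the operator is corrected.",
    "confidence": 0.92,   # assumed 0-1 float
    "relevancy": 0.95,    # assumed 0-1 float
}
```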

FR-2 Model Fine-Tuning

  • System must support LoRA fine-tuning on:
    • Qwen/Qwen2.5-Coder-0.5B-Instruct (default)
  • Training inputs must be tokenized and formatted from dataset records.
  • Training output must be stored in a configurable output directory.
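
A minimal sketch of the LoRA setup named in FR-2, using transformers and peft; the rank, alpha, and target modules below are assumptions, not values taken from the actual trainer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_NAME = "Qwen/Qwen2.5-Coder-0.5B-Instruct"  # FR-2 default

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Hypothetical LoRA hyperparameters; the real training config may differ.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # common choice for Qwen-style attention
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only adapter weights are trainable
```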

FR-3 Pipeline Orchestration

  • System must provide a one-command execution script for:
    • dataset generation
    • training
    • optional uploading
  • Pipeline must support skipping individual stages (one possible driver is sketched after this list).
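
One possible shape for the one-command driver; the stage script names and --skip-* flags are hypothetical:

```python
import argparse
import subprocess
import sys

def main() -> None:
    parser = argparse.ArgumentParser(description="Run the full pipeline.")
    parser.add_argument("--skip-dataset", action="store_true")
    parser.add_argument("--skip-train", action="store_true")
    parser.add_argument("--skip-upload", action="store_true")
    args = parser.parse_args()

    # Each stage is an independent script, so any stage can be skipped (FR-3).
    if not args.skip_dataset:
        subprocess.run([sys.executable, "generate_dataset.py"], check=True)
    if not args.skip_train:
        subprocess.run([sys.executable, "train.py"], check=True)
    if not args.skip_upload:
        subprocess.run([sys.executable, "upload.py"], check=True)

if __name__ == "__main__":
    main()
```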

FR-4 Local Inference

  • System must generate outputs from a local model folder (a minimal inference sketch follows this list).
  • Inference module must support:
    • LoRA adapter outputs
    • full model outputs
  • Inference output must be valid JSON containing:
    • code
    • explanation
    • confidence
    • important_tokens
    • relevancy_score
    • hallucination
    • hallucination_check_reason
    • latency_ms
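
A sketch of the inference contract; the generate() call is standard transformers usage, while the placeholder values only show the required shape of the payload. Loading an adapter-only folder would need peft's AutoPeftModelForCausalLM instead:

```python
import json
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

def run_inference(model_path: str, prompt: str, max_new_tokens: int = 256) -> dict:
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)  # full/merged model

    start = time.perf_counter()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # deterministic mode by default (NFR-3)
    )
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    latency_ms = int((time.perf_counter() - start) * 1000)

    # How the pipeline actually derives these fields is not specified here;
    # the placeholders only demonstrate the required keys.
    return {
        "code": text,
        "explanation": "",
        "confidence": 0.0,
        "important_tokens": [],
        "relevancy_score": 0.0,
        "hallucination": False,
        "hallucination_check_reason": "",
        "latency_ms": latency_ms,
    }

if __name__ == "__main__":
    # "./output_model" is a hypothetical local model folder.
    print(json.dumps(run_inference("./output_model", "Fix: def add(a, b): return a - b")))
```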

FR-5 HF Upload

  • System must upload trained model artifacts to a user-specified HF repo.
  • Upload should be optional and independently executable.
  • System must support updating an existing HF model repo by uploading to the same repo_id (see the sketch below).
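
FR-5 maps naturally onto huggingface_hub; the repo id and folder path below are placeholders:

```python
from huggingface_hub import HfApi

api = HfApi()  # requires a prior `huggingface-cli login` or an HF_TOKEN env var

# exist_ok=True lets the same repo_id be reused, which covers repo updates.
api.create_repo(repo_id="your-username/your-model", exist_ok=True)
api.upload_folder(
    folder_path="./output_model",        # hypothetical trained-model directory
    repo_id="your-username/your-model",  # user-specified HF repo
    repo_type="model",
)
```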

4. Non-Functional Requirements

NFR-1 Reliability

  • Scripts must fail with clear error messages for missing files/directories.

NFR-2 Configurability

  • Hyperparameters and paths must be configurable via CLI.
  • Pipeline defaults should be read from training_config.json (an example appears under Default Configuration).

NFR-3 Performance

  • Must support a limited-sample smoke run for CPU environments.
  • Tokenization must use deterministic fixed-length padding for stable LoRA training labels (see the sketch after this list).
  • Inference should run in deterministic mode by default for stable outputs.
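
The fixed-length padding requirement corresponds to a tokenizer call like the one below; masking pad positions out of the loss with label -100 is an assumed (though conventional) detail:

```python
def tokenize_record(tokenizer, text: str, max_length: int = 512) -> dict:
    # Fixed-length padding plus truncation gives every sample the same shape,
    # keeping LoRA training labels stable across batches (NFR-3).
    encoded = tokenizer(
        text,
        padding="max_length",
        truncation=True,
        max_length=max_length,
    )
    # Conventional causal-LM labeling: ignore pad positions via -100.
    encoded["labels"] = [
        tok if mask == 1 else -100
        for tok, mask in zip(encoded["input_ids"], encoded["attention_mask"])
    ]
    return encoded
```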

NFR-4 Maintainability

  • Modules must remain decoupled and single-purpose where possible.
  • Documentation must include setup and run commands.

5. Input/Output Contracts

Dataset Generator

  • Input:
    • --size (int, 5000-10000)
    • --out (path)
  • Output:
    • JSON training file at --out
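
This contract maps directly onto argparse; the sketch shows the interface only, not the generation logic:

```python
import argparse

parser = argparse.ArgumentParser(description="Generate the JSON training dataset.")
parser.add_argument("--size", type=int, default=8000,
                    help="number of samples (spec range: 5000-10000)")
parser.add_argument("--out", type=str, required=True,
                    help="path of the JSON output file")
args = parser.parse_args()

# Enforce the size constraint from FR-1.
if not 5000 <= args.size <= 10000:
    parser.error("--size must be in [5000, 10000]")
```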

Trainer

  • Input:
    • dataset file path
    • model name
    • hyperparameters
  • Output:
    • trained model artifacts in output_dir

Inference

  • Input:
    • local model path
    • prompt
    • max new tokens
  • Output:
    • structured JSON to stdout
  • Contract:
    • required keys: code, explanation, confidence, important_tokens, relevancy_score, hallucination, hallucination_check_reason, latency_ms

Upload

  • Input:
    • model directory path
    • HF repo id
  • Output:
    • model artifacts uploaded to HF repo

6. Default Configuration

  • Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
  • Dataset size: 8000
  • Epochs: 3
  • Batch size: 2
  • Learning rate: 1e-4
  • Max length: 512
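
If training_config.json mirrors these defaults (see NFR-2), it could be produced like this; the key names are assumptions:

```python
import json

# Hypothetical key names; the values come from this section's defaults.
DEFAULTS = {
    "model_name": "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    "dataset_size": 8000,
    "epochs": 3,
    "batch_size": 2,
    "learning_rate": 1e-4,
    "max_length": 512,
}

with open("training_config.json", "w") as f:
    json.dump(DEFAULTS, f, indent=2)
```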

7. Validation Criteria

The project is considered runnable when:

  • all scripts compile (e.g., pass python -m py_compile)
  • dataset generation succeeds
  • a smoke training run completes
  • inference returns a valid JSON payload with the required keys (a minimal key check follows this list)
  • the upload script accepts a valid model directory and repo id
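
The JSON-payload criterion can be smoke-tested with a simple key check; this is an illustrative helper, not part of the project:

```python
import json

REQUIRED_KEYS = {
    "code", "explanation", "confidence", "important_tokens",
    "relevancy_score", "hallucination", "hallucination_check_reason",
    "latency_ms",
}

def validate_payload(raw: str) -> None:
    payload = json.loads(raw)  # raises if stdout was not valid JSON
    missing = REQUIRED_KEYS - payload.keys()
    assert not missing, f"missing keys: {sorted(missing)}"
```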

8. Known Constraints

  • CPU training is slow for full dataset runs.
  • HF login/token is required for upload.
  • Output quality depends heavily on dataset diversity and quality.