# Project Specification

## 1. Project Name

Local Advanced Fine-Tuning Pipeline for Coding LLM

## 2. Purpose

Provide a fully local, modular workflow to fine-tune a compact coding LLM for:

- code fixing
- debugging
- code explanation
- response confidence and relevancy signals

## 3. Functional Requirements

### FR-1 Dataset Generation

- System must generate a JSON dataset with fields:
  - `instruction`
  - `input`
  - `output`
  - `explanation`
  - `confidence`
  - `relevancy`
- Dataset size must be constrained to 5000-10000 samples.
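
An illustrative record with made-up values, showing one plausible shape for these fields:

```json
{
  "instruction": "Fix the bug in this function.",
  "input": "def add(a, b):\n    return a - b",
  "output": "def add(a, b):\n    return a + b",
  "explanation": "The function subtracted instead of adding; the operator is corrected.",
  "confidence": 0.95,
  "relevancy": 0.9
}
```
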
### FR-2 Model Fine-Tuning

- System must support LoRA fine-tuning on:
  - `Qwen/Qwen2.5-Coder-0.5B-Instruct` (default)
- Training inputs must be tokenized and formatted from dataset records.
- Training output must be stored in a configurable output directory.
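
A minimal sketch of this step, assuming the Hugging Face `transformers`/`peft`/`datasets` stack; the LoRA target module names and the prompt formatting are illustrative assumptions, not the project's fixed implementation:

```python
import json

from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

MODEL_NAME = "Qwen/Qwen2.5-Coder-0.5B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Attach LoRA adapters; the target projection names are an assumption that
# matches common Qwen2-family attention layers.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]))

def to_features(rec):
    # Fold one dataset record into a single fixed-length training example.
    text = f"{rec['instruction']}\n{rec['input']}\n{rec['output']}"
    enc = tokenizer(text, max_length=512, truncation=True, padding="max_length")
    enc["labels"] = enc["input_ids"].copy()
    return enc

with open("dataset.json") as f:
    records = json.load(f)
train_ds = Dataset.from_list(records).map(
    to_features, remove_columns=list(records[0].keys()))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=2, learning_rate=1e-4),
    train_dataset=train_ds,
).train()
model.save_pretrained("out")  # writes the LoRA adapter weights
```
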
### FR-3 Pipeline Orchestration

- System must provide a one-command execution script for:
  - dataset generation
  - training
  - optional uploading
- Pipeline must support skipping individual stages.
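
A minimal driver sketch; the stage script names (`generate_dataset.py`, `train.py`, `upload.py`) and skip flags are assumptions for illustration:

```python
import argparse
import subprocess
import sys

parser = argparse.ArgumentParser(description="One-command pipeline driver")
parser.add_argument("--skip-dataset", action="store_true")
parser.add_argument("--skip-train", action="store_true")
parser.add_argument("--upload", action="store_true", help="upload stage is opt-in")
args = parser.parse_args()

def run(stage, cmd):
    # check=True aborts the pipeline with the failing stage's exit code.
    print(f"[pipeline] running stage: {stage}")
    subprocess.run(cmd, check=True)

if not args.skip_dataset:
    run("dataset", [sys.executable, "generate_dataset.py",
                    "--size", "8000", "--out", "dataset.json"])
if not args.skip_train:
    run("train", [sys.executable, "train.py", "--data", "dataset.json"])
if args.upload:
    run("upload", [sys.executable, "upload.py",
                   "--model-dir", "out", "--repo-id", "your-username/coder-lora"])
```
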
### FR-4 Local Inference

- System must generate outputs from a local model folder.
- Inference module must support:
  - LoRA adapter outputs
  - full model outputs
- Inference output must be valid JSON containing:
  - `code`
  - `explanation`
  - `confidence`
  - `important_tokens`
  - `relevancy_score`
  - `hallucination`
  - `hallucination_check_reason`
  - `latency_ms`
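
An illustrative payload (all values made up) satisfying this contract:

```json
{
  "code": "def add(a, b):\n    return a + b",
  "explanation": "The function subtracted its arguments; the operator is corrected to +.",
  "confidence": 0.93,
  "important_tokens": ["return", "a + b"],
  "relevancy_score": 0.88,
  "hallucination": false,
  "hallucination_check_reason": "Output only references symbols present in the prompt.",
  "latency_ms": 412
}
```
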
### FR-5 HF Upload

- System must upload trained model artifacts to a user-specified HF repo.
- Upload should be optional and independently executable.
- System must support updating an existing HF model repo by uploading to the same `repo_id`.
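
A minimal sketch using `huggingface_hub`; the repo id is a placeholder, and re-running with the same `repo_id` updates the existing repo:

```python
from huggingface_hub import HfApi

api = HfApi()  # picks up the token from `huggingface-cli login` or HF_TOKEN
api.create_repo("your-username/coder-lora", exist_ok=True)  # no-op if it already exists
api.upload_folder(folder_path="out", repo_id="your-username/coder-lora")
```
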
## 4. Non-Functional Requirements

### NFR-1 Reliability

- Scripts must fail with clear error messages for missing files/directories.
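
One way scripts might satisfy this (a sketch; the helper name is hypothetical):

```python
import sys
from pathlib import Path

def require_path(path: str, what: str) -> Path:
    # Exit with a message that names exactly which artifact is missing.
    p = Path(path)
    if not p.exists():
        sys.exit(f"error: {what} not found at {p.resolve()}")
    return p

dataset_file = require_path("dataset.json", "dataset file")
```
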
### NFR-2 Configurability

- Hyperparameters and paths must be configurable via CLI.
- Pipeline defaults should be read from `training_config.json`.

### NFR-3 Performance

- Must support a limited-sample smoke run for CPU environments.
- Tokenization must use deterministic fixed-length padding for stable LoRA training labels.
- Inference should run in deterministic mode by default for stable outputs.
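
A sketch of both NFR-3 behaviors under the assumed `transformers` stack: fixed-length padding on the training side and greedy (non-sampled) decoding on the inference side:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-Coder-0.5B-Instruct"
tok = AutoTokenizer.from_pretrained(MODEL)

# Training side: every sample is padded/truncated to exactly 512 tokens,
# so label tensors have a stable shape across the whole LoRA run.
enc = tok("fix: def add(a, b): return a - b",
          max_length=512, padding="max_length", truncation=True)
assert len(enc["input_ids"]) == 512

# Inference side: greedy decoding (no sampling) makes outputs repeatable.
model = AutoModelForCausalLM.from_pretrained(MODEL)
inputs = tok("fix: def add(a, b): return a - b", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```
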
### NFR-4 Maintainability

- Modules must remain decoupled and single-purpose where possible.
- Documentation must include setup and run commands.

## 5. Input/Output Contracts

### Dataset Generator

- Input:
  - `--size` (int, 5000-10000)
  - `--out` (path)
- Output:
  - JSON training file at `--out`

### Trainer

- Input:
  - dataset file path
  - model name
  - hyperparameters
- Output:
  - trained model artifacts in `output_dir`

### Inference

- Input:
  - local model path
  - prompt
  - max new tokens
- Output:
  - structured JSON to stdout
- Contract:
  - required keys: `code`, `explanation`, `confidence`, `important_tokens`, `relevancy_score`, `hallucination`, `hallucination_check_reason`, `latency_ms`

### Upload

- Input:
  - model directory path
  - HF repo id
- Output:
  - model artifacts uploaded to HF repo
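
Hypothetical invocations matching the contracts above; the script names, and every flag except the generator's `--size`/`--out`, are illustrative assumptions:

```bash
python generate_dataset.py --size 8000 --out dataset.json
python train.py --data dataset.json --model Qwen/Qwen2.5-Coder-0.5B-Instruct --output-dir out
python infer.py --model-path out --prompt "fix: def add(a, b): return a - b" --max-new-tokens 128
python upload.py --model-dir out --repo-id your-username/coder-lora
```
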
## 6. Default Configuration

- Model: `Qwen/Qwen2.5-Coder-0.5B-Instruct`
- Dataset size: `8000`
- Epochs: `3`
- Batch size: `2`
- Learning rate: `1e-4`
- Max length: `512`
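
A `training_config.json` sketch carrying these defaults (the key names are assumptions; only the values come from this section):

```json
{
  "model_name": "Qwen/Qwen2.5-Coder-0.5B-Instruct",
  "dataset_size": 8000,
  "epochs": 3,
  "batch_size": 2,
  "learning_rate": 1e-4,
  "max_length": 512
}
```
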
## 7. Validation Criteria

Project is considered runnable when:

- all scripts compile without errors
- dataset generation succeeds
- a smoke training run completes
- inference returns a valid JSON payload with the required keys
- the upload script accepts a valid model directory and repo id

## 8. Known Constraints

- CPU training is slow for full dataset runs.
- HF login/token is required for upload.
- Output quality depends heavily on dataset diversity and quality.