Instructions to use girish00/ConicAI_LLM_model with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

PEFT

How to use girish00/ConicAI_LLM_model with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "girish00/ConicAI_LLM_model")

Transformers

How to use girish00/ConicAI_LLM_model with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="girish00/ConicAI_LLM_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
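When the pipeline is given a list of chat messages, it returns the conversation with the assistant's turn appended rather than a bare string. The sketch below pulls out that last turn. The helper name `last_assistant_reply` and the example result are hypothetical, used only to illustrate the shape, and the placeholder reply is not real model output.

```python
def last_assistant_reply(pipeline_output):
    """Return the assistant's text from a chat-style text-generation result."""
    # The pipeline returns one dict per input; "generated_text" holds the
    # full message list, with the new assistant message at the end.
    conversation = pipeline_output[0]["generated_text"]
    return conversation[-1]["content"]


# Hypothetical result, shaped like what pipe(messages) returns for chat input.
example_output = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "(model reply goes here)"},
        ]
    }
]

reply = last_assistant_reply(example_output)
print(reply)
```

With a real pipeline, the same helper would be called as `last_assistant_reply(pipe(messages))`.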

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("girish00/ConicAI_LLM_model")
model = AutoModelForCausalLM.from_pretrained("girish00/ConicAI_LLM_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
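The slice in the final line is there because, for a decoder-only model, `generate` returns the prompt tokens followed by the new tokens, so decoding the whole sequence would echo the prompt. A minimal sketch of the same slicing with plain Python lists follows. The token IDs are made up for illustration, and `strip_prompt` is a hypothetical helper, not part of Transformers.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the token IDs generated after the prompt."""
    return output_ids[prompt_length:]


prompt_ids = [101, 7592, 2088]               # hypothetical prompt token IDs
output_ids = prompt_ids + [2023, 2003, 102]  # prompt echoed first, then new tokens

new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)  # [2023, 2003, 102]
```

In the snippet above, `inputs["input_ids"].shape[-1]` plays the role of `len(prompt_ids)`.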

Local Apps

vLLM

How to use girish00/ConicAI_LLM_model with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "girish00/ConicAI_LLM_model"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "girish00/ConicAI_LLM_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
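The same request can be made from Python. The sketch below uses only the standard library and assumes a server is listening on `http://localhost:8000` as in the command above. The function names `build_chat_request` and `send_chat_request` are illustrative, not part of vLLM. Building the request is kept separate from sending it so the payload can be inspected without a running server.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="girish00/ConicAI_LLM_model"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # OpenAI-compatible responses put the reply under choices[0].message.
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What is the capital of France?")
print(request.full_url)
# With a server running: print(send_chat_request(request))
```

For an SGLang server started on port 30000, passing `base_url="http://localhost:30000"` should target the equivalent endpoint.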

Use Docker

docker model run hf.co/girish00/ConicAI_LLM_model

SGLang

How to use girish00/ConicAI_LLM_model with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "girish00/ConicAI_LLM_model" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "girish00/ConicAI_LLM_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "girish00/ConicAI_LLM_model" \
        --host 0.0.0.0 \
        --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "girish00/ConicAI_LLM_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use girish00/ConicAI_LLM_model with Docker Model Runner:
```
docker model run hf.co/girish00/ConicAI_LLM_model
```

girish00 committed on Apr 17

Commit be162c5 (verified) · 1 Parent(s): b8f4f0d

Upload folder using huggingface_hub

Files changed (1)

README.md +34 -101

@@ -19,106 +19,64 @@ tags:
 ### Model Description
 <!-- Provide a longer summary of what this model is. -->
-This model is a fine-tuned coding assistant built on top of Qwen2.5-Coder using LoRA (Low-Rank Adaptation).
-It is designed to improve performance in:
-- Code generation
-- Debugging
-- Code explanation
-- Code optimization
-The model also incorporates structured outputs including explanation, confidence, and relevancy signals.
----
-- **Developed by:** GIRISH KUMAR DEWANGAN]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [Causal Language Model (Code Generation & Debugging)]
-- **Language(s) (NLP):** [Python, general programming concepts]
-- **License:** [Apache 2.0]
-- **Finetuned from model [optional]:** [Qwen/Qwen2.5-Coder-0.5B-Instruct]
 ### Direct Use
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[- Fix buggy Python code
-- Explain code logic
-- Optimize code
-- Generate small functions  ]
 ### Downstream Use [optional]
 <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[- Integration in coding assistants (VS Code extension, chatbots)
-- Educational tools for learning programming
-- AI-powered debugging tools
-]
 ### Out-of-Scope Use
 <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[- Production-critical code without validation
-- Security-sensitive code generation
-- Large-scale system design  ]
 ## Bias, Risks, and Limitations
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[- May generate incorrect or incomplete code
-- May hallucinate fixes for ambiguous inputs
-- Limited to training dataset scope
-- Confidence scores are heuristic (not calibrated)
-]
 ### Recommendations
 <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-- Always validate generated code before use
-- Use human review for critical applications
-- Combine with test cases for reliability
 ## How to Get Started with the Model
 Use the code below to get started with the model.
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-from peft import PeftModel
-# Base model
-base_model = "Qwen/Qwen2.5-Coder-0.5B-Instruct"
-# ConicAI LLM model
-adapter_model = "girish00/ConicAI_LLM_model"
-# Load tokenizer
-tokenizer = AutoTokenizer.from_pretrained(base_model)
-# Load base model
-model = AutoModelForCausalLM.from_pretrained(base_model)
-# Load fine-tuned adapter
-model = PeftModel.from_pretrained(model, adapter_model)
-# Test prompt
-prompt = "Fix this code: def add(a,b) return a+b"
-inputs = tokenizer(prompt, return_tensors="pt")
-outputs = model.generate(**inputs, max_new_tokens=200)
-print(tokenizer.decode(outputs[0]))
-```
 ## Training Details
@@ -126,31 +84,20 @@ print(tokenizer.decode(outputs[0]))
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[Synthetic dataset (~8K–10K samples)
-Includes:
-Bug fixing tasks
-Code explanation
-Optimization tasks
-Structured outputs (explanation, confidence, relevancy)]
 ### Training Procedure
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-Method: LoRA fine-tuning
-Framework: Transformers + PEFT
 #### Preprocessing [optional]
 #### Training Hyperparameters
-- **Training regime:** [Batch size: 2
-Epochs: 1–2
-Learning rate: 2e-4
-Max sequence length: 512
-Quantization: 4-bit (for efficient training)] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 #### Speeds, Sizes, Times [optional]
@@ -161,10 +108,6 @@ Quantization: 4-bit (for efficient training)] <!--fp32, fp16 mixed precision, bf
 ## Evaluation
 <!-- This section describes the evaluation protocols and provides the results. -->
-Metrics
-Qualitative evaluation (manual testing)
-Relevancy score (embedding similarity)
-Hallucination detection (syntax validation)
 ### Testing Data, Factors & Metrics
@@ -178,20 +121,17 @@ Hallucination detection (syntax validation)
 <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
 #### Metrics
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[Qualitative evaluation (manual testing)
-Relevancy score (embedding similarity)
-Hallucination detection (syntax validation)]
 ### Results
-Improved code correctness compared to base model
-Better explanation quality
-Reduced syntax errors
 #### Summary
@@ -207,23 +147,23 @@ Reduced syntax errors
 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-- **Hardware Type:** Google Colab GPU (T4)
-- **Hours used:** ~30–60 minutes
 - **Cloud Provider:** [More Information Needed]
 - **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** Low (small-scale training)
 ## Technical Specifications [optional]
 ### Model Architecture and Objective
-[Base: Qwen2.5-Coder
-Fine-tuning: LoRA adapter]
 ### Compute Infrastructure
 #### Hardware
@@ -231,19 +171,12 @@ Fine-tuning: LoRA adapter]
 #### Software
-[Transformers
-PEFT (v0.19.0)
-Datasets]
 ## Citation [optional]
 <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-@misc{coding-llm-2026,
-  author = {Girish},
-  title = {Coding LLM Model},
-  year = {2026},
-  publisher = {Hugging Face}
-}
 **BibTeX:**
 [More Information Needed]

 ### Model Description
 <!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ### Direct Use
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
 ### Downstream Use [optional]
 <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
 ### Out-of-Scope Use
 <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
 ## Bias, Risks, and Limitations
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
 ### Recommendations
 <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 ## How to Get Started with the Model
 Use the code below to get started with the model.
+[More Information Needed]
 ## Training Details
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
 ### Training Procedure
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 #### Preprocessing [optional]
+[More Information Needed]
 #### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 #### Speeds, Sizes, Times [optional]
 ## Evaluation
 <!-- This section describes the evaluation protocols and provides the results. -->
 ### Testing Data, Factors & Metrics
 <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
 #### Metrics
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
 ### Results
+[More Information Needed]
 #### Summary
 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
 - **Cloud Provider:** [More Information Needed]
 - **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
 ## Technical Specifications [optional]
 ### Model Architecture and Objective
+[More Information Needed]
 ### Compute Infrastructure
+[More Information Needed]
 #### Hardware
 #### Software
+[More Information Needed]
 ## Citation [optional]
 <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 **BibTeX:**
 [More Information Needed]