Instructions to use girish00/ConicAI_LLM_model with libraries, inference providers, notebooks, and local apps.

Libraries

PEFT

How to use girish00/ConicAI_LLM_model with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "girish00/ConicAI_LLM_model")

Transformers

How to use girish00/ConicAI_LLM_model with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="girish00/ConicAI_LLM_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("girish00/ConicAI_LLM_model")
model = AutoModelForCausalLM.from_pretrained("girish00/ConicAI_LLM_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use girish00/ConicAI_LLM_model with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "girish00/ConicAI_LLM_model"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "girish00/ConicAI_LLM_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
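
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000 and returns the usual OpenAI-style `choices[0].message.content` field. The helper names `build_chat_request` and `chat` are hypothetical and are not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, prompt, timeout=60.0):
    """Send the request and return the text of the first choice (assumed response shape)."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("http://localhost:8000", "girish00/ConicAI_LLM_model", "What is the capital of France?")` should return the reply text. Keeping request construction separate from sending makes the payload easy to inspect without a live server.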

Use Docker

docker model run hf.co/girish00/ConicAI_LLM_model

SGLang

How to use girish00/ConicAI_LLM_model with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "girish00/ConicAI_LLM_model" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "girish00/ConicAI_LLM_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "girish00/ConicAI_LLM_model" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "girish00/ConicAI_LLM_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use girish00/ConicAI_LLM_model with Docker Model Runner:
```
docker model run hf.co/girish00/ConicAI_LLM_model
```

girish00 committed on Apr 19

Commit f204dba (verified) · 1 Parent(s): f2d1187

update endpoint helper files

Files changed (1)

infer_local.py +44 -26


@@ -286,13 +286,19 @@ def main():
     parser.add_argument("--base-model", type=str, default="Qwen/Qwen2.5-Coder-0.5B-Instruct")
     parser.add_argument("--prompt", type=str, required=True)
     parser.add_argument("--max-new-tokens", type=int, default=320)
-    parser.add_argument("--temperature", type=float, default=0.25)
-    parser.add_argument("--top-p", type=float, default=0.9)
-    parser.add_argument("--do-sample", action="store_true")
-    args = parser.parse_args()
-    if not os.path.exists(args.model_path):
-        raise FileNotFoundError(
+    parser.add_argument("--temperature", type=float, default=0.25)
+    parser.add_argument("--top-p", type=float, default=0.9)
+    parser.add_argument("--do-sample", action="store_true")
+    parser.add_argument(
+        "--allow-downloads",
+        action="store_true",
+        help="Allow Transformers to download missing model files from Hugging Face.",
+    )
+    args = parser.parse_args()
+    local_files_only = not args.allow_downloads
+    if not os.path.exists(args.model_path):
+        raise FileNotFoundError(
             f"Model path not found: {args.model_path}. Train first using run_pipeline.py."
         )
@@ -301,20 +307,28 @@ def main():
     full_model_weights_present = has_full_model_weights(args.model_path)
     if os.path.exists(adapter_config_path) and adapter_weights_present:
-        peft_config = PeftConfig.from_pretrained(args.model_path)
-        base_model_name = peft_config.base_model_name_or_path or args.base_model
-        tokenizer = AutoTokenizer.from_pretrained(base_model_name)
-        base_model = AutoModelForCausalLM.from_pretrained(
-            base_model_name,
-            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
-        )
-        model = PeftModel.from_pretrained(base_model, args.model_path)
-    elif full_model_weights_present and not os.path.exists(adapter_config_path):
-        tokenizer = AutoTokenizer.from_pretrained(args.model_path)
-        model = AutoModelForCausalLM.from_pretrained(
-            args.model_path,
-            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
-        )
+        peft_config = PeftConfig.from_pretrained(args.model_path)
+        base_model_name = peft_config.base_model_name_or_path or args.base_model
+        tokenizer = AutoTokenizer.from_pretrained(
+            base_model_name,
+            local_files_only=local_files_only,
+        )
+        base_model = AutoModelForCausalLM.from_pretrained(
+            base_model_name,
+            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+            local_files_only=local_files_only,
+        )
+        model = PeftModel.from_pretrained(base_model, args.model_path)
+    elif full_model_weights_present and not os.path.exists(adapter_config_path):
+        tokenizer = AutoTokenizer.from_pretrained(
+            args.model_path,
+            local_files_only=local_files_only,
+        )
+        model = AutoModelForCausalLM.from_pretrained(
+            args.model_path,
+            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+            local_files_only=local_files_only,
+        )
     else:
         # Graceful fallback when local model folder has config/tokenizer but no weight files.
         fallback_base = args.base_model
@@ -343,11 +357,15 @@ def main():
                 ),
                 file=sys.stderr,
             )
-        tokenizer = AutoTokenizer.from_pretrained(fallback_base)
-        model = AutoModelForCausalLM.from_pretrained(
-            fallback_base,
-            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
-        )
+        tokenizer = AutoTokenizer.from_pretrained(
+            fallback_base,
+            local_files_only=local_files_only,
+        )
+        model = AutoModelForCausalLM.from_pretrained(
+            fallback_base,
+            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+            local_files_only=local_files_only,
+        )
     if tokenizer.pad_token is None:
         tokenizer.pad_token = tokenizer.eos_token
     model.eval()
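
The core of this change is that model loading becomes offline by default, with downloads as an explicit opt-in. A minimal sketch of that pattern follows. `build_parser` and `loader_kwargs` are hypothetical helper names that do not appear in `infer_local.py`; only the `--allow-downloads` flag and the `local_files_only` keyword come from the diff.

```python
import argparse


def build_parser():
    """Parser with a single opt-in flag; downloads stay disabled unless it is passed."""
    parser = argparse.ArgumentParser(description="Offline-by-default model loading (sketch)")
    parser.add_argument(
        "--allow-downloads",
        action="store_true",
        help="Allow Transformers to download missing model files from Hugging Face.",
    )
    return parser


def loader_kwargs(args):
    """Keyword arguments shared by every from_pretrained call (hypothetical helper)."""
    # local_files_only=True makes Transformers use only files already on disk or in the cache.
    return {"local_files_only": not args.allow_downloads}
```

The resulting dictionary could be unpacked into each loader call, for example `AutoTokenizer.from_pretrained(name, **loader_kwargs(args))`, so that the adapter, full-model, and fallback branches all honor the same setting.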