girish00 committed on Apr 19

Commit e42c1f9 (verified) · 1 Parent(s): 89a3adc

make project runnable and endpoint-ready

Files changed (1)

run_pipeline.py +123 -0

run_pipeline.py ADDED

	@@ -0,0 +1,123 @@

+import argparse
+import json
+import subprocess
+import sys
+from pathlib import Path
+
+
+def run(cmd):
+    print("Running:", " ".join(cmd))
+    result = subprocess.run(cmd, check=False)
+    if result.returncode != 0:
+        raise SystemExit(result.returncode)
+
+
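+def flag_present(flag_name):
+    # Match both "--flag value" and "--flag=value"; a bare membership test misses the latter.
+    return any(arg == flag_name or arg.startswith(flag_name + "=") for arg in sys.argv[1:])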
+
+
+def usable_repo_id(repo_id):
+    if not repo_id:
+        return ""
+    placeholders = ("your-username/", "your-user/", "username/")
+    return "" if repo_id.startswith(placeholders) else repo_id
+
+
+def apply_config_defaults(args):
+    config_path = Path("training_config.json")
+    if not config_path.exists():
+        return args
+    with config_path.open("r", encoding="utf-8") as f:
+        cfg = json.load(f)
+    if not flag_present("--model-name"):
+        args.model_name = cfg.get("model_name", args.model_name)
+    if not flag_present("--dataset-size"):
+        args.dataset_size = cfg.get("dataset_size", args.dataset_size)
+    if not flag_present("--train-file"):
+        args.train_file = cfg.get("train_file", args.train_file)
+    if not flag_present("--output-dir"):
+        args.output_dir = cfg.get("output_dir", args.output_dir)
+    if not flag_present("--hf-repo"):
+        args.hf_repo = usable_repo_id(cfg.get("hf_repo_id", args.hf_repo))
+    if not flag_present("--epochs"):
+        args.epochs = cfg.get("epochs", args.epochs)
+    if not flag_present("--batch-size"):
+        args.batch_size = cfg.get("batch_size", args.batch_size)
+    if not flag_present("--learning-rate"):
+        args.learning_rate = cfg.get("learning_rate", args.learning_rate)
+    if not flag_present("--max-length"):
+        args.max_length = cfg.get("max_length", args.max_length)
+    if not flag_present("--max-train-samples"):
+        args.max_train_samples = cfg.get("max_train_samples", args.max_train_samples)
+    if not flag_present("--use-4bit"):
+        args.use_4bit = cfg.get("use_4bit", args.use_4bit)
+    return args
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--dataset-size", type=int, default=8000)
+    parser.add_argument("--train-file", type=str, default="train.json")
+    parser.add_argument("--output-dir", type=str, default="model")
+    parser.add_argument("--model-name", type=str, default="Qwen/Qwen2.5-Coder-0.5B-Instruct")
+    parser.add_argument("--epochs", type=float, default=1)
+    parser.add_argument("--batch-size", type=int, default=2)
+    parser.add_argument("--learning-rate", type=float, default=2e-4)
+    parser.add_argument("--max-length", type=int, default=512)
+    parser.add_argument("--max-train-samples", type=int, default=0)
+    parser.add_argument("--use-4bit", action="store_true")
+    parser.add_argument("--hf-repo", type=str, default="")
+    parser.add_argument("--skip-generate", action="store_true")
+    parser.add_argument("--skip-train", action="store_true")
+    parser.add_argument("--skip-upload", action="store_true")
+    args = parser.parse_args()
+    args = apply_config_defaults(args)
+    if not (5000 <= args.dataset_size <= 10000):
+        raise ValueError("dataset-size must be between 5000 and 10000")
+    if not args.skip_generate:
+        run([sys.executable, "generate_dataset.py", "--size", str(args.dataset_size), "--out", args.train_file])
+    if not args.skip_train:
+        train_cmd = [
+            sys.executable,
+            "finetune_coding_llm_colab.py",
+            "--dataset-size",
+            str(args.dataset_size),
+            "--train-file",
+            args.train_file,
+            "--output-dir",
+            args.output_dir,
+            "--model-name",
+            args.model_name,
+            "--epochs",
+            str(args.epochs),
+            "--batch-size",
+            str(args.batch_size),
+            "--learning-rate",
+            str(args.learning_rate),
+            "--max-length",
+            str(args.max_length),
+            "--max-train-samples",
+            str(args.max_train_samples),
+            "--skip-dataset-gen",
+        ]
+        if args.use_4bit:
+            train_cmd.append("--use-4bit")
+        run(train_cmd)
+    else:
+        print("Skipping training stage (--skip-train).")
+    if not args.skip_upload:
+        if not args.hf_repo:
+            raise ValueError("Pass --hf-repo when upload is enabled, or use --skip-upload")
+        run([sys.executable, "upload_to_hf.py", "--model-dir", args.output_dir, "--repo-id", args.hf_repo])
+    print("Pipeline completed.")
+
+
+if __name__ == "__main__":
+    main()
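
Review note on apply_config_defaults: the helper decides whether a value came from the command line by looking for the literal flag string in sys.argv. A flag written as --epochs=2, or as an abbreviated prefix that argparse accepts, would not be detected, so the value from training_config.json would win over the one the user typed. One alternative is to give every argument a default of None and resolve precedence afterwards: CLI flag first, then the config file, then a built-in default. The block below is a minimal sketch of that idea, not the author's implementation. DEFAULTS, build_parser, and resolve are hypothetical names, and only four of the pipeline's options are shown.

```python
import argparse
import json
from pathlib import Path

# Built-in fallbacks, mirroring a subset of the defaults in run_pipeline.py.
DEFAULTS = {"dataset_size": 8000, "epochs": 1.0, "batch_size": 2, "use_4bit": False}


def build_parser():
    # default=None means "not given on the command line", so sys.argv is never scanned.
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset-size", type=int, default=None)
    parser.add_argument("--epochs", type=float, default=None)
    parser.add_argument("--batch-size", type=int, default=None)
    parser.add_argument("--use-4bit", action="store_true", default=None)
    return parser


def resolve(argv, config_path):
    """Merge settings with precedence: CLI flag > config file > DEFAULTS."""
    args = build_parser().parse_args(argv)

    # A missing config file is treated as an empty config.
    cfg = {}
    path = Path(config_path)
    if path.exists():
        with path.open("r", encoding="utf-8") as f:
            cfg = json.load(f)

    resolved = {}
    for key, fallback in DEFAULTS.items():
        cli_value = getattr(args, key)
        if cli_value is not None:
            resolved[key] = cli_value  # explicitly passed on the command line
        else:
            resolved[key] = cfg.get(key, fallback)  # config value, else built-in default
    return resolved
```

For example, resolve(["--epochs=2", "--use-4bit"], "training_config.json") keeps the command-line values for epochs and use_4bit and takes the remaining settings from the file or, failing that, from DEFAULTS.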