Instructions to use lalatendu/phi3-sysadmin-lalatendu with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries
How to use lalatendu/phi3-sysadmin-lalatendu with llama-cpp-python:

```
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="lalatendu/phi3-sysadmin-lalatendu",
	filename="phi3-sysadmin-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
```
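
The chat completion call above returns an OpenAI-style dictionary rather than a plain string. The following is a minimal sketch of pulling the reply text out of that dictionary. `extract_reply` is a hypothetical helper, and `sample` is a hand-written stand-in for illustration, not real model output.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"].strip()


# Hand-written stand-in for the dict returned by llm.create_chat_completion(...)
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": " Use `df -h`. "}}
    ]
}

print(extract_reply(sample))
```

In real use, pass the return value of `llm.create_chat_completion(...)` to the helper instead of `sample`.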

llama.cpp

How to use lalatendu/phi3-sysadmin-lalatendu with llama.cpp:

Install (macOS, Linux)

```
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf lalatendu/phi3-sysadmin-lalatendu:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf lalatendu/phi3-sysadmin-lalatendu:Q4_K_M
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf lalatendu/phi3-sysadmin-lalatendu:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf lalatendu/phi3-sysadmin-lalatendu:Q4_K_M
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf lalatendu/phi3-sysadmin-lalatendu:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf lalatendu/phi3-sysadmin-lalatendu:Q4_K_M
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf lalatendu/phi3-sysadmin-lalatendu:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf lalatendu/phi3-sysadmin-lalatendu:Q4_K_M
```

Use Docker

```
docker model run hf.co/lalatendu/phi3-sysadmin-lalatendu:Q4_K_M
```

vLLM

How to use lalatendu/phi3-sysadmin-lalatendu with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "lalatendu/phi3-sysadmin-lalatendu"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lalatendu/phi3-sysadmin-lalatendu",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Ollama
How to use lalatendu/phi3-sysadmin-lalatendu with Ollama:
```
ollama run hf.co/lalatendu/phi3-sysadmin-lalatendu:Q4_K_M
```

Unsloth Studio

How to use lalatendu/phi3-sysadmin-lalatendu with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lalatendu/phi3-sysadmin-lalatendu to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lalatendu/phi3-sysadmin-lalatendu to start chatting
```

Using Hugging Face Spaces for Unsloth

No setup is required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for lalatendu/phi3-sysadmin-lalatendu to start chatting.

Docker Model Runner
How to use lalatendu/phi3-sysadmin-lalatendu with Docker Model Runner:
```
docker model run hf.co/lalatendu/phi3-sysadmin-lalatendu:Q4_K_M
```

Lemonade

How to use lalatendu/phi3-sysadmin-lalatendu with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull lalatendu/phi3-sysadmin-lalatendu:Q4_K_M
```

Run and chat with the model

```
lemonade run user.phi3-sysadmin-lalatendu-Q4_K_M
```

List all available models

```
lemonade list
```

phi3-sysadmin-lalatendu

phi3-sysadmin-lalatendu is a domain-specialized model based on microsoft/Phi-3-mini-4k-instruct, fine-tuned using QLoRA (SFT with LoRA) via Unsloth for Linux system administration and DevOps tasks. This repository provides the GGUF (Q4_K_M) quantized model ready for local inference via Ollama.

Developed by: Lalatendu Keshari Swain
Model type: Causal Language Model (GGUF quantized)
Language(s): English
License: MIT
Base model: microsoft/Phi-3-mini-4k-instruct (3.8B parameters)
Fine-tuning method: QLoRA (Supervised Fine-Tuning with LoRA)
Quantization: q4_k_m (4-bit, ~2.3 GB)

Disclaimer: This model is provided for educational and productivity purposes only. We take no responsibility for the accuracy or completeness of the outputs. Commands and configurations suggested by this model should always be verified by a qualified system administrator before being applied to any production system. Please use it at your own risk.

Model Sources

LoRA Adapter: lalatendu/phi3-sysadmin-lora
GitHub: github.com/lalatenduswain
Blog: blog.lalatendu.info

Training Process

This model was trained using a single-stage SFT process:

Step 1: SFT (Supervised Fine-Tuning)

Dataset: 1,026 curated sysadmin and DevOps Q&A examples in ChatML JSONL format
Format: system / user / assistant turns
Topics: Linux administration, AWS, Docker, Kubernetes, Terraform, Ansible, Nginx, databases, networking, security, monitoring, backup
Objective: To specialize the Phi-3 Mini model in answering practical server management and troubleshooting questions accurately and concisely.
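
To make the dataset format concrete, here is a minimal sketch of what one JSONL line with system / user / assistant turns could look like, plus a small validity check. The `messages`, `role`, and `content` field names are assumptions based on the common chat JSONL layout; the model card does not show the actual schema. The example record is hand-written for illustration and is not taken from the training set.

```python
import json

# Expected turn order, per the "system / user / assistant" format described above.
REQUIRED_ROLES = ["system", "user", "assistant"]


def validate_record(line: str) -> bool:
    """Check that one JSONL line holds exactly the three expected, non-empty turns."""
    record = json.loads(line)
    turns = record["messages"]  # assumed field name
    roles = [turn["role"] for turn in turns]
    has_text = all(turn["content"].strip() for turn in turns)
    return roles == REQUIRED_ROLES and has_text


# Hand-written illustrative record (not from the real dataset).
example_line = json.dumps({
    "messages": [
        {"role": "system", "content": "You are phi3-sysadmin, a fine-tuned AI assistant."},
        {"role": "user", "content": "How do I check disk usage on Linux?"},
        {"role": "assistant", "content": "Run `df -h` to see usage per filesystem."},
    ]
})

print(validate_record(example_line))
```

A check like this can be run over every line of a JSONL file before training to catch records with missing or reordered turns.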

Training Hyperparameters

| Parameter | Value |
| --- | --- |
| Base model quantization | 4-bit (bnb-4bit) |
| LoRA rank (r) | 64 |
| LoRA alpha | 128 |
| LoRA target modules | Attention and MLP layers |
| Trainable parameters | ~119M (5.62% of total) |
| Epochs | 3–5 |
| Batch size | 8 |
| Learning rate | 2e-4 |
| Optimizer | AdamW (8-bit) |
| Warmup steps | 5 |
| Weight decay | 0.01 |
| LR scheduler | Linear |
| Training time | ~6 minutes |
| GPU | NVIDIA T4 (Google Colab free tier) |
| Final training loss | ~0.5–0.8 |
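
The trainable-parameter figure can be sanity-checked from the LoRA rank alone. The sketch below assumes that LoRA is applied to seven separate projection matrices per layer (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`), which is an assumption about how "Attention and MLP layers" was configured; the model card does not list the module names. The dimensions (hidden size 3072, MLP intermediate size 8192, 32 layers) are those of Phi-3 Mini.

```python
HIDDEN = 3072        # Phi-3 Mini hidden size
INTERMEDIATE = 8192  # Phi-3 Mini MLP intermediate size
LAYERS = 32          # number of decoder layers
RANK = 64            # LoRA rank (r) from the table above
ALPHA = 128          # LoRA alpha from the table above

# (in_features, out_features) per adapted matrix.
# Assumption: projections are separate (unfused) modules.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) for each adapted matrix."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"{total:,} trainable parameters, scaling {scaling}")
# prints "119,537,664 trainable parameters, scaling 2.0"
```

Under these assumptions the count lands at roughly 119.5M, which is consistent with the ~119M reported in the table.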

GGUF Export

Quantization method: q4_k_m via llama.cpp
File size: ~2.3 GB
Export tool: Unsloth's built-in GGUF exporter
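
As a rough cross-check of the file size, the arithmetic below multiplies the parameter count by an average bits-per-weight figure. The 4.85 bits per weight used here is an assumed approximate average for q4_k_m (the format mixes block types, so the true value varies by model); it is not a number from this model card.

```python
PARAMS = 3.8e9           # parameter count of the base model
BITS_PER_WEIGHT = 4.85   # assumption: approximate average for q4_k_m

# bits -> bytes -> gigabytes (decimal)
size_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9

print(f"~{size_gb:.1f} GB")
# prints "~2.3 GB"
```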

How to Get Started

Option 1: Ollama (Recommended)

```
# 1. Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh

# 2. Download phi3-sysadmin-Q4_K_M.gguf and Modelfile from this repo

# 3. Create the model
ollama create phi3-sysadmin -f Modelfile

# 4. Run interactively
ollama run phi3-sysadmin
```

Example queries:

```
ollama run phi3-sysadmin "How do I find what's consuming disk space?"
ollama run phi3-sysadmin "How do I set up Nginx reverse proxy with SSL?"
ollama run phi3-sysadmin "How do I troubleshoot high CPU usage?"
ollama run phi3-sysadmin "How do I create a Kubernetes deployment?"
```

API usage:

```
curl http://localhost:11434/api/generate -d '{
  "model": "phi3-sysadmin",
  "prompt": "How do I check which process is using port 8080?",
  "stream": false
}'
```
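
The same request can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl call above; `build_generate_request` and `generate` are hypothetical helper names, and the code assumes an Ollama server is listening on the default `localhost:11434` with the `phi3-sysadmin` model already created.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # same endpoint as the curl example


def build_generate_request(prompt: str, model: str = "phi3-sysadmin",
                           url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=120) as resp:
        return json.load(resp)["response"]


request = build_generate_request("How do I check which process is using port 8080?")
print(json.loads(request.data))

# With Ollama running locally:
# print(generate("How do I check which process is using port 8080?"))
```

Building the request separately from sending it keeps the payload easy to inspect or log before anything goes over the network.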

Option 2: llama.cpp

```
# Download the GGUF from this repo, then:
./llama-cli -m phi3-sysadmin-Q4_K_M.gguf \
  --system-prompt "You are phi3-sysadmin, a fine-tuned AI assistant created by Lalatendu Keshari Swain. Provide clear, practical answers for server management and troubleshooting." \
  -p "How do I check disk usage on Linux?"
```

Option 3: Transformers (LoRA adapter)

```
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype="auto",
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "lalatendu/phi3-sysadmin-lora")
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

prompt = (
    "<|system|>\n"
    "You are phi3-sysadmin, a fine-tuned AI assistant created by Lalatendu Keshari Swain. "
    "Provide clear, practical answers for server management and troubleshooting.<|end|>\n"
    "<|user|>\nHow do I check disk usage?<|end|>\n<|assistant|>\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Modelfile Contents

```
FROM ./phi3-sysadmin-Q4_K_M.gguf

TEMPLATE """<|system|>
{{ .System }}<|end|>
<|user|>
{{ .Prompt }}<|end|>
<|assistant|>
{{ .Response }}<|end|>
"""

SYSTEM """You are phi3-sysadmin, a fine-tuned AI assistant created by Lalatendu Keshari Swain. Provide clear, practical answers for server management and troubleshooting."""

PARAMETER stop <|end|>
PARAMETER stop <|user|>
PARAMETER stop <|assistant|>
PARAMETER stop <|endoftext|>
PARAMETER temperature 0.7
PARAMETER top_p 0.9
```
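
When calling the model outside Ollama, the prompt has to be assembled by hand in the same layout as the template. The sketch below shows one way to do that in Python; `build_prompt` is a hypothetical helper, and its output follows the `<|system|>`, `<|user|>`, `<|assistant|>`, `<|end|>` layout used in the Modelfile.

```python
DEFAULT_SYSTEM = (
    "You are phi3-sysadmin, a fine-tuned AI assistant created by Lalatendu Keshari Swain. "
    "Provide clear, practical answers for server management and troubleshooting."
)


def build_prompt(user: str, system: str = DEFAULT_SYSTEM) -> str:
    """Render a single-turn prompt, ending where the assistant reply should begin."""
    return (
        f"<|system|>\n{system}<|end|>\n"
        f"<|user|>\n{user}<|end|>\n"
        "<|assistant|>\n"
    )


print(build_prompt("How do I check disk usage?"))
```

The prompt ends right after `<|assistant|>` so that generation continues with the reply; the stop parameters in the Modelfile then cut the output at `<|end|>`.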

Use Cases

| Supported | Not Supported |
| --- | --- |
| Linux administration (disk, CPU, memory, processes, users, filesystems, systemd) | General-purpose conversation or creative writing |
| Cloud platforms (AWS, Azure, GCP) | Medical, legal, or financial advice |
| Containers (Docker, Kubernetes, Podman, Docker Swarm) | Non-English language tasks |
| CI/CD (Jenkins, GitHub Actions, ArgoCD) | Real-time data or internet access |
| IaC (Terraform, Ansible, Packer) | Unauthorized penetration testing or malicious use |
| Web servers (Nginx, Apache, Varnish) | |
| Databases (MySQL, PostgreSQL, Redis, MongoDB, Elasticsearch) | |
| Networking (DNS, firewalls, load balancing, VPN, TCP/IP, MTU) | |
| Security (SSL/TLS, SELinux, AppArmor, mTLS, vulnerability scanning) | |
| Monitoring (Prometheus, Grafana, Zabbix, node_exporter, ELK) | |
| Backup (BorgBackup, Restic, snapshots, disaster recovery) | |
| Bash/Shell scripting assistance | |

Bias, Risks, and Limitations

Small model (3.8B): May occasionally hallucinate or produce inaccurate commands. Always verify before running on production servers.
Training data scope: 1,026 examples cover common sysadmin topics. Niche or cutting-edge tooling may not be well represented.
English only: All responses are in English.
No real-time access: Cannot check current documentation, package versions, or live system state.
Outdated information: Package names, versions, and best practices evolve — cross-reference with official docs.

Recommendations:

Always verify commands before running on production systems
Cross-reference with official documentation for critical configurations
Use as a learning aid and quick reference, not as the sole authority
Do not use for security-critical decisions without expert verification

Training Data

The model was fine-tuned on 1,026 curated sysadmin Q&A pairs covering:

Linux administration (disk, CPU, memory, processes, users, filesystems)
Cloud platforms (AWS EC2, S3, VPC, IAM, RDS, CloudWatch, Lambda, EKS)
Containerization (Docker, Kubernetes, Podman)
CI/CD (Jenkins, GitHub Actions, ArgoCD)
Infrastructure as Code (Terraform, Ansible, Packer)
Web servers (Nginx, Apache, Varnish)
Databases (MySQL, PostgreSQL, MongoDB, Redis, Elasticsearch)
Networking (DNS, firewalls, load balancing, VPN, TCP/IP)
Security (SSL/TLS, SELinux, AppArmor, vulnerability scanning)
Monitoring (Prometheus, Grafana, Zabbix, ELK)
Backup (BorgBackup, Restic, snapshots)
Model identity, creator information, and boundary/refusal examples

Evaluation

Testing: Manual evaluation with diverse sysadmin questions
Training loss: Final loss of ~0.5–0.8
Qualitative assessment: Responses checked for accuracy, practicality, and completeness

Results:

Provides accurate, practical answers for common sysadmin and DevOps tasks
Correctly identifies itself as phi3-sysadmin created by Lalatendu Keshari Swain
Appropriately refuses off-topic, harmful, and out-of-scope requests
Handles variations in question phrasing well

Environmental Impact

| Item | Value |
| --- | --- |
| Hardware | NVIDIA T4 GPU (16 GB VRAM) |
| Training duration | ~6 minutes (~0.1 hours) |
| Cloud provider | Google Colab (free tier) |
| Compute region | Variable (Google Colab assigned) |
| Estimated CO₂ | ~0.01 kg CO₂eq |

Technical Specifications

Architecture: Phi-3 Mini transformer decoder-only (3.8B parameters)
Objective: Causal language modeling, fine-tuned for sysadmin domain
Context length: 4096 tokens
Chat format: Phi-3 template with <|system|>, <|user|>, <|assistant|>, <|end|> tokens
Inference runtime: Ollama (minimum 4 GB RAM)
Inference speed (CPU): ~10–20 tokens/sec
Inference speed (GPU): ~40–80 tokens/sec

Software stack:

Training: Unsloth + Hugging Face Transformers + PEFT 0.18.1 + PyTorch 2.x
Quantization: Unsloth GGUF exporter (llama.cpp based, q4_k_m)
Inference: Ollama

Files in This Repository

| File | Size | Description |
| --- | --- | --- |
| `phi3-sysadmin-Q4_K_M.gguf` | ~2.3 GB | Quantized GGUF model for Ollama / llama.cpp |
| `Modelfile` | ~0.4 KB | Ollama model configuration |
| `phi3_finetune.ipynb` | ~60 KB | Full QLoRA training notebook (Google Colab) |

Related Repositories

LoRA Adapter + Training Data: lalatendu/phi3-sysadmin-lora
Base Model: microsoft/Phi-3-mini-4k-instruct

Citation

```
@misc{phi3-sysadmin-lalatendu-2026,
  author    = {Swain, Lalatendu Keshari},
  title     = {phi3-sysadmin-lalatendu: A Fine-tuned Phi-3 Mini GGUF Model for Linux System Administration},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/lalatendu/phi3-sysadmin-lalatendu}
}
```

APA: Swain, L. K. (2026). phi3-sysadmin-lalatendu: A Fine-tuned Phi-3 Mini GGUF Model for Linux System Administration. HuggingFace. https://huggingface.co/lalatendu/phi3-sysadmin-lalatendu

Model Card Authors

Lalatendu Keshari Swain

Contact

| Channel | Link |
| --- | --- |
| Website | lalatendu.info |
| Blog | blog.lalatendu.info |
| GitHub | github.com/lalatenduswain |
| LinkedIn | linkedin.com/in/lalatenduswain |
| Email | swain@lalatendu.info |
