# TernaryPhysics-7B
Talk to your infrastructure. TernaryPhysics-7B is a 7B-parameter LLM purpose-built for infrastructure operations. It powers tp-ops, a CLI tool that drops AI agents directly onto your resources—Kubernetes clusters, databases, VMs, API gateways, and more.
Unlike generic AI assistants, TernaryPhysics agents live on the resource they manage. They learn that resource's specific patterns, answer questions from direct knowledge, and operate with human approval before any action is taken.
## Table of Contents
- Why TernaryPhysics
- Model Overview
- Key Capabilities
- Getting Started
- Agent Types
- Example Conversations
- Usage Patterns
- Security Model
- Limitations
- Technical Details
## Why TernaryPhysics
Infrastructure operations is stuck between two bad options:
- Dashboards show data but don't investigate
- Runbooks follow checklists but don't think
When an incident occurs at 3am, you don't need more dashboards. You need to ask: "What is wrong and how do I fix it?"
TernaryPhysics makes that question literal. You talk to the agent. It talks back. It has answers because it lives on the resource and knows it intimately.
Key differentiators:
| Traditional Tools | TernaryPhysics |
|---|---|
| Shows metrics | Investigates root cause |
| Generic thresholds | Learns YOUR resource's baseline |
| Requires context switching | Single conversational interface |
| Cloud-dependent | Runs locally on your infrastructure |
| Subscription pricing | Pay per GB processed |
## Model Overview
| Property | Value |
|---|---|
| Parameters | 7 billion |
| Quantization | Q4_K_M (4-bit) |
| File Size | 4.4 GB |
| Format | GGUF |
| Context Length | 32,768 tokens |
| Hardware | CPU-only (no GPU required) |
| License | Apache 2.0 |
The model runs entirely on your infrastructure. No data is sent to external APIs. No cloud dependency. The AI processes your logs, metrics, and configurations locally.
## Key Capabilities

### Investigation
Ask questions in plain English. The agent reads live data to answer:
- "Why are pods crashing in the payments namespace?"
- "What's causing high CPU on this server?"
- "Show me the slowest queries today"
- "Are there any connection leaks?"
### Root Cause Analysis
The agent doesn't just report symptoms. It traces causality:
- Compares current state to learned baseline
- Identifies what changed recently (delta scan)
- Builds evidence chains from change to symptom
- Verifies claims before reporting
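The baseline-comparison step above can be sketched in a few lines. This is an illustrative sketch only, not TernaryPhysics internals: the `Baseline` structure, the `delta_scan` function, and the tolerance threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Baseline:
    """Learned normal range for one resource metric (hypothetical structure)."""
    metric: str
    mean: float
    tolerance: float  # allowed fractional deviation from the mean

def delta_scan(baselines: list[Baseline], current: dict[str, float]) -> list[str]:
    """Flag metrics that deviate from their learned baseline."""
    findings = []
    for b in baselines:
        value = current.get(b.metric)
        if value is None:
            continue
        deviation = abs(value - b.mean) / b.mean
        if deviation > b.tolerance:
            findings.append(f"{b.metric}: {value:.0f} vs baseline {b.mean:.0f} "
                            f"({deviation:.0%} deviation)")
    return findings

# Example: CPU far above the learned baseline; memory within tolerance.
baselines = [Baseline("cpu_percent", 22.0, 0.5), Baseline("mem_mb", 900.0, 0.5)]
print(delta_scan(baselines, {"cpu_percent": 94.0, "mem_mb": 1000.0}))
```

The real agent presumably learns baselines over time and scans many more signals; the point here is only the shape of the comparison: deviation from a learned mean, not a generic fixed threshold.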
### Remediation
Get specific, actionable commands—not generic advice:
- Exact kubectl, SQL, or shell commands
- Risk assessment (LOW / MEDIUM / HIGH)
- Expected outcome
- Human approval required before execution
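The structure of a proposal like the one above, including the approval gate, might look like the following. This is a minimal sketch under stated assumptions, not the actual tp-ops data model; `Remediation`, `Risk`, and `execute` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum

class Risk(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"

@dataclass
class Remediation:
    """A proposed fix: never executed until a human approves (hypothetical shape)."""
    command: str
    risk: Risk
    expected_outcome: str
    approved: bool = False

def execute(action: Remediation) -> str:
    # The human-in-the-loop gate: write operations are refused without sign-off.
    if not action.approved:
        return f"BLOCKED (awaiting approval): {action.command}"
    return f"EXECUTING [{action.risk.value}]: {action.command}"

fix = Remediation(
    command="kubectl set resources deployment/redis -c redis --limits=memory=2Gi",
    risk=Risk.LOW,
    expected_outcome="Redis stays under its memory limit; OOM kills stop",
)
print(execute(fix))   # blocked until a human approves
fix.approved = True
print(execute(fix))   # now allowed to run
```

The design choice worth noting is that risk level and expected outcome travel with the command itself, so the approver sees all three together.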
### Cross-Resource Correlation
When agents are deployed across multiple resources, they communicate to correlate issues:
- API latency traced to database lock contention
- Pod crashes linked to upstream dependency failures
- Memory leaks correlated with deployment changes
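One simple way to correlate findings across agents is by time proximity. The sketch below pairs events from two resources that occur within a short window; the event data, timestamps, and 60-second window are hypothetical illustrations, not the product's actual protocol.

```python
from datetime import datetime, timedelta

# Events reported by two agents (labels and timestamps are made up).
db_events = [("12:00:05", "lock contention on transactions table")]
api_events = [("12:00:12", "p99 latency spiked to 2.3s"),
              ("11:30:00", "routine deploy finished")]

def parse(t: str) -> datetime:
    return datetime.strptime(t, "%H:%M:%S")

def correlate(a, b, window_s: int = 60):
    """Pair events from two resources that occur within window_s seconds."""
    pairs = []
    for ta, ea in a:
        for tb, eb in b:
            if abs(parse(ta) - parse(tb)) <= timedelta(seconds=window_s):
                pairs.append((ea, eb))
    return pairs

# The lock-contention event and the latency spike land 7 seconds apart,
# so they are paired; the earlier deploy falls outside the window.
print(correlate(db_events, api_events))
```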
## Getting Started

### 1. Install

```bash
pip install tp-ops
```

### 2. Drop an Agent

```bash
tp-ops drop k8s-agent --onto prod-cluster
```

The model downloads automatically on first use (~4.4 GB, one-time).

### 3. Talk to It

```bash
tp-ops ask prod-cluster
```

You're now in a conversation with an AI that knows your cluster.

### 4. Ask Questions

```text
prod-cluster > what's the status of the payments namespace?

5 pods running (5/5 ready)
• payment-api (3 replicas) — healthy, avg latency 12ms
• payment-worker (1 replica) — processing 847 jobs/hr
• payment-scheduler (1 replica) — last run 4 min ago

Resource usage: 1.2Gi / 4Gi requested (30% utilized)
No events in the last hour.
```
## Agent Types
TernaryPhysics supports 180+ agent types across infrastructure categories:
| Category | Agent Types | Example Use Cases |
|---|---|---|
| Compute | Kubernetes, VMs, Containers, ECS, Lambda | Pod failures, scaling issues, resource contention |
| Databases | PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch | Slow queries, connection leaks, replication lag |
| Networking | API Gateways, Load Balancers, DNS, Service Mesh | Latency spikes, routing errors, certificate issues |
| Cloud | AWS, Azure, GCP, Multi-cloud | Cost analysis, IAM audits, resource drift |
| Monitoring | Prometheus, Datadog, CloudWatch, Grafana | Alert correlation, metric anomalies |
| Security | RBAC, Secrets, CVEs, TLS, Compliance | Privilege escalation, exposed credentials, audit gaps |
| CI/CD | Jenkins, GitHub Actions, ArgoCD | Pipeline failures, deployment issues |
| Storage | S3, EBS, NFS, Ceph | Capacity planning, access patterns |
Each agent is specialized for its resource type with domain-specific knowledge, tools, and investigation patterns.
## Example Conversations

### Kubernetes Troubleshooting

```text
prod-cluster > pods keep restarting in the api namespace

Checking api namespace...

Found 3 pods in CrashLoopBackOff:
• api-gateway-7d4b8c (12 restarts in 2h)
• api-gateway-9f2a1b (11 restarts in 2h)
• api-gateway-3c8e7f (10 restarts in 2h)

All pods failing with: "connection refused: redis:6379"

Checking redis... Redis pod is OOMKilled. Memory limit is 256Mi,
but usage spiked to 1.2Gi after a config change 3 hours ago
enabled a new caching feature.

Root cause: Redis OOM due to aggressive caching config.

Recommended fix:
kubectl set resources deployment/redis -c redis --limits=memory=2Gi

Risk: LOW. Raises the memory limit; updating the pod template
triggers a rolling restart of the redis deployment.

Approve? [yes/no]
```
### Database Performance

```text
payments-db > why is the app so slow today?

Analyzing query performance...

Top bottleneck identified:

SELECT * FROM transactions WHERE status = 'pending'
AND created_at > now() - interval '24 hours'

Avg: 890ms | Calls: 720 today | Plan: Sequential Scan

This query is doing a full table scan on 42M rows.
No index exists for (status, created_at).

Recommended index:

CREATE INDEX CONCURRENTLY idx_txn_status_created
ON transactions (status, created_at)
WHERE status = 'pending';

Expected improvement: 890ms → ~5ms (178x faster)
Build time: ~2 minutes, no table lock.

Create this index? [yes/no]
```
### VM Investigation

```text
api-server-03 > what's eating all the CPU?

Top processes by CPU:

PID 23847   java (api-server)      94% CPU   14.2 GB RSS
PID 1892    java (elasticsearch)   12% CPU    8.1 GB RSS

api-server (PID 23847) is significantly above baseline.
Baseline CPU: 22% | Current: 94%

Checking recent changes...
• 6h ago: Deployed v3.15.0 (was v3.14.2)
• Config change: elasticsearch.retry.maxAttempts: 3 → 10
• Config change: elasticsearch.retry.backoff: 1000 → 5000ms

The new retry settings are causing thread pile-up during
ES timeouts. Each retry holds a thread for up to 50 seconds.

Recommendation: Roll back to v3.14.2 or fix the retry configuration.
```
## Usage Patterns

### Interactive Conversation

```bash
tp-ops ask <resource>
```

Open a conversation with an agent. Ask multiple questions. The agent maintains context throughout the session.

### One-Shot Investigation

```bash
tp-ops run <resource> "describe the problem"
```

Run a single investigation and get a report. Useful for scripts and automation.

### Proactive Scan

```bash
tp-ops scan <resource>
```

The agent proactively looks for issues: misconfigurations, resource waste, security gaps, performance problems.

### Approve Actions

```bash
tp-ops approve <resource>:action-1
tp-ops reject <resource>:action-1
```

Review and approve recommended actions from any interface.
### Direct API Usage with llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TernaryPhysics/TernaryPhysics-7B-GGUF",
    filename="TernaryPhysics-7B-Q4_K_M.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a Kubernetes operations expert for prod-cluster.",
        },
        {
            "role": "user",
            "content": "3 pods are in CrashLoopBackOff. What should I check first?",
        },
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response["choices"][0]["message"]["content"])
```
### Direct Download

```bash
# Using huggingface-cli
huggingface-cli download TernaryPhysics/TernaryPhysics-7B-GGUF TernaryPhysics-7B-Q4_K_M.gguf

# Using wget
wget https://huggingface.co/TernaryPhysics/TernaryPhysics-7B-GGUF/resolve/main/TernaryPhysics-7B-Q4_K_M.gguf
```
## Security Model
TernaryPhysics is designed with security as a core principle:
### Human-in-the-Loop
- Read operations (logs, metrics, configs) are autonomous
- Write operations (scale, restart, modify) require explicit human approval
- No auto-approve. No batch approve. Every write is individually confirmed.
### Local Execution
- Model runs entirely on your infrastructure
- No data sent to external AI APIs
- No cloud inference
- Only billing metadata leaves your environment
### No Credential Storage
- Uses your existing auth: kubeconfig, SSH keys, database credentials
- Credentials are never copied, cached, or transmitted
- Leverages your existing RBAC and access controls
### Audit Trail
- Every conversation logged
- Every action tracked
- Every approval recorded
- Export to your SIEM
## Limitations
- Language: Optimized for English infrastructure and DevOps conversations
- Domain: Best for infrastructure operations; not designed for general chat, creative writing, or code generation outside ops context
- Verification: May generate incorrect commands—always verify before executing
- Context: Best results when system prompts specify the operational context
- Size: 7B parameters provides good quality/speed tradeoff but larger models may perform better on complex reasoning
## Technical Details
| Property | Value |
|---|---|
| Architecture | Qwen2 Transformer |
| Parameters | 7.6B (7B effective) |
| Layers | 28 |
| Attention Heads | 28 |
| Context Length | 32,768 tokens |
| Vocabulary | 152,064 tokens |
| Quantization | Q4_K_M (4.91 bits per weight) |
| Inference Engine | llama.cpp |
| Base Model | Qwen2.5-7B-Instruct |
### Inference Performance
Typical performance on consumer hardware:
| Hardware | Tokens/sec |
|---|---|
| Apple M1 | ~15 tok/s |
| Apple M2 Pro | ~25 tok/s |
| Intel i7 (12th gen) | ~10 tok/s |
| AMD Ryzen 9 | ~12 tok/s |
Performance varies based on context length and system load.
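As a worked example of what these throughput numbers mean in practice, assuming generation time dominates and ignoring prompt processing:

```python
# Back-of-envelope response latency from the throughput table above.
reply_tokens = 512        # a typical full answer (matches max_tokens above)
m1_tokens_per_sec = 15    # Apple M1 row

print(f"~{reply_tokens / m1_tokens_per_sec:.0f} seconds")  # ~34 seconds
```

So on the slowest listed hardware, a full-length answer arrives in roughly half a minute; shorter status answers proportionally faster.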
## Links

- CLI & Agents: github.com/TernaryPhysics/ternary-runtime
- Documentation: docs.ternaryphysics.com
- Website: ternaryphysics.com