TernaryPhysics-7B

Talk to your infrastructure. TernaryPhysics-7B is a 7B parameter LLM purpose-built for infrastructure operations. It powers tp-ops, a CLI tool that drops AI agents directly onto your resources—Kubernetes clusters, databases, VMs, API gateways, and more.

Unlike generic AI assistants, TernaryPhysics agents live on the resource they manage. They learn that resource's specific patterns, answer questions from direct knowledge, and operate with human approval before any action is taken.


Table of Contents

  1. Why TernaryPhysics
  2. Model Overview
  3. Key Capabilities
  4. Getting Started
  5. Agent Types
  6. Example Conversations
  7. Usage Patterns
  8. Security Model
  9. Limitations
  10. Technical Details

Why TernaryPhysics

Infrastructure operations is stuck between two bad options:

  • Dashboards show data but don't investigate
  • Runbooks follow checklists but don't think

When an incident occurs at 3am, you don't need more dashboards. You need to ask: "What is wrong and how do I fix it?"

TernaryPhysics makes that question literal. You talk to the agent. It talks back. It has answers because it lives on the resource and knows it intimately.

Key differentiators:

Traditional Tools          | TernaryPhysics
---------------------------|-------------------------------------
Shows metrics              | Investigates root cause
Generic thresholds         | Learns YOUR resource's baseline
Requires context switching | Single conversational interface
Cloud-dependent            | Runs locally on your infrastructure
Subscription pricing       | Pay per GB processed

Model Overview

Property       | Value
---------------|----------------------------
Parameters     | 7 billion
Quantization   | Q4_K_M (4-bit)
File Size      | 4.4 GB
Format         | GGUF
Context Length | 32,768 tokens
Hardware       | CPU-only (no GPU required)
License        | Apache 2.0

The model runs entirely on your infrastructure. No data is sent to external APIs. No cloud dependency. The AI processes your logs, metrics, and configurations locally.


Key Capabilities

Investigation

Ask questions in plain English. The agent reads live data to answer:

  • "Why are pods crashing in the payments namespace?"
  • "What's causing high CPU on this server?"
  • "Show me the slowest queries today"
  • "Are there any connection leaks?"

Root Cause Analysis

The agent doesn't just report symptoms. It traces causality:

  • Compares current state to learned baseline
  • Identifies what changed recently (delta scan)
  • Builds evidence chains from change to symptom
  • Verifies claims before reporting
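The baseline comparison and delta scan steps above can be sketched in a few lines. This is a minimal illustration, not tp-ops internals: the metric names and the 1.5x deviation threshold are assumptions chosen for the example.

```python
# Illustrative delta scan: compare live metrics against a learned
# baseline and surface only what deviates significantly.
def delta_scan(baseline: dict, current: dict) -> dict:
    """Return {metric: (baseline_value, current_value)} for metrics
    more than 50% above their learned baseline."""
    return {
        name: (baseline[name], value)
        for name, value in current.items()
        if name in baseline and value > baseline[name] * 1.5
    }

baseline = {"cpu_pct": 22.0, "mem_gib": 1.1, "restarts_per_hr": 0.0}
current = {"cpu_pct": 94.0, "mem_gib": 1.2, "restarts_per_hr": 6.0}
print(delta_scan(baseline, current))  # flags cpu_pct and restarts_per_hr
```

The flagged deltas are what the agent then tries to connect to recent changes when building an evidence chain.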

Remediation

Get specific, actionable commands—not generic advice:

  • Exact kubectl, SQL, or shell commands
  • Risk assessment (LOW / MEDIUM / HIGH)
  • Expected outcome
  • Human approval required before execution
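A remediation proposal bundles all four items above. As a sketch of that shape (the field and class names here are illustrative, not the actual tp-ops data model):

```python
from dataclasses import dataclass
from enum import Enum

class Risk(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"

@dataclass
class Recommendation:
    command: str            # exact kubectl / SQL / shell command
    risk: Risk              # agent's risk assessment
    expected_outcome: str   # what should change if the command is applied
    approved: bool = False  # set True only after explicit human approval

rec = Recommendation(
    command="kubectl set resources deployment/redis -c redis --limits=memory=2Gi",
    risk=Risk.LOW,
    expected_outcome="Redis stops being OOMKilled; dependent pods stabilize",
)
assert not rec.approved  # nothing executes until a human approves
```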

Cross-Resource Correlation

When agents are deployed across multiple resources, they communicate to correlate issues:

  • API latency traced to database lock contention
  • Pod crashes linked to upstream dependency failures
  • Memory leaks correlated with deployment changes

Getting Started

1. Install

pip install tp-ops

2. Drop an Agent

tp-ops drop k8s-agent --onto prod-cluster

The model downloads automatically on first use (~4.4 GB, one-time).

3. Talk to It

tp-ops ask prod-cluster

You're now in a conversation with an AI that knows your cluster.

4. Ask Questions

prod-cluster > what's the status of the payments namespace?

5 pods running (5/5 ready)
• payment-api (3 replicas) — healthy, avg latency 12ms
• payment-worker (1 replica) — processing 847 jobs/hr
• payment-scheduler (1 replica) — last run 4 min ago

Resource usage: 1.2Gi / 4Gi requested (30% utilized)
No events in the last hour.

Agent Types

TernaryPhysics supports 180+ agent types across infrastructure categories:

Category   | Agent Types                                       | Example Use Cases
-----------|---------------------------------------------------|----------------------------------------------------
Compute    | Kubernetes, VMs, Containers, ECS, Lambda          | Pod failures, scaling issues, resource contention
Databases  | PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch  | Slow queries, connection leaks, replication lag
Networking | API Gateways, Load Balancers, DNS, Service Mesh   | Latency spikes, routing errors, certificate issues
Cloud      | AWS, Azure, GCP, Multi-cloud                      | Cost analysis, IAM audits, resource drift
Monitoring | Prometheus, Datadog, CloudWatch, Grafana          | Alert correlation, metric anomalies
Security   | RBAC, Secrets, CVEs, TLS, Compliance              | Privilege escalation, exposed credentials, audit gaps
CI/CD      | Jenkins, GitHub Actions, ArgoCD                   | Pipeline failures, deployment issues
Storage    | S3, EBS, NFS, Ceph                                | Capacity planning, access patterns

Each agent is specialized for its resource type with domain-specific knowledge, tools, and investigation patterns.


Example Conversations

Kubernetes Troubleshooting

prod-cluster > pods keep restarting in the api namespace

Checking api namespace...

Found 3 pods in CrashLoopBackOff:
• api-gateway-7d4b8c (12 restarts in 2h)
• api-gateway-9f2a1b (11 restarts in 2h)
• api-gateway-3c8e7f (10 restarts in 2h)

All pods failing with: "connection refused: redis:6379"

Checking redis... Redis pod is OOMKilled. Memory limit is 256Mi,
but usage spiked to 1.2Gi after a config change 3 hours ago that
enabled a new caching feature.

Root cause: Redis OOM due to aggressive caching config.

Recommended fix:
  kubectl set resources deployment/redis -c redis --limits=memory=2Gi

Risk: LOW — increases memory allocation; triggers a rolling restart of the redis pod.

Approve? [yes/no]

Database Performance

payments-db > why is the app so slow today?

Analyzing query performance...

Top bottleneck identified:
  SELECT * FROM transactions WHERE status = 'pending' 
  AND created_at > now() - interval '24 hours'
  
  Avg: 890ms | Calls: 720 today | Plan: Sequential Scan

This query is doing a full table scan on 42M rows. 
No index exists for (status, created_at).

Recommended index:
  CREATE INDEX CONCURRENTLY idx_txn_status_created 
  ON transactions (status, created_at) 
  WHERE status = 'pending';

Expected improvement: 890ms → ~5ms (178x faster)
Build time: ~2 minutes, no table lock.

Create this index? [yes/no]

VM Investigation

api-server-03 > what's eating all the CPU?

Top processes by CPU:
  PID 23847  java (api-server)     94% CPU   14.2 GB RSS
  PID 1892   java (elasticsearch)  12% CPU   8.1 GB RSS
  
api-server (PID 23847) is significantly above baseline.
Baseline CPU: 22% | Current: 94%

Checking recent changes...
• 6h ago: Deployed v3.15.0 (was v3.14.2)
• Config change: elasticsearch.retry.maxAttempts: 3 → 10
• Config change: elasticsearch.retry.backoff: 1000 → 5000ms

The new retry settings are causing thread pile-up during 
ES timeouts. Each retry holds a thread for up to 50 seconds.

Recommendation: Rollback to v3.14.2 or fix retry configuration.

Usage Patterns

Interactive Conversation

tp-ops ask <resource>

Open a conversation with an agent. Ask multiple questions. The agent maintains context throughout the session.

One-Shot Investigation

tp-ops run <resource> "describe the problem"

Run a single investigation and get a report. Useful for scripts and automation.
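For automation, the one-shot form can be wrapped in a short script. A minimal sketch using Python's subprocess; it assumes tp-ops is on PATH and exits non-zero on failure, which you should verify against your installed version:

```python
import subprocess

def investigate(resource: str, question: str) -> str:
    """Run a one-shot tp-ops investigation and return the report text."""
    result = subprocess.run(
        ["tp-ops", "run", resource, question],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"investigation failed: {result.stderr.strip()}")
    return result.stdout

# Example:
#   report = investigate("prod-cluster", "describe the problem")
```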

Proactive Scan

tp-ops scan <resource>

The agent proactively looks for issues: misconfigurations, resource waste, security gaps, performance problems.

Approve Actions

tp-ops approve <resource>:action-1
tp-ops reject <resource>:action-1

Review and approve recommended actions from any interface.

Direct API Usage with llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TernaryPhysics/TernaryPhysics-7B-GGUF",
    filename="TernaryPhysics-7B-Q4_K_M.gguf",
    n_ctx=4096
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "system", 
            "content": "You are a Kubernetes operations expert for prod-cluster."
        },
        {
            "role": "user", 
            "content": "3 pods are in CrashLoopBackOff. What should I check first?"
        }
    ],
    temperature=0.7,
    max_tokens=512
)

print(response["choices"][0]["message"]["content"])
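The call above is single-turn. To mirror the session behavior of `tp-ops ask`, keep the message history between calls. A minimal sketch — the `Session` class and the echo stub are illustrative, not part of tp-ops; for real inference, swap the stub for a wrapper that calls `llm.create_chat_completion(messages=...)` and extracts the content as shown above:

```python
class Session:
    """Keeps the full chat history so follow-up questions resolve
    against earlier answers."""

    def __init__(self, complete, system_prompt: str):
        # `complete` maps a message list to assistant text.
        self.complete = complete
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, question: str) -> str:
        self.messages.append({"role": "user", "content": question})
        answer = self.complete(self.messages)
        self.messages.append({"role": "assistant", "content": answer})
        return answer

# Stand-in completion function for demonstration only.
def echo_complete(messages):
    return f"(reply to: {messages[-1]['content']})"

chat = Session(echo_complete, "You are a Kubernetes operations expert.")
chat.ask("what's the status of the payments namespace?")
print(chat.ask("and its memory usage?"))  # follow-up resolves via kept history
print(len(chat.messages))  # 5: system prompt + two user/assistant pairs
```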
Direct Download

# Using huggingface-cli
huggingface-cli download TernaryPhysics/TernaryPhysics-7B-GGUF TernaryPhysics-7B-Q4_K_M.gguf

# Using wget
wget https://huggingface.co/TernaryPhysics/TernaryPhysics-7B-GGUF/resolve/main/TernaryPhysics-7B-Q4_K_M.gguf

Security Model

TernaryPhysics is designed with security as a core principle:

Human-in-the-Loop

  • Read operations (logs, metrics, configs) are autonomous
  • Write operations (scale, restart, modify) require explicit human approval
  • No auto-approve. No batch approve. Every write is individually confirmed.
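The read/write split above amounts to a gate in front of every operation. A sketch of the idea — the operation names are hypothetical, not tp-ops internals:

```python
# Reads run autonomously; every write waits for an individual "yes".
READ_OPS = {"get_logs", "get_metrics", "read_config"}
WRITE_OPS = {"scale", "restart", "modify_config"}

def execute(op: str, confirm=input) -> str:
    if op in READ_OPS:
        return f"{op}: executed autonomously"
    if op in WRITE_OPS:
        # No auto-approve and no batch approve: each write is
        # confirmed at the moment it would run.
        if confirm(f"Approve {op}? [yes/no] ").strip().lower() == "yes":
            return f"{op}: executed (approved)"
        return f"{op}: rejected"
    raise ValueError(f"unknown operation: {op}")

print(execute("get_logs"))                         # no prompt for reads
print(execute("restart", confirm=lambda _: "no"))  # blocked without approval
```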

Local Execution

  • Model runs entirely on your infrastructure
  • No data sent to external AI APIs
  • No cloud inference
  • Only billing metadata leaves your environment

No Credential Storage

  • Uses your existing auth: kubeconfig, SSH keys, database credentials
  • Credentials are never copied, cached, or transmitted
  • Leverages your existing RBAC and access controls

Audit Trail

  • Every conversation logged
  • Every action tracked
  • Every approval recorded
  • Export to your SIEM
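An audit record covering conversation, action, and approval might look like the following. The field names here are assumptions for illustration; check the export schema of your tp-ops version before wiring up a SIEM.

```python
import json
from datetime import datetime, timezone

def audit_entry(resource: str, actor: str, action: str, approved: bool) -> dict:
    """Build one audit record; append-only JSON Lines is a common,
    SIEM-friendly way to ship these."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "resource": resource,
        "actor": actor,
        "action": action,
        "approved": approved,
    }

entry = audit_entry(
    "prod-cluster", "alice",
    "kubectl set resources deployment/redis -c redis --limits=memory=2Gi",
    approved=True,
)
print(json.dumps(entry))
```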

Limitations

  • Language: Optimized for English infrastructure and DevOps conversations
  • Domain: Best for infrastructure operations; not designed for general chat, creative writing, or code generation outside ops context
  • Verification: May generate incorrect commands—always verify before executing
  • Context: Best results when system prompts specify the operational context
  • Size: 7B parameters provides good quality/speed tradeoff but larger models may perform better on complex reasoning

Technical Details

Property         | Value
-----------------|----------------------------
Architecture     | Qwen2 Transformer
Parameters       | 7.6B (7B effective)
Layers           | 28
Attention Heads  | 28
Context Length   | 32,768 tokens
Vocabulary       | 152,064 tokens
Quantization     | Q4_K_M (4.91 bits per weight)
Inference Engine | llama.cpp
Base Model       | Qwen2.5-7B-Instruct

Inference Performance

Typical performance on consumer hardware:

Hardware            | Tokens/sec
--------------------|-----------
Apple M1            | ~15 tok/s
Apple M2 Pro        | ~25 tok/s
Intel i7 (12th gen) | ~10 tok/s
AMD Ryzen 9         | ~12 tok/s

Performance varies based on context length and system load.


Built by TernaryPhysics