TernaryPhysics-7B

Talk to your infrastructure. TernaryPhysics-7B is a 7B parameter LLM purpose-built for infrastructure operations. It powers tp-ops, a CLI tool that drops AI agents directly onto your resources—Kubernetes clusters, databases, VMs, API gateways, and more.

Unlike generic AI assistants, TernaryPhysics agents live on the resource they manage. They learn that resource's specific patterns, answer questions from direct knowledge, and operate with human approval before any action is taken.


Table of Contents

  1. Why TernaryPhysics
  2. Model Overview
  3. Key Capabilities
  4. Getting Started
  5. Agent Types
  6. Example Conversations
  7. Usage Patterns
  8. Security Model
  9. Limitations
  10. Technical Details

Why TernaryPhysics

Infrastructure operations is stuck between two bad options:

  • Dashboards show data but don't investigate
  • Runbooks follow checklists but don't think

When an incident occurs at 3am, you don't need more dashboards. You need to ask: "What is wrong and how do I fix it?"

TernaryPhysics makes that question literal. You talk to the agent. It talks back. It has answers because it lives on the resource and knows it intimately.

Key differentiators:

Traditional Tools          | TernaryPhysics
---------------------------|-------------------------------------
Shows metrics              | Investigates root cause
Generic thresholds         | Learns YOUR resource's baseline
Requires context switching | Single conversational interface
Cloud-dependent            | Runs locally on your infrastructure
Subscription pricing       | Pay per GB processed

Model Overview

Property       | Value
---------------|----------------------------
Parameters     | 7 billion
Quantization   | Q4_K_M (4-bit)
File Size      | 4.4 GB
Format         | GGUF
Context Length | 32,768 tokens
Hardware       | CPU-only (no GPU required)
License        | Apache 2.0

The model runs entirely on your infrastructure. No data is sent to external APIs. No cloud dependency. The AI processes your logs, metrics, and configurations locally.


Key Capabilities

Investigation

Ask questions in plain English. The agent reads live data to answer:

  • "Why are pods crashing in the payments namespace?"
  • "What's causing high CPU on this server?"
  • "Show me the slowest queries today"
  • "Are there any connection leaks?"

Root Cause Analysis

The agent doesn't just report symptoms. It traces causality:

  • Compares current state to learned baseline
  • Identifies what changed recently (delta scan)
  • Builds evidence chains from change to symptom
  • Verifies claims before reporting
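The baseline comparison and delta scan steps above can be sketched in a few lines. This is a minimal illustration, not tp-ops internals: the metric names and the 1.5x deviation threshold are assumptions chosen for the example.

```python
# Illustrative delta scan: compare live metrics against a learned
# baseline and surface only what deviates significantly.
def delta_scan(baseline: dict, current: dict) -> dict:
    """Return {metric: (baseline_value, current_value)} for metrics
    more than 50% above their learned baseline."""
    return {
        name: (baseline[name], value)
        for name, value in current.items()
        if name in baseline and value > baseline[name] * 1.5
    }

baseline = {"cpu_pct": 22.0, "mem_gib": 1.1, "restarts_per_hr": 0.0}
current = {"cpu_pct": 94.0, "mem_gib": 1.2, "restarts_per_hr": 6.0}
print(delta_scan(baseline, current))  # flags cpu_pct and restarts_per_hr
```

The flagged deltas are what the agent then tries to connect to recent changes when building an evidence chain.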

Remediation

Get specific, actionable commands—not generic advice:

  • Exact kubectl, SQL, or shell commands
  • Risk assessment (LOW / MEDIUM / HIGH)
  • Expected outcome
  • Human approval required before execution
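A remediation proposal bundles all four items above. As a sketch of that shape (the field and class names here are illustrative, not the actual tp-ops data model):

```python
from dataclasses import dataclass
from enum import Enum

class Risk(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"

@dataclass
class Recommendation:
    command: str            # exact kubectl / SQL / shell command
    risk: Risk              # agent's risk assessment
    expected_outcome: str   # what should change if the command is applied
    approved: bool = False  # set True only after explicit human approval

rec = Recommendation(
    command="kubectl set resources deployment/redis -c redis --limits=memory=2Gi",
    risk=Risk.LOW,
    expected_outcome="Redis stops being OOMKilled; dependent pods stabilize",
)
assert not rec.approved  # nothing executes until a human approves
```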

Cross-Resource Correlation

When agents are deployed across multiple resources, they communicate to correlate issues:

  • API latency traced to database lock contention
  • Pod crashes linked to upstream dependency failures
  • Memory leaks correlated with deployment changes

Getting Started

1. Install

pip install tp-ops

2. Drop an Agent

tp-ops drop k8s-agent --onto prod-cluster

The model downloads automatically on first use (~4.4 GB, one-time).

3. Talk to It

tp-ops ask prod-cluster

You're now in a conversation with an AI that knows your cluster.

4. Ask Questions

prod-cluster > what's the status of the payments namespace?

5 pods running (5/5 ready)
• payment-api (3 replicas) — healthy, avg latency 12ms
• payment-worker (1 replica) — processing 847 jobs/hr
• payment-scheduler (1 replica) — last run 4 min ago

Resource usage: 1.2Gi / 4Gi requested (30% utilized)
No events in the last hour.

Agent Types

TernaryPhysics supports 180+ agent types across infrastructure categories:

Category   | Agent Types                                       | Example Use Cases
-----------|---------------------------------------------------|----------------------------------------------------
Compute    | Kubernetes, VMs, Containers, ECS, Lambda          | Pod failures, scaling issues, resource contention
Databases  | PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch  | Slow queries, connection leaks, replication lag
Networking | API Gateways, Load Balancers, DNS, Service Mesh   | Latency spikes, routing errors, certificate issues
Cloud      | AWS, Azure, GCP, Multi-cloud                      | Cost analysis, IAM audits, resource drift
Monitoring | Prometheus, Datadog, CloudWatch, Grafana          | Alert correlation, metric anomalies
Security   | RBAC, Secrets, CVEs, TLS, Compliance              | Privilege escalation, exposed credentials, audit gaps
CI/CD      | Jenkins, GitHub Actions, ArgoCD                   | Pipeline failures, deployment issues
Storage    | S3, EBS, NFS, Ceph                                | Capacity planning, access patterns

Each agent is specialized for its resource type with domain-specific knowledge, tools, and investigation patterns.


Example Conversations

Kubernetes Troubleshooting

prod-cluster > pods keep restarting in the api namespace

Checking api namespace...

Found 3 pods in CrashLoopBackOff:
• api-gateway-7d4b8c (12 restarts in 2h)
• api-gateway-9f2a1b (11 restarts in 2h)
• api-gateway-3c8e7f (10 restarts in 2h)

All pods failing with: "connection refused: redis:6379"

Checking redis... Redis pod is OOMKilled. Memory limit is 256Mi,
but usage spiked to 1.2Gi after a config change 3 hours ago that
enabled a new caching feature.

Root cause: Redis OOM due to aggressive caching config.

Recommended fix:
  kubectl set resources deployment/redis -c redis --limits=memory=2Gi

Risk: LOW — increases memory allocation; triggers a rolling restart of the redis pod.

Approve? [yes/no]

Database Performance

payments-db > why is the app so slow today?

Analyzing query performance...

Top bottleneck identified:
  SELECT * FROM transactions WHERE status = 'pending' 
  AND created_at > now() - interval '24 hours'
  
  Avg: 890ms | Calls: 720 today | Plan: Sequential Scan

This query is doing a full table scan on 42M rows. 
No index exists for (status, created_at).

Recommended index:
  CREATE INDEX CONCURRENTLY idx_txn_status_created 
  ON transactions (status, created_at) 
  WHERE status = 'pending';

Expected improvement: 890ms → ~5ms (178x faster)
Build time: ~2 minutes, no table lock.

Create this index? [yes/no]

VM Investigation

api-server-03 > what's eating all the CPU?

Top processes by CPU:
  PID 23847  java (api-server)     94% CPU   14.2 GB RSS
  PID 1892   java (elasticsearch)  12% CPU   8.1 GB RSS
  
api-server (PID 23847) is significantly above baseline.
Baseline CPU: 22% | Current: 94%

Checking recent changes...
• 6h ago: Deployed v3.15.0 (was v3.14.2)
• Config change: elasticsearch.retry.maxAttempts: 3 → 10
• Config change: elasticsearch.retry.backoff: 1000 → 5000ms

The new retry settings are causing thread pile-up during 
ES timeouts. Each retry holds a thread for up to 50 seconds.

Recommendation: Rollback to v3.14.2 or fix retry configuration.

Usage Patterns

Interactive Conversation

tp-ops ask <resource>

Open a conversation with an agent. Ask multiple questions. The agent maintains context throughout the session.

One-Shot Investigation

tp-ops run <resource> "describe the problem"

Run a single investigation and get a report. Useful for scripts and automation.
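For automation, the one-shot form can be wrapped in a short script. A minimal sketch using Python's subprocess; it assumes tp-ops is on PATH and exits non-zero on failure, which you should verify against your installed version:

```python
import subprocess

def investigate(resource: str, question: str) -> str:
    """Run a one-shot tp-ops investigation and return the report text."""
    result = subprocess.run(
        ["tp-ops", "run", resource, question],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"investigation failed: {result.stderr.strip()}")
    return result.stdout

# Example:
#   report = investigate("prod-cluster", "describe the problem")
```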

Proactive Scan

tp-ops scan <resource>

The agent proactively looks for issues: misconfigurations, resource waste, security gaps, performance problems.

Approve Actions

tp-ops approve <resource>:action-1
tp-ops reject <resource>:action-1

Review and approve recommended actions from any interface.

Direct API Usage with llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TernaryPhysics/TernaryPhysics-7B-GGUF",
    filename="TernaryPhysics-7B-Q4_K_M.gguf",
    n_ctx=4096
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "system", 
            "content": "You are a Kubernetes operations expert for prod-cluster."
        },
        {
            "role": "user", 
            "content": "3 pods are in CrashLoopBackOff. What should I check first?"
        }
    ],
    temperature=0.7,
    max_tokens=512
)

print(response["choices"][0]["message"]["content"])
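The call above is single-turn. To mirror the session behavior of `tp-ops ask`, keep the message history between calls. A minimal sketch — the `Session` class and the echo stub are illustrative, not part of tp-ops; for real inference, swap the stub for a wrapper that calls `llm.create_chat_completion(messages=...)` and extracts the content as shown above:

```python
class Session:
    """Keeps the full chat history so follow-up questions resolve
    against earlier answers."""

    def __init__(self, complete, system_prompt: str):
        # `complete` maps a message list to assistant text.
        self.complete = complete
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, question: str) -> str:
        self.messages.append({"role": "user", "content": question})
        answer = self.complete(self.messages)
        self.messages.append({"role": "assistant", "content": answer})
        return answer

# Stand-in completion function for demonstration only.
def echo_complete(messages):
    return f"(reply to: {messages[-1]['content']})"

chat = Session(echo_complete, "You are a Kubernetes operations expert.")
chat.ask("what's the status of the payments namespace?")
print(chat.ask("and its memory usage?"))  # follow-up resolves via kept history
print(len(chat.messages))  # 5: system prompt + two user/assistant pairs
```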
Direct Download

# Using huggingface-cli
huggingface-cli download TernaryPhysics/TernaryPhysics-7B-GGUF TernaryPhysics-7B-Q4_K_M.gguf

# Using wget
wget https://huggingface.co/TernaryPhysics/TernaryPhysics-7B-GGUF/resolve/main/TernaryPhysics-7B-Q4_K_M.gguf

Security Model

TernaryPhysics is designed with security as a core principle:

Human-in-the-Loop

  • Read operations (logs, metrics, configs) are autonomous
  • Write operations (scale, restart, modify) require explicit human approval
  • No auto-approve. No batch approve. Every write is individually confirmed.
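The read/write split above amounts to a gate in front of every operation. A sketch of the idea — the operation names are hypothetical, not tp-ops internals:

```python
# Reads run autonomously; every write waits for an individual "yes".
READ_OPS = {"get_logs", "get_metrics", "read_config"}
WRITE_OPS = {"scale", "restart", "modify_config"}

def execute(op: str, confirm=input) -> str:
    if op in READ_OPS:
        return f"{op}: executed autonomously"
    if op in WRITE_OPS:
        # No auto-approve and no batch approve: each write is
        # confirmed at the moment it would run.
        if confirm(f"Approve {op}? [yes/no] ").strip().lower() == "yes":
            return f"{op}: executed (approved)"
        return f"{op}: rejected"
    raise ValueError(f"unknown operation: {op}")

print(execute("get_logs"))                         # no prompt for reads
print(execute("restart", confirm=lambda _: "no"))  # blocked without approval
```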

Local Execution

  • Model runs entirely on your infrastructure
  • No data sent to external AI APIs
  • No cloud inference
  • Only billing metadata leaves your environment

No Credential Storage

  • Uses your existing auth: kubeconfig, SSH keys, database credentials
  • Credentials are never copied, cached, or transmitted
  • Leverages your existing RBAC and access controls

Audit Trail

  • Every conversation logged
  • Every action tracked
  • Every approval recorded
  • Export to your SIEM
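An audit record covering conversation, action, and approval might look like the following. The field names here are assumptions for illustration; check the export schema of your tp-ops version before wiring up a SIEM.

```python
import json
from datetime import datetime, timezone

def audit_entry(resource: str, actor: str, action: str, approved: bool) -> dict:
    """Build one audit record; append-only JSON Lines is a common,
    SIEM-friendly way to ship these."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "resource": resource,
        "actor": actor,
        "action": action,
        "approved": approved,
    }

entry = audit_entry(
    "prod-cluster", "alice",
    "kubectl set resources deployment/redis -c redis --limits=memory=2Gi",
    approved=True,
)
print(json.dumps(entry))
```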

Limitations

  • Language: Optimized for English infrastructure and DevOps conversations
  • Domain: Best for infrastructure operations; not designed for general chat, creative writing, or code generation outside ops context
  • Verification: May generate incorrect commands—always verify before executing
  • Context: Best results when system prompts specify the operational context
  • Size: 7B parameters provides good quality/speed tradeoff but larger models may perform better on complex reasoning

Technical Details

Property         | Value
-----------------|----------------------------
Architecture     | Qwen2 Transformer
Parameters       | 7.6B (7B effective)
Layers           | 28
Attention Heads  | 28
Context Length   | 32,768 tokens
Vocabulary       | 152,064 tokens
Quantization     | Q4_K_M (4.91 bits per weight)
Inference Engine | llama.cpp
Base Model       | Qwen2.5-7B-Instruct

Inference Performance

Typical performance on consumer hardware:

Hardware            | Tokens/sec
--------------------|-----------
Apple M1            | ~15 tok/s
Apple M2 Pro        | ~25 tok/s
Intel i7 (12th gen) | ~10 tok/s
AMD Ryzen 9         | ~12 tok/s

Performance varies based on context length and system load.


Built by TernaryPhysics