Instructions to use THARX/THAR.0X with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use THARX/THAR.0X with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="THARX/THAR.0X",
    filename="THAR.0X-Q4_K_M.gguf",
)
# messages must be a list of role/content dicts, not a bare string
llm.create_chat_completion(
    messages=[{"role": "user", "content": "Who are you?"}]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use THARX/THAR.0X with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf THARX/THAR.0X:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf THARX/THAR.0X:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf THARX/THAR.0X:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf THARX/THAR.0X:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf THARX/THAR.0X:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf THARX/THAR.0X:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf THARX/THAR.0X:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf THARX/THAR.0X:Q4_K_M
Use Docker
docker model run hf.co/THARX/THAR.0X:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use THARX/THAR.0X with Ollama:
ollama run hf.co/THARX/THAR.0X:Q4_K_M
- Unsloth Studio
How to use THARX/THAR.0X with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for THARX/THAR.0X to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for THARX/THAR.0X to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for THARX/THAR.0X to start chatting
- Pi
How to use THARX/THAR.0X with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf THARX/THAR.0X:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "THARX/THAR.0X:Q4_K_M" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
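The Pi provider entry can also be generated rather than typed by hand. The following is a minimal sketch, assuming only the fields shown in the configuration for this model; the helper name `pi_provider_config` is hypothetical, and any other fields Pi may accept are not covered here.

```python
import json


def pi_provider_config(model_id: str, base_url: str = "http://localhost:8080/v1") -> dict:
    """Return a models.json structure for one model served by llama-server.

    Hypothetical helper: it reproduces only the keys listed for this model.
    """
    return {
        "providers": {
            "llama-cpp": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",
                "models": [{"id": model_id}],
            }
        }
    }


# Render the JSON text; paste it into ~/.pi/agent/models.json.
config_text = json.dumps(pi_provider_config("THARX/THAR.0X:Q4_K_M"), indent=2)
print(config_text)
```

Generating the file this way keeps the model id and base URL in one place if you later switch quantizations or ports.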
- Hermes Agent
How to use THARX/THAR.0X with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf THARX/THAR.0X:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default THARX/THAR.0X:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use THARX/THAR.0X with Docker Model Runner:
docker model run hf.co/THARX/THAR.0X:Q4_K_M
- Lemonade
How to use THARX/THAR.0X with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull THARX/THAR.0X:Q4_K_M
Run and chat with the model
lemonade run user.THAR.0X-Q4_K_M
List all available models
lemonade list
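Several of the local apps above (llama.cpp, Pi, Hermes Agent) talk to the OpenAI-compatible endpoint that `llama-server` exposes. The sketch below shows roughly what such a request looks like using only the Python standard library. It assumes the server listens at `http://localhost:8080/v1`, as in the Pi configuration, and that the model id matches the `-hf` argument; the helper names `build_chat_request` and `ask` are illustrative, not part of any of these tools.

```python
import json
import urllib.request

# Assumed values, taken from the Pi configuration; adjust for your setup.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "THARX/THAR.0X:Q4_K_M"


def build_chat_request(
    prompt: str, base_url: str = BASE_URL, model: str = MODEL_ID
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage, once llama-server is running:
#   print(ask("Who are you?"))
```

Separating request construction from sending makes it possible to inspect the payload without a running server.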
"""
THAR.0X - app.py
Model-agnostic cognitive architecture interface.

Supports:
  - Ollama (http://localhost:11434)
  - LM Studio (http://localhost:1234)
  - Any OpenAI-compatible local server

Usage:
  python app.py                          # interactive CLI chat
  python app.py --backend lmstudio       # use LM Studio instead of Ollama
  python app.py --model qwen2.5:14b      # override model
  python app.py --once "Who are you?"    # single query, print and exit

Requirements:
  pip install openai requests
"""

import argparse
import json
import pathlib
import sys
import textwrap
from typing import Optional
# ---------------------------------------------------------------------------
# Load assets
# ---------------------------------------------------------------------------

SCRIPT_DIR = pathlib.Path(__file__).parent.resolve()


def load_system_prompt() -> str:
    path = SCRIPT_DIR / "system_prompt.txt"
    if not path.exists():
        print(f"[ERROR] system_prompt.txt not found at {path}")
        print("Make sure system_prompt.txt is in the same directory as app.py.")
        sys.exit(1)
    return path.read_text(encoding="utf-8").strip()


def load_config() -> dict:
    path = SCRIPT_DIR / "config.json"
    if not path.exists():
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)
# ---------------------------------------------------------------------------
# Backend abstraction
# ---------------------------------------------------------------------------

BACKENDS = {
    "ollama": {
        "base_url": "http://localhost:11434/v1",
        "api_key": "ollama",
        "default_model": "THAR.0X",
    },
    "lmstudio": {
        "base_url": "http://localhost:1234/v1",
        "api_key": "lm-studio",
        "default_model": "local-model",
    },
}


def build_client(backend: str):
    """Return an OpenAI-compatible client for the chosen backend."""
    try:
        from openai import OpenAI
    except ImportError:
        print("[ERROR] openai package not installed.")
        print("Run: pip install openai")
        sys.exit(1)
    cfg = BACKENDS.get(backend)
    if cfg is None:
        print(f"[ERROR] Unknown backend '{backend}'. Choose: {list(BACKENDS.keys())}")
        sys.exit(1)
    return OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])


def check_server(backend: str) -> bool:
    """Ping the server to confirm it's running before starting chat."""
    import requests

    cfg = BACKENDS[backend]
    # Probe the server root rather than the /v1 API prefix.
    url = cfg["base_url"].removesuffix("/v1")
    try:
        r = requests.get(url, timeout=3)
        return r.status_code < 500
    except requests.RequestException:
        return False
# ---------------------------------------------------------------------------
# Chat engine
# ---------------------------------------------------------------------------

class THAR0X:
    def __init__(
        self,
        backend: str = "ollama",
        model: Optional[str] = None,
        verbose: bool = False,
    ):
        self.config = load_config()
        self.system_prompt = load_system_prompt()
        self.backend = backend
        self.client = build_client(backend)
        self.history: list[dict] = []
        self.verbose = verbose

        # Model: CLI arg > backend default (config.json does not set the model)
        inf = self.config.get("inference", {})
        backend_cfg = BACKENDS[backend]
        self.model = model or backend_cfg["default_model"]

        # Inference parameters from config.json
        self.temperature = inf.get("temperature", 0.85)
        self.top_p = inf.get("top_p", 0.92)
        self.max_tokens = inf.get("max_tokens", 2048)

        if self.verbose:
            print(f"[THAR.0X] Backend: {backend} | Model: {self.model}")
            print(f"[THAR.0X] Temp: {self.temperature} | Top-p: {self.top_p} | Max tokens: {self.max_tokens}")

    def chat(self, user_message: str) -> str:
        """Send a message and return the assistant reply. History is maintained."""
        self.history.append({"role": "user", "content": user_message})
        messages = [
            {"role": "system", "content": self.system_prompt},
            *self.history,
        ]
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                temperature=self.temperature,
                top_p=self.top_p,
                max_tokens=self.max_tokens,
            )
        except Exception as e:
            # Drop the unanswered user turn so a retry does not send it twice.
            self.history.pop()
            error_msg = f"[ERROR] API call failed: {e}"
            print(error_msg, file=sys.stderr)
            return error_msg
        # content can be None for an empty completion; normalize to a string.
        reply = response.choices[0].message.content or ""
        self.history.append({"role": "assistant", "content": reply})
        return reply

    def reset(self):
        """Clear conversation history."""
        self.history = []
        print("[THAR.0X] Conversation reset.")

    def show_history(self):
        """Print conversation history."""
        if not self.history:
            print("[THAR.0X] No conversation history yet.")
            return
        for turn in self.history:
            role = "YOU" if turn["role"] == "user" else "THAR.0X"
            print(f"\n[{role}] {turn['content']}")
# ---------------------------------------------------------------------------
# CLI interface
# ---------------------------------------------------------------------------

BANNER = """
╔══════════════════════════════════════════════╗
║                T H A R . 0 X                 ║
║  Cognitive Architecture - Local Intelligence ║
║  Zero as in origin. X as in unlimited.       ║
╚══════════════════════════════════════════════╝
Commands:
  /reset    - clear conversation history
  /history  - show full conversation
  /model    - show current model and backend
  /quit     - exit
"""


def run_interactive(agent: THAR0X):
    print(BANNER)
    print(f"Backend: {agent.backend.upper()} | Model: {agent.model}\n")
    while True:
        try:
            user_input = input("YOU > ").strip()
        except (EOFError, KeyboardInterrupt):
            print("\n[THAR.0X] Session ended.")
            break
        if not user_input:
            continue

        # Commands
        if user_input.lower() in ("/quit", "/exit", "quit", "exit"):
            print("[THAR.0X] Session ended.")
            break
        elif user_input.lower() == "/reset":
            agent.reset()
            continue
        elif user_input.lower() == "/history":
            agent.show_history()
            continue
        elif user_input.lower() == "/model":
            print(f"[THAR.0X] Backend: {agent.backend} | Model: {agent.model}")
            continue

        # Normal message
        print("\nTHAR.0X > ", end="", flush=True)
        reply = agent.chat(user_input)

        # Word-wrap long replies for terminal readability, one line at a
        # time so paragraph breaks in the reply are preserved.
        wrapped = "\n".join(
            textwrap.fill(
                line,
                width=90,
                subsequent_indent=" ",
                break_long_words=False,
                break_on_hyphens=False,
            )
            for line in reply.splitlines()
        )
        print(wrapped)
        print()
# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------

def parse_args():
    parser = argparse.ArgumentParser(
        description="THAR.0X - Model-agnostic cognitive architecture CLI",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=textwrap.dedent("""
            Examples:
              python app.py
              python app.py --backend lmstudio
              python app.py --model qwen2.5:14b
              python app.py --once "Explain consciousness in one paragraph."
              python app.py --backend lmstudio --model Qwen2.5-14B --verbose
        """),
    )
    parser.add_argument(
        "--backend",
        choices=list(BACKENDS.keys()),
        default="ollama",
        help="Which local LLM server to use (default: ollama)",
    )
    parser.add_argument(
        "--model",
        default=None,
        help="Model name override. For Ollama: 'qwen2.5:14b'. For LM Studio: model filename.",
    )
    parser.add_argument(
        "--once",
        metavar="PROMPT",
        default=None,
        help="Send a single prompt, print the reply, and exit.",
    )
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="Print inference parameters on startup.",
    )
    parser.add_argument(
        "--no-check",
        action="store_true",
        help="Skip server connectivity check on startup.",
    )
    return parser.parse_args()


def main():
    args = parse_args()

    # Server check
    if not args.no_check:
        print(f"[THAR.0X] Checking {args.backend} server...", end=" ", flush=True)
        if check_server(args.backend):
            print("OK")
        else:
            print("FAILED")
            print(f"\n[ERROR] Cannot reach {args.backend} server.")
            if args.backend == "ollama":
                print("Start it with: ollama serve")
                print("If THAR.0X model not created yet: ollama create THAR.0X -f Modelfile")
            elif args.backend == "lmstudio":
                print("Start LM Studio, load a model, and enable the local server.")
            print("\nUse --no-check to skip this check.")
            sys.exit(1)

    # Build agent
    agent = THAR0X(
        backend=args.backend,
        model=args.model,
        verbose=args.verbose,
    )

    # Single-shot mode
    if args.once:
        reply = agent.chat(args.once)
        print(reply)
        return

    # Interactive mode
    run_interactive(agent)


if __name__ == "__main__":
    main()
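app.py reads an optional `config.json` next to the script and falls back to built-in values for any missing inference parameter. The sketch below illustrates that lookup in isolation. The sample JSON is a hypothetical file, not one shipped with the model, and `resolve_inference` is an illustrative helper that mirrors the `inf.get(...)` calls in `THAR0X.__init__` rather than code from app.py itself.

```python
import json

# Fallback values, copied from the defaults used in THAR0X.__init__.
DEFAULTS = {"temperature": 0.85, "top_p": 0.92, "max_tokens": 2048}

# Hypothetical config.json content; app.py reads only the "inference" section.
EXAMPLE_CONFIG = """
{
  "inference": {
    "temperature": 0.7,
    "max_tokens": 1024
  }
}
"""


def resolve_inference(config: dict) -> dict:
    """Overlay the optional "inference" section on top of the defaults."""
    inference = config.get("inference", {})
    return {key: inference.get(key, default) for key, default in DEFAULTS.items()}


params = resolve_inference(json.loads(EXAMPLE_CONFIG))
print(params)  # {'temperature': 0.7, 'top_p': 0.92, 'max_tokens': 1024}
```

Keys omitted from the file (here `top_p`) keep their defaults, and an absent `config.json` is equivalent to passing an empty dict.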