How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TimesLast/TimesLastAI-4B-GGUF:Q8_0
# Run inference directly in the terminal:
llama-cli -hf TimesLast/TimesLastAI-4B-GGUF:Q8_0
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TimesLast/TimesLastAI-4B-GGUF:Q8_0
# Run inference directly in the terminal:
llama-cli -hf TimesLast/TimesLastAI-4B-GGUF:Q8_0
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf TimesLast/TimesLastAI-4B-GGUF:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf TimesLast/TimesLastAI-4B-GGUF:Q8_0
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf TimesLast/TimesLastAI-4B-GGUF:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf TimesLast/TimesLastAI-4B-GGUF:Q8_0
Use Docker
docker model run hf.co/TimesLast/TimesLastAI-4B-GGUF:Q8_0
Quick Links

TimesLast AI - 4B (Beta)

TimesLast AI is a 4-billion parameter, general-purpose conversational model developed by TimesLast.

This model was created to serve as a more direct, engaging, and less sycophantic alternative to contemporary AI assistants. Its core personality is witty, sarcastic and helpful, allowing for more natural and candid interactions.

Model Description

  • Developed by: TimesLast
  • Model Type: Generative Language Model
  • License: Apache 2.0
  • Primary Use: General-purpose chat, complex instruction following, multi-language tasks, and coding assistance.

Key Features

  • Distinct Personality: Designed to be witty, sarcastic, and direct, avoiding the overly sanitized and obsequious language common in other models.
  • General Purpose: While excelling at conversational banter, the model is highly capable in a wide range of domains including complex coding, technical explanations, creative writing, and historical analysis.
  • Multilingual Capabilities: Includes training data in English and other languages making it effective for a variety of regional tasks.
  • Efficiency: Delivers powerful results in a compact, 4-billion parameter package.

Developer's Note

Due to the unique distillation process used in its creation, the model may occasionally state that it has 1 trillion parameters. This is a known quirk of its training and a testament to its ambitious design, not a reflection of its actual 4B architecture.

Downloads last month
1
GGUF
Model size
4B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for TimesLast/TimesLastAI-4B-GGUF

Quantized
(2)
this model