How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RthItalia/AICE-v1
# Run inference directly in the terminal:
llama-cli -hf RthItalia/AICE-v1
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RthItalia/AICE-v1
# Run inference directly in the terminal:
llama-cli -hf RthItalia/AICE-v1
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf RthItalia/AICE-v1
# Run inference directly in the terminal:
./llama-cli -hf RthItalia/AICE-v1
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf RthItalia/AICE-v1
# Run inference directly in the terminal:
./build/bin/llama-cli -hf RthItalia/AICE-v1
Use Docker
docker model run hf.co/RthItalia/AICE-v1
Quick Links

AICE-v1 Model Card

Overview

AICE-v1 is a compact conversational model in RWKV architecture, distributed as merged weights for text generation. AICE-v1 is a derivative model initialized from externally pre-trained foundation weights, then innovated with LoRA and merged for inference.

Technical profile

  • Architecture: RWKV causal language model
  • Layers: 24
  • Hidden size: 2048
  • Context length: 1024
  • Vocabulary size: 50277
  • Effective size: about 1.5B parameters
  • Primary format: model.safetensors (FP16/F32 mixed tensors, runtime friendly)

Training strategy

  • Initialization from pre-trained foundation checkpoint weights (third-party origin).
  • Instruction tuning performed with LoRA adapters.
  • Distilled supervision pipeline derived from a larger teacher model family (70B class).
  • LoRA adapters merged into a single consolidated model for inference.

Release intent

  • This repository contains the merged model artifacts for inference.
  • Adapter artifacts are optional internal training artifacts and are not required for runtime.

Suggested use

  • Assistant/chat inference
  • Lightweight deployment scenarios (desktop and mobile with quantized variants)
  • Prompt-based reasoning tasks

Limitations

  • Behavior quality depends on prompt design and decoding setup.
  • The model can still produce hallucinations and incorrect factual outputs.
  • Safety filtering and evaluation are required in production.

Runtime files

  • config.json
  • generation_config.json
  • tokenizer.json
  • tokenizer_config.json
  • special_tokens_map.json
  • model.safetensors

Mobile quantization

See MOBILE_Q4_PIPELINE.md for INT8/INT4 export and packaging.

Formats note

  • Included now: safetensors, ONNX (INT8, INT4) and GGUF.
  • GGUF artifact: aicemobile/AICE_v1_rwkv4_custom.gguf (custom RWKV4 GGUF layout).

Compliance documentation

  • EU AI Act public training-content summary: EU_TRAINING_SUMMARY.md
Downloads last month
16
Safetensors
Model size
2B params
Tensor type
F32
·
F16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support