QORA - Native Rust LLM Inference Engine
Base model: SmolLM3-3B (HuggingFaceTB/SmolLM3-3B)
Pure Rust inference engine for the SmolLM3-3B language model. No Python runtime, no CUDA, no external dependencies. Single executable + quantized weights = portable AI on any machine.
Now with GPU acceleration! Auto-detects Vulkan-compatible GPUs for ~4.8x faster inference, with intelligent fallback to CPU.
Try: https://huggingface.co/qoranet/QORA-LLM
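The "GPU if available, otherwise CPU" behavior above can be sketched as a backend-selection pattern. This is an illustration, not QORA's actual code: `probe_vulkan` is a stand-in for real Vulkan device enumeration (which the engine would do through the Vulkan loader), and here it just reads a hypothetical `QORA_FAKE_GPU` environment variable so the sketch stays self-contained.

```rust
// Hedged sketch of auto-detect-with-fallback backend selection.
trait Backend {
    fn name(&self) -> &'static str;
}

struct VulkanGpu;
struct Cpu;

impl Backend for VulkanGpu {
    fn name(&self) -> &'static str { "vulkan-gpu" }
}
impl Backend for Cpu {
    fn name(&self) -> &'static str { "cpu" }
}

/// Stand-in for real Vulkan device enumeration; the actual engine
/// would query the Vulkan loader for compatible physical devices.
fn probe_vulkan() -> Option<VulkanGpu> {
    match std::env::var("QORA_FAKE_GPU") {
        Ok(v) if v == "1" => Some(VulkanGpu),
        _ => None,
    }
}

/// The `--cpu` flag maps to `force_cpu = true`: skip detection entirely.
fn pick_backend(force_cpu: bool) -> Box<dyn Backend> {
    if !force_cpu {
        if let Some(gpu) = probe_vulkan() {
            return Box::new(gpu);
        }
    }
    // Intelligent fallback: no GPU found still yields a working backend.
    Box::new(Cpu)
}

fn main() {
    println!("auto-detect selected: {}", pick_backend(false).name());
    println!("--cpu forced:         {}", pick_backend(true).name());
}
```

The point of the pattern is that detection failure is never an error path: the caller always gets a usable `Backend`, and `--cpu` simply short-circuits the probe.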
Fastest: direct answer, no thinking, deterministic
qora.exe --prompt "What is the capital of France?" --no-think --greedy
Fast: direct answer with some randomness
qora.exe --prompt "Tell me about Mars" --no-think
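The difference between the two modes above comes down to token selection. A minimal sketch (not QORA's actual sampler): `--greedy` takes the argmax of the logits each step, so output is fully deterministic; without it, the next token is drawn from the softmax distribution, which adds randomness. A tiny xorshift RNG keeps the sketch self-contained.

```rust
fn softmax(logits: &[f32]) -> Vec<f32> {
    // Subtract the max for numerical stability before exponentiating.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Greedy decoding: always pick the highest-scoring token (deterministic).
fn greedy(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

/// Sampled decoding: draw a token index proportionally to its probability.
fn sample(logits: &[f32], rng: &mut u64) -> usize {
    // xorshift64 -> uniform value in [0, 1)
    *rng ^= *rng << 13;
    *rng ^= *rng >> 7;
    *rng ^= *rng << 17;
    let u = (*rng >> 11) as f32 / (1u64 << 53) as f32;

    // Walk the cumulative distribution until it crosses u.
    let probs = softmax(logits);
    let mut acc = 0.0;
    for (i, p) in probs.iter().enumerate() {
        acc += p;
        if u < acc {
            return i;
        }
    }
    probs.len() - 1
}

fn main() {
    let logits = [1.0_f32, 3.0, 0.5, 2.0];
    // Greedy always returns index 1, the largest logit.
    println!("greedy -> {}", greedy(&logits));
    let mut rng = 0x9E37_79B9_7F4A_7C15_u64;
    // Sampling can return any index, biased toward index 1.
    for _ in 0..3 {
        println!("sample -> {}", sample(&logits, &mut rng));
    }
}
```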
Full quality: longer reply, no thinking, deterministic
qora.exe --prompt "Tell me the story of Bitcoin and its creation" --no-think --greedy
See what the model is thinking
qora.exe --prompt "What is 2+2?" --show-think
Force CPU (skip GPU auto-detect)
qora.exe --prompt "Hello" --cpu
Control output length
qora.exe --prompt "Tell me a story" --max-tokens 512
Raw text completion (no chat template)
qora.exe --prompt "Once upon a time" --raw --max-tokens 128
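What `--raw` skips can be sketched as follows. Note the template shown here is a *hypothetical* ChatML-style format chosen only to illustrate the difference; SmolLM3's real chat template is defined by its tokenizer configuration and may differ.

```rust
/// Default mode: wrap the user prompt in chat-role markers so the model
/// answers as an assistant turn. (Markers here are illustrative only.)
fn apply_chat_template(user_prompt: &str) -> String {
    format!(
        "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n",
        user_prompt
    )
}

/// `--raw` mode: feed the prompt verbatim, so the model does plain text
/// continuation ("Once upon a time" keeps the story going) instead of
/// answering as a chat assistant.
fn raw_prompt(user_prompt: &str) -> String {
    user_prompt.to_string()
}

fn main() {
    let p = "Once upon a time";
    println!("templated:\n{}", apply_chat_template(p));
    println!("raw:\n{}", raw_prompt(p));
}
```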
Custom model path
qora.exe --load path/to/model.qora --prompt "Hello"