---
language:
- en
license: apache-2.0
library_name: gguf
tags:
- ruvltra
- sona
- adaptive-learning
- gguf
- quantized
pipeline_tag: text-generation
---

<div align="center">

# RuvLTRA Medium

[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-ruv%2Fruvltra--medium-yellow)](https://huggingface.co/ruv/ruvltra-medium)
[![Format: GGUF](https://img.shields.io/badge/Format-GGUF-lightgrey)](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md)

**⚖️ Balanced Model for General-Purpose Tasks**

</div>

---

## Overview

RuvLTRA Medium strikes a balance between capability and resource usage, making it well suited to desktop applications, development workstations, and moderate-scale deployments.

## Model Card

| Property | Value |
|----------|-------|
| **Parameters** | 1.1 Billion |
| **Quantization** | Q4_K_M |
| **Context** | 8,192 tokens |
| **Size** | ~669 MB |
| **Min RAM** | 2 GB |
| **Recommended RAM** | 4 GB |
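
As a sanity check on the table above, the file size follows directly from the parameter count and the quantization's effective bits per weight. The ~4.85 bits/weight figure for Q4_K_M below is a commonly quoted approximation, not a value taken from this card:

```python
# Rough GGUF file-size estimate from parameter count.
# Q4_K_M averages roughly 4.85 bits per weight (approximate; the exact
# value depends on the model's tensor mix).
def estimate_gguf_size_mb(params: float, bits_per_weight: float = 4.85) -> float:
    """Approximate model file size in megabytes (10^6 bytes)."""
    return params * bits_per_weight / 8 / 1e6

size_mb = estimate_gguf_size_mb(1.1e9)  # ~667 MB, in line with the ~669 MB listed above
```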

## 🚀 Quick Start

```bash
# Download the quantized weights
wget https://huggingface.co/ruv/ruvltra-medium/resolve/main/ruvltra-1.1b-q4_k_m.gguf

# Run inference with llama.cpp
./llama-cli -m ruvltra-1.1b-q4_k_m.gguf \
  -p "Explain quantum computing in simple terms:" \
  -n 512 -c 8192
```

## 💡 Use Cases

- **Development**: Code assistance and generation
- **Writing**: Content creation and editing
- **Analysis**: Document summarization
- **Chat**: Conversational AI applications

## 🔧 Integration

### Rust

```rust
use ruvllm::hub::ModelDownloader;

// Must run inside an async context (e.g. a #[tokio::main] function);
// `?` requires the enclosing function to return a compatible Result.
let path = ModelDownloader::new()
    .download("ruv/ruvltra-medium", None)
    .await?;
```

### Python

```python
from llama_cpp import Llama
from huggingface_hub import hf_hub_download

# Fetch the GGUF file from the Hub, then load it with llama-cpp-python
model_path = hf_hub_download("ruv/ruvltra-medium", "ruvltra-1.1b-q4_k_m.gguf")
llm = Llama(model_path=model_path, n_ctx=8192)

# Generate a completion
output = llm("Explain quantum computing in simple terms:", max_tokens=512)
print(output["choices"][0]["text"])
```

### OpenAI-Compatible Server

```bash
python -m llama_cpp.server \
  --model ruvltra-1.1b-q4_k_m.gguf \
  --host 0.0.0.0 --port 8000
```
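
Once the server is running, any OpenAI-style client can talk to it. A minimal sketch of the request body, assuming the default chat-completions route at `http://localhost:8000/v1/chat/completions` (the model name and prompt here are illustrative):

```python
import json

# Build an OpenAI-style chat-completion request body for the local server.
def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    return {
        "model": "ruvltra-1.1b-q4_k_m",  # illustrative model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = json.dumps(build_chat_request("Summarize GGUF in one sentence."))
# POST `body` with Content-Type: application/json, e.g.
# requests.post("http://localhost:8000/v1/chat/completions", data=body)
```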

## Performance

| Platform | Tokens/sec |
|----------|------------|
| M2 Pro (Metal) | 65 tok/s |
| RTX 4080 (CUDA) | 95 tok/s |
| i9-13900K (CPU) | 25 tok/s |
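
To put these throughput figures in context, a quick sketch of the wall-clock time for the 512-token generation used in the Quick Start (decode speed only; prompt processing is not included):

```python
# Decode-only time estimate for a 512-token completion, using the
# throughput figures from the table above.
throughput = {"M2 Pro (Metal)": 65, "RTX 4080 (CUDA)": 95, "i9-13900K (CPU)": 25}

def seconds_for(tokens: int, tok_per_s: float) -> float:
    return tokens / tok_per_s

times = {p: round(seconds_for(512, t), 1) for p, t in throughput.items()}
# e.g. 512 tokens at 25 tok/s on CPU takes about 20.5 seconds
```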

---

**License**: Apache 2.0 | **GitHub**: [ruvnet/ruvector](https://github.com/ruvnet/ruvector)