JARVIS-Mistral-Phase1a: Macedonian Language Foundation
Model ID: Miki-T/JARVIS-Mistral-Phase1a
A QLoRA fine-tuned Mistral 7B model trained on 500k rows of Macedonian web text to build language fluency as the foundation for JARVIS — a locally-hosted AI assistant inspired by Iron Man's JARVIS.
Model Details
Model Description
- Developed by: Miki Trajkovski
- Model type: Causal Language Model (fine-tuned via QLoRA)
- Base model: mistralai/Mistral-7B-v0.1
- Language(s): Macedonian (mk), with English support
- License: MIT
- Finetuned from model: Mistral 7B v0.1
- Adapter type: LoRA (Low-Rank Adaptation)
Model Architecture
- Base: Mistral 7B (7 billion parameters)
- Fine-tuning method: QLoRA (4-bit quantization + LoRA adapters)
- LoRA rank: 16
- LoRA alpha: 32
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Max sequence length: 1024 tokens
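As a rough illustration of what these settings imply, the sketch below counts the parameters that rank-16 LoRA adapters add across the listed target modules. The layer dimensions (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128) are assumptions taken from the published Mistral-7B-v0.1 configuration and are not stated in this card; `lora_param_count` is a hypothetical helper name.

```python
# Rough count of LoRA parameters for rank-16 adapters on Mistral-7B-v0.1.
# Dimensions are assumptions taken from the base model's published config.
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size of the MLP
KV_DIM = 8 * 128       # num_key_value_heads * head_dim
NUM_LAYERS = 32
RANK = 16
ALPHA = 32

# (input features, output features) of each targeted linear layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """Each adapted layer adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(TARGET_SHAPES, RANK, NUM_LAYERS)
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / rank): {scaling}")
```

Under these assumptions the adapter holds roughly 42 million parameters, which at 32-bit precision is on the order of 170 MB. That is broadly in line with the ~200 MB adapter size reported under Speeds, Sizes, Times.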
Model Sources
- Repository: https://github.com/MikiTrajkovski/JARVIS (Will be available when project is complete)
- HuggingFace Model Card: https://huggingface.co/Miki-T/JARVIS-Mistral-Phase1a
- Training code: tools/training_pipeline/train_phase1a.py
Uses
Direct Use
This model is designed for:
- Macedonian text generation — generates fluent Macedonian sentences
- Language understanding — comprehends Macedonian grammar and semantics
- Foundation for downstream tasks — serves as Phase 1a of the JARVIS training pipeline
Example usage:
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM
# Load the base model together with the LoRA adapter
model = AutoPeftModelForCausalLM.from_pretrained(
    "Miki-T/JARVIS-Mistral-Phase1a",
    device_map="auto",
    torch_dtype="auto",
)
# Merge the adapter into the base weights for inference
model = model.merge_and_unload()
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Miki-T/JARVIS-Mistral-Phase1a")
# Generate (prompt: "Macedonia is a country known for")
prompt = "Македонија е земја позната по"
# Move the inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Downstream Use (Phase 1b, 1c)
This model is Phase 1a of a multi-phase training pipeline:
- Phase 1a (current): Macedonian language foundation
- Phase 1b (next): Instruction following
- Phase 1c (planned): Reasoning and problem-solving
- Phase 2 (planned): Macedonian law domain expertise (RAG)
Each phase builds on the previous one. Do NOT train Phase 1b on a fresh base model.
Out-of-Scope Use
- Not for production: This is a research/learning model
- Not instruction-tuned: Phase 1a only teaches language fluency, not instruction following
- Not domain-specific: Use Phase 2 for legal/specialized Macedonian tasks
- Not multilingual: Optimized for Macedonian; English support varies
Limitations and Bias
Known Limitations
- Phase 1a only teaches language fluency; the model does NOT understand instructions yet
  - Input: "Дај ми преводот" (Give me a translation)
  - Output: Likely continues generating Macedonian text instead of translating
  - This is fixed in Phase 1b
- Training data bias: trained on Macedonian web text (Wikipedia, news, etc.)
  - May reflect biases present in those sources
  - Limited exposure to specialized domains (legal, medical, technical)
- Context window: 1024 tokens max; cannot process very long Macedonian texts
- No fine-grained reasoning: Phase 1c adds reasoning capability; Phase 1a lacks it
Recommendations
- Use this model only as a foundation for downstream phases
- For production Macedonian tasks, wait for Phase 1b (instruction following) and Phase 1c (reasoning)
- Fine-tune on domain-specific data if targeting legal, medical, or technical Macedonian
- Always validate outputs for accuracy and bias
Training Details
Training Data
| Dataset | Rows | Source | Purpose |
|---|---|---|---|
| LVSTCK/macedonian-corpus-cleaned-dedup | 500,000 | HuggingFace | Macedonian language foundation |
- Data format: Plain text (one document per line in JSONL)
- Quality: Cleaned and deduplicated (higher quality than the raw corpus)
- Language: 100% Macedonian (Cyrillic script)
- Size: ~500k rows, ~2.5GB uncompressed
Training Procedure
Preprocessing
- Tokenized with Mistral tokenizer
- Max sequence length: 1024 tokens
- Packing enabled (multiple short texts combined into context window)
- No removal of special tokens or data cleaning beyond source dataset
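Packing can be pictured as concatenating tokenized documents, each followed by an end-of-sequence token, and slicing the resulting stream into fixed-length blocks. The sketch below is a simplified illustration with toy token IDs, not the TRL implementation used in training; `pack_sequences` and the `eos_id` value are hypothetical.

```python
def pack_sequences(token_lists, max_len, eos_id):
    """Concatenate tokenized documents and cut them into max_len blocks.

    Returns the full blocks plus any leftover tokens that did not fill
    a complete block.
    """
    buffer = []
    blocks = []
    for tokens in token_lists:
        buffer.extend(tokens)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= max_len:
            blocks.append(buffer[:max_len])
            buffer = buffer[max_len:]
    return blocks, buffer


# Toy example: three short "documents" packed into blocks of 4 tokens.
docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks, leftover = pack_sequences(docs, max_len=4, eos_id=0)
print(blocks)
print(leftover)
```

With a real tokenizer, `max_len` would be 1024 to match the sequence length listed above. A block may then start or end in the middle of a document, which is the usual trade-off of packing.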
Hyperparameters
| Parameter | Value | Reasoning |
|---|---|---|
| Learning rate | 2e-4 | Standard QLoRA starting point |
| Warmup ratio | 5% | Prevent large initial updates |
| Learning rate scheduler | Cosine decay | Smooth decay to ~0 by end |
| Batch size | 2 | Fits in 12GB VRAM with QLoRA |
| Gradient accumulation | 8 | Effective batch = 16 |
| Epochs | 1 | Single pass through data (avoid overfitting) |
| Optimization | AdamW 8-bit | Memory efficient |
| Gradient checkpointing | Enabled | Save VRAM at cost of speed |
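To make the table concrete, the sketch below derives the effective batch size and the warmup length, then evaluates a linear-warmup plus cosine-decay schedule. The total step count (9,502) comes from the Speeds, Sizes, Times table. The rounding of the warmup steps and the exact schedule shape are assumptions, and `lr_at` is a hypothetical helper, not the trainer's own scheduler.

```python
import math

PEAK_LR = 2e-4
TOTAL_STEPS = 9502        # from the "Total steps" row of this card
WARMUP_RATIO = 0.05
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
# Rounding up is an assumption about how the ratio maps to whole steps.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (TOTAL_STEPS - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch: {effective_batch}, warmup steps: {warmup_steps}")
for step in (0, warmup_steps, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```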
Training Regime
- Hardware: NVIDIA RTX 5070 (12GB VRAM)
- Framework: PyTorch 2.7.0+cu128 + Hugging Face Transformers
- Fine-tuning framework: TRL SFTTrainer + PEFT LoRA
- Precision: 4-bit quantization (NF4) + bfloat16 math
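A back-of-the-envelope calculation shows why 4-bit weights matter on a 12GB card. The sketch below compares the storage needed for the base model's weights at 16 bits and at 4 bits per parameter. The parameter count is an assumption taken from the base model, and the estimate ignores quantization constants, layers that stay in higher precision, activations, and optimizer state; `weight_gib` is a hypothetical helper.

```python
PARAMS = 7_241_732_096   # Mistral-7B-v0.1 parameter count (assumption)
GIB = 1024 ** 3
VRAM_GIB = 12            # RTX 5070, as listed above


def weight_gib(num_params, bits_per_param):
    """Storage for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB


fp16_gib = weight_gib(PARAMS, 16)
nf4_gib = weight_gib(PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.1f} GiB (fits in VRAM: {fp16_gib < VRAM_GIB})")
print(f"4-bit weights:  {nf4_gib:.1f} GiB (fits in VRAM: {nf4_gib < VRAM_GIB})")
```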
Speeds, Sizes, Times
| Metric | Value |
|---|---|
| Training duration | 5 days, 23 hours, 29 minutes |
| Total steps | 9,502 |
| Throughput | ~12-15 tokens/second |
| Adapter size | ~200 MB |
| Total VRAM used | ~8.5 GB / 12 GB |
| Total tokens processed | 7.6M tokens |
Note: Gradient checkpointing reduced throughput. Phase 1b will disable it, targeting a speedup of about 10x.
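The throughput figure can be cross-checked against the other rows of the table. The sketch below uses only the reported duration, step count, and token count; it is plain arithmetic on those values, not a new measurement.

```python
from datetime import timedelta

# Values as reported in the table above
duration = timedelta(days=5, hours=23, minutes=29)
total_steps = 9502
total_tokens = 7.6e6

seconds = duration.total_seconds()
hours = seconds / 3600
tokens_per_second = total_tokens / seconds
seconds_per_step = seconds / total_steps

print(f"duration: {hours:.1f} h")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
print(f"time per optimizer step: {seconds_per_step:.1f} s")
```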
Evaluation
Testing Data
Evaluated on:
- Manual test: 3 Macedonian prompts (verified fluent generation)
- Benchmark: LVSTCK/macedonian-llm-eval (83 questions); dataset unavailable due to HuggingFace deprecation
Metrics
| Metric | Value | Interpretation |
|---|---|---|
| Final loss | 1.2543 | Excellent convergence |
| Starting loss | 2.0910 | Model improved 40% |
| Final perplexity | 3.51 | Model is as uncertain as picking from ~4 equally likely tokens |
| Best loss achieved | 1.2460 | Fully converged |
| Gradient norm (avg) | 0.583 | Stable training (healthy range: 0.1-2.0) |
| Gradient norm (max) | 1.258 | No exploding gradients |
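The perplexity and improvement figures follow directly from the loss values, assuming the loss is the mean cross-entropy per token in nats. A minimal check:

```python
import math

start_loss = 2.0910
final_loss = 1.2543

# Perplexity is the exponential of the mean per-token cross-entropy.
final_ppl = math.exp(final_loss)
start_ppl = math.exp(start_loss)

# Relative reduction in loss over training
improvement = (start_loss - final_loss) / start_loss

print(f"final perplexity: {final_ppl:.2f}")
print(f"starting perplexity: {start_ppl:.2f}")
print(f"loss reduction: {improvement:.1%}")
```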
Sample Outputs
Test prompt: "Скопје е главен град на" (Skopje is the capital of)
Model output: "Република Македонија и има околу 600.000 жители." (the Republic of Macedonia and has about 600,000 inhabitants.)
Interpretation: ✅ Fluent Macedonian text, maintains context, grammatically correct
Model Card Details
Environmental Impact
| Factor | Value |
|---|---|
| Hardware | NVIDIA RTX 5070 (12GB VRAM) |
| Training duration | 5 days, 23 hours |
| Power consumption (estimated) | ~150W continuous × 143.5 hours ≈ 21.5 kWh |
| Carbon emitted (estimated) | ~10-15 kg CO2e (depends on grid carbon intensity) |
| Cloud provider | None (local desktop GPU) |
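The energy estimate is average power multiplied by training time, and the carbon estimate multiplies that energy by a grid carbon intensity. In the sketch below the two intensity values (0.5 and 0.7 kg CO2e per kWh) are illustrative assumptions chosen to roughly bracket the range in the table; they are not measured figures for any particular grid, and `co2_kg` is a hypothetical helper.

```python
POWER_KW = 0.150   # ~150 W average draw, as estimated in the table
HOURS = 143.5      # training duration from the table

energy_kwh = POWER_KW * HOURS


def co2_kg(energy_kwh, intensity_kg_per_kwh):
    """Emissions for a given grid carbon intensity."""
    return energy_kwh * intensity_kg_per_kwh


# Illustrative intensities only (assumptions, not grid measurements)
low = co2_kg(energy_kwh, 0.5)
high = co2_kg(energy_kwh, 0.7)

print(f"energy: {energy_kwh:.1f} kWh")
print(f"estimated emissions: {low:.1f} to {high:.1f} kg CO2e")
```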
Compute Infrastructure
- CPU: AMD Ryzen 7 7800X3D (8-core)
- GPU: NVIDIA RTX 5070 (12GB GDDR7 VRAM)
- RAM: 32GB DDR5
- Storage: NVMe SSD (assumed)
- OS: Windows 11
- CUDA: CUDA 12.x
Software
- PyTorch: 2.7.0+cu128
- Transformers: 4.40.0
- PEFT: 0.10.0
- TRL: 0.8.6
- Accelerate: 0.29.0
- Bitsandbytes: 0.43.0
- CTranslate2: (for Whisper STT, not used in this model)
See full requirements.txt in the JARVIS repository.
How to Use
Load the Model
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch
# Load with adapter (no merge)
model = AutoPeftModelForCausalLM.from_pretrained(
"Miki-T/JARVIS-Mistral-Phase1a",
device_map="auto",
torch_dtype=torch.float16,
)
# Or merge for faster inference
model = model.merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained("Miki-T/JARVIS-Mistral-Phase1a")
Generate Text
# Prompt: "Macedonia is a country known for"
prompt = "Македонија е земја позната по"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
# Move input_ids and attention_mask to the model's device
inputs = inputs.to(model.device)
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
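For readers unfamiliar with the `temperature` and `top_p` settings used above, the sketch below shows the general idea on a toy logit vector: scale the logits by the temperature, keep the smallest set of most-likely tokens whose probability mass reaches `top_p`, and sample from that set. This is a simplified illustration, not the Transformers implementation; `sample_top_p` is a hypothetical helper.

```python
import math
import random


def sample_top_p(logits, temperature=0.7, top_p=0.9, rng=random):
    """Return (sampled index, indices kept by the top-p filter)."""
    # Temperature-scaled softmax (subtract the max for numerical stability)
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize over the kept tokens and draw one of them
    weights = [probs[i] / mass for i in kept]
    choice = rng.choices(kept, weights=weights, k=1)[0]
    return choice, kept


toy_logits = [2.0, 1.0, 0.0, -1.0]
choice, kept = sample_top_p(toy_logits, temperature=0.7, top_p=0.9,
                            rng=random.Random(0))
print("kept token indices:", kept)
print("sampled index:", choice)
```

A lower temperature sharpens the distribution, so fewer tokens survive the `top_p` cut; a higher temperature flattens it and lets more candidates through.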
Fine-tune Further (Phase 1b)
from peft import AutoPeftModelForCausalLM
# Load base model + existing adapter, keeping the adapter weights trainable
model = AutoPeftModelForCausalLM.from_pretrained(
    "Miki-T/JARVIS-Mistral-Phase1a",
    is_trainable=True,
)
# Use as starting point for Phase 1b training
# See: github.com/MikiTrajkovski/JARVIS/blob/main/tools/training_pipeline/train_phase1b.py
Citation
If you use this model, please cite:
BibTeX:
@misc{trajkovski2024jarvis,
author = {Trajkovski, Miki},
title = {JARVIS: Macedonian Language Foundation (Phase 1a)},
year = {2024},
publisher = {Hugging Face Hub},
howpublished = {\url{https://huggingface.co/Miki-T/JARVIS-Mistral-Phase1a}},
}
APA:
Trajkovski, M. (2024). JARVIS: Macedonian language foundation (Phase 1a) [Model]. Hugging Face Hub. https://huggingface.co/Miki-T/JARVIS-Mistral-Phase1a
Acknowledgments
- Base model: Mistral AI (Mistral 7B v0.1)
- Fine-tuning: Hugging Face TRL + PEFT libraries
- Data: LVSTCK Macedonian corpus
- Inspiration: Tony Stark's JARVIS from Marvel
License
This model is provided under the MIT License, same as the JARVIS project.
Model Card Contact
Author: Miki Trajkovski
GitHub: https://github.com/MikiTrajkovski/JARVIS
HuggingFace: https://huggingface.co/Miki-T
Framework Versions
- PEFT: 0.10.0
- Transformers: 4.40.0
- PyTorch: 2.7.0+cu128
- CUDA: 12.x