JARVIS-Mistral-Phase1a: Macedonian Language Foundation

Model ID: Miki-T/JARVIS-Mistral-Phase1a

A QLoRA fine-tuned Mistral 7B model trained on 500k rows of Macedonian web text to build language fluency as the foundation for JARVIS — a locally-hosted AI assistant inspired by Iron Man's JARVIS.

Model Details

Model Description

Developed by: Miki Trajkovski
Model type: Causal Language Model (fine-tuned via QLoRA)
Base model: mistralai/Mistral-7B-v0.1
Language(s): Macedonian (mk), with English support
License: MIT
Finetuned from model: Mistral 7B v0.1
Adapter type: LoRA (Low-Rank Adaptation)

Model Architecture

Base: Mistral 7B (7 billion parameters)
Fine-tuning method: QLoRA (4-bit quantization + LoRA adapters)
LoRA rank: 16
LoRA alpha: 32
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Max sequence length: 1024 tokens
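
The number of trainable parameters implied by this configuration can be estimated by hand. The sketch below counts LoRA parameters for rank 16 on the seven target modules. It assumes the publicly documented Mistral 7B v0.1 dimensions (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128), which this card does not state, and the helper name `lora_param_count` is hypothetical.

```python
# Rough LoRA parameter count for the configuration listed above.
# Layer shapes are ASSUMED from the public Mistral 7B v0.1 config,
# not taken from this model card.
HIDDEN = 4096
MLP = 14336
KV_DIM = 8 * 128  # grouped-query attention: 8 KV heads x head_dim 128
NUM_LAYERS = 32
RANK = 16
ALPHA = 32

# (in_features, out_features) of each adapted linear layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_param_count(rank: int = RANK) -> int:
    """Each adapted matrix adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (i + o) for i, o in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


params = lora_param_count()
print(f"trainable LoRA parameters: {params:,}")
print(f"size at 4 bytes/param: {params * 4 / 1e6:.0f} MB")
print(f"LoRA scaling (alpha / rank): {ALPHA / RANK}")
```

Under these assumptions the adapter has about 42M trainable parameters, or roughly 168 MB if stored in float32, which is in the same range as the ~200 MB adapter size reported under Speeds, Sizes, Times.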

Model Sources

Repository: https://github.com/MikiTrajkovski/JARVIS (Will be available when project is complete)
HuggingFace Model Card: https://huggingface.co/Miki-T/JARVIS-Mistral-Phase1a
Training code: tools/training_pipeline/train_phase1a.py

Uses

Direct Use

This model is designed for:

Macedonian text generation — generates fluent Macedonian sentences
Language understanding — comprehends Macedonian grammar and semantics
Foundation for downstream tasks — serves as Phase 1a of the JARVIS training pipeline

Example usage:

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model and attach the LoRA adapter
model = AutoPeftModelForCausalLM.from_pretrained(
    "Miki-T/JARVIS-Mistral-Phase1a",
    device_map="auto",
    torch_dtype="auto",
)

# Merge the adapter weights into the base model for inference
model = model.merge_and_unload()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Miki-T/JARVIS-Mistral-Phase1a")

# Generate a continuation (prompt: "Macedonia is a country known for")
prompt = "Македонија е земја позната по"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Downstream Use (Phase 1b, 1c)

This model is Phase 1a of a multi-phase training pipeline:

Phase 1a (current): Macedonian language foundation
Phase 1b (next): Instruction following
Phase 1c (planned): Reasoning and problem-solving
Phase 2 (planned): Macedonian law domain expertise (RAG)

Each phase builds on the previous one. Do NOT train Phase 1b on a fresh base model.

Out-of-Scope Use

Not for production: This is a research/learning model
Not instruction-tuned: Phase 1a only teaches language fluency, not instruction following
Not domain-specific: Use Phase 2 for legal/specialized Macedonian tasks
Not multilingual: Optimized for Macedonian; English support varies

Limitations and Bias

Known Limitations

Phase 1a only teaches language fluency — the model does NOT understand instructions yet
- Input: "Дај ми преводот" (Give me the translation)
- Output: Likely continues generating Macedonian text instead of translating
- This is fixed in Phase 1b
Training data bias — trained on Macedonian web text (Wikipedia, news, etc.)
- May reflect biases present in those sources
- Limited exposure to specialized domains (legal, medical, technical)
Context window: 1024 tokens max — cannot process very long Macedonian texts
No fine-grained reasoning: Phase 1c adds reasoning capability; Phase 1a lacks it

Recommendations

Use this model only as a foundation for downstream phases
For production Macedonian tasks, wait for Phase 1b (instruction following) and Phase 1c (reasoning)
Fine-tune on domain-specific data if targeting legal, medical, or technical Macedonian
Always validate outputs for accuracy and bias

Training Details

Training Data

Dataset	Rows	Source	Purpose
`LVSTCK/macedonian-corpus-cleaned-dedup`	500,000	HuggingFace	Macedonian language foundation

Data format: Plain text (one document per line in JSONL)
Quality: Cleaned and deduplicated upstream (the `cleaned-dedup` variant of the corpus)
Language: 100% Macedonian (Cyrillic script)
Size: ~500k rows, ~2.5GB uncompressed

Training Procedure

Preprocessing

Tokenized with Mistral tokenizer
Max sequence length: 1024 tokens
Packing enabled (multiple short texts combined into context window)
No removal of special tokens or data cleaning beyond source dataset
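
Packing can be illustrated with a small, library-free sketch. The training script is not shown in this card, so the function below (`pack_sequences`, a hypothetical name) only mirrors the common concatenate-and-chunk approach: tokenized documents are joined with an end-of-sequence token and cut into fixed-length blocks. Whether the actual pipeline drops the trailing partial block is an assumption.

```python
from typing import Iterable, List


def pack_sequences(
    docs: Iterable[List[int]], eos_id: int, max_len: int = 1024
) -> List[List[int]]:
    """Concatenate tokenized documents (EOS-separated) and cut fixed-size blocks."""
    buffer: List[int] = []
    blocks: List[List[int]] = []
    for ids in docs:
        buffer.extend(ids)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= max_len:
            blocks.append(buffer[:max_len])
            buffer = buffer[max_len:]
    # Assumption: the trailing partial block is dropped in this sketch.
    return blocks


# Toy example with a block size of 4 and EOS id 0
docs = [[5, 6], [7, 8, 9], [10]]
packed = pack_sequences(docs, eos_id=0, max_len=4)
print(packed)
```

With the toy input, three short documents become two full blocks of four tokens each, so no capacity in the context window is spent on padding.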

Hyperparameters

Parameter	Value	Reasoning
Learning rate	2e-4	Standard QLoRA starting point
Warmup ratio	5%	Prevent large initial updates
Learning rate scheduler	Cosine decay	Smooth decay to ~0 by end
Batch size	2	Fits in 12GB VRAM with QLoRA
Gradient accumulation	8	Effective batch = 16
Epochs	1	Single pass through data (avoid overfitting)
Optimizer	AdamW 8-bit	Memory efficient
Gradient checkpointing	Enabled	Save VRAM at cost of speed
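
The learning-rate settings above can be sketched as a function of the step number. This is a minimal illustration, assuming linear warmup followed by half-cosine decay (the usual shape of a "cosine" schedule with warmup) and assuming the 9,502 total steps reported in this card are optimizer steps. The function name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 2e-4
TOTAL_STEPS = 9502                      # total steps reported in this card
WARMUP_STEPS = int(0.05 * TOTAL_STEPS)  # 5% warmup ratio


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Effective batch size = per-device batch x gradient accumulation steps
effective_batch = 2 * 8

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>5}: lr = {lr_at(step):.2e}")
print("effective batch:", effective_batch)
```

The rate rises from zero to 2e-4 over the warmup steps and then falls smoothly back to zero by the last step.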

Training Regime

Hardware: NVIDIA RTX 5070 (12GB VRAM)
Framework: PyTorch 2.7.0+cu128 + Hugging Face Transformers
Fine-tuning framework: TRL SFTTrainer + PEFT LoRA
Precision: 4-bit quantization (NF4) + bfloat16 math
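
A QLoRA setup matching the values in this card might be configured as below. This is a sketch, not the author's training script: it uses `BitsAndBytesConfig` from Transformers and `LoraConfig` from PEFT, sets only the values stated in the card, and leaves anything the card does not state (dropout, double quantization, bias handling) at library defaults.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization of the frozen base weights, bfloat16 for compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on all attention and MLP projections (rank 16, alpha 32)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```

The quantization config would be passed as `quantization_config` when loading the base model, and the LoRA config would be handed to the trainer or to `get_peft_model`.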

Speeds, Sizes, Times

Metric	Value
Training duration	5 days, 23 hours, 29 minutes
Total steps	9,502
Throughput	~12-15 tokens/second
Adapter size	~200 MB
Total VRAM used	~8.5 GB / 12 GB
Total tokens processed	7.6M tokens

Note: Gradient checkpointing reduced throughput. Phase 1b plans to disable it, and the author expects a speedup of up to 10x.
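
The figures in this section can be cross-checked with simple arithmetic. The sketch below uses only numbers from the tables above. The step-based estimate assumes that 9,502 counts optimizer steps and that every packed sequence is completely full, so it is an upper bound rather than a measurement.

```python
# Cross-check of the reported training figures (inputs come from the tables above).
duration_s = ((5 * 24 + 23) * 60 + 29) * 60  # 5 days, 23 hours, 29 minutes
reported_tokens = 7.6e6
total_steps = 9502
effective_batch = 2 * 8   # batch size x gradient accumulation
max_seq_len = 1024

# Throughput implied by the reported token total and wall-clock time
implied_throughput = reported_tokens / duration_s

# Upper bound if every step processed 16 full 1024-token sequences (assumption)
max_tokens_from_steps = total_steps * effective_batch * max_seq_len

print(f"duration: {duration_s:,} s")
print(f"implied throughput: {implied_throughput:.1f} tokens/s")
print(f"step-based upper bound: {max_tokens_from_steps:,} tokens")
```

The reported token total matches throughput multiplied by duration (about 14.7 tokens/second). The step-based bound is far larger, at roughly 156M tokens, so the two ways of counting do not agree, and the throughput and token totals are best read as reported values rather than derived ones.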

Evaluation

Testing Data

Evaluated on:

Manual test: 3 Macedonian prompts (verified fluent generation)
Benchmark: LVSTCK/macedonian-llm-eval (83 questions) was not run, because the dataset was unavailable (deprecated on Hugging Face)

Metrics

Metric	Value	Interpretation
Final loss	1.2543	Excellent convergence
Starting loss	2.0910	Model improved 40%
Final perplexity	3.51	Model is about as uncertain as picking from ~3.5 equally likely tokens
Best loss achieved	1.2460	Fully converged
Gradient norm (avg)	0.583	Stable training (healthy range: 0.1-2.0)
Gradient norm (max)	1.258	No exploding gradients
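
The perplexity and improvement figures follow directly from the loss values. A short check, assuming the loss is the mean token-level cross-entropy in nats (the PyTorch default):

```python
import math

final_loss = 1.2543
start_loss = 2.0910

# Perplexity is the exponential of the mean cross-entropy loss
perplexity = math.exp(final_loss)

# Relative reduction in loss from start to end of training
improvement = (start_loss - final_loss) / start_loss

print(f"perplexity: {perplexity:.2f}")
print(f"relative loss reduction: {improvement:.0%}")
```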

Sample Outputs

Test prompt: "Скопје е главен град на" (Skopje is the capital of)
Model output: "Република Македонија и има околу 600.000 жители." (the Republic of Macedonia and has about 600,000 inhabitants.)
Interpretation: ✅ Fluent, grammatically correct Macedonian that stays on topic

Model Card Details

Environmental Impact

Factor	Value
Hardware	NVIDIA RTX 5070 (12GB VRAM)
Training duration	5 days, 23 hours
Power consumption (estimated)	~150W continuous × 143.5 hours ≈ 21.5 kWh
Carbon emitted (estimated)	~10-15 kg CO2e (depends on grid carbon intensity)
Cloud provider	None (local desktop GPU)
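
The energy estimate is straightforward to reproduce. The sketch below recomputes it from the table's inputs and derives the grid carbon intensity that the 10-15 kg CO2e range implies. The 150 W draw is the card's own estimate, not a measurement, and the intensity values are back-calculated rather than taken from a grid data source.

```python
power_w = 150    # estimated average draw, from the table above
hours = 143.5    # 5 days, 23 hours, 29 minutes, rounded

energy_kwh = power_w * hours / 1000

# Grid intensity (kg CO2e per kWh) implied by the reported 10-15 kg range
implied_low = 10 / energy_kwh
implied_high = 15 / energy_kwh

print(f"energy: {energy_kwh:.1f} kWh")
print(f"implied grid intensity: {implied_low:.2f}-{implied_high:.2f} kg CO2e/kWh")
```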

Compute Infrastructure

CPU: AMD Ryzen 7 7800X3D (8-core)
GPU: NVIDIA RTX 5070 (12GB GDDR7 VRAM)
RAM: 32GB DDR5
Storage: NVMe SSD (assumed)
OS: Windows 11
CUDA: CUDA 12.x

Software

PyTorch: 2.7.0+cu128
Transformers: 4.40.0
PEFT: 0.10.0
TRL: 0.8.6
Accelerate: 0.29.0
Bitsandbytes: 0.43.0
CTranslate2: (for Whisper STT, not used in this model)

See full requirements.txt in the JARVIS repository.

How to Use

Load the Model

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch

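# Load the base model with the LoRA adapter attached (not merged)
model = AutoPeftModelForCausalLM.from_pretrained(
    "Miki-T/JARVIS-Mistral-Phase1a",
    device_map="auto",
    torch_dtype=torch.float16,
)

# Optional: merge the adapter into the base weights for faster inference.
# Skip this line to keep the adapter separate.
model = model.merge_and_unload()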

tokenizer = AutoTokenizer.from_pretrained("Miki-T/JARVIS-Mistral-Phase1a")

Generate Text

# Prompt: "Macedonia is a country known for"
prompt = "Македонија е земја позната по"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
inputs = inputs.to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,  # passes input_ids and attention_mask
        max_new_tokens=50,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)

Fine-tune Further (Phase 1b)

from peft import AutoPeftModelForCausalLM

# Load base model + existing adapter, keeping the adapter weights trainable
model = AutoPeftModelForCausalLM.from_pretrained(
    "Miki-T/JARVIS-Mistral-Phase1a",
    is_trainable=True,
)

# Use as starting point for Phase 1b training
# See: github.com/MikiTrajkovski/JARVIS/blob/main/tools/training_pipeline/train_phase1b.py

Citation

If you use this model, please cite:

BibTeX:

@misc{trajkovski2024jarvis,
  author = {Trajkovski, Miki},
  title = {JARVIS: Macedonian Language Foundation (Phase 1a)},
  year = {2024},
  publisher = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/Miki-T/JARVIS-Mistral-Phase1a}},
}

APA:

Trajkovski, M. (2024). JARVIS: Macedonian language foundation (Phase 1a) [Model]. Hugging Face Hub. https://huggingface.co/Miki-T/JARVIS-Mistral-Phase1a

Acknowledgments

Base model: Mistral AI (Mistral 7B v0.1)
Fine-tuning: Hugging Face TRL + PEFT libraries
Data: LVSTCK Macedonian corpus
Inspiration: Tony Stark's JARVIS from Marvel

License

This model is provided under the MIT License, same as the JARVIS project.

Model Card Contact

Author: Miki Trajkovski
GitHub: https://github.com/MikiTrajkovski/JARVIS
HuggingFace: https://huggingface.co/Miki-T

Framework Versions

PEFT: 0.10.0
Transformers: 4.40.0
PyTorch: 2.7.0+cu128
CUDA: 12.x
