IoGPT Collection
A family of large language models developed by GALAXY MIND AI
IoGPT A1 is a specialized language model developed by GALAXY MIND AI LABS. It is a fine-tuned version of magistral-small-2509, optimized using Unsloth for efficient training and inference.
This model was trained with supervised fine-tuning (SFT) using the TRL library, and is designed to handle complex instruction-following tasks efficiently.
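The card does not include the training script itself, but a minimal Unsloth + TRL SFT setup typically looks like the sketch below. The dataset, LoRA settings, and hyperparameters are illustrative assumptions, not the actual IoGPT A1 recipe.

# Illustrative sketch only: dataset, LoRA settings, and hyperparameters
# below are assumptions, not the actual IoGPT A1 training recipe.
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "mistralai/Magistral-Small-2509",  # base model named above
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical instruction dataset with a single "text" column.
dataset = load_dataset("json", data_files = "train.jsonl", split = "train")

trainer = SFTTrainer(
    model = model,
    processing_class = tokenizer,  # "tokenizer=" on older TRL versions
    train_dataset = dataset,
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 60,
        learning_rate = 2e-4,
        output_dir = "outputs",
    ),
)
trainer.train()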
Option 1: Using Transformers
You can use this model directly with the Hugging Face transformers library.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
from accelerate import Accelerator

torch_device = Accelerator().device

model_checkpoint = "galaxyMindAiLabs/IoGPT-A1"
processor = AutoProcessor.from_pretrained(model_checkpoint)
model = AutoModelForImageTextToText.from_pretrained(
    model_checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

user_prompt = "Why is the sky blue?"
messages = [
    {"role": "user", "content": user_prompt},
]

# Build the prompt with the model's chat template, then tokenize it.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(torch_device)

# We recommend keeping do_sample=True; sampling generally yields more natural responses.
generate_ids = model.generate(**inputs, max_new_tokens=5000, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
decoded_output = processor.batch_decode(
    generate_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(decoded_output)
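For interactive use you may want streamed output. TextStreamer is a standard transformers utility and works with the processor and model loaded above; this is a convenience sketch, not part of the original card.

from transformers import TextStreamer

# Print tokens to stdout as they are generated; skip echoing the prompt.
streamer = TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=512, do_sample=True, streamer=streamer)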
Option 2: Using Unsloth (Faster Inference)
Since this model was trained with Unsloth, loading it through their library provides up to 2x faster inference.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "galaxyMindAiLabs/IoGPT-A1",
    max_seq_length = 2048,
    dtype = None,          # auto-detect: bfloat16 on supported GPUs, else float16
    load_in_4bit = True,   # 4-bit quantization to reduce memory use
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

inputs = tokenizer(
    [
        "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
    ],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
print(tokenizer.batch_decode(outputs, skip_special_tokens = True)[0])
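The snippet above passes a raw string; for an instruction-tuned checkpoint like this one, results are usually better when the prompt goes through the chat template. A minimal sketch, assuming the tokenizer ships a chat template (expected, given the SFT setup):

messages = [{"role": "user", "content": "Why is the sky blue?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt = True, return_tensors = "pt"
).to("cuda")
outputs = model.generate(input_ids = input_ids, max_new_tokens = 128, use_cache = True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:], skip_special_tokens = True)[0])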
Our recommendations for inference:
- Keep do_sample=True, as in the examples above.
- Keep use_cache=True so the KV cache is reused across decoding steps.
- On GPU, load the model in bfloat16, or use load_in_4bit=True with Unsloth to reduce memory.
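As a concrete starting point, the call below applies these settings; the temperature and top_p values are illustrative assumptions, not tuned recommendations from the card.

outputs = model.generate(
    **inputs,
    max_new_tokens = 512,
    do_sample = True,    # sampling, as in the examples above
    temperature = 0.7,   # assumed starting value; tune for your task
    top_p = 0.95,        # assumed starting value
    use_cache = True,    # reuse the KV cache across decoding steps
)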
Citations

@misc{mistralai2024,
  title={Mistral AI: Large Language Models for Everyone},
  author={Mistral AI Team},
  year={2024},
  howpublished={\url{https://mistral.ai}}
}

@misc{unsloth2024,
  title={Unsloth: Efficient Fine-Tuning of LLMs},
  author={Daniel Han and Michael Han},
  year={2024},
  publisher={GitHub},
  howpublished={\url{https://github.com/unslothai/unsloth}}
}

@misc{peft2023,
  title={PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models},
  author={Sourab Mangrulkar and others},
  year={2023},
  publisher={GitHub},
  howpublished={\url{https://github.com/huggingface/peft}}
}

@misc{vonwerra2022trl,
  title={{TRL: Transformer Reinforcement Learning}},
  author={Leandro von Werra and others},
  year={2020},
  publisher={GitHub},
  howpublished={\url{https://github.com/huggingface/trl}}
}