IoGPT Collection
A family of large language models developed by GALAXY MIND AI
IoGPT A1 is a specialized language model developed by GALAXY MIND AI LABS. It is a fine-tuned version of magistral-small-2509, optimized using Unsloth for efficient training and inference.
This model was trained with supervised fine-tuning (SFT) using the TRL library, and is designed to handle complex instruction-following tasks efficiently.
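The card does not include the training script itself, but a minimal Unsloth + TRL SFT setup typically looks like the sketch below. The dataset, LoRA settings, and hyperparameters are illustrative assumptions, not the actual IoGPT A1 recipe.

# Illustrative sketch only: dataset, LoRA settings, and hyperparameters
# below are assumptions, not the actual IoGPT A1 training recipe.
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "mistralai/Magistral-Small-2509",  # base model named above
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical instruction dataset with a single "text" column.
dataset = load_dataset("json", data_files = "train.jsonl", split = "train")

trainer = SFTTrainer(
    model = model,
    processing_class = tokenizer,  # "tokenizer=" on older TRL versions
    train_dataset = dataset,
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 60,
        learning_rate = 2e-4,
        output_dir = "outputs",
    ),
)
trainer.train()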
Option 1: Using Transformers
You can use this model directly with the Hugging Face transformers library.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
from accelerate import Accelerator

torch_device = Accelerator().device

model_checkpoint = "galaxyMindAiLabs/IoGPT-A1"
processor = AutoProcessor.from_pretrained(model_checkpoint)
model = AutoModelForImageTextToText.from_pretrained(
    model_checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

user_prompt = "Why is the sky blue?"
messages = [
    {"role": "user", "content": user_prompt},
]

# Build the prompt with the model's chat template, then tokenize it.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(torch_device)

# We recommend keeping do_sample=True; sampling generally yields more natural responses.
generate_ids = model.generate(**inputs, max_new_tokens=5000, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
decoded_output = processor.batch_decode(
    generate_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(decoded_output)
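For interactive use you may want streamed output. TextStreamer is a standard transformers utility and works with the processor and model loaded above; this is a convenience sketch, not part of the original card.

from transformers import TextStreamer

# Print tokens to stdout as they are generated; skip echoing the prompt.
streamer = TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=512, do_sample=True, streamer=streamer)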
Option 2: Using Unsloth (Faster Inference)
Since this model was trained with Unsloth, loading it through their library provides up to 2x faster inference.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "galaxyMindAiLabs/IoGPT-A1",
    max_seq_length = 2048,
    dtype = None,          # auto-detect: bfloat16 on supported GPUs, else float16
    load_in_4bit = True,   # 4-bit quantization to reduce memory use
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

inputs = tokenizer(
    [
        "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
    ],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
print(tokenizer.batch_decode(outputs, skip_special_tokens = True)[0])
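The snippet above passes a raw string; for an instruction-tuned checkpoint like this one, results are usually better when the prompt goes through the chat template. A minimal sketch, assuming the tokenizer ships a chat template (expected, given the SFT setup):

messages = [{"role": "user", "content": "Why is the sky blue?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt = True, return_tensors = "pt"
).to("cuda")
outputs = model.generate(input_ids = input_ids, max_new_tokens = 128, use_cache = True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:], skip_special_tokens = True)[0])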
Our recommendations for inference:
- Keep do_sample=True, as in the examples above.
- Keep use_cache=True so the KV cache is reused across decoding steps.
- On GPU, load the model in bfloat16, or use load_in_4bit=True with Unsloth to reduce memory.
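As a concrete starting point, the call below applies these settings; the temperature and top_p values are illustrative assumptions, not tuned recommendations from the card.

outputs = model.generate(
    **inputs,
    max_new_tokens = 512,
    do_sample = True,    # sampling, as in the examples above
    temperature = 0.7,   # assumed starting value; tune for your task
    top_p = 0.95,        # assumed starting value
    use_cache = True,    # reuse the KV cache across decoding steps
)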
Citations

@misc{mistralai2024,
  title={Mistral AI: Large Language Models for Everyone},
  author={Mistral AI Team},
  year={2024},
  howpublished={\url{https://mistral.ai}}
}

@misc{unsloth2024,
  title={Unsloth: Efficient Fine-Tuning of LLMs},
  author={Daniel Han and Michael Han},
  year={2024},
  publisher={GitHub},
  howpublished={\url{https://github.com/unslothai/unsloth}}
}

@misc{peft2023,
  title={PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models},
  author={Sourab Mangrulkar and others},
  year={2023},
  publisher={GitHub},
  howpublished={\url{https://github.com/huggingface/peft}}
}

@misc{vonwerra2022trl,
  title={{TRL: Transformer Reinforcement Learning}},
  author={Leandro von Werra and others},
  year={2020},
  publisher={GitHub},
  howpublished={\url{https://github.com/huggingface/trl}}
}