Instructions for using Gemstone-Models/Gemstone-1024x28 with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use Gemstone-Models/Gemstone-1024x28 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Gemstone-Models/Gemstone-1024x28")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Gemstone-Models/Gemstone-1024x28")
model = AutoModelForCausalLM.from_pretrained("Gemstone-Models/Gemstone-1024x28")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Gemstone-Models/Gemstone-1024x28 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Gemstone-Models/Gemstone-1024x28"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Gemstone-Models/Gemstone-1024x28",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/Gemstone-Models/Gemstone-1024x28
```
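The curl call above can also be made from Python's standard library. A minimal sketch, assuming a server with the OpenAI-compatible completions API is already running locally on port 8000 (the `completion_request` helper name is ours, not part of any library):

```python
import json
import urllib.request

def completion_request(prompt, model="Gemstone-Models/Gemstone-1024x28",
                       max_tokens=512, temperature=0.5):
    # Build the JSON body expected by the /v1/completions endpoint.
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

body = json.dumps(completion_request("Once upon a time,")).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is up:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```

The same payload works against the SGLang server below; only the port changes.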
- SGLang
How to use Gemstone-Models/Gemstone-1024x28 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Gemstone-Models/Gemstone-1024x28" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Gemstone-Models/Gemstone-1024x28",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Gemstone-Models/Gemstone-1024x28" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Gemstone-Models/Gemstone-1024x28",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use Gemstone-Models/Gemstone-1024x28 with Docker Model Runner:
```shell
docker model run hf.co/Gemstone-Models/Gemstone-1024x28
```
Gemstone-1024x28
Gemstone-1024x28 is part of the Gemstone Suite of Models, a set of models trained with varying widths and depths.
Training
We train with litgpt and AxoNN on AMD MI250X GPUs on Frontier at Oak Ridge National Laboratory, using a global batch size of 2048.
Data
Training and validation data are taken from non-overlapping subsets of dolma; since this is raw pretraining data, the result is a base model, not an instruction-tuned model. The model is trained for 350 billion tokens, and we upload checkpoints every 2 billion tokens (477 steps).
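The checkpoint spacing implies a sequence length: 2 billion tokens spread over 477 steps at a global batch of 2048 sequences works out to roughly 2048 tokens per sequence. A quick sanity check (our arithmetic, not stated on the card):

```python
tokens_per_checkpoint = 2_000_000_000   # a checkpoint is uploaded every 2B tokens
steps_per_checkpoint = 477              # ...which corresponds to 477 optimizer steps
global_batch_size = 2048                # sequences per step

tokens_per_step = tokens_per_checkpoint / steps_per_checkpoint
tokens_per_sequence = tokens_per_step / global_batch_size
print(round(tokens_per_sequence))  # ~2047, i.e. a roughly 2048-token context
```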
Using Gemstone-1024x28
The Gemstones are based on the gemma-2b architecture and use modeling_gemma.py to run with the transformers library.
Licence
This model is released under the apache-2.0 licence.
Contact
Please feel free to contact us with any questions, or open a discussion thread.
Citation
```bibtex
@article{mcleish2024gemstones,
  title={Gemstones: A Model Suite for Multi-Faceted Scaling Laws},
  author={Sean McLeish and John Kirchenbauer and David Yu Miller and Siddharth Singh and Abhinav Bhatele and Micah Goldblum and Ashwinee Panda and Tom Goldstein},
  journal={arXiv preprint arXiv:2502.},
  year={2025}
}
```