AstroSage-Llama-3.1-8B-GGUF
https://arxiv.org/abs/2411.09012
AstroSage-Llama-3.1-8B-GGUF provides GGUF conversions of AstroSage-Llama-3.1-8B (including an 8-bit quantized variant) for efficient, accessible deployment with llama.cpp and compatible tools, while retaining the model's specialized capabilities in astronomy, astrophysics, and cosmology.
Model Details
- Base Architecture: Meta-Llama-3.1-8B
- Base Model: AstroSage-Llama-3.1-8B
- Parameters: 8 billion
- Quantization: GGUF format in two precision options (BF16 and Q8_0)
- Training Focus: Astronomy, Astrophysics, Cosmology, and Astronomical Instrumentation
- License: Llama 3.1 Community License
- Development Process:
  - Based on the fully trained AstroSage-Llama-3.1-8B model
  - Converted to GGUF format in two versions (BF16 and Q8_0)
  - Optimized for efficient inference
Using the Model
Python Implementation
from llama_cpp import Llama
from huggingface_hub import hf_hub_download
import os
import sys
import contextlib

# Temporarily silence Python-level stderr output (e.g. download progress bars)
@contextlib.contextmanager
def suppress_stderr():
    stderr = sys.stderr
    with open(os.devnull, 'w') as devnull:
        sys.stderr = devnull
        try:
            yield
        finally:
            sys.stderr = stderr

# Change the filename to AstroSage-8B-BF16.gguf to use the BF16 version instead
def download_model(repo_id="AstroMLab/AstroSage-8B-GGUF", filename="AstroSage-8B-Q8_0.gguf"):
    try:
        os.makedirs("models", exist_ok=True)
        local_path = os.path.join("models", filename)
        # Download only if the file is not already in ./models
        if not os.path.exists(local_path):
            print(f"Downloading {filename}...")
            with suppress_stderr():
                local_path = hf_hub_download(
                    repo_id=repo_id,
                    filename=filename,
                    local_dir="models",
                )
            print("Download complete!")
        return local_path
    except Exception as e:
        print(f"Error downloading model: {e}")
        raise

def initialize_llm():
    model_path = download_model()
    with suppress_stderr():
        return Llama(
            model_path=model_path,
            n_ctx=2048,
            n_threads=4,
            verbose=False,  # also silence llama.cpp's native log output
        )

def get_response(llm, prompt, max_tokens=128):
    response = llm(
        prompt,
        max_tokens=max_tokens,
        temperature=0.7,
        top_p=0.9,
        top_k=40,
        repeat_penalty=1.1,
        stop=["User:", "\n\n"],  # end at a new "User:" turn or the first blank line
    )
    return response['choices'][0]['text']

def main():
    llm = initialize_llm()

    # Example question about galaxy formation
    first_question = "How does a galaxy form?"
    print("\nQuestion:", first_question)
    print("\nAI:", get_response(llm, first_question).strip(), "\n")

    print("\nYou can now ask more questions! Type 'quit' or 'exit' to end the conversation.\n")

    while True:
        try:
            user_input = input("You: ")
            if user_input.strip().lower() in ['quit', 'exit']:
                print("\nGoodbye!")
                break
            print("\nAI:", get_response(llm, user_input).strip(), "\n")
        except (KeyboardInterrupt, EOFError):  # Ctrl+C or end of input
            print("\nGoodbye!")
            break
        except Exception as e:
            print(f"Error: {e}")

if __name__ == "__main__":
    main()
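The stop=["User:", "\n\n"] argument in get_response ends generation at the first "User:" or the first blank line, so a multi-paragraph answer is cut after its first paragraph. To see that effect on a finished string, here is a small pure-Python sketch. truncate_at_stop is a hypothetical helper (not part of llama-cpp-python) that only mimics the truncation after the fact, and the answer text is made up for illustration.

```python
def truncate_at_stop(text: str, stops: list[str]) -> str:
    """Return `text` cut at the earliest occurrence of any stop string.

    The stop string itself is not included; empty stop strings are ignored.
    """
    cut = len(text)
    for s in stops:
        if not s:
            continue
        idx = text.find(s)
        if idx != -1 and idx < cut:
            cut = idx
    return text[:cut]


if __name__ == "__main__":
    answer = ("Gas cools and collapses inside dark matter halos.\n\n"
              "Mergers and accretion then grow the galaxy.")
    # With "\n\n" as a stop, only the first paragraph survives
    print(repr(truncate_at_stop(answer, ["User:", "\n\n"])))
    # With only "User:" as a stop, both paragraphs are kept
    print(repr(truncate_at_stop(answer, ["User:"])))
```

If you drop "\n\n" from the stop list in get_response to allow longer answers, consider raising max_tokens as well, so answers are not cut off by the token limit instead.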
Installation Requirements
pip install llama-cpp-python huggingface_hub
On a Mac with Apple Silicon, install llama-cpp-python with Metal GPU support instead:
CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DGGML_METAL=on" pip install llama-cpp-python
Key Parameters
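- n_ctx: Context window size in tokens, shared by the prompt and the response (set to 2048 in the example)
- n_threads: Number of CPU threads to use (adjust based on your hardware)
- temperature: Controls randomness; lower values give more deterministic output
- top_p: Nucleus sampling threshold; sampling is restricted to the smallest set of tokens whose cumulative probability reaches top_p
- top_k: Restricts sampling to the k most likely tokens
- repeat_penalty: Penalizes recently generated tokens to discourage repetition
- max_tokens: Maximum length of the response in tokens (128 by default in get_response; increase for longer answers)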
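To make the sampling parameters concrete, here is a minimal pure-Python sketch of how repeat_penalty, top_k, top_p and temperature shape a single next-token choice from raw scores (logits). sample_next_token and softmax are hypothetical helpers for illustration only. The step order (penalty, top-k, top-p, then temperature) is our approximation of llama.cpp's default sampler chain for these four parameters; the real sampler is implemented in C/C++ and handles more options and edge cases.

```python
import math
import random


def softmax(xs, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=0.7, top_k=40, top_p=0.9,
                      repeat_penalty=1.1, recent_tokens=(), rng=random):
    """Choose a token id from raw logits (illustrative, not llama.cpp's code)."""
    logits = list(logits)

    # repeat_penalty: push recently generated tokens toward lower scores
    for t in set(recent_tokens):
        logits[t] = logits[t] / repeat_penalty if logits[t] > 0 else logits[t] * repeat_penalty

    # top_k: keep only the k highest-scoring tokens (k <= 0 keeps all)
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    candidates = order[:top_k] if top_k > 0 else order

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cum = [], 0.0
    for i, p in zip(candidates, softmax([logits[i] for i in candidates])):
        kept.append(i)
        cum += p
        if cum >= top_p:
            break

    # temperature: rescale the survivors; <= 0 means greedy decoding
    if temperature <= 0:
        return kept[0]
    probs = softmax([logits[i] for i in kept], temperature)
    return rng.choices(kept, weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a 4-token vocabulary
    print(sample_next_token(logits, rng=random.Random(0)))
    print(sample_next_token(logits, top_k=1))  # always the top-scoring token, 0
```

Lowering temperature sharpens the final distribution toward the highest-scoring survivor, while top_k and top_p decide which tokens are eligible at all.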
Example Usage
When run, the script above will automatically:
- Download the quantized model from Hugging Face
- Initialize it with recommended parameters
- Start with an example question about galaxy formation
- Allow for interactive conversation
- Support easy exit with 'quit' or 'exit' commands
For different use cases, you can:
- Use the BF16 version for maximum accuracy
- Adjust context window size for longer conversations
- Modify temperature for more/less deterministic responses
- Change max_tokens for longer/shorter responses
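The example script sends each question on its own, without earlier turns. If you extend it to keep a conversation history, the prompt plus max_tokens must fit inside n_ctx. A minimal sketch of one common approach, dropping the oldest turns first, is below. build_prompt and trim_history are hypothetical helpers, the plain "User:/AI:" transcript format is an assumption chosen to match the "User:" stop sequence, and count_tokens is any callable returning a token count (with llama-cpp-python, something like lambda s: len(llm.tokenize(s.encode("utf-8"))) could serve).

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (user question, model answer)


def build_prompt(history: List[Turn], question: str) -> str:
    """Format earlier turns plus the new question as a plain "User:/AI:" transcript."""
    parts = [f"User: {q}\nAI: {a}" for q, a in history]
    parts.append(f"User: {question}\nAI:")
    return "\n".join(parts)


def trim_history(history: List[Turn], question: str, n_ctx: int, max_tokens: int,
                 count_tokens: Callable[[str], int]) -> List[Turn]:
    """Drop the oldest turns until prompt tokens + max_tokens fit within n_ctx.

    Returns an empty list if even the bare question does not fit; the caller
    should then shorten the question or lower max_tokens.
    """
    trimmed = list(history)  # copy so the caller's history is not modified
    while trimmed and count_tokens(build_prompt(trimmed, question)) + max_tokens > n_ctx:
        trimmed.pop(0)  # discard the oldest exchange first
    return trimmed


if __name__ == "__main__":
    def count_words(s: str) -> int:
        return len(s.split())  # crude stand-in for a real tokenizer

    history = [("What is a quasar?", "An active galactic nucleus."),
               ("How far away are they?", "Often billions of light-years.")]
    kept = trim_history(history, "How does a galaxy form?", n_ctx=30, max_tokens=10,
                        count_tokens=count_words)
    print(len(kept), "turn(s) kept")
```

Raising n_ctx lets more history survive at the cost of more memory; dropping whole turns (rather than cutting mid-turn) keeps each remaining exchange intact.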
Model Improvements and Performance
The GGUF versions, and the Q8_0 quantization in particular, offer several advantages:
- Reduced memory requirements
- CPU inference capability
- Faster inference speed
- Broader hardware compatibility
Note: Formal benchmarking of the quantized model is pending. Performance metrics will be updated once comprehensive testing is completed.
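A quick way to see where the memory savings come from is to multiply the parameter count by the bits stored per weight. The sketch below is a rough estimate under stated assumptions: about 8 billion parameters (as listed under Model Details), every weight stored at the same precision, and 8.5 bits per weight for Q8_0 (llama.cpp's Q8_0 stores blocks of 32 int8 values plus one 16-bit scale, i.e. 34 bytes per 32 weights). weight_memory_gb is a hypothetical helper, and the estimate ignores the KV cache and runtime buffers, which grow with n_ctx.

```python
# Bits stored per weight for each GGUF variant (assumption: uniform precision)
BITS_PER_WEIGHT = {
    "BF16": 16.0,         # 2 bytes per weight
    "Q8_0": 34 * 8 / 32,  # 32 int8 weights + one 16-bit scale per block = 8.5 bits
}


def weight_memory_gb(n_params: float, fmt: str) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


if __name__ == "__main__":
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt}: ~{weight_memory_gb(8e9, fmt):.1f} GB")
```

This gives roughly 16 GB for BF16 and 8.5 GB for Q8_0. The actual .gguf files will differ somewhat, since Llama 3.1 8B has slightly more than 8 billion parameters and small tensors such as normalization weights are typically kept at higher precision.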
Quantization Details
- Format: GGUF
- Available Versions:
  - AstroSage-8B-BF16.gguf: bfloat16, the model's original precision
  - AstroSage-8B-Q8_0.gguf: 8-bit quantization; smaller file with negligible loss in perplexity
- Compatibility: Works with llama.cpp and derived projects
- Trade-offs:
  - BF16:
    - Best quality, closest to the original model's behavior
    - Larger file size and memory requirements
    - Recommended for accuracy-critical applications
  - Q8_0:
    - Reduced memory footprint
    - Good balance of performance and size
    - Suitable for most general applications
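The per-block scheme behind Q8_0's small accuracy loss can be sketched in a few lines. This is a simplified pure-Python illustration of llama.cpp's Q8_0 layout as we understand it (blocks of 32 weights, one 16-bit scale d = max|x| / 127 per block, each weight rounded to an 8-bit integer code). The helper names are hypothetical and the output is not guaranteed to be bit-exact with llama.cpp's C implementation.

```python
import math
import random
import struct

BLOCK_SIZE = 32  # Q8_0 quantizes weights in blocks of 32


def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_q8_0(block):
    """Quantize one block of 32 floats to (16-bit scale, list of int8 codes)."""
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(block)}")
    amax = max(abs(v) for v in block)
    d = amax / 127.0              # the largest |value| maps to +/-127
    inv = 1.0 / d if d else 0.0   # an all-zero block stays all zero
    # round half away from zero, as C's roundf does
    qs = [int(math.copysign(math.floor(abs(v) * inv + 0.5), v)) for v in block]
    return to_fp16(d), qs


def dequantize_q8_0(d, qs):
    """Reconstruct approximate weights: value ~= scale * code."""
    return [d * q for q in qs]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 0.02) for _ in range(BLOCK_SIZE)]
    d, qs = quantize_q8_0(weights)
    restored = dequantize_q8_0(d, qs)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    # each weight is off by at most about half a quantization step (d / 2)
    print(f"scale={d:.3e}  max abs error={max_err:.3e}  (d/2={d / 2:.3e})")
```

Because each block gets its own scale, the rounding error is bounded by roughly half of that block's step size, which stays small relative to the weights themselves; this is consistent with the negligible perplexity loss noted above.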
Intended Use
- Curiosity-driven question answering
- Brainstorming new ideas
- Astronomical research assistance
- Educational support in astronomy
- Literature review and summarization
- Scientific explanation of concepts
- Low-resource deployment scenarios
- Edge-device deployment
- CPU-only environments
- Applications requiring a reduced memory footprint
Limitations
- All limitations of the original model apply
- Additional considerations:
  - Potential reduction in prediction accuracy due to quantization
  - May show increased variance in numeric calculations
  - Reduced precision in edge cases
  - Performance may vary based on hardware configuration
Technical Specifications
- Architecture: Meta-Llama 3.1
- Deployment: CPU-friendly, reduced memory footprint
- Format: GGUF (compatible with llama.cpp)
Ethical Considerations
While this model is designed for scientific use:
- Should not be used as the sole source for critical research decisions
- Output should be verified against primary sources
- May reflect biases present in astronomical literature
Citation and Contact
- Corresponding author: Tijmen de Haan (tijmen dot dehaan at gmail dot com)
- AstroMLab: astromachinelearninglab at gmail dot com
- Please cite the AstroMLab 3 paper when referencing this model:
@preprint{dehaan2024astromlab3,
title={AstroMLab 3: Achieving GPT-4o Level Performance in Astronomy with a Specialized 8B-Parameter Large Language Model},
author={Tijmen de Haan and Yuan-Sen Ting and Tirthankar Ghosal and Tuan Dung Nguyen and Alberto Accomazzi and Azton Wells and Nesar Ramachandra and Rui Pan and Zechang Sun},
year={2024},
eprint={2411.09012},
archivePrefix={arXiv},
primaryClass={astro-ph.IM},
url={https://arxiv.org/abs/2411.09012},
}
Additional note: When citing this quantized version, please reference both the original AstroMLab 3 paper above and specify the use of the GGUF quantized variant.