SHINE: A Scalable In-Context Hypernetwork for Mapping Context to LoRA in a Single Pass
SHINE (Scalable Hyper In-context NEtwork) is a scalable hypernetwork that maps diverse, meaningful contexts into high-quality LoRA adapters for large language models (LLMs).
By reusing the frozen LLM's own parameters in an in-context hypernetwork design, SHINE transforms in-context knowledge into in-parameter knowledge in a single forward pass. This allows the model to handle complex question-answering tasks related to a specific context without needing to process that context again during inference.
- Paper: SHINE: A Scalable In-Context Hypernetwork for Mapping Context to LoRA in a Single Pass
- Repository: https://github.com/Yewei-Liu/SHINE
Introduction
SHINE overcomes key limitations of prior hypernetworks by achieving strong expressive power with a relatively small number of parameters. It updates LLM parameters without any fine-tuning, which substantially reduces time, computation, and memory costs compared with standard supervised fine-tuning (SFT) adaptation.
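To make the context-to-LoRA idea concrete, here is a toy sketch in plain Python. It is not SHINE's architecture: SHINE reuses the frozen LLM's own parameters as the hypernetwork, whereas this example assumes a hypothetical pair of linear maps (`H_A`, `H_B`) that turn a small context vector into LoRA factors `A` and `B` in one pass, which are then added to a frozen weight `W` as `W + B A`. All names and sizes are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix, used here as a stand-in for trained weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def reshape(flat, rows, cols):
    """Turn a flat list of rows * cols values into a rows x cols matrix."""
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]


# Toy sizes (assumptions): hidden size D, LoRA rank R, context-embedding size C.
D, R, C = 4, 2, 3
rng = random.Random(0)

W = random_matrix(D, D, rng)        # frozen base weight, never updated
H_A = random_matrix(R * D, C, rng)  # hypothetical hypernetwork head for A
H_B = random_matrix(D * R, C, rng)  # hypothetical hypernetwork head for B


def context_to_lora(context):
    """Map a context vector to LoRA factors (A, B) in one forward pass."""
    column = [[c] for c in context]  # C x 1 column vector
    a_flat = [row[0] for row in matmul(H_A, column)]
    b_flat = [row[0] for row in matmul(H_B, column)]
    return reshape(a_flat, R, D), reshape(b_flat, D, R)


def adapted_forward(x, A, B):
    """Apply the adapted weight W + B @ A to an input vector x."""
    w_adapted = add(W, matmul(B, A))
    return [row[0] for row in matmul(w_adapted, [[v] for v in x])]


context = [0.5, -1.0, 2.0]       # stand-in for an encoded context
A, B = context_to_lora(context)  # no gradient steps, no fine-tuning loop
x = [1.0, 0.0, -1.0, 2.0]
y = adapted_forward(x, A, B)     # the context itself is not needed here
```

The point of the sketch is the division of labour: the context is consumed once to produce `A` and `B`, and every later forward pass uses only the adapted weight.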
Usage
This checkpoint is the SHINE hypernetwork after pretraining and instruction fine-tuning on mqa.
For detailed instructions on environment setup, downloading model checkpoints, and performing inference (including the inference.ipynb notebook), please refer to the official GitHub repository.
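As a starting point, the checkpoint files can be fetched from the Hugging Face Hub before following the repository's instructions. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper name `download_checkpoint` and the `local_dir` argument are illustrative, and the directory layout the repository's inference code expects is not covered here.

```python
REPO_ID = "Yewei-Liu/SHINE-ift_mqa"  # model repository on the Hugging Face Hub


def download_checkpoint(local_dir=None):
    """Download all files of the checkpoint and return the local path.

    Hypothetical helper: the official repository may expect a specific
    directory layout, so check its setup instructions first.
    """
    # Imported here so the module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    path = download_checkpoint()
    print(f"Checkpoint files stored in: {path}")
```

Loading the hypernetwork and running inference should follow the steps and the inference.ipynb notebook in the official repository.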