Text Generation
Transformers
English
adaptive-river
mamba
mixture-of-experts
state-space-models
hybrid-architecture
custom_code
Update README.md
README.md
CHANGED
@@ -158,9 +158,9 @@ k_ffn = max(1, int(round(base_top_k * (0.5 + budget_ratio / 2.0))))
 
 | Budget Ratio | Active Attn Experts | Active FFN Experts | Relative Speed | Quality Retention | Recommended Use Case |
 |--------------|---------------------|--------------------|--------------:|------------------:|----------------------|
-| 1.0 (Full) | 6/6 (100%) |
-| 0.9 | 5-6/6 (83-100%) |
-| 0.75 | 4-5/6 (67-83%) | 1
+| 1.0 (Full) | 6/6 (100%) | 1/4 (25%) | 1.0× | 100% | Maximum quality, complex reasoning |
+| 0.9 | 5-6/6 (83-100%) | 1/4 (25%) | ~1.1× | 95-98% | High-quality production |
+| 0.75 | 4-5/6 (67-83%) | 1/4 (25%) | ~1.4× | 90-95% | Balanced performance |
 | 0.6 | 4/6 (67%) | 1/4 (25%) | ~1.7× | 85-90% | Efficient inference |
 | 0.5 | 3/6 (50%) | 1/4 (25%) | ~2.0× | 80-85% | Fast generation, good quality |
 | 0.35 | 2-3/6 (33-50%) | 1/4 (25%) | ~2.3× | 70-80% | Speed-optimized |
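The hunk header carries the `k_ffn` formula that the "Active FFN Experts" column depends on. The sketch below wraps that expression in a function and evaluates it at the budget ratios listed in the table. The function name `active_ffn_experts`, the range check, and the choice of `base_top_k = 1` are assumptions made for illustration; the diff does not show the model's configured value.

```python
def active_ffn_experts(base_top_k: int, budget_ratio: float) -> int:
    """Scale the FFN top-k by the budget ratio, never dropping below one expert."""
    # Assumption: budget ratios lie in [0, 1]; the diff does not state the valid range.
    if not 0.0 <= budget_ratio <= 1.0:
        raise ValueError("budget_ratio is assumed to lie in [0, 1]")
    # Expression copied from the hunk header: the scale factor runs from 0.5 to 1.0.
    return max(1, int(round(base_top_k * (0.5 + budget_ratio / 2.0))))


# Budget ratios taken from the table rows in the diff.
TABLE_RATIOS = [1.0, 0.9, 0.75, 0.6, 0.5, 0.35]

# base_top_k = 1 is a hypothetical setting, not a value confirmed by the source.
ffn_counts = {ratio: active_ffn_experts(1, ratio) for ratio in TABLE_RATIOS}
print(ffn_counts)
```

With `base_top_k = 1` the formula yields one active FFN expert at every ratio in the table, which agrees with the constant 1/4 column. A larger `base_top_k` would make the count vary with the budget ratio.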