Spaces:
Running
Running
Update README.md
#2
by
hypothetical
- opened
README.md
CHANGED
|
@@ -7,4 +7,81 @@ sdk: static
|
|
| 7 |
pinned: false
|
| 8 |
---
|
| 9 |
|
| 10 |
-
TheStage AI
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 7 |
pinned: false
|
| 8 |
---
|
| 9 |
|
| 10 |
+
# **TheStage AI Platform**
|
| 11 |
+
|
| 12 |
+
Inference optimization for LLMs, diffusion, and voice. Self-hosted or cloud. Works on NVIDIA GPUs, Apple Silicon, and edge devices.
|
| 13 |
+
|
| 14 |
+
**Links:**
|
| 15 |
+
|
| 16 |
+
[Web App](https://app.thestage.ai/) • [Docs](https://docs.thestage.ai/) • [Hugging Face](https://huggingface.co/TheStageAI) • [X](https://x.com/TheStageAI) • [LinkedIn](https://www.linkedin.com/company/thestageai) • [Discord](mailto:sergey@thestage.ai) (request invite) • [Email](mailto:support@thestage.ai)
|
| 17 |
+
|
| 18 |
+
---
|
| 19 |
+
|
| 20 |
+
# **What is TheStage AI**
|
| 21 |
+
|
| 22 |
+
TheStage AI is an inference optimization stack. It helps you compress, compile, and serve models. You keep control of the accuracy versus performance trade-off.
|
| 23 |
+
|
| 24 |
+
---
|
| 25 |
+
|
| 26 |
+
# **Products / Components**
|
| 27 |
+
|
| 28 |
+
- [**ANNA (Automatic Neural Network Acceleration)**](https://docs.thestage.ai/qlip/docs/source/anna_api.html)
|
| 29 |
+
|
| 30 |
+
Automated compression analysis under user-defined constraints (size, MACs, latency, memory). Outputs a QlipConfig for compile and serve.
|
| 31 |
+
|
| 32 |
+
- [**Qlip**](https://docs.thestage.ai/qlip/docs/source/index.html)
|
| 33 |
+
|
| 34 |
+
Full-stack optimization and inference framework. Quantization, sparsification, and compilation for NVIDIA GPUs (Apple Silicon supported). Produces pre-compiled (non-JIT) artifacts with dynamic shapes and mixed precision. Triton-based serving.
|
| 35 |
+
|
| 36 |
+
- [**Elastic Models**](https://docs.thestage.ai/tutorials/source/elastic_transformers.html)
|
| 37 |
+
|
| 38 |
+
Qlip-optimized models with S / M / L / XL performance tiers (availability varies). L/M/S may include quantization or pruning for faster inference.
|
| 39 |
+
|
| 40 |
+
- [**TheStage CLI**](https://docs.thestage.ai/platform/src/thestage-ai-cli.html)
|
| 41 |
+
|
| 42 |
+
Manage projects, tokens, and hardware from the terminal. Launch/monitor jobs, rent instances, and stream logs.
|
| 43 |
+
|
| 44 |
+
- [**TheStage Platform**](https://app.thestage.ai/)
|
| 45 |
+
|
| 46 |
+
Web UI and APIs for instances, models, and deployments. Includes the [**Playground**](https://app.thestage.ai/) to test Elastic Models, switch hardware, and compare tiers before deployment.
|
| 47 |
+
|
| 48 |
+
|
| 49 |
+
---
|
| 50 |
+
|
| 51 |
+
# **Key features**
|
| 52 |
+
|
| 53 |
+
- **Elastic Models with S/M/L/XL tiers per model** (choose cost, quality, and memory balance; availability varies).
|
| 54 |
+
- **ANNA constraint-driven compression analysis** (outputs a QlipConfig for compile and serve).
|
| 55 |
+
- **Qlip compiler and runtime** (pre-compiled engines; no runtime JIT; dynamic shapes; mixed precision).
|
| 56 |
+
- **OpenAI-compatible HTTP serving** (deploy and scale models through a standard API).
|
| 57 |
+
- **Playground to test models and hardware** (compare performance and tiers before deployment).
|
| 58 |
+
- **Self-host or run in the cloud** (use your own infrastructure; keep data private).
|
| 59 |
+
- **Hardware support: NVIDIA (incl. Jetson), Apple Silicon, and edge targets** (NPUs, DSPs, and MCUs per model).
|
| 60 |
+
- **Comprehensive tutorials and documentation** (from setup to evaluation and production).
|
| 61 |
+
|
| 62 |
+
---
|
| 63 |
+
|
| 64 |
+
# **Quickstart**
|
| 65 |
+
|
| 66 |
+
- Install CLI: `pip install thestage`
|
| 67 |
+
- Set token: `thestage config set --api-token <YOUR_API_TOKEN>` (get it in the web app)
|
| 68 |
+
- Use `elastic_models` in your code and choose a tier (S/M/L/XL). See Markdown version for a snippet.
|
| 69 |
+
- Diffusion and voice examples are in the docs.
|
| 70 |
+
|
| 71 |
+
---
|
| 72 |
+
|
| 73 |
+
# **Serving**
|
| 74 |
+
|
| 75 |
+
OpenAI-compatible API flow with Modal is documented (single- and multi-GPU).
|
| 76 |
+
|
| 77 |
+
Start here: https://docs.thestage.ai/
|
| 78 |
+
|
| 79 |
+
---
|
| 80 |
+
|
| 81 |
+
# **Supported hardware**
|
| 82 |
+
|
| 83 |
+
- NVIDIA GPUs (incl. Jetson where applicable)
|
| 84 |
+
- Apple Silicon
|
| 85 |
+
- Edge/embedded devices
|
| 86 |
+
|
| 87 |
+
---
|