Files changed (1)
  1. README.md +78 -1
README.md CHANGED
@@ -7,4 +7,81 @@ sdk: static
  pinned: false
  ---

- TheStage AI - Inference Acceleration Stack
+ # **TheStage AI Platform**
+
+ Inference optimization for LLMs, diffusion, and voice. Self-hosted or cloud. Works on NVIDIA GPUs, Apple Silicon, and edge devices.
+
+ **Links:**
+
+ [Web App](https://app.thestage.ai/) • [Docs](https://docs.thestage.ai/) • [Hugging Face](https://huggingface.co/TheStageAI) • [X](https://x.com/TheStageAI) • [LinkedIn](https://www.linkedin.com/company/thestageai) • [Discord](mailto:sergey@thestage.ai) (request invite) • [Email](mailto:support@thestage.ai)
+
+ ---
+
+ # **What is TheStage AI**
+
+ TheStage AI is an inference optimization stack that helps you compress, compile, and serve models while keeping control of the accuracy-versus-performance trade-off.
+
+ ---
+
+ # **Products / Components**
+
+ - [**ANNA (Automatic Neural Network Acceleration)**](https://docs.thestage.ai/qlip/docs/source/anna_api.html)
+
+   Automated compression analysis under user-defined constraints (size, MACs, latency, memory). Outputs a QlipConfig for compilation and serving.
+
+ - [**Qlip**](https://docs.thestage.ai/qlip/docs/source/index.html)
+
+   Full-stack optimization and inference framework: quantization, sparsification, and compilation for NVIDIA GPUs (Apple Silicon also supported). Produces pre-compiled (non-JIT) artifacts with dynamic shapes and mixed precision. Triton-based serving.
+
+ - [**Elastic Models**](https://docs.thestage.ai/tutorials/source/elastic_transformers.html)
+
+   Qlip-optimized models offered in S / M / L / XL performance tiers (availability varies). The L, M, and S tiers may include quantization or pruning for faster inference.
+
+ - [**TheStage CLI**](https://docs.thestage.ai/platform/src/thestage-ai-cli.html)
+
+   Manage projects, tokens, and hardware from the terminal: launch and monitor jobs, rent instances, and stream logs.
+
+ - [**TheStage Platform**](https://app.thestage.ai/)
+
+   Web UI and APIs for instances, models, and deployments. Includes the [**Playground**](https://app.thestage.ai/) to test Elastic Models, switch hardware, and compare tiers before deployment.
+
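The constraint-driven selection that ANNA performs can be pictured with a toy sketch. Everything below (layer names, precision options, sizes, quality scores) is invented for illustration; this is not the ANNA or Qlip API, only the general idea of choosing per-layer compression under a user-defined constraint:

```python
# Toy illustration of constraint-driven compression selection (not TheStage's
# actual algorithm or API): per layer, pick the smallest precision candidate
# whose estimated quality stays above a user-defined floor.
CANDIDATES = {
    # layer: [(precision, size_mb, relative_quality), ...] -- made-up numbers
    "attn": [("fp16", 100, 1.00), ("int8", 50, 0.99), ("int4", 25, 0.95)],
    "mlp": [("fp16", 200, 1.00), ("int8", 100, 0.98), ("int4", 50, 0.93)],
}

def select_plan(candidates, min_quality):
    """Return {layer: precision}, choosing the smallest option per layer
    that still meets the quality floor."""
    plan = {}
    for layer, options in candidates.items():
        viable = [o for o in options if o[2] >= min_quality]
        plan[layer] = min(viable, key=lambda o: o[1])[0]
    return plan

print(select_plan(CANDIDATES, 0.98))  # {'attn': 'int8', 'mlp': 'int8'}
```

Raising the quality floor pushes layers back toward full precision; loosening it frees up more aggressive quantization, which is the trade-off a QlipConfig captures.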
+ ---
+
+ # **Key features**
+
+ - **Elastic Models with S/M/L/XL tiers per model** (choose the cost, quality, and memory balance; availability varies).
+ - **ANNA constraint-driven compression analysis** (outputs a QlipConfig for compilation and serving).
+ - **Qlip compiler and runtime** (pre-compiled engines; no runtime JIT; dynamic shapes; mixed precision).
+ - **OpenAI-compatible HTTP serving** (deploy and scale models through a standard API).
+ - **Playground to test models and hardware** (compare performance and tiers before deployment).
+ - **Self-host or run in the cloud** (use your own infrastructure; keep data private).
+ - **Hardware support: NVIDIA (incl. Jetson), Apple Silicon, and edge targets** (NPUs, DSPs, and MCUs per model).
+ - **Comprehensive tutorials and documentation** (from setup to evaluation and production).
+
+ ---
+
+ # **Quickstart**
+
+ - Install the CLI: `pip install thestage`
+ - Set your token: `thestage config set --api-token <YOUR_API_TOKEN>` (get it in the web app)
+ - Use `elastic_models` in your code and choose a tier (S/M/L/XL). See the Markdown version for a snippet.
+ - Diffusion and voice examples are in the docs.
+
+ ---
+
+ # **Serving**
+
+ An OpenAI-compatible API flow with Modal is documented (single- and multi-GPU).
+
+ Start here: https://docs.thestage.ai/
+
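Because the serving layer is OpenAI-compatible, any OpenAI-style HTTP client can talk to a deployment. A minimal sketch of the standard request shape follows; the base URL and model name are placeholders for your own deployment, not real endpoints:

```python
import json

# Sketch of an OpenAI-compatible chat completion request. The URL and model
# name below are placeholders for your own deployment, not real endpoints.
BASE_URL = "https://your-deployment.example.com/v1"

def chat_body(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build the JSON body expected at POST {BASE_URL}/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = json.dumps(chat_body("my-elastic-model-S", "Hello!"))
# Send `payload` with any HTTP client to f"{BASE_URL}/chat/completions".
```

Since the shape is standard, official OpenAI SDKs pointed at the deployment's base URL should also work unchanged.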
+ ---
+
+ # **Supported hardware**
+
+ - NVIDIA GPUs (incl. Jetson, where applicable)
+ - Apple Silicon
+ - Edge/embedded devices
+
+ ---