adidasguccinike committed on
Commit
b7e45f2
·
verified ·
1 Parent(s): a7db0f1

Update README.md

Files changed (1)
  1. README.md +6 -73
README.md CHANGED
@@ -7,81 +7,14 @@ sdk: static
7
  pinned: false
8
  ---
9
 
10
- # **TheStage AI Platform**
11
 
12
- Inference optimization for LLMs, diffusion, and voice. Self-hosted or cloud. Works on NVIDIA GPUs, Apple Silicon, and edge devices.
13
 
14
- **Links:**
15
 
16
- [Web App](https://app.thestage.ai/) [Docs](https://docs.thestage.ai/) [Hugging Face](https://huggingface.co/TheStageAI) [X](https://x.com/TheStageAI) [LinkedIn](https://www.linkedin.com/company/thestageai) [Discord](mailto:sergey@thestage.ai) (request invite) [Email](mailto:support@thestage.ai)
17
 
18
- ---
19
-
20
- # **What is TheStage AI**
21
-
22
- TheStage AI is an inference optimization stack. It helps you compress, compile, and serve models. You keep control of the accuracy versus performance trade-off.
23
-
24
- ---
25
-
26
- # **Products / Components**
27
-
28
- - [**ANNA (Automatic Neural Network Acceleration)**](https://docs.thestage.ai/qlip/docs/source/anna_api.html)
29
-
30
- Automated compression analysis under user-defined constraints (size, MACs, latency, memory). Outputs a QlipConfig for compile and serve.
31
-
32
- - [**Qlip**](https://docs.thestage.ai/qlip/docs/source/index.html)
33
-
34
- Full-stack optimization and inference framework. Quantization, sparsification, and compilation for NVIDIA GPUs (Apple Silicon supported). Produces pre-compiled (non-JIT) artifacts with dynamic shapes and mixed precision. Triton-based serving.
35
-
36
- - [**Elastic Models**](https://docs.thestage.ai/tutorials/source/elastic_transformers.html)
37
-
38
- Qlip-optimized models with S / M / L / XL performance tiers (availability varies). L/M/S may include quantization or pruning for faster inference.
39
-
40
- - [**TheStage CLI**](https://docs.thestage.ai/platform/src/thestage-ai-cli.html)
41
-
42
- Manage projects, tokens, and hardware from the terminal. Launch/monitor jobs, rent instances, and stream logs.
43
-
44
- - [**TheStage Platform**](https://app.thestage.ai/)
45
-
46
- Web UI and APIs for instances, models, and deployments. Includes the [**Playground**](https://app.thestage.ai/) to test Elastic Models, switch hardware, and compare tiers before deployment.
47
-
48
-
49
- ---
50
-
51
- # **Key features**
52
-
53
- - **Elastic Models with S/M/L/XL tiers per model** (choose cost, quality, and memory balance; availability varies).
54
- - **ANNA constraint-driven compression analysis** (outputs a QlipConfig for compile and serve).
55
- - **Qlip compiler and runtime** (pre-compiled engines; no runtime JIT; dynamic shapes; mixed precision).
56
- - **OpenAI-compatible HTTP serving** (deploy and scale models through a standard API).
57
- - **Playground to test models and hardware** (compare performance and tiers before deployment).
58
- - **Self-host or run in the cloud** (use your own infrastructure; keep data private).
59
- - **Hardware support: NVIDIA (incl. Jetson), Apple Silicon, and edge targets** (NPUs, DSPs, and MCUs per model).
60
- - **Comprehensive tutorials and documentation** (from setup to evaluation and production).
61
-
62
- ---
63
 
64
- # **Quickstart**
65
-
66
- - Install CLI: `pip install thestage`
67
- - Set token: `thestage config set --api-token <YOUR_API_TOKEN>` (get it in the web app)
68
- - Use `elastic_models` in your code and choose a tier (S/M/L/XL). See Markdown version for a snippet.
69
- - Diffusion and voice examples are in the docs.
70
-
71
- ---
72
-
73
- # **Serving**
74
-
75
- OpenAI-compatible API flow with Modal is documented (single- and multi-GPU).
76
-
77
- Start here: https://docs.thestage.ai/
78
-
79
- ---
80
-
81
- # **Supported hardware**
82
-
83
- - NVIDIA GPUs (incl. Jetson where applicable)
84
- - Apple Silicon
85
- - Edge/embedded devices
86
-
87
- ---
 
7
  pinned: false
8
  ---
9
 
10
+ TheStage AI is a team of AI researchers and engineers focused on efficient model inference.
11
 
12
+ On Hugging Face, we publish **Elastic** checkpoints (Transformers, Diffusers, ASR) optimized for fast, cost-efficient serving.
13
 
14
+ Links: [Docs](https://docs.thestage.ai) • [Web app](https://app.thestage.ai)
15
 
16
+ Elastic Models are released as **ready-to-run checkpoints** in performance tiers (**S / M / L / XL**), so you can choose the best balance of quality, latency, and memory for your workload across NVIDIA GPUs (incl. Jetson), Apple Silicon, and edge devices.
17
 
18
+ Pipeline (how these HF releases are produced): **ANNA** (analyzes constraints & targets) • **Qlip** (quantizes/sparsifies + compiles for optimized runtime & serving) • **CLI/Platform** (run jobs, deploy, benchmark).
19
 
20
+ Community: [X](https://x.com/TheStageAI) • [Discord (request invite)](mailto:sergey@thestage.ai) • [YouTube](https://www.youtube.com/@KirillSolodskikh) • [LinkedIn](https://www.linkedin.com/company/thestageai/)