---
title: README
emoji: 🏢
colorFrom: yellow
colorTo: indigo
sdk: static
pinned: false
---
# **TheStage AI Platform**

Inference optimization for LLMs, diffusion, and voice. Self-hosted or cloud. Works on NVIDIA GPUs, Apple Silicon, and edge devices.

**Links:**

[Web App](https://app.thestage.ai/) • [Docs](https://docs.thestage.ai/) • [Hugging Face](https://huggingface.co/TheStageAI) • [X](https://x.com/TheStageAI) • [LinkedIn](https://www.linkedin.com/company/thestageai) • [Discord](mailto:sergey@thestage.ai) (request invite) • [Email](mailto:support@thestage.ai)

---

# **What is TheStage AI**

TheStage AI is an inference optimization stack. It helps you compress, compile, and serve models while keeping control of the accuracy-versus-performance trade-off.
---

# **Products / Components**

- [**ANNA (Automatic Neural Network Acceleration)**](https://docs.thestage.ai/qlip/docs/source/anna_api.html)
  Automated compression analysis under user-defined constraints (size, MACs, latency, memory). Outputs a QlipConfig for compile and serve.
- [**Qlip**](https://docs.thestage.ai/qlip/docs/source/index.html)
  Full-stack optimization and inference framework. Quantization, sparsification, and compilation for NVIDIA GPUs (Apple Silicon supported). Produces pre-compiled (non-JIT) artifacts with dynamic shapes and mixed precision. Triton-based serving.
- [**Elastic Models**](https://docs.thestage.ai/tutorials/source/elastic_transformers.html)
  Qlip-optimized models with S / M / L / XL performance tiers (availability varies). L/M/S may include quantization or pruning for faster inference.
- [**TheStage CLI**](https://docs.thestage.ai/platform/src/thestage-ai-cli.html)
  Manage projects, tokens, and hardware from the terminal. Launch/monitor jobs, rent instances, and stream logs.
- [**TheStage Platform**](https://app.thestage.ai/)
  Web UI and APIs for instances, models, and deployments. Includes the [**Playground**](https://app.thestage.ai/) to test Elastic Models, switch hardware, and compare tiers before deployment.

---
# **Key features**

- **Elastic Models with S/M/L/XL tiers per model** (choose cost, quality, and memory balance; availability varies).
- **ANNA constraint-driven compression analysis** (outputs a QlipConfig for compile and serve).
- **Qlip compiler and runtime** (pre-compiled engines; no runtime JIT; dynamic shapes; mixed precision).
- **OpenAI-compatible HTTP serving** (deploy and scale models through a standard API).
- **Playground to test models and hardware** (compare performance and tiers before deployment).
- **Self-host or run in the cloud** (use your own infrastructure; keep data private).
- **Hardware support: NVIDIA (incl. Jetson), Apple Silicon, and edge targets** (NPUs, DSPs, and MCUs per model).
- **Comprehensive tutorials and documentation** (from setup to evaluation and production).

---
# **Quickstart**

- Install the CLI: `pip install thestage`
- Set your token: `thestage config set --api-token <YOUR_API_TOKEN>` (get it in the web app)
- Use `elastic_models` in your code and choose a tier (S/M/L/XL); see the docs for full snippets.
- Diffusion and voice examples are in the docs.

---
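The tier-selection step above can be sketched in Python. This is a hedged sketch, not the definitive API: the import path `elastic_models.transformers`, the `mode` argument, and the model id are assumptions based on TheStage AI's published model cards and may differ per model, so confirm the exact usage in the docs.

```python
# Hedged sketch: the import path, the `mode` argument, and the model id below
# are assumptions -- confirm the exact API in the TheStage AI docs.
MODEL_ID = "TheStageAI/Elastic-Llama-3.1-8B-Instruct"  # hypothetical example id
TIER = "S"  # one of "S", "M", "L", "XL" (availability varies per model)

try:
    # Assumed import path, following the pattern shown on Elastic model cards.
    from elastic_models.transformers import AutoModelForCausalLM

    # `mode` is assumed to select the performance tier of the pre-compiled engine.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, mode=TIER)
except ImportError:
    # elastic_models is not installed; set up the tooling first:
    #   pip install thestage
    #   thestage config set --api-token <YOUR_API_TOKEN>
    model = None
```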
# **Serving**

Serving through an OpenAI-compatible API is documented, including a Modal-based flow for single- and multi-GPU deployments.

Start here: [Docs](https://docs.thestage.ai/)

---
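Because the serving layer is OpenAI-compatible, any OpenAI-style client can talk to a deployment. The sketch below builds a standard chat-completion request body using only the Python standard library; the base URL and model name are placeholders for your own deployment, not values from TheStage AI.

```python
import json

# Placeholders: substitute your deployment's endpoint and served model name.
BASE_URL = "https://your-deployment.example.com/v1"  # hypothetical endpoint
ENDPOINT = f"{BASE_URL}/chat/completions"

def build_chat_request(model: str, user_message: str, max_tokens: int = 128) -> bytes:
    """Serialize an OpenAI-compatible chat-completion request body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload).encode("utf-8")

body = build_chat_request("elastic-model-s", "Hello!")
# POST `body` to ENDPOINT with an `Authorization: Bearer <token>` header,
# e.g. via urllib.request, requests, or the openai client with base_url=BASE_URL.
```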
# **Supported hardware**

- NVIDIA GPUs (incl. Jetson where applicable)
- Apple Silicon
- Edge/embedded devices