raipolymath/triton-windows

@@ -1,135 +1,54 @@
----
-license: mit
-title: Triton Windows Pre-built Binaries
-sdk: static
-emoji: ⚡
-colorFrom: blue
-colorTo: green
-tags:
-  - triton
-  - windows
-  - cuda
-  - deep-learning
-  - gpu
----
-# Triton for Windows - Pre-built Binaries
-**Developed by Dr. Ankush Rai** | [GitHub](https://github.com/RaiAnk)
-This repository hosts pre-built Windows binaries for [OpenAI Triton](https://github.com/triton-lang/triton), a language and compiler for writing highly efficient custom Deep-Learning primitives.
-## Available Downloads
-### Triton Wheels (Python)
-| Python | CUDA | Wheel | Size |
-|--------|------|-------|------|
-| 3.11 | 12.x | `wheels/triton-3.2.0+windows-cp311-cp311-win_amd64.whl` | ~100MB |
-### Pre-built LLVM (Optional)
-If you want to build Triton from source or develop custom kernels:
-| File | Description | Size |
-|------|-------------|------|
-| `llvm/llvm-triton-windows-x64.zip` | LLVM with MLIR support | ~800MB |
-## Installation
-### Quick Install
-```bash
-# 1. Download the wheel for your Python version
-# 2. Install with pip:
-pip install triton-3.2.0+windows-cp311-cp311-win_amd64.whl
-```
-### Using the Installer Script
-```bash
-# Clone the source repository
-git clone https://github.com/RaiAnk/triton-windows.git
-cd triton-windows
-# Run installer (downloads from this HuggingFace repo)
-python install_triton_windows.py
-```
-## Requirements
-- **Windows 10/11 x64**
-- **Python 3.10, 3.11, or 3.12**
-- **NVIDIA CUDA Toolkit 12.x**
-- **PyTorch 2.4+ with CUDA support**
-## Verify Installation
-```python
-import triton
-import triton.language as tl
-import torch
-@triton.jit
-def add_kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
-    pid = tl.program_id(axis=0)
-    block_start = pid * BLOCK_SIZE
-    offsets = block_start + tl.arange(0, BLOCK_SIZE)
-    mask = offsets < n_elements
-    x = tl.load(x_ptr + offsets, mask=mask)
-    y = tl.load(y_ptr + offsets, mask=mask)
-    output = x + y
-    tl.store(output_ptr + offsets, output, mask=mask)
-# Test
-size = 1024
-x = torch.rand(size, device='cuda')
-y = torch.rand(size, device='cuda')
-output = torch.empty_like(x)
-grid = lambda meta: (triton.cdiv(size, meta['BLOCK_SIZE']),)
-add_kernel[grid](x, y, output, size, BLOCK_SIZE=256)
-print("Triton is working!" if torch.allclose(output, x + y) else "Error!")
-```
-## Source Code
-The modified Triton source code with Windows support is available at:
-- **GitHub**: https://github.com/RaiAnk/triton-windows
-## Build from Source
-If you need to build from source (for unsupported Python versions or custom modifications):
-1. Download pre-built LLVM from this repository
-2. Extract to `C:\llvm-triton`
-3. Set environment variables:
-   ```cmd
-   set LLVM_SYSPATH=C:\llvm-triton
-   set LLVM_LIBRARY_DIR=C:\llvm-triton\lib
-   set LLVM_INCLUDE_DIRS=C:\llvm-triton\include
-   ```
-4. Build Triton:
-   ```cmd
-   pip install -e .
-   ```
-## Known Limitations
-- AMD ROCm is not supported on Windows
-- Some advanced features may have limited functionality
-- Proton profiler has limited Windows support
-## License
-Triton is licensed under the MIT License. See the [original repository](https://github.com/triton-lang/triton) for details.
-## Credits
-- **Original Triton**: OpenAI and the Triton team
-- **Windows Port**: Dr. Ankush Rai
-## Support
-For issues with Windows support, please open an issue on the GitHub repository.

+---
+license: mit
+tags:
+- triton
+- windows
+- nvidia
+- gpu
+- cuda
+- compiler
+---
+# Triton for Windows (NVIDIA)
+**Pre-built Windows wheels for Triton - NVIDIA GPUs only**
+## Author
+**Dr. Ankush Rai**
+- [GitHub](https://github.com/RaiAnk)
+- [LinkedIn](https://www.linkedin.com/in/dr-ankush-rai-80722021a/)
+- [YouTube](https://www.youtube.com/@AnkushRai_polymath)
+## Source Code
+**GitHub:** [https://github.com/RaiAnk/triton-windows-nvidia](https://github.com/RaiAnk/triton-windows-nvidia)
+## Installation
+The wheel is hosted in this repository and can be installed with pip. One way, assuming a Python 3.11 environment, is to pass the wheel URL directly: `pip install https://huggingface.co/raipolymath/triton-windows/resolve/main/triton-3.6.0+git84bd6d54-cp311-cp311-win_amd64.whl`.
+The download is about 2.0 GB (2026.4 MB). A successful run ends with `Successfully installed triton-3.6.0+git84bd6d54`.
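Because the wheel is roughly 2 GB, it can be worth checking that the running interpreter matches the wheel's tags before downloading. The wheel filename in the download URL encodes those tags (`cp311-cp311-win_amd64`). The following is a minimal sketch, not part of this repository: the helper names `parse_wheel_tags` and `interpreter_matches` are hypothetical, and the check assumes a CPython interpreter and ignores the ABI tag.

```python
import sys
import sysconfig

# Wheel filename taken from the download URL in this README.
WHEEL = "triton-3.6.0+git84bd6d54-cp311-cp311-win_amd64.whl"


def parse_wheel_tags(filename):
    """Split a wheel filename into (name, version, python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # A wheel name has 5 fields, or 6 when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return name, version, python_tag, abi_tag, platform_tag


def interpreter_matches(python_tag, platform_tag, version_info=None, current_platform=None):
    """Return True if a CPython interpreter with this version and platform fits the tags.

    Simplification: only the Python and platform tags are compared.
    """
    if version_info is None:
        version_info = sys.version_info
    if current_platform is None:
        # sysconfig reports e.g. "win-amd64"; wheel tags use underscores.
        current_platform = sysconfig.get_platform()
    major, minor = version_info[:2]
    expected_python = f"cp{major}{minor}"
    normalized_platform = current_platform.replace("-", "_").replace(".", "_")
    return python_tag == expected_python and platform_tag == normalized_platform


if __name__ == "__main__":
    _, _, py_tag, _, plat_tag = parse_wheel_tags(WHEEL)
    print("Interpreter matches wheel tags:", interpreter_matches(py_tag, plat_tag))
```

On anything other than 64-bit CPython 3.11 on Windows, the script reports a mismatch, which is the case where pip would refuse the wheel as unsupported on the platform.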
+## Requirements
+- Windows 10/11 (64-bit)
+- Python 3.11
+- NVIDIA GPU (Compute Capability 8.0+)
+- CUDA Toolkit 11.8+
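The Compute Capability 8.0+ requirement above can be checked programmatically. The sketch below is illustrative only. It assumes a CUDA-enabled PyTorch build is installed, which the requirements list does not state, and uses `torch.cuda.get_device_capability` to query the first GPU. The helper names `meets_minimum` and `check_first_gpu` are hypothetical.

```python
# Minimum compute capability taken from the requirements list (8.0+).
MIN_CAPABILITY = (8, 0)


def meets_minimum(capability, minimum=MIN_CAPABILITY):
    """Compare a (major, minor) compute capability against the minimum."""
    return tuple(capability) >= tuple(minimum)


def check_first_gpu():
    """Return True/False for GPU 0, or None if PyTorch or CUDA is unavailable.

    Assumption: PyTorch is used only as a convenient way to query the device.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # get_device_capability returns a (major, minor) tuple, e.g. (8, 6).
    return meets_minimum(torch.cuda.get_device_capability(0))


if __name__ == "__main__":
    result = check_first_gpu()
    if result is None:
        print("Could not query a CUDA device (PyTorch or CUDA not available).")
    else:
        print("GPU meets the 8.0+ requirement:", result)
```

Tuple comparison handles the major and minor parts together, so `(8, 6)` passes while `(7, 5)` does not.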
+## About Triton
+Triton was originally created by [Philippe Tillet](https://github.com/ptillet) as a research project at **Harvard University**. The project gained significant momentum after he joined OpenAI, which released it as an open-source, Python-based DSL in 2021.
+This Windows port targets **NVIDIA GPUs only**. The AMD backend is disabled because of LLVM compatibility issues on Windows.
+## Acknowledgments
+- **Philippe Tillet** - Creator of Triton (Harvard University)
+- **OpenAI** - For open-sourcing and continued development
+- **LLVM Project** - Compiler infrastructure
+- **NVIDIA** - CUDA toolkit