Commit · 70ec35c
1 Parent(s): 261046a
Reorganize README around tasks, add missing repos, update links
- Restructure from flat table to task-based categories (Document Processing, Vision, Text/LLM, Training, Analysis)
- Add sam3 and training repos, point training to unsloth/jobs
- Lead with copy-paste quick start commands (OCR, SAM3, Unsloth)
- Remove incomplete repos (deduplication, transformers-training, marimo) until ready
- Update HF Jobs docs link, remove unrelated astral-sh link
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
README.md
CHANGED
|
@@ -9,108 +9,87 @@ pinned: false
|
|
| 9 |
|
| 10 |
# UV Scripts
|
| 11 |
|
| 12 |
-
**
|
| 13 |
|
| 14 |
-
|
| 15 |
|
| 16 |
-
##
|
| 17 |
-
|
| 18 |
-
UV scripts are self-contained Python scripts that use [inline metadata](https://docs.astral.sh/uv/guides/scripts/) to specify dependencies. Just `uv run script.py` and everything installs automatically.
|
| 19 |
-
|
| 20 |
-
Perfect for:
|
| 21 |
-
|
| 22 |
-
- **GPU workflows** on [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs)
|
| 23 |
-
- 💻 **Local processing** on your machine
|
| 24 |
-
- **Reproducible pipelines** that work anywhere
|
| 25 |
-
|
| 26 |
-
## Quick Example
|
| 27 |
|
| 28 |
```bash
|
| 29 |
-
# Extract text from
|
| 30 |
hf jobs uv run --flavor l4x1 \
|
| 31 |
https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
|
| 32 |
your-images your-extracted-text
|
| 33 |
-
```
|
| 34 |
|
| 35 |
-
#
|
| 36 |
|
| 37 |
-
|
| 38 |
-
| ------------------------------------------------------------------------------- | --------------------------------------------------------- | ------------ |
|
| 39 |
-
| [ocr](https://huggingface.co/datasets/uv-scripts/ocr) | Extract text from images with VLMs (LaTeX, tables, forms) | ✅ |
|
| 40 |
-
| [classification](https://huggingface.co/datasets/uv-scripts/classification) | Text classification with guaranteed valid outputs | ✅ |
|
| 41 |
-
| [dataset-creation](https://huggingface.co/datasets/uv-scripts/dataset-creation) | Create datasets from PDFs and files | ❌ |
|
| 42 |
-
| [vllm](https://huggingface.co/datasets/uv-scripts/vllm) | High-performance inference with vLLM | ✅ |
|
| 43 |
-
| [synthetic-data](https://huggingface.co/datasets/uv-scripts/synthetic-data) | Generate high-quality synthetic data with CoT reasoning | ✅ |
|
| 44 |
-
| [deduplication](https://huggingface.co/datasets/uv-scripts/deduplication) | Remove duplicates using semantic similarity | ❌ |
|
| 45 |
-
| [openai-oss](https://huggingface.co/datasets/uv-scripts/openai-oss) | Generate responses with visible reasoning traces | ✅ |
|
| 46 |
|
| 47 |
-
##
|
| 48 |
|
| 49 |
-
###
|
| 50 |
|
| 51 |
-
|
| 52 |
|
| 53 |
-
###
|
| 54 |
|
| 55 |
-
|
| 56 |
|
| 57 |
-
##
|
| 58 |
|
| 59 |
-
|
| 60 |
|
| 61 |
-
|
| 62 |
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
|
| 67 |
-
your-images extracted-text
|
| 68 |
-
```
|
| 69 |
|
| 70 |
-
###
|
| 71 |
|
| 72 |
-
|
| 73 |
|
| 74 |
-
|
| 75 |
-
# Fast semantic deduplication on CPU
|
| 76 |
-
uv run https://huggingface.co/datasets/uv-scripts/deduplication/raw/main/semantic-dedupe.py \
|
| 77 |
-
your-dataset text your-dataset-clean \
|
| 78 |
-
--method duplicates --threshold 0.9
|
| 79 |
-
```
|
| 80 |
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
Create high-quality synthetic data with chain-of-thought reasoning:
|
| 84 |
|
| 85 |
```bash
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
https://huggingface.co/datasets/uv-scripts/synthetic-data/raw/main/cot-self-instruct.py \
|
| 89 |
-
--seed-dataset math-examples --output-dataset synthetic-math \
|
| 90 |
-
--task-type reasoning --num-samples 1000
|
| 91 |
-
```
|
| 92 |
-
|
| 93 |
-
## Getting Started with HF Jobs
|
| 94 |
-
|
| 95 |
-
Run any UV script on GPU infrastructure:
|
| 96 |
-
|
| 97 |
-
```bash
|
| 98 |
-
hf jobs uv run --flavor l4x1 \
|
| 99 |
-
https://huggingface.co/datasets/uv-scripts/[collection]/raw/main/[script].py \
|
| 100 |
[args]
|
| 101 |
```
|
| 102 |
|
| 103 |
-
|
| 104 |
- `l4x1` - Good balance for most tasks
|
| 105 |
-
- `a10g-large` - More
|
| 106 |
- `a100-large` - Maximum performance
|
| 107 |
|
| 108 |
-
##
|
| 109 |
-
|
| 110 |
-
- [UV Documentation](https://docs.astral.sh/uv/)
|
| 111 |
-
- [HF Jobs Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs)
|
| 112 |
-
- [Script Examples](https://github.com/astral-sh/uv/tree/main/scripts)
|
| 113 |
-
|
| 114 |
-
---
|
| 115 |
|
| 116 |
-
|
| 9 |
|
| 10 |
# UV Scripts
|
| 11 |
|
| 12 |
+
**Run ML workflows on GPU with a single command - no setup required**
|
| 13 |
|
| 14 |
+
Self-contained Python scripts powered by [UV](https://docs.astral.sh/uv/guides/scripts/) and [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs). Pick a task, copy the command, and you're running on cloud GPUs in seconds.
|
| 15 |
|
| 16 |
+
## Get Started
|
| 17 |
|
| 18 |
```bash
|
| 19 |
+
# Extract text from document images
|
| 20 |
hf jobs uv run --flavor l4x1 \
|
| 21 |
https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
|
| 22 |
your-images your-extracted-text
|
| 23 |
|
| 24 |
+
# Detect objects in any image dataset
|
| 25 |
+
hf jobs uv run --flavor a100-large \
|
| 26 |
+
-s HF_TOKEN=HF_TOKEN \
|
| 27 |
+
https://huggingface.co/datasets/uv-scripts/sam3/raw/main/detect-objects.py \
|
| 28 |
+
your-images detected-objects --class-name photograph
|
| 29 |
+
|
| 30 |
+
# Fine-tune an LLM with Unsloth
|
| 31 |
+
hf jobs uv run \
|
| 32 |
+
https://huggingface.co/datasets/unsloth/jobs/raw/main/sft-lfm2.5.py \
|
| 33 |
+
--flavor a10g-small --secrets HF_TOKEN \
|
| 34 |
+
-- --dataset mlabonne/FineTome-100k --output-repo your-username/my-model
|
| 35 |
+
```
|
| 36 |
|
| 37 |
+
Every script works locally too - just replace `hf jobs uv run --flavor ...` with `uv run`.
|
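The remote-versus-local substitution described above can be sketched as a small command builder. This is only an illustration: `build_command` is a hypothetical helper, not part of `uv` or the `hf` CLI, and it assumes that job-only options such as `--secrets` are simply left out for local runs.

```python
import shlex


def build_command(script_url, script_args, flavor=None):
    """Return the argv list for running a UV script.

    With a flavor, the command targets HF Jobs; with flavor=None it is the
    plain local `uv run` form.
    """
    if flavor is None:
        prefix = ["uv", "run"]
    else:
        prefix = ["hf", "jobs", "uv", "run", "--flavor", flavor]
    return [*prefix, script_url, *script_args]


OCR_SCRIPT = "https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py"
ARGS = ["your-images", "your-extracted-text"]

remote = build_command(OCR_SCRIPT, ARGS, flavor="l4x1")
local = build_command(OCR_SCRIPT, ARGS)

# Print both forms as copy-pasteable shell commands.
print(shlex.join(remote))
print(shlex.join(local))
```

Returning an argv list rather than a string keeps quoting explicit; the list could be handed to `subprocess.run` unchanged if actually launching the command is wanted.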
| 38 |
|
| 39 |
+
## Browse Scripts
|
| 40 |
|
| 41 |
+
### Document Processing
|
| 42 |
|
| 43 |
+
| Script | What it does |
|
| 44 |
+
| --- | --- |
|
| 45 |
+
| [ocr](https://huggingface.co/datasets/uv-scripts/ocr) | Extract text from images with VLMs - handles LaTeX, tables, forms, handwriting |
|
| 46 |
+
| [dataset-creation](https://huggingface.co/datasets/uv-scripts/dataset-creation) | Create HF datasets from PDFs and local files (CPU only) |
|
| 47 |
|
| 48 |
+
### Computer Vision
|
| 49 |
|
| 50 |
+
| Script | What it does |
|
| 51 |
+
| --- | --- |
|
| 52 |
+
| [sam3](https://huggingface.co/datasets/uv-scripts/sam3) | Zero-shot object detection with natural language prompts using SAM3 |
|
| 53 |
|
| 54 |
+
### Text Generation & Classification
|
| 55 |
|
| 56 |
+
| Script | What it does |
|
| 57 |
+
| --- | --- |
|
| 58 |
+
| [vllm](https://huggingface.co/datasets/uv-scripts/vllm) | High-performance GPU inference with vLLM (classification, VLM tasks) |
|
| 59 |
+
| [classification](https://huggingface.co/datasets/uv-scripts/classification) | Text classification with structured outputs and guaranteed valid labels |
|
| 60 |
+
| [openai-oss](https://huggingface.co/datasets/uv-scripts/openai-oss) | Generate responses using OpenAI's open-source reasoning models |
|
| 61 |
+
| [synthetic-data](https://huggingface.co/datasets/uv-scripts/synthetic-data) | Generate synthetic training data with chain-of-thought reasoning |
|
| 62 |
|
| 63 |
+
### Model Training
|
| 64 |
|
| 65 |
+
| Script | What it does |
|
| 66 |
+
| --- | --- |
|
| 67 |
+
| [unsloth/jobs](https://huggingface.co/datasets/unsloth/jobs) | Fine-tune LLMs and VLMs with Unsloth - LFM2.5, Qwen3-VL, Gemma3, continued pretraining |
|
| 68 |
|
| 69 |
+
### Analysis & Visualization
|
| 70 |
|
| 71 |
+
| Script | What it does |
|
| 72 |
+
| --- | --- |
|
| 73 |
+
| [dataset-stats](https://huggingface.co/datasets/uv-scripts/dataset-stats) | Analyze dataset statistics with streaming and Polars (CPU only) |
|
| 74 |
+
| [build-atlas](https://huggingface.co/datasets/uv-scripts/build-atlas) | Generate interactive embedding visualizations with Apple's Atlas |
|
| 75 |
|
| 76 |
+
## Running on HF Jobs
|
| 77 |
|
| 78 |
+
All GPU scripts are designed to run on [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs):
|
| 79 |
|
| 80 |
```bash
|
| 81 |
+
hf jobs uv run --flavor <gpu> \
|
| 82 |
+
https://huggingface.co/datasets/uv-scripts/<collection>/raw/main/<script>.py \
|
| 83 |
[args]
|
| 84 |
```
|
| 85 |
|
| 86 |
+
Available GPU flavors:
|
| 87 |
- `l4x1` - Good balance for most tasks
|
| 88 |
+
- `a10g-large` - More VRAM for larger models
|
| 89 |
- `a100-large` - Maximum performance
|
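As a rough illustration of the URL template and flavor list above, the following sketch fills in the `<collection>` and `<script>` placeholders and checks a flavor against the three listed here. The names `LISTED_FLAVORS`, `script_url`, and `check_flavor` are hypothetical, and the list is not exhaustive: the quick start also uses `a10g-small`, so HF Jobs evidently accepts flavors beyond these.

```python
# Flavors and descriptions as listed in this README (not a complete list).
LISTED_FLAVORS = {
    "l4x1": "Good balance for most tasks",
    "a10g-large": "More VRAM for larger models",
    "a100-large": "Maximum performance",
}


def script_url(collection: str, script: str) -> str:
    """Fill the raw-file URL template used for uv-scripts collections."""
    return (
        f"https://huggingface.co/datasets/uv-scripts/{collection}"
        f"/raw/main/{script}.py"
    )


def check_flavor(flavor: str) -> str:
    """Return the flavor unchanged if the README lists it, else raise."""
    if flavor not in LISTED_FLAVORS:
        raise ValueError(f"{flavor!r} is not one of {sorted(LISTED_FLAVORS)}")
    return flavor


url = script_url("ocr", "nanonets-ocr")
flavor = check_flavor("l4x1")
print(f"hf jobs uv run --flavor {flavor} {url} [args]")
```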
| 90 |
|
| 91 |
+
## Learn More
|
| 92 |
|
| 93 |
+
- [UV Script Documentation](https://docs.astral.sh/uv/guides/scripts/)
|
| 94 |
+
- [HF Jobs Documentation](https://huggingface.co/docs/hub/jobs)
|
| 95 |
+
- [HF Jobs Python Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs)
|