davanstrien HF Staff Claude Opus 4.6 committed on
Commit 70ec35c · 1 Parent(s): 261046a

Reorganize README around tasks, add missing repos, update links

- Restructure from flat table to task-based categories (Document Processing, Vision, Text/LLM, Training, Analysis)
- Add sam3 and training repos, point training to unsloth/jobs
- Lead with copy-paste quick start commands (OCR, SAM3, Unsloth)
- Remove incomplete repos (deduplication, transformers-training, marimo) until ready
- Update HF Jobs docs link, remove unrelated astral-sh link

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (1)
  1. README.md +53 -74
README.md CHANGED
@@ -9,108 +9,87 @@ pinned: false
9
 
10
  # UV Scripts
11
 
12
- **Ready-to-run ML tools powered by UV - zero setup, maximum power**
13
 
14
- Run state-of-the-art ML workflows with a single command. From OCR to classification, all scripts work instantly with `uv run`.
15
 
16
- ## What are UV scripts?
17
-
18
- UV scripts are self-contained Python scripts that use [inline metadata](https://docs.astral.sh/uv/guides/scripts/) to specify dependencies. Just `uv run script.py` and everything installs automatically.
19
-
20
- Perfect for:
21
-
22
- - 🚀 **GPU workflows** on [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs)
23
- - 💻 **Local processing** on your machine
24
- - 🔄 **Reproducible pipelines** that work anywhere
25
-
26
- ## 🚀 Quick Example
27
 
28
  ```bash
29
- # Extract text from images with state-of-the-art OCR (no local GPU needed!)
30
  hf jobs uv run --flavor l4x1 \
31
  https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
32
  your-images your-extracted-text
33
- ```
34
 
35
- ## 📚 Browse Scripts
36
 
37
- | Script Collection | Description | GPU Required |
38
- | ------------------------------------------------------------------------------- | --------------------------------------------------------- | ------------ |
39
- | [ocr](https://huggingface.co/datasets/uv-scripts/ocr) | Extract text from images with VLMs (LaTeX, tables, forms) | ✅ |
40
- | [classification](https://huggingface.co/datasets/uv-scripts/classification) | Text classification with guaranteed valid outputs | ✅ |
41
- | [dataset-creation](https://huggingface.co/datasets/uv-scripts/dataset-creation) | Create datasets from PDFs and files | ❌ |
42
- | [vllm](https://huggingface.co/datasets/uv-scripts/vllm) | High-performance inference with vLLM | ✅ |
43
- | [synthetic-data](https://huggingface.co/datasets/uv-scripts/synthetic-data) | Generate high-quality synthetic data with CoT reasoning | ✅ |
44
- | [deduplication](https://huggingface.co/datasets/uv-scripts/deduplication) | Remove duplicates using semantic similarity | ❌ |
45
- | [openai-oss](https://huggingface.co/datasets/uv-scripts/openai-oss) | Generate responses with visible reasoning traces | ✅ |
46
 
47
- ## 🎯 Why UV Scripts?
48
 
49
- ### Zero Setup
50
 
51
- No virtual environments, no dependency conflicts, no installation steps. UV handles everything automatically when you run the script.
 
 
 
52
 
53
- ### GPU Optimized
54
 
55
- Seamlessly run on local GPUs or scale to cloud with [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs). Same script, different compute.
 
 
56
 
57
- ## 🌟 Featured Scripts
58
 
59
- ### OCR Any Document Dataset
60
 
61
- Extract text from images with state-of-the-art accuracy:
62
 
63
- ```bash
64
- # Handles LaTeX, tables, forms, handwriting
65
- hf jobs uv run --flavor l4x1 \
66
- https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
67
- your-images extracted-text
68
- ```
69
 
70
- ### Deduplicate Datasets (CPU-Friendly!)
71
 
72
- Remove duplicates using semantic similarity - no GPU needed:
 
 
 
73
 
74
- ```bash
75
- # Fast semantic deduplication on CPU
76
- uv run https://huggingface.co/datasets/uv-scripts/deduplication/raw/main/semantic-dedupe.py \
77
- your-dataset text your-dataset-clean \
78
- --method duplicates --threshold 0.9
79
- ```
80
 
81
- ### Generate Synthetic Training Data
82
-
83
- Create high-quality synthetic data with chain-of-thought reasoning:
84
 
85
  ```bash
86
- # Generate synthetic math problems with reasoning
87
- hf jobs uv run --flavor l4x1 \
88
- https://huggingface.co/datasets/uv-scripts/synthetic-data/raw/main/cot-self-instruct.py \
89
- --seed-dataset math-examples --output-dataset synthetic-math \
90
- --task-type reasoning --num-samples 1000
91
- ```
92
-
93
- ## 🚀 Getting Started with HF Jobs
94
-
95
- Run any UV script on GPU infrastructure:
96
-
97
- ```bash
98
- hf jobs uv run --flavor l4x1 \
99
- https://huggingface.co/datasets/uv-scripts/[collection]/raw/main/[script].py \
100
  [args]
101
  ```
102
 
103
- Choose your GPU flavor:
104
  - `l4x1` - Good balance for most tasks
105
- - `a10g-large` - More memory for larger models
106
  - `a100-large` - Maximum performance
107
 
108
- ## 📖 Learn More
109
-
110
- - [UV Documentation](https://docs.astral.sh/uv/)
111
- - [HF Jobs Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs)
112
- - [Script Examples](https://github.com/astral-sh/uv/tree/main/scripts)
113
-
114
- ---
115
 
116
- _UV Scripts is a community project showcasing the power of [UV](https://github.com/astral-sh/uv) for ML workflows._
 
 
 
9
 
10
  # UV Scripts
11
 
12
+ **Run ML workflows on GPU with a single command - no setup required**
13
 
14
+ Self-contained Python scripts powered by [UV](https://docs.astral.sh/uv/guides/scripts/) and [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs). Pick a task, copy the command, and you're running on cloud GPUs in seconds.
15
 
16
+ ## Get Started
17
 
18
  ```bash
19
+ # Extract text from document images
20
  hf jobs uv run --flavor l4x1 \
21
  https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
22
  your-images your-extracted-text
 
23
 
24
+ # Detect objects in any image dataset
25
+ hf jobs uv run --flavor a100-large \
26
+ -s HF_TOKEN=HF_TOKEN \
27
+ https://huggingface.co/datasets/uv-scripts/sam3/raw/main/detect-objects.py \
28
+ your-images detected-objects --class-name photograph
29
+
30
+ # Fine-tune an LLM with Unsloth
31
+ hf jobs uv run \
32
+ https://huggingface.co/datasets/unsloth/jobs/raw/main/sft-lfm2.5.py \
33
+ --flavor a10g-small --secrets HF_TOKEN \
34
+ -- --dataset mlabonne/FineTome-100k --output-repo your-username/my-model
35
+ ```
36
 
37
+ Every script works locally too - just replace `hf jobs uv run --flavor ...` with `uv run`.
38
 
39
+ ## Browse Scripts
40
 
41
+ ### Document Processing
42
 
43
+ | Script | What it does |
44
+ | --- | --- |
45
+ | [ocr](https://huggingface.co/datasets/uv-scripts/ocr) | Extract text from images with VLMs - handles LaTeX, tables, forms, handwriting |
46
+ | [dataset-creation](https://huggingface.co/datasets/uv-scripts/dataset-creation) | Create HF datasets from PDFs and local files (CPU only) |
47
 
48
+ ### Computer Vision
49
 
50
+ | Script | What it does |
51
+ | --- | --- |
52
+ | [sam3](https://huggingface.co/datasets/uv-scripts/sam3) | Zero-shot object detection with natural language prompts using SAM3 |
53
 
54
+ ### Text Generation & Classification
55
 
56
+ | Script | What it does |
57
+ | --- | --- |
58
+ | [vllm](https://huggingface.co/datasets/uv-scripts/vllm) | High-performance GPU inference with vLLM (classification, VLM tasks) |
59
+ | [classification](https://huggingface.co/datasets/uv-scripts/classification) | Text classification with structured outputs and guaranteed valid labels |
60
+ | [openai-oss](https://huggingface.co/datasets/uv-scripts/openai-oss) | Generate responses using OpenAI's open-source reasoning models |
61
+ | [synthetic-data](https://huggingface.co/datasets/uv-scripts/synthetic-data) | Generate synthetic training data with chain-of-thought reasoning |
62
 
63
+ ### Model Training
64
 
65
+ | Script | What it does |
66
+ | --- | --- |
67
+ | [unsloth/jobs](https://huggingface.co/datasets/unsloth/jobs) | Fine-tune LLMs and VLMs with Unsloth - LFM2.5, Qwen3-VL, Gemma3, continued pretraining |
 
 
 
68
 
69
+ ### Analysis & Visualization
70
 
71
+ | Script | What it does |
72
+ | --- | --- |
73
+ | [dataset-stats](https://huggingface.co/datasets/uv-scripts/dataset-stats) | Analyze dataset statistics with streaming and Polars (CPU only) |
74
+ | [build-atlas](https://huggingface.co/datasets/uv-scripts/build-atlas) | Generate interactive embedding visualizations with Apple's Atlas |
75
 
76
+ ## Running on HF Jobs
77
 
78
+ All GPU scripts are designed to run on [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs):
 
 
79
 
80
  ```bash
81
+ hf jobs uv run --flavor <gpu> \
82
+ https://huggingface.co/datasets/uv-scripts/<collection>/raw/main/<script>.py \
83
  [args]
84
  ```
85
 
86
+ Available GPU flavors:
87
  - `l4x1` - Good balance for most tasks
88
+ - `a10g-large` - More VRAM for larger models
89
  - `a100-large` - Maximum performance
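
The command template above can also be assembled programmatically. The following is a minimal Python sketch and is not part of this commit: `build_job_command` and `BASE_URL` are hypothetical names introduced for illustration, and the helper only builds the argument list shown in the README instead of calling any Hugging Face API.

```python
import shlex

# Assumption: every collection lives under the uv-scripts organization,
# as in the README's command template.
BASE_URL = "https://huggingface.co/datasets/uv-scripts"


def build_job_command(collection, script, flavor="l4x1", script_args=()):
    """Return the argv list for `hf jobs uv run` on one uv-scripts script."""
    url = f"{BASE_URL}/{collection}/raw/main/{script}.py"
    return ["hf", "jobs", "uv", "run", "--flavor", flavor, url, *script_args]


# Rebuild the OCR quick-start command from the README.
cmd = build_job_command(
    "ocr",
    "nanonets-ocr",
    script_args=["your-images", "your-extracted-text"],
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` would launch the job, assuming the `hf` CLI is installed and authenticated.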
90
 
91
+ ## Learn More
92
 
93
+ - [UV Script Documentation](https://docs.astral.sh/uv/guides/scripts/)
94
+ - [HF Jobs Documentation](https://huggingface.co/docs/hub/jobs)
95
+ - [HF Jobs Python Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs)
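
For readers new to the format, the inline metadata described in the UV script documentation looks roughly like the sketch below. The file name `word_count.py`, the `count_words` helper, and the empty dependency list are illustrative assumptions, not a script from the uv-scripts organization; a script with third-party requirements would list them under `dependencies`.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
"""Hypothetical word_count.py: a self-contained script with inline metadata."""
import sys


def count_words(texts):
    """Return the number of whitespace-separated words in each text."""
    return [len(text.split()) for text in texts]


def main(texts):
    # Print one "count<TAB>text" line per input string.
    for text, n_words in zip(texts, count_words(texts)):
        print(f"{n_words}\t{text}")


if __name__ == "__main__":
    # Fall back to a sample sentence when no arguments are given.
    main(sys.argv[1:] or ["hello from a uv script"])
```

Saved as `word_count.py`, this would run with `uv run word_count.py "some text"`, with uv reading the comment block at the top and preparing an environment before executing the script.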