Libraries

How to use OmAlve/reading-steiner with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="OmAlve/reading-steiner")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("OmAlve/reading-steiner")
model = AutoModelForCausalLM.from_pretrained("OmAlve/reading-steiner")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use OmAlve/reading-steiner with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "OmAlve/reading-steiner"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OmAlve/reading-steiner",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
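
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server from the previous step is listening on `localhost:8000`, and the helper names `build_chat_request` and `send_chat_request` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumed defaults: URL and model name are copied from the curl example.
VLLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "OmAlve/reading-steiner"


def build_chat_request(prompt, url=VLLM_URL, model=MODEL_ID):
    """Build (but do not send) a POST request mirroring the curl call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return an OpenAI-style JSON object, where the reply text is normally found under `choices[0]["message"]["content"]`.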

Use Docker

docker model run hf.co/OmAlve/reading-steiner

SGLang

How to use OmAlve/reading-steiner with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "OmAlve/reading-steiner" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OmAlve/reading-steiner",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "OmAlve/reading-steiner" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use OmAlve/reading-steiner with Docker Model Runner:
```
docker model run hf.co/OmAlve/reading-steiner
```

OmAlve committed on Apr 24

Commit dd73e30 (verified) · 1 parent: 4cf1cb1

Training in progress, step 1500

Files changed (3)

README.md +32 -183
model.safetensors +1 -1
runs/Apr24_08-06-45_1eb67182ed08/events.out.tfevents.1777018005.1eb67182ed08.53075.0 +2 -2

README.md CHANGED

@@ -1,209 +1,58 @@
 ---
-language:
-- en
-license: apache-2.0
-license_link: https://huggingface.co/Qwen/Qwen3-0.6B/blob/main/LICENSE
-library_name: transformers
 base_model: Qwen/Qwen3-0.6B
 tags:
 - trl
 - sft
-- qwen3
-- web-extraction
-- indexlm
-- reading-steiner
-pipeline_tag: text-generation
----
-# Reading Steiner
-**Reading Steiner** is a **8192-token (8k) context** supervised fine-tuned (SFT) model for **index-based web content extraction**, in the spirit of [IndexLM](https://arxiv.org/abs/2512.06641). It reads a page as **numbered blocks** `[i] <tag>…</tag>` and predicts **inclusive index intervals** for either a **user query** (query-relevant) or **main body** text (main-content), as plain text like `[[2,4],[7,7]]` or `NA`.
-- **Context length:** **8192** tokens (`max_length` in training)
-- **Base model:** [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) (~596M parameters)
-- **Training data:** [OmAlve/reading-steiner-data](https://huggingface.co/datasets/OmAlve/reading-steiner-data) (`messages` SFT)
-- **Paper:** [An Index-based Approach for Efficient and Effective Web Content Extraction](https://arxiv.org/abs/2512.06641)
-## Intended use
-1. **Query-relevant (QE)** — blocks that support answering a question.
-2. **Main-content (ME)** — blocks that are the article body vs nav/ads/sidebars.
-You supply **blocks**; the model does not fetch URLs or parse raw HTML trees.
----
-## System prompts (training)
-### Query-relevant (QE)
-```
-You are Reading Steiner, a web content extraction model. Given a webpage split into indexed blocks and a user query, identify which blocks contain content relevant to the query.
-Each block is formatted as: [i] <tag>content</tag>
-Output the indices of relevant blocks as a Python list of [start, end] intervals (inclusive).
-If no relevant content exists, output 'NA'.
-Example output: [[2,4],[7,7],[10,12]]
-```
-### Main-content (ME)
-```
-You are Reading Steiner, a web content extraction model. Given a webpage split into indexed blocks, identify which blocks contain the main content of the page (filtering out navigation, advertisements, sidebars, and other non-content elements).
-Each block is formatted as: [i] <tag>content</tag>
-Output the indices of main content blocks as a Python list of [start, end] intervals (inclusive).
-If no main content exists, output 'NA'.
-Example output: [[1,3],[5,8],[11,15]]
-```
----
-## User message format
-### QE
-```text
-URL: <string>
-Query: <question>
-Blocks:
-<one block per line>
-Output the index intervals of blocks relevant to the query.
-```
-### ME
-```text
-URL: <string>
-Title: <page title>
-Blocks:
-<one block per line>
-Output the index intervals of main content blocks.
-```
 ---
-## Minimal examples (full `messages`)
-### Example A — QE
-| Role | Content |
-|------|---------|
-| system | *(QE system prompt above)* |
-| user | `URL: https://example.com/article\nQuery: What substance does the article say was detected?\n\nBlocks:\n[1] <nav>Home \| Science</nav>\n[2] <h1>Water on Mars</h1>\n[3] <p>Researchers reported trace amounts of perchlorate in regolith samples.</p>\n[4] <div class="ad">Subscribe for more space news</div>\n\nOutput the index intervals of blocks relevant to the query.` |
-| assistant | `[[3,3]]` |
-### Example B — ME
-| Role | Content |
-|------|---------|
-| system | *(ME system prompt above)* |
-| user | `URL: https://example.com/news\nTitle: Local river cleanup\n\nBlocks:\n[1] <nav>Home \| City \| Sports</nav>\n[2] <h1>Volunteers clear three tons of debris</h1>\n[3] <p>Organizers said turnout doubled last year's event.</p>\n[4] <p>The next cleanup is scheduled for May.</p>\n[5] <aside>Popular: Weather \| Traffic</aside>\n\nOutput the index intervals of main content blocks.` |
-| assistant | `[[2,4]]` |
-### Example C — QE (no answer)
-| Role | Content |
-|------|---------|
-| system | *(QE system prompt above)* |
-| user | `URL: https://example.com/page\nQuery: What is the stock price of ACME Corp?\n\nBlocks:\n[1] <h1>Baking tips</h1>\n[2] <p>Preheat the oven to 350°F.</p>\n\nOutput the index intervals of blocks relevant to the query.` |
-| assistant | `NA` |
----
-## Runnable inference (Transformers, 8k-capable checkpoint)
-Use **`enable_thinking=False`** on Qwen3 for stable interval-style completions.
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-import torch
-SYSTEM_QE = """You are Reading Steiner, a web content extraction model. Given a webpage split into indexed blocks and a user query, identify which blocks contain content relevant to the query.
-Each block is formatted as: [i] <tag>content</tag>
-Output the indices of relevant blocks as a Python list of [start, end] intervals (inclusive).
-If no relevant content exists, output 'NA'.
-Example output: [[2,4],[7,7],[10,12]]"""
-blocks = """[1] <nav>Home | Science</nav>
-[2] <h1>Water on Mars</h1>
-[3] <p>Researchers reported trace amounts of perchlorate in regolith samples.</p>
-[4] <div class="ad">Subscribe for more space news</div>"""
-user = f"""URL: https://example.com/article
-Query: What substance does the article say was detected?
-Blocks:
-{blocks}
-Output the index intervals of blocks relevant to the query."""
-messages = [
-    {"role": "system", "content": SYSTEM_QE},
-    {"role": "user", "content": user},
-]
-model_id = "OmAlve/reading-steiner"
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(
-    model_id, torch_dtype=torch.bfloat16, device_map="auto"
-)
-inputs = tokenizer.apply_chat_template(
-    messages,
-    tokenize=True,
-    add_generation_prompt=True,
-    return_tensors="pt",
-    enable_thinking=False,
-).to(model.device)
-out = model.generate(inputs, max_new_tokens=128, do_sample=False)
-print(tokenizer.decode(out[0][inputs.shape[-1] :], skip_special_tokens=True))
 ```
-**ME variant** — same flow; replace system with the **ME** block above and user text with the **ME** template (`URL`, `Title`, `Blocks`, main-content closing line).
----
-## Training summary
-| Setting | Value |
-|--------|--------|
-| **Max sequence length** | **8192 (8k)** |
-| Objective | Causal LM SFT (`messages`) |
-| Learning rate | 2e-5, cosine, warmup 5% |
-| Epochs | 3 |
-| Precision | bf16, gradient checkpointing |
-| Eval / save | Every 500 steps; best by `eval_loss` |
-## Limitations
-Small 0.6B model — validate intervals; match training **system** + **user** layout for best behavior. Derivative of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B); follow its license.
 ## Citations
-```bibtex
-@article{indexlm2025,
-  title={An Index-based Approach for Efficient and Effective Web Content Extraction},
-  journal={arXiv preprint arXiv:2512.06641},
-  year={2025},
-  url={https://arxiv.org/abs/2512.06641}
-}
-```
 ```bibtex
 @misc{vonwerra2022trl,
-  title={{TRL: Transformer Reinforcement Learning}},
-  author={Leandro von Werra and others},
-  howpublished={\url{https://github.com/huggingface/trl}},
-  year={2022}
 }
-```

 ---
 base_model: Qwen/Qwen3-0.6B
+library_name: transformers
+model_name: reading-steiner
 tags:
+- generated_from_trainer
 - trl
 - sft
+licence: license
 ---
+# Model Card for reading-steiner
+This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
 ```python
+from transformers import pipeline
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="OmAlve/reading-steiner", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
 ```
+## Training procedure
+This model was trained with SFT.
+### Framework versions
+- TRL: 0.24.0
+- Transformers: 5.5.0
+- Pytorch: 2.5.1+cu124
+- Datasets: 4.3.0
+- Tokenizers: 0.22.2
 ## Citations
+Cite TRL as:
 ```bibtex
 @misc{vonwerra2022trl,
+	title        = {{TRL: Transformer Reinforcement Learning}},
+	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+	year         = 2020,
+	journal      = {GitHub repository},
+	publisher    = {GitHub},
+	howpublished = {\url{https://github.com/huggingface/trl}}
 }
+```
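
The README removed in this commit described the model's output as inclusive index intervals such as `[[2,4],[7,7]]`, or `NA` when nothing matches. Since the new README no longer documents this, here is a minimal sketch of how such output could be parsed and applied to a list of blocks. The function names are hypothetical, and 1-based block numbering is an assumption taken from the removed examples, which start at `[1]`.

```python
import ast


def parse_intervals(text):
    """Parse output like '[[2,4],[7,7]]' into a list of (start, end) tuples.

    Returns an empty list for 'NA' and raises ValueError on malformed text.
    """
    text = text.strip()
    if text.strip("'\"").upper() == "NA":
        return []
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"not an interval list: {text!r}") from exc
    if not isinstance(value, list):
        raise ValueError(f"expected a list of intervals, got: {text!r}")

    intervals = []
    for item in value:
        is_pair = isinstance(item, (list, tuple)) and len(item) == 2
        if not is_pair or not all(type(i) is int for i in item):
            raise ValueError(f"bad interval: {item!r}")
        if item[0] > item[1]:
            raise ValueError(f"start exceeds end: {item!r}")
        intervals.append((item[0], item[1]))
    return intervals


def select_blocks(blocks, intervals):
    """Return blocks whose 1-based index lies in any inclusive interval."""
    wanted = set()
    for start, end in intervals:
        wanted.update(range(start, end + 1))
    return [block for i, block in enumerate(blocks, start=1) if i in wanted]
```

Using `ast.literal_eval` rather than `eval` keeps the parser from executing arbitrary model output; a stricter JSON parser would work equally well for this format.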

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:180fcda07b8a0f440d74d4a3fe33cd648aeb70b70594d2a0e3a577ec65c388fe
 size 1192135096

 version https://git-lfs.github.com/spec/v1
+oid sha256:2db0901acf2ac1095e55f08a3d3d467aa4d41eafe77e386841c1d84ded32576b
 size 1192135096
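
The diff above shows Git LFS pointer files rather than the weights themselves: the `oid` changes while `size` stays at 1192135096 bytes, which is what one would expect when the same architecture is saved with updated weights. As a rough sketch, a downloaded file could be checked against such a pointer as follows. The helper names are hypothetical, and the pointer text is assumed to have had its diff markers (`+`, `-`, leading space) stripped first.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and hash."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    hasher = hashlib.sha256()
    total = 0
    with open(path, "rb") as handle:
        # Read in chunks so multi-gigabyte weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]
```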

runs/Apr24_08-06-45_1eb67182ed08/events.out.tfevents.1777018005.1eb67182ed08.53075.0 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f224409338f39b26c93a7665d42639c290999f162e5cd8df9e7ffb3e201a6b41
-size 45007

 version https://git-lfs.github.com/spec/v1
+oid sha256:acbdff9c3a12aa0ca1143ae6020e518d4b588e6212f95ca1c74192a762316dba
+size 64546