Update Readme
README.md CHANGED
@@ -5,20 +5,34 @@ base_model: meta-llama/Llama-3.2-3B-Instruct
 ---
 
 ## Model Overview
-Schematron
+Welcome to the Schematron series, [Inference.net's](https://inference.net/) long‑context extraction models specialized in converting noisy HTML into clean, typed JSON that conforms to your custom schema. The Schematron series was purpose‑trained for web scraping, data ingestion, and transforming arbitrary pages into structured records.
+
+We're releasing these models in two different sizes:
+
+- **Schematron‑8B** — marginal quality lift on harder/longer pages
+- **Schematron‑3B** — recommended default; near‑parity quality at ~50% cost of Schematron-8B
+
+> [!NOTE]
+> This model card is dedicated to the smaller `Schematron-3B` model. Check out [`Schematron-8B`](https://huggingface.co/inference-net/Schematron-8B) for the larger model.
+
+## I/O at a glance
+- **Input**: Cleaned HTML + JSON Schema (or typed model like Pydantic/Zod)
+- **Output**: Strictly valid JSON conforming to the provided schema (no narration)
+
 
 ## Highlights
 - **Schema-first extraction**: Strict, schema‑conformant JSON outputs
+- **Simple I/O**: HTML + schema → JSON
 - **Long context**: Robust to lengthy, noisy HTML (up to 128K tokens)
 - **Reliable structure**: Works well with JSON mode and typed parsers
-- **Variants**:
+- **Variants**: 3B (default, most cost‑efficient) · 8B (marginal quality lift at ~2× cost)
 
 ## Model Details
 - **Family**: Schematron (3B and 8B)
 - **Base**: Instruction‑tuned LLM, fine‑tuned for schema‑guided extraction
 - **Context window**: Up to 128K tokens
-- **Input**:
-- **Output**:
+- **Input**: Cleaned or raw HTML and a JSON Schema (or typed model)
+- **Output**: Strict JSON that conforms to the provided schema
 
 ## Minimal Quickstart
Use these local snippets to prepare HTML and compose a schema‑guided prompt. The model returns strictly valid JSON; validate it against your schema downstream.
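
The line above asks readers to validate the model's JSON against their schema downstream. That step could look roughly like the sketch below. It is not taken from the README and uses only the Python standard library. The `validate` and `parse_and_validate` helpers, `PRODUCT_SCHEMA`, and the sample outputs are all hypothetical, and the checker only understands a small subset of JSON Schema (`type` as a single string, `required`, `properties`, `items`). A full validator or a typed model such as Pydantic would be the more likely choice in practice.

```python
import json

# Maps JSON Schema type names to Python types (subset only).
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
    "null": type(None),
}


def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value conforms."""
    errors = []
    expected = schema.get("type")  # assumption: a single type name, not a list
    if expected is not None:
        py_type = _TYPES[expected]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if not isinstance(value, py_type) or is_bool_as_number:
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required property")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub, f"{path}.{key}"))
    elif isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


def parse_and_validate(raw_output, schema):
    """Parse the model's raw text as JSON, then check it against the schema."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    return data, validate(data, schema)


# Hypothetical schema and model outputs, for illustration only.
PRODUCT_SCHEMA = {
    "type": "object",
    "required": ["name", "price"],
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

good = '{"name": "Widget", "price": 9.5, "tags": ["a", "b"]}'
bad = '{"name": "Widget", "tags": ["a", 3]}'

print(parse_and_validate(good, PRODUCT_SCHEMA)[1])
print(parse_and_validate(bad, PRODUCT_SCHEMA)[1])
```

Returning a list of errors rather than raising on the first one keeps every problem in a record visible at once, which tends to be more useful when triaging scraped pages in bulk.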