opensporks committed on
Commit
6f415b6
·
verified ·
1 Parent(s): 8f1c9ca

Update Readme

Files changed (1)
  1. README.md +18 -4
README.md CHANGED
@@ -5,20 +5,34 @@ base_model: meta-llama/Llama-3.2-3B-Instruct
5
  ---
6
 
7
  ## Model Overview
8
- Schematron is a long‑context extraction model for converting noisy HTML into clean, typed JSON that conforms to a user‑provided schema. It is purpose‑built for web scraping, data ingestion, and turning arbitrary pages into structured records.
9
 
10
  ## Highlights
11
  - **Schema-first extraction**: Strict, schema‑conformant JSON outputs
 
12
  - **Long context**: Robust to lengthy, noisy HTML (up to 128K tokens)
13
  - **Reliable structure**: Works well with JSON mode and typed parsers
14
- - **Variants**: Schematron‑8B (quality) and Schematron‑3B (cost)
15
 
16
  ## Model Details
17
  - **Family**: Schematron (3B and 8B)
18
  - **Base**: Instruction‑tuned LLM, fine‑tuned for schema‑guided extraction
19
  - **Context window**: Up to 128K tokens
20
- - **Input**: Raw or lightly cleaned HTML
21
- - **Output**: Strictly valid JSON matching your schema
22
 
23
  ## Minimal Quickstart
24
  Use these local snippets to prepare HTML and compose a schema‑guided prompt. The model returns strictly valid JSON; validate it against your schema downstream.
 
5
  ---
6
 
7
  ## Model Overview
8
+ Welcome to the Schematron series, [Inference.net's](https://inference.net/) long‑context extraction models specialized in converting noisy HTML into clean, typed JSON that conforms to your custom schema. The Schematron series was purpose‑trained for web scraping, data ingestion, and transforming arbitrary pages into structured records.
9
+
10
+ We're releasing these models in two different sizes:
11
+
12
+ - **Schematron‑8B**: marginal quality lift on harder/longer pages
13
+ - **Schematron‑3B**: recommended default; near‑parity quality at ~50% of the cost of Schematron‑8B
14
+
15
+ > [!NOTE]
16
+ > This model card is dedicated to the smaller `Schematron-3B` model. Check out [`Schematron-8B`](https://huggingface.co/inference-net/Schematron-8B) for the larger model.
17
+
18
+ ## I/O at a glance
19
+ - **Input**: Cleaned HTML + JSON Schema (or typed model like Pydantic/Zod)
20
+ - **Output**: Strictly valid JSON conforming to the provided schema (no narration)
21
+
22
 
23
  ## Highlights
24
  - **Schema-first extraction**: Strict, schema‑conformant JSON outputs
25
+ - **Simple I/O**: HTML + schema → JSON
26
  - **Long context**: Robust to lengthy, noisy HTML (up to 128K tokens)
27
  - **Reliable structure**: Works well with JSON mode and typed parsers
28
+ - **Variants**: 3B (default, most cost‑efficient) · 8B (marginal quality lift at ~2× cost)
29
 
30
  ## Model Details
31
  - **Family**: Schematron (3B and 8B)
32
  - **Base**: Instruction‑tuned LLM, fine‑tuned for schema‑guided extraction
33
  - **Context window**: Up to 128K tokens
34
+ - **Input**: Cleaned or raw HTML and a JSON Schema (or typed model)
35
+ - **Output**: Strict JSON that conforms to the provided schema
36
 
37
  ## Minimal Quickstart
38
  Use these local snippets to prepare HTML and compose a schema‑guided prompt. The model returns strictly valid JSON; validate it against your schema downstream.
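
As a rough illustration of the "HTML + schema → JSON" flow and the downstream validation mentioned above, here is a minimal standard-library sketch. The prompt wording, the `PRODUCT_SCHEMA` example, and the `build_prompt` / `check_record` helpers are all assumptions made for illustration; they are not the official Schematron prompt format or API.

```python
import json

# Hypothetical schema for illustration; not taken from the model card.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name", "price"],
}

# Subset of JSON Schema primitive types mapped to Python types.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def build_prompt(schema: dict, html: str) -> str:
    """Compose a schema-guided prompt (the wording here is an assumption)."""
    return (
        "Extract data from the HTML below as JSON matching this schema.\n"
        f"Schema:\n{json.dumps(schema, indent=2)}\n"
        f"HTML:\n{html}"
    )


def check_record(raw: str, schema: dict) -> dict:
    """Parse model output; check required keys and top-level property types.

    This is a shallow check, not a full JSON Schema validator.
    """
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    for key in schema.get("required", []):
        if key not in record:
            raise ValueError(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(spec.get("type"))
        if key in record and expected and not isinstance(record[key], expected):
            raise ValueError(f"field {key!r} has the wrong type")
    return record
```

In practice a full validator (or a typed model such as Pydantic, as the card suggests) would replace `check_record`; the shallow check only shows where validation sits in the flow.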