LayoutLMv3 Table of Contents Detector

This model is a fine-tuned version of microsoft/layoutlmv3-base for detecting Table of Contents (TOC) pages in documents.

Model Description

Model type: LayoutLMv3 for binary sequence classification
Language: Primarily English (the training data also includes some documents in other languages)
Task: Binary classification (TOC vs non-TOC page)
Base model: microsoft/layoutlmv3-base

Training Data

The model was fine-tuned on a custom dataset of 54 document pages:

TOC pages: 27 examples
Non-TOC pages: 27 examples
Sources: Various books and academic documents
Balance: Perfectly balanced (50/50)

The dataset includes:

Traditional TOC with page numbers (right-aligned)
Hierarchical TOC with chapter numbers (1, 1.1, 1.1.1)
Various formatting styles
Multiple languages and document types

Training Procedure

Training Hyperparameters

Epochs: 10
Batch size: 1 (with gradient accumulation of 4 steps)
Learning rate: 2e-5 with linear warmup
Optimizer: AdamW
Device: NVIDIA GeForce RTX 3050 4GB
Training time: ~2 minutes
Date: February 21, 2026
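The hyperparameters listed above use a batch size of 1 with 4 gradient-accumulation steps, which gives an effective batch size of 4. They also use a learning rate of 2e-5 with linear warmup. The plain-Python sketch below shows how the number of optimizer updates and the learning rate per update follow from these settings. It makes two assumptions. First, it assumes the common "linear warmup, then linear decay to zero" shape, as in Transformers' `get_linear_schedule_with_warmup`; the card itself only states "linear warmup". Second, the card does not give the number of warmup steps, so `warmup_steps` is a hypothetical parameter. The function names are illustrative.

```python
import math


def optimizer_steps_per_epoch(num_examples: int, batch_size: int = 1, grad_accum: int = 4) -> int:
    """Optimizer updates per epoch when gradients are accumulated over grad_accum micro-batches.

    Each update consumes up to batch_size * grad_accum examples (the effective batch size).
    In this sketch, a final partial accumulation window still triggers one update.
    """
    micro_batches = math.ceil(num_examples / batch_size)
    return math.ceil(micro_batches / grad_accum)


def linear_warmup_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float = 2e-5) -> float:
    """Learning rate at a given optimizer step.

    Ramps linearly from 0 to peak_lr over warmup_steps, then (assumption) decays
    linearly back to 0 at total_steps. warmup_steps is hypothetical; the card does not state it.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))
```

As an illustration, take a hypothetical 27 training pages. `optimizer_steps_per_epoch(27)` returns 7 updates per epoch, or 70 over 10 epochs, and `linear_warmup_lr(step, 70, warmup_steps=7)` then peaks at 2e-5 at step 7.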

Training Results

Epoch	Train Loss	Train Acc	Val Loss	Val Acc
1	0.6768	59.26%	0.6706	57.14%
3	0.6045	81.48%	0.6031	71.43%
6	0.1850	92.59%	0.5292	85.71%
7	0.1001	96.30%	0.0830	100.00%
10	0.0048	100.00%	0.0058	100.00%

Final Test Metrics:

Overall Accuracy: 100.00% (54/54 correct)
TOC Detection: 100.00% (27/27 correct)
Non-TOC Detection: 100.00% (27/27 correct)
Best Epoch: Epoch 7
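The "TOC Detection" and "Non-TOC Detection" figures above are accuracies computed separately over the TOC pages and over the non-TOC pages; in other words, they are per-class recall. The sketch below shows that computation in plain Python. It assumes label 1 = TOC page and 0 = non-TOC page, matching the `prediction == 1` check in How to Use. The function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def toc_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Overall accuracy plus per-class accuracy (recall) for a binary TOC classifier.

    Assumes label 1 = TOC page and 0 = non-TOC page.
    Returns NaN for a class that has no examples.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs: List[Tuple[int, int]] = list(zip(y_true, y_pred))

    def accuracy(subset: List[Tuple[int, int]]) -> float:
        # Fraction of (true, predicted) pairs that agree.
        return sum(t == p for t, p in subset) / len(subset) if subset else float("nan")

    return {
        "overall": accuracy(pairs),
        "toc": accuracy([(t, p) for t, p in pairs if t == 1]),
        "non_toc": accuracy([(t, p) for t, p in pairs if t == 0]),
    }
```

For example, take toy labels `[1, 1, 0, 0]` and predictions `[1, 0, 0, 0]`. The result is 0.75 overall, 0.5 on TOC pages, and 1.0 on non-TOC pages.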

Comparison with Baseline

Method	Dataset	Accuracy	Speed
Rule-based (original)	N/A	85.3%	17.7s
LayoutLMv3 (this model)	54 pages	100.00% ✨	3.1s

This model is 5.7x faster and 14.7 percentage points more accurate than the rule-based approach.
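Both headline numbers follow directly from the comparison table. The speedup is the ratio of the two run times, and the accuracy gain is a difference of percentages, not a relative improvement:

```latex
\text{speedup} = \frac{17.7\ \text{s}}{3.1\ \text{s}} \approx 5.7\times,
\qquad
\Delta_{\text{acc}} = 100.00\% - 85.3\% = 14.7\ \text{percentage points}
```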

Intended Use

Primary Use Case

Detecting whether a given document page is a Table of Contents page. This is useful for:

Document structure analysis
Automatic TOC extraction
Document navigation systems
Book/paper digitization pipelines
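In a digitization or TOC-extraction pipeline, the classifier is run once per page. A table of contents often spans several consecutive pages, so it can help to merge consecutive positive pages into ranges. The sketch below does this in plain Python. It assumes a hypothetical per-page predicate `is_toc(page) -> bool`, for example a wrapper around the prediction code in How to Use. The function name and the grouping step are illustrative and are not part of this model.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

Page = TypeVar("Page")


def find_toc_ranges(pages: Iterable[Page], is_toc: Callable[[Page], bool]) -> List[Tuple[int, int]]:
    """Return inclusive, 0-based (start, end) index ranges of consecutive pages classified as TOC.

    is_toc is a hypothetical callable, e.g. one that runs the LayoutLMv3 classifier on a page image.
    """
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last = -1
    for i, page in enumerate(pages):
        if is_toc(page):
            if start is None:
                start = i  # a new run of TOC pages begins
        elif start is not None:
            ranges.append((start, i - 1))  # the run ended on the previous page
            start = None
        last = i
    if start is not None:
        ranges.append((start, last))  # close a run that reaches the final page
    return ranges
```

For example, `find_toc_ranges([0, 1, 1, 0, 1], lambda p: p == 1)` returns `[(1, 2), (4, 4)]`.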

How to Use

import torch
from transformers import LayoutLMv3Processor, LayoutLMv3ForSequenceClassification
from PIL import Image
from doctr.models import ocr_predictor
from doctr.io import DocumentFile

# Load model and processor.
# apply_ocr=False because words and boxes come from docTR below; with the
# default apply_ocr=True the processor rejects user-supplied boxes.
model = LayoutLMv3ForSequenceClassification.from_pretrained("ssppkenny/layoutlmv3-toc-detector")
processor = LayoutLMv3Processor.from_pretrained("ssppkenny/layoutlmv3-toc-detector", apply_ocr=False)

# Load the page image and run docTR OCR on it
image = Image.open("page.png").convert("RGB")
ocr_model = ocr_predictor(pretrained=True)
doc = DocumentFile.from_images("page.png")
result = ocr_model(doc)

# Extract words and boxes.
# docTR geometry is relative ((xmin, ymin), (xmax, ymax)) in [0, 1];
# LayoutLMv3 expects integer boxes on a 0-1000 scale.
words, boxes = [], []
doc_dict = result.export()

for page in doc_dict['pages']:
    for block in page['blocks']:
        for line in block['lines']:
            for word_data in line['words']:
                text = word_data['value'].strip()
                if text:
                    (x0, y0), (x1, y1) = word_data['geometry']
                    words.append(text)
                    boxes.append([
                        int(x0 * 1000),
                        int(y0 * 1000),
                        int(x1 * 1000),
                        int(y1 * 1000),
                    ])

# Prepare input (pad/truncate to LayoutLMv3's 512-token limit)
encoding = processor(image, words, boxes=boxes, return_tensors="pt",
                     padding="max_length", truncation=True, max_length=512)

# Predict (label 1 = TOC page); no gradients are needed for inference
with torch.no_grad():
    outputs = model(**encoding)
prediction = torch.argmax(outputs.logits, dim=1).item()
confidence = torch.softmax(outputs.logits, dim=1)[0][prediction].item()

print(f"Is TOC: {prediction == 1}")
print(f"Confidence: {confidence:.2%}")
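The box-conversion step in the example above can be isolated into a small helper. docTR's exported word `geometry` is a pair of relative corner coordinates `((xmin, ymin), (xmax, ymax))` in [0, 1], and LayoutLMv3 expects integer boxes on a 0-1000 scale. The conversion is therefore a scale-and-round, and the page width and height are not needed. This is a sketch: the helper name is hypothetical, and the rounding and clamping are added safeguards that the original example does not have.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def doctr_geometry_to_layoutlm_box(geometry: Tuple[Point, Point]) -> List[int]:
    """Convert a docTR relative box ((xmin, ymin), (xmax, ymax)) to LayoutLMv3's 0-1000 integer box."""
    (x0, y0), (x1, y1) = geometry

    def scale(v: float) -> int:
        # Round to the nearest integer and clamp, since OCR boxes can slightly overshoot [0, 1].
        return min(1000, max(0, round(v * 1000)))

    return [scale(x0), scale(y0), scale(x1), scale(y1)]
```

Inside the OCR loop, `boxes.append(doctr_geometry_to_layoutlm_box(word_data['geometry']))` would replace the inline arithmetic.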

Full Integration Example

For a complete document reflow system using this model, see: https://github.com/ssppkenny/segmentation

Limitations

Training data size: only 34 examples, so the model may not generalize to all TOC styles
Language: Primarily trained on English documents
Page quality: Best results with clear, high-quality scans
False positives: May misclassify pages with numbered lists as TOC

Bias and Fairness

The model was trained on a diverse set of document types (academic papers, books, technical documents) but may have biases toward:

Western document formatting conventions
English language documents
Modern typography

Citation

If you use this model, please cite:

@misc{layoutlmv3-toc-detector,
  author = {Sergey},
  title = {LayoutLMv3 Table of Contents Detector},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ssppkenny/layoutlmv3-toc-detector}},
}

License

MIT License - Free for commercial and non-commercial use

Acknowledgments

Base model: Microsoft LayoutLMv3
OCR: mindee/doctr
Training framework: HuggingFace Transformers

Contact

For issues or questions:


Model format: Safetensors
Model size: 0.1B params
Tensor type: F32


Evaluation results

Accuracy (self-reported): 0.882