OptiTransfer Data

Premium web corpora for LLM pre-training, fine-tuning, RAG, and multilingual NLP. Swiss-registered. EU AI Act compliant. Quality-scored, PII-redacted, SHA256-verified.

Swiss-Registered | EU AI Act Compliant | SHA256 Verified | PII Redacted

Capabilities

LLM Training

Sovereign national web corpora at scale for pre-training and supervised fine-tuning

RAG Pipelines

Pre-chunked, embedding-ready corpora with quality scores per chunk

Regulatory NLP

Domain-classified, jurisdiction-specific government and institutional data

Research

Reproducible datasets with full metadata, provenance tracking, and QA reports

Available Datasets

Dataset: *.ch Swiss Web Premium (A+)
Records: 110,491
Formats: Parquet, JSONL, Language Splits, RAG Chunks
Access: Sample | Full

Flagship Swiss web corpus from the .ch ccTLD. 112.4M tokens and a 78-field schema. Multilingual coverage: German (61.2%), French (19.0%), English (10.5%), Italian (4.7%), and 25 additional languages. Nine-component quality model, full provenance chain, and independent QA report.
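As a rough illustration of working with the JSONL format, the sketch below filters records by language and quality score. The field names "text", "language", and "quality_score", the 0 to 1 score range, and the sample rows are all assumptions made for this example; they are not taken from the published 78-field schema, so check the dataset documentation for the real names.

```python
import json
from collections import Counter

# Hypothetical JSONL records. The field names "text", "language", and
# "quality_score" are assumptions, not the published schema.
SAMPLE_JSONL = """\
{"text": "Willkommen in Zürich.", "language": "de", "quality_score": 0.91}
{"text": "Bienvenue à Genève.", "language": "fr", "quality_score": 0.84}
{"text": "Click here!!!", "language": "en", "quality_score": 0.22}
{"text": "Benvenuti a Lugano.", "language": "it", "quality_score": 0.88}
"""


def load_records(jsonl_text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def filter_records(records, languages=None, min_quality=0.0):
    """Keep records in the given languages at or above the quality threshold."""
    kept = []
    for rec in records:
        if languages is not None and rec["language"] not in languages:
            continue
        if rec["quality_score"] < min_quality:
            continue
        kept.append(rec)
    return kept


def language_shares(records):
    """Return each language's share of records, in percent."""
    counts = Counter(rec["language"] for rec in records)
    total = sum(counts.values())
    return {lang: 100.0 * n / total for lang, n in counts.items()}


records = load_records(SAMPLE_JSONL)
high_quality = filter_records(records, languages={"de", "fr", "it"}, min_quality=0.8)
print(len(high_quality))
print(language_shares(records))
```

The share calculation here counts records; the percentages quoted for the corpus may be computed on a different basis, such as tokens.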

LLM Pre-Training | Supervised Fine-Tuning (SFT) | Retrieval-Augmented Generation | Multilingual NLP | German Language Models | French Language Models | Swiss Market AI | EU AI Act Compliance | Domain-Specific Training | Web Corpus Research | Text Classification | Summarisation | Question Answering | Translation

Free gated samples available on each dataset. Request access to evaluate before purchasing.
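When evaluating a downloaded sample, a SHA256 check might look like the sketch below. It assumes a hex digest is published alongside each file; how the checksums are actually distributed is not stated here. The helper names `sha256_of_file` and `verify_file` are hypothetical, and the demonstration runs on a temporary stand-in file rather than a real dataset download.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large corpora need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Compare the computed digest with the expected one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demonstration on a temporary stand-in file, not a real dataset download.
with tempfile.NamedTemporaryFile(delete=False, suffix=".jsonl") as tmp:
    tmp.write(b'{"text": "example"}\n')
    demo_path = tmp.name

expected = hashlib.sha256(b'{"text": "example"}\n').hexdigest()
ok = verify_file(demo_path, expected)        # digest matches
tampered = verify_file(demo_path, "0" * 64)  # digest does not match
os.remove(demo_path)
print(ok, tampered)
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters more for a full corpus than for a sample.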

Quality Standards

Licensing and Pricing

Sample

Free

Gated access. Evaluate data quality, schema, and documentation before committing.

Enterprise

Custom

Dedicated support, SLA, bespoke corpora, volume pricing.

Contact us for a quote: data@optitransfer.ch

Bank Transfer (SEPA/SWIFT)
TWINT (Swiss)
Crypto (BTC / ETH / SOL)

Swiss-registered | optitransfer.ch | data@optitransfer.ch