EU AI Act-compliant sovereign web corpora for LLM training and RAG pipelines

OptiTransfer Data

Premium web corpora for LLM pre-training, fine-tuning, RAG, and multilingual NLP. Swiss-registered. EU AI Act compliant. Quality-scored, PII-redacted, SHA-256-verified.

Capabilities

LLM Training

Sovereign national web corpora at scale for pre-training and supervised fine-tuning

RAG Pipelines

Pre-chunked, embedding-ready corpora with quality scores per chunk

Regulatory NLP

Domain-classified, jurisdiction-specific government and institutional data

Research

Reproducible datasets with full metadata, provenance tracking, and QA reports

Available Datasets

Dataset | Records | Formats | Access
*.ch Swiss Web Premium (A+) | 110,491 | Parquet, JSONL, language splits, RAG chunks | Sample / Full

Flagship Swiss web corpus drawn from the .ch ccTLD: 112.4M tokens across 78 fields. Multilingual coverage: German (61.2%), French (19.0%), English (10.5%), Italian (4.7%), and 25 additional languages making up the remaining ~4.6%. Includes a nine-component quality model, a full provenance chain, and an independent QA report.

Tags: LLM Pre-Training, Supervised Fine-Tuning (SFT), Retrieval-Augmented Generation, Multilingual NLP, German Language Models, French Language Models, Swiss Market AI, EU AI Act Compliance, Domain-Specific Training, Web Corpus Research, Text Classification, Summarisation, Question Answering, Translation

A free, gated sample is available for each dataset. Request access to evaluate it before purchasing.

Quality Standards

  • Independent QA audits with documented accuracy metrics
  • SHA-256 integrity verification on all production files
  • Quality scoring per record (0 to 100 scale, nine components)
  • Domain classification and language detection
  • EU AI Act compliance with full data provenance and licensing transparency
  • Content-level and URL-level deduplication
  • PII detection and redaction: email addresses, phone numbers, IBANs, Swiss AHV (social security) numbers, and credit card numbers
  • Croissant metadata for ML interoperability
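The card does not say how the SHA-256 checksums are distributed. The sketch below assumes they are published as a text manifest in the common `sha256sum` layout (64-character hex digest, whitespace, filename) and shows one way to check downloaded Parquet or JSONL files against it. The function names are hypothetical illustrations, not part of any OptiTransfer tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks so large shards never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sha256sum_manifest(text):
    """Parse lines of the form '<hex digest>  <filename>' into {filename: digest}.

    Assumption (hypothetical): checksums are published in this `sha256sum`-style
    layout; adjust the parsing if the real manifest differs.
    """
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        hex_digest, _, name = line.partition(" ")
        # sha256sum marks binary-mode entries with a leading '*' on the filename.
        name = name.strip().lstrip("*")
        expected[name] = hex_digest.lower()
    return expected


def verify_files(directory, manifest_text):
    """Return {filename: bool}; a file that is missing locally counts as a failure."""
    results = {}
    for name, want in parse_sha256sum_manifest(manifest_text).items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_of_file(path) == want
    return results
```

For example, `verify_files("downloads", manifest_text)` returns a dict that maps each listed filename to `True` or `False`. Streaming the file in fixed-size chunks keeps memory use flat regardless of shard size.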

Licensing and Pricing

Sample

Free

Gated access. Evaluate data quality, schema, and documentation before committing.

Enterprise

Custom

Dedicated support, SLA, bespoke corpora, volume pricing.

Contact us for a quote: data@optitransfer.ch

Payment methods

  • Bank transfer (SEPA/SWIFT)
  • TWINT (Swiss)
  • Crypto (BTC / ETH / SOL)

OptiTransfer Data

Swiss-registered | optitransfer.ch | data@optitransfer.ch