Fix Space config: add YAML frontmatter, corporate clean content
README.md
CHANGED
@@ -1,4 +1,12 @@
-
+---
+title: OptiTransfer Data
+sdk: static
+pinned: false
+colorFrom: gray
+colorTo: blue
+---
+
+# OptiTransfer Data
 
 Premium web corpora for LLM pre-training, fine-tuning, RAG, and multilingual NLP.
 
@@ -6,7 +14,7 @@ Premium web corpora for LLM pre-training, fine-tuning, RAG, and multilingual NLP
 
 ## About
 
-
+OptiTransfer Data is the data division of [OptiTransfer AG](https://optitransfer.ch), a Swiss-registered technology company. We produce compliance-ready, quality-scored web datasets for AI teams building in regulated markets.
 
 Every dataset ships with:
 
@@ -27,9 +35,7 @@ Every dataset ships with:
 
 The flagship Swiss web corpus, extracted and quality-scored from the .ch ccTLD. Multilingual coverage across German (61.2%), French (19.0%), English (10.5%), Italian (4.7%), and additional languages. Nine-component quality model with full provenance chain.
 
-**Best suited for:**
-
-`LLM Pre-Training` `Supervised Fine-Tuning (SFT)` `Retrieval-Augmented Generation (RAG)` `Multilingual NLP` `German Language Models` `French Language Models` `Swiss Market AI` `Regulatory Compliance (EU AI Act)` `Domain-Specific Training` `Web Corpus Research` `Text Classification` `Summarisation` `Question Answering` `Translation`
+**Best suited for:** LLM Pre-Training, Supervised Fine-Tuning (SFT), Retrieval-Augmented Generation (RAG), Multilingual NLP, German Language Models, French Language Models, Swiss Market AI, Regulatory Compliance (EU AI Act), Domain-Specific Training, Web Corpus Research, Text Classification, Summarisation, Question Answering, Translation
 
 **Formats:** Parquet (7 shards) | JSONL (7 shards) | Language Splits (DE, FR, EN, IT) | RAG Chunks (4 files)
 
@@ -73,7 +79,7 @@ QA reports are available in both the sample and full product repositories.
 
 All datasets are available under the OptiTransfer Commercial License. Sample repositories provide gated evaluation access. Full datasets require a commercial license agreement.
 
-**Payment methods:** Bank Transfer (SEPA/SWIFT)
+**Payment methods:** Bank Transfer (SEPA/SWIFT) | TWINT | Cryptocurrency (BTC / ETH / SOL)
 
 For pricing, volume licensing, or custom extraction requests, contact [data@optitransfer.ch](mailto:data@optitransfer.ch).
 
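The core of this change is the leading `---` block added to README.md. As a rough local sanity check, the sketch below tests whether a README starts with such a block and reads its flat `key: value` pairs. It is an illustration only: `parse_frontmatter` and `README_HEAD` are hypothetical names introduced here, the parser is hand-rolled rather than a real YAML parser, and it is not the validation that Hugging Face itself performs.

```python
from typing import Dict, Optional


def parse_frontmatter(text: str) -> Optional[Dict[str, str]]:
    """Return the flat key/value pairs of a leading '---' block.

    Returns None when the text does not start with '---' or when the
    block is never closed. Values are kept as raw strings (no YAML typing).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter found
        if not line.strip():
            continue  # skip blank lines inside the block
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return None  # opening delimiter without a closing one


# The head of README.md as it reads after this commit (copied from the diff).
README_HEAD = """---
title: OptiTransfer Data
sdk: static
pinned: false
colorFrom: gray
colorTo: blue
---

# OptiTransfer Data
"""

fields = parse_frontmatter(README_HEAD)
print(fields)
```

Because the sketch keeps every value as a string, `pinned` comes back as `"false"` rather than a boolean; a proper YAML parser would be the better choice for anything beyond a quick check.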