Optitransfer committed on
Commit 5ec5e37 · verified · 1 Parent(s): f650406

Add organisation profile card

Files changed (1)
  1. README.md +69 -4
README.md CHANGED
@@ -1,10 +1,75 @@
  ---
  title: README
- emoji: 🌖
- colorFrom: pink
- colorTo: red
  sdk: static
  pinned: false
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
  title: README
+ emoji: 🏔️
+ colorFrom: blue
+ colorTo: green
  sdk: static
  pinned: false
  ---
 
+ <div style="max-width: 800px; margin: 0 auto;">
+
+ <h2>🏔️ OptiTransferData: Sovereign AI Data for Europe</h2>
+
+ <p style="font-size: 1.1em; color: #555;">
+ Production-grade, EU AI Act compliant web corpora for LLM training, RAG pipelines, and NLP research. Curated in Switzerland 🇨🇭 with independent quality assurance.
+ </p>
+
+ ---
+
+ ### 🎯 What We Do
+
+ We build **gold-standard national web corpora**: comprehensive, deduplicated, and quality-scored datasets covering entire country-level web domains. Each dataset is independently audited and delivered with full provenance tracking, SHA-256 integrity verification, and commercial licensing.
+
+ **Our focus areas:**
+ - 🤖 **LLM Pre-training & Fine-tuning**: Sovereign language data at scale
+ - 🔍 **RAG Pipelines**: Pre-chunked, embedding-ready corpora with quality scores
+ - 🏛️ **Government & Regulatory NLP**: Domain-classified, jurisdiction-specific data
+ - 📊 **Academic Research**: Reproducible, well-documented datasets with full metadata
+
+ ---
+
+ ### 📦 Available Datasets
+
+ | Dataset | Records | Coverage | Format |
+ |---|---|---|---|
+ | 🇱🇮 [Liechtenstein Ultra Premium](https://huggingface.co/datasets/OptiTransferData/liechtenstein-ultra-premium-li) | 35,748 | Full `.li` domain | JSONL · 37 fields |
+ | 🇫🇷 [France Sovereign RAG Chunks](https://huggingface.co/datasets/OptiTransferData/france-sovereign-rag-chunks) | 348,829 | French government & institutional web | JSONL · 8 fields |
+
+ > **Free gated samples** available on each dataset; request access to evaluate before purchasing.
+
+ **Coming soon:** 🇩🇪 Germany · 🇦🇹 Austria · 🇨🇭 Switzerland · 🇮🇹 Italy · 🇪🇸 Spain
+
+ ---
+
+ ### ✅ Quality Standards
+
+ - 📋 **Independent QA audits** with documented accuracy metrics
+ - 🔐 **SHA-256 integrity verification** on all production files
+ - 📊 **Quality scoring** per record (0–100 scale)
+ - 🏷️ **Domain classification** and language detection
+ - 📜 **EU AI Act compliance**: full data provenance and licensing transparency
+ - 🧹 **Deduplication**: content-level and URL-level
+
+ ---
+
+ ### 💼 Licensing & Access
+
+ | Tier | Access |
+ |---|---|
+ | **Sample** | Free with gated access: evaluate data quality |
+ | **Full Dataset** | Commercial licence: complete production data |
+ | **Enterprise** | Custom pricing: dedicated support, SLA, bespoke corpora |
+
+ 📧 **Contact us for a quote:** [data@optitransfer.ch](mailto:data@optitransfer.ch)
+
+ **Payment methods:**
+ 🏦 Bank Transfer (SEPA/SWIFT) · 📱 TWINT (Swiss) · ₿ Crypto (BTC/ETH/SOL, addresses on request)
+
+ ---
+
+ <p style="text-align: center; color: #888; font-size: 0.9em;">
+ 🏔️ Curated in Switzerland · <a href="https://optitransfer.ch">optitransfer.ch</a> · <a href="mailto:data@optitransfer.ch">data@optitransfer.ch</a>
+ </p>
+
+ </div>
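
The README added in this commit lists SHA-256 integrity verification on all production files, but the diff does not show how checksums are published or checked. The following is a minimal sketch of how a consumer might verify a downloaded file, assuming a hex digest is supplied alongside each file. The function names `sha256_of_file` and `verify_file` are hypothetical and do not come from the README.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Compare a file's digest with a published checksum.

    Assumption: the checksum is distributed as a hex string; surrounding
    whitespace and letter case are normalised before comparing.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat for large JSONL files. `verify_file` returns `True` only when the local file matches the supplied digest.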