---
license: mit
language:
- en
- fr
tags:
- legal
- french-law
- datasets
- rag
- vector-search
- compliance
- micro-entrepreneurs
pretty_name: SpadaLab
---

# SpadaLab

**French legal datasets, curated for AI builders.**

SpadaLab produces **vector-ready French legal datasets** for builders of legal AI assistants, RAG pipelines, and decision-support tools — with a focus on the realities of **micro-entrepreneurs** and **regulated professions**.

🇫🇷 *SpadaLab produit des datasets juridiques français prêts à vectoriser pour les développeurs d'assistants IA, pipelines RAG et outils d'aide à la décision.*

---

## Why SpadaLab

French micro-entrepreneurs and small business owners face a wall of legal complexity (URSSAF, RGPD, accessibility, food safety, trade regulations…) — most of it scattered across Legifrance and EU CELLAR in formats that don't fit modern AI pipelines.

We curate, clean, and chunk this content into **production-ready datasets** that any team can drop into their vector store of choice (Qdrant, Weaviate, pgvector, etc.) — **no embedding lock-in**.

## What we ship

Four commercial packs, launching May 2026:

| Pack | Coverage | Format |
|---|---|---|
| **Micro-Entrepreneur Complet** | 7 collections: CGI, LPF, Code commerce, Conso, Securite sociale, CNIL, RGPD | JSON / Parquet |
| **Artisanat Reglemente** | Code de l'artisanat + related LODA texts | JSON / Parquet |
| **HACCP & Hygiene Alimentaire** | EU Reg. 178/2002, 852/2004, 853/2004 + LODA | JSON / Parquet |
| **Accessibilite PMR** | LODA accessibility (ERP, batiments) | JSON / Parquet |

**Format**: pre-chunked text, metadata, and Gebru-style datasheets (after "Datasheets for Datasets"). **No embeddings are included by default**: you bring your own model, which avoids OpenAI lock-in.
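
As an illustration of the bring-your-own-model approach, here is a minimal Python sketch that turns pre-chunked records into vector-store points. The record fields (`id`, `text`, `metadata`) are assumptions made for this example, not the published schema, and `toy_embed` is a placeholder standing in for whichever embedding model you choose.

```python
import json
from typing import Callable, Iterable

# Any function mapping a text chunk to a vector can be plugged in here.
Embedder = Callable[[str], list[float]]


def build_points(records: Iterable[dict], embed: Embedder) -> list[dict]:
    """Pair each chunk with a vector from the caller's own embedding model."""
    points = []
    for record in records:
        points.append(
            {
                "id": record["id"],
                "vector": embed(record["text"]),
                # Keep the text and metadata alongside the vector for retrieval.
                "payload": {"text": record["text"], **record.get("metadata", {})},
            }
        )
    return points


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Placeholder embedder (character bucket counts); swap in a real model."""
    vector = [0.0] * dim
    for ch in text:
        vector[ord(ch) % dim] += 1.0
    return vector


# Hypothetical record layout, assumed for illustration only.
sample = json.loads(
    '[{"id": "chunk-0001", "text": "Exemple de texte.",'
    ' "metadata": {"source": "hypothetical"}}]'
)
points = build_points(sample, toy_embed)
```

The resulting `points` list uses a generic id / vector / payload shape that most vector stores accept after a small adaptation to their client library.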

**Sample datasets** (CC BY-NC 4.0) and **gated full datasets** (custom commercial license) will be published here.

## How we work

- **Source-of-truth only**: Legifrance PISTE Production API and EU CELLAR, not scraped websites
- **Reproducible pipelines**: every dataset ships with a manifest, SHA-256 hashes, and ingestion scripts
- **Versioned**: semantic versioning per dataset, with a transparent changelog
- **Local-first AI**: built using on-premise models (Ollama) where appropriate
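
The manifest and SHA-256 hashes mentioned above could be checked along these lines. This is a sketch under stated assumptions: the manifest layout (a JSON file with a `files` list of `path` / `sha256` entries) and the function names are hypothetical, since the actual manifest format is not documented here.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in blocks so large Parquet files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path) -> dict[str, bool]:
    """Return, per listed file, whether it exists and matches its recorded hash.

    Assumed (hypothetical) manifest layout:
    {"files": [{"path": "chunks.json", "sha256": "<hex digest>"}]}
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    root = manifest_path.parent
    results = {}
    for entry in manifest["files"]:
        file_path = root / entry["path"]
        expected = entry["sha256"].lower()
        results[entry["path"]] = file_path.is_file() and sha256_of(file_path) == expected
    return results
```

Calling `verify_manifest(Path("manifest.json"))` returns a mapping from each listed path to `True` or `False`, so a caller can refuse to ingest a dataset if any entry fails.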

## Stay in touch

- Email: `contact@spadalab.fr`
- Website: *coming soon*
- Dataset marketplace: Datarade listing *(coming soon)*

---

*SpadaLab is a micro-enterprise (SIREN 103 696 993) based in France.*