---
license: mit
language:
- en
- fr
tags:
- legal
- french-law
- datasets
- rag
- vector-search
- compliance
- micro-entrepreneurs
pretty_name: SpadaLab
---
# SpadaLab
**French legal datasets, curated for AI builders.**
SpadaLab produces **vector-ready French legal datasets** for builders of legal AI assistants, RAG pipelines, and decision-support tools — with a focus on the realities of **micro-entrepreneurs** and **regulated professions**.
🇫🇷 *SpadaLab produces ready-to-vectorize French legal datasets for developers of AI assistants, RAG pipelines, and decision-support tools.*
---
## Why SpadaLab
French micro-entrepreneurs and small business owners face a wall of legal complexity (URSSAF, RGPD, accessibility, food safety, trade regulations…) — most of it scattered across Legifrance and EU CELLAR in formats that don't fit modern AI pipelines.
We curate, clean, and chunk this content into **production-ready datasets** that any team can drop into their vector store of choice (Qdrant, Weaviate, pgvector, etc.) — **no embedding lock-in**.
## What we ship
Four commercial packs (launching May 2026):
| Pack | Coverage | Format |
|---|---|---|
| **Micro-Entrepreneur Complet** | 7 collections: CGI, LPF, Code de commerce, Code de la consommation, Code de la sécurité sociale, CNIL, RGPD | JSON / Parquet |
| **Artisanat Réglementé** | Code de l'artisanat + related LODA texts | JSON / Parquet |
| **HACCP & Hygiène Alimentaire** | EU Regulations (EC) No 178/2002, 852/2004, 853/2004 + LODA | JSON / Parquet |
| **Accessibilité PMR** | LODA accessibility texts (ERP, buildings) | JSON / Parquet |
**Format**: pre-chunked text + metadata + datasheets in the style of Gebru et al.'s *Datasheets for Datasets*. **No embeddings are included by default**: you bring your own embedding model, which avoids lock-in to OpenAI or any other provider.
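Because the packs ship text and metadata without vectors, the embedding step is yours to choose. The sketch below illustrates that "bring your own model" idea under stated assumptions: it assumes a JSON Lines layout with hypothetical `id`, `text`, and `metadata` fields (check each pack's datasheet for the real schema), and `toy_embed` is a hypothetical letter-count stand-in for a real embedding model. In practice you would swap in your own model and vector store (Qdrant, Weaviate, pgvector, etc.) in place of the in-memory list.

```python
import json
import math
from collections.abc import Callable, Iterable

# Any function mapping text -> vector can be plugged in (hypothetical alias).
Embed = Callable[[str], list[float]]


def load_chunks(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines records, skipping blank lines.

    Assumes hypothetical fields "id", "text", "metadata" per record.
    """
    return [json.loads(line) for line in lines if line.strip()]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(chunks: list[dict], embed: Embed) -> list[tuple[dict, list[float]]]:
    """Embed each pre-chunked text with whatever model the caller supplies."""
    return [(chunk, embed(chunk["text"])) for chunk in chunks]


def search(index: list[tuple[dict, list[float]]], query: str,
           embed: Embed, k: int = 3) -> list[dict]:
    """Brute-force nearest neighbours by cosine similarity (no vector DB)."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real model: counts of letters a-z."""
    text = text.lower()
    return [float(text.count(chr(c))) for c in range(ord("a"), ord("z") + 1)]


sample = [
    '{"id": "cgi-1", "text": "taxe sur la valeur ajoutee", "metadata": {"code": "CGI"}}',
    '{"id": "rgpd-1", "text": "donnees personnelles", "metadata": {"code": "RGPD"}}',
]
index = build_index(load_chunks(sample), toy_embed)
print(search(index, "valeur ajoutee", toy_embed, k=1)[0]["id"])  # → cgi-1
```

Keeping the embedding function as a parameter is what keeps the data model-agnostic: re-embedding with a different model means calling `build_index` again with another callable, with no change to the dataset files.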
**Sample datasets** (CC BY-NC 4.0) and **gated full datasets** (custom commercial license) will be published here.
## How we work
- **Source of truth only**: Legifrance PISTE Production API + EU CELLAR (no scraped websites)
- **Reproducible pipelines**: every dataset ships with a manifest, SHA-256 hashes, and ingestion scripts
- **Versioned**: semantic versioning per dataset, with a transparent changelog
- **Local-first AI**: built with on-premise models (Ollama) where appropriate
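Since every dataset is meant to ship with SHA-256 hashes, a downloaded pack can be checked before ingestion. This is a minimal sketch assuming a hypothetical `manifest.json` that is a flat JSON object mapping relative file paths to hex SHA-256 digests; the actual manifest layout may differ, so adapt `load_manifest` to whatever the shipped manifest contains.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large Parquet files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def load_manifest(path: Path) -> dict[str, str]:
    """Read a hypothetical manifest: {"relative/path": "<hex sha256>", ...}."""
    return json.loads(path.read_text(encoding="utf-8"))


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries whose file is missing or whose digest differs."""
    bad = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            bad.append(rel_path)
    return bad


if __name__ == "__main__":
    # Hypothetical usage: verify an extracted pack against its manifest.
    pack_dir = Path("micro-entrepreneur-complet")  # hypothetical directory name
    problems = verify_manifest(pack_dir, load_manifest(pack_dir / "manifest.json"))
    print("OK" if not problems else f"Mismatched or missing: {problems}")
```

Returning the list of failing entries, rather than stopping at the first mismatch, makes it easy to re-download only the affected files.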
## Stay in touch
- Email: `contact@spadalab.fr`
- Website: *coming soon*
- Datasets marketplace: also available on Datarade *(coming soon)*
---
*SpadaLab is a French micro-enterprise (SIREN 103 696 993) based in France.*