---
license: mit
language:
- en
- fr
tags:
- legal
- french-law
- datasets
- rag
- vector-search
- compliance
- micro-entrepreneurs
pretty_name: SpadaLab
---

# SpadaLab

**French legal datasets, curated for AI builders.**

SpadaLab produces **vector-ready French legal datasets** for builders of legal AI assistants, RAG pipelines, and decision-support tools, with a focus on the realities of **micro-entrepreneurs** and **regulated professions**.

🇫🇷 *SpadaLab produit des datasets juridiques français prêts à vectoriser pour les développeurs d'assistants IA, pipelines RAG et outils d'aide à la décision.*

---

## Why SpadaLab

French micro-entrepreneurs and small business owners face a wall of legal complexity (URSSAF, GDPR (RGPD), accessibility, food safety, trade regulations…). Most of it is scattered across Legifrance and EU CELLAR in formats that don't fit modern AI pipelines.

We curate, clean, and chunk this content into **production-ready datasets** that any team can drop into its vector store of choice (Qdrant, Weaviate, pgvector, etc.), with **no embedding lock-in**.

## What we ship

Four commercial packs (launching May 2026):

| Pack | Coverage | Format |
|---|---|---|
| **Micro-Entrepreneur Complet** | 7 collections: CGI, LPF, Code commerce, Conso, Sécurité sociale, CNIL, RGPD | JSON / Parquet |
| **Artisanat Reglemente** | Code de l'artisanat + related LODA texts | JSON / Parquet |
| **HACCP & Hygiene Alimentaire** | Regulations (EC) No 178/2002, 852/2004, 853/2004 + LODA | JSON / Parquet |
| **Accessibilite PMR** | LODA accessibility texts (ERP, bâtiments) | JSON / Parquet |

**Format**: pre-chunked text + metadata + datasheets in the style of *Datasheets for Datasets* (Gebru et al.). **No embeddings included by default**: you bring your own embedding model, which avoids lock-in to a single provider such as OpenAI.

**Sample datasets** (CC BY-NC 4.0) and **gated full datasets** (custom commercial license) will be published here.

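
As a rough illustration of the "bring your own model" workflow described above, the sketch below reads pre-chunked records and attaches vectors from a caller-supplied embedding function. The record layout (one JSON object per line with `id`, `text`, and `metadata` fields), the helper names, and the `toy_embed` stand-in are all assumptions made for this example; the published schema may differ.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List

# Assumed (hypothetical) record layout: one JSON object per line with
# "id", "text" and "metadata" keys. The real dataset schema may differ.
Embedder = Callable[[str], List[float]]


def iter_chunks(lines: Iterable[str]) -> Iterator[Dict]:
    """Parse JSON Lines input, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def embed_chunks(chunks: Iterable[Dict], embed: Embedder) -> List[Dict]:
    """Attach a vector from the caller's own model to each chunk."""
    return [
        {"id": c["id"], "vector": embed(c["text"]), "payload": c["metadata"]}
        for c in chunks
    ]


def toy_embed(text: str) -> List[float]:
    """Stand-in embedder for illustration only; swap in a real model."""
    return [float(len(text)), float(text.count(" "))]


# Made-up demo records, not taken from any SpadaLab dataset.
sample = [
    '{"id": "demo-1", "text": "Example chunk text.", "metadata": {"source": "demo"}}',
    '{"id": "demo-2", "text": "Another example.", "metadata": {"source": "demo"}}',
]
points = embed_chunks(iter_chunks(sample), toy_embed)
```

The resulting `id` / `vector` / `payload` dictionaries are a generic shape; most vector stores accept something similar, but the exact upsert call depends on the client you use.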
## How we work

- **Source-of-truth only**: Legifrance PISTE Production API + EU CELLAR (not scraped websites)
- **Reproducible pipelines**: every dataset ships with a manifest, SHA-256 hashes, and ingestion scripts
- **Versioned**: semantic versioning per dataset, transparent changelog
- **Local-first AI**: built using on-premise models (Ollama) where appropriate

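
The manifest and SHA-256 hashes mentioned in the list above lend themselves to an integrity check after download. The sketch below assumes a hypothetical `manifest.json` with a `files` object mapping relative paths to hex digests; the actual manifest name and layout are not specified here and may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(root: Path, manifest_name: str = "manifest.json") -> List[str]:
    """Return the relative paths that are missing or whose hash differs.

    Assumes a hypothetical manifest of the form
    {"files": {"relative/path": "<sha256 hex digest>", ...}}.
    """
    manifest = json.loads((root / manifest_name).read_text(encoding="utf-8"))
    bad = []
    for rel_path, expected in manifest["files"].items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel_path)
    return bad


# Self-contained demo on a temporary directory with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "chunks.jsonl").write_bytes(b'{"id": "demo-1"}\n')
    manifest = {"files": {"chunks.jsonl": sha256_of(root / "chunks.jsonl")}}
    (root / "manifest.json").write_text(json.dumps(manifest), encoding="utf-8")

    clean_result = verify(root)        # nothing changed since hashing
    (root / "chunks.jsonl").write_bytes(b"tampered\n")
    tampered_result = verify(root)     # file content no longer matches
```

An empty list from `verify` means every listed file is present and matches its recorded digest; any returned path should be re-downloaded before ingestion.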
## Stay in touch

- Email: `contact@spadalab.fr`
- Website: *coming soon*
- Datasets marketplace: Datarade *(coming soon)*

---

*SpadaLab is a micro-enterprise based in France (SIREN 103 696 993).*