---
license: mit
language:
- en
- fr
tags:
- legal
- french-law
- datasets
- rag
- vector-search
- compliance
- micro-entrepreneurs
pretty_name: SpadaLab
---

# SpadaLab

**French legal datasets, curated for AI builders.**

SpadaLab produces **vector-ready French legal datasets** for builders of legal AI assistants, RAG pipelines, and decision-support tools, with a focus on the realities of **micro-entrepreneurs** and **regulated professions**.

🇫🇷 *SpadaLab produit des datasets juridiques français prêts à vectoriser pour les développeurs d'assistants IA, pipelines RAG et outils d'aide à la décision.*

---

## Why SpadaLab

French micro-entrepreneurs and small business owners face a wall of legal complexity (URSSAF, RGPD, accessibility, food safety, trade regulations…), most of it scattered across Legifrance and EU CELLAR in formats that don't fit modern AI pipelines.

We curate, clean, and chunk this content into **production-ready datasets** that any team can drop into their vector store of choice (Qdrant, Weaviate, pgvector, etc.), with **no embedding lock-in**.

## What we ship

Four commercial packs (launching May 2026):

| Pack | Coverage | Format |
|---|---|---|
| **Micro-Entrepreneur Complet** | 7 collections: CGI, LPF, Code commerce, Conso, Sécurité sociale, CNIL, RGPD | JSON / Parquet |
| **Artisanat Reglemente** | Code de l'artisanat + related LODA texts | JSON / Parquet |
| **HACCP & Hygiene Alimentaire** | EU Reg. 178/2002, 852/2004, 853/2004 + LODA | JSON / Parquet |
| **Accessibilite PMR** | LODA accessibility texts (ERP, bâtiments) | JSON / Parquet |

**Format**: pre-chunked text + metadata + Gebru-style datasheets. **No embeddings included by default**: you bring your own model (avoids OpenAI lock-in).

**Sample datasets** (CC BY-NC 4.0) and **gated full datasets** (custom commercial license) will be published here.
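To illustrate the "bring your own model" idea, here is a minimal sketch of how pre-chunked records might be turned into points for a vector store. The record layout (`id`, `text`, `metadata`), the JSON Lines packaging, and the helper names are assumptions made for illustration only; the actual SpadaLab schema is not described here. `toy_embedder` is a deterministic stand-in with no semantic meaning, to be replaced by whichever embedding model you choose.

```python
import hashlib
import json
from typing import Callable, Iterable, Iterator

# Hypothetical records: field names and values are placeholders,
# not the published SpadaLab schema.
SAMPLE_JSONL = """\
{"id": "chunk-0001", "text": "placeholder chunk text A", "metadata": {"source": "CGI"}}
{"id": "chunk-0002", "text": "placeholder chunk text B", "metadata": {"source": "RGPD"}}
"""

# Any function mapping a string to a vector can be plugged in here.
Embedder = Callable[[str], list[float]]


def toy_embedder(text: str, dim: int = 8) -> list[float]:
    """Stand-in for a real embedding model: deterministic, NOT semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def iter_chunks(jsonl: str) -> Iterator[dict]:
    """Yield one record per non-empty JSON Lines row."""
    for line in jsonl.splitlines():
        if line.strip():
            yield json.loads(line)


def to_points(chunks: Iterable[dict], embed: Embedder) -> list[dict]:
    """Build store-agnostic points: id, vector, and a payload of text + metadata."""
    return [
        {
            "id": chunk["id"],
            "vector": embed(chunk["text"]),
            "payload": {"text": chunk["text"], **chunk["metadata"]},
        }
        for chunk in chunks
    ]


points = to_points(iter_chunks(SAMPLE_JSONL), toy_embedder)
print(len(points), len(points[0]["vector"]))
```

The resulting `points` list is deliberately generic: each entry would still need to be mapped onto the upsert format of the vector store you use (Qdrant, Weaviate, pgvector, or another).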
## How we work

- **Source-of-truth-only**: Legifrance PISTE Production API + EU CELLAR (not scraped websites)
- **Reproducible pipelines**: every dataset shipped with manifest, SHA-256 hashes, ingestion scripts
- **Versioned**: semantic versioning per dataset, transparent changelog
- **Local-first AI**: built using on-premise models (Ollama) where appropriate

## Stay in touch

- Email: `contact@spadalab.fr`
- Website: *coming soon*
- Datasets marketplace: also available on Datarade *(coming soon)*

---

*SpadaLab is a French micro-enterprise (SIREN 103 696 993) based in France.*