Charles Grandjean committed
Commit 7414a53 · 1 Parent(s): d81d6f6
migrate data
- DEPLOYMENT.md +237 -0
- Dockerfile +2 -0
- data/lawyers.json +0 -202
- requirements.txt +1 -0
- scripts/download_knowledge_graph.py +118 -0
- startup.sh +16 -0
DEPLOYMENT.md
ADDED
@@ -0,0 +1,237 @@
# Deployment Guide - CyberLegalAI Knowledge Graph on Hugging Face

## Overview

This guide explains how to deploy CyberLegalAI with the knowledge graph stored in a Hugging Face dataset, which frees 842 MB in the main Space repo.

## Advantages of this solution

- ✅ **842 MB freed** in the Space repo (stays under the 1 GB limit)
- ✅ **Smart download** with a local cache (no re-download)
- ✅ **Fast startup** after the first download
- ✅ **Multi-jurisdiction** support built in
- ✅ **Maintainability**: new jurisdictions are easy to add
- ✅ **Robustness**: the data is backed up in a separate dataset

## Architecture

```
Hugging Face Space (main repo)
├── Application code
└── Configuration

Hugging Face Dataset (separate)
└── data/rag_storage/
    ├── romania/ (~267 MB)
    └── bahrain/ (~575 MB)
```

## Deployment steps

### 1. Prerequisites

- Hugging Face account with access to Spaces
- Hugging Face access token with read permission on datasets
- Hugging Face CLI installed: `pip install huggingface-hub`

### 2. Create the Hugging Face dataset

```bash
# Create the dataset
huggingface-cli repo create Cyberlgl/CyberLegalAI-knowledge-graph --type dataset

# Upload the knowledge graph data
cd /Users/cgrdj/Documents/Code/Cyberlgl/CyberlegalAI
huggingface-cli upload Cyberlgl/CyberLegalAI-knowledge-graph data/rag_storage/ --repo-type dataset
```

Check that the dataset was created: https://huggingface.co/datasets/Cyberlgl/CyberLegalAI-knowledge-graph
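
The two CLI commands in this step can also be run from Python. The sketch below is one possible equivalent, assuming `huggingface_hub` is installed. `publish_knowledge_graph` is a hypothetical helper name, and the `api` object is passed in (rather than created inside) so the function can be exercised without network access. `create_repo` and `upload_folder` are existing `HfApi` methods.

```python
from pathlib import Path


def publish_knowledge_graph(api, dataset_id: str, storage_root: str) -> None:
    """Create the dataset repo if needed, then upload the storage folder.

    `api` is expected to be a `huggingface_hub.HfApi` instance (assumption);
    it is injected so the helper can be tried with a stand-in object.
    """
    folder = Path(storage_root)
    if not folder.is_dir():
        raise FileNotFoundError(f"Storage folder not found: {folder}")

    # exist_ok=True makes the call safe to repeat on an existing dataset.
    api.create_repo(repo_id=dataset_id, repo_type="dataset", exist_ok=True)

    # Without path_in_repo, the folder contents are uploaded to the dataset
    # root, so romania/ and bahrain/ become top-level directories.
    api.upload_folder(
        folder_path=str(folder),
        repo_id=dataset_id,
        repo_type="dataset",
    )
```

With a real client this would be called as `publish_knowledge_graph(HfApi(token=...), "Cyberlgl/CyberLegalAI-knowledge-graph", "data/rag_storage")`, using a token that has write access to the dataset.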

### 3. Configure the Space

Add the following environment variables to your Hugging Face Space:

```
HF_TOKEN=your_hf_token_here
JURISDICTIONS=romania,bahrain
HF_KNOWLEDGE_GRAPH_DATASET=Cyberlgl/CyberLegalAI-knowledge-graph
HF_HOME=/data/.huggingface
```

### 4. Clean up the main repo

Once the data has been transferred to the dataset:

```bash
# Stop tracking the data in the main repo (the local copy is kept)
git rm -r --cached data/rag_storage/

# Add it to .gitignore so these files are not re-added
echo "data/rag_storage/" >> .gitignore

# Commit the changes
git add .gitignore
git commit -m "Exclude knowledge graph from repo - now served from Hugging Face dataset"
git push
```

### 5. Redeploy the Space

The Space will automatically:
1. Download the knowledge graph from the dataset
2. Cache it in `/data/.huggingface`
3. Copy the files to `data/rag_storage/`
4. Start the LightRAG servers and the API
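
The download in this sequence only needs to run when data is missing. A minimal sketch of that check follows, assuming each jurisdiction lives in its own subdirectory of `data/rag_storage/` and that the list comes from the `JURISDICTIONS` variable. `missing_jurisdictions` is a hypothetical helper, not part of the project.

```python
import os
from pathlib import Path


def missing_jurisdictions(storage_root: str, jurisdictions_csv: str) -> list[str]:
    """Return the jurisdictions whose directory is absent or empty."""
    names = [name.strip() for name in jurisdictions_csv.split(",") if name.strip()]
    missing = []
    for name in names:
        directory = Path(storage_root) / name
        # An empty directory counts as missing: a failed copy may leave one behind.
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    todo = missing_jurisdictions(
        "data/rag_storage", os.getenv("JURISDICTIONS", "romania,bahrain")
    )
    print("Download needed for:", ", ".join(todo) if todo else "nothing")
```

Driving the check from `JURISDICTIONS` rather than from a fixed pair of directory names means a newly added jurisdiction would also be detected as missing.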

## Verifying the deployment

### Startup logs

You should see the following in the Space logs:

```
📥 Checking for knowledge graph data...
Knowledge graph not found, downloading from Hugging Face...
================================================================================
Starting Knowledge Graph Download
================================================================================
📦 Dataset: Cyberlgl/CyberLegalAI-knowledge-graph
Jurisdictions: romania, bahrain
💾 HF Cache: /data/.huggingface
Target Directory: data/rag_storage
================================================================================

📥 Processing jurisdiction: romania
...
✅ romania: 18 files copied (267.0 MB)

📥 Processing jurisdiction: bahrain
...
✅ bahrain: 18 files copied (575.0 MB)

================================================================================
Knowledge Graph Download Complete!
================================================================================
romania: 267.0 MB
bahrain: 575.0 MB

💾 Total size: 842.0 MB
================================================================================
```

### Subsequent restarts

On later restarts, you will see:

```
📥 Checking for knowledge graph data...
✅ Knowledge graph data already present, skipping download
```

## Maintenance

### Updating the knowledge graph

1. Update the data locally
2. Upload the changes to the Hugging Face dataset:

```bash
huggingface-cli upload Cyberlgl/CyberLegalAI-knowledge-graph data/rag_storage/ --repo-type dataset
```

3. Restart the Space to apply the changes

### Adding a new jurisdiction

1. Add the data under `data/rag_storage/nouvelle_juridiction/`
2. Upload it to the Hugging Face dataset, keeping the jurisdiction name as the folder in the dataset:

```bash
huggingface-cli upload Cyberlgl/CyberLegalAI-knowledge-graph data/rag_storage/nouvelle_juridiction/ nouvelle_juridiction/ --repo-type dataset
```

3. Update the `JURISDICTIONS` variable in the Space:
```
JURISDICTIONS=romania,bahrain,nouvelle_juridiction
```

4. Restart the Space

## Troubleshooting

### "Dataset not found" error

**Symptom:** `Repo not found: Cyberlgl/CyberLegalAI-knowledge-graph`

**Solution:**
- Check that the dataset exists: https://huggingface.co/datasets/Cyberlgl/CyberLegalAI-knowledge-graph
- Check that the dataset ID in `HF_KNOWLEDGE_GRAPH_DATASET` is correct

### "Invalid token" error

**Symptom:** `Invalid token passed`

**Solution:**
- Check that `HF_TOKEN` is set correctly in the Space
- Create a new token with read permission on datasets: https://huggingface.co/settings/tokens

### Slow download

**Symptom:** The download takes a long time

**Solution:**
- The first download can take several minutes for 842 MB
- Later downloads are near-instant thanks to the cache
- Check that the Space's network permissions are correct

### "Permission denied" error during download

**Symptom:** `PermissionError: [Errno 13] Permission denied: '/data/.huggingface'`

**Solution:**
- The script should create the directory automatically with the right permissions
- If the error persists, check the permissions in the Dockerfile
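
To narrow down this error, it can help to test the cache directory before the download starts. The sketch below is one way to do that with the standard library only. `ensure_writable_dir` is a hypothetical helper, and what to do when it returns `False` is left to the caller.

```python
import os
import tempfile


def ensure_writable_dir(path: str) -> bool:
    """Create `path` if needed and report whether files can be written there."""
    try:
        os.makedirs(path, exist_ok=True)
        # Writing a throwaway file is more reliable than inspecting mode bits,
        # since a mounted volume can refuse writes despite permissive modes.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        # PermissionError is a subclass of OSError, so it is covered here.
        return False
```

For example, if `ensure_writable_dir(os.getenv("HF_HOME", "/data/.huggingface"))` returns `False` inside the Space, the directory's ownership or the storage mount is a likely place to look.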

## Environment variables

| Variable | Description | Default |
|----------|-------------|---------|
| `HF_TOKEN` | Hugging Face access token | (required) |
| `JURISDICTIONS` | Comma-separated list of jurisdictions to download | `romania,bahrain` |
| `HF_KNOWLEDGE_GRAPH_DATASET` | Hugging Face dataset ID | `Cyberlgl/CyberLegalAI-knowledge-graph` |
| `HF_HOME` | Hugging Face cache directory | `/data/.huggingface` |
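
The table maps naturally onto a small settings object. The following is a sketch only: `DeploymentSettings` and `load_settings` are hypothetical names that do not appear in the project, and the defaults are the ones listed in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DeploymentSettings:
    hf_token: str
    jurisdictions: tuple
    dataset_id: str
    hf_home: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> DeploymentSettings:
    """Read the four variables from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # HF_TOKEN has no default in the table, so fail early with a clear message.
        raise ValueError("HF_TOKEN is required")
    raw = env.get("JURISDICTIONS", "romania,bahrain")
    return DeploymentSettings(
        hf_token=token,
        # Drop empty entries such as a trailing comma.
        jurisdictions=tuple(j.strip() for j in raw.split(",") if j.strip()),
        dataset_id=env.get(
            "HF_KNOWLEDGE_GRAPH_DATASET", "Cyberlgl/CyberLegalAI-knowledge-graph"
        ),
        hf_home=env.get("HF_HOME", "/data/.huggingface"),
    )
```

Accepting the environment as a mapping keeps the function easy to try with a plain dictionary, for instance `load_settings({"HF_TOKEN": "..."})`.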

## Included scripts

### `scripts/download_knowledge_graph.py`

Main script that handles the knowledge graph download.

**Features:**
- Automatic download from the Hugging Face dataset
- Persistent cache support to avoid re-downloads
- Selective download per jurisdiction
- Detailed logs of the download process
- Copies the files to the application directory

For more details, see `scripts/README.md`

## Support

For any question or problem:
1. See `scripts/README.md` for details on the scripts
2. Check the Space logs for specific errors
3. Open an issue on GitHub: https://github.com/Cgrandjean/CyberLegalAI

## Migrating from the old system

If you are migrating from a setup where the data lived in the main repo:

1. **Back up** the existing data locally
2. **Create the dataset** (step 2)
3. **Upload the data** (step 2)
4. **Configure the Space** (step 3)
5. **Clean up the repo** (step 4)
6. **Redeploy** (step 5)

The data stays accessible and the system works as before, with the benefits of the new setup.
Dockerfile
CHANGED
@@ -11,6 +11,8 @@ ENV PYTHONIOENCODING=utf-8
 ENV LIGHTRAG_HOST=127.0.0.1
 ENV LIGHTRAG_PORT=9621
 ENV API_PORT=8000
+ENV HF_HOME=/data/.huggingface
+ENV JURISDICTIONS=romania,bahrain
 
 # Install system dependencies
 RUN apt-get update && apt-get install -y \
data/lawyers.json
DELETED
@@ -1,202 +0,0 @@
[
  {
    "name": "Nader Bakri",
    "experience_years": 8,
    "specialty": "Cyber Law",
    "presentation": "Experienced lawyer focusing on complex legal matters at the intersection of technology, business, and regulatory compliance. Provides practical and solution-oriented legal advice tailored to modern digital challenges.",
    "areas_of_practice": [
      "Criminal Law",
      "Commercial Law",
      "Civil Law",
      "Administrative Law",
      "Family Law",
      "Cyber Law",
      "IT Law",
      "AI Law",
      "Data Protection"
    ]
  },
  {
    "name": "Andrei Popescu",
    "experience_years": 12,
    "specialty": "Commercial & Corporate Law",
    "presentation": "Seasoned legal professional with extensive experience advising companies on corporate governance, contracts, and commercial disputes at both national and international levels.",
    "areas_of_practice": [
      "Commercial Law",
      "Corporate Law",
      "Civil Law",
      "Contract Law",
      "Commercial Litigation",
      "Arbitration"
    ]
  },
  {
    "name": "Maria Ionescu",
    "experience_years": 9,
    "specialty": "Data Protection & Privacy Law",
    "presentation": "Specialized in data protection and privacy compliance, assisting organizations in aligning their operations with GDPR and international data protection standards.",
    "areas_of_practice": [
      "Data Protection",
      "GDPR",
      "Cyber Law",
      "IT Law",
      "Civil Law",
      "Commercial Law",
      "Compliance"
    ]
  },
  {
    "name": "Karim Al-Hassan",
    "experience_years": 15,
    "specialty": "International Business Law",
    "presentation": "International lawyer advising multinational clients on cross-border transactions, regulatory frameworks, and international commercial contracts.",
    "areas_of_practice": [
      "International Commercial Law",
      "Civil Law",
      "Contract Law",
      "Arbitration",
      "Customs Law"
    ]
  },
  {
    "name": "Elena Radu",
    "experience_years": 7,
    "specialty": "Civil & Family Law",
    "presentation": "Dedicated legal professional providing assistance in sensitive civil and family matters, with a strong focus on ethics, discretion, and client trust.",
    "areas_of_practice": [
      "Civil Law",
      "Family Law",
      "Matrimonial Law",
      "Inheritance Law",
      "Litigation"
    ]
  },
  {
    "name": "Victor Marinescu",
    "experience_years": 14,
    "specialty": "Criminal Law",
    "presentation": "Experienced criminal defense lawyer representing clients in complex investigations and high-stakes criminal proceedings.",
    "areas_of_practice": [
      "Criminal Law",
      "Criminal Procedure",
      "Related Civil Claims",
      "Litigation"
    ]
  },
  {
    "name": "Sophia Klein",
    "experience_years": 10,
    "specialty": "IT & Technology Law",
    "presentation": "Technology-focused legal advisor assisting startups and technology companies with contracts, compliance, and intellectual property matters.",
    "areas_of_practice": [
      "IT Law",
      "Cyber Law",
      "AI Law",
      "Intellectual Property Law",
      "Commercial Law"
    ]
  },
  {
    "name": "Mihai Dumitrescu",
    "experience_years": 18,
    "specialty": "Administrative & Public Law",
    "presentation": "Legal expert in administrative disputes and public procurement, representing both private entities and public authorities.",
    "areas_of_practice": [
      "Administrative Law",
      "Public Law",
      "Public Procurement",
      "Administrative Litigation",
      "Constitutional Law"
    ]
  },
  {
    "name": "Laura Petrescu",
    "experience_years": 6,
    "specialty": "Employment & Labor Law",
    "presentation": "Advises employers and employees on labor relations, regulatory compliance, and employment-related dispute resolution.",
    "areas_of_practice": [
      "Employment Law",
      "Labor Law",
      "Civil Law",
      "Employment Litigation",
      "Compliance"
    ]
  },
  {
    "name": "Omar Khaled",
    "experience_years": 11,
    "specialty": "Cybercrime & Digital Evidence",
    "presentation": "Specialist in cybercrime cases and digital investigations, with strong expertise in electronic evidence and forensic collaboration.",
    "areas_of_practice": [
      "Cyber Law",
      "Criminal Law",
      "Cybercrime",
      "IT Law",
      "Digital Evidence"
    ]
  },
  {
    "name": "Ana-Maria Stoica",
    "experience_years": 13,
    "specialty": "Intellectual Property Law",
    "presentation": "Provides strategic legal protection for brands, software, and creative works in both domestic and international markets.",
    "areas_of_practice": [
      "Intellectual Property Law",
      "Commercial Law",
      "IT Law",
      "Copyright",
      "Trademarks"
    ]
  },
  {
    "name": "Daniel Weiss",
    "experience_years": 16,
    "specialty": "Arbitration & Litigation",
    "presentation": "Experienced litigator representing clients in complex commercial disputes before courts and arbitral tribunals.",
    "areas_of_practice": [
      "Litigation",
      "Arbitration",
      "Commercial Law",
      "Civil Law",
      "Private International Law"
    ]
  },
  {
    "name": "Raluca Neagu",
    "experience_years": 8,
    "specialty": "Compliance & Regulatory Law",
    "presentation": "Advises companies on regulatory compliance, internal governance, and risk management frameworks.",
    "areas_of_practice": [
      "Compliance",
      "Regulatory Law",
      "Commercial Law",
      "Administrative Law",
      "Data Protection"
    ]
  },
  {
    "name": "Hassan Farouk",
    "experience_years": 20,
    "specialty": "Banking & Financial Law",
    "presentation": "Senior legal advisor with extensive experience in banking regulation, financial transactions, and risk mitigation.",
    "areas_of_practice": [
      "Banking Law",
      "Financial Law",
      "Compliance",
      "Commercial Law"
    ]
  },
  {
    "name": "Ioana Vasilescu",
    "experience_years": 5,
    "specialty": "AI & Emerging Technologies Law",
    "presentation": "Focused on legal challenges related to artificial intelligence, automation, and emerging technologies, supporting innovation-driven organizations.",
    "areas_of_practice": [
      "AI Law",
      "Cyber Law",
      "IT Law",
      "Data Protection",
      "Commercial Law"
    ]
  }
]
requirements.txt
CHANGED
@@ -26,3 +26,4 @@ langchain-tavily>=0.2.16
 resend>=0.8.0
 beautifulsoup4>=4.12.0
 httpx>=0.24.0
+huggingface-hub>=0.20.0
scripts/download_knowledge_graph.py
ADDED
@@ -0,0 +1,118 @@
#!/usr/bin/env python3
"""
Download knowledge graph from Hugging Face dataset at container startup
"""
import os
import shutil
import logging
from pathlib import Path
from huggingface_hub import snapshot_download
from dotenv import load_dotenv

# Load environment variables
load_dotenv(dotenv_path=".env", override=False)

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


def download_knowledge_graph():
    """
    Download knowledge graph from Hugging Face dataset and copy to application directory
    """
    # Configure Hugging Face cache to use persistent storage (HF_HOME overrides the default)
    hf_home = os.getenv("HF_HOME", "/data/.huggingface")
    os.environ["HF_HOME"] = hf_home
    os.makedirs(hf_home, exist_ok=True)

    # Get jurisdictions to download, ignoring empty entries
    jurisdictions_str = os.getenv("JURISDICTIONS", "romania,bahrain")
    jurisdictions = [j.strip() for j in jurisdictions_str.split(",") if j.strip()]

    # Dataset configuration
    dataset_id = os.getenv("HF_KNOWLEDGE_GRAPH_DATASET", "Cyberlgl/CyberLegalAI-knowledge-graph")
    hf_token = os.getenv("HF_TOKEN")

    # Target directory
    target_base_dir = "data/rag_storage"
    os.makedirs(target_base_dir, exist_ok=True)

    logger.info("=" * 80)
    logger.info("Starting Knowledge Graph Download")
    logger.info("=" * 80)
    logger.info(f"📦 Dataset: {dataset_id}")
    logger.info(f"Jurisdictions: {', '.join(jurisdictions)}")
    logger.info(f"💾 HF Cache: {hf_home}")
    logger.info(f"Target Directory: {target_base_dir}")
    logger.info("=" * 80)

    try:
        for jurisdiction in jurisdictions:
            logger.info(f"\n📥 Processing jurisdiction: {jurisdiction}")

            # Download from dataset with filtering
            local_path = snapshot_download(
                repo_id=dataset_id,
                repo_type="dataset",
                allow_patterns=[f"{jurisdiction}/*"],
                cache_dir=hf_home,
                token=hf_token
            )

            logger.info(f"✅ Downloaded to cache: {local_path}")

            # Copy to application directory
            dest_dir = os.path.join(target_base_dir, jurisdiction)
            os.makedirs(dest_dir, exist_ok=True)

            src_dir = os.path.join(local_path, jurisdiction)

            if os.path.exists(src_dir):
                files_copied = 0
                total_size = 0

                for file in os.listdir(src_dir):
                    src_file = os.path.join(src_dir, file)
                    dest_file = os.path.join(dest_dir, file)

                    # Copy file
                    shutil.copy2(src_file, dest_file)
                    file_size = os.path.getsize(dest_file)
                    total_size += file_size
                    files_copied += 1

                    logger.info(f"Copied: {file} ({file_size / (1024*1024):.1f} MB)")

                logger.info(f"✅ {jurisdiction}: {files_copied} files copied ({total_size / (1024*1024):.1f} MB)")
            else:
                logger.warning(f"⚠️ Jurisdiction directory not found in dataset: {src_dir}")

        logger.info("\n" + "=" * 80)
        logger.info("Knowledge Graph Download Complete!")
        logger.info("=" * 80)

        # Print summary
        total_size = 0
        for jurisdiction in jurisdictions:
            jur_dir = os.path.join(target_base_dir, jurisdiction)
            if os.path.exists(jur_dir):
                jur_size = sum(os.path.getsize(os.path.join(jur_dir, f)) for f in os.listdir(jur_dir))
                total_size += jur_size
                logger.info(f"{jurisdiction}: {jur_size / (1024*1024):.1f} MB")

        logger.info(f"\n💾 Total size: {total_size / (1024*1024):.1f} MB")
        logger.info("=" * 80)

        return True

    except Exception as e:
        logger.error("\n" + "=" * 80)
        logger.error(f"Error downloading knowledge graph: {e}")
        logger.error("=" * 80)
        return False


if __name__ == "__main__":
    success = download_knowledge_graph()
    raise SystemExit(0 if success else 1)
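
The copy loop in this script assumes each jurisdiction folder is a flat directory of files. If a folder ever contained subdirectories, `shutil.copy2` would fail on them. A possible variant, shown here as a sketch with the hypothetical helper name `copy_tree_with_stats`, walks the tree and returns the same two counters the script logs (files copied and total bytes).

```python
import os
import shutil


def copy_tree_with_stats(src_dir: str, dest_dir: str) -> tuple[int, int]:
    """Recursively copy src_dir into dest_dir; return (files_copied, total_bytes)."""
    files_copied = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(src_dir):
        # Mirror the relative directory structure under dest_dir.
        rel = os.path.relpath(root, src_dir)
        target_root = dest_dir if rel == "." else os.path.join(dest_dir, rel)
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            dest_file = os.path.join(target_root, name)
            # copy2 follows symlinks by default, so linked files are copied as real files.
            shutil.copy2(os.path.join(root, name), dest_file)
            total_bytes += os.path.getsize(dest_file)
            files_copied += 1
    return files_copied, total_bytes
```

In the script, this could stand in for the inner `for file in os.listdir(src_dir)` loop, with the returned pair feeding the existing summary log line.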
startup.sh
CHANGED
@@ -1,6 +1,22 @@
 #!/usr/bin/env bash
 set -euo pipefail
 
+# Step 1: Download knowledge graph from Hugging Face
+echo "📥 Checking for knowledge graph data..."
+if [ ! -d "data/rag_storage/romania" ] || [ ! -d "data/rag_storage/bahrain" ]; then
+    echo "Knowledge graph not found, downloading from Hugging Face..."
+    python scripts/download_knowledge_graph.py
+    if [ $? -ne 0 ]; then
+        echo "Failed to download knowledge graph. Exiting."
+        exit 1
+    fi
+    echo "✅ Knowledge graph download complete"
+else
+    echo "✅ Knowledge graph data already present, skipping download"
+fi
+
+echo ""
+
 HOST="${LIGHTRAG_HOST:-127.0.0.1}"
 ROOT="${LIGHTRAG_STORAGE_ROOT:-data/rag_storage}"
 GRAPHS="${LIGHTRAG_GRAPHS:-romania:9621}"