Ba7ath-Project commited on
Commit
7f18aa9
·
0 Parent(s):

Déploiement HF sans base de données, pour de vrai

Browse files
This view is limited to 50 files because it contains too many changes.   See raw diff
Files changed (50) hide show
  1. .gitignore +50 -0
  2. README.md +70 -0
  3. backend/.gitignore +18 -0
  4. backend/.python-version +1 -0
  5. backend/ahlya_vs_trovit_fuzzy.py +181 -0
  6. backend/app.py +8 -0
  7. backend/app/api/enrichment.py +529 -0
  8. backend/app/api/v1/auth.py +88 -0
  9. backend/app/api/v1/companies.py +46 -0
  10. backend/app/api/v1/investigate.py +181 -0
  11. backend/app/api/v1/meta.py +28 -0
  12. backend/app/api/v1/risk.py +14 -0
  13. backend/app/api/v1/stats.py +13 -0
  14. backend/app/data/companies.json +0 -0
  15. backend/app/data/stats.json +45 -0
  16. backend/app/database.py +24 -0
  17. backend/app/main.py +91 -0
  18. backend/app/models/enrichment_models.py +77 -0
  19. backend/app/models/schemas.py +74 -0
  20. backend/app/models/user_models.py +12 -0
  21. backend/app/schemas/auth_schemas.py +28 -0
  22. backend/app/services/aggregation.py +71 -0
  23. backend/app/services/auth_service.py +74 -0
  24. backend/app/services/data_loader.py +216 -0
  25. backend/app/services/llm_service.py +201 -0
  26. backend/app/services/osint_links.py +32 -0
  27. backend/app/services/risk_engine.py +168 -0
  28. backend/compare_by_name_fuzzy.py +162 -0
  29. backend/compare_data.py +90 -0
  30. backend/compare_names_with_qwen.py +185 -0
  31. backend/create_admin.py +44 -0
  32. backend/enrich_not_in_trovit.py +71 -0
  33. backend/inspect_db.py +46 -0
  34. backend/readme.md +12 -0
  35. backend/test_auth_flow.py +52 -0
  36. docs/API_Reference.md +103 -0
  37. docs/Authentication_Guide.md +58 -0
  38. docs/Contributing_Guide.md +40 -0
  39. docs/Database_Schema.md +81 -0
  40. docs/Deployment_Guide.md +41 -0
  41. docs/Development_Guide.md +78 -0
  42. docs/Frontend_Architecture.md +59 -0
  43. docs/OSINT_Methodology.md +42 -0
  44. docs/README.md +104 -0
  45. docs/Troubleshooting.md +49 -0
  46. index.html +34 -0
  47. package-lock.json +0 -0
  48. package.json +54 -0
  49. postcss.config.js +6 -0
  50. project_tree.py +16 -0
.gitignore ADDED
@@ -0,0 +1,50 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
2
+
3
+ # dependencies
4
+ /node_modules
5
+ /.pnp
6
+ .pnp.js
7
+
8
+ # testing
9
+ /coverage
10
+
11
+ # production
12
+ /build
13
+
14
+ # misc
15
+ .DS_Store
16
+ .env.local
17
+ .env.development.local
18
+ .env.test.local
19
+ .env.production.local
20
+ npm-debug.log*
21
+ yarn-debug.log*
22
+ yarn-error.log*
23
+
24
+ # --- Python backend / Ba7ath ---
25
+
26
+ # Environnements virtuels
27
+ venv/
28
+ .env/
29
+ .env.*
31
+ .env
32
+
33
+ # Bytecode / cache
34
+ __pycache__/
35
+ *.py[cod]
36
+ *.pyo
37
+ *.pyd
38
+
39
+ # Bases et données locales
40
+ # *.db
41
+ *.sqlite3
42
+ instance/
43
+
44
+ # Logs
45
+ *.log
46
+ logs/
47
+
48
+ .vercel
49
+ backend/.env
50
+ backend/bulk_test.py
README.md ADDED
@@ -0,0 +1,70 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Getting Started with Create React App
2
+
3
+ This project was bootstrapped with [Create React App](https://github.com/facebook/create-react-app).
4
+
5
+ ## Available Scripts
6
+
7
+ In the project directory, you can run:
8
+
9
+ ### `npm start`
10
+
11
+ Runs the app in the development mode.\
12
+ Open [http://localhost:3000](http://localhost:3000) to view it in your browser.
13
+
14
+ The page will reload when you make changes.\
15
+ You may also see any lint errors in the console.
16
+
17
+ ### `npm test`
18
+
19
+ Launches the test runner in the interactive watch mode.\
20
+ See the section about [running tests](https://facebook.github.io/create-react-app/docs/running-tests) for more information.
21
+
22
+ ### `npm run build`
23
+
24
+ Builds the app for production to the `build` folder.\
25
+ It correctly bundles React in production mode and optimizes the build for the best performance.
26
+
27
+ The build is minified and the filenames include the hashes.\
28
+ Your app is ready to be deployed!
29
+
30
+ See the section about [deployment](https://facebook.github.io/create-react-app/docs/deployment) for more information.
31
+
32
+ ### `npm run eject`
33
+
34
+ **Note: this is a one-way operation. Once you `eject`, you can't go back!**
35
+
36
+ If you aren't satisfied with the build tool and configuration choices, you can `eject` at any time. This command will remove the single build dependency from your project.
37
+
38
+ Instead, it will copy all the configuration files and the transitive dependencies (webpack, Babel, ESLint, etc) right into your project so you have full control over them. All of the commands except `eject` will still work, but they will point to the copied scripts so you can tweak them. At this point you're on your own.
39
+
40
+ You don't have to ever use `eject`. The curated feature set is suitable for small and middle deployments, and you shouldn't feel obligated to use this feature. However we understand that this tool wouldn't be useful if you couldn't customize it when you are ready for it.
41
+
42
+ ## Learn More
43
+
44
+ You can learn more in the [Create React App documentation](https://facebook.github.io/create-react-app/docs/getting-started).
45
+
46
+ To learn React, check out the [React documentation](https://reactjs.org/).
47
+
48
+ ### Code Splitting
49
+
50
+ This section has moved here: [https://facebook.github.io/create-react-app/docs/code-splitting](https://facebook.github.io/create-react-app/docs/code-splitting)
51
+
52
+ ### Analyzing the Bundle Size
53
+
54
+ This section has moved here: [https://facebook.github.io/create-react-app/docs/analyzing-the-bundle-size](https://facebook.github.io/create-react-app/docs/analyzing-the-bundle-size)
55
+
56
+ ### Making a Progressive Web App
57
+
58
+ This section has moved here: [https://facebook.github.io/create-react-app/docs/making-a-progressive-web-app](https://facebook.github.io/create-react-app/docs/making-a-progressive-web-app)
59
+
60
+ ### Advanced Configuration
61
+
62
+ This section has moved here: [https://facebook.github.io/create-react-app/docs/advanced-configuration](https://facebook.github.io/create-react-app/docs/advanced-configuration)
63
+
64
+ ### Deployment
65
+
66
+ This section has moved here: [https://facebook.github.io/create-react-app/docs/deployment](https://facebook.github.io/create-react-app/docs/deployment)
67
+
68
+ ### `npm run build` fails to minify
69
+
70
+ This section has moved here: [https://facebook.github.io/create-react-app/docs/troubleshooting#npm-run-build-fails-to-minify](https://facebook.github.io/create-react-app/docs/troubleshooting#npm-run-build-fails-to-minify)
backend/.gitignore ADDED
@@ -0,0 +1,18 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ venv/
2
+ __pycache__/
3
+ # Ignorer tout le dossier des scripts sensibles
4
+ app/scripts/
5
+ force_admin.py
6
+
7
+
8
+ # Ignorer systématiquement les bases de données (Excel et CSV)
9
+ *.xlsx
10
+ *.csv
11
+
12
+ # Ignorer les journaux de progression et fichiers temporaires
13
+ ba7ath_progress.txt
14
+ *.log
15
+ *.txt
16
+ .env
17
+ .env.*
18
+ ../.env*.db
backend/.python-version ADDED
@@ -0,0 +1 @@
 
 
1
+ 3.11.13
backend/ahlya_vs_trovit_fuzzy.py ADDED
@@ -0,0 +1,181 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import re
2
+ from pathlib import Path
3
+
4
+ import pandas as pd
5
+ from rapidfuzz import process, fuzz
6
+
7
# -------- CONFIG --------

# Input CSVs: the Ahlya registry and the Trovit scrape to compare against.
CSV_AHLYA = Path("Ahlya_Total_Feuil1.csv")
CSV_TROVIT = Path("trovit_charikat_ahliya_all.csv")

# Column holding the company name in each file.
COL_NAME_AHLYA = "اسم_الشركة"
COL_NAME_TROVIT = "name"

# Decision thresholds:
# score >= MATCH_THRESHOLD                    => strict match
# MAYBE_THRESHOLD <= score < MATCH_THRESHOLD  => needs manual review
MATCH_THRESHOLD = 95
MAYBE_THRESHOLD = 85

# Output files
OUT_ALL = Path("ahlya_vs_trovit_fuzzy_all.csv")
OUT_NON_MATCH = Path("ahlya_not_in_trovit_fuzzy.csv")
OUT_MATCHES_STRICT = Path("ahlya_matches_stricts.csv")
OUT_MAYBE = Path("ahlya_a_verifier.csv")

# CSV encoding (utf-8 with BOM, Excel-friendly)
ENCODING = "utf-8-sig"

# ------------------------
32
+
33
+
34
+ def normalize_name(s: str) -> str:
35
+ """Normalisation agressive pour comparer des noms arabes proches."""
36
+ if pd.isna(s):
37
+ return ""
38
+ s = str(s).strip()
39
+
40
+ # Unifier quelques lettres arabes fréquentes
41
+ s = s.replace("أ", "ا").replace("إ", "ا").replace("آ", "ا")
42
+ s = s.replace("ى", "ي").replace("ئ", "ي").replace("ؤ", "و")
43
+ s = s.replace("ة", "ه")
44
+
45
+ # Supprimer mots génériques
46
+ generic = [
47
+ "شركة", "الشركة",
48
+ "الاهلية", "الأهلية", "الاهليه",
49
+ "المحلية", "المحليه",
50
+ "الجهوية", "الجهويه",
51
+ ]
52
+ for g in generic:
53
+ s = s.replace(g, "")
54
+
55
+ # Supprimer ponctuation simple et normaliser les espaces
56
+ s = re.sub(r"[^\w\s]", " ", s)
57
+ s = " ".join(s.split())
58
+ return s
59
+
60
+
61
def main():
    """Fuzzy-match each Ahlya company name against the Trovit dataset.

    Writes four CSVs: all rows with match info, strict matches, rows to
    review manually, and non-matches.

    Raises:
        FileNotFoundError: if an input CSV is missing.
        KeyError: if an expected name column is absent.
    """
    if not CSV_AHLYA.exists():
        raise FileNotFoundError(CSV_AHLYA.resolve())
    if not CSV_TROVIT.exists():
        raise FileNotFoundError(CSV_TROVIT.resolve())

    # 1. Load both files
    df_ahlya = pd.read_csv(CSV_AHLYA, encoding=ENCODING)
    df_trovit = pd.read_csv(CSV_TROVIT, encoding=ENCODING)

    if COL_NAME_AHLYA not in df_ahlya.columns:
        raise KeyError(
            f"Colonne '{COL_NAME_AHLYA}' absente dans {CSV_AHLYA.name} : "
            f"{list(df_ahlya.columns)}"
        )
    if COL_NAME_TROVIT not in df_trovit.columns:
        raise KeyError(
            f"Colonne '{COL_NAME_TROVIT}' absente dans {CSV_TROVIT.name} : "
            f"{list(df_trovit.columns)}"
        )

    # 2. Build normalized versions of the names
    df_ahlya["__name_norm__"] = df_ahlya[COL_NAME_AHLYA].apply(normalize_name)
    df_trovit["__name_norm__"] = df_trovit[COL_NAME_TROVIT].apply(normalize_name)

    # Trovit names as a plain list for RapidFuzz; extractOne returns the
    # positional index into this list, which matches df_trovit row order.
    trovit_names = df_trovit["__name_norm__"].tolist()

    best_scores = []
    best_indexes = []

    # 3. For each Ahlya company, find the best Trovit candidate
    for _, row in df_ahlya.iterrows():
        name_a = row["__name_norm__"]

        # Empty normalized name: nothing to match against.
        if not name_a:
            best_scores.append(0)
            best_indexes.append(None)
            continue

        match = process.extractOne(
            name_a,
            trovit_names,
            scorer=fuzz.token_sort_ratio,
        )

        if match is None:
            best_scores.append(0)
            best_indexes.append(None)
        else:
            _, score, idx = match
            best_scores.append(score)
            best_indexes.append(idx)

    df_ahlya["match_score"] = best_scores
    df_ahlya["trovit_index"] = best_indexes
    df_ahlya["has_candidate"] = df_ahlya["trovit_index"].notna()

    # 4. Pull a few Trovit columns over for context (name, wilaya, ids, ...)
    def extract_from_trovit(idx, col):
        # idx may be NaN (no candidate); guard before positional lookup.
        if pd.isna(idx):
            return None
        idx = int(idx)
        if 0 <= idx < len(df_trovit):
            return df_trovit.iloc[idx].get(col)
        return None

    trovit_cols_to_add = [
        COL_NAME_TROVIT,
        "charika_id",
        "tax_id",
        "wilaya",
        "delegation",
        "capital",
        "legal_form",
        "detail_url",
    ]

    for col in trovit_cols_to_add:
        new_col = f"trovit_{col}"
        if col in df_trovit.columns:
            df_ahlya[new_col] = df_ahlya["trovit_index"].apply(
                lambda i: extract_from_trovit(i, col)
            )
        else:
            # Column absent from the Trovit file: keep the slot, fill with None.
            df_ahlya[new_col] = None

    # 5. Tag the match categories
    df_ahlya["matched_strict"] = df_ahlya["match_score"] >= MATCH_THRESHOLD
    df_ahlya["matched_maybe"] = (
        (df_ahlya["match_score"] >= MAYBE_THRESHOLD)
        & (df_ahlya["match_score"] < MATCH_THRESHOLD)
    )

    # 6. Save every row with its match info
    df_ahlya.to_csv(OUT_ALL, index=False, encoding=ENCODING)

    # 7. Derived files (strict / to-review / non-match)
    df_matches = df_ahlya[df_ahlya["matched_strict"]].copy()
    df_maybe = df_ahlya[df_ahlya["matched_maybe"]].copy()
    df_non_match = df_ahlya[~(df_ahlya["matched_strict"] | df_ahlya["matched_maybe"])].copy()

    df_matches.to_csv(OUT_MATCHES_STRICT, index=False, encoding=ENCODING)
    df_maybe.to_csv(OUT_MAYBE, index=False, encoding=ENCODING)
    df_non_match.to_csv(OUT_NON_MATCH, index=False, encoding=ENCODING)

    print(f"[INFO] Lignes Ahlya : {len(df_ahlya)}")
    print(f"[INFO] Matchs stricts (score >= {MATCH_THRESHOLD}) : {len(df_matches)}")
    print(
        f"[INFO] À vérifier ({MAYBE_THRESHOLD} <= score < {MATCH_THRESHOLD}) : "
        f"{len(df_maybe)}"
    )
    print(f"[INFO] Non-concordances (score < {MAYBE_THRESHOLD}) : {len(df_non_match)}")
    print(f"[OK] Fichier complet : {OUT_ALL.resolve()}")
    print(f"[OK] Matchs stricts : {OUT_MATCHES_STRICT.resolve()}")
    print(f"[OK] À vérifier : {OUT_MAYBE.resolve()}")
    print(f"[OK] Non-concordances : {OUT_NON_MATCH.resolve()}")


if __name__ == "__main__":
    main()
backend/app.py ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
import uvicorn

# Import the FastAPI instance ('app') from app/main.py
from app.main import app

# Hugging Face executes this file, which serves the API on the expected port.
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=7860)
backend/app/api/enrichment.py ADDED
@@ -0,0 +1,529 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from fastapi import APIRouter, HTTPException, Depends
2
+ from typing import List, Optional
3
+ from pydantic import BaseModel, Field
4
+ from datetime import datetime
5
+ from sqlalchemy.orm import Session
6
+ import uuid
7
+
8
+ from app.database import get_db
9
+ from app.models.enrichment_models import (
10
+ EnrichedCompany as EnrichedCompanyDB,
11
+ InvestigationNote as InvestigationNoteDB
12
+ )
13
+
14
router = APIRouter()

# --- Pydantic Models (Request/Response shapes) ---

class Shareholder(BaseModel):
    """One shareholder entry in the RNE data."""
    name: str
    percentage: float
    role: str

class RneData(BaseModel):
    """Registry (RNE) data plus raw Trovit CSV columns, kept side by side."""
    # Existing fields
    capital_social: float = 0.0
    legal_form: Optional[str] = None  # Made optional for CSV import compatibility
    registration_number: Optional[str] = ""
    registration_date: Optional[str] = ""
    address: Optional[str] = None
    shareholders: List[Shareholder] = []

    # Trovit CSV fields (1:1 mapping)
    charika_type: Optional[str] = None
    charika_id: Optional[str] = None
    name: Optional[str] = None
    delegation: Optional[str] = None
    zipcode_list: Optional[str] = None
    start_date_raw: Optional[str] = None
    capital: Optional[int] = None  # Distinct from capital_social (float), kept for CSV fidelity
    tax_id: Optional[str] = None
    rc_number: Optional[str] = None
    founding_date_iso: Optional[str] = None
    zipcode_detail: Optional[str] = None
    wilaya: Optional[str] = None
    founding_location: Optional[str] = None
    detail_url: Optional[str] = None

class JortAnnouncement(BaseModel):
    """A single JORT (official gazette) announcement."""
    date: str
    type: str
    jort_number: Optional[str] = None
    content: str
    year: Optional[int] = None

class JortData(BaseModel):
    """Collection of JORT announcements for one company."""
    announcements: List[JortAnnouncement] = []

class Contract(BaseModel):
    """A procurement contract record (montant = amount, objet = subject)."""
    date: str
    organisme: str
    type: str
    montant: float
    objet: str

class MarchesData(BaseModel):
    """Collection of procurement contracts for one company."""
    contracts: List[Contract] = []

class EnrichmentData(BaseModel):
    """Full enrichment payload: registry, gazette and procurement sections."""
    rne: RneData
    jort: JortData
    marches: MarchesData
    notes: Optional[str] = None

class RedFlag(BaseModel):
    """A heuristic warning produced by calculate_red_flags."""
    type: str
    severity: str      # "HIGH" or "MEDIUM" in the flags emitted by this module
    message_ar: str    # Arabic user-facing message

class Metrics(BaseModel):
    """Aggregated contract metrics plus detected red flags."""
    total_contracts: int
    total_contracts_value: float
    capital_to_contracts_ratio: float
    red_flags: List[RedFlag] = []

class EnrichedCompanyRequest(BaseModel):
    """Inbound payload to create or update an enriched company profile."""
    company_id: str
    company_name: str
    wilaya: str
    data: EnrichmentData
    enriched_by: str = "Journalist"
    enriched_at: Optional[str] = None  # parsed with datetime.fromisoformat when set

class EnrichedCompanyResponse(BaseModel):
    """Outbound shape of an enriched company profile."""
    company_id: str
    company_name: str
    wilaya: str
    data: dict
    metrics: dict
    enriched_by: str
    enriched_at: Optional[str]

# --- Investigation Notes Pydantic Models ---

class CreateNoteRequest(BaseModel):
    """Payload for creating an investigation note."""
    title: str
    content: str
    created_by: Optional[str] = "Unknown"
    tags: Optional[List[str]] = []

class UpdateNoteRequest(BaseModel):
    """Partial-update payload; None fields are left unchanged."""
    title: Optional[str] = None
    content: Optional[str] = None
    tags: Optional[List[str]] = None
114
+
115
+ # --- Business Logic ---
116
+
117
def calculate_red_flags(data: EnrichmentData) -> Metrics:
    """Derive contract metrics and heuristic red flags from enrichment data."""
    contracts = data.marches.contracts
    total_contracts = len(contracts)
    total_value = sum(c.montant for c in contracts)
    # Guard against division by zero: zero/negative capital counts as 1.
    capital = data.rne.capital_social if data.rne.capital_social > 0 else 1
    ratio = total_value / capital

    flags = []

    # Flag: contract value dwarfs the declared capital (> 10x).
    if ratio > 10:
        flags.append(RedFlag(
            type="FINANCIAL_RATIO",
            severity="HIGH",
            message_ar=f"قيمة الصفقات تتجاوز رأس المال بـ {ratio:.1f} مرة"
        ))

    # Flag: majority of contracts awarded without competition.
    direct_awards = sum(1 for c in contracts if "تراضي" in c.type or "Direct" in c.type)
    if total_contracts and direct_awards / total_contracts > 0.5:
        flags.append(RedFlag(
            type="PROCUREMENT_METHOD",
            severity="HIGH",
            message_ar="أكثر من 50% من الصفقات بالتراضي"
        ))

    # Flag: company has exactly one shareholder.
    if len(data.rne.shareholders) == 1:
        flags.append(RedFlag(
            type="GOVERNANCE",
            severity="MEDIUM",
            message_ar="مساهم وحيد في الشركة"
        ))

    return Metrics(
        total_contracts=total_contracts,
        total_contracts_value=total_value,
        capital_to_contracts_ratio=ratio,
        red_flags=flags,
    )
157
+
158
+
159
def db_company_to_dict(company: EnrichedCompanyDB) -> dict:
    """Serialize an EnrichedCompany ORM row into the frontend-expected dict."""
    # enriched_at is a datetime column; emit ISO-8601 or None.
    timestamp = company.enriched_at.isoformat() if company.enriched_at else None
    return {
        "company_id": company.company_id,
        "company_name": company.company_name,
        "wilaya": company.wilaya,
        "data": company.data,
        "metrics": company.metrics,
        "enriched_by": company.enriched_by,
        "enriched_at": timestamp,
    }
170
+
171
+
172
def db_note_to_dict(note: InvestigationNoteDB) -> dict:
    """Serialize an InvestigationNote ORM row into the frontend-expected dict."""
    # Datetime columns become ISO-8601 strings; missing tags become [].
    created = note.created_at.isoformat() if note.created_at else None
    updated = note.updated_at.isoformat() if note.updated_at else None
    return {
        "id": note.id,
        "title": note.title,
        "content": note.content,
        "created_at": created,
        "updated_at": updated,
        "created_by": note.created_by,
        "tags": note.tags or [],
    }
183
+
184
+
185
+ # --- Enrichment Endpoints ---
186
+
187
@router.post("/manual")
def save_manual_enrichment(payload: EnrichedCompanyRequest, db: Session = Depends(get_db)):
    """Save or update an enriched company profile (upsert on company_id).

    Metrics and red flags are recomputed from the payload on every save.
    Returns the stored record in the frontend dict shape.  A malformed
    enriched_at string raises ValueError from datetime.fromisoformat.
    """
    # Calculate metrics & flags
    metrics = calculate_red_flags(payload.data)

    metrics_dict = {
        "total_contracts": metrics.total_contracts,
        "total_contracts_value": metrics.total_contracts_value,
        "capital_to_contracts_ratio": metrics.capital_to_contracts_ratio,
        "red_flags": [f.dict() for f in metrics.red_flags]
    }

    data_dict = payload.data.dict()
    # Client-supplied timestamp wins; otherwise stamp server-side (UTC, naive).
    enriched_at = datetime.fromisoformat(payload.enriched_at) if payload.enriched_at else datetime.utcnow()

    # Check if company exists (upsert)
    existing = db.query(EnrichedCompanyDB).filter(
        EnrichedCompanyDB.company_id == payload.company_id
    ).first()

    if existing:
        # Update existing record in place
        existing.company_name = payload.company_name
        existing.wilaya = payload.wilaya
        existing.data = data_dict
        existing.metrics = metrics_dict
        existing.enriched_by = payload.enriched_by
        existing.enriched_at = enriched_at
        db.commit()
        db.refresh(existing)
        company_obj = existing
    else:
        # Create new record
        company_obj = EnrichedCompanyDB(
            company_id=payload.company_id,
            company_name=payload.company_name,
            wilaya=payload.wilaya,
            data=data_dict,
            metrics=metrics_dict,
            enriched_by=payload.enriched_by,
            enriched_at=enriched_at,
        )
        db.add(company_obj)
        db.commit()
        db.refresh(company_obj)

    return db_company_to_dict(company_obj)
235
+
236
+
237
@router.get("/profile/{company_id}")
def get_enriched_profile(company_id: str, db: Session = Depends(get_db)):
    """Return one enriched company profile, or 404 if it was never enriched."""
    record = (
        db.query(EnrichedCompanyDB)
        .filter(EnrichedCompanyDB.company_id == company_id)
        .first()
    )
    if record is None:
        raise HTTPException(status_code=404, detail="Profile not enriched yet")
    return db_company_to_dict(record)
248
+
249
+
250
@router.get("/status/{company_id}")
def check_enrichment_status(company_id: str, db: Session = Depends(get_db)):
    """Report whether an enriched record exists for the given company id."""
    record = (
        db.query(EnrichedCompanyDB)
        .filter(EnrichedCompanyDB.company_id == company_id)
        .first()
    )
    return {"company_id": company_id, "is_enriched": record is not None}
261
+
262
+
263
@router.get("/all")
def get_all_enriched(db: Session = Depends(get_db)):
    """Return every enriched company, newest first, without pagination."""
    rows = (
        db.query(EnrichedCompanyDB)
        .order_by(EnrichedCompanyDB.enriched_at.desc())
        .all()
    )
    return [db_company_to_dict(row) for row in rows]
271
+
272
+
273
@router.get("/list")
def list_enriched_companies(
    page: int = 1,
    per_page: int = 12,
    search: Optional[str] = None,
    wilaya: Optional[str] = None,
    has_red_flags: Optional[bool] = None,
    db: Session = Depends(get_db)
):
    """List enriched companies with filters and pagination.

    Filters: case-insensitive name substring (search), exact wilaya, and
    presence/absence of red flags.  Returns a dict with the page of
    companies plus total/page/per_page/total_pages bookkeeping.
    """
    # Clamp pagination inputs: page=0 or negatives would otherwise produce
    # negative slice bounds and silently return the wrong page.
    page = max(page, 1)
    per_page = max(per_page, 1)

    query = db.query(EnrichedCompanyDB)

    # Filter by search (company name, case-insensitive substring)
    if search:
        query = query.filter(EnrichedCompanyDB.company_name.ilike(f"%{search}%"))

    # Filter by wilaya (exact match)
    if wilaya:
        query = query.filter(EnrichedCompanyDB.wilaya == wilaya)

    # Get all matching companies for counting and red flag filtering
    # (SQLite JSON filtering is limited, so we filter in Python for has_red_flags)
    all_companies = query.order_by(EnrichedCompanyDB.enriched_at.desc()).all()

    # Convert to dicts and apply red flag filter if needed.
    companies_dicts = [db_company_to_dict(c) for c in all_companies]

    if has_red_flags is not None:
        # `metrics` may be stored as NULL; `or {}` keeps .get safe.
        def _flagged(c):
            return bool((c.get('metrics') or {}).get('red_flags'))

        if has_red_flags:
            companies_dicts = [c for c in companies_dicts if _flagged(c)]
        else:
            companies_dicts = [c for c in companies_dicts if not _flagged(c)]

    # Pagination over the filtered, in-memory list.
    total = len(companies_dicts)
    start = (page - 1) * per_page
    paginated = companies_dicts[start:start + per_page]

    return {
        "companies": paginated,
        "total": total,
        "page": page,
        "per_page": per_page,
        "total_pages": (total + per_page - 1) // per_page if total > 0 else 1
    }
325
+
326
+
327
+ # --- Investigation Notes Endpoints ---
328
+
329
@router.post("/{company_id}/notes")
def create_note(company_id: str, request: CreateNoteRequest, db: Session = Depends(get_db)):
    """Attach a new investigation note to an enriched company (404 if absent)."""
    parent = (
        db.query(EnrichedCompanyDB)
        .filter(EnrichedCompanyDB.company_id == company_id)
        .first()
    )
    if parent is None:
        raise HTTPException(status_code=404, detail="Company not found")

    timestamp = datetime.utcnow()
    new_note = InvestigationNoteDB(
        id=str(uuid.uuid4()),
        company_id=company_id,
        title=request.title,
        content=request.content,
        created_by=request.created_by or "Unknown",
        tags=request.tags or [],
        created_at=timestamp,
        updated_at=timestamp,
    )
    db.add(new_note)
    db.commit()
    db.refresh(new_note)

    # Report the note count for this company after the insert.
    note_count = (
        db.query(InvestigationNoteDB)
        .filter(InvestigationNoteDB.company_id == company_id)
        .count()
    )
    return {
        "status": "success",
        "note": db_note_to_dict(new_note),
        "total_notes": note_count,
    }
366
+
367
+
368
@router.get("/{company_id}/notes")
def get_notes(company_id: str, db: Session = Depends(get_db)):
    """Return all investigation notes for a company, newest first (404 if absent)."""
    parent = (
        db.query(EnrichedCompanyDB)
        .filter(EnrichedCompanyDB.company_id == company_id)
        .first()
    )
    if parent is None:
        raise HTTPException(status_code=404, detail="Company not found")

    rows = (
        db.query(InvestigationNoteDB)
        .filter(InvestigationNoteDB.company_id == company_id)
        .order_by(InvestigationNoteDB.created_at.desc())
        .all()
    )
    return {
        "company_id": company_id,
        "company_name": parent.company_name,
        "notes": [db_note_to_dict(row) for row in rows],
        "total": len(rows),
    }
388
+
389
+
390
@router.put("/{company_id}/notes/{note_id}")
def update_note(
    company_id: str,
    note_id: str,
    updates: UpdateNoteRequest,
    db: Session = Depends(get_db)
):
    """Partially update a note; fields left as None keep their current value."""
    target = (
        db.query(InvestigationNoteDB)
        .filter(
            InvestigationNoteDB.company_id == company_id,
            InvestigationNoteDB.id == note_id,
        )
        .first()
    )
    if target is None:
        raise HTTPException(status_code=404, detail="Note not found")

    # Copy over only the fields explicitly provided in the request body.
    for field in ("title", "content", "tags"):
        value = getattr(updates, field)
        if value is not None:
            setattr(target, field, value)

    target.updated_at = datetime.utcnow()
    db.commit()
    db.refresh(target)

    return {"status": "success", "note": db_note_to_dict(target)}
422
+
423
+
424
@router.delete("/{company_id}/notes/{note_id}")
def delete_note(company_id: str, note_id: str, db: Session = Depends(get_db)):
    """Delete an investigation note and report how many remain for the company."""
    target = (
        db.query(InvestigationNoteDB)
        .filter(
            InvestigationNoteDB.company_id == company_id,
            InvestigationNoteDB.id == note_id,
        )
        .first()
    )
    if target is None:
        raise HTTPException(status_code=404, detail="Note not found")

    db.delete(target)
    db.commit()

    remaining = (
        db.query(InvestigationNoteDB)
        .filter(InvestigationNoteDB.company_id == company_id)
        .count()
    )
    return {
        "status": "success",
        "deleted_note_id": note_id,
        "total_notes": remaining,
    }
448
+
449
+
450
# --- Watchlist Endpoints ---

class WatchCompanyOut(BaseModel):
    """Response shape for a watchlist entry, mapped from the ORM row."""
    id: str
    name_ar: str
    wilaya: Optional[str]
    delegation: Optional[str]
    activity: Optional[str]
    type: Optional[str]
    date_annonce: Optional[str]
    etat_enregistrement: str            # registration state; see update endpoint
    detected_trovit_at: Optional[datetime]
    detected_trovit_charika_id: Optional[str]
    detected_trovit_url: Optional[str]
    created_at: datetime
    updated_at: datetime

    class Config:
        # Allow building this model directly from SQLAlchemy attributes.
        from_attributes = True

class WatchCompanyUpdate(BaseModel):
    """Partial-update payload for a watched company; None means unchanged."""
    etat_enregistrement: Optional[str] = None
    detected_trovit_charika_id: Optional[str] = None
    detected_trovit_url: Optional[str] = None
474
+
475
+
476
@router.get("/watch-companies", response_model=List[WatchCompanyOut])
def list_watch_companies(
    wilaya: Optional[str] = None,
    etat: Optional[str] = None,
    q: Optional[str] = None,
    db: Session = Depends(get_db)
):
    """List watchlist companies, optionally filtered by wilaya, state or name."""
    from app.models.enrichment_models import WatchCompany

    # Collect the active criteria, then apply them in one filter call.
    criteria = []
    if wilaya:
        criteria.append(WatchCompany.wilaya == wilaya)
    if etat:
        criteria.append(WatchCompany.etat_enregistrement == etat)
    if q:
        criteria.append(WatchCompany.name_ar.ilike(f"%{q}%"))

    query = db.query(WatchCompany)
    if criteria:
        query = query.filter(*criteria)

    # Default sort: newest entries first.
    return query.order_by(WatchCompany.created_at.desc()).all()
499
+
500
+
501
@router.patch("/watch-companies/{company_id}", response_model=WatchCompanyOut)
def update_watch_company(
    company_id: str,
    updates: WatchCompanyUpdate,
    db: Session = Depends(get_db)
):
    """Patch registration state and Trovit detection details of a watched company."""
    from app.models.enrichment_models import WatchCompany

    target = db.query(WatchCompany).filter(WatchCompany.id == company_id).first()
    if target is None:
        raise HTTPException(status_code=404, detail="Watch company not found")

    new_etat = updates.etat_enregistrement
    if new_etat is not None:
        target.etat_enregistrement = new_etat
        # Stamp the first-detection time once, when the state flips to detected.
        if new_etat == "detected_trovit" and not target.detected_trovit_at:
            target.detected_trovit_at = datetime.utcnow()

    if updates.detected_trovit_charika_id is not None:
        target.detected_trovit_charika_id = updates.detected_trovit_charika_id

    if updates.detected_trovit_url is not None:
        target.detected_trovit_url = updates.detected_trovit_url

    target.updated_at = datetime.utcnow()
    db.commit()
    db.refresh(target)

    return target
backend/app/api/v1/auth.py ADDED
@@ -0,0 +1,88 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from datetime import timedelta
2
+ from fastapi import APIRouter, Depends, HTTPException, status
3
+ from fastapi.security import OAuth2PasswordRequestForm
4
+ from sqlalchemy.orm import Session
5
+
6
+ from app.database import get_db
7
+ from app.models.user_models import User
8
+ from app.schemas.auth_schemas import Token, UserCreate, UserRead, UserUpdate
9
+ from app.services.auth_service import (
10
+ ACCESS_TOKEN_EXPIRE_MINUTES,
11
+ create_access_token,
12
+ get_password_hash,
13
+ verify_password,
14
+ get_current_active_user,
15
+ get_current_admin_user
16
+ )
17
+
18
+ router = APIRouter()
19
+
20
+ @router.post("/login", response_model=Token)
21
+ async def login_for_access_token(form_data: OAuth2PasswordRequestForm = Depends(), db: Session = Depends(get_db)):
22
+ user = db.query(User).filter(User.email == form_data.username).first()
23
+ if not user or not verify_password(form_data.password, user.hashed_password):
24
+ raise HTTPException(
25
+ status_code=status.HTTP_401_UNAUTHORIZED,
26
+ detail="Incorrect username or password",
27
+ headers={"WWW-Authenticate": "Bearer"},
28
+ )
29
+
30
+ access_token_expires = timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES)
31
+ access_token = create_access_token(
32
+ data={"sub": user.email}, expires_delta=access_token_expires
33
+ )
34
+ return {"access_token": access_token, "token_type": "bearer"}
35
+
36
+ @router.post("/users", response_model=UserRead)
37
+ def create_user(user: UserCreate, db: Session = Depends(get_db)):
38
+ # Check if user exists
39
+ db_user = db.query(User).filter(User.email == user.email).first()
40
+ if db_user:
41
+ raise HTTPException(status_code=400, detail="Email already registered")
42
+
43
+ hashed_password = get_password_hash(user.password)
44
+ new_user = User(
45
+ email=user.email,
46
+ hashed_password=hashed_password,
47
+ full_name=user.full_name,
48
+ is_active=user.is_active,
49
+ is_admin=user.is_admin
50
+ )
51
+ db.add(new_user)
52
+ db.commit()
53
+ db.refresh(new_user)
54
+ return new_user
55
+
56
+ @router.get("/me", response_model=UserRead)
57
+ async def read_users_me(current_user: User = Depends(get_current_active_user)):
58
+ return current_user
59
+
60
+ @router.get("/users", response_model=list[UserRead])
61
+ def read_users(skip: int = 0, limit: int = 100, db: Session = Depends(get_db), current_user: User = Depends(get_current_admin_user)):
62
+ users = db.query(User).offset(skip).limit(limit).all()
63
+ return users
64
+
65
+ @router.patch("/users/{user_id}", response_model=UserRead)
66
+ def update_user(user_id: int, user_update: UserUpdate, db: Session = Depends(get_db), current_user: User = Depends(get_current_admin_user)):
67
+ db_user = db.query(User).filter(User.id == user_id).first()
68
+ if not db_user:
69
+ raise HTTPException(status_code=404, detail="User not found")
70
+
71
+ if user_update.is_active is not None:
72
+ db_user.is_active = user_update.is_active
73
+ if user_update.is_admin is not None:
74
+ db_user.is_admin = user_update.is_admin
75
+
76
+ db.commit()
77
+ db.refresh(db_user)
78
+ return db_user
79
+
80
+ @router.delete("/users/{user_id}", status_code=status.HTTP_204_NO_CONTENT)
81
+ def delete_user(user_id: int, db: Session = Depends(get_db), current_user: User = Depends(get_current_admin_user)):
82
+ db_user = db.query(User).filter(User.id == user_id).first()
83
+ if not db_user:
84
+ raise HTTPException(status_code=404, detail="User not found")
85
+
86
+ db.delete(db_user)
87
+ db.commit()
88
+ return None
backend/app/api/v1/companies.py ADDED
@@ -0,0 +1,46 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from fastapi import APIRouter, Query
2
+ from typing import List, Optional
3
+ from app.services.data_loader import get_companies_df
4
+ from app.models.schemas import Company, CompanyWithLinks
5
+ from app.services.osint_links import get_company_links
6
+
7
+ router = APIRouter()
8
+
9
+ @router.get("/", response_model=List[Company])
10
+ def list_companies(
11
+ wilaya: Optional[str] = None,
12
+ group: Optional[str] = None,
13
+ type: Optional[str] = None,
14
+ search: Optional[str] = None,
15
+ limit: int = 50
16
+ ):
17
+ df = get_companies_df()
18
+ if df.empty:
19
+ return []
20
+
21
+ if wilaya:
22
+ df = df[df['wilaya'] == wilaya]
23
+ if group:
24
+ df = df[df['activity_group'] == group]
25
+ if type:
26
+ df = df[df['type'] == type]
27
+ if search:
28
+ mask = df['name'].str.contains(search, na=False) | df['activity_normalized'].str.contains(search, na=False)
29
+ df = df[mask]
30
+
31
+ return df.head(limit).to_dict(orient='records')
32
+
33
+ @router.get("/{company_id}", response_model=CompanyWithLinks)
34
+ def read_company(company_id: int):
35
+ df = get_companies_df()
36
+ company = df[df['id'] == company_id]
37
+ if company.empty:
38
+ return {} # Should raise 404
39
+
40
+ data = company.iloc[0].to_dict()
41
+ data['osint_links'] = get_company_links(company_id)
42
+ return data
43
+
44
+ @router.get("/{company_id}/osint_links")
45
+ def read_company_links(company_id: int):
46
+ return get_company_links(company_id)
backend/app/api/v1/investigate.py ADDED
@@ -0,0 +1,181 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Ba7ath Investigation Endpoint
3
+ ==============================
4
+ POST /api/v1/investigate/{company_id}
5
+
6
+ Cross-references Ahlya (CSV), JORT (DB), and RNE (DB) data via Gemini LLM.
7
+ """
8
+
9
+ from fastapi import APIRouter, HTTPException, Depends
10
+ from pydantic import BaseModel, Field
11
+ from typing import Optional, List
12
+ from datetime import datetime
13
+ from sqlalchemy.orm import Session
14
+
15
+ from app.database import get_db
16
+ from app.models.enrichment_models import EnrichedCompany as EnrichedCompanyDB
17
+ from app.services.llm_service import llm_service
18
+ from app.services.data_loader import get_companies_df
19
+ from app.services.auth_service import get_current_user
20
+
21
+ import logging
22
+
23
+ logger = logging.getLogger("ba7ath.investigate")
24
+
25
+ router = APIRouter()
26
+
27
+
28
+ # ── Pydantic Response Models ─────────────────────────────────────────────
29
+
30
+ class LLMAnalysis(BaseModel):
31
+ """The structured output from Gemini."""
32
+ match_score: int = Field(0, ge=0, le=100, description="Score de correspondance (0-100)")
33
+ status: str = Field("Pending", description="Verified | Suspicious | Conflict | Pending")
34
+ findings: List[str] = Field(default_factory=list, description="النقاط المتطابقة")
35
+ red_flags: List[str] = Field(default_factory=list, description="التجاوزات المرصودة")
36
+ summary_ar: str = Field("", description="ملخص التحقيق بالعربية")
37
+
38
+
39
+ class InvestigationResult(BaseModel):
40
+ """Full investigation response."""
41
+ company_id: str
42
+ company_name: str
43
+ wilaya: str
44
+ analysis: LLMAnalysis
45
+ sources_used: List[str] = Field(default_factory=list)
46
+ analyzed_at: str
47
+ model_used: str = "gemini-1.5-flash"
48
+
49
+
50
+ # ── Helper: Extract Ahlya data from CSV ──────────────────────────────────
51
+
52
+ def _get_ahlya_data(company_id: str, company_name: str) -> Optional[dict]:
53
+ """Find the company in the Ahlya DataFrame by ID or name."""
54
+ df = get_companies_df()
55
+ if df is None or df.empty:
56
+ return None
57
+
58
+ # Try matching by company_id first (if there's an ID column)
59
+ if "company_id" in df.columns:
60
+ match = df[df["company_id"] == company_id]
61
+ if not match.empty:
62
+ return match.iloc[0].to_dict()
63
+
64
+ # Fallback to name matching
65
+ name_col = "name" if "name" in df.columns else None
66
+ if name_col is None:
67
+ for col in df.columns:
68
+ if "name" in col.lower() or "اسم" in col:
69
+ name_col = col
70
+ break
71
+
72
+ if name_col:
73
+ # Normalize for fuzzy matching
74
+ normalized_target = company_name.strip().upper()
75
+ match = df[df[name_col].astype(str).str.strip().str.upper() == normalized_target]
76
+ if not match.empty:
77
+ return match.iloc[0].to_dict()
78
+
79
+ return None
80
+
81
+
82
+ # ── Main Endpoint ────────────────────────────────────────────────────────
83
+
84
+ @router.post(
85
+ "/{company_id}",
86
+ response_model=InvestigationResult,
87
+ summary="تحليل المقارنة المتقاطعة عبر الذكاء الاصطناعي"
88
+ )
89
+ async def investigate_company(
90
+ company_id: str,
91
+ db: Session = Depends(get_db),
92
+ current_user=Depends(get_current_user),
93
+ ):
94
+ """
95
+ Cross-reference a company's data from Ahlya (CSV), JORT (DB enrichment),
96
+ and RNE (DB enrichment) using Gemini 1.5 Flash LLM analysis.
97
+
98
+ Returns a structured investigation report in Arabic (MSA).
99
+ """
100
+ logger.info(f"📋 Investigation request for company_id: {company_id}")
101
+
102
+ # ── 1. Retrieve enriched data from SQLite ────────────────────────────
103
+ enriched = db.query(EnrichedCompanyDB).filter(
104
+ EnrichedCompanyDB.company_id == company_id
105
+ ).first()
106
+
107
+ if not enriched:
108
+ raise HTTPException(
109
+ status_code=404,
110
+ detail=f"الشركة '{company_id}' غير موجودة في قاعدة البيانات المُثرَاة"
111
+ )
112
+
113
+ company_name = enriched.company_name
114
+ wilaya = enriched.wilaya
115
+ enrichment_data = enriched.data or {}
116
+
117
+ # Extract JORT and RNE from enrichment data
118
+ jort_data = enrichment_data.get("jort", {})
119
+ rne_data = enrichment_data.get("rne", {})
120
+
121
+ # ── 2. Retrieve Ahlya data from CSV ──────────────────────────────────
122
+ ahlya_data = _get_ahlya_data(company_id, company_name)
123
+
124
+ # Track which sources were used
125
+ sources_used = []
126
+ if ahlya_data:
127
+ sources_used.append("أهلية (CSV)")
128
+ if jort_data and jort_data.get("announcements"):
129
+ sources_used.append("الرائد الر��مي (JORT)")
130
+ if rne_data and (rne_data.get("capital_social") or rne_data.get("tax_id")):
131
+ sources_used.append("السجل الوطني (RNE)")
132
+
133
+ if not sources_used:
134
+ raise HTTPException(
135
+ status_code=422,
136
+ detail="لا توجد بيانات كافية لإجراء التحليل المتقاطع"
137
+ )
138
+
139
+ # ── 3. Build the payload for Gemini ───────────────────────────────────
140
+ ahlya_payload = ahlya_data or {"company_name": company_name, "wilaya": wilaya}
141
+ jort_payload = jort_data if jort_data.get("announcements") else {}
142
+ rne_payload = rne_data if rne_data.get("capital_social") or rne_data.get("tax_id") else {}
143
+
144
+ # Clean NaN/float values from ahlya DataFrame row
145
+ if ahlya_payload:
146
+ ahlya_payload = {
147
+ k: (None if (isinstance(v, float) and (v != v)) else v)
148
+ for k, v in ahlya_payload.items()
149
+ }
150
+
151
+ # ── 4. Call LLM Analysis ─────────────────────────────────────────────
152
+ logger.info(
153
+ f"🚀 Sending to Gemini: company='{company_name}', "
154
+ f"sources={sources_used}"
155
+ )
156
+
157
+ raw_analysis = await llm_service.analyze_cross_check(
158
+ ahlya_data=ahlya_payload,
159
+ jort_data=jort_payload,
160
+ rne_data=rne_payload,
161
+ )
162
+
163
+ # Parse into Pydantic model (validates schema)
164
+ analysis = LLMAnalysis(
165
+ match_score=raw_analysis.get("match_score", 0),
166
+ status=raw_analysis.get("status", "Pending"),
167
+ findings=raw_analysis.get("findings", []),
168
+ red_flags=raw_analysis.get("red_flags", []),
169
+ summary_ar=raw_analysis.get("summary_ar", ""),
170
+ )
171
+
172
+ # ── 5. Build response ────────────────────────────────────────────────
173
+ return InvestigationResult(
174
+ company_id=company_id,
175
+ company_name=company_name,
176
+ wilaya=wilaya,
177
+ analysis=analysis,
178
+ sources_used=sources_used,
179
+ analyzed_at=datetime.utcnow().isoformat(),
180
+ model_used="gemini-1.5-flash",
181
+ )
backend/app/api/v1/meta.py ADDED
@@ -0,0 +1,28 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from fastapi import APIRouter
2
+
3
+ router = APIRouter()
4
+
5
+ @router.get("/methodology")
6
+ def methodology():
7
+ return {
8
+ "title": "Methodology",
9
+ "description": "How we process data and compute metrics.",
10
+ "content_ar": """
11
+ تم استخراج البيانات من السجل الوطني للشركات الأهلية (alahlia.tn).
12
+
13
+ مؤشر 'بحث' (Ba7ath Index) هو مؤشر مركب يقيس ثلاث أبعاد رئيسية (0-100):
14
+ 1. الاعتماد على الموارد العمومية (40%): نسبة الشركات في قطاعات الفلاحة، المناجم، والبيئة.
15
+ 2. التركيز القطاعي (40%): مدى هيمنة قطاع واحد على اقتصاد الجهة.
16
+ 3. التوازن المحلي/الجهوي (20%): الفرق بين نسبة الشركات المحلية والجهوية.
17
+
18
+ صيغة الاحتساب: INDEX = 100 * (0.4 * s1 + 0.4 * s2 + 0.2 * s3)
19
+ """
20
+ }
21
+
22
+ @router.get("/sources")
23
+ def sources():
24
+ return [
25
+ {"name": "RNE", "url": "https://www.registre-entreprises.tn", "description_ar": "للتثبت من الوضعية القانونية للشركة."},
26
+ {"name": "JORT", "url": "http://www.iort.gov.tn", "description_ar": "للبحث عن النصوص التأسيسية."},
27
+ {"name": "INS", "url": "http://www.ins.tn", "description_ar": "للمقارنة مع الإحصائيات الرسمية."}
28
+ ]
backend/app/api/v1/risk.py ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from fastapi import APIRouter
2
+ from typing import List
3
+ from app.services.risk_engine import get_risk_for_wilaya, get_all_risks
4
+ from app.models.schemas import WilayaRisk
5
+
6
+ router = APIRouter()
7
+
8
+ @router.get("/wilayas", response_model=List[WilayaRisk])
9
+ def list_risks():
10
+ return get_all_risks()
11
+
12
+ @router.get("/wilayas/{name}", response_model=WilayaRisk)
13
+ def read_risk(name: str):
14
+ return get_risk_for_wilaya(name)
backend/app/api/v1/stats.py ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from fastapi import APIRouter
2
+ from app.services.aggregation import get_national_stats, get_wilaya_stats
3
+ from app.models.schemas import NationalStats, WilayaStats
4
+
5
+ router = APIRouter()
6
+
7
+ @router.get("/national", response_model=NationalStats)
8
+ def read_national_stats():
9
+ return get_national_stats()
10
+
11
+ @router.get("/wilayas/{name}", response_model=WilayaStats)
12
+ def read_wilaya_stats(name: str):
13
+ return get_wilaya_stats(name)
backend/app/data/companies.json ADDED
The diff for this file is too large to render. See raw diff
 
backend/app/data/stats.json ADDED
@@ -0,0 +1,45 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "total": 230,
3
+ "wilayas": {
4
+ "باجة": 25,
5
+ "سيدي بوزيد": 22,
6
+ "قفصة": 19,
7
+ "صفاقس": 18,
8
+ "القيروان": 14,
9
+ "زغوان": 11,
10
+ "مدنين": 11,
11
+ "القصرين": 10,
12
+ "سليانة": 10,
13
+ "قبلي": 10,
14
+ "نابل": 10,
15
+ "توزر": 9,
16
+ "جندوبة": 8,
17
+ "المهدية": 7,
18
+ "تطاوين": 7,
19
+ "المنستير": 6,
20
+ "الكاف": 5,
21
+ "بنزرت": 5,
22
+ "سوسة": 5,
23
+ "منوبة": 5,
24
+ "بن عروس": 4,
25
+ "تونس": 4,
26
+ "قابس": 4,
27
+ "أريانة": 1
28
+ },
29
+ "activites_top10": {
30
+ "فلاحة / صيد و الخدمات المتصلة بها": 71,
31
+ "زراعة": 21,
32
+ "تربية الحيوانات": 17,
33
+ "فلاحة/ صيد و الخدمات المتصلة بها": 15,
34
+ "خدمات ملحقة بالنقل": 11,
35
+ "أنشطة ترفيهية و ثقافية و رياضية": 8,
36
+ "حراجة / إستغلال الغابات": 6,
37
+ "أنشطة الخدمات الملحقة بالفلاحة بإستثناء الأنشطة البيطرية": 6,
38
+ "أنشطة ترفيهية": 6,
39
+ "التطهير وتنظيف الطرقات و التصرف في الفضلات": 5
40
+ },
41
+ "types": {
42
+ "محلية": 178,
43
+ "جهوية": 52
44
+ }
45
+ }
backend/app/database.py ADDED
@@ -0,0 +1,24 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from sqlalchemy import create_engine
2
+ from sqlalchemy.orm import sessionmaker, declarative_base
3
+
4
+ # SQLite database file path (relative to where the server runs)
5
+ SQLALCHEMY_DATABASE_URL = "sqlite:///./ba7ath_enriched.db"
6
+
7
+ # For SQLite with FastAPI, check_same_thread is required
8
+ engine = create_engine(
9
+ SQLALCHEMY_DATABASE_URL,
10
+ connect_args={"check_same_thread": False}
11
+ )
12
+
13
+ SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
14
+
15
+ Base = declarative_base()
16
+
17
+
18
+ def get_db():
19
+ """Dependency that provides a database session per request."""
20
+ db = SessionLocal()
21
+ try:
22
+ yield db
23
+ finally:
24
+ db.close()
backend/app/main.py ADDED
@@ -0,0 +1,91 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from dotenv import load_dotenv
2
+ import os
3
+
4
+ # Load environment variables as the very first step
5
+ load_dotenv()
6
+
7
+ from fastapi import FastAPI, Request, Depends
8
+ from fastapi.responses import JSONResponse
9
+ from starlette.middleware.cors import CORSMiddleware
10
+ from app.api.v1 import stats, companies, risk, meta
11
+ from app.api.v1 import investigate as investigate_api
12
+ from app.services.data_loader import load_data
13
+ from app.database import engine, Base
14
+ from app.models import enrichment_models, user_models
15
+ from app.api.v1 import auth
16
+ from app.services.auth_service import get_current_user
17
+
18
+ app = FastAPI(title="Ba7ath OSINT API", version="1.0.0")
19
+
20
+ # ── CORS ──────────────────────────────────────────────────────────────
21
+ # Starlette CORSMiddleware with allow_origins=["*"]
22
+ # NOTE: When allow_origins=["*"], allow_credentials MUST be False.
23
+ # The frontend sends the token in the Authorization header, NOT via cookies,
24
+ # so allow_credentials=False is perfectly fine.
25
+ app.add_middleware(
26
+ CORSMiddleware,
27
+ allow_origins=["*"],
28
+ allow_credentials=False,
29
+ allow_methods=["*"],
30
+ allow_headers=["*"],
31
+ )
32
+
33
+
34
+ # ── Startup ───────────────────────────────────────────────────────────
35
+ @app.on_event("startup")
36
+ async def startup_event():
37
+ print("=" * 60)
38
+ print(" Ba7ath OSINT API - VERSION CORS V4 (allow_origins=[*])")
39
+ print("=" * 60)
40
+ load_data()
41
+ Base.metadata.create_all(bind=engine)
42
+
43
+
44
+ # ── Routers ───────────────────────────────────────────────────────────
45
+ app.include_router(auth.router, prefix="/api/v1/auth", tags=["Auth"])
46
+
47
+ app.include_router(
48
+ stats.router,
49
+ prefix="/api/v1/stats",
50
+ tags=["Stats"],
51
+ dependencies=[Depends(get_current_user)],
52
+ )
53
+ app.include_router(
54
+ companies.router,
55
+ prefix="/api/v1/companies",
56
+ tags=["Companies"],
57
+ dependencies=[Depends(get_current_user)],
58
+ )
59
+
60
+ from app.api import enrichment
61
+
62
+ app.include_router(
63
+ risk.router,
64
+ prefix="/api/v1/risk",
65
+ tags=["Risk"],
66
+ dependencies=[Depends(get_current_user)],
67
+ )
68
+ app.include_router(
69
+ meta.router,
70
+ prefix="/api/v1/meta",
71
+ tags=["Meta"],
72
+ dependencies=[Depends(get_current_user)],
73
+ )
74
+ app.include_router(
75
+ enrichment.router,
76
+ prefix="/api/v1/enrichment",
77
+ tags=["Enrichment"],
78
+ dependencies=[Depends(get_current_user)],
79
+ )
80
+ app.include_router(
81
+ investigate_api.router,
82
+ prefix="/api/v1/investigate",
83
+ tags=["Investigation"],
84
+ dependencies=[Depends(get_current_user)],
85
+ )
86
+
87
+
88
+ @app.get("/")
89
+ def read_root():
90
+ return {"message": "Ba7ath OSINT API is running - VERSION CORS V4"}
91
+
backend/app/models/enrichment_models.py ADDED
@@ -0,0 +1,77 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from sqlalchemy import Column, String, Float, DateTime, Text, ForeignKey
2
+ from sqlalchemy.orm import relationship
3
+ from sqlalchemy.dialects.sqlite import JSON
4
+ from datetime import datetime
5
+ from app.database import Base
6
+
7
+
8
+ class EnrichedCompany(Base):
9
+ """SQLAlchemy model for enriched company profiles."""
10
+ __tablename__ = "enriched_companies"
11
+
12
+ company_id = Column(String, primary_key=True, index=True)
13
+ company_name = Column(String, index=True, nullable=False)
14
+ wilaya = Column(String, index=True, nullable=False)
15
+
16
+ # Full raw enrichment data (rne, jort, marches, notes) as JSON
17
+ data = Column(JSON, nullable=False)
18
+
19
+ # Computed metrics (total_contracts, total_contracts_value, ratio, red_flags) as JSON
20
+ metrics = Column(JSON, nullable=False)
21
+
22
+ enriched_by = Column(String, nullable=True, default="Journalist")
23
+ enriched_at = Column(DateTime, default=datetime.utcnow)
24
+
25
+ # Relationship to investigation notes
26
+ notes = relationship(
27
+ "InvestigationNote",
28
+ back_populates="company",
29
+ cascade="all, delete-orphan"
30
+ )
31
+
32
+
33
+ class WatchCompany(Base): # Using Base from database.py (SQLAlchemy), NOT Pydantic
34
+ __tablename__ = "watch_companies"
35
+
36
+ id = Column(String, primary_key=True, index=True)
37
+ name_ar = Column(String, index=True, nullable=False)
38
+ wilaya = Column(String, index=True, nullable=True)
39
+ delegation = Column(String, nullable=True)
40
+ activity = Column(String, nullable=True)
41
+ type = Column(String, nullable=True) # jihawiya / mahaliya
42
+ date_annonce = Column(String, nullable=True) # YYYY-MM-DD or raw text
43
+
44
+ # Status: 'watch', 'detected_trovit', 'detected_rne', 'archived'
45
+ etat_enregistrement = Column(String, nullable=False, default="watch", index=True)
46
+
47
+ # Auto-detection fields
48
+ detected_trovit_at = Column(DateTime, nullable=True)
49
+ detected_trovit_charika_id = Column(String, nullable=True)
50
+ detected_trovit_url = Column(String, nullable=True)
51
+
52
+ created_at = Column(DateTime, default=datetime.utcnow)
53
+ updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
54
+
55
+ class InvestigationNote(Base):
56
+ """SQLAlchemy model for investigation notes attached to a company dossier."""
57
+ __tablename__ = "investigation_notes"
58
+
59
+ id = Column(String, primary_key=True, index=True) # UUID as string
60
+ company_id = Column(
61
+ String,
62
+ ForeignKey("enriched_companies.company_id", ondelete="CASCADE"),
63
+ index=True,
64
+ nullable=False
65
+ )
66
+
67
+ title = Column(String, nullable=False)
68
+ content = Column(Text, nullable=False)
69
+ created_at = Column(DateTime, default=datetime.utcnow)
70
+ updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
71
+ created_by = Column(String, nullable=True, default="Unknown")
72
+
73
+ # Tags stored as JSON list of strings
74
+ tags = Column(JSON, nullable=True)
75
+
76
+ # Back-reference to company
77
+ company = relationship("EnrichedCompany", back_populates="notes")
backend/app/models/schemas.py ADDED
@@ -0,0 +1,74 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from pydantic import BaseModel
2
+ from typing import List, Optional, Dict, Any
3
+
4
+ class Company(BaseModel):
5
+ id: Optional[int] = None # Generated ID
6
+ name: str
7
+ wilaya: str
8
+ delegation: Optional[str] = None
9
+ locality: Optional[str] = None
10
+ type: str # محلية / جهوية
11
+ activity_raw: Optional[str] = None
12
+ activity_normalized: Optional[str] = None
13
+ activity_group: Optional[str] = None
14
+
15
+ # Status / Match info
16
+ match_status: Optional[str] = "not_matched" # matched | partial | none
17
+
18
+ # JORT Data
19
+ jort_ref: Optional[str] = None
20
+ jort_date: Optional[str] = None
21
+ jort_capital: Optional[float] = None
22
+ jort_text: Optional[str] = None
23
+
24
+ # RNE/Trovit Data
25
+ rne_id: Optional[str] = None
26
+ rne_tax_id: Optional[str] = None
27
+ rne_rc_number: Optional[str] = None
28
+ rne_founding_date: Optional[str] = None
29
+ rne_capital: Optional[float] = None
30
+ rne_legal_form: Optional[str] = None
31
+ rne_address: Optional[str] = None
32
+ rne_detail_url: Optional[str] = None
33
+
34
+ # Audit Flags
35
+ capital_divergence: Optional[bool] = False
36
+
37
+ class CompanyWithLinks(Company):
38
+ osint_links: Dict[str, str]
39
+
40
+ class WilayaStats(BaseModel):
41
+ wilaya: str
42
+ count: int
43
+ pct_national: float
44
+ rank: int
45
+ types: Dict[str, int]
46
+ top_groups: Dict[str, int]
47
+ top_activities: Dict[str, int]
48
+
49
+ class NationalStats(BaseModel):
50
+ total: int
51
+ wilayas: Dict[str, int]
52
+ types: Dict[str, int]
53
+ top_activities: Dict[str, int]
54
+ top_groups: Dict[str, int]
55
+
56
+ class Flag(BaseModel):
57
+ code: str
58
+ severity: str # "low", "medium", "high"
59
+ label_ar: str
60
+
61
+ class WilayaRisk(BaseModel):
62
+ wilaya: str
63
+ baath_index: float
64
+ s1: float # Dependency on resource sectors
65
+ s2: float # Concentration in one group
66
+ s3: float # Governance imbalance
67
+ flags: List[Flag]
68
+
69
+ # Editorial Enriched Fields
70
+ level: str # LOW | MEDIUM | HIGH
71
+ level_ar: str
72
+ color: str # emerald | amber | red
73
+ comment_ar: str
74
+ recommendations: List[str]
backend/app/models/user_models.py ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from sqlalchemy import Column, Integer, String, Boolean
2
+ from app.database import Base
3
+
4
+ class User(Base):
5
+ __tablename__ = "users"
6
+
7
+ id = Column(Integer, primary_key=True, index=True)
8
+ email = Column(String, unique=True, index=True)
9
+ hashed_password = Column(String)
10
+ full_name = Column(String, nullable=True)
11
+ is_active = Column(Boolean, default=True)
12
+ is_admin = Column(Boolean, default=False)
backend/app/schemas/auth_schemas.py ADDED
@@ -0,0 +1,28 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from pydantic import BaseModel, EmailStr
2
+ from typing import Optional
3
+
4
+ class UserBase(BaseModel):
5
+ email: EmailStr
6
+ full_name: Optional[str] = None
7
+ is_active: Optional[bool] = True
8
+ is_admin: Optional[bool] = False
9
+
10
+ class UserCreate(UserBase):
11
+ password: str
12
+
13
+ class UserRead(UserBase):
14
+ id: int
15
+
16
+ class Config:
17
+ from_attributes = True
18
+
19
+ class UserUpdate(BaseModel):
20
+ is_active: Optional[bool] = None
21
+ is_admin: Optional[bool] = None
22
+
23
+ class Token(BaseModel):
24
+ access_token: str
25
+ token_type: str
26
+
27
+ class TokenData(BaseModel):
28
+ username: Optional[str] = None
backend/app/services/aggregation.py ADDED
@@ -0,0 +1,71 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from app.services.data_loader import get_companies_df, get_stats_data
2
+ from app.models.schemas import NationalStats, WilayaStats
3
+
4
+ def _safe_value_counts(df, col, head=None):
5
+ """Safely get value_counts for a column, returning {} if column doesn't exist."""
6
+ if col not in df.columns:
7
+ return {}
8
+ vc = df[col].dropna().value_counts()
9
+ if head:
10
+ vc = vc.head(head)
11
+ return vc.to_dict()
12
+
13
+ def get_national_stats():
14
+ stats = get_stats_data()
15
+ df = get_companies_df()
16
+
17
+ total = stats.get("total", 0)
18
+ wilayas = stats.get("wilayas", {})
19
+ types = stats.get("types", {})
20
+
21
+ if not df.empty:
22
+ top_groups = _safe_value_counts(df, 'activity_group')
23
+ top_activities = _safe_value_counts(df, 'activity_normalized', head=10)
24
+ else:
25
+ top_groups = {}
26
+ top_activities = {}
27
+
28
+ return NationalStats(
29
+ total=total,
30
+ wilayas=wilayas,
31
+ types=types,
32
+ top_activities=top_activities,
33
+ top_groups=top_groups
34
+ )
35
+
36
+ def get_wilaya_stats(wilaya: str):
37
+ df = get_companies_df()
38
+ stats = get_stats_data()
39
+
40
+ if df.empty:
41
+ return None
42
+
43
+ wilaya_df = df[df['wilaya'] == wilaya]
44
+ count = len(wilaya_df)
45
+
46
+ total = stats.get("total", 1)
47
+ pct = round((count / total) * 100, 1)
48
+
49
+ # Rank
50
+ sorted_wilayas = sorted(stats.get("wilayas", {}).items(), key=lambda x: x[1], reverse=True)
51
+ rank = next((i for i, (w, c) in enumerate(sorted_wilayas, 1) if w == wilaya), 0)
52
+
53
+ if not wilaya_df.empty:
54
+ top_groups = _safe_value_counts(wilaya_df, 'activity_group')
55
+ top_activities = _safe_value_counts(wilaya_df, 'activity_normalized', head=10)
56
+ types = _safe_value_counts(wilaya_df, 'type')
57
+ else:
58
+ top_groups = {}
59
+ top_activities = {}
60
+ types = {}
61
+
62
+ return WilayaStats(
63
+ wilaya=wilaya,
64
+ count=count,
65
+ pct_national=pct,
66
+ rank=rank,
67
+ types=types,
68
+ top_groups=top_groups,
69
+ top_activities=top_activities
70
+ )
71
+
backend/app/services/auth_service.py ADDED
@@ -0,0 +1,74 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from datetime import datetime, timedelta
2
+ from typing import Optional
3
+ from jose import JWTError, jwt
4
+ from passlib.context import CryptContext
5
+ from fastapi import Depends, HTTPException, status
6
+ from fastapi.security import OAuth2PasswordBearer
7
+ from sqlalchemy.orm import Session
8
+ import os
9
+ from dotenv import load_dotenv
10
+
11
+ from app.database import get_db
12
+ from app.models.user_models import User
13
+ from app.schemas.auth_schemas import TokenData
14
+
15
+ load_dotenv()
16
+
17
+ # Config
18
+ SECRET_KEY = os.getenv("SECRET_KEY")
19
+ if not SECRET_KEY:
20
+ raise RuntimeError("SECRET_KEY environment variable is not set")
21
+ ALGORITHM = os.getenv("ALGORITHM", "HS256")
22
+ ACCESS_TOKEN_EXPIRE_MINUTES = int(os.getenv("ACCESS_TOKEN_EXPIRE_MINUTES", 30))
23
+
24
+ pwd_context = CryptContext(schemes=["argon2"], deprecated="auto")
25
+ oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/login")
26
+
27
+ def verify_password(plain_password, hashed_password):
28
+ return pwd_context.verify(plain_password, hashed_password)
29
+
30
+ def get_password_hash(password):
31
+ return pwd_context.hash(password)
32
+
33
+ def create_access_token(data: dict, expires_delta: Optional[timedelta] = None):
34
+ to_encode = data.copy()
35
+ if expires_delta:
36
+ expire = datetime.utcnow() + expires_delta
37
+ else:
38
+ expire = datetime.utcnow() + timedelta(minutes=15)
39
+ to_encode.update({"exp": expire})
40
+ encoded_jwt = jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)
41
+ return encoded_jwt
42
+
43
+ async def get_current_user(token: str = Depends(oauth2_scheme), db: Session = Depends(get_db)):
44
+ credentials_exception = HTTPException(
45
+ status_code=status.HTTP_401_UNAUTHORIZED,
46
+ detail="Could not validate credentials",
47
+ headers={"WWW-Authenticate": "Bearer"},
48
+ )
49
+ try:
50
+ payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
51
+ username: str = payload.get("sub")
52
+ if username is None:
53
+ raise credentials_exception
54
+ token_data = TokenData(username=username)
55
+ except JWTError:
56
+ raise credentials_exception
57
+
58
+ user = db.query(User).filter(User.email == token_data.username).first()
59
+ if user is None:
60
+ raise credentials_exception
61
+ return user
62
+
63
+ async def get_current_active_user(current_user: User = Depends(get_current_user)):
64
+ if not current_user.is_active:
65
+ raise HTTPException(status_code=400, detail="Inactive user")
66
+ return current_user
67
+
68
+ async def get_current_admin_user(current_user: User = Depends(get_current_active_user)):
69
+ if not current_user.is_admin:
70
+ raise HTTPException(
71
+ status_code=status.HTTP_403_FORBIDDEN,
72
+ detail="The user doesn't have enough privileges"
73
+ )
74
+ return current_user
backend/app/services/data_loader.py ADDED
@@ -0,0 +1,216 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import pandas as pd
2
+ import json
3
+ import os
4
+ import unicodedata
5
+ import re
6
+ from pathlib import Path
7
+ from dotenv import load_dotenv
8
+
9
+ # Load environment variables
10
+ load_dotenv()
11
+
12
# Build paths inside the project like this: BASE_DIR / 'subdir'.
# BASE_DIR resolves to .../backend/app (this module lives in app/services/).
BASE_DIR = Path(__file__).resolve().parent.parent
DATA_DIR = BASE_DIR / "data"

STATS_PATH = DATA_DIR / "stats.json"
COMPANIES_PATH = DATA_DIR / "companies.json"

# CSV paths, overridable via environment variables. Relative values are
# resolved against BASE_DIR.parent (the backend/ directory) inside
# DataLoader.load().
PATH_AHLYA_CSV = os.getenv("PATH_AHLYA_CSV", "Ahlya_Total_Feuil1.csv")
PATH_JORT_CSV = os.getenv("PATH_JORT_CSV", "app/scripts/Base-JORT.csv")
PATH_RNE_CSV = os.getenv("PATH_RNE_CSV", "trovit_charikat_ahliya_all.csv")
24
+
25
def normalize_company_name(name):
    """Build the canonical join key for a company name.

    Uppercase, accent-free (NFKD decomposition with combining marks
    dropped), single-spaced and stripped. Non-string input yields "".
    """
    if not isinstance(name, str):
        return ""

    # Uppercase first, then decompose so accents become combining marks.
    decomposed = unicodedata.normalize('NFKD', name.upper())
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    # Collapse whitespace runs and trim the ends.
    return re.sub(r'\s+', ' ', without_accents).strip()
48
+
49
class DataLoader:
    """Singleton holder for the merged Ahlya/JORT/RNE dataset.

    load() populates the two attributes read by the module-level accessors:
    - companies_df: pandas DataFrame (Ahlya base, left-joined with JORT/RNE)
    - stats_data: dict parsed from stats.json
    """
    _instance = None
    # None until load() has run; reset to empty values on load failure.
    companies_df = None
    stats_data = None

    def __new__(cls):
        # Classic singleton: every instantiation returns the same object.
        if cls._instance is None:
            cls._instance = super(DataLoader, cls).__new__(cls)
        return cls._instance

    def load(self):
        """Load stats and companies, merge JORT/RNE, flag capital divergence.

        Best-effort by design: any exception is printed and both attributes
        are reset to empty values so the application can still start.
        """
        print(f"Loading data from {DATA_DIR} and CSVs...")
        try:
            # 1. Stats JSON (optional).
            if not STATS_PATH.exists():
                print(f"Warning: Stats file not found at {STATS_PATH}")
                self.stats_data = {}
            else:
                with open(STATS_PATH, 'r', encoding='utf-8') as f:
                    self.stats_data = json.load(f)

            # 2. Base companies (Ahlya): CSV preferred, companies.json fallback.
            # Relative CSV paths resolve against backend/ (BASE_DIR.parent).
            ahlya_path = Path(PATH_AHLYA_CSV)
            if not ahlya_path.is_absolute():
                ahlya_path = BASE_DIR.parent / ahlya_path

            if ahlya_path.exists():
                print(f"Loading Ahlya CSV from {ahlya_path}")
                self.companies_df = pd.read_csv(ahlya_path)
                # Map Arabic/French source headers to canonical column names.
                self.companies_df.rename(columns={
                    "اسم_الشركة": "name",
                    "الولاية": "wilaya",
                    "المعتمدية": "delegation",
                    "المنطقة": "locality",
                    "النوع": "type",
                    "الموضوع / النشاط": "activity_raw",
                    "activité_normalisée": "activity_normalized",
                    "activité_groupe": "activity_group"
                }, inplace=True)
                # Ensure critical columns exist even if the CSV is missing them.
                if 'activity_normalized' not in self.companies_df.columns:
                    self.companies_df['activity_normalized'] = self.companies_df.get('activity_raw', pd.Series(dtype=str))
                if 'activity_group' not in self.companies_df.columns:
                    # Derive from activity_normalized if available.
                    self.companies_df['activity_group'] = self.companies_df.get('activity_normalized', pd.Series(dtype=str))
                print(f" -> Loaded {len(self.companies_df)} companies. Columns: {list(self.companies_df.columns)}")

            elif COMPANIES_PATH.exists():
                print(f"Loading Ahlya from companies.json as fallback")
                with open(COMPANIES_PATH, 'r', encoding='utf-8') as f:
                    companies = json.load(f)
                self.companies_df = pd.DataFrame(companies)
                # Same header mapping as the CSV branch above.
                self.companies_df.rename(columns={
                    "اسم_الشركة": "name",
                    "الولاية": "wilaya",
                    "المعتمدية": "delegation",
                    "المنطقة": "locality",
                    "النوع": "type",
                    "الموضوع / النشاط": "activity_raw",
                    "activité_normalisée": "activity_normalized",
                    "activité_groupe": "activity_group"
                }, inplace=True)
                if 'activity_normalized' not in self.companies_df.columns:
                    self.companies_df['activity_normalized'] = self.companies_df.get('activity_raw', pd.Series(dtype=str))
                if 'activity_group' not in self.companies_df.columns:
                    self.companies_df['activity_group'] = self.companies_df.get('activity_normalized', pd.Series(dtype=str))

            else:
                print("Warning: No Ahlya data found!")
                self.companies_df = pd.DataFrame()

            if not self.companies_df.empty:
                # Join key shared with the JORT/RNE merges below.
                self.companies_df['name_normalized'] = self.companies_df['name'].apply(normalize_company_name)
                # Stable sequential id used by the API layer.
                self.companies_df['id'] = range(1, len(self.companies_df) + 1)

            # 3. JORT gazette data (left join on the normalized name).
            jort_path = Path(PATH_JORT_CSV)
            if not jort_path.is_absolute():
                jort_path = BASE_DIR.parent / jort_path

            if jort_path.exists():
                print(f"Integrating JORT from {jort_path}")
                jort_df = pd.read_csv(jort_path)
                if 'Dénomination' in jort_df.columns:
                    jort_df['name_normalized'] = jort_df['Dénomination'].apply(normalize_company_name)
                    # Keep only the merge columns, renamed with a jort_ prefix.
                    jort_subset = jort_df[['name_normalized', 'Référence JORT', 'Date Annonce', 'Capital (DT)', 'Texte Source Original']].copy()
                    jort_subset.rename(columns={
                        'Référence JORT': 'jort_ref',
                        'Date Annonce': 'jort_date',
                        'Capital (DT)': 'jort_capital',
                        'Texte Source Original': 'jort_text'
                    }, inplace=True)
                    self.companies_df = pd.merge(self.companies_df, jort_subset, on='name_normalized', how='left')

            # 4. RNE registry data (same left-join pattern, rne_ prefix).
            rne_path = Path(PATH_RNE_CSV)
            if not rne_path.is_absolute():
                rne_path = BASE_DIR.parent / rne_path

            if rne_path.exists():
                print(f"Integrating RNE from {rne_path}")
                rne_df = pd.read_csv(rne_path)
                if 'name' in rne_df.columns:
                    rne_df['name_normalized'] = rne_df['name'].apply(normalize_company_name)
                    rne_subset = rne_df[['name_normalized', 'charika_id', 'tax_id', 'rc_number', 'founding_date_iso', 'legal_form', 'address', 'detail_url', 'capital']].copy()
                    rne_subset.rename(columns={
                        'charika_id': 'rne_id',
                        'tax_id': 'rne_tax_id',
                        'rc_number': 'rne_rc_number',
                        'founding_date_iso': 'rne_founding_date',
                        'legal_form': 'rne_legal_form',
                        'address': 'rne_address',
                        'detail_url': 'rne_detail_url',
                        'capital': 'rne_capital'
                    }, inplace=True)
                    self.companies_df = pd.merge(self.companies_df, rne_subset, on='name_normalized', how='left')

            # 5. Capital divergence: flag rows where JORT vs RNE capital differ
            # by more than the configured relative threshold (default 5%).
            threshold = float(os.getenv("CAPITAL_DIVERGENCE_THRESHOLD", 0.05))
            self.companies_df['capital_divergence'] = False

            if 'jort_capital' in self.companies_df.columns and 'rne_capital' in self.companies_df.columns:
                # Coerce to numeric; unparseable values become NaN and are skipped.
                self.companies_df['jort_capital'] = pd.to_numeric(self.companies_df['jort_capital'], errors='coerce')
                self.companies_df['rne_capital'] = pd.to_numeric(self.companies_df['rne_capital'], errors='coerce')

                mask = (self.companies_df['jort_capital'].notna()) & (self.companies_df['rne_capital'].notna()) & (self.companies_df['jort_capital'] > 0)
                diff = abs(self.companies_df.loc[mask, 'jort_capital'] - self.companies_df.loc[mask, 'rne_capital']) / self.companies_df.loc[mask, 'jort_capital']
                self.companies_df.loc[mask, 'capital_divergence'] = diff > threshold
            else:
                # Create empty columns so downstream schema expectations hold.
                if 'jort_capital' not in self.companies_df.columns: self.companies_df['jort_capital'] = pd.NA
                if 'rne_capital' not in self.companies_df.columns: self.companies_df['rne_capital'] = pd.NA
                if 'jort_ref' not in self.companies_df.columns: self.companies_df['jort_ref'] = pd.NA
                if 'jort_date' not in self.companies_df.columns: self.companies_df['jort_date'] = pd.NA
                if 'jort_text' not in self.companies_df.columns: self.companies_df['jort_text'] = pd.NA
                if 'rne_id' not in self.companies_df.columns: self.companies_df['rne_id'] = pd.NA
                if 'rne_tax_id' not in self.companies_df.columns: self.companies_df['rne_tax_id'] = pd.NA
                if 'rne_rc_number' not in self.companies_df.columns: self.companies_df['rne_rc_number'] = pd.NA
                if 'rne_founding_date' not in self.companies_df.columns: self.companies_df['rne_founding_date'] = pd.NA
                if 'rne_legal_form' not in self.companies_df.columns: self.companies_df['rne_legal_form'] = pd.NA
                if 'rne_address' not in self.companies_df.columns: self.companies_df['rne_address'] = pd.NA
                if 'rne_detail_url' not in self.companies_df.columns: self.companies_df['rne_detail_url'] = pd.NA

        except Exception as e:
            # Swallow-and-log by design: a data problem must not kill startup.
            print(f"Error loading combined data: {e}")
            import traceback
            traceback.print_exc()
            self.companies_df = pd.DataFrame()
            self.stats_data = {}
206
+
207
# Module-level singleton; DataLoader.__new__ guarantees a single instance.
data_loader = DataLoader()

def load_data():
    """Populate the shared DataLoader (intended to run at app startup)."""
    data_loader.load()

def get_companies_df():
    """Merged companies DataFrame (None until load_data() has run)."""
    return data_loader.companies_df

def get_stats_data():
    """Pre-aggregated stats dict (None until load_data() has run)."""
    return data_loader.stats_data
backend/app/services/llm_service.py ADDED
@@ -0,0 +1,201 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Ba7ath LLM Analysis Service
3
+ ============================
4
+ Service d'analyse croisée des données Ahlya/JORT/RNE via Google Gemini.
5
+
6
+ Ce module utilise l'API REST Gemini DIRECTEMENT via httpx (pas le SDK
7
+ google-generativeai) pour forcer l'utilisation de l'endpoint v1 stable
8
+ et éviter le routage automatique vers v1beta qui provoque des erreurs
9
+ 404 sur Render et autres plateformes cloud.
10
+ """
11
+
12
+ import os
13
+ import json
14
+ import logging
15
+ from datetime import datetime
16
+
17
+ import httpx
18
+
19
+ # Configuration du logging spécifique au module Ba7ath
20
+ logger = logging.getLogger("ba7ath.llm")
21
+ logger.setLevel(logging.INFO)
22
+
23
+ # ── Constants ─────────────────────────────────────────────────────────────
24
+
25
+ GEMINI_API_BASE = "https://generativelanguage.googleapis.com/v1beta"
26
+ GEMINI_MODEL = "gemini-2.0-flash"
27
+ GEMINI_ENDPOINT = f"{GEMINI_API_BASE}/models/{GEMINI_MODEL}:generateContent"
28
+
29
+ # ── System Prompt (Expert Investigation) ──────────────────────────────────
30
+
31
+ SYSTEM_PROMPT = """أنت خبير تدقيق محقق في مشروع 'بحث' (Ba7ath). مهمتك هي مقارنة البيانات بدقة متناهية.
32
+
33
+ السياق القانوني:
34
+ - "شركة أهلية" (Entreprise Citoyenne) هي كيان قانوني أُنشئ بموجب القانون عدد 20 لسنة 2022.
35
+ - "الرائد الرسمي للجمهورية التونسية" (JORT) هو المنشور الرسمي الذي يتم فيه الإعلان عن تأسيس الشركات.
36
+ - "السجل الوطني للمؤسسات" (RNE) هو قاعدة البيانات الإدارية الرسمية.
37
+ - "المعرّف الجبائي" (Matricule Fiscal) هو رقم التعريف الضريبي.
38
+ - "الولاية" (Gouvernorat) هي الوحدة الإدارية في تونس (24 ولاية).
39
+
40
+ قواعد صارمة:
41
+ 1. لا تستنتج معلومات غير موجودة في البيانات المقدمة.
42
+ 2. إذا وجد اختلاف بين المصادر، صنفه كـ 'تضارب' (Conflict).
43
+ 3. اللغة المستخدمة في الإجابة هي العربية الرصينة (MSA).
44
+ 4. يجب أن يكون ملخص التحقيق (summary_ar) مهنيًا، مباشرًا، ومبنيًا فقط على الأدلة المقدمة.
45
+ 5. لا تضف نصوصًا تفسيرية خارج هيكل JSON المطلوب."""
46
+
47
+ # ── Fallback response ────────────────────────────────────────────────────
48
+
49
+ def _fallback_response(error_type: str, detail: str = "") -> dict:
50
+ """Génère une réponse JSON de secours en cas d'indisponibilité du LLM."""
51
+ return {
52
+ "match_score": 0,
53
+ "status": "Pending",
54
+ "findings": [],
55
+ "red_flags": [],
56
+ "summary_ar": f"تعذّر إجراء التحليل: {error_type}. {detail}".strip(),
57
+ "_error": error_type,
58
+ "_detail": detail,
59
+ }
60
+
61
+ # ══════════════════════════════════════════════════════════════════════════
62
+ # ██ LLM ANALYSIS SERVICE (Direct REST API — no SDK)
63
+ # ══════════════════════════════════════════════════════════════════════════
64
+
65
class LLMAnalysisService:
    """
    Cross-check analysis service calling the Gemini REST API directly
    with httpx, bypassing the google-generativeai SDK so the endpoint is
    pinned explicitly (see GEMINI_ENDPOINT). Generation is configured for
    determinism (temperature=0, topP=1, topK=1).
    """

    def __init__(self):
        # Key read once at construction; without it every call falls back.
        self.api_key = os.getenv("GEMINI_API_KEY")
        if not self.api_key:
            logger.warning("⚠️ GEMINI_API_KEY not set — LLM analysis will be unavailable")
        else:
            logger.info(f"✅ LLMAnalysisService initialized — model: {GEMINI_MODEL} (REST API direct)")

    @staticmethod
    def _build_prompt(ahlya_data: dict, jort_data: dict, rne_data: dict) -> str:
        """Build the Arabic comparison prompt embedding the three sources as JSON."""

        def fmt(data):
            # Pretty-print as JSON; Arabic "no data" placeholder when absent.
            return json.dumps(data, ensure_ascii=False, indent=2) if data else "لا توجد بيانات"

        return f"""قم بإجراء مقارنة شاملة ودقيقة بين المصادر الثلاثة التالية لهذه الشركة الأهلية التونسية.

═══════════════════════════════════════
المصدر الأول: بيانات أهلية (البيانات التصريحية)
═══════════════════════════════════════
{fmt(ahlya_data)}

═══════════════════════════════════════
المصدر الثاني: الرائد الرسمي (JORT)
═══════════════════════════════════════
{fmt(jort_data)}

═══════════════════════════════════════
المصدر الثالث: السجل الوطني للمؤسسات (RNE)
═══════════════════════════════════════
{fmt(rne_data)}

═══════════════════════════════════════
التعليمات:
═══════════════════════════════════════
1. قارن الاسم التجاري، رأس المال، والولاية.
2. تحقق من تطابق التواريخ والمعرّف الجبائي.
3. حدد أي تضاربات (Conflicts) أو نقاط مشبوهة.
4. أجب بصيغة JSON فقط وفق المخطط التالي بالضبط:

{{
  "match_score": <عدد صحيح من 0 إلى 100>,
  "status": "Verified" أو "Suspicious" أو "Conflict",
  "findings": ["نقطة تطابق 1", "نقطة تطابق 2"],
  "red_flags": ["تجاوز 1", "تجاوز 2"],
  "summary_ar": "ملخص التحقيق هنا"
}}"""

    async def analyze_cross_check(self, ahlya_data: dict, jort_data: dict, rne_data: dict) -> dict:
        """Run the three-way cross-check through the Gemini REST endpoint.

        Always returns a dict: either the model's JSON verdict
        (match_score/status/findings/red_flags/summary_ar) or a
        _fallback_response(...) payload describing the failure.
        """

        company_name = ahlya_data.get("name", "Unknown")

        if not self.api_key:
            logger.error(f"LLM analysis skipped for '{company_name}': no API key")
            return _fallback_response("no_api_key", "GEMINI_API_KEY غير مُعَيَّن")

        logger.info(f"🔍 Starting LLM cross-check for: {company_name}")
        start_time = datetime.now()
        prompt = self._build_prompt(ahlya_data, jort_data, rne_data)

        # Build the REST request body (responseMimeType forces JSON output).
        request_body = {
            "system_instruction": {
                "parts": [{"text": SYSTEM_PROMPT}]
            },
            "contents": [
                {
                    "parts": [{"text": prompt}]
                }
            ],
            "generationConfig": {
                "temperature": 0.0,
                "topP": 1,
                "topK": 1,
                "responseMimeType": "application/json"
            }
        }

        url = f"{GEMINI_ENDPOINT}?key={self.api_key}"

        try:
            async with httpx.AsyncClient(timeout=60.0) as client:
                response = await client.post(
                    url,
                    json=request_body,
                    headers={"Content-Type": "application/json"}
                )

                # Handle HTTP-level errors before trying to parse.
                if response.status_code == 429:
                    logger.warning(f"⚠️ Rate-limit Gemini (429) for '{company_name}'")
                    return _fallback_response("rate_limited", "الخدمة مشغولة حاليًا.")

                if response.status_code != 200:
                    error_detail = response.text[:300]
                    logger.error(f"❌ Gemini API {response.status_code} for '{company_name}': {error_detail}")
                    return _fallback_response(f"http_{response.status_code}", error_detail)

                # The candidate text is itself expected to be a JSON document.
                resp_json = response.json()
                candidates = resp_json.get("candidates", [])
                if not candidates:
                    logger.error(f"❌ No candidates in Gemini response for '{company_name}'")
                    return _fallback_response("no_candidates", "لم يتم الحصول على نتائج من النموذج.")

                text = candidates[0].get("content", {}).get("parts", [{}])[0].get("text", "")
                result = json.loads(text)

                elapsed = (datetime.now() - start_time).total_seconds()
                logger.info(
                    f"✅ Analysis complete for '{company_name}' — "
                    f"score={result.get('match_score')}, status={result.get('status')}, "
                    f"time={elapsed:.1f}s"
                )
                return result

        except json.JSONDecodeError as e:
            logger.error(f"❌ JSONDecodeError for '{company_name}': {e}")
            return _fallback_response("json_parse_error", "تعذّر تحليل استجابة النموذج.")

        except httpx.TimeoutException:
            logger.error(f"❌ Timeout for '{company_name}' (60s limit)")
            return _fallback_response("timeout", "انتهت مهلة الاتصال بالنموذج.")

        except Exception as e:
            logger.error(f"❌ Unexpected error for '{company_name}': {e}")
            return _fallback_response("unexpected_error", str(e))
199
+
200
# Module-level singleton shared by the API layer.
llm_service = LLMAnalysisService()
backend/app/services/osint_links.py ADDED
@@ -0,0 +1,32 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import urllib.parse
2
+ import os
3
+ from dotenv import load_dotenv
4
+
5
+ load_dotenv()
6
+
7
# When enabled, also expose internal registry pivot links (placeholders).
INTERNAL_OSINT_MODE = os.getenv("INTERNAL_OSINT_MODE", "False").lower() == "true"

def generate_links(company_name: str, wilaya: str):
    """Build OSINT pivot URLs for a company.

    The Google query combines the name, the wilaya and a site: filter, so
    the whole query string is percent-encoded. (The previous version left
    raw spaces and the un-encoded wilaya in the URL, producing an invalid
    query string.)
    """
    name_q = urllib.parse.quote(company_name)
    google_q = urllib.parse.quote(f"{company_name} {wilaya} site:tn")

    links = {
        "Google": f"https://www.google.com/search?q={google_q}",
        "Facebook": f"https://www.facebook.com/search/top?q={name_q}"
    }

    if INTERNAL_OSINT_MODE:
        links["RNE"] = f"https://www.registre-entreprises.tn/search?q={name_q}"  # Placeholder
        links["JORT"] = f"http://www.iort.gov.tn/search?q={name_q}"  # Placeholder

    return links
22
+
23
def get_company_links(company_id: int):
    """Return OSINT links for the company with internal *company_id*.

    Returns an empty dict when the id is unknown or data is not loaded.
    """
    # Local import avoids a circular dependency with data_loader at import time.
    from app.services.data_loader import get_companies_df
    df = get_companies_df()

    company = df[df['id'] == company_id]
    if company.empty:
        return {}

    row = company.iloc[0]
    return generate_links(row['name'], row['wilaya'])
backend/app/services/risk_engine.py ADDED
@@ -0,0 +1,168 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from app.services.data_loader import get_companies_df
2
+ from app.models.schemas import WilayaRisk, Flag
3
+ import numpy as np
4
+
5
def generate_risk_commentary(wilaya_data: dict, risk_scores: dict) -> dict:
    """Produce the Arabic editorial layer for one wilaya.

    Maps the three sub-scores (s1 dependency, s2 concentration, s3
    governance) and the composite index to a risk tier, a commentary
    string and a list of investigation recommendations.
    """
    s1 = risk_scores['s1']
    s2 = risk_scores['s2']
    s3 = risk_scores['s3']
    index = risk_scores['baath_index']

    # Risk tier derived from the composite index.
    if index >= 70:
        level, level_ar, color = "HIGH", "مرتفع", "red"
    elif index >= 40:
        level, level_ar, color = "MEDIUM", "متوسط", "amber"
    else:
        level, level_ar, color = "LOW", "منخفض", "emerald"

    comments = []
    group_counts = wilaya_data['groups']
    group_total = sum(group_counts.values()) or 1

    # s1 — dependency on public-resource sectors (named only when one of the
    # resource groups alone exceeds 30% of the wilaya's companies).
    if s1 > 0.6:
        public_resource = ('AGRI_NATUREL', 'ENVIRONNEMENT', 'ENERGIE_MINES')
        dominant_groups = [
            g for g, count in group_counts.items()
            if g in public_resource and count / group_total > 0.3
        ]
        if dominant_groups:
            comments.append(f"الولاية تعتمد بشكل كبير على الأنشطة المرتبطة بالموارد العمومية ({', '.join(dominant_groups)})")

    # s2 — sectoral concentration.
    if s2 > 0.7:
        if group_counts:
            top_group, top_count = max(group_counts.items(), key=lambda kv: kv[1])
            pct = top_count / group_total * 100
            comments.append(f"تركيز عالٍ جدا في مجموعة نشاط واحدة ({top_group}: {pct:.0f}%)")
    elif s2 > 0.5:
        comments.append("تركيز ملحوظ في عدد محدود من القطاعات")

    # s3 — governance imbalance between local and regional companies.
    if s3 > 0.5:
        total_types = sum(wilaya_data['types'].values()) or 1
        local_pct = wilaya_data['types'].get('محلية', 0) / total_types * 100
        regional_pct = wilaya_data['types'].get('جهوية', 0) / total_types * 100
        comments.append(f"اختلال واضح في الحوكمة: {local_pct:.0f}% محلية مقابل {regional_pct:.0f}% جهوية")

    # Actionable follow-ups for investigators, driven by the same thresholds.
    recommendations = []
    if s1 > 0.6:
        recommendations.append("التحقق من الأراضي الدولية المُسندة (OTD)")
        recommendations.append("البحث في صفقات التطهير والبيئة (TUNEPS)")
    if s2 > 0.7:
        recommendations.append("تحليل الاحتكارات القطاعية المحتملة")
    if s3 > 0.5:
        recommendations.append("مراجعة التوازن بين المحلي والجهوي في تركيبة مجالس الإدارة")
    if index > 70:
        recommendations.append("يُنصح بتحقيق صحفي معمق على هذه الولاية")

    return {
        "level": level,
        "level_ar": level_ar,
        "color": color,
        "comment_ar": " · ".join(comments) if comments else "لا توجد إشارات خطر واضحة في البيانات الحالية",
        "recommendations": recommendations
    }
72
+
73
def compute_baath_index_v2(wilaya_df):
    """
    Computes Ba7ath Index (0-100) using continuous formula:
    INDEX = 100 * (0.4 * s1 + 0.4 * s2 + 0.2 * s3)

    s1: Dependency on public-resource sectors (AGRI, ENV, MINES)
    s2: Sector concentration (Max share of any group)
    s3: Governance imbalance (abs(local - regional))

    Returns (baath_index, s1, s2, s3, flags, details) — always a 6-tuple.
    """
    # BUGFIX: the empty case previously returned a 5-tuple (no `details`),
    # which broke callers unpacking six values.
    if wilaya_df.empty:
        return 0.0, 0.0, 0.0, 0.0, [], {'groups': {}, 'types': {}}

    total = len(wilaya_df)
    flags = []

    # --- s1: share of companies in public-resource sectors ---
    resource_groups = ['AGRI_NATUREL', 'ENVIRONNEMENT', 'ENERGIE_MINES']
    resource_count = wilaya_df[wilaya_df['activity_group'].isin(resource_groups)].shape[0]
    s1 = resource_count / total if total > 0 else 0.0

    if s1 > 0.6:
        flags.append(Flag(code="RESOURCE_DEPENDENT", severity="high", label_ar="اعتماد كبير على الأنشطة المرتبطة بالموارد العمومية"))

    # --- s2: max share held by any single activity group ---
    group_counts = wilaya_df['activity_group'].value_counts(normalize=True)
    s2 = group_counts.max() if not group_counts.empty else 0.0

    if s2 > 0.7:
        flags.append(Flag(code="ULTRA_CONCENTRATION", severity="medium", label_ar="تركيز عالٍ في مجموعة نشاط واحدة"))

    # --- s3: |share(local) - share(regional)| company types ---
    type_counts = wilaya_df['type'].value_counts(normalize=True)
    pct_local = type_counts.get('محلية', 0.0)
    pct_regional = type_counts.get('جهوية', 0.0)
    s3 = abs(pct_local - pct_regional)

    if s3 > 0.5:
        flags.append(Flag(code="GOVERNANCE_IMBALANCE", severity="low", label_ar="اختلال واضح بين الشركات المحلية والجهوية"))

    # --- Composite score, capped at 100 ---
    raw_index = 100 * (0.4 * s1 + 0.4 * s2 + 0.2 * s3)
    baath_index = round(min(raw_index, 100), 1)

    # Raw distributions, consumed by generate_risk_commentary().
    details = {
        'groups': wilaya_df['activity_group'].value_counts().to_dict(),
        'types': wilaya_df['type'].value_counts().to_dict()
    }

    return baath_index, round(s1, 2), round(s2, 2), round(s3, 2), flags, details
127
+
128
def get_risk_for_wilaya(wilaya: str):
    """Compute the full WilayaRisk payload (scores + Arabic editorial) for one wilaya.

    Returns None when no data at all is loaded, and a neutral LOW-risk
    payload when the wilaya has no companies.
    """
    # NOTE(review): get_companies_df() may return None before load_data()
    # has run, in which case .empty would raise — confirm startup ordering.
    df = get_companies_df()
    if df.empty:
        return None

    wilaya_df = df[df['wilaya'] == wilaya]
    if wilaya_df.empty:
        # Neutral risk when the wilaya has no companies in the dataset.
        return WilayaRisk(
            wilaya=wilaya, baath_index=0, s1=0, s2=0, s3=0, flags=[],
            level="LOW", level_ar="منخفض", color="emerald",
            comment_ar="لا توجد بيانات كافية", recommendations=[]
        )

    score, s1, s2, s3, flags, details = compute_baath_index_v2(wilaya_df)

    # Editorial layer: level/level_ar/color/comment_ar/recommendations.
    editorial = generate_risk_commentary(details, {
        's1': s1, 's2': s2, 's3': s3, 'baath_index': score
    })

    return WilayaRisk(
        wilaya=wilaya,
        baath_index=score,
        s1=s1,
        s2=s2,
        s3=s3,
        flags=flags,
        **editorial
    )
158
+
159
def get_all_risks():
    """Score every wilaya present in the dataset, highest Ba7ath index first."""
    df = get_companies_df()
    if df.empty:
        return []

    scored = [get_risk_for_wilaya(w) for w in df['wilaya'].unique()]
    scored.sort(key=lambda r: r.baath_index, reverse=True)
    return scored
backend/compare_by_name_fuzzy.py ADDED
@@ -0,0 +1,162 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import pandas as pd
2
+ from pathlib import Path
3
+ from rapidfuzz import process, fuzz
4
+
5
+ # ------------- CONFIG À ADAPTER --------------
6
+
7
+ # CSV A : ta "liste politique / terrain"
8
+ CSV_A = Path("liste_270.csv") # ex : Google Sheet complet
9
+
10
+ # CSV B : la liste des stés qui ont un RNE (ex. Trovit / base enrichie)
11
+ CSV_B = Path("liste_rne_ou_trovit.csv")
12
+
13
+ # Nom des colonnes contenant les NOMS à comparer
14
+ # A peut être en arabe, B en français, ou l'inverse.
15
+ # Idéalement, tu rajoutes dans chaque CSV une colonne 'name_canon'
16
+ # (normalisée/ traduite avec Qwen) et tu mets ces noms ici.
17
+ COL_NAME_A = "name" # ex : "Nom société (FR)" ou "الاسم"
18
+ COL_NAME_B = "name" # ex : nom Trovit en arabe
19
+
20
+ # (Optionnel) colonnes de contexte à garder pour l'analyse
21
+ CTX_COLS_A = ["wilaya", "delegation"] # adapte à ton fichier
22
+ CTX_COLS_B = ["wilaya", "delegation"] # idem
23
+
24
+ # Seuils fuzzy
25
+ # score >= HIGH_MATCH -> match sûr
26
+ # LOW_MATCH <= score < HIGH_MATCH -> match douteux (à vérifier à la main / par LLM)
27
+ # score < LOW_MATCH -> considéré comme "non trouvé"
28
+ HIGH_MATCH = 90
29
+ LOW_MATCH = 70
30
+
31
+ # Fichiers de sortie
32
+ OUT_MATCHES = Path("matches_surs.csv")
33
+ OUT_MAYBE = Path("matches_douteux.csv")
34
+ OUT_MISSING = Path("non_trouves_par_nom.csv")
35
+
36
+ # Encodage (UTF‑8 avec BOM fonctionne bien pour arabe + Excel)
37
+ ENC_A = "utf-8-sig"
38
+ ENC_B = "utf-8-sig"
39
+
40
+ # ------------- FONCTIONS ---------------------
41
+
42
+
43
def normalize_name(s: str) -> str:
    """Light cleanup so names from both files can be fuzzy-compared.

    Lowercases, collapses whitespace and strips generic legal/corporate
    words (FR and AR). Terms are removed only as whole space-delimited
    tokens/phrases: the previous version used plain substring replacement,
    which mangled real names (e.g. "poste" -> "po" because of "ste",
    "salim" -> "lim" because of "sa").
    """
    if pd.isna(s):
        return ""
    s = str(s).strip().lower()

    generic_terms = [
        # French legal forms / fillers
        "société anonyme", "société à responsabilité limitée",
        "societe", "société", "ste", "sa", "sarl",
        # Arabic equivalents
        "شركة أهلية", "شركة الاهلية", "شركة الأهلية",
        "شركة", "الشركة", "الاهلية", "الأهلية", "الجهوية", "المحلية",
    ]
    # Longest first, so multi-word phrases are removed before their parts.
    generic_terms.sort(key=len, reverse=True)

    # Pad with spaces so only complete tokens/phrases can match.
    padded = f" {' '.join(s.split())} "
    for term in generic_terms:
        needle = f" {term} "
        while needle in padded:
            padded = padded.replace(needle, " ")

    return padded.strip()
67
+
68
+
69
def load_csv(path: Path, name_col: str, ctx_cols: list, enc: str) -> pd.DataFrame:
    """Load one CSV and prepare it for fuzzy matching.

    Adds two working columns: __name_raw__ (the original value of
    *name_col*) and __name_norm__ (normalize_name applied), then keeps
    only those plus whichever *ctx_cols* actually exist in the file.

    Raises FileNotFoundError / KeyError with actionable messages.
    """
    if not path.exists():
        raise FileNotFoundError(path.resolve())
    df = pd.read_csv(path, encoding=enc)
    if name_col not in df.columns:
        raise KeyError(f"Colonne '{name_col}' absente dans {path.name}.\n"
                       f"Colonnes dispo : {list(df.columns)}")
    df["__name_raw__"] = df[name_col]
    df["__name_norm__"] = df[name_col].apply(normalize_name)

    # Keep the name columns plus any requested context columns present.
    keep_cols = ["__name_raw__", "__name_norm__"]
    for c in ctx_cols:
        if c in df.columns:
            keep_cols.append(c)
    return df[keep_cols].copy()
85
+
86
+
87
def main():
    """End-to-end fuzzy reconciliation: load both CSVs, best-match every
    row of A against B with RapidFuzz, and export sure / doubtful /
    not-found buckets as three CSV files."""
    # 1. Load both CSV files.
    df_a = load_csv(CSV_A, COL_NAME_A, CTX_COLS_A, ENC_A)
    df_b = load_csv(CSV_B, COL_NAME_B, CTX_COLS_B, ENC_B)

    print(f"[INFO] Lignes fichier A : {len(df_a)}")
    print(f"[INFO] Lignes fichier B : {len(df_b)}")

    # 2. Candidate pool for RapidFuzz: normalized names of file B.
    names_b = df_b["__name_norm__"].tolist()

    best_matches = []
    for idx, row in df_a.iterrows():
        name_a_norm = row["__name_norm__"]

        # Empty normalized name cannot match anything.
        if not name_a_norm:
            best_matches.append({"score": 0, "b_index": None})
            continue

        # RapidFuzz: extractOne returns (label, score, index) or None.
        match = process.extractOne(
            name_a_norm,
            names_b,
            scorer=fuzz.token_sort_ratio,
        )
        if match is None:
            best_matches.append({"score": 0, "b_index": None})
        else:
            label_b, score, b_idx = match
            best_matches.append({"score": score, "b_index": b_idx})

    # 3. Build the result DataFrame (file A rows + match metadata).
    res = df_a.copy()
    res["match_score"] = [m["score"] for m in best_matches]
    res["b_index"] = [m["b_index"] for m in best_matches]

    # Join the matched B-side names back in.
    res["name_b_raw"] = res["b_index"].apply(
        lambda i: df_b.loc[i, "__name_raw__"] if pd.notna(i) else None
    )
    res["name_b_norm"] = res["b_index"].apply(
        lambda i: df_b.loc[i, "__name_norm__"] if pd.notna(i) else None
    )

    # Add context columns from B (wilaya, delegation, ...), suffixed "_b".
    # The lambda's reference to `c` is safe: apply() runs eagerly per loop turn.
    for c in CTX_COLS_B:
        if c in df_b.columns:
            col_b = f"{c}_b"
            res[col_b] = res["b_index"].apply(
                lambda i: df_b.loc[i, c] if pd.notna(i) else None
            )

    # 4. Split into the three confidence buckets by score thresholds.
    matches_surs = res[res["match_score"] >= HIGH_MATCH].copy()
    matches_douteux = res[
        (res["match_score"] >= LOW_MATCH) & (res["match_score"] < HIGH_MATCH)
    ].copy()
    non_trouves = res[res["match_score"] < LOW_MATCH].copy()

    print(f"[INFO] Matchs sûrs (score >= {HIGH_MATCH}) : {len(matches_surs)}")
    print(f"[INFO] Matchs douteux ({LOW_MATCH} <= score < {HIGH_MATCH}) : {len(matches_douteux)}")
    print(f"[INFO] Non trouvés (score < {LOW_MATCH}) : {len(non_trouves)}")

    # 5. Export (utf-8-sig keeps Arabic readable in Excel).
    matches_surs.to_csv(OUT_MATCHES, index=False, encoding="utf-8-sig")
    matches_douteux.to_csv(OUT_MAYBE, index=False, encoding="utf-8-sig")
    non_trouves.to_csv(OUT_MISSING, index=False, encoding="utf-8-sig")

    print("[OK] Export :")
    print(" ", OUT_MATCHES.resolve())
    print(" ", OUT_MAYBE.resolve())
    print(" ", OUT_MISSING.resolve())


if __name__ == "__main__":
    main()
backend/compare_data.py ADDED
@@ -0,0 +1,90 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
# compare_data.py
"""Compare the Trovit CSV (~270 companies) against the enriched SQLite DB.

Outputs a CSV listing companies present in the CSV but absent from the
database, matched on the fiscal identifier ``tax_id``.
"""
import sqlite3
from contextlib import closing
from pathlib import Path

import pandas as pd

# ----------------- CONFIG -----------------

# SQLite database holding the 141 enriched companies
DB_PATH = Path("ba7ath_enriched.db")

# Full CSV of the ~270 Trovit companies
CSV_PATH = Path("trovit_charikat_ahliya_all.csv")

# Table + JSON column inside SQLite
ENRICHED_TABLE = "enriched_companies"
DATA_COLUMN = "data"

# ----------------- CODE -----------------


def normalize_tax_id(series: pd.Series) -> pd.Series:
    """Normalize tax ids into comparable strings.

    pandas may read the CSV column as int/float while ``json_extract``
    returns TEXT; without normalization the merge silently matches nothing.
    """
    return (
        series.astype("string")
        .str.strip()
        .str.replace(r"\.0$", "", regex=True)  # 1234.0 -> 1234 (float parsing)
    )


def main():
    """Load both sources, diff them on ``tax_id`` and export the missing rows."""
    # 1. Load the ~270 companies from the CSV
    if not CSV_PATH.exists():
        raise FileNotFoundError(f"CSV introuvable : {CSV_PATH.resolve()}")

    df_270 = pd.read_csv(CSV_PATH)
    print(f"[INFO] Sociétés dans le CSV Trovit : {len(df_270)}")

    if "tax_id" not in df_270.columns:
        raise KeyError(
            "La colonne 'tax_id' est absente du CSV. "
            "Vérifie l'en-tête de trovit_charikat_ahliya_all.csv."
        )

    # 2. Open the SQLite database (closing() guarantees release on error)
    if not DB_PATH.exists():
        raise FileNotFoundError(f"Base SQLite introuvable : {DB_PATH.resolve()}")

    with closing(sqlite3.connect(DB_PATH)) as conn:
        cur = conn.cursor()

        # 3. Make sure the table actually exists
        cur.execute(
            "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
            (ENRICHED_TABLE,),
        )
        if cur.fetchone() is None:
            tables = [
                r[0]
                for r in cur.execute(
                    "SELECT name FROM sqlite_master WHERE type='table'"
                ).fetchall()
            ]
            raise RuntimeError(
                f"La table '{ENRICHED_TABLE}' n'existe pas dans la base.\n"
                f"Tables disponibles : {tables}"
            )

        # 4. Extract the tax_ids already present under data.rne
        query = f"""
        SELECT DISTINCT
            json_extract({DATA_COLUMN}, '$.rne.tax_id') AS tax_id
        FROM {ENRICHED_TABLE}
        WHERE json_extract({DATA_COLUMN}, '$.rne.tax_id') IS NOT NULL
        """
        df_rne = pd.read_sql(query, conn)

    print(f"[INFO] Sociétés avec tax_id dans la base : {len(df_rne)}")

    # 5. Compare by tax_id, with both sides coerced to the same dtype
    df_270["tax_id"] = normalize_tax_id(df_270["tax_id"])
    df_rne["tax_id"] = normalize_tax_id(df_rne["tax_id"])
    merged = df_270.merge(df_rne, on="tax_id", how="left", indicator=True)

    # 6. Keep the ones absent from the database
    missing = merged[merged["_merge"] == "left_only"].drop(columns=["_merge"])
    print(
        "[INFO] Sociétés présentes dans le CSV mais absentes de la base :",
        len(missing),
    )

    # 7. Save the result
    out_path = Path("trovit_missing_not_in_rne.csv")
    missing.to_csv(out_path, index=False, encoding="utf-8-sig")
    print(f"[OK] Fichier généré : {out_path.resolve()}")


if __name__ == "__main__":
    main()
backend/compare_names_with_qwen.py ADDED
@@ -0,0 +1,185 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # compare_names_with_qwen.py
2
+ import csv
3
+ import json
4
+ import time
5
+ import os
6
+ from pathlib import Path
7
+
8
+ import requests
9
+ from dotenv import load_dotenv
10
+
11
+ # Load environment variables
12
+ load_dotenv()
13
+
14
+ # ---------------- CONFIG ----------------
15
+
16
+ OLLAMA_URL = os.getenv("OLLAMA_URL", "http://127.0.0.1:11434/api/chat")
17
+ MODEL_NAME = os.getenv("MODEL_NAME", "qwen2.5:latest")
18
+
19
+ CSV_AR = Path(os.getenv("PATH_AHLYA_CSV", "Ahlya_Total_Feuil1.csv"))
20
+ CSV_FR = Path(os.getenv("PATH_RNE_CSV", "trovit_charikat_ahliya_all.csv"))
21
+
22
+ OUT_MATCHES = Path("matches_qwen.csv")
23
+ OUT_NOT_IN_TROVIT = Path("not_in_trovit_qwen.csv")
24
+
25
+ SLEEP_SECONDS = 0.05 # petite pause entre appels
26
+
27
+ # ----------------------------------------
28
+
29
+
30
def load_names_ar(path: Path):
    """Load company names (Arabic) from the first CSV column.

    Skips the header row plus any empty rows/cells, and returns a list of
    ``{"name_ar": <name>}`` dicts.
    """
    if not path.exists():
        raise FileNotFoundError(path.resolve())
    names = []
    with path.open("r", encoding="utf-8-sig", newline="") as handle:
        records = csv.reader(handle)
        next(records, None)  # drop the header line
        for record in records:
            if record:
                cell = (record[0] or "").strip()
                if cell:
                    names.append({"name_ar": cell})
    print(f"[INFO] Noms AR chargés : {len(names)}")
    return names
47
+
48
+
49
def load_names_fr(path: Path):
    """Load company names (French) from the third CSV column.

    Rows with fewer than three columns or an empty third cell are skipped.
    """
    if not path.exists():
        raise FileNotFoundError(path.resolve())
    collected = []
    with path.open("r", encoding="utf-8-sig", newline="") as handle:
        records = csv.reader(handle)
        next(records, None)  # skip the header row
        for record in records:
            if len(record) >= 3:
                cell = (record[2] or "").strip()
                if cell:
                    collected.append(cell)
    print(f"[INFO] Noms FR chargés (Trovit) : {len(collected)}")
    return collected
66
+
67
+
68
def build_fr_list_for_prompt(names_fr):
    """Render the French names as a 1-based numbered list for the LLM prompt."""
    return "\n".join(
        f"{position}. {label}"
        for position, label in enumerate(names_fr, start=1)
    )
74
+
75
+
76
def ask_qwen_match(name_ar, fr_list_text):
    """Ask Qwen whether the Arabic name matches one or more French names.

    Sends a single chat request to the local Ollama endpoint and parses the
    model's JSON answer.

    Returns:
        (match, indexes, reason): ``match`` is a bool, ``indexes`` a list of
        1-based positions into the French list, ``reason`` a short string.

    Raises:
        ValueError: if the model's reply is not valid JSON.
        requests.HTTPError: on a non-2xx response from Ollama.
    """
    system_prompt = (
        "Tu es un assistant qui fait du rapprochement de noms de sociétés "
        "entre l'arabe et le français.\n"
        "Règles :\n"
        "- Tu dois dire si le nom arabe désigne la même société qu'un ou plusieurs "
        "noms français dans la liste.\n"
        "- Prends en compte le sens, pas la traduction littérale exacte.\n"
        "- Si tu n'es PAS sûr, considère qu'il n'y a PAS de correspondance.\n"
        "- Réponds STRICTEMENT en JSON valide, sans texte autour.\n"
        '  Format : {"match": true/false, "indexes": [liste_entiers], "reason": "texte court"}.\n'
        "- Les indexes commencent à 1 et correspondent à la numérotation de la liste française."
    )

    user_prompt = (
        "Nom de la société en arabe :\n"
        f"{name_ar}\n\n"
        "Liste des noms de sociétés en français :\n"
        f"{fr_list_text}\n\n"
        "Question :\n"
        "- Le nom arabe correspond-il à une ou plusieurs sociétés françaises dans cette liste ?\n"
        "- Si oui, donne les indexes exacts dans le champ \"indexes\".\n"
        "- Si non, renvoie match=false et indexes=[]."
    )

    payload = {
        "model": MODEL_NAME,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "stream": False,
    }

    resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    resp.raise_for_status()
    data = resp.json()
    # Chat API puts the text under message.content; older /api/generate
    # style responses use a top-level "response" key.
    content = data.get("message", {}).get("content", "").strip()
    if not content and "response" in data:
        content = data["response"].strip()

    # LLMs often wrap the JSON in Markdown code fences (```json ... ```)
    # despite the instructions: strip them before parsing.
    if content.startswith("```"):
        content = content.strip("`")
        if content.lower().startswith("json"):
            content = content[4:]
        content = content.strip()

    try:
        result = json.loads(content)
    except json.JSONDecodeError:
        raise ValueError(f"Réponse non JSON de Qwen : {content}")

    match = bool(result.get("match", False))
    indexes = result.get("indexes", []) or []
    if not isinstance(indexes, list):
        indexes = []  # defensive: model sometimes returns a scalar here
    reason = str(result.get("reason", "")).strip()

    return match, indexes, reason
130
+
131
+
132
def main():
    """Full pipeline: load both CSVs, query Qwen name by name, then write
    the confirmed matches and the not-found names into two CSV files."""
    rows_ar = load_names_ar(CSV_AR)
    names_fr = load_names_fr(CSV_FR)
    fr_list_text = build_fr_list_for_prompt(names_fr)

    matches = []
    not_found = []
    total = len(rows_ar)

    for position, row in enumerate(rows_ar, start=1):
        name_ar = row["name_ar"]
        print(f"[{position}/{total}] Qwen compare : {name_ar}")

        try:
            match, indexes, reason = ask_qwen_match(name_ar, fr_list_text)
        except Exception as e:
            print(f"   [ERREUR] {e}")
            match, indexes, reason = False, [], f"error: {e}"

        if match and indexes:
            # Keep only indexes that actually fall inside the French list.
            in_range = [idx for idx in indexes if 1 <= idx <= len(names_fr)]
            matches.append({
                "name_ar": name_ar,
                "matched_indexes": ";".join(str(x) for x in indexes),
                "matched_names_fr": " | ".join(names_fr[idx - 1] for idx in in_range),
                "reason": reason,
            })
        else:
            not_found.append({
                "name_ar": name_ar,
                "reason": reason,
            })

        time.sleep(SLEEP_SECONDS)

    # Write out the results
    def write_rows(path, fieldnames, rows):
        with path.open("w", encoding="utf-8-sig", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)

    write_rows(
        OUT_MATCHES,
        ["name_ar", "matched_indexes", "matched_names_fr", "reason"],
        matches,
    )
    write_rows(OUT_NOT_IN_TROVIT, ["name_ar", "reason"], not_found)

    print(f"[OK] Matchs écrits dans : {OUT_MATCHES.resolve()}")
    print(f"[OK] Non présents (selon Qwen) : {OUT_NOT_IN_TROVIT.resolve()}")
    print(f"[INFO] Total matchs : {len(matches)}, non trouvés : {len(not_found)}")


if __name__ == "__main__":
    main()
backend/create_admin.py ADDED
@@ -0,0 +1,44 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
from sqlalchemy.orm import Session
from app.database import SessionLocal, engine, Base
from app.models.user_models import User
from app.services.auth_service import get_password_hash
import sys

# Ensure tables exist before inserting the first user.
Base.metadata.create_all(bind=engine)


def create_admin_user(email, password, full_name):
    """Create an active admin user, unless the email is already taken.

    Idempotent: prints a notice and returns if the user already exists.
    Any failure is printed; the session is rolled back and always closed.
    """
    db: Session = SessionLocal()
    try:
        user = db.query(User).filter(User.email == email).first()
        if user:
            print(f"User {email} already exists.")
            return

        new_user = User(
            email=email,
            hashed_password=get_password_hash(password),
            full_name=full_name,
            is_active=True,
            is_admin=True,
        )
        db.add(new_user)
        db.commit()
        db.refresh(new_user)
        print(f"Admin user {email} created successfully.")
    except Exception as e:
        # Leave the session clean if the INSERT/commit failed; otherwise the
        # aborted transaction lingers until close().
        db.rollback()
        print(f"Error creating user: {e}")
    finally:
        db.close()


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: python create_admin.py <email> <password> [full_name]")
        sys.exit(1)

    create_admin_user(
        sys.argv[1],
        sys.argv[2],
        sys.argv[3] if len(sys.argv) > 3 else "Admin User",
    )
backend/enrich_not_in_trovit.py ADDED
@@ -0,0 +1,71 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
# enrich_not_in_trovit.py
"""Enrich the companies absent from Trovit with details from the Ahlya CSV."""
import pandas as pd
from pathlib import Path

# Input files
CSV_NOT_IN = Path("not_in_trovit_qwen.csv")
CSV_AHLYA = Path("Ahlya_Total_Feuil1.csv")

# Output file
CSV_OUT = Path("not_in_trovit_enriched.csv")


def main():
    """Left-join the not-in-Trovit names onto the Ahlya details and export."""
    for required in (CSV_NOT_IN, CSV_AHLYA):
        if not required.exists():
            raise FileNotFoundError(required.resolve())

    # 1. Load both files
    df_not = pd.read_csv(CSV_NOT_IN, encoding="utf-8-sig")
    df_ah = pd.read_csv(CSV_AHLYA, encoding="utf-8-sig")

    # 2. Check the expected columns
    if "name_ar" not in df_not.columns:
        raise KeyError(f"'name_ar' manquant dans {CSV_NOT_IN.name} ; colonnes = {list(df_not.columns)}")

    col_nom_ahlya = "اسم_الشركة"
    if col_nom_ahlya not in df_ah.columns:
        raise KeyError(f"'{col_nom_ahlya}' manquant dans {CSV_AHLYA.name} ; colonnes = {list(df_ah.columns)}")

    # 3. Light name normalization on both sides
    def norm(value):
        return "" if pd.isna(value) else str(value).strip()

    df_not["__key__"] = df_not["name_ar"].map(norm)
    df_ah["__key__"] = df_ah[col_nom_ahlya].map(norm)

    # 4. Detail columns to pull back from Ahlya
    cols_details = [
        col_nom_ahlya,
        "الموضوع / النشاط",
        "العنوان",
        "الولاية",
        "المعتمدية",
        "المنطقة",
        "النوع",
    ]

    # Keep only the useful columns + the join key
    keep_ah = [c for c in cols_details if c in df_ah.columns] + ["__key__"]
    df_ah_small = df_ah[keep_ah].drop_duplicates("__key__")

    # 5. Left merge: every not-in row kept, details taken from Ahlya
    df_merged = df_not.merge(
        df_ah_small,
        on="__key__",
        how="left",
        suffixes=("", "_ahlya"),
    )

    # 6. Drop the technical join key
    df_merged.drop(columns=["__key__"], inplace=True)

    # 7. Save
    df_merged.to_csv(CSV_OUT, index=False, encoding="utf-8-sig")
    print(f"[OK] Fichier enrichi écrit dans : {CSV_OUT.resolve()}")
    print(f"Lignes : {len(df_merged)}")


if __name__ == "__main__":
    main()
backend/inspect_db.py ADDED
@@ -0,0 +1,46 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
# inspect_db.py
"""Quick diagnostic dump of the SQLite database: attached DBs, tables, schemas."""
import sqlite3
from contextlib import closing
from pathlib import Path

# Try this name first, then adapt (microsite.db, database.sqlite, instance/app.db, etc.)
DB_PATH = Path("ba7ath_enriched.db")


def main():
    """Print database file info, attached databases, tables and their columns."""
    print("=== Inspection de la base SQLite ===")
    print("Chemin supposé :", DB_PATH.resolve())

    if not DB_PATH.exists():
        print("[ERREUR] Fichier introuvable :", DB_PATH.resolve())
        return

    print("Taille fichier (octets) :", DB_PATH.stat().st_size)

    # closing() guarantees the connection is released even if a query fails.
    with closing(sqlite3.connect(DB_PATH)) as conn:
        print("\n=== Bases attachées ===")
        for row in conn.execute("PRAGMA database_list;"):
            # (schema, name, file)
            print(row)

        print("\n=== Tables SQLite ===")
        tables = [
            r[0]
            for r in conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table'"
            ).fetchall()
        ]
        if not tables:
            print("(aucune table utilisateur)")
        for name in tables:
            print("-", name)

        print("\n=== Structure des tables ===")
        for name in tables:
            print(f"\nTable: {name}")
            # Quote the table name: handles names with spaces/special characters.
            for col in conn.execute(f'PRAGMA table_info("{name}")'):
                print("  ", col)


if __name__ == "__main__":
    main()
backend/readme.md ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: Ba7ath OSINT API
3
+ emoji: 🛡️
4
+ colorFrom: green
5
+ colorTo: blue
6
+ sdk: gradio
7
+ app_file: app.py
8
+ pinned: false
9
+ ---
10
+
11
+ # Ba7ath OSINT API
12
+ Backend pour l'investigation et l'analyse de risque.
backend/test_auth_flow.py ADDED
@@ -0,0 +1,52 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
import requests
import sys

BASE_URL = "http://localhost:8000/api/v1"
EMAIL = "admin@ba7ath.com"
PASSWORD = "admin123"
TIMEOUT = 10  # seconds — prevents the script from hanging forever on a dead backend


def test_auth():
    """Smoke-test the auth flow: login, two protected endpoints, then a
    request without a token that must be rejected with 401."""
    print(f"Testing auth on {BASE_URL}...")

    # 1. Login
    print("\n1. Logging in...")
    try:
        response = requests.post(
            f"{BASE_URL}/auth/login",
            data={"username": EMAIL, "password": PASSWORD},
            timeout=TIMEOUT,
        )
        if response.status_code != 200:
            print(f"Login failed: {response.status_code} - {response.text}")
            return

        token_data = response.json()
        access_token = token_data.get("access_token")
        print(f"Login successful! Token: {access_token[:20]}...")
    except Exception as e:
        print(f"Login failed: {e}")
        return

    # 2. Access protected endpoint (Auth Me)
    print("\n2. Accessing /auth/me (Protected)...")
    headers = {"Authorization": f"Bearer {access_token}"}
    response = requests.get(f"{BASE_URL}/auth/me", headers=headers, timeout=TIMEOUT)
    if response.status_code == 200:
        print(f"Success! User: {response.json().get('email')}")
    else:
        print(f"Failed: {response.status_code} - {response.text}")

    # 3. Access protected endpoint (Stats)
    print("\n3. Accessing /stats/national (Protected)...")
    response = requests.get(f"{BASE_URL}/stats/national", headers=headers, timeout=TIMEOUT)
    if response.status_code == 200:
        print("Success! Stats retrieved.")
    else:
        print(f"Failed: {response.status_code} - {response.text}")

    # 4. Access without token (Expected Failure)
    print("\n4. Accessing /stats/national WITHOUT token...")
    response = requests.get(f"{BASE_URL}/stats/national", timeout=TIMEOUT)
    if response.status_code == 401:
        print("Success! Request rejected as expected (401 Unauthorized).")
    else:
        print(f"Failed! Expected 401, got {response.status_code}")


if __name__ == "__main__":
    test_auth()
docs/API_Reference.md ADDED
@@ -0,0 +1,103 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 📖 API Reference
2
+
3
+ Tous les endpoints sont préfixés par `/api/v1`.
4
+ **Base URL Production**: `https://ahlya-production.up.railway.app/api/v1`
5
+
6
+ ## 🔐 Authentification
7
+ La plupart des routes nécessitent un token JWT valide.
8
+
9
+ | Header | Valeur |
10
+ | :--- | :--- |
11
+ | `Authorization` | `Bearer <access_token>` |
12
+
13
+ ---
14
+
15
+ ## 🔑 Auth Endpoints
16
+
17
+ ### Login
18
+ `POST /auth/login`
19
+
20
+ Authentification via formulaire standard OAuth2.
21
+
22
+ - **Request Body** (`application/x-www-form-urlencoded`):
23
+ - `username`: Email de l'utilisateur.
24
+ - `password`: Mot de passe.
25
+ - **Success (200)**:
26
+ ```json
27
+ {
28
+ "access_token": "eyJhbG...",
29
+ "token_type": "bearer"
30
+ }
31
+ ```
32
+
33
+ ---
34
+
35
+ ## 📊 Statistiques & Risques
36
+
37
+ ### Statistiques Nationales
38
+ `GET /stats/national` (PROTÉGÉ)
39
+
40
+ Retourne les métriques agrégées pour l'ensemble du pays.
41
+
42
+ - **Exemple de réponse**:
43
+ ```json
44
+ {
45
+ "total_companies": 31000,
46
+ "top_wilayas": ["Tunis", "Sousse", "Sfax"],
47
+ "risk_index": 4.2
48
+ }
49
+ ```
50
+
51
+ ### Risques par Wilaya
52
+ `GET /risk/wilayas` (PROTÉGÉ)
53
+
54
+ Liste les scores de risque pour toutes les wilayas.
55
+
56
+ ---
57
+
58
+ ## 📂 Enrichment (Core Data)
59
+
60
+ ### Liste des sociétés enrichies
61
+ `GET /enrichment/list` (PROTÉGÉ)
62
+
63
+ - **Paramètres**:
64
+ - `page` (int): Par défaut 1.
65
+ - `per_page` (int): Par défaut 12.
66
+ - `search` (str): Recherche par nom.
67
+ - `wilaya` (str): Filtre par wilaya.
68
+ - `has_red_flags` (bool): Filtre les cas critiques.
69
+
70
+ - **Response**:
71
+ ```json
72
+ {
73
+ "companies": [...],
74
+ "total": 150,
75
+ "total_pages": 13
76
+ }
77
+ ```
78
+
79
+ ### Profil complet
80
+ `GET /enrichment/profile/{company_id}` (PROTÉGÉ)
81
+
82
+ Retourne l'intégralité des données (RNE, JORT, Marchés) et les Red Flags calculés.
83
+
84
+ ---
85
+
86
+ ## 🛠️ User Management (Admin Only)
87
+
88
+ ### Liste des utilisateurs
89
+ `GET /auth/users` (PROTECTED ADMIN)
90
+
91
+ Retourne la liste des utilisateurs du système.
92
+
93
+ ### Création d'utilisateur
94
+ `POST /auth/users` (PROTECTED ADMIN)
95
+ - **Body**: `{ "email": "...", "password": "...", "is_admin": true }`
96
+
97
+ ---
98
+
99
+ ## 📝 Exemple Curl
100
+ ```bash
101
+ curl -X GET "https://ahlya-production.up.railway.app/api/v1/enrichment/list" \
102
+ -H "Authorization: Bearer <votre_token>"
103
+ ```
docs/Authentication_Guide.md ADDED
@@ -0,0 +1,58 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🔐 Authentication Guide
2
+
3
+ Le système utilise une authentification basée sur les **JSON Web Tokens (JWT)** pour sécuriser les données sensibles d'investigation.
4
+
5
+ ## 🔄 Flux d'Authentification
6
+
7
+ ```mermaid
8
+ sequenceDiagram
9
+ participant User as Utilisateur
10
+ participant FE as Frontend (React)
11
+ participant BE as Backend (FastAPI)
12
+ participant DB as SQLite
13
+
14
+ User->>FE: Saisie Email/Password
15
+ FE->>BE: POST /api/v1/auth/login
16
+ BE->>DB: Vérifier User / Argon2 Hash
17
+ DB-->>BE: User Valide
18
+ BE-->>FE: Retourne JWT Access Token
19
+ FE->>FE: Stockage dans localStorage
20
+ FE->>BE: GET /api/v1/enriched (Header Bearer)
21
+ BE->>BE: Validation Signature JWT
22
+ BE-->>FE: Retourne Données
23
+ ```
24
+
25
+ ## 🛠️ Configuration Backend
26
+ Le secret et l'algorithme sont définis dans les variables d'environnement.
27
+
28
+ - **Variables Clés**:
29
+ - `SECRET_KEY`: Utilisée pour signer les tokens (indispensable en prod).
30
+ - `ALGORITHM`: Généralement `HS256`.
31
+ - `ACCESS_TOKEN_EXPIRE_MINUTES`: Durée de validité.
32
+
33
+ ## 💻 Implémentation Frontend (`AuthContext`)
34
+ La gestion de l'état `user` et `token` est centralisée dans `src/context/AuthContext.jsx`.
35
+
36
+ ### Usage dans les services :
37
+ Pour appeler une API protégée, utilisez le helper `authenticatedFetch` dans `src/services/api.js` qui injecte le header `Authorization`.
38
+
39
+ ```javascript
40
+ const getAuthHeaders = () => {
41
+ const token = localStorage.getItem('token');
42
+ return token ? { 'Authorization': `Bearer ${token}` } : {};
43
+ };
44
+ ```
45
+
46
+ ## 🛡️ Rôles et Permissions
47
+ Le système distingue deux niveaux :
48
+ 1. **Utilisateur Actif**: Accès aux données d'investigation.
49
+ 2. **Administrateur** (`is_admin=true`): Accès au dashboard admin et gestion des utilisateurs.
50
+
51
+ ## 👤 Création du Premier Admin
52
+ Si la base de données est vide, utilisez le script utilitaire :
53
+ ```bash
54
+ python create_admin.py <email> <mot_de_passe> [nom_complet]
55
+ ```
56
+ **Exemple d'identifiants** (à remplacer par les vôtres — ne publiez jamais de vrais secrets dans la documentation) :
57
+ - **Email**: `admin@example.com`
58
+ - **Password**: `<mot_de_passe_fort>`
docs/Contributing_Guide.md ADDED
@@ -0,0 +1,40 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🤝 Contributing Guide
2
+
3
+ Merci de contribuer à la plateforme **Ba7ath** ! Ce document définit les standards et le workflow pour maintenir la qualité du projet.
4
+
5
+ ## 🌿 Workflow Git
6
+ 1. **Branching**: Créez une branche descriptive pour chaque feature ou bugfix.
7
+ - `feat/nom-de-la-feature`
8
+ - `fix/nom-du-bug`
9
+ - `docs/nom-de-la-doc`
10
+ 2. **Pull Requests**:
11
+ - Décrivez clairement les changements effectués.
12
+ - Liez la PR à une issue si elle existe.
13
+ - Assurez-vous que le build passe avant de demander une review.
14
+
15
+ ## 📝 Standards de Code
16
+
17
+ ### Backend (Python)
18
+ - Respectez la **PEP 8**.
19
+ - Utilisez des **type hints** pour toutes les fonctions FastAPI.
20
+ - Commentez les logiques OSINT complexes.
21
+
22
+ ### Frontend (React)
23
+ - Utilisez des **Functional Components** avec hooks.
24
+ - **Tailwind CSS** : Évitez les styles inline ou le CSS personnalisé quand c'est possible.
25
+ - Nommez vos composants en `PascalCase`.
26
+
27
+ ### Architecture
28
+ - Ne jamais coder en dur (hardcode) de secrets ou d'URLs de production.
29
+ - Utilisez toujours `src/services/api.js` pour les appels backend.
30
+
31
+ ## 💬 Messages de Commit
32
+ Suivez la convention **Conventional Commits** :
33
+ - `feat: ajouter la comparaison par wilaya`
34
+ - `fix: corriger le hachage des mots de passe`
35
+ - `docs: mettre à jour l'architecture frontend`
36
+
37
+ ---
38
+
39
+ ## 🛡️ Sécurité
40
+ Si vous découvrez une faille de sécurité, ne créez pas d'issue publique. Contactez directement l'équipe à `ba77ath@proton.me`.
docs/Database_Schema.md ADDED
@@ -0,0 +1,81 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🗄️ Database Schema
2
+
3
+ Le projet utilise **SQLite** pour sa simplicité de déploiement et ses performances suffisantes pour un outil d'investigation spécialisé.
4
+
5
+ **Fichier**: `backend/ba7ath_enriched.db`
6
+
7
+ ## 📊 Diagramme E-R
8
+
9
+ ```mermaid
10
+ erDiagram
11
+ USER ||--o{ INVESTIGATION_NOTE : creates
12
+ ENRICHED_COMPANY ||--o{ INVESTIGATION_NOTE : has
13
+ WATCH_COMPANY ||--o{ ENRICHED_COMPANY : becomes
14
+
15
+ USER {
16
+ int id PK
17
+ string email UK
18
+ string hashed_password
19
+ string full_name
20
+ boolean is_active
21
+ boolean is_admin
22
+ }
23
+
24
+ ENRICHED_COMPANY {
25
+ string company_id PK
26
+ string company_name
27
+ string wilaya
28
+ json data
29
+ json metrics
30
+ string enriched_by
31
+ datetime enriched_at
32
+ }
33
+
34
+ INVESTIGATION_NOTE {
35
+ string id PK
36
+ string company_id FK
37
+ string title
38
+ text content
39
+ datetime created_at
40
+ string created_by
41
+ json tags
42
+ }
43
+
44
+ WATCH_COMPANY {
45
+ string id PK
46
+ string name_ar
47
+ string wilaya
48
+ string etat_enregistrement
49
+ datetime detected_trovit_at
50
+ }
51
+ ```
52
+
53
+ ---
54
+
55
+ ## 📑 Tables Détail
56
+
57
+ ### 1. `users`
58
+ Stocke les identifiants et les niveaux de privilèges.
59
+ - `hashed_password`: Hachage sécurisé (Argon2).
60
+
61
+ ### 2. `enriched_companies`
62
+ C'est le cœur de la plateforme. Les colonnes `data` et `metrics` sont de type JSON.
63
+ - **data**: Contient les données brutes extraites (RNE, JORT, Marchés).
64
+ - **metrics**: Contient les scores de risque et la liste des Red Flags détectés.
65
+
66
+ ### 3. `investigation_notes`
67
+ Permet aux journalistes d'ajouter des preuves textuelles ou des commentaires sur une société spécifique.
68
+
69
+ ### 4. `watch_companies`
70
+ Liste des sociétés identifiées comme "Ahlia" mais non encore trouvées dans les registres officiels (RNE).
71
+
72
+ ---
73
+
74
+ ## 📁 Migration et Initialisation
75
+ La base de données est automatiquement créée et les tables initialisées lors du démarrage du backend :
76
+ ```python
77
+ # backend/app/main.py
78
+ @app.on_event("startup")
79
+ async def startup_event():
80
+ Base.metadata.create_all(bind=engine)
81
+ ```
docs/Deployment_Guide.md ADDED
@@ -0,0 +1,41 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🚀 Deployment Guide
2
+
3
+ Le projet est conçu pour un déploiement Cloud moderne et automatisé.
4
+
5
+ ## 📁 Backend : Railway
6
+
7
+ Le backend FastAPI est hébergé sur **Railway**.
8
+
9
+ ### Configuration
10
+ 1. **Repository**: Liez votre repository GitHub à Railway.
11
+ 2. **Volumes** (CRITIQUE) :
12
+ - SQLite nécessite un stockage persistant.
13
+ - Créez un Volume Railway nommé `data` monté sur `/app/data`.
14
+ - Modifiez votre `DATABASE_URL` pour pointer vers `/app/data/ba7ath_enriched.db`.
15
+ 3. **Variables d'environnement** :
16
+ - `SECRET_KEY`: Une chaîne aléatoire longue.
17
+ - `ALGORITHM`: `HS256`.
18
+ - `CORS_ORIGINS`: Liste des domaines autorisés (ex: `https://ahlya-investigations.vercel.app`).
19
+
20
+ ---
21
+
22
+ ## 🎨 Frontend : Vercel
23
+
24
+ Le frontend React est hébergé sur **Vercel**.
25
+
26
+ ### Configuration
27
+ 1. **Framework Preset**: Vite.
28
+ 2. **Build Command**: `npm run build`.
29
+ 3. **Output Directory**: `dist`. (Ou `build` selon votre config `vite.config.js`).
30
+ 4. **Environment Variables**:
31
+ - `VITE_API_URL`: `https://votre-app-backend.up.railway.app/api/v1`.
32
+
33
+ ---
34
+
35
+ ## 🔄 Pipeline CI/CD
36
+ Toute modification poussée sur la branche `main` déclenche automatiquement :
37
+ 1. Un redeploy sur Railway (Backend).
38
+ 2. Un redeploy sur Vercel (Frontend).
39
+
40
+ > [!WARNING]
41
+ > Assurez-vous de migrer les données CSV vers la base SQLite SQL avant le déploiement final pour ne pas avoir une base vide en production.
docs/Development_Guide.md ADDED
@@ -0,0 +1,78 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🛠️ Development Guide
2
+
3
+ Ce guide détaille comment mettre en place l'environnement de développement local pour contribuer au projet Ba7ath.
4
+
5
+ ## 📋 Prérequis
6
+ - **Python 3.10+**
7
+ - **Node.js 18+**
8
+ - **Git**
9
+
10
+ ---
11
+
12
+ ## 🐍 Backend Setup (FastAPI)
13
+
14
+ 1. **Cloner le repository** :
15
+ ```bash
16
+ git clone <repo_url>
17
+ cd Ba7ath_scripts/Scrap_Ahlya/microsite
18
+ ```
19
+
20
+ 2. **Créer l'environnement virtuel** :
21
+ ```bash
22
+ cd backend
23
+ python -m venv venv
24
+ source venv/bin/activate # Windows: venv\Scripts\activate
25
+ ```
26
+
27
+ 3. **Installer les dépendances** :
28
+ ```bash
29
+ pip install -r requirements.txt
30
+ ```
31
+
32
+ 4. **Variables d'environnement** :
33
+ Créez un fichier `.env` dans `backend/` :
34
+ ```env
35
+ SECRET_KEY=votre_cle_secrete_ultra_securisee
36
+ ALGORITHM=HS256
37
+ ```
38
+
39
+ 5. **Lancer le serveur** :
40
+ ```bash
41
+ uvicorn app.main:app --reload --port 8000
42
+ ```
43
+
44
+ ---
45
+
46
+ ## ⚛️ Frontend Setup (React)
47
+
48
+ 1. **Installer les dépendances** :
49
+ ```bash
50
+ cd microsite
51
+ npm install
52
+ ```
53
+
54
+ 2. **Variables d'environnement** :
55
+ Créez un fichier `.env` dans `microsite/` :
56
+ ```env
57
+ VITE_API_URL=http://localhost:8000/api/v1
58
+ ```
59
+
60
+ 3. **Lancer le serveur de dev** :
61
+ ```bash
62
+ npm run dev
63
+ ```
64
+ L'application sera accessible sur `http://localhost:5173`.
65
+
66
+ ---
67
+
68
+ ## 🚀 Scripts Utilitaires
69
+
70
+ - **`backend/create_admin.py`** : Crée un utilisateur administrateur : `python create_admin.py <email> <mot_de_passe> [nom_complet]`.
71
+ - **`start_all.bat`** (Windows) : Script pour lancer simultanément le backend et le frontend en développement.
72
+
73
+ ## 🧪 Tests Rapides
74
+ Pour vérifier que l'API répond correctement après installation :
75
+ ```bash
76
+ curl http://localhost:8000/
77
+ # Réponse attendue: {"message": "Ba7ath OSINT API is running"}
78
+ ```
docs/Frontend_Architecture.md ADDED
@@ -0,0 +1,59 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 💻 Frontend Architecture
2
+
3
+ L'application est une **Single Page Application (SPA)** moderne construite avec **React 18** et **Vite**.
4
+
5
+ ## 🏗️ Structure des Dossiers
6
+
7
+ ```text
8
+ microsite/
9
+ ├── public/ # Assets statiques
10
+ ├── src/
11
+ │ ├── components/ # Composants réutilisables (Map, Widgets, Modals)
12
+ │ ├── context/ # AuthContext pour la gestion globale
13
+ │ ├── pages/ # Vues principales (Home, Admin, Enriched)
14
+ │ ├── services/ # Appels API et configuration
15
+ │ ├── App.jsx # Router et layout global
16
+ │ └── index.css # Tailwind et styles globaux
17
+ └── vite.config.js # Configuration de build
18
+ ```
19
+
20
+ ## 🚦 Routing (`App.jsx`)
21
+ Le routage est géré par `react-router-dom`. Les routes sensibles sont protégées.
22
+
23
+ ```jsx
24
+ <Routes>
25
+ <Route path="/login" element={<LoginPage />} />
26
+ <Route element={<ProtectedRoute />}>
27
+ <Route path="/" element={<HomeDashboard />} />
28
+ <Route path="/enriched" element={<EnrichedCompaniesPage />} />
29
+ <Route path="/admin" element={<AdminDashboard />} adminOnly={true} />
30
+ </Route>
31
+ </Routes>
32
+ ```
33
+
34
+ ## 🔐 Gestion de l'État : `AuthContext`
35
+ Un contexte React global gère :
36
+ - L'utilisateur actuel (`user`).
37
+ - La persistance du token (`localStorage`).
38
+ - Les méthodes `login` / `logout`.
39
+
40
+ ## 📦 Composants Clés
41
+
42
+ ### Visualisation
43
+ - **`RegionPanel`**: Affiche les statistiques détaillées d'une wilaya sélectionnée sur la carte.
44
+ - **`SubScoresRadar`**: Graphique radar (Chart.js) montrant les différents axes de risque.
45
+ - **`StatisticalComparisonGrid`**: Grille de comparaison entre wilayas.
46
+
47
+ ### Investigation
48
+ - **`InvestigationWizard`**: Formulaire pas-à-pas pour guider l'analyse.
49
+ - **`ManualEnrichmentWizard`**: Interface de saisie pour ajouter de nouvelles données d'enrichissement.
50
+
51
+ ## 🎨 Design System
52
+ - **Tailwind CSS**: Utilisé pour tout le styling.
53
+ - **Inter / Noto Sans Arabic**: Polices utilisées pour une lisibilité maximale bilingue.
54
+ - **Glassmorphism**: Appliqué sur les modals et les overlays pour un aspect premium.
55
+
56
+ ---
57
+
58
+ ## 🔌 Intégration API
59
+ Tous les appels passent par `src/services/api.js` qui utilise un wrapper `authenticatedFetch` pour garantir que le token est envoyé si disponible.
docs/OSINT_Methodology.md ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🕵️ OSINT Methodology
2
+
3
+ La plateforme Ba7ath ne se contente pas d'afficher des données ; elle les transforme en **renseignements actionnables** grâce à une méthodologie d'enrichissement rigoureuse.
4
+
5
+ ## 📡 Sources de Données
6
+
7
+ 1. **RNE (Registre National des Entreprises)** : Source officielle pour le statut légal, le capital social, l'adresse et les actionnaires.
8
+ 2. **JORT (Journal Officiel de la République Tunisienne)** : Extraction des annonces de création, de modification de capital et de liquidation.
9
+ 3. **Marchés Publics (TUNEPS / Observatoire)** : Données sur les contrats remportés par les sociétés citoyennes.
10
+ 4. **Scraping Web (Trovit / Web)** : Identification précoce des sociétés non encore officiellement enregistrées.
11
+
12
+ ---
13
+
14
+ ## 🚩 Calcul des Red Flags (Signaux d'Alerte)
15
+
16
+ Le système applique des algorithmes automatiques pour détecter des patterns suspects :
17
+
18
+ ### 1. Ratio Financier Critique
19
+ - **Logique**: Si `Valeur totale des contrats / Capital social > 10`.
20
+ - **Interprétation**: Une société avec un capital très faible remportant des marchés massifs peut indiquer une structure "écran" ou un manque de capacité réelle.
21
+ - **Badge**: `FINANCIAL_RATIO` (Severity: HIGH).
22
+
23
+ ### 2. Méthodes de Passation
24
+ - **Logique**: Si `Marchés de gré à gré (Direct) > 50%` du total des contrats.
25
+ - **Interprétation**: Une dépendance excessive aux contrats non-concurrentiels est un indicateur de risque de favoritisme.
26
+ - **Badge**: `PROCUREMENT_METHOD` (Severity: HIGH).
27
+
28
+ ### 3. Gouvernance
29
+ - **Logique**: Détection d'actionnaire unique ou de liens croisés entre sociétés Ahlia d'une même région.
30
+ - **Badge**: `GOVERNANCE` (Severity: MEDIUM).
31
+
32
+ ---
33
+
34
+ ## 🧪 Processus d'Enrichissement Manuel
35
+
36
+ Le **ManualEnrichmentWizard** permet aux journalistes d'ajouter une couche d'analyse humaine :
37
+ 1. **Saisie des données RNE** : Validation des numéros de registre.
38
+ 2. **Ajout de contrats** : Saisie manuelle si TUNEPS n'est pas à jour.
39
+ 3. **Calcul Auto** : Le système recalcule instantanément les scores dès que les données sont enregistrées.
40
+
41
+ ## 📈 Indice de Risque Régional
42
+ Le score d'une wilaya est la moyenne pondérée des scores de risque des sociétés Ahlia qui y sont basées. Cela permet de cartographier les "zones grises" au niveau national.
docs/README.md ADDED
@@ -0,0 +1,104 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 📂 Ba7ath / Ahlya Investigations
2
+
3
+ > **Ba7ath** (البحث - La Recherche) est une plateforme OSINT de datajournalisme dédiée à l'investigation sur les sociétés citoyennes (Ahlia - أهلية) en Tunisie.
4
+
5
+ [![Status: Functional](https://img.shields.io/badge/Status-Functional-success.svg)](#)
6
+ [![Stack: FastAPI + React](https://img.shields.io/badge/Stack-FastAPI%20%2B%20React-blue.svg)](#)
7
+
8
+ ## 📌 Mission
9
+ Ce projet permet aux journalistes et analystes d'explorer, de cartographier et d'enrichir les données sur les sociétés Ahlia tunisiennes, en identifiant les anomalies financières, les structures de gouvernance suspectes et les signaux de risque OSINT.
10
+
11
+ ---
12
+
13
+ ## 🏗️ Architecture du Système
14
+
15
+ ```mermaid
16
+ graph TD
17
+ subgraph Frontend [React SPA - Vercel]
18
+ UI[Interface Utilisateur]
19
+ State[AuthContext & State]
20
+ Map[Leaflet Map]
21
+ Charts[Chart.js / Radar]
22
+ end
23
+
24
+ subgraph Backend [FastAPI - Railway]
25
+ API[V1 API Endpoints]
26
+ Auth[JWT Service]
27
+ Logic[Business Logic / Red Flags]
28
+ end
29
+
30
+ subgraph Data [Storage]
31
+ DB[(SQLite - ba7ath_enriched.db)]
32
+ Vol[Railway Persistent Volume]
33
+ end
34
+
35
+ UI --> State
36
+ State --> API
37
+ API --> Auth
38
+ API --> Logic
39
+ Logic --> DB
40
+ DB -.-> Vol
41
+ ```
42
+
43
+ ---
44
+
45
+ ## 🛠️ Stack Technique
46
+
47
+ ### Backend
48
+ - **Framework**: FastAPI (Python)
49
+ - **Base de données**: SQLite avec SQLAlchemy ORM.
50
+ - **Authentification**: JWT Bearer avec hachage Argon2.
51
+ - **Service OSINT**: Logique personnalisée de détection de "Red Flags".
52
+
53
+ ### Frontend
54
+ - **Framework**: React 18 (Vite).
55
+ - **Styling**: Tailwind CSS pour une interface premium et responsive.
56
+ - **Cartographie**: React-Leaflet pour la visualisation géographique des risques.
57
+ - **Visualisation**: Chart.js pour les graphiques radar et de comparaison.
58
+
59
+ ---
60
+
61
+ ## 🚀 Quick Start (Local)
62
+
63
+ ### 1. Backend
64
+ ```bash
65
+ cd backend
66
+ python -m venv venv
67
+ source venv/bin/activate # venv\Scripts\activate sur Windows
68
+ pip install -r requirements.txt
69
+ python create_admin.py # Initialiser l'admin par défaut
70
+ uvicorn app.main:app --reload
71
+ ```
72
+
73
+ ### 2. Frontend
74
+ ```bash
75
+ cd microsite
76
+ npm install
77
+ npm run dev
78
+ ```
79
+
80
+ ---
81
+
82
+ ## 📖 Documentation Détaillée
83
+
84
+ 1. [**API Reference**](API_Reference.md) : Détail des endpoints et formats.
85
+ 2. [**Authentication Guide**](Authentication_Guide.md) : Flux JWT et gestion admin.
86
+ 3. [**Frontend Architecture**](Frontend_Architecture.md) : Structure des composants et hooks.
87
+ 4. [**Database Schema**](Database_Schema.md) : Modèles SQLAlchemy et colonnes enrichies.
88
+ 5. [**Deployment Guide**](Deployment_Guide.md) : Procédures Railway/Vercel.
89
+ 6. [**OSINT Methodology**](OSINT_Methodology.md) : Calcul des risques et sources.
90
+ 7. [**Troubleshooting**](Troubleshooting.md) : Problèmes connus et solutions.
91
+ 8. [**Development Guide**](Development_Guide.md) : Workflow de contribution.
92
+
93
+ ---
94
+
95
+ ## 🕵️ Méthodologie OSINT
96
+ La plateforme agrège des données provenant du **RNE** (Registre National des Entreprises), du **JORT** (Journal Officiel) et des données de marchés publics pour générer des scores de risque basés sur :
97
+ - Le ratio Capital / Valeur des contrats.
98
+ - La fréquence des marchés de gré à gré (بالتراضي).
99
+ - La structure de gouvernance (Actionnaire unique, etc.).
100
+
101
+ ---
102
+
103
+ ## ⚖️ Licence
104
+ Projet interne - Tous droits réservés.
docs/Troubleshooting.md ADDED
@@ -0,0 +1,49 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🔍 Troubleshooting Guide
2
+
3
+ Ce guide recense les erreurs courantes rencontrées lors du développement ou du déploiement de la plateforme Ba7ath.
4
+
5
+ ## 1. Erreurs d'Authentification
6
+
7
+ ### Symptôme : "401 Unauthorized" ou "403 Forbidden"
8
+ - **Cause 1**: Le token JWT a expiré.
9
+ - **Solution**: Se déconnecter et se reconnecter.
10
+ - **Cause 2**: Le frontend n'envoie pas le header `Authorization`.
11
+ - **Diagnostic**: Vérifiez dans l'onglet Network de votre navigateur si le header `Authorization: Bearer <token>` est présent.
12
+ - **Fix**: Assurez-vous que l'appel API utilise `authenticatedFetch`.
13
+
14
+ ### Symptôme : Erreur de signature du token après redémarrage
15
+ - **Cause**: La `SECRET_KEY` n'est pas fixe et change à chaque redémarrage du serveur.
16
+ - **Fix**: Définir une `SECRET_KEY` statique dans les variables d'environnement.
17
+
18
+ ---
19
+
20
+ ## 2. Erreurs de Données (API 404)
21
+
22
+ ### Symptôme : Les données enrichies sont inaccessibles
23
+ - **Diagnostic**: L'URL appelée est incorrecte (ex: `/enrichment/list` au lieu de `/api/v1/enrichment/list`).
24
+ - **Fix**: Centraliser `API_BASE_URL` dans `config.js` et s'assurer qu'il inclut `/api/v1`.
25
+
26
+ ### Symptôme : Les sociétés disparaissent au redéploiement Railway
27
+ - **Cause**: La base SQLite n'est pas sur un volume persistant.
28
+ - **Fix**: Monter un Volume Railway et pointer le chemin de la DB vers ce volume (`/data/ba7ath_enriched.db`).
29
+
30
+ ---
31
+
32
+ ## 3. Erreurs de Build (Frontend)
33
+
34
+ ### Symptôme : `vite:html-inline-proxy` error
35
+ - **Cause**: Présence de blocs `<style>` inline dans `index.html` (bug spécifique à certains environnements Windows).
36
+ - **Fix**: Déplacer les styles vers `index.css` et configurer les polices dans `tailwind.config.js`.
37
+
38
+ ---
39
+
40
+ ## 🛠️ Diagnostics Utiles
41
+
42
+ **Logs Backend** :
43
+ ```bash
44
+ # Sur Railway
45
+ railway logs
46
+ ```
47
+
48
+ **Debugger React** :
49
+ Utilisez les **React DevTools** pour vérifier si `AuthContext` possède bien l'état `user` après le login.
index.html ADDED
@@ -0,0 +1,34 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!DOCTYPE html>
2
+ <html lang="ar" dir="rtl">
3
+
4
+ <head>
5
+ <meta charset="utf-8" />
6
+ <link rel="icon" href="/favicon.ico" />
7
+ <meta name="viewport" content="width=device-width, initial-scale=1" />
8
+ <meta name="theme-color" content="#10B981" />
9
+
10
+ <meta name="description" content="لوحة تفاعلية لقراءة بيانات الشركات الأهلية في تونس حسب الولاية والنشاط." />
11
+
12
+ <link rel="apple-touch-icon" href="/logo192.png" />
13
+
14
+ <!-- خط عربي (اختياري) -->
15
+ <link rel="preconnect" href="https://fonts.googleapis.com" />
16
+ <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
17
+ <link href="https://fonts.googleapis.com/css2?family=Noto+Sans+Arabic:wght@300;400;600;700&display=swap"
18
+ rel="stylesheet" />
19
+
20
+ <!-- ملف manifest لتطبيق الويب -->
21
+ <link rel="manifest" href="/manifest.json" />
22
+
23
+ <title>الشركات الأهلية في تونس</title>
24
+
25
+
26
+ </head>
27
+
28
+ <body>
29
+ <noscript>يجب تفعيل جافاسكريبت لتشغيل هذا التطبيق.</noscript>
30
+ <div id="root"></div>
31
+ <script type="module" src="/src/index.jsx"></script>
32
+ </body>
33
+
34
+ </html>
package-lock.json ADDED
The diff for this file is too large to render. See raw diff
 
package.json ADDED
@@ -0,0 +1,54 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "name": "microsite",
3
+ "version": "0.1.0",
4
+ "private": true,
5
+ "dependencies": {
6
+ "@testing-library/dom": "^10.4.1",
7
+ "@testing-library/jest-dom": "^6.9.1",
8
+ "@testing-library/react": "^16.3.2",
9
+ "@testing-library/user-event": "^13.5.0",
10
+ "chart.js": "^4.5.1",
11
+ "framer-motion": "^12.34.3",
12
+ "leaflet": "^1.9.4",
13
+ "lucide-react": "^0.563.0",
14
+ "react": "^19.2.4",
15
+ "react-chartjs-2": "^5.3.1",
16
+ "react-dom": "^19.2.4",
17
+ "react-leaflet": "^5.0.0",
18
+ "react-router-dom": "^7.13.0",
19
+ "recharts": "^3.7.0"
20
+ },
21
+ "scripts": {
22
+ "dev": "vite",
23
+ "start": "vite",
24
+ "build": "vite build",
25
+ "preview": "vite preview",
26
+ "test": "react-scripts test",
27
+ "eject": "react-scripts eject"
28
+ },
29
+ "eslintConfig": {
30
+ "extends": [
31
+ "react-app",
32
+ "react-app/jest"
33
+ ]
34
+ },
35
+ "browserslist": {
36
+ "production": [
37
+ ">0.2%",
38
+ "not dead",
39
+ "not op_mini all"
40
+ ],
41
+ "development": [
42
+ "last 1 chrome version",
43
+ "last 1 firefox version",
44
+ "last 1 safari version"
45
+ ]
46
+ },
47
+ "devDependencies": {
48
+ "@vitejs/plugin-react": "^5.1.3",
49
+ "autoprefixer": "^10.4.24",
50
+ "postcss": "^8.5.6",
51
+ "tailwindcss": "^3.4.19",
52
+ "vite": "^7.3.1"
53
+ }
54
+ }
postcss.config.js ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ module.exports = {
2
+ plugins: {
3
+ tailwindcss: {},
4
+ autoprefixer: {},
5
+ },
6
+ };
project_tree.py ADDED
@@ -0,0 +1,16 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+
2
+ import os
3
+
4
+ def list_files(startpath):
5
+ output = []
6
+ for root, dirs, files in os.walk(startpath):
7
+ level = root.replace(startpath, '').count(os.sep)
8
+ indent = ' ' * 4 * (level)
9
+ output.append('{}{}/'.format(indent, os.path.basename(root)))
10
+ subindent = ' ' * 4 * (level + 1)
11
+ for f in files:
12
+ if not f.startswith("."):
13
+ output.append('{}{}'.format(subindent, f))
14
+ return "\n".join(output)
15
+
16
+ print(list_files('.'))