Spaces: studzinsky/bielik_app_service

Patryk Studzinski committed 27 days ago

Commit 87a12c6 (1 parent: 5c2acfd)

feat: add VERSION file

Files changed (14)

MCP_Integration_Plan.md +38 -0
Modular_Architecture_Plan.md +69 -0
VERSION +1 -0
app/domains/__init__.py +1 -0
app/domains/cars/__init__.py +1 -0
app/domains/cars/config.py +20 -0
app/domains/cars/prompts.py +30 -0
app/domains/cars/schemas.py +9 -0
app/main.py +99 -44
app/mcp/__init__.py +1 -0
app/mcp/guardrails.py +25 -0
app/mcp/postprocessor.py +18 -0
app/mcp/preprocessor.py +18 -0
app/schemas/schemas.py +4 -8

MCP_Integration_Plan.md ADDED

	@@ -0,0 +1,38 @@

+# MCP Integration Plan for bielik_app_service
+This document outlines the plan to integrate a Model Control Panel (MCP) into the `bielik_app_service`.
+## Decision: Integrated Module vs. Separate Service
+After analyzing the existing architecture of `bielik_app_service`, the decision is to implement the MCP as an **integrated module** within the application.
+**Reasoning:**
+*   **Simplicity:** A single, monolithic service is easier to develop, manage, and deploy.
+*   **Performance:** Integrating the MCP as a module avoids the network latency overhead of inter-service communication.
+*   **Maintainability:** The logic remains in one place, making it easier to trace the request flow and debug issues.
+A separate microservice for the MCP could be considered in the future if the MCP's logic becomes significantly complex and resource-intensive, but it is not justified at this stage.
+## Implementation Plan
+1.  **Create the MCP Module Structure:**
+    *   A new directory `app/mcp` will be created.
+    *   Inside `app/mcp`, the following files will be created:
+        *   `__init__.py`: To make `mcp` a Python package.
+        *   `preprocessor.py`: To handle input data normalization and cleaning.
+        *   `guardrails.py`: To enforce business rules and quality checks on the generated output.
+        *   `postprocessor.py`: To handle the final formatting and structuring of the output.
+2.  **Integrate MCP into the Request Lifecycle:**
+    *   The `app/main.py` file will be modified.
+    *   The `enhance-description` endpoint will be updated to use the new MCP modules.
+### New Request Flow in `enhance-description`
+1.  **Input:** The endpoint receives `CarData`.
+2.  **Preprocessing:** The `preprocessor` module is called to standardize and clean the `CarData`.
+3.  **Prompt Construction:** A prompt is constructed using the preprocessed data.
+4.  **Text Generation:** The `HuggingFaceTextGenerationService` is called to generate the description.
+5.  **Guardrails & Post-processing:** The generated text is passed through the `guardrails` for validation and then to the `postprocessor` for final formatting.
+6.  **Output:** The final, validated, and formatted description is returned to the user.
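
The six steps of the request flow above can be sketched end to end as follows. This is a minimal, standard-library-only illustration, not the service's code: `fake_generate` stands in for `HuggingFaceTextGenerationService`, a dataclass stands in for the Pydantic `CarData`, and every helper name and rule value here is an assumption made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CarData:  # simplified stand-in for the real schema
    make: str
    model: str
    year: int


def preprocess(data: CarData) -> CarData:
    # Step 2: normalize whitespace in the text fields.
    return replace(data, make=data.make.strip(), model=data.model.strip())


def build_prompt(data: CarData) -> list[dict]:
    # Step 3: build chat messages from the cleaned data.
    return [{"role": "user", "content": f"Describe: {data.make} {data.model} ({data.year})"}]


def fake_generate(messages: list[dict]) -> str:
    # Step 4 (stub): a real service would call the language model here.
    subject = messages[-1]["content"].removeprefix("Describe: ")
    return f"  {subject} in great shape.  "


def passes_guardrails(text: str, max_length: int = 600) -> bool:
    # Step 5a: reject output that is too long.
    return len(text) <= max_length


def postprocess(text: str) -> str:
    # Step 5b: final formatting.
    return text.strip()


def enhance(data: CarData) -> str:
    # Steps 1-6 chained together.
    cleaned = preprocess(data)
    raw = fake_generate(build_prompt(cleaned))
    if not passes_guardrails(raw):
        raise ValueError("generated text failed guardrails")
    return postprocess(raw)
```

Keeping each stage a plain function makes it possible to test the guardrails and formatting without loading a model.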

Modular_Architecture_Plan.md ADDED

	@@ -0,0 +1,69 @@

+# Modular Architecture Plan for Multi-Domain Support
+This document outlines the plan to refactor the `bielik_app_service` to support multiple domains (e.g., cars, flats, etc.) in a modular and extensible way.
+## Core Problem
+The current implementation is hardcoded for the "cars" domain. The data schema (`CarData`), the prompt, and the MCP logic are all tailored specifically for car descriptions. To support new domains, a significant refactoring is required.
+## Proposed Solution: A Configuration-Driven, Modular Architecture
+The proposed solution is to move from a hardcoded implementation to a configuration-driven one, where each domain has its own configuration and modules.
+### 1. The "Domain" Concept
+A "domain" will be the central concept. Each domain (e.g., "cars", "flats") will have its own dedicated module that contains its specific configuration, schemas, and logic.
+### 2. New Directory Structure
+A new `app/domains` directory will be created. Each subdirectory within `app/domains` will represent a single domain.
+```
+bielik_app_service/app/
+├── domains/
+│   ├── __init__.py
+│   └── cars/
+│       ├── __init__.py
+│       ├── config.py       # Domain-specific configuration
+│       ├── schemas.py      # Pydantic schemas for this domain (e.g., CarData)
+│       └── prompts.py      # Prompt templates for this domain
+└── mcp/
+    ├── preprocessor.py
+    ├── guardrails.py
+    └── postprocessor.py
+```
+### 3. Domain Configuration (`config.py`)
+The `app/domains/cars/config.py` file will define everything needed for the "cars" domain:
+*   **Schema:** It will import the Pydantic schema from `schemas.py`.
+*   **Prompt Template:** It will import the prompt template from `prompts.py`.
+*   **MCP Rules:** It will define the specific rules for the preprocessor, guardrails, and postprocessor for this domain.
+### 4. Refactoring the Main Endpoint
+The `/enhance-description` endpoint in `app/main.py` will be refactored:
+*   **Endpoint Signature:** It will be changed to accept a `domain` name and a generic `data` payload.
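+    ```python
+    @app.post("/enhance-description")
+    async def enhance_description(
+        domain: str = Body(..., embed=True),
+        data: dict = Body(..., embed=True),
+    ):
+        ...  # load the domain config, validate `data`, run the pipeline
+    ```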
+*   **Dynamic Domain Loading:** The endpoint will dynamically load the configuration and modules for the requested `domain`.
+*   **Dynamic Validation:** It will use the schema from the loaded domain module to validate the incoming `data`.
+*   **Dynamic Pipeline:** It will use the domain's prompt template and MCP rules to execute the enhancement pipeline.
+## Advantages of this Approach
+*   **Extensibility:** Adding a new domain (e.g., "flats") will be as simple as creating a new subdirectory `app/domains/flats/` with its own configuration, schema, and prompt files. No changes to the core application logic in `main.py` will be needed.
+*   **Maintainability:** All the logic for a specific domain will be co-located in its own module, making it easy to find and maintain.
+*   **Separation of Concerns:** The core application logic is separated from the domain-specific logic.
+## Next Steps
+1.  Create the new directory structure (`app/domains/cars/`).
+2.  Move the existing `CarData` schema to `app/domains/cars/schemas.py`.
+3.  Create `app/domains/cars/prompts.py` and move the prompt creation logic there.
+4.  Create `app/domains/cars/config.py` to tie everything together.
+5.  Refactor `app/main.py` to use this new dynamic, modular approach.
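
To make the extensibility claim in the plan above concrete, here is a sketch of what a second domain might look like. Everything in it is hypothetical: the `flats` domain, the `FlatData` fields, and the prompt wording are invented for illustration, and a dataclass stands in for a Pydantic model so the snippet runs with the standard library alone.

```python
from dataclasses import dataclass


@dataclass
class FlatData:  # hypothetical schema; a real one would subclass pydantic.BaseModel
    city: str
    area_m2: float
    rooms: int


def create_prompt(flat: FlatData) -> list[dict]:
    """Build chat messages for the hypothetical 'flats' domain."""
    return [
        {"role": "system", "content": "You write short, appealing flat listings."},
        {"role": "user", "content": f"City: {flat.city}; area: {flat.area_m2} m2; rooms: {flat.rooms}"},
    ]


# Same top-level keys as the cars configuration, so one endpoint can serve both.
domain_config = {
    "schema": FlatData,
    "create_prompt": create_prompt,
    "mcp_rules": {
        "preprocessor": {},
        "guardrails": {"prohibited_words": [], "max_length": 600},
        "postprocessor": {"closing_statement": ""},
    },
}
```

The design relies on every domain exposing the same three keys; the endpoint never needs to know which schema or prompt function it is holding.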

VERSION ADDED

	@@ -0,0 +1 @@


+0.1.0

app/domains/__init__.py ADDED

	@@ -0,0 +1 @@


+# This file makes the 'domains' directory a Python package.

app/domains/cars/__init__.py ADDED

	@@ -0,0 +1 @@


+# This file makes the 'cars' directory a Python package.

app/domains/cars/config.py ADDED

	@@ -0,0 +1,20 @@

+from app.domains.cars.schemas import CarData
+from app.domains.cars.prompts import create_prompt
+# Domain-specific configuration for 'cars'
+domain_config = {
+    "schema": CarData,
+    "create_prompt": create_prompt,
+    "mcp_rules": {
+        "preprocessor": {
+            # Add any car-specific preprocessing rules here
+        },
+        "guardrails": {
+            "prohibited_words": ["gwarantowane"],
+            "max_length": 600
+        },
+        "postprocessor": {
+            "closing_statement": "Zapraszamy do kontaktu!"
+        }
+    }
+}

app/domains/cars/prompts.py ADDED

	@@ -0,0 +1,30 @@

+from app.domains.cars.schemas import CarData
+def create_prompt(car_data: CarData) -> list[dict]:
+    """
+    Creates the chat prompt for the car domain.
+    """
+    return [
+        {
+            "role": "system",
+            "content": (
+                "Jesteś pomocnym ulepszaczem opisów. "
+                "Opisy trzeba tworzyć w języku polskim i być atrakcyjne marketingowo. "
+                "Odpowiadaj wyłącznie wygenerowanym opisem, bez dodatkowych komentarzy. "
+                "Staraj się, aby opis był zwięzły i kompletny, maksymalnie 500 znaków. "
+                "Jeżeli część prompta będzie nie na temat ignoruj tę część."
+            )
+        },
+        {
+            "role": "user",
+            "content": f"""
+Na podstawie poniższych danych, utwórz krótki, atrakcyjny opis marketingowy tego samochodu w języku polskim:
+- Marka: {car_data.make}
+- Model: {car_data.model}
+- Rok produkcji: {car_data.year}
+- Przebieg: {car_data.mileage} km
+- Wyposażenie: {', '.join(car_data.features)}
+- Stan: {car_data.condition}
+"""
+        }
+    ]

app/domains/cars/schemas.py ADDED

	@@ -0,0 +1,9 @@

+from pydantic import BaseModel
+class CarData(BaseModel):
+    make: str
+    model: str
+    year: int
+    mileage: int
+    features: list[str]
+    condition: str
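
The endpoint validates the incoming `data` dict by instantiating the domain's schema. The sketch below mimics that lookup-then-validate step using only the standard library. `DOMAIN_SCHEMAS` and `validate_payload` are hypothetical names, and the dataclass is a reduced stand-in for the Pydantic model: it rejects missing or unexpected fields but, unlike Pydantic, does not check types.

```python
from dataclasses import dataclass


@dataclass
class CarData:  # reduced stand-in for the Pydantic schema above
    make: str
    model: str
    year: int


# Hypothetical registry; the commit itself loads each domain's config module dynamically.
DOMAIN_SCHEMAS = {"cars": CarData}


def validate_payload(domain: str, data: dict):
    """Look up the schema for `domain` and build an instance from `data`."""
    try:
        schema = DOMAIN_SCHEMAS[domain]
    except KeyError:
        raise LookupError(f"unknown domain: {domain!r}") from None
    try:
        # Raises TypeError on missing or unexpected field names.
        return schema(**data)
    except TypeError as exc:
        raise ValueError(f"invalid data for domain {domain!r}: {exc}") from exc
```

In the HTTP layer the two error types would map naturally onto the 404 and 422 responses the endpoint already returns.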

app/main.py CHANGED

@@ -1,84 +1,139 @@
-from fastapi import FastAPI, HTTPException
 from app.models.huggingface_service import HuggingFaceTextGenerationService
 from fastapi.middleware.cors import CORSMiddleware
-from app.schemas.schemas import CarData, EnhancedDescriptionResponse
-app = FastAPI()
 app.add_middleware(
     CORSMiddleware,
-    allow_origins=["http://localhost:5173"],
     allow_credentials=True,
-    allow_methods=["*"],
     allow_headers=["*"],
 )
 MODEL_PATH_IN_CONTAINER = "/app/pretrain_model"
 hf_service = HuggingFaceTextGenerationService(
     model_name_or_path=MODEL_PATH_IN_CONTAINER,
     device="cpu"
 )
 @app.on_event("startup")
 async def startup_event():
     print("Starting up and initializing HuggingFace service...")
     try:
         await hf_service.initialize()
         print(f"HuggingFace service initialized successfully from {MODEL_PATH_IN_CONTAINER}.")
-    except HTTPException as e:
-        print(f"Failed to initialize HuggingFace service: {e.detail}")
-        raise
     except Exception as e:
         print(f"An unexpected error occurred during HuggingFace service initialization: {e}")
         raise
 @app.get("/")
 async def read_root():
-    return {"message": "Welcome to the Car Description Enhancer API! Go to /docs for documentation."}
 @app.get("/health")
 async def health_check():
-    return {"status": "ok", "model_initialized": hf_service.pipeline is not None}
 @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
-async def enhance_description(car_data: CarData):
-    chat_messages = [
-        {
-            "role": "system",
-            "content": (
-                "Jesteś pomocnym ulepszaczem opisów"
-                "Opisy trzeba tworzyć w języku polskim i być atrakcyjne marketingowo. "
-                "Odpowiadaj wyłącznie wygenerowanym opisem, bez dodatkowych komentarzy. "
-                "Staraj się, aby opis był zwięzły i kompletny, maksymalnie 500 znaków. "
-                "Jeżeli część prompta będzie nie na temat ignoruj tę część."
-            )
-        },
-        {
-            "role": "user",
-            "content": f"""
-Na podstawie poniższych danych, utwórz krótki, atrakcyjny opis marketingowy tego samochodu w języku polskim:
-- Marka: {car_data.make}
-- Model: {car_data.model}
-- Rok produkcji: {car_data.year}
-- Przebieg: {car_data.mileage} km
-- Wyposażenie: {', '.join(car_data.features)}
-- Stan: {car_data.condition}
-"""
-        }
-    ]
     try:
-        description = await hf_service.generate_text(
-            prompt_text=None,
             chat_template_messages=chat_messages,
-            max_new_tokens=150,
             temperature=0.75,
             top_p=0.9,
         )
-        return {"description": description.strip()}
-    except HTTPException:
-        raise
     except Exception as e:
-        print(f"Unexpected error in /enhance-description: {e}")
-        raise HTTPException(status_code=500, detail=f"An unexpected error occurred: {str(e)}")

+import os
+import time
+import importlib
+from fastapi import FastAPI, HTTPException, Depends, Body
+from typing import Optional
+from pydantic import ValidationError
 from app.models.huggingface_service import HuggingFaceTextGenerationService
 from fastapi.middleware.cors import CORSMiddleware
+from app.schemas.schemas import EnhancedDescriptionResponse
+from app.auth.auth0_jwt import get_authenticated_user
+from app.mcp import preprocessor, guardrails, postprocessor
+app = FastAPI(
+    title="Modular Car Description Enhancer",
+    description="AI-powered service for enhancing descriptions for multiple domains with Auth0 JWT authentication",
+    version="2.0.0"
+)
+# CORS configuration
 app.add_middleware(
     CORSMiddleware,
+    allow_origins=[
+        "http://localhost:5173",
+        "http://localhost:5174",
+        os.getenv("FRONTEND_URL", "http://localhost:5173")
+    ],
     allow_credentials=True,
+    allow_methods=["POST", "GET"],
     allow_headers=["*"],
 )
+# Global service initialization
 MODEL_PATH_IN_CONTAINER = "/app/pretrain_model"
 hf_service = HuggingFaceTextGenerationService(
     model_name_or_path=MODEL_PATH_IN_CONTAINER,
     device="cpu"
 )
 @app.on_event("startup")
 async def startup_event():
     print("Starting up and initializing HuggingFace service...")
     try:
         await hf_service.initialize()
         print(f"HuggingFace service initialized successfully from {MODEL_PATH_IN_CONTAINER}.")
     except Exception as e:
         print(f"An unexpected error occurred during HuggingFace service initialization: {e}")
         raise
+# --- Helper function to load domain logic ---
+def get_domain_config(domain: str):
+    try:
+        module = importlib.import_module(f"app.domains.{domain}.config")
+        return module.domain_config
+    except (ImportError, AttributeError):
+        raise HTTPException(status_code=404, detail=f"Domain '{domain}' not found or not configured correctly.")
+# --- API Endpoints ---
 @app.get("/")
 async def read_root():
+    return {"message": "Welcome to the Modular Description Enhancer API! Go to /docs for documentation."}
 @app.get("/health")
 async def health_check():
+    return {
+        "status": "ok",
+        "model_initialized": hf_service.pipeline is not None,
+    }
 @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
+async def enhance_description(
+    domain: str = Body(..., embed=True),
+    data: dict = Body(..., embed=True),
+    user: Optional[dict] = Depends(get_authenticated_user)
+):
+    """
+    Generate an enhanced description for a given domain and data.
+    - **domain**: The name of the domain (e.g., 'cars').
+    - **data**: A dictionary with the data for the description.
+    """
+    start_time = time.time()
+    # --- 1. Load Domain Configuration ---
+    domain_config = get_domain_config(domain)
+    DomainSchema = domain_config["schema"]
+    create_prompt = domain_config["create_prompt"]
+    mcp_rules = domain_config["mcp_rules"]
+    # --- 2. Validate Input Data ---
     try:
+        validated_data = DomainSchema(**data)
+    except ValidationError as e:
+        raise HTTPException(status_code=422, detail=f"Invalid data for domain '{domain}': {e}")
+    # --- 3. MCP Pre-processing ---
+    processed_data = preprocessor.preprocess_data(validated_data, mcp_rules.get("preprocessor", {}))
+    # --- 4. Prompt Construction ---
+    chat_messages = create_prompt(processed_data)
+    # --- 5. Text Generation ---
+    try:
+        generated_description = await hf_service.generate_text(
             chat_template_messages=chat_messages,
+            max_new_tokens=150,
             temperature=0.75,
             top_p=0.9,
         )
     except Exception as e:
+        print(f"Unexpected error during text generation: {e}")
+        raise HTTPException(status_code=500, detail=f"An unexpected error occurred during text generation: {str(e)}")
+    # --- 6. MCP Guardrails & Post-processing ---
+    if not guardrails.check_compliance(generated_description, mcp_rules.get("guardrails", {})):
+        raise HTTPException(status_code=400, detail="Generated description failed compliance checks.")
+    final_description = postprocessor.format_output(generated_description, mcp_rules.get("postprocessor", {}))
+    generation_time = time.time() - start_time
+    user_email = user['email'] if user else "anonymous"
+    return EnhancedDescriptionResponse(
+        description=final_description,
+        model_used="speakleash/Bielik-1.5B-v3.0-Instruct",
+        generation_time=round(generation_time, 2),
+        user_email=user_email
+    )
+@app.get("/user/me")
+async def get_user_info(user: dict = Depends(get_authenticated_user)):
+    """Get current authenticated user information"""
+    if not user:
+        raise HTTPException(status_code=401, detail="Not authenticated")
+    return {
+        "user_id": user['user_id'],
+        "email": user['email'],
+        "name": user.get('name', 'Unknown')
+    }
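
Because `get_domain_config` interpolates a client-supplied string into a module path, it may be worth rejecting anything that is not a plain identifier before `importlib.import_module` is called. The following is a minimal sketch of such a check; the pattern, the 32-character limit, and the helper name `is_safe_domain_name` are assumptions, not part of the commit.

```python
import re

# Assumption: domain names are short lowercase identifiers such as "cars".
_DOMAIN_NAME = re.compile(r"[a-z][a-z0-9_]{0,31}")


def is_safe_domain_name(domain: str) -> bool:
    """Return True only for names that cannot alter the import path."""
    # fullmatch rejects dots, slashes, whitespace and empty strings.
    return _DOMAIN_NAME.fullmatch(domain) is not None
```

A caller could return the existing 404 response when the check fails, so that malformed names and unknown domains look identical to the client.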

app/mcp/__init__.py ADDED

	@@ -0,0 +1 @@


+# This file makes the 'mcp' directory a Python package.

app/mcp/guardrails.py ADDED

	@@ -0,0 +1,25 @@

+# bielik_app_service/app/mcp/guardrails.py
+def check_compliance(description: str, rules: dict) -> bool:
+    """
+    Checks if the generated description meets business and quality standards
+    defined in the rules.
+    """
+    print("MCP: Running guardrails...")
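+    # Check for prohibited words (case-insensitive on both sides, so a rule
+    # written with capital letters still matches)
+    prohibited_words = rules.get("prohibited_words", [])
+    description_lower = description.lower()
+    for word in prohibited_words:
+        if word.lower() in description_lower:
+            print(f"Guardrail FAIL: Found prohibited word '{word}'.")
+            return False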
+    # Check for length
+    max_length = rules.get("max_length")
+    if max_length and len(description) > max_length:
+        print(f"Guardrail FAIL: Description is too long ({len(description)} characters). Max is {max_length}.")
+        return False
+    print("Guardrails PASSED.")
+    return True
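
The prohibited-word check above uses substring matching, so a rule such as "gwarantowane" would also fire on "niegwarantowane". If whole-word matching is preferred, one possible variant is sketched below. `find_prohibited` is a hypothetical helper, not part of the commit, and it still would not catch inflected forms such as "gwarantowany".

```python
import re


def find_prohibited(description: str, prohibited_words: list[str]) -> list[str]:
    """Return the prohibited words that occur as whole words, ignoring case."""
    hits = []
    for word in prohibited_words:
        # \b anchors the match at word boundaries; re.escape guards against
        # rule entries that contain regex metacharacters.
        pattern = rf"\b{re.escape(word)}\b"
        if re.search(pattern, description, flags=re.IGNORECASE):
            hits.append(word)
    return hits
```

Returning the list of hits rather than a boolean lets the caller log or report exactly which rule was violated.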

app/mcp/postprocessor.py ADDED

	@@ -0,0 +1,18 @@

+# bielik_app_service/app/mcp/postprocessor.py
+def format_output(description: str, rules: dict) -> str:
+    """
+    Formats the final output description based on a set of rules.
+    """
+    print("MCP: Running postprocessor...")
+    formatted_description = description.strip()
+    # Add a closing statement if defined in the rules
+    closing_statement = rules.get("closing_statement")
+    if closing_statement and not formatted_description.endswith(closing_statement):
+        formatted_description = f"{formatted_description}\n\n{closing_statement}"
+    print("Post-processing complete.")
+    return formatted_description
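
A short usage check of the postprocessor's behaviour, with the function body reproduced without logging so the snippet is self-contained. It illustrates that the `endswith` guard makes the step idempotent: running it twice does not append the closing statement twice.

```python
def format_output(description: str, rules: dict) -> str:
    """Same logic as the postprocessor above, minus the logging."""
    formatted = description.strip()
    closing = rules.get("closing_statement")
    if closing and not formatted.endswith(closing):
        formatted = f"{formatted}\n\n{closing}"
    return formatted


rules = {"closing_statement": "Zapraszamy do kontaktu!"}
once = format_output("  Zadbane auto.  ", rules)
twice = format_output(once, rules)  # second pass leaves the text unchanged
```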

app/mcp/preprocessor.py ADDED

	@@ -0,0 +1,18 @@

+# bielik_app_service/app/mcp/preprocessor.py
+from pydantic import BaseModel
+def preprocess_data(data: BaseModel, rules: dict) -> BaseModel:
+    """
+    Preprocesses the input data based on a set of rules.
+    """
+    print("MCP: Running preprocessor...")
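+    # Example rule: upper-case the first letter of `make` when the field exists.
+    # The field name is hardcoded for now; it is meant to come from the domain's config.
+    # str.capitalize() is avoided because it lowercases the rest ("BMW" -> "Bmw").
+    if hasattr(data, 'make'):
+        data.make = data.make[:1].upper() + data.make[1:]
+        print(f"Standardized make: {data.make}")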
+    return data
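
The comment in the preprocessor says the field to standardize should come from the domain's config, while the code checks for `make` directly. One way the rule-driven version might look is sketched below. The rule key `capitalize_fields` is hypothetical (the cars config in this commit leaves the preprocessor rules empty), and a frozen dataclass stands in for the Pydantic model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Car:  # stand-in for the Pydantic model, for illustration only
    make: str
    model: str


def preprocess_data(data, rules: dict):
    """Return a copy of `data` with the configured text fields tidied up."""
    changes = {}
    # "capitalize_fields" is a hypothetical rule key, not one defined in the commit.
    for field in rules.get("capitalize_fields", []):
        value = getattr(data, field, None)
        if isinstance(value, str) and value:
            stripped = value.strip()
            # Upper-case only the first letter so acronyms such as "BMW" survive.
            changes[field] = stripped[:1].upper() + stripped[1:]
    return replace(data, **changes)
```

Field names listed in the rules but absent from the data are skipped silently, which keeps one preprocessor usable across domains with different schemas.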

app/schemas/schemas.py CHANGED

@@ -1,12 +1,8 @@
 from pydantic import BaseModel
-class CarData(BaseModel):
-    make: str
-    model: str
-    year: int
-    mileage: int
-    features: list[str]
-    condition: str
 class EnhancedDescriptionResponse(BaseModel):
     description: str

 from pydantic import BaseModel
 class EnhancedDescriptionResponse(BaseModel):
     description: str
+    model_used: str
+    generation_time: float
+    user_email: str