Spaces:

studzinsky
/

bielik_app_service

Sleeping

App Files Files Community

Patryk Studzinski commited on 27 days ago

Commit

3297dba

1 Parent(s): 87a12c6

cleanup after split to separate mcp service

Browse files

Files changed (8) hide show

.gitignore +4 -1
MCP_Integration_Plan.md +0 -38
Modular_Architecture_Plan.md +0 -69
app/main.py +36 -14
app/mcp/__init__.py +0 -1
app/mcp/guardrails.py +0 -25
app/mcp/postprocessor.py +0 -18
app/mcp/preprocessor.py +0 -18

.gitignore CHANGED Viewed

@@ -49,4 +49,7 @@ build/
 # System files
 .DS_Store
-Thumbs.db

 # System files
 .DS_Store
+Thumbs.db
+# Gemini Plans
+gemini_plans/

MCP_Integration_Plan.md DELETED Viewed

@@ -1,38 +0,0 @@
-# MCP Integration Plan for bielik_app_service
-This document outlines the plan to integrate a Model Control Panel (MCP) into the `bielik_app_service`.
-## Decision: Integrated Module vs. Separate Service
-After analyzing the existing architecture of `bielik_app_service`, the decision is to implement the MCP as an **integrated module** within the application.
-**Reasoning:**
-*   **Simplicity:** A single, monolithic service is easier to develop, manage, and deploy.
-*   **Performance:** Integrating the MCP as a module avoids the network latency overhead of inter-service communication.
-*   **Maintainability:** The logic remains in one place, making it easier to trace the request flow and debug issues.
-A separate microservice for the MCP could be considered in the future if the MCP's logic becomes significantly complex and resource-intensive, but it is not justified at this stage.
-## Implementation Plan
-1.  **Create the MCP Module Structure:**
-    *   A new directory `app/mcp` will be created.
-    *   Inside `app/mcp`, the following files will be created:
-        *   `__init__.py`: To make `mcp` a Python package.
-        *   `preprocessor.py`: To handle input data normalization and cleaning.
-        *   `guardrails.py`: To enforce business rules and quality checks on the generated output.
-        *   `postprocessor.py`: To handle the final formatting and structuring of the output.
-2.  **Integrate MCP into the Request Lifecycle:**
-    *   The `app/main.py` file will be modified.
-    *   The `enhance-description` endpoint will be updated to use the new MCP modules.
-### New Request Flow in `enhance-description`
-1.  **Input:** The endpoint receives `CarData`.
-2.  **Preprocessing:** The `preprocessor` module is called to standardize and clean the `CarData`.
-3.  **Prompt Construction:** A prompt is constructed using the preprocessed data.
-4.  **Text Generation:** The `HuggingFaceTextGenerationService` is called to generate the description.
-5.  **Guardrails & Post-processing:** The generated text is passed through the `guardrails` for validation and then to the `postprocessor` for final formatting.
-6.  **Output:** The final, validated, and formatted description is returned to the user.

Modular_Architecture_Plan.md DELETED Viewed

@@ -1,69 +0,0 @@
-# Modular Architecture Plan for Multi-Domain Support
-This document outlines the plan to refactor the `bielik_app_service` to support multiple domains (e.g., cars, flats, etc.) in a modular and extensible way.
-## Core Problem
-The current implementation is hardcoded for the "cars" domain. The data schema (`CarData`), the prompt, and the MCP logic are all tailored specifically for car descriptions. To support new domains, a significant refactoring is required.
-## Proposed Solution: A Configuration-Driven, Modular Architecture
-The proposed solution is to move from a hardcoded implementation to a configuration-driven one, where each domain has its own configuration and modules.
-### 1. The "Domain" Concept
-A "domain" will be the central concept. Each domain (e.g., "cars", "flats") will have its own dedicated module that contains its specific configuration, schemas, and logic.
-### 2. New Directory Structure
-A new `app/domains` directory will be created. Each subdirectory within `app/domains` will represent a single domain.
-```
-bielik_app_service/app/
-├── domains/
-│   ├── __init__.py
-│   └── cars/
-│       ├── __init__.py
-│       ├── config.py       # Domain-specific configuration
-│       ├── schemas.py      # Pydantic schemas for this domain (e.g., CarData)
-│       └── prompts.py      # Prompt templates for this domain
-└── mcp/
-    ├── preprocessor.py
-    ├── guardrails.py
-    └── postprocessor.py
-```
-### 3. Domain Configuration (`config.py`)
-The `app/domains/cars/config.py` file will define everything needed for the "cars" domain:
-*   **Schema:** It will import the Pydantic schema from `schemas.py`.
-*   **Prompt Template:** It will import the prompt template from `prompts.py`.
-*   **MCP Rules:** It will define the specific rules for the preprocessor, guardrails, and postprocessor for this domain.
-### 4. Refactoring the Main Endpoint
-The `/enhance-description` endpoint in `app/main.py` will be refactored:
-*   **Endpoint Signature:** It will be changed to accept a `domain` name and a generic `data` payload.
-    ```python
-    @app.post("/enhance-description")
-    async def enhance_description(domain: str, data: dict, ...):
-    ```
-*   **Dynamic Domain Loading:** The endpoint will dynamically load the configuration and modules for the requested `domain`.
-*   **Dynamic Validation:** It will use the schema from the loaded domain module to validate the incoming `data`.
-*   **Dynamic Pipeline:** It will use the domain's prompt template and MCP rules to execute the enhancement pipeline.
-## Advantages of this Approach
-*   **Extensibility:** Adding a new domain (e.g., "flats") will be as simple as creating a new subdirectory `app/domains/flats/` with its own configuration, schema, and prompt files. No changes to the core application logic in `main.py` will be needed.
-*   **Maintainability:** All the logic for a specific domain will be co-located in its own module, making it easy to find and maintain.
-*   **Separation of Concerns:** The core application logic is separated from the domain-specific logic.
-## Next Steps
-1.  Create the new directory structure (`app/domains/cars/`).
-2.  Move the existing `CarData` schema to `app/domains/cars/schemas.py`.
-3.  Create `app/domains/cars/prompts.py` and move the prompt creation logic there.
-4.  Create `app/domains/cars/config.py` to tie everything together.
-5.  Refactor `app/main.py` to use this new dynamic, modular approach.

app/main.py CHANGED Viewed

@@ -9,7 +9,7 @@ from app.models.huggingface_service import HuggingFaceTextGenerationService
 from fastapi.middleware.cors import CORSMiddleware
 from app.schemas.schemas import EnhancedDescriptionResponse
 from app.auth.auth0_jwt import get_authenticated_user
-from app.mcp import preprocessor, guardrails, postprocessor
 app = FastAPI(
     title="Modular Car Description Enhancer",
@@ -33,7 +33,7 @@ app.add_middleware(
 # Global service initialization
 MODEL_PATH_IN_CONTAINER = "/app/pretrain_model"
 hf_service = HuggingFaceTextGenerationService(
-    model_name_or_path=MODEL_PATH_IN_CONTAINER,
     device="cpu"
 )
@@ -85,7 +85,7 @@ async def enhance_description(
     domain_config = get_domain_config(domain)
     DomainSchema = domain_config["schema"]
     create_prompt = domain_config["create_prompt"]
-    mcp_rules = domain_config["mcp_rules"]
     # --- 2. Validate Input Data ---
     try:
@@ -93,13 +93,10 @@ async def enhance_description(
     except ValidationError as e:
         raise HTTPException(status_code=422, detail=f"Invalid data for domain '{domain}': {e}")
-    # --- 3. MCP Pre-processing ---
-    processed_data = preprocessor.preprocess_data(validated_data, mcp_rules.get("preprocessor", {}))
-    # --- 4. Prompt Construction ---
-    chat_messages = create_prompt(processed_data)
-    # --- 5. Text Generation ---
     try:
         generated_description = await hf_service.generate_text(
             chat_template_messages=chat_messages,
@@ -111,12 +108,13 @@ async def enhance_description(
         print(f"Unexpected error during text generation: {e}")
         raise HTTPException(status_code=500, detail=f"An unexpected error occurred during text generation: {str(e)}")
-    # --- 6. MCP Guardrails & Post-processing ---
-    if not guardrails.check_compliance(generated_description, mcp_rules.get("guardrails", {})):
-        raise HTTPException(status_code=400, detail="Generated description failed compliance checks.")
-    final_description = postprocessor.format_output(generated_description, mcp_rules.get("postprocessor", {}))
     generation_time = time.time() - start_time
     user_email = user['email'] if user else "anonymous"
@@ -127,6 +125,30 @@ async def enhance_description(
         user_email=user_email
     )
 @app.get("/user/me")
 async def get_user_info(user: dict = Depends(get_authenticated_user)):
     """Get current authenticated user information"""

 from fastapi.middleware.cors import CORSMiddleware
 from app.schemas.schemas import EnhancedDescriptionResponse
 from app.auth.auth0_jwt import get_authenticated_user
+# MCP imports removed
 app = FastAPI(
     title="Modular Car Description Enhancer",
 # Global service initialization
 MODEL_PATH_IN_CONTAINER = "/app/pretrain_model"
 hf_service = HuggingFaceTextGenerationService(
+    model_name_or_PATH=MODEL_PATH_IN_CONTAINER,
     device="cpu"
 )
     domain_config = get_domain_config(domain)
     DomainSchema = domain_config["schema"]
     create_prompt = domain_config["create_prompt"]
+    # mcp_rules removed
     # --- 2. Validate Input Data ---
     try:
     except ValidationError as e:
         raise HTTPException(status_code=422, detail=f"Invalid data for domain '{domain}': {e}")
+    # --- 3. Prompt Construction ---
+    chat_messages = create_prompt(validated_data)
+    # --- 4. Text Generation ---
     try:
         generated_description = await hf_service.generate_text(
             chat_template_messages=chat_messages,
         print(f"Unexpected error during text generation: {e}")
         raise HTTPException(status_code=500, detail=f"An unexpected error occurred during text generation: {str(e)}")
+    # --- 5. MCP Guardrails & Post-processing removed ---
+    # if not guardrails.check_compliance(generated_description, mcp_rules.get("guardrails", {})):
+    #     raise HTTPException(status_code=400, detail="Generated description failed compliance checks.")
+    # final_description = postprocessor.format_output(generated_description, mcp_rules.get("postprocessor", {}))
+    final_description = generated_description # No post-processing here
     generation_time = time.time() - start_time
     user_email = user['email'] if user else "anonymous"
         user_email=user_email
     )
+@app.post("/generate")
+async def generate_text_only(
+    chat_template_messages: str = Body(..., embed=True),
+    max_new_tokens: int = 150,
+    temperature: float = 0.75,
+    top_p: float = 0.9
+):
+    """
+    Generates raw text based on provided chat template messages.
+    This endpoint is intended for internal use by the MCP service.
+    """
+    try:
+        generated_text = await hf_service.generate_text(
+            chat_template_messages=chat_template_messages,
+            max_new_tokens=max_new_tokens,
+            temperature=temperature,
+            top_p=top_p,
+        )
+        return {"generated_text": generated_text}
+    except Exception as e:
+        print(f"Unexpected error during raw text generation: {e}")
+        raise HTTPException(status_code=500, detail=f"An unexpected error occurred during text generation: {str(e)}")
 @app.get("/user/me")
 async def get_user_info(user: dict = Depends(get_authenticated_user)):
     """Get current authenticated user information"""

app/mcp/__init__.py DELETED Viewed

	@@ -1 +0,0 @@
1	- # This file makes the 'mcp' directory a Python package.

app/mcp/guardrails.py DELETED Viewed

@@ -1,25 +0,0 @@
-# bielik_app_service/app/mcp/guardrails.py
-def check_compliance(description: str, rules: dict) -> bool:
-    """
-    Checks if the generated description meets business and quality standards
-    defined in the rules.
-    """
-    print("MCP: Running guardrails...")
-    # Check for prohibited words
-    prohibited_words = rules.get("prohibited_words", [])
-    for word in prohibited_words:
-        if word in description.lower():
-            print(f"Guardrail FAIL: Found prohibited word '{word}'.")
-            return False
-    # Check for length
-    max_length = rules.get("max_length")
-    if max_length and len(description) > max_length:
-        print(f"Guardrail FAIL: Description is too long ({len(description)} characters). Max is {max_length}.")
-        return False
-    print("Guardrails PASSED.")
-    return True

app/mcp/postprocessor.py DELETED Viewed

@@ -1,18 +0,0 @@
-# bielik_app_service/app/mcp/postprocessor.py
-def format_output(description: str, rules: dict) -> str:
-    """
-    Formats the final output description based on a set of rules.
-    """
-    print("MCP: Running postprocessor...")
-    formatted_description = description.strip()
-    # Add a closing statement if defined in the rules
-    closing_statement = rules.get("closing_statement")
-    if closing_statement and not formatted_description.endswith(closing_statement):
-        formatted_description = f"{formatted_description}\n\n{closing_statement}"
-    print("Post-processing complete.")
-    return formatted_description

app/mcp/preprocessor.py DELETED Viewed

@@ -1,18 +0,0 @@
-# bielik_app_service/app/mcp/preprocessor.py
-from pydantic import BaseModel
-def preprocess_data(data: BaseModel, rules: dict) -> BaseModel:
-    """
-    Preprocesses the input data based on a set of rules.
-    """
-    print("MCP: Running preprocessor...")
-    # Example of a generic rule: capitalize a field if it exists.
-    # The field to capitalize would be defined in the domain's config.
-    if hasattr(data, 'make'):
-        data.make = data.make.capitalize()
-        print(f"Standardized make: {data.make}")
-    return data