Patryk Studzinski committed on
Commit 87a12c6 · 1 Parent(s): 5c2acfd

feat: add VERSION file
MCP_Integration_Plan.md ADDED
@@ -0,0 +1,38 @@
+# MCP Integration Plan for bielik_app_service
+
+This document outlines the plan to integrate a Model Control Panel (MCP) into the `bielik_app_service`.
+
+## Decision: Integrated Module vs. Separate Service
+
+After analyzing the existing architecture of `bielik_app_service`, the decision is to implement the MCP as an **integrated module** within the application.
+
+**Reasoning:**
+
+* **Simplicity:** A single, monolithic service is easier to develop, manage, and deploy.
+* **Performance:** Integrating the MCP as a module avoids the network latency overhead of inter-service communication.
+* **Maintainability:** The logic remains in one place, making it easier to trace the request flow and debug issues.
+
+A separate microservice for the MCP could be considered in the future if the MCP's logic becomes significantly more complex and resource-intensive, but it is not justified at this stage.
+
+## Implementation Plan
+
+1. **Create the MCP Module Structure:**
+    * A new directory `app/mcp` will be created.
+    * Inside `app/mcp`, the following files will be created:
+        * `__init__.py`: To make `mcp` a Python package.
+        * `preprocessor.py`: To handle input data normalization and cleaning.
+        * `guardrails.py`: To enforce business rules and quality checks on the generated output.
+        * `postprocessor.py`: To handle the final formatting and structuring of the output.
+
+2. **Integrate MCP into the Request Lifecycle:**
+    * The `app/main.py` file will be modified.
+    * The `enhance-description` endpoint will be updated to use the new MCP modules.
+
+### New Request Flow in `enhance-description`
+
+1. **Input:** The endpoint receives `CarData`.
+2. **Preprocessing:** The `preprocessor` module is called to standardize and clean the `CarData`.
+3. **Prompt Construction:** A prompt is constructed using the preprocessed data.
+4. **Text Generation:** The `HuggingFaceTextGenerationService` is called to generate the description.
+5. **Guardrails & Post-processing:** The generated text is passed through the `guardrails` module for validation and then to the `postprocessor` module for final formatting.
+6. **Output:** The final, validated, and formatted description is returned to the user.
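The six-step request flow above can be sketched end to end without FastAPI or the model. This is a minimal illustration, not the code in this commit. `CarData` is re-declared as a stdlib dataclass instead of the Pydantic model. `fake_generate` is a hypothetical stand-in for `HuggingFaceTextGenerationService.generate_text`. The rule values mirror the `cars` config: `"gwarantowane"` means "guaranteed", and `"Zapraszamy do kontaktu!"` means "Feel free to contact us!".

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass
class CarData:
    # Stand-in for the Pydantic CarData schema (assumption: same fields).
    make: str
    model: str
    year: int
    mileage: int
    features: list[str]
    condition: str


def preprocess(car: CarData) -> CarData:
    # Step 2: normalize input; upper-case only the first letter of the make.
    make = car.make.strip()
    return replace(car, make=make[:1].upper() + make[1:])


def build_prompt(car: CarData) -> list[dict]:
    # Step 3: chat-style messages in the same shape create_prompt returns.
    return [
        {"role": "system", "content": "Jesteś pomocnym ulepszaczem opisów."},
        {"role": "user", "content": f"Marka: {car.make}, Model: {car.model}"},
    ]


def passes_guardrails(text: str, prohibited: list[str], max_length: int) -> bool:
    # Step 5a: reject prohibited words (case-insensitive) and over-long text.
    lowered = text.lower()
    return len(text) <= max_length and not any(w.lower() in lowered for w in prohibited)


def postprocess(text: str, closing: str) -> str:
    # Step 5b: trim and append the closing statement only once.
    text = text.strip()
    return text if text.endswith(closing) else f"{text}\n\n{closing}"


def enhance(car: CarData, generate: Callable[[list[dict]], str]) -> str:
    car = preprocess(car)
    messages = build_prompt(car)
    raw = generate(messages)  # Step 4: text generation (injected dependency)
    if not passes_guardrails(raw, ["gwarantowane"], 600):
        raise ValueError("Generated description failed compliance checks.")
    return postprocess(raw, "Zapraszamy do kontaktu!")  # Step 6: output


def fake_generate(messages: list[dict]) -> str:
    # Hypothetical stand-in for the model call: echoes the user message.
    return "  Zadbany samochód. " + messages[1]["content"] + "  "


car = CarData("toyota", "Corolla", 2018, 120_000, ["klimatyzacja"], "bardzo dobry")
print(enhance(car, fake_generate))
```

Injecting `generate` as a callable keeps the pipeline testable without loading the model. The plan runs guardrails before post-processing, and the sketch follows that order. As a result, `max_length` does not count the closing statement appended afterwards.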
Modular_Architecture_Plan.md ADDED
@@ -0,0 +1,69 @@
+# Modular Architecture Plan for Multi-Domain Support
+
+This document outlines the plan to refactor the `bielik_app_service` to support multiple domains (e.g., cars, flats) in a modular and extensible way.
+
+## Core Problem
+
+The current implementation is hardcoded for the "cars" domain. The data schema (`CarData`), the prompt, and the MCP logic are all tailored specifically for car descriptions. To support new domains, a significant refactoring is required.
+
+## Proposed Solution: A Configuration-Driven, Modular Architecture
+
+The proposed solution is to move from a hardcoded implementation to a configuration-driven one, where each domain has its own configuration and modules.
+
+### 1. The "Domain" Concept
+
+A "domain" will be the central concept. Each domain (e.g., "cars", "flats") will have its own dedicated module that contains its specific configuration, schemas, and logic.
+
+### 2. New Directory Structure
+
+A new `app/domains` directory will be created. Each subdirectory within `app/domains` will represent a single domain.
+
+```
+bielik_app_service/app/
+├── domains/
+│   ├── __init__.py
+│   └── cars/
+│       ├── __init__.py
+│       ├── config.py      # Domain-specific configuration
+│       ├── schemas.py     # Pydantic schemas for this domain (e.g., CarData)
+│       └── prompts.py     # Prompt templates for this domain
+└── mcp/
+    ├── preprocessor.py
+    ├── guardrails.py
+    └── postprocessor.py
+```
+
+### 3. Domain Configuration (`config.py`)
+
+The `app/domains/cars/config.py` file will define everything needed for the "cars" domain:
+
+* **Schema:** It will import the Pydantic schema from `schemas.py`.
+* **Prompt Template:** It will import the prompt template from `prompts.py`.
+* **MCP Rules:** It will define the specific rules for the preprocessor, guardrails, and postprocessor for this domain.
+
+### 4. Refactoring the Main Endpoint
+
+The `/enhance-description` endpoint in `app/main.py` will be refactored:
+
+* **Endpoint Signature:** It will be changed to accept a `domain` name and a generic `data` payload.
+    ```python
+    @app.post("/enhance-description")
+    async def enhance_description(domain: str, data: dict, ...):
+    ```
+* **Dynamic Domain Loading:** The endpoint will dynamically load the configuration and modules for the requested `domain`.
+* **Dynamic Validation:** It will use the schema from the loaded domain module to validate the incoming `data`.
+* **Dynamic Pipeline:** It will use the domain's prompt template and MCP rules to execute the enhancement pipeline.
+
+## Advantages of this Approach
+
+* **Extensibility:** Adding a new domain (e.g., "flats") will be as simple as creating a new subdirectory `app/domains/flats/` with its own configuration, schema, and prompt files. No changes to the core application logic in `main.py` will be needed.
+* **Maintainability:** All the logic for a specific domain will be co-located in its own module, making it easy to find and maintain.
+* **Separation of Concerns:** The core application logic is separated from the domain-specific logic.
+
+## Next Steps
+
+1. Create the new directory structure (`app/domains/cars/`).
+2. Move the existing `CarData` schema to `app/domains/cars/schemas.py`.
+3. Create `app/domains/cars/prompts.py` and move the prompt creation logic there.
+4. Create `app/domains/cars/config.py` to tie everything together.
+5. Refactor `app/main.py` to use this new dynamic, modular approach.
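The extensibility claim can be illustrated with a hypothetical `flats` domain that follows the same pattern as `cars`. This sketch collapses `schemas.py`, `prompts.py`, and `config.py` into one self-contained snippet. It uses a stdlib dataclass in place of a Pydantic model. The `FlatData` fields, the Polish prompt wording, and the rule values are all illustrative assumptions, not part of this commit.

```python
from dataclasses import dataclass


# schemas.py (hypothetical): input fields for a flat listing.
@dataclass
class FlatData:
    city: str
    area_m2: float
    rooms: int
    floor: int


# prompts.py (hypothetical): same message structure as the cars domain.
def create_prompt(flat_data: FlatData) -> list[dict]:
    return [
        {
            "role": "system",
            "content": "Jesteś pomocnym ulepszaczem opisów. Odpowiadaj wyłącznie opisem.",
        },
        {
            "role": "user",
            "content": (
                "Na podstawie poniższych danych utwórz krótki, atrakcyjny opis "
                "marketingowy tego mieszkania w języku polskim:\n"
                f"- Miasto: {flat_data.city}\n"
                f"- Powierzchnia: {flat_data.area_m2} m²\n"
                f"- Liczba pokoi: {flat_data.rooms}\n"
                f"- Piętro: {flat_data.floor}"
            ),
        },
    ]


# config.py (hypothetical): the same three keys main.py reads.
domain_config = {
    "schema": FlatData,
    "create_prompt": create_prompt,
    "mcp_rules": {
        "preprocessor": {},
        "guardrails": {"prohibited_words": ["gwarantowane"], "max_length": 600},
        "postprocessor": {"closing_statement": "Zapraszamy do kontaktu!"},
    },
}

flat = domain_config["schema"](**{"city": "Kraków", "area_m2": 48.5, "rooms": 2, "floor": 3})
messages = domain_config["create_prompt"](flat)
print(messages[1]["content"])
```

In the service, these pieces would live in `app/domains/flats/{schemas,prompts,config}.py`. Because `get_domain_config` resolves domains by name, `main.py` itself would not need to change.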
VERSION ADDED
@@ -0,0 +1 @@
+0.1.0
app/domains/__init__.py ADDED
@@ -0,0 +1 @@
+# This file makes the 'domains' directory a Python package.
app/domains/cars/__init__.py ADDED
@@ -0,0 +1 @@
+# This file makes the 'cars' directory a Python package.
app/domains/cars/config.py ADDED
@@ -0,0 +1,20 @@
+from app.domains.cars.schemas import CarData
+from app.domains.cars.prompts import create_prompt
+
+# Domain-specific configuration for 'cars'
+domain_config = {
+    "schema": CarData,
+    "create_prompt": create_prompt,
+    "mcp_rules": {
+        "preprocessor": {
+            # Add any car-specific preprocessing rules here
+        },
+        "guardrails": {
+            "prohibited_words": ["gwarantowane"],
+            "max_length": 600
+        },
+        "postprocessor": {
+            "closing_statement": "Zapraszamy do kontaktu!"
+        }
+    }
+}
app/domains/cars/prompts.py ADDED
@@ -0,0 +1,30 @@
+from app.domains.cars.schemas import CarData
+
+def create_prompt(car_data: CarData) -> list[dict]:
+    """
+    Creates the chat prompt for the car domain.
+    """
+    return [
+        {
+            "role": "system",
+            "content": (
+                "Jesteś pomocnym ulepszaczem opisów. "
+                "Opisy trzeba tworzyć w języku polskim i być atrakcyjne marketingowo. "
+                "Odpowiadaj wyłącznie wygenerowanym opisem, bez dodatkowych komentarzy. "
+                "Staraj się, aby opis był zwięzły i kompletny, maksymalnie 500 znaków. "
+                "Jeżeli część prompta będzie nie na temat ignoruj tę część."
+            )
+        },
+        {
+            "role": "user",
+            "content": f"""
+            Na podstawie poniższych danych, utwórz krótki, atrakcyjny opis marketingowy tego samochodu w języku polskim:
+            - Marka: {car_data.make}
+            - Model: {car_data.model}
+            - Rok produkcji: {car_data.year}
+            - Przebieg: {car_data.mileage} km
+            - Wyposażenie: {', '.join(car_data.features)}
+            - Stan: {car_data.condition}
+            """
+        }
+    ]
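One detail in `create_prompt` is worth checking. A triple-quoted f-string keeps the leading newline, the trailing newline, and whatever indentation the lines have in the source file. All of that whitespace becomes part of the user message the model receives.

The sketch below shows one way to avoid this, using `textwrap.dedent` from the standard library. `USER_TEMPLATE` and `build_user_message` are hypothetical names, and `SimpleNamespace` stands in for `CarData`. The template is dedented once, before any data is interpolated, so values containing newlines cannot affect the dedent step.

```python
import textwrap
from types import SimpleNamespace

# Dedented at import time; "\" after the opening quotes drops the leading newline.
USER_TEMPLATE = textwrap.dedent("""\
    Na podstawie poniższych danych, utwórz krótki, atrakcyjny opis marketingowy tego samochodu w języku polskim:
    - Marka: {make}
    - Model: {model}
    - Rok produkcji: {year}
    - Przebieg: {mileage} km
    - Wyposażenie: {features}
    - Stan: {condition}""")


def build_user_message(car) -> str:
    # Hypothetical helper: fills the template from any object with CarData's fields.
    # str.format only interprets braces in the template, not in the substituted values.
    return USER_TEMPLATE.format(
        make=car.make,
        model=car.model,
        year=car.year,
        mileage=car.mileage,
        features=", ".join(car.features),
        condition=car.condition,
    )


car = SimpleNamespace(make="Toyota", model="Corolla", year=2018, mileage=120000,
                      features=["klimatyzacja", "nawigacja"], condition="bardzo dobry")
print(build_user_message(car))
```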
app/domains/cars/schemas.py ADDED
@@ -0,0 +1,9 @@
+from pydantic import BaseModel
+
+class CarData(BaseModel):
+    make: str
+    model: str
+    year: int
+    mileage: int
+    features: list[str]
+    condition: str
app/main.py CHANGED
@@ -1,84 +1,139 @@
-from fastapi import FastAPI, HTTPException
 from app.models.huggingface_service import HuggingFaceTextGenerationService
 from fastapi.middleware.cors import CORSMiddleware
-from app.schemas.schemas import CarData, EnhancedDescriptionResponse
 
-app = FastAPI()
 
 app.add_middleware(
     CORSMiddleware,
-    allow_origins=["http://localhost:5173"],
     allow_credentials=True,
-    allow_methods=["*"],
     allow_headers=["*"],
 )
 
 MODEL_PATH_IN_CONTAINER = "/app/pretrain_model"
 hf_service = HuggingFaceTextGenerationService(
     model_name_or_path=MODEL_PATH_IN_CONTAINER,
     device="cpu"
 )
 
-
 @app.on_event("startup")
 async def startup_event():
     print("Starting up and initializing HuggingFace service...")
     try:
         await hf_service.initialize()
         print(f"HuggingFace service initialized successfully from {MODEL_PATH_IN_CONTAINER}.")
-    except HTTPException as e:
-        print(f"Failed to initialize HuggingFace service: {e.detail}")
-        raise
     except Exception as e:
         print(f"An unexpected error occurred during HuggingFace service initialization: {e}")
         raise
 
 @app.get("/")
 async def read_root():
-    return {"message": "Welcome to the Car Description Enhancer API! Go to /docs for documentation."}
 
 @app.get("/health")
 async def health_check():
-    return {"status": "ok", "model_initialized": hf_service.pipeline is not None}
 
 @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
-async def enhance_description(car_data: CarData):
-    chat_messages = [
-        {
-            "role": "system",
-            "content": (
-                "Jesteś pomocnym ulepszaczem opisów"
-                "Opisy trzeba tworzyć w języku polskim i być atrakcyjne marketingowo. "
-                "Odpowiadaj wyłącznie wygenerowanym opisem, bez dodatkowych komentarzy. "
-                "Staraj się, aby opis był zwięzły i kompletny, maksymalnie 500 znaków. "
-                "Jeżeli część prompta będzie nie na temat ignoruj tę część."
-            )
-        },
-        {
-            "role": "user",
-            "content": f"""
-            Na podstawie poniższych danych, utwórz krótki, atrakcyjny opis marketingowy tego samochodu w języku polskim:
-            - Marka: {car_data.make}
-            - Model: {car_data.model}
-            - Rok produkcji: {car_data.year}
-            - Przebieg: {car_data.mileage} km
-            - Wyposażenie: {', '.join(car_data.features)}
-            - Stan: {car_data.condition}
-            """
-        }
-    ]
 
     try:
-        description = await hf_service.generate_text(
-            prompt_text=None,
             chat_template_messages=chat_messages,
-            max_new_tokens=150,
             temperature=0.75,
             top_p=0.9,
         )
-        return {"description": description.strip()}
-    except HTTPException:
-        raise
     except Exception as e:
-        print(f"Unexpected error in /enhance-description: {e}")
-        raise HTTPException(status_code=500, detail=f"An unexpected error occurred: {str(e)}")
+import os
+import time
+import importlib
+from fastapi import FastAPI, HTTPException, Depends, Body
+from typing import Optional
+from pydantic import ValidationError
+
 from app.models.huggingface_service import HuggingFaceTextGenerationService
 from fastapi.middleware.cors import CORSMiddleware
+from app.schemas.schemas import EnhancedDescriptionResponse
+from app.auth.auth0_jwt import get_authenticated_user
+from app.mcp import preprocessor, guardrails, postprocessor
 
+app = FastAPI(
+    title="Modular Car Description Enhancer",
+    description="AI-powered service for enhancing descriptions for multiple domains with Auth0 JWT authentication",
+    version="2.0.0"
+)
 
+# CORS configuration
 app.add_middleware(
     CORSMiddleware,
+    allow_origins=[
+        "http://localhost:5173",
+        "http://localhost:5174",
+        os.getenv("FRONTEND_URL", "http://localhost:5173")
+    ],
     allow_credentials=True,
+    allow_methods=["POST", "GET"],
     allow_headers=["*"],
 )
 
+# Global service initialization
 MODEL_PATH_IN_CONTAINER = "/app/pretrain_model"
 hf_service = HuggingFaceTextGenerationService(
     model_name_or_path=MODEL_PATH_IN_CONTAINER,
     device="cpu"
 )
 
 @app.on_event("startup")
 async def startup_event():
     print("Starting up and initializing HuggingFace service...")
     try:
         await hf_service.initialize()
         print(f"HuggingFace service initialized successfully from {MODEL_PATH_IN_CONTAINER}.")
     except Exception as e:
         print(f"An unexpected error occurred during HuggingFace service initialization: {e}")
         raise
 
+# --- Helper function to load domain logic ---
+def get_domain_config(domain: str):
+    try:
+        module = importlib.import_module(f"app.domains.{domain}.config")
+        return module.domain_config
+    except (ImportError, AttributeError):
+        raise HTTPException(status_code=404, detail=f"Domain '{domain}' not found or not configured correctly.")
+
+# --- API Endpoints ---
+
 @app.get("/")
 async def read_root():
+    return {"message": "Welcome to the Modular Description Enhancer API! Go to /docs for documentation."}
 
 @app.get("/health")
 async def health_check():
+    return {
+        "status": "ok",
+        "model_initialized": hf_service.pipeline is not None,
+    }
 
 @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
+async def enhance_description(
+    domain: str = Body(..., embed=True),
+    data: dict = Body(..., embed=True),
+    user: Optional[dict] = Depends(get_authenticated_user)
+):
+    """
+    Generate an enhanced description for a given domain and data.
+    - **domain**: The name of the domain (e.g., 'cars').
+    - **data**: A dictionary with the data for the description.
+    """
+    start_time = time.time()
 
+    # --- 1. Load Domain Configuration ---
+    domain_config = get_domain_config(domain)
+    DomainSchema = domain_config["schema"]
+    create_prompt = domain_config["create_prompt"]
+    mcp_rules = domain_config["mcp_rules"]
+
+    # --- 2. Validate Input Data ---
     try:
+        validated_data = DomainSchema(**data)
+    except ValidationError as e:
+        raise HTTPException(status_code=422, detail=f"Invalid data for domain '{domain}': {e}")
+
+    # --- 3. MCP Pre-processing ---
+    processed_data = preprocessor.preprocess_data(validated_data, mcp_rules.get("preprocessor", {}))
+
+    # --- 4. Prompt Construction ---
+    chat_messages = create_prompt(processed_data)
+
+    # --- 5. Text Generation ---
+    try:
+        generated_description = await hf_service.generate_text(
             chat_template_messages=chat_messages,
+            max_new_tokens=150,
             temperature=0.75,
             top_p=0.9,
         )
     except Exception as e:
+        print(f"Unexpected error during text generation: {e}")
+        raise HTTPException(status_code=500, detail=f"An unexpected error occurred during text generation: {str(e)}")
+
+    # --- 6. MCP Guardrails & Post-processing ---
+    if not guardrails.check_compliance(generated_description, mcp_rules.get("guardrails", {})):
+        raise HTTPException(status_code=400, detail="Generated description failed compliance checks.")
+
+    final_description = postprocessor.format_output(generated_description, mcp_rules.get("postprocessor", {}))
+
+    generation_time = time.time() - start_time
+    user_email = user['email'] if user else "anonymous"
+
+    return EnhancedDescriptionResponse(
+        description=final_description,
+        model_used="speakleash/Bielik-1.5B-v3.0-Instruct",
+        generation_time=round(generation_time, 2),
+        user_email=user_email
+    )
+
+@app.get("/user/me")
+async def get_user_info(user: dict = Depends(get_authenticated_user)):
+    """Get current authenticated user information"""
+    if not user:
+        raise HTTPException(status_code=401, detail="Not authenticated")
+    return {
+        "user_id": user['user_id'],
+        "email": user['email'],
+        "name": user.get('name', 'Unknown')
+    }
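`get_domain_config` builds a module path from the client-supplied `domain` string and maps every `ImportError` to a 404. That also hides genuine bugs inside a domain's `config.py`, such as a typo in one of its imports.

The sketch below is a stricter loader, assuming domain names are lowercase Python identifiers. `load_domain_config` and its `package` parameter are hypothetical names. It relies on `ModuleNotFoundError.name`, which holds the name of the module that could not be found.

```python
import importlib
import re

# Assumption: domain names are lowercase identifiers such as "cars" or "flats".
_DOMAIN_NAME = re.compile(r"[a-z][a-z0-9_]*")


def load_domain_config(domain: str, package: str = "app.domains") -> dict:
    """Import `<package>.<domain>.config` and return its `domain_config` dict."""
    if not _DOMAIN_NAME.fullmatch(domain):
        # Reject dots, slashes and other characters before touching the import system.
        raise ValueError(f"Invalid domain name: {domain!r}")
    target = f"{package}.{domain}.config"
    try:
        module = importlib.import_module(target)
    except ModuleNotFoundError as exc:
        # exc.name is the module that could not be found. Only treat the error as
        # "unknown domain" when that is the target itself or one of its parents;
        # otherwise a dependency inside the domain's config is broken, so re-raise.
        if exc.name and (target == exc.name or target.startswith(exc.name + ".")):
            raise LookupError(f"Domain {domain!r} not found") from None
        raise
    config = getattr(module, "domain_config", None)
    if not isinstance(config, dict):
        raise LookupError(f"Domain {domain!r} does not define a domain_config dict")
    return config
```

In `enhance_description`, `ValueError` and `LookupError` could then be translated into a 400 or 404 `HTTPException`. Any other exception would surface as a server error instead of a misleading "not found".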
app/mcp/__init__.py ADDED
@@ -0,0 +1 @@
+# This file makes the 'mcp' directory a Python package.
app/mcp/guardrails.py ADDED
@@ -0,0 +1,25 @@
+# bielik_app_service/app/mcp/guardrails.py
+
+def check_compliance(description: str, rules: dict) -> bool:
+    """
+    Checks if the generated description meets business and quality standards
+    defined in the rules.
+    """
+    print("MCP: Running guardrails...")
+
+    # Check for prohibited words
+    prohibited_words = rules.get("prohibited_words", [])
+    for word in prohibited_words:
+        if word in description.lower():
+            print(f"Guardrail FAIL: Found prohibited word '{word}'.")
+            return False
+
+    # Check for length
+    max_length = rules.get("max_length")
+    if max_length and len(description) > max_length:
+        print(f"Guardrail FAIL: Description is too long ({len(description)} characters). Max is {max_length}.")
+        return False
+
+    print("Guardrails PASSED.")
+    return True
+
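`check_compliance` has two limitations:

* Only the description is lower-cased, so a prohibited word configured with capital letters would never match.
* Plain substring matching also flags words that merely contain the term, such as "niegwarantowane" ("not guaranteed").

The sketch below is an alternative that addresses both. It matches whole words case-insensitively with a regular expression and returns the reasons instead of a bare boolean, so they could be placed in the `detail` of the existing 400 response. `find_violations` is a hypothetical name. Whole-word matching is a design choice: in Polish it will not catch inflected forms such as "gwarantowany", so those would have to be listed explicitly.

```python
import re


def find_violations(description: str, rules: dict) -> list[str]:
    """Return human-readable guardrail violations; an empty list means the text passes."""
    violations = []
    for word in rules.get("prohibited_words", []):
        # \b is Unicode-aware for str patterns, so Polish letters count as word characters.
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, description, flags=re.IGNORECASE):
            violations.append(f"prohibited word: {word!r}")
    max_length = rules.get("max_length")
    if max_length is not None and len(description) > max_length:
        violations.append(f"too long: {len(description)} > {max_length} characters")
    return violations


rules = {"prohibited_words": ["Gwarantowane"], "max_length": 600}
print(find_violations("GWARANTOWANE zadowolenie!", rules))  # ["prohibited word: 'Gwarantowane'"]
print(find_violations("Terminy niegwarantowane", rules))    # []
```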
app/mcp/postprocessor.py ADDED
@@ -0,0 +1,18 @@
+# bielik_app_service/app/mcp/postprocessor.py
+
+def format_output(description: str, rules: dict) -> str:
+    """
+    Formats the final output description based on a set of rules.
+    """
+    print("MCP: Running postprocessor...")
+
+    formatted_description = description.strip()
+
+    # Add a closing statement if defined in the rules
+    closing_statement = rules.get("closing_statement")
+    if closing_statement and not formatted_description.endswith(closing_statement):
+        formatted_description = f"{formatted_description}\n\n{closing_statement}"
+
+    print("Post-processing complete.")
+    return formatted_description
+
app/mcp/preprocessor.py ADDED
@@ -0,0 +1,18 @@
+# bielik_app_service/app/mcp/preprocessor.py
+
+from pydantic import BaseModel
+
+def preprocess_data(data: BaseModel, rules: dict) -> BaseModel:
+    """
+    Preprocesses the input data based on a set of rules.
+    """
+    print("MCP: Running preprocessor...")
+
+    # Example of a generic rule: capitalize a field if it exists.
+    # The field to capitalize would be defined in the domain's config.
+    if hasattr(data, 'make'):
+        data.make = data.make.capitalize()
+        print(f"Standardized make: {data.make}")
+
+    return data
+
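The comment in `preprocess_data` says the field to capitalize should come from the domain's config, but the code checks for `make` directly. Also, `str.capitalize()` lower-cases every character after the first, so `"BMW"` becomes `"Bmw"`.

The sketch below is a config-driven variant. It assumes a hypothetical `capitalize_fields` rule key (e.g. `"preprocessor": {"capitalize_fields": ["make"]}`) and uses a stdlib dataclass in place of the Pydantic model. With Pydantic v2, the `dataclasses.replace` call would roughly correspond to `model_copy(update=...)`.

```python
from dataclasses import dataclass, fields, replace


def upper_first(value: str) -> str:
    # Upper-case only the first character; unlike str.capitalize(), the rest is kept.
    value = value.strip()
    return value[:1].upper() + value[1:]


def preprocess_data(data, rules: dict):
    """Return a copy of `data` with the configured string fields normalized."""
    field_names = {f.name for f in fields(data)}
    updates = {
        name: upper_first(getattr(data, name))
        for name in rules.get("capitalize_fields", [])  # hypothetical rule key
        if name in field_names and isinstance(getattr(data, name), str)
    }
    return replace(data, **updates)


@dataclass
class CarData:  # stand-in for the Pydantic schema, reduced to two fields
    make: str
    model: str


car = CarData(make="BMW", model="x5")
print(preprocess_data(car, {"capitalize_fields": ["make", "model", "color"]}))
```

Returning a copy instead of assigning to `data.make` leaves the validated input untouched. Unknown field names such as `"color"` are simply skipped.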
app/schemas/schemas.py CHANGED
@@ -1,12 +1,8 @@
 from pydantic import BaseModel
 
-class CarData(BaseModel):
-    make: str
-    model: str
-    year: int
-    mileage: int
-    features: list[str]
-    condition: str
-
 class EnhancedDescriptionResponse(BaseModel):
     description: str
 from pydantic import BaseModel
 
 class EnhancedDescriptionResponse(BaseModel):
     description: str
+    model_used: str
+    generation_time: float
+    user_email: str
+