Patryk Studzinski committed
Commit 3297dba · 1 Parent(s): 87a12c6

cleanup after split to separate mcp service

.gitignore CHANGED
@@ -49,4 +49,7 @@ build/
 
 # System files
 .DS_Store
-Thumbs.db
+Thumbs.db
+
+# Gemini Plans
+gemini_plans/
MCP_Integration_Plan.md DELETED
@@ -1,38 +0,0 @@
-# MCP Integration Plan for bielik_app_service
-
-This document outlines the plan to integrate a Model Control Panel (MCP) into the `bielik_app_service`.
-
-## Decision: Integrated Module vs. Separate Service
-
-After analyzing the existing architecture of `bielik_app_service`, the decision is to implement the MCP as an **integrated module** within the application.
-
-**Reasoning:**
-
-* **Simplicity:** A single, monolithic service is easier to develop, manage, and deploy.
-* **Performance:** Integrating the MCP as a module avoids the network latency overhead of inter-service communication.
-* **Maintainability:** The logic remains in one place, making it easier to trace the request flow and debug issues.
-
-A separate microservice for the MCP could be considered in the future if the MCP's logic becomes significantly complex and resource-intensive, but it is not justified at this stage.
-
-## Implementation Plan
-
-1. **Create the MCP Module Structure:**
-    * A new directory `app/mcp` will be created.
-    * Inside `app/mcp`, the following files will be created:
-        * `__init__.py`: To make `mcp` a Python package.
-        * `preprocessor.py`: To handle input data normalization and cleaning.
-        * `guardrails.py`: To enforce business rules and quality checks on the generated output.
-        * `postprocessor.py`: To handle the final formatting and structuring of the output.
-
-2. **Integrate MCP into the Request Lifecycle:**
-    * The `app/main.py` file will be modified.
-    * The `enhance-description` endpoint will be updated to use the new MCP modules.
-
-### New Request Flow in `enhance-description`
-
-1. **Input:** The endpoint receives `CarData`.
-2. **Preprocessing:** The `preprocessor` module is called to standardize and clean the `CarData`.
-3. **Prompt Construction:** A prompt is constructed using the preprocessed data.
-4. **Text Generation:** The `HuggingFaceTextGenerationService` is called to generate the description.
-5. **Guardrails & Post-processing:** The generated text is passed through the `guardrails` for validation and then to the `postprocessor` for final formatting.
-6. **Output:** The final, validated, and formatted description is returned to the user.
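The six-step request flow in the deleted plan can be sketched as one function. This is only an illustration under stated assumptions, not the service's code: `CarData` is reduced to a small dataclass, `fake_generate` stands in for `HuggingFaceTextGenerationService`, and the prompt wording and `max_length` default are made up for the example.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class CarData:
    # Hypothetical, trimmed-down stand-in for the real schema.
    make: str
    model: str
    year: int


def preprocess(data: CarData) -> CarData:
    # Step 2: normalise free-text fields before they reach the prompt.
    return replace(data, make=data.make.strip().capitalize(), model=data.model.strip())


def build_prompt(data: CarData) -> str:
    # Step 3: build the prompt from the cleaned data (wording is illustrative).
    return f"Write a short description of a {data.year} {data.make} {data.model}."


def passes_guardrails(text: str, max_length: int) -> bool:
    # Step 5a: reject empty or over-long output.
    return 0 < len(text) <= max_length


def postprocess(text: str) -> str:
    # Step 5b: final formatting.
    return text.strip()


def enhance_description(
    data: CarData, generate: Callable[[str], str], max_length: int = 500
) -> str:
    cleaned = preprocess(data)
    prompt = build_prompt(cleaned)
    raw = generate(prompt)  # Step 4: text generation
    if not passes_guardrails(raw, max_length):
        raise ValueError("Generated description failed compliance checks.")
    return postprocess(raw)  # Step 6: output


def fake_generate(prompt: str) -> str:
    # Stand-in for the model call; it simply echoes the prompt with padding.
    return f"  Generated from: {prompt}  "
```

Passing the generator in as a callable keeps the pipeline testable without loading a model, which is also roughly the seam along which this commit moves the MCP steps into a separate service.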
Modular_Architecture_Plan.md DELETED
@@ -1,69 +0,0 @@
-# Modular Architecture Plan for Multi-Domain Support
-
-This document outlines the plan to refactor the `bielik_app_service` to support multiple domains (e.g., cars, flats, etc.) in a modular and extensible way.
-
-## Core Problem
-
-The current implementation is hardcoded for the "cars" domain. The data schema (`CarData`), the prompt, and the MCP logic are all tailored specifically for car descriptions. To support new domains, a significant refactoring is required.
-
-## Proposed Solution: A Configuration-Driven, Modular Architecture
-
-The proposed solution is to move from a hardcoded implementation to a configuration-driven one, where each domain has its own configuration and modules.
-
-### 1. The "Domain" Concept
-
-A "domain" will be the central concept. Each domain (e.g., "cars", "flats") will have its own dedicated module that contains its specific configuration, schemas, and logic.
-
-### 2. New Directory Structure
-
-A new `app/domains` directory will be created. Each subdirectory within `app/domains` will represent a single domain.
-
-```
-bielik_app_service/app/
-├── domains/
-│   ├── __init__.py
-│   └── cars/
-│       ├── __init__.py
-│       ├── config.py   # Domain-specific configuration
-│       ├── schemas.py  # Pydantic schemas for this domain (e.g., CarData)
-│       └── prompts.py  # Prompt templates for this domain
-└── mcp/
-    ├── preprocessor.py
-    ├── guardrails.py
-    └── postprocessor.py
-```
-
-### 3. Domain Configuration (`config.py`)
-
-The `app/domains/cars/config.py` file will define everything needed for the "cars" domain:
-
-* **Schema:** It will import the Pydantic schema from `schemas.py`.
-* **Prompt Template:** It will import the prompt template from `prompts.py`.
-* **MCP Rules:** It will define the specific rules for the preprocessor, guardrails, and postprocessor for this domain.
-
-### 4. Refactoring the Main Endpoint
-
-The `/enhance-description` endpoint in `app/main.py` will be refactored:
-
-* **Endpoint Signature:** It will be changed to accept a `domain` name and a generic `data` payload.
-    ```python
-    @app.post("/enhance-description")
-    async def enhance_description(domain: str, data: dict, ...):
-    ```
-* **Dynamic Domain Loading:** The endpoint will dynamically load the configuration and modules for the requested `domain`.
-* **Dynamic Validation:** It will use the schema from the loaded domain module to validate the incoming `data`.
-* **Dynamic Pipeline:** It will use the domain's prompt template and MCP rules to execute the enhancement pipeline.
-
-## Advantages of this Approach
-
-* **Extensibility:** Adding a new domain (e.g., "flats") will be as simple as creating a new subdirectory `app/domains/flats/` with its own configuration, schema, and prompt files. No changes to the core application logic in `main.py` will be needed.
-* **Maintainability:** All the logic for a specific domain will be co-located in its own module, making it easy to find and maintain.
-* **Separation of Concerns:** The core application logic is separated from the domain-specific logic.
-
-## Next Steps
-
-1. Create the new directory structure (`app/domains/cars/`).
-2. Move the existing `CarData` schema to `app/domains/cars/schemas.py`.
-3. Create `app/domains/cars/prompts.py` and move the prompt creation logic there.
-4. Create `app/domains/cars/config.py` to tie everything together.
-5. Refactor `app/main.py` to use this new dynamic, modular approach.
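The "dynamic domain loading" idea in the deleted plan can be illustrated with a small registry. This is a sketch under assumptions, not the repository's implementation: `get_domain_config` and the `"schema"` / `"create_prompt"` keys mirror names visible in `app/main.py`, while the `DOMAINS` dictionary, the dataclass standing in for a Pydantic schema, and the prompt text are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class CarData:
    # Hypothetical stand-in for the Pydantic schema in app/domains/cars/schemas.py.
    make: str
    model: str


def create_car_prompt(data: CarData) -> str:
    # Illustrative prompt builder for the "cars" domain.
    return f"Describe the {data.make} {data.model}."


# Registry keyed by domain name; adding "flats" would mean adding one entry here.
DOMAINS: Dict[str, Dict[str, Any]] = {
    "cars": {"schema": CarData, "create_prompt": create_car_prompt},
}


def get_domain_config(domain: str) -> Dict[str, Any]:
    try:
        return DOMAINS[domain]
    except KeyError:
        raise ValueError(f"Unknown domain '{domain}'") from None


def build_prompt(domain: str, data: Dict[str, Any]) -> str:
    config = get_domain_config(domain)
    # A dataclass only checks that the expected fields are present; a Pydantic
    # model would also validate and coerce the field types.
    validated = config["schema"](**data)
    return config["create_prompt"](validated)
```

With this shape the endpoint body never mentions a concrete domain, which is the extensibility property the plan argues for.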
app/main.py CHANGED
@@ -9,7 +9,7 @@ from app.models.huggingface_service import HuggingFaceTextGenerationService
 from fastapi.middleware.cors import CORSMiddleware
 from app.schemas.schemas import EnhancedDescriptionResponse
 from app.auth.auth0_jwt import get_authenticated_user
-from app.mcp import preprocessor, guardrails, postprocessor
+# MCP imports removed
 
 app = FastAPI(
     title="Modular Car Description Enhancer",
@@ -33,7 +33,7 @@ app.add_middleware(
 # Global service initialization
 MODEL_PATH_IN_CONTAINER = "/app/pretrain_model"
 hf_service = HuggingFaceTextGenerationService(
-    model_name_or_path=MODEL_PATH_IN_CONTAINER,
+    model_name_or_PATH=MODEL_PATH_IN_CONTAINER,
     device="cpu"
 )
 
@@ -85,7 +85,7 @@ async def enhance_description(
     domain_config = get_domain_config(domain)
     DomainSchema = domain_config["schema"]
     create_prompt = domain_config["create_prompt"]
-    mcp_rules = domain_config["mcp_rules"]
+    # mcp_rules removed
 
     # --- 2. Validate Input Data ---
     try:
@@ -93,13 +93,10 @@ async def enhance_description(
     except ValidationError as e:
         raise HTTPException(status_code=422, detail=f"Invalid data for domain '{domain}': {e}")
 
-    # --- 3. MCP Pre-processing ---
-    processed_data = preprocessor.preprocess_data(validated_data, mcp_rules.get("preprocessor", {}))
+    # --- 3. Prompt Construction ---
+    chat_messages = create_prompt(validated_data)
 
-    # --- 4. Prompt Construction ---
-    chat_messages = create_prompt(processed_data)
-
-    # --- 5. Text Generation ---
+    # --- 4. Text Generation ---
     try:
         generated_description = await hf_service.generate_text(
             chat_template_messages=chat_messages,
@@ -111,12 +108,13 @@ async def enhance_description(
         print(f"Unexpected error during text generation: {e}")
         raise HTTPException(status_code=500, detail=f"An unexpected error occurred during text generation: {str(e)}")
 
-    # --- 6. MCP Guardrails & Post-processing ---
-    if not guardrails.check_compliance(generated_description, mcp_rules.get("guardrails", {})):
-        raise HTTPException(status_code=400, detail="Generated description failed compliance checks.")
+    # --- 5. MCP Guardrails & Post-processing removed ---
+    # if not guardrails.check_compliance(generated_description, mcp_rules.get("guardrails", {})):
+    #     raise HTTPException(status_code=400, detail="Generated description failed compliance checks.")
+
+    # final_description = postprocessor.format_output(generated_description, mcp_rules.get("postprocessor", {}))
+    final_description = generated_description # No post-processing here
 
-    final_description = postprocessor.format_output(generated_description, mcp_rules.get("postprocessor", {}))
-
     generation_time = time.time() - start_time
     user_email = user['email'] if user else "anonymous"
 
@@ -127,6 +125,30 @@ async def enhance_description(
         user_email=user_email
     )
 
+@app.post("/generate")
+async def generate_text_only(
+    chat_template_messages: str = Body(..., embed=True),
+    max_new_tokens: int = 150,
+    temperature: float = 0.75,
+    top_p: float = 0.9
+):
+    """
+    Generates raw text based on provided chat template messages.
+    This endpoint is intended for internal use by the MCP service.
+    """
+    try:
+        generated_text = await hf_service.generate_text(
+            chat_template_messages=chat_template_messages,
+            max_new_tokens=max_new_tokens,
+            temperature=temperature,
+            top_p=top_p,
+        )
+        return {"generated_text": generated_text}
+    except Exception as e:
+        print(f"Unexpected error during raw text generation: {e}")
+        raise HTTPException(status_code=500, detail=f"An unexpected error occurred during text generation: {str(e)}")
+
+
 @app.get("/user/me")
 async def get_user_info(user: dict = Depends(get_authenticated_user)):
     """Get current authenticated user information"""
app/mcp/__init__.py DELETED
@@ -1 +0,0 @@
-# This file makes the 'mcp' directory a Python package.
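With the in-process `app.mcp` package gone, the separate MCP service presumably reaches the model through the `/generate` endpoint added in `app/main.py`. The sketch below shows how such a call might be assembled with the standard library. It assumes FastAPI's usual parameter handling: a `Body(..., embed=True)` parameter is read from a JSON object keyed by the parameter name, and the remaining scalar parameters are read from the query string. `BASE_URL` and `build_generate_request` are hypothetical names, and no authentication is shown.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical address of the text-generation service; adjust to the deployment.
BASE_URL = "http://bielik-app-service:8000"


def build_generate_request(
    prompt: str,
    max_new_tokens: int = 150,
    temperature: float = 0.75,
    top_p: float = 0.9,
) -> Request:
    # Scalar parameters go in the query string.
    query = urlencode(
        {"max_new_tokens": max_new_tokens, "temperature": temperature, "top_p": top_p}
    )
    # The embedded body parameter is sent as a JSON object keyed by its name.
    body = json.dumps({"chat_template_messages": prompt}).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/generate?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)`, and the generated text read from the `generated_text` key of the JSON response.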
 
 
app/mcp/guardrails.py DELETED
@@ -1,25 +0,0 @@
-# bielik_app_service/app/mcp/guardrails.py
-
-def check_compliance(description: str, rules: dict) -> bool:
-    """
-    Checks if the generated description meets business and quality standards
-    defined in the rules.
-    """
-    print("MCP: Running guardrails...")
-
-    # Check for prohibited words
-    prohibited_words = rules.get("prohibited_words", [])
-    for word in prohibited_words:
-        if word in description.lower():
-            print(f"Guardrail FAIL: Found prohibited word '{word}'.")
-            return False
-
-    # Check for length
-    max_length = rules.get("max_length")
-    if max_length and len(description) > max_length:
-        print(f"Guardrail FAIL: Description is too long ({len(description)} characters). Max is {max_length}.")
-        return False
-
-    print("Guardrails PASSED.")
-    return True
-
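The deleted `check_compliance` reads two optional keys from `rules`: `prohibited_words` and `max_length`. The sketch below repeats the same checks without the logging, to show the shape of a rules dictionary and two consequences of the substring test. The example rules and sentences are invented for illustration.

```python
def check_compliance(description: str, rules: dict) -> bool:
    # Same checks as the deleted module, minus the print statements.
    lowered = description.lower()
    for word in rules.get("prohibited_words", []):
        if word in lowered:  # substring match against the lowercased text
            return False
    max_length = rules.get("max_length")
    if max_length and len(description) > max_length:
        return False
    return True


# Illustrative rules; the real values would live in the domain configuration.
rules = {"prohibited_words": ["cheap", "art"], "max_length": 60}

ok = check_compliance("A reliable family car.", rules)
blocked_word = check_compliance("Cheap and cheerful.", rules)
blocked_substring = check_compliance("Easy to start in winter.", rules)  # "art" in "start"
uppercase_rule_ignored = check_compliance("Cheap car", {"prohibited_words": ["Cheap"]})
```

Because only the description is lowercased, a rule written with capitals never matches, and a short rule word can match inside a longer word. A service that needs whole-word matching would have to tokenise the text or use a word-boundary regular expression instead.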
app/mcp/postprocessor.py DELETED
@@ -1,18 +0,0 @@
-# bielik_app_service/app/mcp/postprocessor.py
-
-def format_output(description: str, rules: dict) -> str:
-    """
-    Formats the final output description based on a set of rules.
-    """
-    print("MCP: Running postprocessor...")
-
-    formatted_description = description.strip()
-
-    # Add a closing statement if defined in the rules
-    closing_statement = rules.get("closing_statement")
-    if closing_statement and not formatted_description.endswith(closing_statement):
-        formatted_description = f"{formatted_description}\n\n{closing_statement}"
-
-    print("Post-processing complete.")
-    return formatted_description
-
app/mcp/preprocessor.py DELETED
@@ -1,18 +0,0 @@
-# bielik_app_service/app/mcp/preprocessor.py
-
-from pydantic import BaseModel
-
-def preprocess_data(data: BaseModel, rules: dict) -> BaseModel:
-    """
-    Preprocesses the input data based on a set of rules.
-    """
-    print("MCP: Running preprocessor...")
-
-    # Example of a generic rule: capitalize a field if it exists.
-    # The field to capitalize would be defined in the domain's config.
-    if hasattr(data, 'make'):
-        data.make = data.make.capitalize()
-        print(f"Standardized make: {data.make}")
-
-    return data
-
-