The current version of this application features:

1. Dual mode: an embedding mode and an LLM mode
2. Data preprocessing, with records retrieved from CSV data
3. Updated pincode logic



Objective:
This repository contains the implementation of a **GenAI-based Entity Matching** system. It supports a dual-mode architecture with a FastAPI backend, a Streamlit frontend, and a collection of services for data processing and model interaction.


Features:

- **Flexible matching service** implemented in `backend/matching_service.py`.
- **Modular data models** defined in `backend/models.py`.
- **Streamlit frontend** for quick experimentation (`frontend/app_streamlit.py`).
- **Configurable rules and LLM model integration** under `services/`.
- **Extensive test suite** located in `tests/`.
- **Configuration files** and property management in `backend/config` and `services/config.py`.


Active endpoints:

    POST /backend/v1/match         – Match a single pair of records
    POST /backend/v1/match/batch   – Match multiple pairs (multithreaded implementation)
    GET  /backend/v1/health        – Full health check (CSV data, models, LLM)
    GET  /backend/v1/health/llm    – LLM server health check only
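
As a rough illustration of how a client might address the endpoints listed above, the following sketch builds the HTTP requests without sending them. The base URL assumes uvicorn's default address, and the payload field names `record_a` and `record_b` are hypothetical placeholders; the actual request schema is whatever `backend/models.py` defines.

```python
import json
from urllib.request import Request

# Assumption: uvicorn's default bind address; adjust to your deployment.
BASE_URL = "http://127.0.0.1:8000"


def build_match_request(record_a: dict, record_b: dict, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a POST request for /backend/v1/match."""
    # Hypothetical payload shape: the real field names live in backend/models.py.
    payload = {"record_a": record_a, "record_b": record_b}
    return Request(
        url=f"{base_url}/backend/v1/match",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_health_request(llm_only: bool = False, base_url: str = BASE_URL) -> Request:
    """Build a GET request for the full health check or the LLM-only check."""
    path = "/backend/v1/health/llm" if llm_only else "/backend/v1/health"
    return Request(url=f"{base_url}{path}", method="GET")
```

Once the backend is running, a request built this way can be sent with `urllib.request.urlopen(request)`.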




To run the application:

For embedding mode:
The models are loaded when the server starts.

For LLM mode:
Set the `base-url` property in `common.properties` to the URL of the running LLM server.
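
To make the `base-url` setting concrete, here is a minimal sketch of reading such a property from properties-style text. It assumes one `key: value` or `key=value` pair per line; the helper `parse_properties` and the sample URL are illustrative only and are not taken from this repository, which may load its configuration differently.

```python
def parse_properties(text: str) -> dict:
    """Parse simple properties-style text into a dict of strings."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        # Split at the first '=' or ':' so that URLs in the value stay intact.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            continue
        cut = min(positions)
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


# Hypothetical file content; replace the URL with your own LLM server address.
sample = """
# LLM server settings
base-url: http://localhost:8080
"""

llm_base_url = parse_properties(sample)["base-url"]
```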

For the frontend:

    python -m streamlit run frontend/app_streamlit.py

For the backend:

    python -m uvicorn backend.server:app