LogicGoInfotechSpaces committed on
Commit 033c29e · verified · 1 Parent(s): 705812e

Update README.md

Files changed (1)
  1. README.md +2 -107
README.md CHANGED
@@ -8,117 +8,12 @@ pinned: true
  short_description: WalletSync DUPLICATE TRANSACTION DETECTION
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
  Auto Expense Categorization – Duplicate Detection
  =================================================
 
- This mini-service connects to the `expense` MongoDB database and surfaces *soft* merge suggestions whenever two or more expense entries look like the same purchase. The rules currently implemented are the ones requested:
 
- * Amount difference no more than ±1 %
- * Timestamp difference within a configurable ±N minutes window (default: 10 min)
- * Merchant names that are either identical once normalised or mapped through a merchant-alias table
 
- Instead of destroying or editing any expense rows, the service writes a merge suggestion into the `merge_suggestions` collection so that an operator (or another automation) can perform the actual merge later on.
 
- Quick Start
- -----------
-
- 1. Create a virtual environment and install dependencies:
-
- ```
- python3 -m venv .venv
- .\.venv\Scripts\activate
- python3 -m pip install -r requirements.txt
- ```
-
- 2. Copy `.env.example` to `.env` and set the Mongo connection string if you do not want to rely on the baked-in default.
-
- 3. Run the detector (the default config scans the last 48 h of data and writes suggestions only). For the historical `transactions` collection you may want to bump the lookback window:
-
- ```
- python3 -m src.main --minutes 30 --lookback-hours 720
- ```
-
- You will see log lines such as:
-
- ```
- INFO DuplicateDetector Identified 2 duplicates, suggestion 673a...
- ```
-
- API Server
- ----------
-
- Run the HTTP service with FastAPI/uvicorn:
-
- ```
- python3 -m uvicorn src.api:app --reload
- ```
-
- Endpoints:
-
- * `GET /health` – readiness probe.
- * `POST /duplicates/detect` – kicks off a scan (body can override `lookback_hours`, `limit`, `amount_pct`, `minutes`).
- * `GET /suggestions?limit=50` – lists recent merge suggestions so the UI can ask “These seem similar. Would you like to merge them?”.
-
- Collections
- -----------
-
- * `transactions` (default): source data. The detector automatically maps the `date`/`createdAt` timestamp and `note`/`paymentType` merchant fields so you still get near-duplicate detection without reshaping your documents. Entries are only compared if they belong to the same `user`.
- * `merchant_aliases`: optional alias definitions (`name`, `aliases`).
- * `merge_suggestions`: the service writes documents shaped as:
-
- ```
- {
-   "_id": ObjectId(...),
-   "candidate_ids": [...],
-   "message": "These seem similar. Would you like to merge them?",
-   "details": {
-     "amount_delta_pct": 0.53,
-     "time_delta_minutes": 4.2,
-     "merchant_match_rule": "alias"
-   },
-   "audit": {
-     "generated_by": "duplicate-detector",
-     "generated_at": ISODate(...)
-   },
-   "status": "pending"
- }
- ```
-
- Configuration
- -------------
-
- All tunables live in `src/config.py`. Environment variables take precedence, so you can tune tolerances per deployment without editing code.
-
- | Variable | Description | Default |
- | --- | --- | --- |
- | `MONGO_URI` | Mongo connection string | Provided URI |
- | `MONGO_DB` | Database name | `expense` |
- | `MONGO_EXPENSE_COLLECTION` | Expenses collection | `transactions` |
- | `MONGO_ALIAS_COLLECTION` | Merchant alias collection | `merchant_aliases` |
- | `MONGO_SUGGESTION_COLLECTION` | Merge-suggestion collection | `merge_suggestions` |
- | `AMOUNT_TOLERANCE_PCT` | Amount delta percentage | `1.0` |
- | `TIME_TOLERANCE_MINUTES` | Time delta minutes | `10` |
- | `DEFAULT_LOOKBACK_HOURS` | How far back to scan | `48` |
- | `TIME_FIELDS` | CSV priority order for timestamps | `date,expense_time,createdAt` |
- | `MERCHANT_FIELDS` | CSV priority order for merchant labels | `merchant,note,paymentType,type,to` |
- | `USER_FIELD` | Source field that stores the user id (inferred automatically) | `user` |
-
- Smoke Test
- ----------
-
- Use the bundled `test.py` script to hit the running API (locally or on the Hugging Face Space) via the base URL:
-
- ```
- python3 test.py --base-url https://LogicGoInfotechSpaces-duplicate-transaction-detection.hf.space --lookback-hours 720 --limit 5000
- ```
-
- The script calls `/health`, `/duplicates/detect`, and `/suggestions` in sequence and prints the responses so you can quickly verify the deployment.
-
- Next Steps
- ----------
-
- * Wire this module into your ingestion pipeline so suggestions are generated immediately after a new expense is stored.
- * Surface the `merge_suggestions` collection in your UI to show prompts such as “These seem similar. Would you like to merge them?”
- * Extend `MerchantAliasResolver` to sync aliases from your upstream ERP or ML model.
 
+
 
 
+ ==
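
For reference, the matching rules described in the README text removed by this commit (same `user`, amount within ±1 %, timestamps within ±N minutes, merchants equal once normalised or linked through an alias table) could be sketched roughly as below. This is an illustrative sketch and not the repository's code: the function names, the `amount` and `merchant` field names, the `"exact"` rule label, the flat alias dictionary, and measuring the amount difference relative to the larger of the two amounts are all assumptions.

```python
from datetime import datetime
from typing import Dict, Optional


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial spelling differences compare equal."""
    return " ".join(name.lower().split())


def merchant_rule(a: str, b: str, aliases: Dict[str, str]) -> Optional[str]:
    """Return "exact" or "alias" when the merchants match, otherwise None.

    `aliases` is a hypothetical flat map from a normalised alias to its
    canonical merchant name; the "exact" label is also an assumption.
    """
    na, nb = normalise(a), normalise(b)
    if na == nb:
        return "exact"
    if aliases.get(na, na) == aliases.get(nb, nb):
        return "alias"
    return None


def match_details(x: dict, y: dict, aliases: Dict[str, str],
                  amount_pct: float = 1.0, minutes: float = 10.0) -> Optional[dict]:
    """Return a `details`-style dict if x and y look like duplicates, else None."""
    # Entries are only compared when they belong to the same user.
    if x["user"] != y["user"]:
        return None

    # Assumption: percentage difference is taken relative to the larger amount.
    base = max(abs(x["amount"]), abs(y["amount"]))
    delta_pct = 0.0 if base == 0 else abs(x["amount"] - y["amount"]) / base * 100
    if delta_pct > amount_pct:
        return None

    # Absolute time difference in minutes, checked against the window.
    delta_min = abs((x["date"] - y["date"]).total_seconds()) / 60
    if delta_min > minutes:
        return None

    rule = merchant_rule(x["merchant"], y["merchant"], aliases)
    if rule is None:
        return None

    return {
        "amount_delta_pct": round(delta_pct, 2),
        "time_delta_minutes": round(delta_min, 1),
        "merchant_match_rule": rule,
    }


if __name__ == "__main__":
    first = {"user": "u1", "amount": 100.0,
             "date": datetime(2024, 1, 1, 12, 0), "merchant": "Starbucks"}
    second = {"user": "u1", "amount": 100.5,
              "date": datetime(2024, 1, 1, 12, 4), "merchant": "SBUX"}
    print(match_details(first, second, {"sbux": "starbucks"}))
```

Returning a dictionary rather than a boolean keeps the evidence for each match, which is what a `details` field in a merge suggestion would need.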