spacedout-bits Oz committed
Commit af365fe · 0 Parent(s)

Add Personal Finance Manager with HF Hub CSV storage

Files changed (6)
  1. .gitignore +80 -0
  2. README.md +222 -0
  3. app.py +298 -0
  4. hf_storage.py +267 -0
  5. requirements.txt +5 -0
  6. utils.py +225 -0
.gitignore ADDED
@@ -0,0 +1,80 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+ .Python
7
+ build/
8
+ develop-eggs/
9
+ dist/
10
+ downloads/
11
+ eggs/
12
+ .eggs/
13
+ lib/
14
+ lib64/
15
+ parts/
16
+ sdist/
17
+ var/
18
+ wheels/
19
+ pip-wheel-metadata/
20
+ share/python-wheels/
21
+ *.egg-info/
22
+ .installed.cfg
23
+ *.egg
24
+ MANIFEST
25
+
26
+ # Virtual Environment
27
+ venv/
28
+ ENV/
29
+ env/
30
+ .venv
31
+ env.bak/
32
+ venv.bak/
33
+
34
+ # IDE
35
+ .vscode/
36
+ .idea/
37
+ *.swp
38
+ *.swo
39
+ *~
40
+ .DS_Store
41
+
42
+ # Environment
43
+ .env
44
+ .env.local
45
+ .env.*.local
46
+
47
+ # Application
48
+ ledger.csv
49
+ *.csv
50
+ *.xlsx
51
+ *.xls
52
+
53
+ # Logs
54
+ *.log
55
+ logs/
56
+
57
+ # Cache
58
+ .cache/
59
+ .pytest_cache/
60
+ .mypy_cache/
61
+
62
+ # Gradio
63
+ gradio_cached_examples/
64
+ flagged/
65
+
66
+ # Cache directory
67
+ cache/
68
+
69
+ # Deployment docs (not needed in Space)
70
+ DEPLOY_AND_TEST.md
71
+ DEPLOYMENT_QUICK_START.sh
72
+ DEVELOPMENT.md
73
+ HF_STORAGE_SETUP.md
74
+ QUICKSTART.md
75
+ SPACES_DEPLOYMENT.md
76
+ PUSH_TO_SPACE.md
77
+ spaces_config.yaml
78
+ test_app.py
79
+ test_standalone.py
80
+ .env.example
README.md ADDED
@@ -0,0 +1,222 @@
1
+ ---
2
+ title: Money Manager
3
+ emoji: 💸
4
+ colorFrom: green
5
+ colorTo: blue
6
+ sdk: gradio
7
+ sdk_version: 4.44.0
8
+ app_file: app.py
9
+ pinned: false
10
+ ---
11
+
12
+ # 💸 Personal Finance Manager
13
+
14
+ A Gradio-based web application for managing personal finances with LLM-powered natural language expense logging. Log expenses like "Spent $15 on a burrito at Chipotle" and let AI parse them into organized ledger entries.
15
+
16
+ ## Features
17
+
18
+ ✨ **Natural Language Parsing**: Describe expenses in your own words—the LLM handles extraction
19
+ 📊 **Dynamic Ledger**: Real-time table showing all expenses with sorting and filtering
20
+ 💰 **Total Tracking**: Automatically calculated total spending that updates instantly
21
+ 🏷️ **Smart Categorization**: Expenses are automatically categorized (Food, Transportation, Utilities, etc.)
22
+ 🎨 **Clean Dashboard**: Financial-themed UI using Gradio's Soft theme
23
+ 🔄 **Persistence**: The ledger is saved to a local CSV cache and, when `HF_TOKEN` and `HF_REPO_ID` are set, synced to a HuggingFace Hub repo
24
+ ⚡ **Fallback Parser**: Works even without LLM API keys using rule-based parsing
25
+
26
+ ## Tech Stack
27
+
28
+ - **Frontend**: Gradio (Python web framework)
29
+ - **Data**: Pandas (DataFrames)
30
+ - **LLM**: LangChain with HuggingFace Hub or OpenAI
31
+ - **Language**: Python 3.9+ (`utils.py` uses the built-in `tuple[bool, str]` annotation)
32
+
33
+ ## Setup
34
+
35
+ ### 1. Clone the Repository
36
+ ```bash
37
+ cd financemanager
38
+ ```
39
+
40
+ ### 2. Create Virtual Environment
41
+ ```bash
42
+ python3 -m venv venv
43
+ source venv/bin/activate # On Windows: venv\Scripts\activate
44
+ ```
45
+
46
+ ### 3. Install Dependencies
47
+ ```bash
48
+ pip install -r requirements.txt
49
+ ```
50
+
51
+ ### 4. Configure API Keys (Optional)
52
+
53
+ Copy `.env.example` to `.env` and add your API keys:
54
+
55
+ ```bash
56
+ cp .env.example .env
57
+ ```
58
+
59
+ Edit `.env`:
60
+ - **HuggingFace**: Get token from https://huggingface.co/settings/tokens
61
+ - **OpenAI**: Get key from https://platform.openai.com/api-keys
62
+
63
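+ If you don't configure any API keys, the app will use the fallback rule-based parser. Note that `app.py` reads keys with `os.getenv` and does not load `.env` itself, so export the variables in your shell before launching.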
64
+
65
+ ### 5. Run the Application
66
+
67
+ ```bash
68
+ python app.py
69
+ ```
70
+
71
+ The app will launch at `http://localhost:7860`
72
+
73
+ ## Usage
74
+
75
+ 1. **Describe Your Expense**: Type a natural language description in the input box
76
+ - Examples:
77
+ - "Spent $15 on a burrito at Chipotle"
78
+ - "Paid $1200 for rent today"
79
+ - "Gas: $45.50"
80
+ - "Movie tickets $32"
81
+
82
+ 2. **Click Log Expense** or press Enter
83
+ 3. **View Results**:
84
+ - Status message confirms the entry
85
+ - Table updates with the new expense
86
+ - Total spending updates automatically
87
+ - Expenses sorted by date (newest first)
88
+
89
+ ## How It Works
90
+
91
+ ### LLM-Based Parsing (Recommended)
92
+ When an LLM is configured, the app sends your input to the model with this prompt:
93
+
94
+ ```
95
+ Parse this expense and return JSON with:
96
+ - date (YYYY-MM-DD)
97
+ - description (what was purchased)
98
+ - category (Food, Transportation, Utilities, etc.)
99
+ - amount (numeric value)
100
+ ```
101
+
102
+ The LLM returns structured JSON that the app parses and stores.
103
+
104
+ ### Fallback Parser
105
+ Without an LLM, the app uses:
106
+ - **Regex** to extract dollar amounts
107
+ - **Keyword matching** for category detection
108
+ - **Current date** for entries without explicit dates
109
+
110
+ ## Expense Data Structure
111
+
112
+ Each entry contains:
113
+
114
+ | Field | Example | Notes |
115
+ |-------|---------|-------|
116
+ | Date | 2024-05-01 | YYYY-MM-DD format |
117
+ | Description | Burrito at Chipotle | What was purchased |
118
+ | Category | Food | Auto-categorized |
119
+ | Amount | 15.00 | Dollar amount |
120
+
121
+ ## Supported Categories
122
+
123
+ - **Food**: Restaurant, groceries, coffee
124
+ - **Transportation**: Gas, Uber, parking, taxi
125
+ - **Utilities**: Electric, water, internet, phone
126
+ - **Entertainment**: Movies, concerts, books, games
127
+ - **Rent**: Rent, mortgage, apartment
128
+ - **Other**: Uncategorized expenses
129
+
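A note on how these categories are detected: the keyword lists in `parse_expense_fallback()` are matched as plain substrings, so a word like "card" also hits the `car` keyword. Below is a minimal sketch of whole-word matching, assuming a trimmed keyword table. `CATEGORY_KEYWORDS` and `detect_category` are hypothetical names, not part of this commit.

```python
import re

# Trimmed, illustrative keyword table (assumption: not the full list from app.py)
CATEGORY_KEYWORDS = {
    "Food": ["coffee", "restaurant", "burrito", "eat"],
    "Transportation": ["gas", "uber", "taxi", "parking", "car"],
    "Entertainment": ["movie", "concert", "game", "book"],
    "Rent": ["rent", "mortgage", "apartment"],
}


def detect_category(text: str) -> str:
    """Return the first category whose keyword appears as a whole word."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        for keyword in keywords:
            # \b anchors the keyword at word boundaries; the optional
            # trailing "s" still lets "movies" match "movie".
            if re.search(rf"\b{re.escape(keyword)}s?\b", lowered):
                return category
    return "Other"


print(detect_category("Movie tickets $32"))       # Entertainment
print(detect_category("New credit card fee $5"))  # Other, not Transportation
```

The trade-off is that inflected forms beyond a plural "s" (for example "eating") no longer match unless they are listed explicitly.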
130
+ ## Deployment to HuggingFace Spaces
131
+
132
+ ### 1. Create a Space
133
+ - Go to https://huggingface.co/spaces
134
+ - Click "Create new Space"
135
+ - Choose "Gradio" as the SDK
136
+ - Set repository visibility to public/private
137
+
138
+ ### 2. Upload Files
139
+ ```bash
140
+ git clone https://huggingface.co/spaces/your-username/your-space
141
+ cd your-space
142
+ # Copy app.py, hf_storage.py, and requirements.txt to this directory (keep .env out of the repo; use Space secrets instead)
143
+ git add .
144
+ git commit -m "Add finance manager"
145
+ git push
146
+ ```
147
+
148
+ ### 3. Add Secrets
149
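+ In your Space's Settings → Repository secrets, add the keys below. To persist the ledger on the Hub, also set `HF_REPO_ID` (the storage code reads its token from `HF_TOKEN` or `HUGGINGFACEHUB_API_TOKEN`):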
150
+ - `HUGGINGFACEHUB_API_TOKEN`
151
+ - `OPENAI_API_KEY` (if using OpenAI)
152
+
153
+ The space will auto-deploy and be accessible at: `https://huggingface.co/spaces/your-username/your-space`
154
+
155
+ ## Customization
156
+
157
+ ### Change Theme
158
+ In `app.py`, line 226:
159
+ ```python
160
+ with gr.Blocks(theme=gr.themes.Soft()) as demo:
161
+ ```
162
+
163
+ Try: `Default()`, `Glass()`, `Monochrome()`, `Soft()`, `Base()`
164
+
165
+ ### Add More Categories
166
+ Edit the `parse_expense_fallback()` function in `app.py`, around line 156:
167
+ ```python
168
+ categories = {
169
+ "Shopping": ["amazon", "mall", "store", "buy"],
170
+ "Medical": ["doctor", "pharmacy", "clinic"],
171
+ # Add more...
172
+ }
173
+ ```
174
+
175
+ ### Change LLM Model
176
+ In `initialize_llm()` in `app.py`, line 75:
177
+ ```python
178
+ repo_id="mistralai/Mistral-7B-Instruct-v0.2",
179
+ # Try: "HuggingFaceH4/zephyr-7b-beta", "meta-llama/Llama-2-7b-chat-hf"
180
+ ```
181
+
182
+ ## Limitations
183
+
184
+ - ⚠️ Without `HF_TOKEN` and `HF_REPO_ID`, the ledger is saved only to a local cache CSV (`./cache/ledger.csv`), which may be lost when a Space restarts
185
+ - ⚠️ All amounts are in USD (no multi-currency support)
186
+ - ⚠️ LLM parsing may fail for very ambiguous inputs
187
+ - ⚠️ No built-in authentication (use for personal/private deployments)
188
+
189
+ ## Future Enhancements
190
+
191
+ - [ ] CSV export functionality
192
+ - [ ] Monthly/yearly summaries with charts
193
+ - [ ] Budget alerts
194
+ - [ ] Receipt image upload
195
+ - [ ] Multi-currency support
196
+ - [ ] SQLite database for persistence
197
+ - [ ] User authentication for Spaces deployment
198
+
199
+ ## Troubleshooting
200
+
201
+ ### "LLM not available" Warning
202
+ The app works without an LLM. This just means it's using the fallback parser. Add an API key to `.env` to enable intelligent parsing.
203
+
204
+ ### "JSON parsing error"
205
+ The LLM response format was unexpected. Try rephrasing your expense description or check your API key.
206
+
207
+ ### App Hangs on Startup
208
+ - Check that your API keys are correct
209
+ - Ensure you have internet connectivity
210
+ - Try disabling the LLM by not setting environment variables
211
+
212
+ ## License
213
+
214
+ MIT License - feel free to modify and deploy!
215
+
216
+ ## Support
217
+
218
+ For issues or suggestions, please check the code comments or modify as needed.
219
+
220
+ ---
221
+
222
+ **Happy budgeting! 💰**
app.py ADDED
@@ -0,0 +1,298 @@
1
+ import gradio as gr
2
+ import pandas as pd
3
+ import json
4
+ from datetime import datetime
5
+ from typing import Tuple, Dict, Any
6
+ import os
7
+ import logging
8
+
9
+ try:
10
+ from langchain.llms import HuggingFaceHub
11
+ from langchain.prompts import PromptTemplate
12
+ from langchain.chains import LLMChain
13
+ except ImportError:
14
+ # Fallback: try OpenAI or basic mock
15
+ pass
16
+
17
+ from hf_storage import HFHubLedger
18
+
19
+ # Setup logging
20
+ logging.basicConfig(level=logging.INFO)
21
+ logger = logging.getLogger(__name__)
22
+
23
+
24
+ class ExpenseManager:
25
+ """Manages ledger entries and DataFrame operations."""
26
+
27
+ def __init__(self):
28
+ """Initialize the expense manager with an empty DataFrame."""
29
+ self.df = pd.DataFrame(
30
+ columns=["Date", "Description", "Category", "Amount"]
31
+ )
32
+ self.df["Date"] = pd.to_datetime(self.df["Date"])
33
+ self.df["Amount"] = pd.to_numeric(self.df["Amount"])
34
+
35
+ def add_entry(self, date: str, description: str, category: str, amount: float) -> bool:
36
+ """Add a new expense entry to the ledger."""
37
+ try:
38
+ new_entry = pd.DataFrame({
39
+ "Date": [pd.to_datetime(date)],
40
+ "Description": [description],
41
+ "Category": [category],
42
+ "Amount": [float(amount)]
43
+ })
44
+ self.df = pd.concat([self.df, new_entry], ignore_index=True)
45
+ self.df = self.df.sort_values("Date", ascending=False).reset_index(drop=True)
46
+ return True
47
+ except Exception as e:
48
+ print(f"Error adding entry: {e}")
49
+ return False
50
+
51
+ def get_dataframe(self) -> pd.DataFrame:
52
+ """Return the current DataFrame."""
53
+ return self.df.copy()
54
+
55
+ def get_total_spending(self) -> float:
56
+ """Calculate and return total spending."""
57
+ if self.df.empty:
58
+ return 0.0
59
+ return self.df["Amount"].sum()
60
+
61
+ def get_category_summary(self) -> Dict[str, float]:
62
+ """Get spending summary by category."""
63
+ if self.df.empty:
64
+ return {}
65
+ return self.df.groupby("Category")["Amount"].sum().to_dict()
66
+
67
+
68
+ def initialize_llm():
69
+ """Initialize the LLM. Supports HuggingFace or OpenAI."""
70
+ try:
71
+ # Try HuggingFace
72
+ api_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")
73
+ if api_token:
74
+ llm = HuggingFaceHub(
75
+ repo_id="mistralai/Mistral-7B-Instruct-v0.2",
76
+ huggingfacehub_api_token=api_token,
77
+ model_kwargs={"temperature": 0.1, "max_length": 200}
78
+ )
79
+ return llm
80
+ except Exception as e:
81
+ print(f"HuggingFace initialization failed: {e}")
82
+
83
+ try:
84
+ # Fallback to OpenAI
85
+ from langchain.llms import OpenAI
86
+ api_key = os.getenv("OPENAI_API_KEY")
87
+ if api_key:
88
+ return OpenAI(temperature=0.1, max_tokens=200)
89
+ except Exception as e:
90
+ print(f"OpenAI initialization failed: {e}")
91
+
92
+ return None
93
+
94
+
95
+ def parse_expense_with_llm(user_input: str, llm) -> Dict[str, Any]:
96
+ """
97
+ Parse natural language input into structured expense data using LLM.
98
+ Returns a dictionary with keys: date, description, category, amount
99
+ """
100
+ if not llm:
101
+ return parse_expense_fallback(user_input)
102
+
103
+ prompt_template = PromptTemplate(
104
+ input_variables=["user_input"],
105
+ template="""Parse the following expense entry and extract the information into a JSON object.
106
+
107
+ User input: {user_input}
108
+
109
+ Return ONLY a valid JSON object with these fields (use today's date if not specified):
110
+ - date (YYYY-MM-DD format)
111
+ - description (what was purchased)
112
+ - category (e.g., Food, Transportation, Utilities, Entertainment, Other)
113
+ - amount (numeric value without currency symbol)
114
+
115
+ JSON:"""
116
+ )
117
+
118
+ chain = LLMChain(llm=llm, prompt=prompt_template)
119
+ response = chain.run(user_input=user_input)
120
+
121
+ try:
122
+ # Extract JSON from response
123
+ json_str = response.strip()
124
+ # Find JSON object in response
125
+ start_idx = json_str.find("{")
126
+ end_idx = json_str.rfind("}") + 1
127
+ if start_idx != -1 and end_idx > start_idx:
128
+ json_str = json_str[start_idx:end_idx]
129
+ parsed = json.loads(json_str)
130
+ return parsed
131
+ except json.JSONDecodeError as e:
132
+ print(f"JSON parsing error: {e}")
133
+ return parse_expense_fallback(user_input)
134
+
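LLM replies tend to be loosely typed: the amount may arrive as `"$15.00"`, and fields may be missing. The sketch below shows one way to extract and normalize such a reply before it reaches the ledger. `extract_json_object` and `normalize_expense` are hypothetical helpers, not functions from this commit, and the sample reply is made up for illustration.

```python
import json
from datetime import date, datetime
from typing import Any, Dict


def extract_json_object(response: str) -> Dict[str, Any]:
    """Pull the outermost {...} span out of a model reply and decode it."""
    start = response.find("{")
    end = response.rfind("}") + 1
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    return json.loads(response[start:end])


def normalize_expense(raw: Dict[str, Any], fallback_description: str) -> Dict[str, Any]:
    """Coerce loosely typed model output into the four ledger fields.

    Raises ValueError when the amount is not numeric.
    """
    # Strip currency symbols and thousands separators, then cast.
    amount = float(str(raw.get("amount", 0)).replace("$", "").replace(",", ""))
    try:
        day = datetime.strptime(str(raw.get("date", "")), "%Y-%m-%d").date()
    except ValueError:
        day = date.today()  # same default the fallback parser uses
    return {
        "date": day.isoformat(),
        "description": str(raw.get("description") or fallback_description),
        "category": str(raw.get("category") or "Other"),
        "amount": amount,
    }


reply = 'Sure! {"date": "2024-05-01", "description": "Burrito", "amount": "$15.00"}'
print(normalize_expense(extract_json_object(reply), "burrito at Chipotle"))
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch one exception type for both a malformed reply and a bad amount.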
135
+
136
+ def parse_expense_fallback(user_input: str) -> Dict[str, Any]:
137
+ """
138
+ Fallback parser using regex and heuristics when LLM is unavailable.
139
+ """
140
+ import re
141
+
142
+ result = {
143
+ "date": datetime.now().strftime("%Y-%m-%d"),
144
+ "description": user_input,
145
+ "category": "Other",
146
+ "amount": 0.0
147
+ }
148
+
149
+ # Try to extract amount
150
+ amount_pattern = r"\$?(\d+(?:\.\d{2})?)"
151
+ amount_match = re.search(amount_pattern, user_input)
152
+ if amount_match:
153
+ result["amount"] = float(amount_match.group(1))
154
+
155
+ # Simple category detection
156
+ categories = {
157
+ "Food": ["food", "lunch", "dinner", "breakfast", "coffee", "restaurant", "burrito", "pizza", "eat"],
158
+ "Transportation": ["gas", "uber", "lyft", "taxi", "bus", "train", "parking", "car"],
159
+ "Utilities": ["electric", "water", "gas", "internet", "phone", "utility"],
160
+ "Entertainment": ["movie", "concert", "game", "book", "music"],
161
+ "Rent": ["rent", "apartment", "mortgage"],
162
+ }
163
+
164
+ user_lower = user_input.lower()
165
+ for category, keywords in categories.items():
166
+ if any(keyword in user_lower for keyword in keywords):
167
+ result["category"] = category
168
+ break
169
+
170
+ return result
171
+
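The amount regex in `parse_expense_fallback()` takes the first number in the input, so "Bought 2 coffees for $9.50" is logged as 2.00 and "$1,200" as 1.00. One possible refinement is a two-pass search that prefers a dollar-prefixed number and accepts thousands separators. `extract_amount` is a hypothetical helper, not part of this commit.

```python
import re
from typing import Optional

# A number with optional thousands separators and up to two decimals
_NUMBER = r"\d{1,3}(?:,\d{3})+(?:\.\d{1,2})?|\d+(?:\.\d{1,2})?"


def extract_amount(text: str) -> Optional[float]:
    """Return the most likely expense amount in `text`, or None."""
    # Pass 1: a number that directly follows a dollar sign
    match = re.search(rf"\$\s*({_NUMBER})", text)
    if match is None:
        # Pass 2: fall back to the first bare number
        match = re.search(rf"({_NUMBER})", text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


print(extract_amount("Bought 2 coffees for $9.50"))  # 9.5
print(extract_amount("Paid $1,200 for rent"))        # 1200.0
print(extract_amount("Gas: 45"))                     # 45.0
```

Returning `None` instead of `0.0` lets the caller distinguish "no amount found" from a genuine zero.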
172
+
173
+ def process_expense_entry(
174
+ user_input: str,
175
+ manager: ExpenseManager,
176
+ llm,
177
+ hf_ledger: HFHubLedger = None
178
+ ) -> Tuple[pd.DataFrame, str, str]:
179
+ """
180
+ Process user input, parse it, add to ledger, and return updated table.
181
+ """
182
+ if not user_input.strip():
183
+ return manager.get_dataframe(), "", "Please enter an expense description."
184
+
185
+ try:
186
+ # Parse the expense
187
+ parsed = parse_expense_with_llm(user_input, llm)
188
+
189
+ # Validate parsed data
190
+ if not parsed.get("amount") or float(parsed["amount"]) <= 0:
191
+ return manager.get_dataframe(), "", "❌ Error: Could not extract valid amount. Try again."
192
+
193
+ # Add to ledger
194
+ success = manager.add_entry(
195
+ date=parsed.get("date", datetime.now().strftime("%Y-%m-%d")),
196
+ description=parsed.get("description", user_input),
197
+ category=parsed.get("category", "Other"),
198
+ amount=float(parsed["amount"])
199
+ )
200
+
201
+ if success:
202
+ # Sync to HF Hub if enabled
203
+ if hf_ledger:
204
+ hf_ledger.save(manager.df)
205
+
206
+ total = manager.get_total_spending()
207
+ message = f"✅ Logged: ${float(parsed['amount']):.2f} - {parsed.get('description', user_input)}"
208
+ return manager.get_dataframe(), "", message
209
+ else:
210
+ return manager.get_dataframe(), "", "❌ Error adding entry. Please try again."
211
+
212
+ except Exception as e:
213
+ return manager.get_dataframe(), "", f"❌ Error: {str(e)}"
214
+
215
+
216
+ def build_interface(manager, llm, hf_ledger: HFHubLedger):
217
+ """Build the Gradio interface."""
218
+
219
+ def log_expense_callback(user_input: str) -> Tuple[pd.DataFrame, str, str, str]:
220
+ """Callback for log expense button."""
221
+ df, cleared_input, message = process_expense_entry(user_input, manager, llm, hf_ledger)
222
+ total = manager.get_total_spending()
223
+ total_md = f"### 💰 Total Spending: ${total:.2f}"
224
+ return df, cleared_input, message, total_md
225
+
226
+ with gr.Blocks(theme=gr.themes.Soft()) as demo:
227
+ gr.Markdown("# 💸 Personal Finance Manager")
228
+ gr.Markdown("Log your expenses using natural language. The AI will parse and categorize them for you.")
229
+ gr.Markdown(f"**Storage Status:** {hf_ledger.get_status()}")
230
+
231
+ with gr.Row():
232
+ with gr.Column(scale=3):
233
+ user_input = gr.Textbox(
234
+ label="Describe your expense",
235
+ placeholder="e.g., 'Spent $15 on a burrito at Chipotle' or 'Paid $1200 for rent'",
236
+ lines=2
237
+ )
238
+ with gr.Column(scale=1):
239
+ log_button = gr.Button("Log Expense", variant="primary", scale=1)
240
+
241
+ status_output = gr.Textbox(
242
+ label="Status",
243
+ interactive=False,
244
+ max_lines=1
245
+ )
246
+
247
+ total_display = gr.Markdown(f"### 💰 Total Spending: ${manager.get_total_spending():.2f}")
248
+
249
+ gr.Markdown("## 📊 Ledger")
250
+ ledger_table = gr.Dataframe(
251
+ value=manager.get_dataframe(),
252
+ interactive=False,
253
+ label="Expense Entries",
254
+ datatype=["str", "str", "str", "number"],
255
+ )
256
+
257
+ # Connect button click to callback
258
+ log_button.click(
259
+ fn=log_expense_callback,
260
+ inputs=[user_input],
261
+ outputs=[ledger_table, user_input, status_output, total_display]
262
+ )
263
+
264
+ # Allow Enter key to submit
265
+ user_input.submit(
266
+ fn=log_expense_callback,
267
+ inputs=[user_input],
268
+ outputs=[ledger_table, user_input, status_output, total_display]
269
+ )
270
+
271
+ return demo
272
+
273
+
274
+ def main():
275
+ """Main entry point."""
276
+ # Initialize HuggingFace Hub ledger
277
+ hf_ledger = HFHubLedger()
278
+
279
+ # Initialize components
280
+ manager = ExpenseManager()
281
+
282
+ # Load existing data from HF Hub if available
283
+ if hf_ledger.df is not None and not hf_ledger.df.empty:
284
+ manager.df = hf_ledger.df.copy()
285
+ logger.info(f"Loaded {len(manager.df)} entries from persistent storage")
286
+
287
+ llm = initialize_llm()
288
+
289
+ if not llm:
290
+ logger.warning("⚠️ Warning: LLM not available. Using fallback parser.")
291
+
292
+ # Build and launch interface
293
+ demo = build_interface(manager, llm, hf_ledger)
294
+ demo.launch(share=False)
295
+
296
+
297
+ if __name__ == "__main__":
298
+ main()
hf_storage.py ADDED
@@ -0,0 +1,267 @@
1
+ """HuggingFace Hub storage integration for persistent ledger management."""
2
+
3
+ import os
4
+ import time
5
+ import pandas as pd
6
+ import tempfile
7
+ from pathlib import Path
8
+ from typing import Optional
9
+ import logging
10
+
11
+ logger = logging.getLogger(__name__)
12
+
13
+
14
+ class HFHubLedger:
15
+ """Manages ledger CSV persistence using HuggingFace Hub storage."""
16
+
17
+ def __init__(
18
+ self,
19
+ hf_token: Optional[str] = None,
20
+ repo_id: Optional[str] = None,
21
+ repo_type: str = "dataset",
22
+ csv_filename: str = "ledger.csv",
23
+ local_cache_dir: str = "./cache",
24
+ max_retries: int = 3,
25
+ retry_delay: float = 1.0,
26
+ ):
27
+ """
28
+ Initialize HuggingFace Hub ledger storage.
29
+
30
+ Args:
31
+ hf_token: HuggingFace API token (falls back to the HF_TOKEN, then HUGGINGFACEHUB_API_TOKEN, env var)
32
+ repo_id: Repository ID in format "username/repo-name" (falls back to the HF_REPO_ID env var)
33
+ repo_type: Type of repo ("dataset", "model", "space")
34
+ csv_filename: Name of the CSV file in the repo
35
+ local_cache_dir: Local directory for caching
36
+ max_retries: Maximum number of upload retries
37
+ retry_delay: Initial delay between retries (exponential backoff)
38
+ """
39
+ self.hf_token = hf_token or os.getenv("HF_TOKEN") or os.getenv("HUGGINGFACEHUB_API_TOKEN")
40
+ self.repo_id = repo_id or os.getenv("HF_REPO_ID")
41
+ self.repo_type = repo_type
42
+ self.csv_filename = csv_filename
43
+ self.local_cache_dir = local_cache_dir
44
+ self.max_retries = max_retries
45
+ self.retry_delay = retry_delay
46
+ self.enabled = bool(self.hf_token and self.repo_id)
47
+ self.df = None
48
+
49
+ # Create local cache directory
50
+ Path(self.local_cache_dir).mkdir(parents=True, exist_ok=True)
51
+ self.local_csv_path = Path(self.local_cache_dir) / self.csv_filename
52
+
53
+ if self.enabled:
54
+ logger.info(f"HF Hub storage enabled: {self.repo_id}")
55
+ self._ensure_repo_exists()
56
+ self._load_from_hub()
57
+ else:
58
+ logger.warning("HF Hub storage disabled. Set HF_TOKEN and HF_REPO_ID to enable.")
59
+ self._load_local_or_create()
60
+
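The constructor resolves its settings in a fixed order: an explicit argument first, then `HF_TOKEN`, then `HUGGINGFACEHUB_API_TOKEN`, with Hub storage enabled only when both a token and a repo ID are present. A standalone sketch of that lookup follows. `StorageConfig` and `resolve_storage_config` are hypothetical names, and the environment is passed in as a mapping so the logic can be tried without touching real variables.

```python
from typing import Mapping, NamedTuple, Optional


class StorageConfig(NamedTuple):
    token: Optional[str]
    repo_id: Optional[str]
    enabled: bool


def resolve_storage_config(
    env: Mapping[str, str],
    hf_token: Optional[str] = None,
    repo_id: Optional[str] = None,
) -> StorageConfig:
    """Mirror the lookup order in __init__: explicit argument, then env vars."""
    token = hf_token or env.get("HF_TOKEN") or env.get("HUGGINGFACEHUB_API_TOKEN")
    repo = repo_id or env.get("HF_REPO_ID")
    # bool(...) keeps `enabled` a real boolean rather than a string or None
    return StorageConfig(token, repo, bool(token and repo))


# Placeholder values for illustration only
print(resolve_storage_config({"HF_TOKEN": "hf_example", "HF_REPO_ID": "user/ledger"}).enabled)  # True
print(resolve_storage_config({"HUGGINGFACEHUB_API_TOKEN": "hf_example"}).enabled)               # False
```

In real use the mapping would be `os.environ`.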
61
+ def _ensure_repo_exists(self) -> bool:
62
+ """
63
+ Ensure the HuggingFace Hub repository exists.
64
+
65
+ Returns:
66
+ True if repo exists or was created, False otherwise
67
+ """
68
+ try:
69
+ from huggingface_hub import create_repo, repo_exists
70
+
71
+ if repo_exists(self.repo_id, repo_type=self.repo_type, token=self.hf_token):
72
+ logger.info(f"Repository {self.repo_id} exists")
73
+ return True
74
+
75
+ # Create repo if it doesn't exist
76
+ repo_url = create_repo(
77
+ self.repo_id,
78
+ repo_type=self.repo_type,
79
+ private=True,
80
+ exist_ok=True,
81
+ token=self.hf_token,
82
+ )
83
+ logger.info(f"Created repository: {repo_url}")
84
+ return True
85
+ except Exception as e:
86
+ logger.error(f"Failed to ensure repo exists: {e}")
87
+ return False
88
+
89
+ def _load_from_hub(self) -> bool:
90
+ """
91
+ Download and load CSV from HuggingFace Hub.
92
+
93
+ Returns:
94
+ True if successful, False otherwise
95
+ """
96
+ try:
97
+ from huggingface_hub import hf_hub_download
98
+
99
+ logger.info(f"Attempting to download {self.csv_filename} from {self.repo_id}")
100
+
101
+ file_path = hf_hub_download(
102
+ repo_id=self.repo_id,
103
+ filename=self.csv_filename,
104
+ repo_type=self.repo_type,
105
+ token=self.hf_token,
106
+ cache_dir=self.local_cache_dir,
107
+ )
108
+
109
+ # Load CSV
110
+ self.df = pd.read_csv(file_path)
111
+ self.df["Date"] = pd.to_datetime(self.df["Date"])
112
+ self.df["Amount"] = pd.to_numeric(self.df["Amount"])
113
+ self.df = self.df.sort_values("Date", ascending=False).reset_index(drop=True)
114
+
115
+ logger.info(f"Loaded {len(self.df)} entries from HF Hub")
116
+ return True
117
+
118
+ except Exception as e:
119
+ logger.warning(f"Could not load from Hub: {e}. Falling back to local cache.")
120
+ self._load_local_or_create()
121
+ return False
122
+
123
+ def _load_local_or_create(self) -> bool:
124
+ """
125
+ Load CSV from local cache or create new DataFrame.
126
+
127
+ Returns:
128
+ True if loaded, False if created new
129
+ """
130
+ if self.local_csv_path.exists():
131
+ try:
132
+ self.df = pd.read_csv(self.local_csv_path)
133
+ self.df["Date"] = pd.to_datetime(self.df["Date"])
134
+ self.df["Amount"] = pd.to_numeric(self.df["Amount"])
135
+ logger.info(f"Loaded {len(self.df)} entries from local cache")
136
+ return True
137
+ except Exception as e:
138
+ logger.warning(f"Failed to load local CSV: {e}")
139
+
140
+ # Create new empty DataFrame
141
+ self.df = pd.DataFrame(columns=["Date", "Description", "Category", "Amount"])
142
+ self.df["Date"] = pd.to_datetime(self.df["Date"])
143
+ self.df["Amount"] = pd.to_numeric(self.df["Amount"])
144
+ logger.info("Created new empty ledger")
145
+ return False
146
+
147
+ def save(self, df: pd.DataFrame) -> bool:
148
+ """
149
+ Save DataFrame to local cache and optionally to HF Hub.
150
+
151
+ Args:
152
+ df: DataFrame to save
153
+
154
+ Returns:
155
+ True if successful, False otherwise
156
+ """
157
+ try:
158
+ # Save locally first
159
+ df_copy = df.copy()
160
+ df_copy["Date"] = df_copy["Date"].dt.strftime("%Y-%m-%d")
161
+ df_copy.to_csv(self.local_csv_path, index=False)
162
+ self.df = df
163
+
164
+ # Upload to Hub if enabled
165
+ if self.enabled:
166
+ return self._upload_to_hub_with_retry()
167
+
168
+ return True
169
+ except Exception as e:
170
+ logger.error(f"Failed to save ledger: {e}")
171
+ return False
172
+
173
+ def _upload_to_hub_with_retry(self) -> bool:
174
+ """
175
+ Upload CSV to HuggingFace Hub with exponential backoff retry.
176
+
177
+ Returns:
178
+ True if successful, False otherwise
179
+ """
180
+ for attempt in range(self.max_retries):
181
+ try:
182
+ from huggingface_hub import upload_file
183
+
184
+ logger.info(f"Uploading to HF Hub (attempt {attempt + 1}/{self.max_retries})")
185
+
186
+ upload_file(
187
+ path_or_fileobj=str(self.local_csv_path),
188
+ path_in_repo=self.csv_filename,
189
+ repo_id=self.repo_id,
190
+ repo_type=self.repo_type,
191
+ token=self.hf_token,
192
+ commit_message=f"Auto-save ledger at {pd.Timestamp.now()}",
193
+ )
194
+
195
+ logger.info("Successfully uploaded to HF Hub")
196
+ return True
197
+
198
+ except Exception as e:
199
+ wait_time = self.retry_delay * (2 ** attempt) # Exponential backoff
200
+ logger.warning(f"Upload failed (attempt {attempt + 1}): {e}")
201
+
202
+ if attempt < self.max_retries - 1:
203
+ logger.info(f"Retrying in {wait_time:.1f}s...")
204
+ time.sleep(wait_time)
205
+ else:
206
+ logger.error(f"Failed to upload after {self.max_retries} attempts")
207
+ return False
208
+
209
+ return False
210
+
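The retry loop above waits `retry_delay * 2 ** attempt` seconds between attempts, which gives 1 s and then 2 s with the defaults. The same pattern can be written as a reusable helper. This is a sketch under stated assumptions: `retry_with_backoff` is a hypothetical name, it re-raises the last error where the method above returns `False`, and the `sleep` parameter exists only so the demo runs instantly.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 3,
    retry_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_retries` attempts are used."""
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: retry_delay * 1, 2, 4, ...
            sleep(retry_delay * (2 ** attempt))
    raise ValueError("max_retries must be at least 1")


# Demo with a fake sleep that records the requested delays
waits: List[float] = []
calls = {"n": 0}


def flaky_upload() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network failure")
    return "uploaded"


result = retry_with_backoff(flaky_upload, sleep=waits.append)
print(result, waits)  # uploaded [1.0, 2.0]
```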
211
+ def get_dataframe(self) -> pd.DataFrame:
212
+ """Return a copy of the current DataFrame."""
213
+ if self.df is None:
214
+ return pd.DataFrame(columns=["Date", "Description", "Category", "Amount"])
215
+ return self.df.copy()
216
+
217
+ def add_entry(self, date: str, description: str, category: str, amount: float) -> bool:
218
+ """
219
+ Add a new entry and save.
220
+
221
+ Args:
222
+ date: Date in YYYY-MM-DD format
223
+ description: Expense description
224
+ category: Expense category
225
+ amount: Amount in dollars
226
+
227
+ Returns:
228
+ True if successful, False otherwise
229
+ """
230
+ try:
231
+ new_entry = pd.DataFrame({
232
+ "Date": [pd.to_datetime(date)],
233
+ "Description": [description],
234
+ "Category": [category],
235
+ "Amount": [float(amount)]
236
+ })
237
+ self.df = pd.concat([self.df, new_entry], ignore_index=True)
238
+ self.df = self.df.sort_values("Date", ascending=False).reset_index(drop=True)
239
+
240
+ # Save immediately
241
+ return self.save(self.df)
242
+ except Exception as e:
243
+ logger.error(f"Failed to add entry: {e}")
244
+ return False
245
+
246
+ def get_total_spending(self) -> float:
247
+ """Calculate and return total spending."""
248
+ if self.df is None or self.df.empty:
249
+ return 0.0
250
+ return float(self.df["Amount"].sum())
251
+
252
+ def get_category_summary(self) -> dict:
253
+ """Get spending summary by category."""
254
+ if self.df is None or self.df.empty:
255
+ return {}
256
+ return self.df.groupby("Category")["Amount"].sum().to_dict()
257
+
258
+ def is_enabled(self) -> bool:
259
+ """Check if HF Hub storage is enabled."""
260
+ return self.enabled
261
+
262
+ def get_status(self) -> str:
263
+ """Get human-readable status string."""
264
+ if self.enabled:
265
+ return f"✅ HF Hub: {self.repo_id}"
266
+ else:
267
+ return "⚠️ Local cache only (HF Hub disabled)"
requirements.txt ADDED
@@ -0,0 +1,5 @@
1
+ gradio>=4.0.0
2
+ pandas>=2.0.0
3
+ langchain>=0.1.0
4
+ huggingface-hub>=0.17.0
5
+ python-dotenv>=1.0.0
utils.py ADDED
@@ -0,0 +1,225 @@
1
+ """Utility functions for the Finance Manager application."""
2
+
3
+ import pandas as pd
4
+ import os
5
+ from datetime import datetime
6
+ from typing import Optional
7
+
8
+
9
+ class CSVLedger:
10
+ """Handles CSV persistence for the expense ledger."""
11
+
12
+ def __init__(self, filepath: str = "ledger.csv"):
13
+ """
14
+ Initialize the CSV ledger handler.
15
+
16
+ Args:
17
+ filepath: Path to the CSV file
18
+ """
19
+ self.filepath = filepath
20
+ self.df = self._load_or_create()
21
+
22
+ def _load_or_create(self) -> pd.DataFrame:
23
+ """Load existing CSV or create new DataFrame."""
24
+ if os.path.exists(self.filepath):
25
+ try:
26
+ df = pd.read_csv(self.filepath)
27
+ df["Date"] = pd.to_datetime(df["Date"])
28
+ df["Amount"] = pd.to_numeric(df["Amount"])
29
+ return df.sort_values("Date", ascending=False).reset_index(drop=True)
30
+ except Exception as e:
31
+ print(f"Error loading CSV: {e}. Creating new ledger.")
32
+
33
+ return pd.DataFrame(columns=["Date", "Description", "Category", "Amount"])
34
+
35
+ def save(self, df: pd.DataFrame) -> bool:
36
+ """
37
+ Save DataFrame to CSV.
38
+
39
+ Args:
40
+ df: DataFrame to save
41
+
42
+ Returns:
43
+ True if successful, False otherwise
44
+ """
45
+ try:
46
+ # Convert datetime to string for CSV
47
+ df_copy = df.copy()
48
+ df_copy["Date"] = df_copy["Date"].dt.strftime("%Y-%m-%d")
49
+ df_copy.to_csv(self.filepath, index=False)
50
+ return True
51
+ except Exception as e:
52
+ print(f"Error saving CSV: {e}")
53
+ return False
54
+
55
+ def append_from_dataframe(self, df: pd.DataFrame) -> bool:
56
+ """
57
+ Append DataFrame entries to CSV.
58
+
59
+ Args:
60
+ df: DataFrame with new entries
61
+
62
+ Returns:
63
+ True if successful, False otherwise
64
+ """
65
+ self.df = pd.concat([self.df, df], ignore_index=True)
66
+ self.df = self.df.sort_values("Date", ascending=False).reset_index(drop=True)
67
+ return self.save(self.df)
68
+
69
+
70
+ def format_currency(amount: float) -> str:
71
+ """
72
+ Format amount as USD currency.
73
+
74
+ Args:
75
+ amount: Numeric amount
76
+
77
+ Returns:
78
+ Formatted string like "$123.45"
79
+ """
80
+ return f"${amount:,.2f}"
81
+
82
+
83
+ def parse_date_flexible(date_str: Optional[str]) -> str:
84
+ """
85
+ Parse various date formats and return ISO format (YYYY-MM-DD).
86
+
87
+ Args:
88
+ date_str: Date string in various formats or None
89
+
90
+ Returns:
91
+ ISO format date string
92
+ """
93
+ if not date_str or date_str.lower() == "today" or date_str.lower() == "now":
94
+ return datetime.now().strftime("%Y-%m-%d")
95
+
96
+ # Try common formats
97
+ formats = [
98
+ "%Y-%m-%d",
99
+ "%m/%d/%Y",
100
+ "%m/%d/%y",
101
+ "%m-%d-%Y",
102
+ "%d/%m/%Y",
103
+ "%Y/%m/%d",
104
+ ]
105
+
106
+ for fmt in formats:
107
+ try:
108
+ dt = datetime.strptime(date_str.strip(), fmt)
109
+ return dt.strftime("%Y-%m-%d")
110
+ except ValueError:
111
+ continue
112
+
113
+ # Default to today
114
+ return datetime.now().strftime("%Y-%m-%d")
115
+
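Because `parse_date_flexible()` returns the result of the first format that parses, slash-separated dates are read month-first whenever that is valid, and day-first only when the first number cannot be a month. The sketch below reproduces the format list to show this. `first_matching_format` is a hypothetical stand-in that raises on failure, where the function above falls back to today's date.

```python
from datetime import datetime

# Same candidate formats, in the same order, as parse_date_flexible()
FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y", "%m-%d-%Y", "%d/%m/%Y", "%Y/%m/%d"]


def first_matching_format(date_str: str) -> str:
    """Return the ISO date produced by the first format that parses."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(date_str.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {date_str!r}")


print(first_matching_format("03/04/2024"))  # 2024-03-04 (read as March 4)
print(first_matching_format("25/12/2024"))  # 2024-12-25 (25 is not a month, so day-first applies)
```

Users who write dates day-first should therefore prefer the unambiguous `YYYY-MM-DD` form.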
116
+
117
+ def get_spending_summary(df: pd.DataFrame) -> dict:
118
+ """
119
+ Generate spending summary by category.
120
+
121
+ Args:
122
+ df: Expense DataFrame
123
+
124
+ Returns:
125
+ Dictionary with category totals
126
+ """
127
+ if df.empty:
128
+ return {}
129
+
130
+ summary = df.groupby("Category")["Amount"].agg(["sum", "count"]).to_dict("index")
131
+ return {
132
+ cat: {
133
+ "total": values["sum"],
134
+ "count": int(values["count"]),
135
+ "average": values["sum"] / values["count"]
136
+ }
137
+ for cat, values in summary.items()
138
+ }
139
+
140
+
141
+ def get_daily_summary(df: pd.DataFrame) -> pd.DataFrame:
142
+ """
143
+ Generate daily spending summary.
144
+
145
+ Args:
146
+ df: Expense DataFrame
147
+
148
+ Returns:
149
+ DataFrame with daily totals
150
+ """
151
+ if df.empty:
152
+ return pd.DataFrame(columns=["Date", "Total", "Count"])
153
+
154
+ daily = df.groupby(df["Date"].dt.date).agg({
155
+ "Amount": ["sum", "count"]
156
+ }).reset_index()
157
+ daily.columns = ["Date", "Total", "Count"]
158
+ return daily.sort_values("Date", ascending=False)
159
+
160
+
161
+ def validate_expense_data(date: str, description: str, category: str, amount: float) -> tuple[bool, str]:
162
+ """
163
+ Validate expense entry data.
164
+
165
+ Args:
166
+ date: Date string
167
+ description: Expense description
168
+ category: Expense category
169
+ amount: Amount in dollars
170
+
171
+ Returns:
172
+ Tuple of (is_valid, error_message)
173
+ """
174
+ errors = []
175
+
176
+ # Validate date
177
+ if not date:
178
+ errors.append("Date is required")
179
+ else:
180
+ try:
181
+ datetime.strptime(date, "%Y-%m-%d")
182
+ except ValueError:
183
+ errors.append("Date must be in YYYY-MM-DD format")
184
+
185
+ # Validate description
186
+ if not description or len(description.strip()) == 0:
187
+ errors.append("Description is required")
188
+ elif len(description) > 500:
189
+ errors.append("Description is too long (max 500 characters)")
190
+
191
+ # Validate category
192
+ if not category or len(category.strip()) == 0:
193
+ errors.append("Category is required")
194
+
195
+ # Validate amount
196
+ if amount is None or amount <= 0:
197
+ errors.append("Amount must be greater than 0")
198
+ elif amount > 999999.99:
199
+ errors.append("Amount is too large (max $999,999.99)")
200
+
201
+ if errors:
202
+ return False, "\n".join(errors)
203
+
204
+ return True, ""
205
+
206
+
207
+ def export_to_csv(df: pd.DataFrame, filepath: str) -> bool:
208
+ """
209
+ Export DataFrame to CSV file.
210
+
211
+ Args:
212
+ df: DataFrame to export
213
+ filepath: Output file path
214
+
215
+ Returns:
216
+ True if successful, False otherwise
217
+ """
218
+ try:
219
+ df_copy = df.copy()
220
+ df_copy["Date"] = df_copy["Date"].dt.strftime("%Y-%m-%d")
221
+ df_copy.to_csv(filepath, index=False)
222
+ return True
223
+ except Exception as e:
224
+ print(f"Error exporting to CSV: {e}")
225
+ return False