dystomachina committed
Commit 203d1e9 · Parent: 8803281

refactor(data-fetcher): simplify data fetcher to use only YFinance


BREAKING CHANGE: Removed FMP data fetcher and related configuration options

- Removed src/fmp.py module and all related tests
- Consolidated DataFetcherSingleton into a single implementation in src/stockdata.py
- Removed data_source configuration option from folio.yaml
- Simplified create_data_fetcher to only support YFinance
- Updated all imports and usages to use the simplified data fetcher
- Removed scripts/check_beta.py and tests/fetch_sample_data.py that depended on FMP
- Updated documentation to reflect YFinance as the only supported data source

This change simplifies the codebase by removing unnecessary abstraction and
focusing on a single data source implementation.
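
The `create_data_fetcher` simplification can be pictured with a minimal sketch. Everything below is hypothetical — `YFinanceDataFetcher`, `cache_ttl`, and the module-level singleton are assumed names illustrating the bullet points above, not the actual contents of `src/stockdata.py`:

```python
# Hypothetical sketch of the simplified factory in src/stockdata.py.
# YFinanceDataFetcher and cache_ttl are assumed names, not the real API.

class YFinanceDataFetcher:
    """Fetches stock data from Yahoo Finance via the yfinance package."""

    def __init__(self, cache_ttl: int = 300):
        self.cache_ttl = cache_ttl


_instance = None  # module-level singleton, replacing DataFetcherSingleton


def create_data_fetcher(cache_ttl: int = 300) -> YFinanceDataFetcher:
    """Return the single YFinance-backed fetcher; no data_source switch needed."""
    global _instance
    if _instance is None:
        _instance = YFinanceDataFetcher(cache_ttl=cache_ttl)
    return _instance
```

The point of the change is visible even in this sketch: with only one backend, the factory no longer needs a `data_source` switch or a separate singleton class.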

.env.example CHANGED
@@ -1,8 +1,6 @@
 # Environment variables for the Folio application
 
 # API Keys
-# FMP (FinancialModelingPrep) API - not required when using yfinance:
-FMP_API_KEY=your_fmp_api_key_here
 
 # Used for portfolio analysis premium feature
 GEMINI_API_KEY=your_gemini_api_key_here
.gitignore CHANGED
@@ -80,5 +80,5 @@ src/lab/portfolio.csv
 .executive.md
 private-data/*
 
-# Documentation folder (temporary exclusion)
-docs/
+# Local documentation folder
+.docs/
DOCKER.md CHANGED
@@ -99,7 +99,7 @@ If you prefer to use Docker directly without Make:
 
 - **Port conflicts**: If port 8050 is already in use, modify the `PORT` environment variable in your `.env` file and update the port mapping in `docker-compose.yml`.
 
-- **Data source issues**: By default, the application uses yfinance as the data source. If you want to use FMP instead, you'll need to set the FMP_API_KEY in your `.env` file and change DATA_SOURCE to 'fmp'.
+- **Data source**: The application uses yfinance as the data source for stock data.
 
 - **Volume mounting**: If you're making changes to the code and want to see them reflected immediately, ensure the volumes in `docker-compose.yml` are correctly mapping your local directories.
 
README.md CHANGED
@@ -17,20 +17,19 @@ Folio is a powerful web-based dashboard for analyzing and optimizing your invest
 
 - **Complete Portfolio Visibility**: See your entire financial picture in one place
 - **Smart Risk Assessment**: Understand your portfolio's risk profile with beta analysis
-- **AI-Powered Insights**: Get personalized investment advice from our AI portfolio advisor
 - **Cash & Equivalents Detection**: Automatically identifies money market and cash-like positions
-- **Option Analytics**: Detailed metrics for options including implied volatility and Greeks
+- **Option Analytics**: Detailed metrics for options including delta exposure and notional value
 - **Zero Cost**: Free to use, with no hidden fees or subscriptions
 
 ## Key Features
 
 - **Portfolio Summary**: View total exposure, beta, and allocation breakdown
 - **Position Details**: Analyze individual positions with detailed metrics
-- **AI Portfolio Advisor**: Get personalized investment advice powered by Google's Gemini AI
+- **Position Grouping**: Automatically groups stocks with their related options
+- **P&L Visualization**: See potential profit/loss scenarios for option strategies
 - **Filtering & Sorting**: Filter by position type and sort by various metrics
 - **Real-time Data**: Uses Yahoo Finance API for up-to-date market data
 - **Responsive Design**: Works seamlessly on desktop and mobile devices
-- **Dark Mode**: Easy on the eyes for late-night financial analysis
 
 ## Getting Started
 
@@ -99,12 +98,14 @@ For more Docker commands and options, see [DOCKER.md](DOCKER.md).
 
 For information about logging configuration, see [docs/logging.md](docs/logging.md).
 
+For a detailed explanation of the project architecture, see [docs/project-design.md](docs/project-design.md).
+
 ## Using Folio
 
 1. **Upload Your Portfolio**: Use the upload button to import a CSV file with your holdings
 2. **Explore Your Data**: View summary metrics and detailed breakdowns of your investments
 3. **Filter and Sort**: Focus on specific asset types or metrics that matter to you
-4. **Get AI Insights**: Click the "Robot Advisor" button to get personalized advice about your portfolio
+4. **Analyze Positions**: Click on any position to see detailed metrics and P&L scenarios
 5. **Export or Share**: Save your analysis or share insights with your financial advisor
 
 ## Sample Portfolio
docs/ai-rules.md ADDED
@@ -0,0 +1,14 @@
---
description: Miscellaneous rules to get the AI to behave
globs: *
alwaysApply: true
---
# General rules for AI
- Prior to generating any code, carefully read the project conventions
- Read [project-design.md](docs/project-design.md) to understand the codebase
- Read [project-conventions.md](docs/project-conventions.md) to understand _how_ to write code for the codebase

## Prohibited actions

- Do not run `make folio`. This is for the user to run only.
- Do not use `git` commands unless explicitly asked.
docs/coding-guidelines.md ADDED
@@ -0,0 +1,405 @@
# A Concise, Opinionated Guide to Writing Good Code (with Python examples)

This guide summarizes core principles for writing clean, maintainable, and effective code. It's opinionated and rule-based, designed to provide clear direction for junior developers. Adhering to these rules will help you build better software and become a more valuable team member. Python examples are provided for clarity.

## 1. Naming Matters Immensely

* **Rule:** Use intention-revealing names.
  * **Don't:** `d = (datetime.now() - start_date).days`
  * **Do:** `elapsed_time_in_days = (datetime.now() - start_date).days`
* **Rule:** Avoid disinformation.
  * **Don't:** `account_list = {"id": 1, "name": "Alice"}` (It's a dictionary, not a list)
  * **Do:** `account_data = {"id": 1, "name": "Alice"}` or `account_dict = ...`
* **Rule:** Use pronounceable and searchable names.
  * **Don't:** `genymdhms = datetime.now().strftime('%Y%m%d%H%M%S')`
  * **Do:** `generation_timestamp = datetime.now().strftime('%Y%m%d%H%M%S')`
* **Rule:** Be consistent.
  * **Don't:** Using `fetch_user_data`, `getUserInfo`, `retrieve_client_details` in the same project.
  * **Do:** Consistently use one style, e.g., `get_user_data`, `get_order_info`, `get_product_details`.

## 2. Functions Should Be Small and Focused

* **Rule:** Functions must do **one thing**.
  * **Don't:**
    ```python
    def process_user_data(user_id):
        # Fetches data
        response = requests.get(f"/api/users/{user_id}")
        user_data = response.json()
        # Validates data
        if not user_data.get("email"):
            raise ValueError("Email missing")
        # Saves data
        db.save(user_data)
        # Sends notification
        send_email(user_data["email"], "Welcome!")
        return user_data
    ```
  * **Do:** Break it down:
    ```python
    def fetch_user_data(user_id):
        response = requests.get(f"/api/users/{user_id}")
        response.raise_for_status()  # Raise HTTP errors
        return response.json()

    def validate_user_data(user_data):
        if not user_data.get("email"):
            raise ValueError("Email missing")
        # ... other validations

    def save_user_data(user_data):
        db.save(user_data)

    def send_welcome_email(email_address):
        send_email(email_address, "Welcome!")

    def register_user(user_id):
        user_data = fetch_user_data(user_id)
        validate_user_data(user_data)
        save_user_data(user_data)
        send_welcome_email(user_data["email"])
        return user_data
    ```
* **Rule:** Functions must be **small**. (The "Do" example above also illustrates this.)
* **Rule:** Minimize function arguments.
  * **Don't:** `def create_user(name, email, password, dob, address, phone, role, is_active): ...`
  * **Do:**
    ```python
    class UserProfile:
        def __init__(self, name, email, dob, address, phone):
            # ... initialization ...

    def create_user(profile: UserProfile, password: str, role: str, is_active: bool = True):
        # ... use profile attributes ...
    ```
    Or pass a dictionary:
    ```python
    def create_user(user_details: dict):
        # Access details via user_details['name'], user_details['email'], etc.
        # Consider using TypedDict for better structure if using Python 3.8+
        ...
    ```
* **Rule:** Avoid side effects where possible.
  * **Don't (Hidden Side Effect):**
    ```python
    user_list = []

    def add_user_if_valid(name, email):
        if "@" in email:
            user_list.append({"name": name, "email": email})  # Modifies global state
            return True
        return False
    ```
  * **Do (Explicit):**
    ```python
    def create_user_record(name, email):
        if "@" not in email:
            raise ValueError("Invalid email")
        return {"name": name, "email": email}

    # Usage
    try:
        new_user = create_user_record("Bob", "bob@example.com")
        user_list.append(new_user)  # State change happens outside the function
    except ValueError as e:
        print(f"Error: {e}")
    ```

## 3. Comments Are for "Why," Not "What"

* **Rule:** Comment the "Why," not the "What."
  * **Don't:**
    ```python
    # Check if user is eligible
    if age >= 18 and country == "US":  # This just repeats the code
        is_eligible = True
    ```
  * **Do:**
    ```python
    # User must be a legal adult in the US to qualify for this specific offer.
    if age >= 18 and country == "US":
        is_eligible = True
    ```
* **Rule:** Do **not** leave commented-out code.
  * **Don't:**
    ```python
    def calculate_total(items):
        total = 0
        for item in items:
            total += item['price']
        # tax = total * 0.10  # Old tax calculation
        # total += tax
        total *= 1.10  # Apply 10% tax
        return total
    ```
  * **Do:** Remove the commented lines. Use Git history if you need to see the old calculation.
    ```python
    def calculate_total(items):
        total = sum(item['price'] for item in items)
        total *= 1.10  # Apply 10% tax
        return total
    ```
* **Rule:** Keep comments up-to-date. (Self-explanatory: if the logic changes, update or remove the comment.)
* **Rule:** Avoid redundant comments.
  * **Don't:**
    ```python
    count = 0  # Initialize count
    count += 1  # Increment count
    ```
  * **Do:** Just the code is enough.
    ```python
    count = 0
    count += 1
    ```

## 4. Formatting and Structure Enhance Readability

* **Rule:** Use a consistent style guide (e.g., PEP 8 for Python). Use tools like `Black`, `Flake8`, `isort`.
  * **Don't:** Inconsistent spacing, line lengths, import orders.
  * **Do:** Code automatically formatted by tools like `Black`.
* **Rule:** Top-down narrative.
  * **Don't:** Define helper functions *before* the main function that uses them, forcing the reader to jump around.
  * **Do:**
    ```python
    def main_process():
        data = _fetch_data()
        result = _process_data(data)
        _save_result(result)

    # --- Helper functions defined below ---
    def _fetch_data(): ...
    def _process_data(data): ...
    def _save_result(result): ...
    ```
    *(Note: A leading underscore `_` often indicates internal/helper functions.)*
* **Rule:** Keep related concepts vertically close. (The example above also shows this.)
* **Rule:** Use whitespace.
  * **Don't:**
    ```python
    def process(a,b,c):
        x=a+b
        y=x*c
        if y>10:
            print("Large")
        else:
            print("Small")
        z=y-a
        return z
    ```
  * **Do:**
    ```python
    def process(a, b, c):
        intermediate_value = a + b
        final_value = intermediate_value * c

        if final_value > 10:
            print("Large")
        else:
            print("Small")

        adjusted_value = final_value - a
        return adjusted_value
    ```

## 5. Keep It Simple (KISS & YAGNI)

* **Rule:** KISS (Keep It Simple, Stupid).
  * **Don't:** Using complex metaprogramming or obscure language features when a simple loop or conditional would suffice.
  * **Do:** Prefer straightforward, readable solutions.
* **Rule:** YAGNI (You Ain't Gonna Need It).
  * **Don't:** Adding configuration options, database fields, or API endpoints for features that *might* be needed in the future but aren't required now.
  * **Do:** Implement only what's necessary for the current requirements.
* **Rule:** Avoid premature optimization.
  * **Don't:** Spending hours micro-optimizing a function with string concatenations before profiling to see if it's even a bottleneck.
  * **Do:** Write clean code first. If performance is an issue (measure it!), profile and optimize the specific hotspots. Often, a better algorithm beats micro-optimization.

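The KISS and YAGNI rules above can be made concrete with a small sketch (the exporter scenario and all names here are invented for this guide):

```python
# YAGNI: don't build a pluggable exporter framework when one format is required.

# ❌ Speculative generality: a base class and registry nobody asked for
class Exporter:
    def export(self, rows):
        raise NotImplementedError

class CsvExporter(Exporter):
    def export(self, rows):
        return "\n".join(",".join(map(str, r)) for r in rows)

EXPORTERS = {"csv": CsvExporter()}  # only ever one entry

def export(rows, fmt="csv"):
    return EXPORTERS[fmt].export(rows)

# ✅ YAGNI: the current requirement is CSV, so write exactly that
def export_csv(rows):
    return "\n".join(",".join(map(str, r)) for r in rows)
```

If a second format ever becomes a real requirement, introducing the abstraction then is a small, well-informed refactor; introducing it now is pure speculation.
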
## 6. Don't Repeat Yourself (DRY)

* **Rule:** Avoid duplication.
  * **Don't:**
    ```python
    def process_file_a(path):
        # 10 lines of validation logic
        if not valid: return None
        # Process file A specific logic
        ...

    def process_file_b(path):
        # Same 10 lines of validation logic copied here
        if not valid: return None
        # Process file B specific logic
        ...
    ```
  * **Do:**
    ```python
    def _validate_input(path):
        # 10 lines of validation logic
        return is_valid

    def process_file_a(path):
        if not _validate_input(path): return None
        # Process file A specific logic
        ...

    def process_file_b(path):
        if not _validate_input(path): return None
        # Process file B specific logic
        ...
    ```

## 7. Handle Errors Gracefully

* **Rule:** Use exceptions over error codes.
  * **Don't:**
    ```python
    def divide(a, b):
        if b == 0:
            return -1  # Error code
        return a / b

    result = divide(10, 0)
    if result == -1:
        print("Error: Division by zero")
    ```
  * **Do:**
    ```python
    def divide(a, b):
        if b == 0:
            raise ValueError("Cannot divide by zero")
        return a / b

    try:
        result = divide(10, 0)
    except ValueError as e:
        print(f"Error: {e}")
    ```
* **Rule:** Provide context with errors.
  * **Don't:** `raise Exception("Error!")`
  * **Do:** `raise ValueError(f"Invalid user ID format: '{user_id_str}'")`

## 8. Test Your Code

* **Rule:** Write unit tests (using frameworks like `pytest` or `unittest`).
  * **Don't:** Skipping tests because the code "looks simple."
  * **Do:**
    ```python
    # Example using pytest
    from my_module import add

    def test_add_positive_numbers():
        assert add(2, 3) == 5

    def test_add_negative_numbers():
        assert add(-1, -1) == -2

    def test_add_mixed_numbers():
        assert add(5, -3) == 2
    ```
* **Rule:** Test behavior, not implementation.
  * **Don't:** Writing a test that checks if a specific private helper method (`_helper`) was called.
  * **Do:** Writing a test that checks if the public method produces the correct output or state change, regardless of which internal helpers were used.
* **Rule:** Keep tests clean, readable, and fast. (Apply the same principles from this guide to your test code.)

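A minimal sketch of the behavior-vs-implementation distinction (the `ShoppingCart` class and its discount policy are invented for illustration):

```python
class ShoppingCart:
    def __init__(self):
        self._items = []

    def _apply_discount(self, total):  # private helper; an implementation detail
        return total * 0.9 if total > 100 else total

    def add(self, price):
        self._items.append(price)

    def total(self):
        return self._apply_discount(sum(self._items))

# ❌ Implementation-coupled: breaks if _apply_discount is renamed or inlined
# def test_total(): assert hasattr(cart, "_apply_discount")

# ✅ Behavior: checks only the observable result
def test_discount_applied_over_100():
    cart = ShoppingCart()
    cart.add(80)
    cart.add(40)
    assert cart.total() == 108  # 120 with 10% discount applied
```

The second test keeps passing through any refactor that preserves the pricing behavior, which is exactly what a regression test is for.
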
## 9. Practice Continuous Refactoring

* **Rule:** Follow the Boy Scout Rule.
  * **Don't:** Seeing a poorly named variable or a slightly complex block of code and leaving it because "it works."
  * **Do:** Taking a few moments to rename the variable or extract a small function to improve clarity before committing your primary change.
* **Rule:** Refactoring is part of development. (This is a mindset, less about specific code examples.)

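A tiny before/after sketch of a Boy Scout cleanup (the function and field names are invented for illustration):

```python
# Before: while fixing a bug nearby, you notice this cryptic helper
def chk(d):
    return d.get("qty", 0) * d.get("px", 0.0)

# After: a 30-second cleanup done alongside the primary change
def position_market_value(position):
    quantity = position.get("qty", 0)
    price = position.get("px", 0.0)
    return quantity * price
```

The behavior is identical, but the next reader no longer has to reverse-engineer what `chk` does.
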
## 10. Optimize for Readability

* **Rule:** Code is read more than written.
  * **Don't:** Using overly clever one-liners or complex list comprehensions that are hard to decipher.
    ```python
    # Clever but potentially hard to read
    result = [x**2 for x in range(10) if x % 2 == 0 and x > 3]
    ```
  * **Do:** Prioritize clarity, even if it means slightly more verbose code.
    ```python
    result = []
    for x in range(10):
        is_even = x % 2 == 0
        is_greater_than_3 = x > 3
        if is_even and is_greater_than_3:
            result.append(x**2)

    # Or a more readable comprehension if appropriate
    result = [x**2 for x in range(4, 10, 2)]  # Clearer range
    ```

## 11. Python-Specific Best Practices

* **Rule:** Embrace Pythonic idioms.
  * **Use List Comprehensions (when clear):** Prefer `squares = [x*x for x in numbers]` over manual `for` loop appends for simple transformations.
  * **Use Context Managers (`with` statement):** Ensure resources like files or network connections are properly closed.
    ```python
    # Don't
    f = open("myfile.txt", "w")
    try:
        f.write("Hello")
    finally:
        f.close()

    # Do
    with open("myfile.txt", "w") as f:
        f.write("Hello")
    # File is automatically closed here, even if errors occur
    ```
  * **Iterate Directly:** Iterate over sequences directly instead of using index manipulation.
    ```python
    # Don't
    for i in range(len(my_list)):
        print(my_list[i])

    # Do
    for item in my_list:
        print(item)

    # Do (if the index is needed)
    for i, item in enumerate(my_list):
        print(f"Index {i}: {item}")
    ```
* **Rule:** Use Type Hinting (Python 3.5+). It improves readability, enables static analysis tools (`mypy`), and clarifies intent.
  ```python
  # Don't
  def greet(name):
      print("Hello " + name)

  # Do
  def greet(name: str) -> None:
      print("Hello " + name)

  def add(a: int, b: int) -> int:
      return a + b
  ```
* **Rule:** Use Virtual Environments (`venv`). Isolate project dependencies to avoid conflicts between projects. Always create and activate a virtual environment before installing packages (`pip install ...`).
* **Rule:** Prefer f-strings (Python 3.6+) for string formatting. They are generally more readable and often faster than `.format()` or `%` formatting.
  ```python
  name = "Alice"
  age = 30

  # Don't (older styles)
  print("Name: %s, Age: %d" % (name, age))
  print("Name: {}, Age: {}".format(name, age))

  # Do
  print(f"Name: {name}, Age: {age}")
  ```
* **Rule:** Understand Mutable Default Arguments. Be wary of using mutable types (like lists or dicts) as default function arguments, as they are shared across calls.
  ```python
  # Don't (potential bug)
  def add_item(item, my_list=[]):
      my_list.append(item)
      return my_list

  list1 = add_item(1)  # [1]
  list2 = add_item(2)  # [1, 2] - Unexpected!

  # Do
  def add_item(item, my_list=None):
      if my_list is None:
          my_list = []
      my_list.append(item)
      return my_list

  list1 = add_item(1)  # [1]
  list2 = add_item(2)  # [2] - Correct
  ```
docs/project-conventions.md ADDED
@@ -0,0 +1,360 @@
---
description: Concise coding conventions for the Folio project
alwaysApply: true
---

# Folio Project Conventions

This document outlines the key coding conventions for the Folio project. These conventions are designed to help maintain code quality, readability, and consistency across the codebase.

## Project Tech Stack

- **Web Framework**: Dash (Python)
- **Data Processing**: Pandas, NumPy
- **Financial Data**: Yahoo Finance API (via yfinance)
- **Testing**: Pytest
- **Linting**: Flake8, Black, isort

+
18
+ ## Core Conventions
19
+
20
+ ### 1. Fail Fast and Transparently
21
+
22
+ Never hide errors with default values. Financial data must be accurate or explicitly marked as unavailable.
23
+
24
+ ```python
25
+ # ❌ Bad: Hiding errors with defaults
26
+ def get_beta(ticker):
27
+ try:
28
+ return data_fetcher.get_beta(ticker)
29
+ except Exception:
30
+ return 1.0 # Dangerous default!
31
+
32
+ # ✅ Good: Transparent failure
33
+ def get_beta(ticker):
34
+ try:
35
+ return data_fetcher.get_beta(ticker)
36
+ except Exception as e:
37
+ logger.error(f"Failed to get beta for {ticker}: {e}", exc_info=True)
38
+ raise # Let the caller handle the error
39
+ ```
40
+
41
+ ### 2. Use Intention-Revealing Names
42
+
43
+ Names should clearly communicate what a variable, function, or class is for.
44
+
45
+ ```python
46
+ # ❌ Bad: Unclear names
47
+ def calc(p, q):
48
+ return p * q * 1.1
49
+
50
+ # ✅ Good: Clear names
51
+ def calculate_total_with_tax(price, quantity):
52
+ return price * quantity * 1.1
53
+ ```
54
+
55
+ ### 3. Write Small, Focused Functions
56
+
57
+ Each function should do one thing well and be reasonably small.
58
+
59
+ ```python
60
+ # ❌ Bad: Function doing too much
61
+ def process_portfolio(portfolio_data):
62
+ # Validate data
63
+ if not portfolio_data:
64
+ raise ValueError("Empty portfolio")
65
+
66
+ # Calculate metrics
67
+ total_value = 0
68
+ total_beta_adjusted = 0
69
+ for position in portfolio_data:
70
+ price = position["price"]
71
+ quantity = position["quantity"]
72
+ beta = get_beta(position["ticker"])
73
+ value = price * quantity
74
+ total_value += value
75
+ total_beta_adjusted += value * beta
76
+
77
+ # Generate report
78
+ report = {
79
+ "total_value": total_value,
80
+ "portfolio_beta": total_beta_adjusted / total_value if total_value else 0,
81
+ "positions": len(portfolio_data)
82
+ }
83
+
84
+ # Save to database
85
+ db.save_portfolio_report(report)
86
+
87
+ return report
88
+
89
+ # ✅ Good: Functions with single responsibilities
90
+ def validate_portfolio(portfolio_data):
91
+ if not portfolio_data:
92
+ raise ValueError("Empty portfolio")
93
+ return portfolio_data
94
+
95
+ def calculate_position_metrics(position):
96
+ price = position["price"]
97
+ quantity = position["quantity"]
98
+ beta = get_beta(position["ticker"])
99
+ value = price * quantity
100
+ beta_adjusted = value * beta
101
+ return {"value": value, "beta_adjusted": value * beta}
102
+
103
+ def calculate_portfolio_metrics(portfolio_data):
104
+ validated_data = validate_portfolio(portfolio_data)
105
+
106
+ position_metrics = [calculate_position_metrics(pos) for pos in validated_data]
107
+
108
+ total_value = sum(pos["value"] for pos in position_metrics)
109
+ total_beta_adjusted = sum(pos["beta_adjusted"] for pos in position_metrics)
110
+
111
+ return {
112
+ "total_value": total_value,
113
+ "portfolio_beta": total_beta_adjusted / total_value if total_value else 0,
114
+ "positions": len(portfolio_data)
115
+ }
116
+
117
+ def save_portfolio_report(report):
118
+ db.save_portfolio_report(report)
119
+ return report
120
+
121
+ def process_portfolio(portfolio_data):
122
+ metrics = calculate_portfolio_metrics(portfolio_data)
123
+ return save_portfolio_report(metrics)
124
+ ```
125
+
126
+ ### 4. Validate Early, Return Fast
127
+
128
+ Check inputs at the beginning of functions to avoid deep nesting and keep the happy path clean.
129
+
130
+ ```python
131
+ # ❌ Bad: Deeply nested conditionals
132
+ def process_data(data):
133
+ if data is not None:
134
+ if "ticker" in data:
135
+ if data["ticker"] != "":
136
+ # Process the data...
137
+ return result
138
+ else:
139
+ return None
140
+ else:
141
+ return None
142
+ else:
143
+ return None
144
+
145
+ # ✅ Good: Early validation
146
+ def process_data(data):
147
+ if data is None:
148
+ raise ValueError("Data cannot be None")
149
+ if "ticker" not in data:
150
+ raise ValueError("Missing required 'ticker' field")
151
+ if data["ticker"] == "":
152
+ raise ValueError("Ticker cannot be empty")
153
+
154
+ # Process the data...
155
+ return result
156
+ ```
157
+
158
+ ### 5. Comment the "Why," Not the "What"
159
+
160
+ Explain reasoning behind complex code, not obvious operations.
161
+
162
+ ```python
163
+ # ❌ Bad: Commenting the obvious
164
+ # Calculate the sum of prices
165
+ total = sum(item.price for item in items)
166
+
167
+ # ❌ Bad: Commented-out code
168
+ # Old calculation method
169
+ # for item in items:
170
+ # total += item.price
171
+
172
+ # ✅ Good: Explaining the why
173
+ # Apply 15% discount for bulk orders (>10 items) per company policy
174
+ if len(items) > 10:
175
+ total *= 0.85
176
+ ```
177
+
178
+ ### 6. Write Minimal, Effective Tests
179
+
180
+ Focus on testing critical business logic, not framework functionality.
181
+
182
+ ```python
183
+ # ❌ Bad: Testing framework functionality
184
+ def test_dataframe_creation():
185
+ # This just tests pandas functionality, not our code
186
+ data = {"ticker": ["AAPL"], "price": [150]}
187
+ df = pd.DataFrame(data)
188
+ assert len(df) == 1
189
+ assert "ticker" in df.columns
190
+
191
+ # ✅ Good: Testing critical business logic
192
+ def test_portfolio_beta_calculation():
193
+ # Arrange: Set up test data
194
+ portfolio = Portfolio()
195
+ portfolio.add_position(
196
+ StockPosition(ticker="AAPL", quantity=10, price=150)
197
+ )
198
+
199
+ # Mock external dependencies
200
+ data_fetcher = MagicMock()
201
+ data_fetcher.get_beta.return_value = 1.2
202
+
203
+ # Act: Call the method under test
204
+ beta = portfolio.calculate_beta(data_fetcher=data_fetcher)
205
+
206
+ # Assert: Verify the result
207
+ assert beta == 1.2
208
+ data_fetcher.get_beta.assert_called_once_with("AAPL")
209
+ ```
210
+
211
+ ### 7. Embrace Pythonic Idioms
212
+
213
+ Use Python's built-in features to write cleaner, more readable code.
214
+
215
+ ```python
216
+ # ❌ Bad: Non-Pythonic code
217
+ result = []
218
+ for i in range(len(items)):
219
+ if items[i].price > 100:
220
+ result.append(items[i].name)
221
+
222
+ # ✅ Good: Pythonic code
223
+ result = [item.name for item in items if item.price > 100]
224
+
225
+ # ❌ Bad: Manual resource management
226
+ f = open("data.csv", "r")
227
+ try:
228
+ data = f.read()
229
+ finally:
230
+ f.close()
231
+
232
+ # ✅ Good: Context manager
233
+ with open("data.csv", "r") as f:
234
+ data = f.read()
235
+ ```
236
+
237
+ ### 8. Use Type Hints
238
+
239
+ Add type hints to improve readability and enable static analysis.
240
+
241
+ ```python
242
+ # ❌ Bad: No type hints
243
+ def calculate_position_value(quantity, price):
244
+ return quantity * price
245
+
246
+ # ✅ Good: With type hints
247
+ def calculate_position_value(quantity: float, price: float) -> float:
248
+ return quantity * price
249
+
250
+ # Even better: With more specific types and docstring
251
+ from typing import Dict, List, Optional
252
+
253
+ def get_positions_by_sector(
254
+ positions: List[Dict[str, any]],
255
+ sector: Optional[str] = None
256
+ ) -> Dict[str, List[Dict[str, any]]]:
257
+ """
258
+ Group positions by sector.
259
+
260
+ Args:
261
+ positions: List of position dictionaries
262
+ sector: Optional sector to filter by
263
+
264
+ Returns:
265
+ Dictionary mapping sectors to lists of positions
266
+ """
267
+ result = {}
268
+ for position in positions:
269
+ pos_sector = position.get("sector", "Unknown")
270
+ if sector and pos_sector != sector:
271
+ continue
272
+ if pos_sector not in result:
273
+ result[pos_sector] = []
274
+ result[pos_sector].append(position)
275
+ return result
276
+ ```
277
+
278
+ ### 9. Handle Errors Gracefully
279
+
280
+ Use exceptions with context and handle them appropriately.
281
+
282
+ ```python
283
+ # ❌ Bad: Using error codes
284
+ def divide_stocks(total_value, num_stocks):
285
+ if num_stocks == 0:
286
+ return -1 # Error code
287
+ return total_value / num_stocks
288
+
289
+ # Usage
290
+ result = divide_stocks(1000, 0)
291
+ if result == -1:
292
+ print("Error: Cannot divide by zero")
293
+
294
+ # ✅ Good: Using exceptions
295
+ def divide_stocks(total_value: float, num_stocks: int) -> float:
296
+ if num_stocks == 0:
297
+ raise ValueError("Cannot divide by zero stocks")
298
+ return total_value / num_stocks
299
+
300
+ # Usage
301
+ try:
302
+ result = divide_stocks(1000, 0)
303
+ except ValueError as e:
304
+ logger.error(f"Portfolio calculation error: {e}")
305
+ # Handle the error appropriately
306
+ ```
307
+
308
+ ### 10. Keep It Simple (KISS)
309
+
310
+ Prefer simple, straightforward solutions over complex ones.
311
+
312
+ ```python
313
+ # ❌ Bad: Overly complex
314
+ def is_valid_ticker(ticker):
315
+ if ticker is not None:
316
+ if isinstance(ticker, str):
317
+ if len(ticker) > 0:
318
+ if len(ticker) <= 5:
319
+ if ticker.isalpha():
320
+ return True
321
+ return False
322
+
323
+ # ✅ Good: Simple and clear
324
+ def is_valid_ticker(ticker: str) -> bool:
325
+ return (
326
+ isinstance(ticker, str) and
327
+ 1 <= len(ticker) <= 5 and
328
+ ticker.isalpha()
329
+ )
330
+ ```
331
+
332
+ ## Additional Guidelines
333
+
334
+ 1. **Follow the Boy Scout Rule**: Leave the code cleaner than you found it.
335
+
336
+ 2. **Don't Repeat Yourself (DRY)**: Extract repeated code into reusable functions.
337
+
338
+ 3. **You Aren't Gonna Need It (YAGNI)**: Don't add functionality until it's necessary.
339
+
340
+ 4. **Optimize After Measuring**: Profile code to identify actual bottlenecks before optimizing.
341
+
342
+ 5. **Use Consistent Formatting**: Use Black, Flake8, and isort to maintain consistent code style.
343
+
344
+ 6. **Imports at Top**: Always place all imports at the top of the file.
345
+
346
+ 7. **No Unused Code**: Remove commented-out code and unused imports/variables.
347
+
348
+ 8. **Configuration Over Hardcoding**: Use configuration files for values that might change.
349
+
350
+ 9. **Log with Context**: Include relevant information in log messages.
351
+
352
+ 10. **Make Small, Focused Changes**: Don't modify unrelated code when implementing a feature or fixing a bug.
353
+
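Guidelines 8 and 9 can be shown together in a short sketch (the config shape and key names below are illustrative, not taken from the Folio codebase):

```python
import logging

logger = logging.getLogger(__name__)

# Default values live in one place instead of being scattered as magic numbers.
DEFAULT_CONFIG = {"cache": {"ttl": 86400}}

def get_cache_ttl(config: dict) -> int:
    """Read the cache TTL from config, falling back to a documented default."""
    ttl = config.get("cache", {}).get("ttl", DEFAULT_CONFIG["cache"]["ttl"])
    # Log with context: the message says which value was chosen, not just "done".
    logger.info("Using cache TTL of %d seconds", ttl)
    return ttl
```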
354
+ ## Benefits of Following These Conventions
355
+
356
+ - **Readability**: Code is easier to understand at a glance
357
+ - **Maintainability**: Simpler structure makes changes easier and safer
358
+ - **Testability**: Clear paths make testing more straightforward
359
+ - **Reliability**: Proper error handling prevents unexpected behavior
360
+ - **Performance**: Well-structured code leads to better performance
docs/project-design.md ADDED
@@ -0,0 +1,192 @@
1
+ ---
2
+ description: This document explains the system architecture and data flow of the Folio application
3
+ globs: *
4
+ alwaysApply: true
5
+ ---
6
+
7
+ # Folio Project Design
8
+
9
+ This document outlines how the Folio codebase is structured and how data flows through the application. Folio is a web-based dashboard for analyzing and visualizing investment portfolios, with a focus on stocks and options.
10
+
11
+ ## Application Overview
12
+
13
+ Folio is a Python-based web application built with Dash that provides comprehensive portfolio analysis capabilities. The primary domain entities for this app are outlined below. For an authoritative overview of the data model, [data_model.py](src/folio/data_model.py) is the source of truth.
14
+
15
+ ## Deployment Modes
16
+
17
+ Folio can run in multiple deployment environments:
18
+
19
+ - **Local Development**: Running directly on a developer's machine
20
+ - **Docker Container**: Running in a containerized environment
21
+ - **Hugging Face Spaces**: Deployed as a Hugging Face Space for public access
22
+
23
+ The application detects its environment and adjusts settings accordingly, such as cache directories and logging behavior.
24
+
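A minimal sketch of this kind of environment detection; the `SPACE_ID` variable (set by Hugging Face Spaces), the `/.dockerenv` check, and the cache paths are assumptions, not the actual Folio logic:

```python
import os
from pathlib import Path

def detect_environment() -> str:
    """Best-effort detection of the runtime environment."""
    if os.environ.get("SPACE_ID"):      # Hugging Face Spaces sets this variable
        return "huggingface"
    if Path("/.dockerenv").exists():    # present inside most Docker containers
        return "docker"
    return "local"

def cache_dir_for(env: str) -> str:
    """Pick a writable cache directory for the detected environment."""
    return "/tmp/folio-cache" if env in ("huggingface", "docker") else ".cache"
```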
25
+ ## Core Data Model
26
+
27
+ The core data model consists of several key classes that represent portfolio components:
28
+
29
+ - **Position**: Base class for all positions
30
+ - **StockPosition**: Represents a stock position with quantity, price, beta, etc.
31
+ - **OptionPosition**: Represents an option position with strike, expiry, option type, delta, etc.
32
+ - **PortfolioGroup**: Groups a stock with its related options (e.g., AAPL stock with AAPL options)
33
+ - **PortfolioSummary**: Contains aggregated metrics for the entire portfolio
34
+ - **ExposureBreakdown**: Detailed breakdown of exposure metrics by category
35
+
36
+ These classes are defined in [data_model.py](src/folio/data_model.py) and provide the foundation for all portfolio analysis.
37
+
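An illustrative sketch of how these entities relate (fields are simplified; `data_model.py` remains the authoritative definition):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class StockPosition:
    ticker: str
    quantity: float
    price: float
    beta: float = 1.0

    @property
    def market_value(self) -> float:
        return self.quantity * self.price

@dataclass
class OptionPosition:
    ticker: str        # underlying ticker
    option_type: str   # "CALL" or "PUT"
    strike: float
    expiry: str        # e.g. "2025-06-20"
    quantity: float
    delta: float

@dataclass
class PortfolioGroup:
    """A stock plus the options written on the same underlying."""
    ticker: str
    stock: Optional[StockPosition] = None
    options: List[OptionPosition] = field(default_factory=list)
```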
38
+ ## Data Flow
39
+
40
+ The data flow in Folio follows these main steps:
41
+
42
+ 1. **Data Input**: User uploads a portfolio CSV file or loads a sample portfolio
43
+ 2. **Data Processing**: The CSV is parsed, validated, and transformed into structured portfolio data
44
+ 3. **Position Grouping**: Stocks and their related options are grouped together
45
+ 4. **Metrics Calculation**: Exposure, beta, and other metrics are calculated for each position and group
46
+ 5. **Visualization**: The processed data is displayed in the dashboard with charts and tables
47
+ 6. **Interactivity**: User interactions trigger callbacks that update the displayed data
48
+
49
+ ### CSV Processing
50
+
51
+ When a user uploads a CSV file, the following process occurs:
52
+
53
+ 1. The file is validated for security in [security.py](src/folio/security.py)
54
+ 2. The CSV is parsed into a pandas DataFrame
55
+ 3. The DataFrame is processed by `process_portfolio_data()` in [portfolio.py](src/folio/portfolio.py)
56
+ 4. Stock positions are identified and processed
57
+ 5. Option positions are parsed and matched to their underlying stocks
58
+ 6. Cash-like positions are identified using [cash_detection.py](src/folio/cash_detection.py)
59
+ 7. Portfolio groups and summary metrics are calculated
60
+
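The parsing and grouping steps can be compressed into a sketch (the column names assume a Fidelity-style export; the real logic lives in `portfolio.py`):

```python
import csv
import io
from typing import Dict, List

def parse_portfolio_csv(csv_text: str) -> List[Dict[str, str]]:
    """Steps 1-2: parse the validated upload into row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def group_positions(rows: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Steps 4-6: group rows by ticker so options sit with their underlying."""
    groups: Dict[str, List[Dict[str, str]]] = {}
    for row in rows:
        groups.setdefault(row["Symbol"], []).append(row)
    return groups
```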
61
+ ### Stock Data Fetching
62
+
63
+ Folio uses a small data fetching layer to retrieve stock data:
64
+
65
+ 1. A `DataFetcherInterface` defined in [stockdata.py](src/stockdata.py) provides a common interface
66
+ 2. `YFinanceDataFetcher` is the sole concrete implementation, backed by Yahoo Finance data
67
+ 3. A singleton pattern ensures only one data fetcher is created throughout the application
68
+ 4. Data is cached to improve performance and reduce API calls
70
+
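The interface and singleton arrangement might look roughly like this (names approximate `stockdata.py` rather than quoting it):

```python
from abc import ABC, abstractmethod
from typing import List, Optional

class DataFetcherInterface(ABC):
    """Common interface implemented by all data fetchers."""

    @abstractmethod
    def fetch_data(self, ticker: str, period: str = "3m") -> List[float]:
        ...

class YFinanceDataFetcher(DataFetcherInterface):
    def fetch_data(self, ticker: str, period: str = "3m") -> List[float]:
        # The real implementation calls Yahoo Finance and caches the result.
        return []

_fetcher: Optional[DataFetcherInterface] = None

def get_data_fetcher() -> DataFetcherInterface:
    """Create the fetcher on first use, then reuse the same instance."""
    global _fetcher
    if _fetcher is None:
        _fetcher = YFinanceDataFetcher()
    return _fetcher
```

Because `get_data_fetcher()` always returns the same object, any in-memory caches on the fetcher are shared across the whole application.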
71
+ ### Options Processing
72
+
73
+ Option positions require special processing:
74
+
75
+ 1. Option descriptions are parsed in [options.py](src/folio/options.py) to extract strike, expiry, and option type
76
+ 2. QuantLib is used for option pricing and Greeks calculations
77
+ 3. Delta exposure is calculated as delta * notional value
78
+ 4. Options are matched to their underlying stocks to form portfolio groups
79
+ 5. Option metrics are aggregated into the portfolio summary
80
+
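Step 3 in concrete terms (the 100-share contract multiplier is the standard US equity-option convention, assumed here):

```python
def option_delta_exposure(delta: float, quantity: float, underlying_price: float,
                          contract_size: int = 100) -> float:
    """Delta exposure = delta * notional value of the contracts."""
    notional = quantity * contract_size * underlying_price
    return delta * notional
```

For example, two long calls with delta 0.5 on a $100 underlying contribute $10,000 of positive delta exposure.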
81
+ ### Portfolio Metrics Calculation
82
+
83
+ Portfolio metrics are calculated in several steps:
84
+
85
+ 1. Individual position metrics are calculated first (market value, beta, exposure)
86
+ 2. Positions are grouped by underlying ticker
87
+ 3. Group-level metrics are calculated (net exposure, beta-adjusted exposure)
88
+ 4. Portfolio-level metrics are calculated (total exposure, portfolio beta, etc.)
89
+ 5. Exposure breakdowns are created for visualization
90
+
91
+ The canonical implementations for these calculations are in [portfolio_value.py](src/folio/portfolio_value.py).
92
+
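A miniature version of steps 2 through 4, where each group is reduced to a (net exposure, beta) pair; the canonical implementations are in `portfolio_value.py`:

```python
from typing import Dict, List, Tuple

def portfolio_metrics(groups: List[Tuple[float, float]]) -> Dict[str, float]:
    """Aggregate (net_exposure, beta) pairs into portfolio-level metrics."""
    net = sum(exposure for exposure, _ in groups)
    beta_adjusted = sum(exposure * beta for exposure, beta in groups)
    # Portfolio beta is the exposure-weighted average beta.
    portfolio_beta = beta_adjusted / net if net else 0.0
    return {
        "net_exposure": net,
        "beta_adjusted": beta_adjusted,
        "portfolio_beta": portfolio_beta,
    }
```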
93
+ ## UI Components
94
+
95
+ The UI is built with Dash and consists of several key components:
96
+
97
+ 1. **Summary Cards**: Display high-level portfolio metrics
98
+ 2. **Charts**: Visualize portfolio allocation and exposure
99
+ 3. **Portfolio Table**: Display all positions with key metrics
100
+ 4. **Position Details**: Show detailed information for a selected position
101
+ 5. **P&L Chart**: Visualize profit/loss scenarios for options strategies
102
+
103
+ Each component is defined in the [components](src/folio/components) directory and registered with callbacks in [app.py](src/folio/app.py).
104
+
105
+ ### Component Interaction
106
+
107
+ Components interact through Dash callbacks:
108
+
109
+ 1. Data is stored in `dcc.Store` components that act as client-side state
110
+ 2. User interactions trigger callbacks that update the stored data
111
+ 3. Components subscribe to changes in the stored data and update accordingly
112
+ 4. This pattern allows for a reactive UI without page reloads
113
+
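Stripped of Dash specifics, the pattern is an observable store with subscriber callbacks. This is a schematic stand-in, not real Dash code:

```python
from typing import Callable, Dict, List

class Store:
    """Minimal analogue of a dcc.Store plus Dash's callback wiring."""

    def __init__(self) -> None:
        self._data: Dict = {}
        self._subscribers: List[Callable[[Dict], None]] = []

    def subscribe(self, callback: Callable[[Dict], None]) -> None:
        """Register a component's update function, like an @app.callback."""
        self._subscribers.append(callback)

    def update(self, **changes) -> None:
        """A user interaction updates the data; subscribers re-render."""
        self._data.update(changes)
        for callback in self._subscribers:
            callback(self._data)
```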
114
+ ## Key Modules
115
+
116
+ ### Data Processing
117
+
118
+ - **portfolio.py**: Core portfolio processing logic
119
+ - **portfolio_value.py**: Canonical implementations of portfolio value calculations
120
+ - **options.py**: Option pricing and Greeks calculations
121
+ - **cash_detection.py**: Identification of cash-like positions
122
+
123
+ ### Data Fetching
124
+
125
+ - **stockdata.py**: Data fetcher interface, `create_data_fetcher`, and singleton management
126
+ - **yfinance.py**: Yahoo Finance data fetcher
128
+
129
+ ### UI Components
130
+
131
+ - **components/**: UI components for the dashboard
132
+ - **charts.py**: Portfolio visualization charts
133
+ - **portfolio_table.py**: Table of portfolio positions
134
+ - **position_details.py**: Detailed view of a position
135
+ - **pnl_chart.py**: Profit/loss visualization
136
+ - **summary_cards.py**: High-level portfolio metrics
137
+
138
+ ### Application Core
139
+
140
+ - **app.py**: Main Dash application setup and callbacks
141
+ - **data_model.py**: Core data structures
142
+ - **logger.py**: Logging configuration
143
+ - **security.py**: Security utilities for validating user inputs
144
+
145
+ ## Configuration
146
+
147
+ Folio uses a YAML configuration file (`folio.yaml`) for runtime settings:
148
+
150
+ - **Cache Settings**: Configure cache directories and TTL
151
+ - **UI Settings**: Configure dashboard appearance and behavior
152
+
153
+ The configuration is loaded at startup and can be overridden by environment variables.
154
+
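The load-then-override behavior can be sketched as follows (the keys and the `FOLIO_` environment-variable prefix are illustrative assumptions):

```python
import os
from typing import Dict

DEFAULTS: Dict[str, str] = {"cache.ttl": "86400", "ui.theme": "light"}

def load_config(file_config: Dict[str, str]) -> Dict[str, str]:
    """Merge defaults, then folio.yaml values, then environment overrides."""
    config = {**DEFAULTS, **file_config}
    for key in config:
        # FOLIO_CACHE_TTL overrides "cache.ttl", and so on.
        env_key = "FOLIO_" + key.upper().replace(".", "_")
        if env_key in os.environ:
            config[key] = os.environ[env_key]
    return config
```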
155
+ ## Error Handling
156
+
157
+ Folio implements robust error handling:
158
+
159
+ 1. **Fail Fast, Fail Transparently**: Errors are raised early and clearly
160
+ 2. **Graceful Degradation**: The application continues to function even if some components fail
161
+ 3. **Structured Logging**: Errors are logged with context for debugging
162
+ 4. **User Feedback**: Error messages are displayed to the user when appropriate
163
+
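Principles 1 through 3 combined in a small sketch (the fetch function and its fallback are hypothetical):

```python
import logging

logger = logging.getLogger(__name__)

def fetch_beta(ticker: str) -> float:
    # Fail fast: invalid input is rejected immediately and transparently.
    if not ticker or not ticker.isalpha():
        raise ValueError(f"Invalid ticker: {ticker!r}")
    raise ConnectionError("data source unavailable")  # simulate an outage

def beta_or_default(ticker: str, default: float = 1.0) -> float:
    """Graceful degradation: log with context, then fall back to a default."""
    try:
        return fetch_beta(ticker)
    except ConnectionError as exc:
        logger.warning("Beta lookup failed for %s, using %.1f: %s", ticker, default, exc)
        return default
```

Note that `ValueError` deliberately propagates: programming and input errors fail fast, while transient data-source errors degrade gracefully.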
164
+ ## Testing
165
+
166
+ The codebase includes comprehensive tests:
167
+
168
+ - **Unit Tests**: Test individual functions and classes
169
+ - **Integration Tests**: Test interactions between components
170
+ - **Mock Data**: Use mock data for testing to avoid API calls
171
+
172
+ Tests are organized to mirror the structure of the source code, with test files corresponding to source files.
173
+
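A mock-data unit test might look like this (the fetcher method name `get_beta` is an assumption standing in for the real interface):

```python
from unittest.mock import MagicMock

def beta_weighted_value(fetcher, ticker: str, quantity: float, price: float) -> float:
    """Toy function under test: scale market value by the fetched beta."""
    return fetcher.get_beta(ticker) * quantity * price

def test_beta_weighted_value():
    # Mock the data fetcher so the test makes no network or API calls.
    fetcher = MagicMock()
    fetcher.get_beta.return_value = 1.5
    assert beta_weighted_value(fetcher, "AAPL", 10, 100.0) == 1500.0
    fetcher.get_beta.assert_called_once_with("AAPL")

test_beta_weighted_value()
```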
174
+ ## Development Workflow
175
+
176
+ To add new features to Folio:
177
+
178
+ 1. **UI Components**: Add new components in the `components/` directory
179
+ 2. **Data Processing**: Extend the data model in `data_model.py` and the processing logic in `portfolio.py`
180
+ 3. **Callbacks**: Add new callbacks in `app.py` to handle user interactions
181
+ 4. **Testing**: Add tests for new functionality
182
+
183
+ ## Conclusion
184
+
185
+ Folio is designed with a clean separation of concerns:
186
+
187
+ - Data fetching is abstracted behind interfaces
188
+ - Data processing is separated from UI components
189
+ - UI components are modular and reusable
190
+ - Configuration is externalized for flexibility
191
+
192
+ This architecture makes the codebase maintainable, testable, and extensible, allowing for easy addition of new features and improvements.
scripts/check_beta.py DELETED
@@ -1,175 +0,0 @@
1
- """
2
- Beta Calculation Validation Script
3
-
4
- This script fetches historical data and calculates beta values for a predefined list of symbols
5
- to validate how the beta calculation works in practice. It uses raw beta calculation without
6
- any of the special case handling found in the portfolio processing code.
7
-
8
- Beta measures the volatility of a security in relation to the market (using SPY as proxy).
9
- - Beta > 1: More volatile than the market
10
- - Beta = 1: Same volatility as the market
11
- - Beta < 1: Less volatile than the market
12
- - Beta < 0: Moves in the opposite direction as the market
13
-
14
- Latest Beta Values (as of 2025-04-01):
15
- SPAXX**: Not available (money market fund, no market data)
16
- FMPXX: Not available (money market fund, no market data)
17
- FFRHX: 0.0553 (money market fund)
18
- TLT: -0.0145 (long-term treasury ETF, negative correlation)
19
- SHY: 0.0107 (short-term treasury ETF)
20
- BIL: 0.0005 (1-3 month T-bill ETF, extremely low beta)
21
- MCHI: 0.7130 (China ETF, significant market exposure)
22
- IEFA: 0.7862 (International ETF, significant market exposure)
23
- SPY: 1.0000 (S&P 500 ETF, market benchmark)
24
- AAPL: 1.2029 (Tech stock, higher volatility than market)
25
- GOOGL: 1.2695 (Tech stock, higher volatility than market)
26
- INVALID: Not available (invalid symbol for testing error handling)
27
-
28
- Usage:
29
- python scripts/check_beta.py
30
-
31
- Note: This script calculates raw beta values without the additional logic that might be
32
- applied in the main application, such as fallbacks for cash-like positions or special
33
- handling of missing data.
34
- """
35
-
36
- import os
37
- import sys
38
-
39
- import pandas as pd
40
-
41
- # Adjust path to import from src
42
- if __name__ == "__main__":
43
- script_dir = os.path.dirname(__file__)
44
- project_root = os.path.abspath(os.path.join(script_dir, ".."))
45
- sys.path.insert(0, project_root)
46
-
47
- from src.fmp import DataFetcher
48
- from src.folio.logger import logger # Use the same logger if desired
49
- from src.folio.utils import is_cash_or_short_term
50
-
51
-
52
- def calculate_raw_beta(
53
- ticker: str, fetcher: DataFetcher, market_data: pd.DataFrame | None
54
- ) -> float | str:
55
- """Fetches data and calculates raw beta without special handling."""
56
- # Early validation
57
- if market_data is None:
58
- return "Error: Market data not available"
59
-
60
- try:
61
- # Fetch and validate stock data
62
- logger.info(f"Fetching data for {ticker}...")
63
- stock_data = fetcher.fetch_data(ticker)
64
-
65
- # Data validation checks
66
- error_msg = _validate_data(ticker, stock_data)
67
- if error_msg:
68
- return error_msg
69
-
70
- # Calculate returns
71
- logger.info(f"Calculating returns for {ticker}...")
72
- stock_returns = stock_data["Close"].pct_change().dropna()
73
- market_returns = market_data["Close"].pct_change().dropna()
74
-
75
- # Align data by index
76
- aligned_stock, aligned_market = stock_returns.align(
77
- market_returns, join="inner"
78
- )
79
-
80
- # Validate aligned data
81
- if aligned_stock.empty or len(aligned_stock) < 2:
82
- return f"Error: Not enough overlapping data points after alignment for {ticker} (need >= 2)"
83
-
84
- # Calculate beta
85
- logger.info(f"Calculating variance/covariance for {ticker}...")
86
- market_variance = aligned_market.var()
87
- covariance = aligned_stock.cov(aligned_market)
88
-
89
- # Validate variance and covariance
90
- error_msg = _validate_variance_covariance(market_variance, covariance)
91
- if error_msg:
92
- return error_msg
93
-
94
- # Calculate and return beta
95
- beta = covariance / market_variance
96
- return beta
97
-
98
- except Exception as e:
99
- return f"Error calculating beta for {ticker}: {e}"
100
-
101
-
102
- def _validate_data(ticker: str, stock_data: pd.DataFrame | None) -> str | None:
103
- """Validates stock data and returns error message if invalid."""
104
- if stock_data is None or stock_data.empty:
105
- return f"Error: No data fetched for {ticker}"
106
- if len(stock_data) < 2:
107
- return f"Error: Not enough data points for {ticker} (need >= 2)"
108
- return None
109
-
110
-
111
- def _validate_variance_covariance(
112
- market_variance: float, covariance: float
113
- ) -> str | None:
114
- """Validates variance and covariance calculations and returns error message if invalid."""
115
- if pd.isna(market_variance) or abs(market_variance) < 1e-12:
116
- return f"Error: Market variance is zero or near-zero ({market_variance})"
117
- if pd.isna(covariance):
118
- return "Error: Covariance calculation resulted in NaN"
119
- return None
120
-
121
-
122
- if __name__ == "__main__":
123
- symbols_to_check = [
124
- "SPAXX**",
125
- "FMPXX",
126
- "FFRHX",
127
- "TLT", # 20+ Year Treasury Bond ETF
128
- "SHY", # 1-3 Year Treasury Bond ETF
129
- "BIL", # 1-3 Month T-Bill ETF
130
- "MCHI", # iShares MSCI China ETF
131
- "IEFA", # iShares Core MSCI EAFE ETF
132
- "SPY", # S&P 500 ETF
133
- "AAPL", # Apple Stock
134
- "GOOGL", # Google Stock
135
- "INVALID", # Test an invalid ticker
136
- ]
137
-
138
- try:
139
- fetcher = DataFetcher()
140
- if fetcher is None:
141
- raise RuntimeError("Fetcher initialization failed")
142
- # Fetch market data once
143
- market_data = (
144
- fetcher.fetch_market_data()
145
- ) # Assumes this fetches S&P500 or similar
146
- if market_data is None or market_data.empty:
147
- sys.exit(1)
148
-
149
- except Exception:
150
- sys.exit(1)
151
-
152
- # Calculate beta for each symbol and store results
153
- results = {}
154
- for symbol in symbols_to_check:
155
- beta_result = calculate_raw_beta(symbol, fetcher, market_data)
156
- results[symbol] = beta_result
157
-
158
- # Display results in a formatted table
159
-
160
- for symbol, result in results.items():
161
- if isinstance(result, float):
162
- is_cash = is_cash_or_short_term(symbol, beta=result)
163
- classification = "CASH-LIKE" if is_cash else "MARKET-CORRELATED"
164
- else:
165
- # Error case
166
- logger.error(f"Error for {symbol}: {result}")
167
-
168
- # Summary statistics
169
- success_count = sum(1 for r in results.values() if isinstance(r, float))
170
- error_count = len(results) - success_count
171
- cash_like_count = sum(
172
- 1
173
- for s, r in results.items()
174
- if isinstance(r, float) and is_cash_or_short_term(s, beta=r)
175
- )
src/fmp.py DELETED
@@ -1,243 +0,0 @@
1
- """
2
- Data fetcher for stock data using Financial Modeling Prep API
3
- """
4
-
5
- import logging
6
- import os
7
- from datetime import datetime, timedelta
8
-
9
- import pandas as pd
10
- import requests
11
-
12
- from src.stockdata import DataFetcherInterface
13
-
14
- # Setup logging
15
- logging.basicConfig(level=logging.INFO)
16
- logger = logging.getLogger(__name__)
17
-
18
- # Constants
19
- HTTP_SUCCESS = 200
20
-
21
-
22
- class DataFetcher(DataFetcherInterface):
23
- """Class to fetch stock data from Financial Modeling Prep API"""
24
-
25
- # Default period for beta calculations
26
- beta_period = "3m"
27
-
28
- def __init__(self, cache_dir=".cache_fmp"):
29
- """Initialize with cache directory"""
30
- self.cache_dir = cache_dir
31
- self.api_key = os.environ.get("FMP_API_KEY")
32
-
33
- # If not in environment, try to get from config
34
- if not self.api_key:
35
- try:
36
- from src.v2.config import config
37
-
38
- self.api_key = config.get("data.fmp.api_key")
39
- except ImportError:
40
- logger.warning(
41
- "Could not import config from src.v2.config, will rely on environment variable"
42
- )
43
-
44
- self.cache_ttl = 86400 # Default to 1 day
45
-
46
- # Try to get cache TTL from config if available
47
- try:
48
- from src.v2.config import config
49
-
50
- self.cache_ttl = config.get("app.cache.ttl", 86400)
51
- except ImportError:
52
- logger.warning(
53
- "Could not import config from src.v2.config, using default cache TTL"
54
- )
55
-
56
- # Create cache directory if it doesn't exist
57
- os.makedirs(cache_dir, exist_ok=True)
58
-
59
- # Check for API key
60
- if not self.api_key:
61
- raise ValueError(
62
- "No API key found. Please set the FMP_API_KEY environment variable or "
63
- "configure it in the config file."
64
- )
65
-
66
- def fetch_data(self, ticker, period="3m", interval="1d"):
67
- """
68
- Fetch stock data for a ticker
69
-
70
- Args:
71
- ticker (str): Stock ticker symbol
72
- period (str): Time period ('3m', '6m', '1y', etc.)
73
- interval (str): Data interval ('1d', '1wk', etc.)
74
-
75
- Returns:
76
- pandas.DataFrame: DataFrame with stock data
77
-
78
- Raises:
79
- ValueError: If no data is returned from API
80
- """
81
- # Check cache first
82
- cache_file = os.path.join(self.cache_dir, f"{ticker}_{period}_{interval}.csv")
83
-
84
- # Use the centralized cache validation logic
85
- from src.stockdata import should_use_cache
86
-
87
- should_use, reason = should_use_cache(cache_file, self.cache_ttl)
88
-
89
- if should_use:
90
- logger.debug(f"Loading cached data for {ticker}: {reason}")
91
- return pd.read_csv(cache_file, index_col=0, parse_dates=True)
92
- else:
93
- logger.debug(f"Cache for {ticker} is not valid: {reason}")
94
-
95
- # Try to fetch from API
96
- try:
97
- logger.info(f"Fetching data for {ticker} from API")
98
- df = self._fetch_from_api(ticker, period)
99
-
100
- if df is not None and not df.empty:
101
- # Save to cache
102
- df.to_csv(cache_file)
103
- return df
104
- else:
105
- # This is a valid case - API returned no data for a valid ticker
106
- logger.warning(f"No data returned from API for {ticker}")
107
- # Raise a specific error instead of returning an empty DataFrame
108
- raise ValueError(f"No historical data found for {ticker}")
109
- except (ValueError, requests.exceptions.RequestException) as e:
110
- # These are expected errors that can happen with valid inputs
111
- # For example, a valid ticker that has no data available or network issues
112
- logger.warning(f"Data fetch error for {ticker}: {e}")
113
-
114
- # Only use expired cache for expected data errors, not for programming errors
115
- if os.path.exists(cache_file):
116
- logger.warning(f"Using expired cache for {ticker} as fallback")
117
- try:
118
- return pd.read_csv(cache_file, index_col=0, parse_dates=True)
119
- except (pd.errors.ParserError, pd.errors.EmptyDataError) as cache_e:
120
- logger.error(f"Error reading cache for {ticker}: {cache_e}")
121
- # If we can't read the cache, re-raise the original error
122
- raise e from cache_e
123
-
124
- # If this is a "No historical data" error and we have no cache,
125
- # it's reasonable to return an empty DataFrame with the expected structure
126
- if "No historical data found" in str(e):
127
- logger.warning(
128
- f"No historical data found for {ticker} and no cache available"
129
- )
130
- return pd.DataFrame(columns=["Open", "High", "Low", "Close", "Volume"])
131
-
132
- # For other data errors with no cache, re-raise
133
- raise
134
- except (ImportError, NameError, AttributeError, TypeError, SyntaxError) as e:
135
- # These are programming errors that should never be caught silently
136
- logger.critical(f"Critical error in data fetcher: {e}", exc_info=True)
137
- raise
138
- except Exception as e:
139
- # For other unexpected errors, log and re-raise
140
- logger.error(
141
- f"Unexpected error fetching data for {ticker}: {e}", exc_info=True
142
- )
143
- raise
144
-
145
- def fetch_market_data(self, market_index="SPY", period=None, interval="1d"):
146
- """
147
- Fetch market index data for beta calculations.
148
-
149
- Args:
150
- market_index (str): Market index ticker symbol (default: 'SPY' for S&P 500 ETF)
151
- period (str, optional): Time period. If None, uses beta_period.
152
- interval (str): Data interval ('1d', '1wk', etc.)
153
-
154
- Returns:
155
- pandas.DataFrame: DataFrame with market index data
156
- """
157
- # Use the class beta_period if period is None
158
- if period is None:
159
- period = self.beta_period
160
- logger.info(f"Using default beta period: {period}")
161
-
162
- logger.debug(f"Fetching market data for {market_index}")
163
- return self.fetch_data(market_index, period, interval)
164
-
165
- def _fetch_from_api(self, ticker, period="5y"):
166
- """Fetch data from Financial Modeling Prep API"""
167
- # Determine date range based on period
168
- end_date = datetime.now()
169
-
170
- if period.endswith("y"):
171
- years = int(period[:-1])
172
- start_date = end_date - timedelta(days=365 * years)
173
- elif period.endswith("m"):
174
- months = int(period[:-1])
175
- start_date = end_date - timedelta(days=30 * months)
176
- else:
177
- # Default to 1 year
178
- start_date = end_date - timedelta(days=365)
179
-
180
- # Format dates for API
181
- start_str = start_date.strftime("%Y-%m-%d")
182
- end_str = end_date.strftime("%Y-%m-%d")
183
-
184
- # Construct API URL
185
- base_url = "https://financialmodelingprep.com/api/v3/historical-price-full"
186
- url = f"{base_url}/{ticker}?from={start_str}&to={end_str}&apikey={self.api_key}"
187
-
188
- # Make request
189
- response = requests.get(url)
190
-
191
- if response.status_code != HTTP_SUCCESS:
192
- raise ValueError(
193
- f"API request failed with status code {response.status_code}: {response.text}"
194
- )
195
-
196
- # Parse response
197
- data = response.json()
198
-
199
- if "historical" not in data:
200
- # This is not a critical error - just log a warning and return empty DataFrame
201
- logger.warning(f"No historical data found for {ticker}")
202
- return pd.DataFrame(columns=["Open", "High", "Low", "Close", "Volume"])
203
-
204
- # Convert to DataFrame
205
- df = pd.DataFrame(data["historical"])
206
-
207
- # Convert date to datetime and set as index
208
- df["date"] = pd.to_datetime(df["date"])
209
- df = df.set_index("date")
210
-
211
- # Sort by date (ascending)
212
- df = df.sort_index()
213
-
214
- # Rename columns to match expected format
215
- df = df.rename(
216
- columns={
217
- "open": "Open",
218
- "high": "High",
219
- "low": "Low",
220
- "close": "Close",
221
- "volume": "Volume",
222
- }
223
- )
224
-
225
- return df
226
-
227
- def _fetch_data(self, url, params=None):
228
- try:
229
- response = requests.get(url, params=params)
230
- if response.status_code == HTTP_SUCCESS:
231
- return response.json()
232
- else:
233
- logger.error(f"Failed to fetch data: {response.status_code}")
234
- return None
235
- except Exception as e:
236
- logger.error(f"Error fetching data: {e}")
237
- return None
238
-
239
-
240
- if __name__ == "__main__":
241
- # Simple test
242
- fetcher = DataFetcher()
243
- data = fetcher.fetch_data("AAPL", period="1y")
src/folio/README.md DELETED
@@ -1,119 +0,0 @@
1
- # Folio - Portfolio Dashboard
2
-
3
- ## Overview
4
-
5
- Folio is a web-based dashboard for analyzing and visualizing investment portfolios. It provides a comprehensive view of your portfolio's composition, risk metrics, and exposure analysis with a focus on stocks and options.
6
-
7
- ## Features
8
-
9
- - **Portfolio Analysis**: View your entire portfolio with key metrics like value, beta, and exposure
10
- - **Position Grouping**: Automatically groups stocks with their related options
11
- - **Risk Metrics**: Calculates beta and beta-adjusted exposure for all positions
12
- - **Options Analysis**: Provides delta exposure and other option-specific metrics
13
- - **Interactive UI**: Filter, sort, and search your portfolio with real-time updates
14
- - **Position Details**: Drill down into specific positions for detailed analysis
15
- - **CSV Import**: Upload portfolio data from CSV exports (compatible with Fidelity exports)
16
- - **Auto-Refresh**: Periodically refreshes data to keep metrics current
17
-
18
- ## Getting Started
19
-
20
- ### Prerequisites
21
-
22
- - Python 3.9+
23
- - Required packages (see `requirements.txt` in the project root)
24
-
25
- ### Running the Dashboard
26
-
27
- ```bash
28
- # From the project root directory:
29
-
30
- # Start with default settings (will prompt for file upload)
31
- make folio
32
-
33
- # Start with a specific portfolio file
34
- make folio portfolio=path/to/portfolio.csv
35
-
36
- # Or run directly with Python
37
- python -m src.folio --portfolio path/to/portfolio.csv --port 8051
38
- ```
39
-
40
- The dashboard will be available at http://127.0.0.1:8051/ (or your specified port).
41
-
42
- ## Project Structure
43
-
44
- ```
45
- src/folio/
46
- ├── __init__.py # Package initialization
47
- ├── __main__.py # Entry point for running as a module
48
- ├── app.py # Main Dash application setup and callbacks
49
- ├── components/ # UI components
50
- │ ├── __init__.py
51
- │ ├── portfolio_table.py # Portfolio table component
52
- │ └── position_details.py # Position details modal
53
- ├── data_model.py # Data models and type definitions
54
- ├── logger.py # Logging configuration
55
- └── utils.py # Utility functions for data processing
56
- ```
57
-
58
- ## Data Model
59
-
60
- The application uses the following key data structures:
61
-
62
- - **Position**: Base class for all positions (stocks and options)
63
- - **StockPosition**: Represents a stock position
64
- - **OptionPosition**: Represents an option position with strike, expiry, etc.
65
- - **PortfolioGroup**: Groups a stock with its related options
66
- - **PortfolioSummary**: Contains aggregated metrics for the entire portfolio
67
- - **ExposureBreakdown**: Detailed breakdown of exposure metrics
68
-
69
- ## Development Guide
70
-
71
- ### Adding New Features
72
-
73
- 1. **UI Components**: Add new components in the `components/` directory
74
- 2. **Data Processing**: Extend the data model in `data_model.py` and processing logic in `utils.py`
75
- 3. **Callbacks**: Add new callbacks in `app.py` to handle user interactions
76
-
77
- ### Coding Standards
78
-
79
- - Use type hints for all functions and methods
80
- - Document functions with docstrings (Google style)
81
- - Log important operations and errors using the logger
82
- - Handle exceptions gracefully with appropriate error messages
83
- - Follow the existing pattern for callback registration
84
-
85
- ### Testing
86
-
87
- While there's no formal test suite yet, you can test your changes by:
88
-
89
- 1. Running the application with a sample portfolio
90
- 2. Verifying that all UI components render correctly
91
- 3. Checking that calculations produce expected results
92
- 4. Testing edge cases (empty portfolio, invalid data, etc.)
93
-
94
- ## Troubleshooting
95
-
96
- ### Common Issues
97
-
98
- - **Missing Data**: Ensure your CSV has all required columns (Symbol, Description, Quantity, etc.)
99
- - **Port Conflicts**: If the default port is in use, specify a different port with `--port`
100
- - **Data Fetching Errors**: Check network connectivity for beta data retrieval
101
-
102
- ### Logging
103
-
104
- Logs are stored in the `logs/` directory with timestamps. Check these logs for detailed error information.
105
-
106
- ## Future Improvements
107
-
108
- - Add unit tests for core functionality
109
- - Implement additional portfolio metrics (Sharpe ratio, VaR, etc.)
110
- - Add visualization components (charts, graphs)
111
- - Support for additional data sources beyond CSV
112
- - Enhanced options analytics with Greeks (gamma, theta, vega)
113
-
114
- ## Contributing
115
-
116
- 1. Follow the existing code style and patterns
117
- 2. Document your changes thoroughly
118
- 3. Test your changes with various portfolio data
119
- 4. Submit a pull request with a clear description of your changes
 
src/folio/data_fetcher_singleton.py DELETED
@@ -1,97 +0,0 @@
1
- """Singleton module for data fetcher.
2
-
3
- This module provides a singleton instance of the data fetcher to ensure
4
- it's only initialized once across the application.
5
- """
6
-
7
- import os
8
-
9
- import yaml
10
-
11
- from src.stockdata import create_data_fetcher
12
-
13
- from .logger import logger
14
-
15
-
16
- class DataFetcherSingleton:
17
- """Singleton class for data fetcher."""
18
-
19
- _instance = None
20
- _initialized = False
21
-
22
- @classmethod
23
- def get_instance(cls):
24
- """Get the singleton instance of the data fetcher.
25
-
26
- Returns:
27
- DataFetcherInterface: The data fetcher instance.
28
- """
29
- if cls._instance is None:
30
- cls._instance = cls._initialize_data_fetcher()
31
- return cls._instance
32
-
33
- @classmethod
34
- def _initialize_data_fetcher(cls):
35
- """Initialize the data fetcher.
36
-
37
- Returns:
38
- DataFetcherInterface: The initialized data fetcher.
39
-
40
- Raises:
41
- RuntimeError: If the data fetcher initialization fails.
42
- """
43
- if cls._initialized:
44
- return cls._instance
45
-
46
- # Load configuration
47
- config = cls._load_config()
48
-
49
- try:
50
- # Get data source from config (default to "yfinance" if not specified)
51
- data_source = config.get("app", {}).get("data_source", "yfinance")
52
- logger.info(f"Using data source: {data_source}")
53
-
54
- # Create data fetcher using factory
55
- data_fetcher = create_data_fetcher(source=data_source)
56
-
57
- if data_fetcher is None:
58
- raise RuntimeError(
59
- "Data fetcher initialization failed but didn't raise an exception"
60
- )
61
-
62
- cls._initialized = True
63
- return data_fetcher
64
- except ValueError as e:
65
- logger.error(f"Failed to initialize data fetcher: {e}")
66
- # Re-raise to fail fast rather than continuing with a null reference
67
- raise RuntimeError(
68
- f"Critical component data fetcher could not be initialized: {e}"
69
- ) from e
70
-
71
- @staticmethod
72
- def _load_config():
73
- """Load configuration from folio.yaml.
74
-
75
- Returns:
76
- dict: The configuration dictionary.
77
- """
78
- config_path = os.path.join(os.path.dirname(__file__), "folio.yaml")
79
- if os.path.exists(config_path):
80
- try:
81
- with open(config_path) as f:
82
- return yaml.safe_load(f) or {}
83
- except Exception as e:
84
- logger.warning(
85
- f"Failed to load folio.yaml: {e}. Using default configuration."
86
- )
87
- return {}
88
-
89
-
90
- # Convenience function to get the data fetcher instance
91
- def get_data_fetcher():
92
- """Get the singleton instance of the data fetcher.
93
-
94
- Returns:
95
- DataFetcherInterface: The data fetcher instance.
96
- """
97
- return DataFetcherSingleton.get_instance()
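The deleted module above duplicated the singleton already provided by `src/stockdata.py`, which is why it could be removed. Stripped of the fetcher-specific details, the pattern it implemented reduces to:

```python
class _Singleton:
    _instance = None

    @classmethod
    def get_instance(cls):
        # Initialize on first access only; every later call returns the cached object.
        if cls._instance is None:
            cls._instance = object()  # stand-in for the real data fetcher
        return cls._instance

a = _Singleton.get_instance()
b = _Singleton.get_instance()
print(a is b)
```

Because both modules held their own `_instance`, the app could end up with two fetchers (and two caches) before this refactor; consolidating into one module restores the single-instance guarantee.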
 
src/folio/folio.yaml CHANGED
@@ -2,8 +2,6 @@
2
  # TODO: this isn't being used yet. Please update the TODOs to individual sections below as you implement them
3
 
4
  app:
5
- # Data source configuration
6
- data_source: "yfinance" # Options: "fmp", "yfinance"
7
 
8
  # Cache configuration
9
  cache:
 
2
  # TODO: this isn't being used yet. Please update the TODOs to individual sections below as you implement them
3
 
4
  app:
 
 
5
 
6
  # Cache configuration
7
  cache:
src/folio/portfolio.py CHANGED
@@ -7,10 +7,7 @@ This module provides core functionality for portfolio analysis, including:
7
  - Portfolio metrics and summary calculations
8
  """
9
 
10
- import os
11
-
12
  import pandas as pd
13
- import yaml
14
 
15
  from src.stockdata import get_data_fetcher
16
 
@@ -33,18 +30,8 @@ from .portfolio_value import (
33
  )
34
  from .utils import clean_currency_value, get_beta
35
 
36
- # Load configuration
37
- config_path = os.path.join(os.path.dirname(__file__), "folio.yaml")
38
- config = {}
39
- if os.path.exists(config_path):
40
- try:
41
- with open(config_path) as f:
42
- config = yaml.safe_load(f) or {}
43
- except Exception as e:
44
- logger.warning(f"Failed to load folio.yaml: {e}. Using default configuration.")
45
-
46
  # Get the singleton data fetcher instance
47
- data_fetcher = get_data_fetcher(config=config)
48
 
49
 
50
  def process_portfolio_data(
 
7
  - Portfolio metrics and summary calculations
8
  """
9
 
 
 
10
  import pandas as pd
 
11
 
12
  from src.stockdata import get_data_fetcher
13
 
 
30
  )
31
  from .utils import clean_currency_value, get_beta
32
 
 
 
 
 
 
 
 
 
 
 
33
  # Get the singleton data fetcher instance
34
+ data_fetcher = get_data_fetcher()
35
 
36
 
37
  def process_portfolio_data(
src/folio/roadmap.md DELETED
@@ -1,262 +0,0 @@
1
- # Folio Product Roadmap
2
-
3
- ## Overview
4
-
5
- This roadmap outlines the strategic direction for Folio, our portfolio dashboard application. Features are prioritized based on their estimated Return on Investment (ROI), considering development effort, user impact, and alignment with our core value proposition of providing comprehensive portfolio analysis and risk management.
6
-
7
- ## Priority Matrix
8
-
9
- | Priority | Feature | Effort | Impact | ROI |
10
- |----------|---------|--------|--------|-----|
11
- | 1 | Enhanced Options Analytics | Medium | High | ★★★★★ |
12
- | 2 | Portfolio Visualization | Medium | High | ★★★★★ |
13
- | 3 | Performance Tracking | Medium | High | ★★★★☆ |
14
- | 4 | Scenario Analysis & Stress Testing | High | High | ★★★★☆ |
15
- | 5 | Additional Portfolio Metrics | Low | Medium | ★★★★☆ |
16
- | 6 | Multi-Source Data Import | Medium | Medium | ★★★☆☆ |
17
- | 7 | Portfolio Optimization | High | Medium | ★★★☆☆ |
18
- | 8 | Mobile Responsiveness | Medium | Medium | ★★★☆☆ |
19
- | 9 | User Accounts & Cloud Sync | High | Medium | ★★☆☆☆ |
20
- | 10 | API Service | High | Low | ★★☆☆☆ |
21
-
22
- ## Detailed Feature Descriptions
23
-
24
- ### 1. Enhanced Options Analytics (★★★★★)
25
-
26
- **Description:** Extend the current options analysis with comprehensive Greeks calculations and visualization.
27
-
28
- **Components:**
29
- - Complete implementation of all Greeks (Delta, Gamma, Theta, Vega, Rho)
30
- - Options strategy identification and analysis
31
- - Implied volatility surface visualization
32
- - Options expiration calendar view
33
-
34
- **Business Value:**
35
- - Provides deeper insights for options traders
36
- - Differentiates from basic portfolio trackers
37
- - Addresses current TODOs in the codebase
38
- - Builds on existing foundation with high leverage
39
-
40
- **Implementation Effort:** Medium (3-4 weeks)
41
-
42
- ---
43
-
44
- ### 2. Portfolio Visualization (★★★★★)
45
-
46
- **Description:** Add comprehensive data visualization components to provide visual insights into portfolio composition and risk.
47
-
48
- **Components:**
49
- - Asset allocation pie/treemap charts
50
- - Exposure breakdown visualizations
51
- - Risk metrics dashboards
52
- - Position correlation heatmaps
53
- - Historical performance charts
54
-
55
- **Business Value:**
56
- - Dramatically improves user experience and insights
57
- - Makes complex data more accessible
58
- - Leverages existing Plotly/Dash capabilities
59
- - High visual impact for demos and marketing
60
-
61
- **Implementation Effort:** Medium (3-4 weeks)
62
-
63
- ---
64
-
65
- ### 3. Performance Tracking (★★★★☆)
66
-
67
- **Description:** Implement historical performance tracking to monitor portfolio changes over time.
68
-
69
- **Components:**
70
- - Historical snapshots of portfolio state
71
- - Performance metrics calculation (returns, drawdowns)
72
- - Benchmark comparison
73
- - Attribution analysis (which positions drove performance)
74
- - Customizable time period selection
75
-
76
- **Business Value:**
77
- - Enables users to track investment performance
78
- - Provides accountability for investment decisions
79
- - Creates stickier product with historical data value
80
- - Complements existing risk analysis features
81
-
82
- **Implementation Effort:** Medium (4-5 weeks)
83
-
84
- ---
85
-
86
- ### 4. Scenario Analysis & Stress Testing (★★★★☆)
87
-
88
- **Description:** Allow users to model portfolio behavior under different market scenarios.
89
-
90
- **Components:**
91
- - Market shock simulations (e.g., -20% market crash)
92
- - Interest rate change scenarios
93
- - Volatility spike modeling
94
- - Custom scenario builder
95
- - Historical scenario replay (e.g., 2008 crash, 2020 COVID)
96
-
97
- **Business Value:**
98
- - Provides forward-looking risk assessment
99
- - Highly valuable for risk management
100
- - Differentiates from basic portfolio trackers
101
- - Appeals to sophisticated investors
102
-
103
- **Implementation Effort:** High (6-8 weeks)
104
-
105
- ---
106
-
107
- ### 5. Additional Portfolio Metrics (★★★★☆)
108
-
109
- **Description:** Expand the set of portfolio metrics beyond current beta and exposure analysis.
110
-
111
- **Components:**
112
- - Sharpe ratio, Sortino ratio, and other risk-adjusted return metrics
113
- - Value at Risk (VaR) calculations
114
- - Factor exposure analysis (size, value, momentum, etc.)
115
- - Sector/industry exposure breakdown
116
- - Correlation metrics with major indices
117
-
118
- **Business Value:**
119
- - Enhances risk assessment capabilities
120
- - Relatively easy to implement with high value
121
- - Builds on existing data model
122
- - Addresses TODOs in current codebase
123
-
124
- **Implementation Effort:** Low (2-3 weeks)
125
-
126
- ---
127
-
128
- ### 6. Multi-Source Data Import (★★★☆☆)
129
-
130
- **Description:** Expand beyond CSV imports to support multiple brokerage data sources.
131
-
132
- **Components:**
133
- - Direct API connections to major brokerages
134
- - Support for additional CSV/Excel formats
135
- - Automated mapping of different data formats
136
- - Manual position entry interface
137
- - Data validation and error handling
138
-
139
- **Business Value:**
140
- - Reduces friction in user onboarding
141
- - Expands potential user base
142
- - Improves data accuracy and freshness
143
- - Addresses limitation in current implementation
144
-
145
- **Implementation Effort:** Medium (4-6 weeks)
146
-
147
- ---
148
-
149
- ### 7. Portfolio Optimization (★★★☆☆)
150
-
151
- **Description:** Provide recommendations for portfolio improvements based on modern portfolio theory.
152
-
153
- **Components:**
154
- - Efficient frontier calculation
155
- - Optimization for different objectives (max return, min risk, etc.)
156
- - Position sizing recommendations
157
- - Hedging suggestions
158
- - Tax-efficient rebalancing recommendations
159
-
160
- **Business Value:**
161
- - Moves from analysis to actionable recommendations
162
- - Significant value-add for users
163
- - Potential premium feature
164
- - Differentiator from competitors
165
-
166
- **Implementation Effort:** High (8-10 weeks)
167
-
168
- ---
169
-
170
- ### 8. Mobile Responsiveness (★★★☆☆)
171
-
172
- **Description:** Optimize the UI for mobile and tablet devices.
173
-
174
- **Components:**
175
- - Responsive layout redesign
176
- - Touch-friendly controls
177
- - Mobile-optimized tables and charts
178
- - Progressive web app capabilities
179
- - Offline mode for basic functionality
180
-
181
- **Business Value:**
182
- - Expands usage contexts
183
- - Improves accessibility
184
- - Meets modern user expectations
185
- - Potential for mobile app distribution
186
-
187
- **Implementation Effort:** Medium (3-5 weeks)
188
-
189
- ---
190
-
191
- ### 9. User Accounts & Cloud Sync (★★☆☆☆)
192
-
193
- **Description:** Implement user authentication and cloud storage for portfolios.
194
-
195
- **Components:**
196
- - User registration and authentication
197
- - Secure portfolio data storage
198
- - Multi-portfolio support
199
- - Sharing and collaboration features
200
- - Premium account tiers
201
-
202
- **Business Value:**
203
- - Enables monetization strategies
204
- - Creates persistent user relationships
205
- - Allows for multi-device access
206
- - Foundation for social/collaborative features
207
-
208
- **Implementation Effort:** High (6-8 weeks)
209
-
210
- ---
211
-
212
- ### 10. API Service (★★☆☆☆)
213
-
214
- **Description:** Create a public API for programmatic access to Folio analytics.
215
-
216
- **Components:**
217
- - RESTful API design
218
- - Authentication and rate limiting
219
- - Documentation and SDK
220
- - Webhook support for portfolio updates
221
- - Integration examples
222
-
223
- **Business Value:**
224
- - Enables integration with other tools
225
- - Potential for developer ecosystem
226
- - Additional monetization channel
227
- - Automation capabilities for power users
228
-
229
- **Implementation Effort:** High (6-8 weeks)
230
-
231
- ## Implementation Phases
232
-
233
- ### Phase 1: Core Enhancement (Q2 2025)
234
- - Enhanced Options Analytics
235
- - Portfolio Visualization
236
- - Additional Portfolio Metrics
237
-
238
- ### Phase 2: Advanced Analytics (Q3 2025)
239
- - Performance Tracking
240
- - Scenario Analysis & Stress Testing
241
- - Multi-Source Data Import
242
-
243
- ### Phase 3: Platform Expansion (Q4 2025)
244
- - Portfolio Optimization
245
- - Mobile Responsiveness
246
- - User Accounts & Cloud Sync
247
- - API Service
248
-
249
- ## Success Metrics
250
-
251
- For each feature, we will track:
252
- - User adoption rate
253
- - Time spent using the feature
254
- - User feedback and satisfaction
255
- - Impact on key performance indicators
256
- - Technical stability and performance
257
-
258
- ## Conclusion
259
-
260
- This roadmap focuses on building upon Folio's core strengths in portfolio analysis while expanding into new capabilities that enhance user value. The highest ROI features leverage our existing data model and technical foundation while addressing clear user needs for deeper analytics and visualization.
261
-
262
- By prioritizing enhanced options analytics, visualization, and performance tracking in the near term, we can deliver significant value quickly while building toward more ambitious features like scenario analysis and portfolio optimization.
 
src/folio/utils.py CHANGED
@@ -28,11 +28,8 @@ def load_config():
28
  return {}
29
 
30
 
31
- # Get configuration
32
- config = load_config()
33
-
34
  # Get the singleton data fetcher instance
35
- data_fetcher = get_data_fetcher(config=config)
36
 
37
 
38
  def get_beta(ticker: str, description: str = "") -> float:
 
28
  return {}
29
 
30
 
 
 
 
31
  # Get the singleton data fetcher instance
32
+ data_fetcher = get_data_fetcher()
33
 
34
 
35
  def get_beta(ticker: str, description: str = "") -> float:
src/stockdata.py CHANGED
@@ -59,21 +59,17 @@ class DataFetcherInterface(ABC):
59
  pass
60
 
61
 
62
- def create_data_fetcher(source="yfinance", cache_dir=None):
63
  """
64
- Factory function to create the appropriate data fetcher.
65
 
66
  Args:
67
- source (str): Data source to use ('yfinance' or 'fmp')
68
  cache_dir (str, optional): Cache directory. If None, uses default.
69
 
70
  Returns:
71
- DataFetcherInterface: An instance of the appropriate data fetcher
72
-
73
- Raises:
74
- ValueError: If the specified source is not supported
75
  """
76
- # Set default cache directories based on data source and environment
77
  # In Hugging Face Spaces, use /tmp for cache
78
  is_huggingface = (
79
  os.environ.get("HF_SPACE") == "1" or os.environ.get("SPACE_ID") is not None
@@ -82,23 +78,15 @@ def create_data_fetcher(source="yfinance", cache_dir=None):
82
  if cache_dir is None:
83
  if is_huggingface:
84
  # Use /tmp directory for Hugging Face
85
- cache_dir = "/tmp/cache_yf" if source == "yfinance" else "/tmp/cache_fmp"
86
  else:
87
  # Use local directory for other environments
88
- cache_dir = ".cache_yf" if source == "yfinance" else ".cache_fmp"
89
-
90
- if source == "yfinance":
91
- from src.yfinance import YFinanceDataFetcher
92
 
93
- logger.info(f"Creating YFinance data fetcher with cache dir: {cache_dir}")
94
- return YFinanceDataFetcher(cache_dir=cache_dir)
95
- elif source == "fmp":
96
- from src.fmp import DataFetcher
97
 
98
- logger.info(f"Creating FMP data fetcher with cache dir: {cache_dir}")
99
- return DataFetcher(cache_dir=cache_dir)
100
- else:
101
- raise ValueError(f"Unknown data source: {source}")
102
 
103
 
104
  # Singleton data fetcher class
@@ -106,9 +94,10 @@ class DataFetcherSingleton:
106
  """Singleton class for data fetcher."""
107
 
108
  _instance = None
 
109
 
110
  @classmethod
111
- def get_instance(cls, source=None, cache_dir=None, config=None):
112
  """
113
  Get the singleton instance of the data fetcher.
114
 
@@ -116,11 +105,7 @@ class DataFetcherSingleton:
116
  the application, preventing duplicate initialization.
117
 
118
  Args:
119
- source (str, optional): Data source to use ('yfinance' or 'fmp').
120
- If None, uses the value from config or defaults to 'yfinance'.
121
  cache_dir (str, optional): Cache directory. If None, uses default.
122
- config (dict, optional): Configuration dictionary. If provided,
123
- used to determine the data source if source is None.
124
 
125
  Returns:
126
  DataFetcherInterface: The singleton data fetcher instance.
@@ -131,22 +116,16 @@ class DataFetcherSingleton:
131
  if cls._instance is not None:
132
  return cls._instance
133
 
134
- # Determine the data source
135
- if source is None:
136
- if config is not None:
137
- source = config.get("app", {}).get("data_source", "yfinance")
138
- else:
139
- source = "yfinance"
140
-
141
  try:
142
- logger.info(f"Using data source: {source}")
143
- cls._instance = create_data_fetcher(source=source, cache_dir=cache_dir)
144
 
145
  if cls._instance is None:
146
  raise RuntimeError(
147
  "Data fetcher initialization failed but didn't raise an exception"
148
  )
149
 
 
150
  return cls._instance
151
  except ValueError as e:
152
  logger.error(f"Failed to initialize data fetcher: {e}")
@@ -157,7 +136,7 @@ class DataFetcherSingleton:
157
 
158
 
159
  # Convenience function to maintain backward compatibility
160
- def get_data_fetcher(source=None, cache_dir=None, config=None):
161
  """
162
  Get the singleton instance of the data fetcher.
163
 
@@ -165,16 +144,13 @@ def get_data_fetcher(source=None, cache_dir=None, config=None):
165
  for backward compatibility.
166
 
167
  Args:
168
- source (str, optional): Data source to use ('yfinance' or 'fmp').
169
- If None, uses the value from config or defaults to 'yfinance'.
170
  cache_dir (str, optional): Cache directory. If None, uses default.
171
- config (dict, optional): Configuration dictionary. If provided,
172
- used to determine the data source if source is None.
173
 
174
  Returns:
175
  DataFetcherInterface: The singleton data fetcher instance.
176
  """
177
- return DataFetcherSingleton.get_instance(source, cache_dir, config)
178
 
179
 
180
  # Cache management functions
 
59
  pass
60
 
61
 
62
+ def create_data_fetcher(cache_dir=None):
63
  """
64
+ Factory function to create a YFinance data fetcher.
65
 
66
  Args:
 
67
  cache_dir (str, optional): Cache directory. If None, uses default.
68
 
69
  Returns:
70
+ DataFetcherInterface: An instance of YFinanceDataFetcher
 
 
 
71
  """
72
+ # Set default cache directory based on environment
73
  # In Hugging Face Spaces, use /tmp for cache
74
  is_huggingface = (
75
  os.environ.get("HF_SPACE") == "1" or os.environ.get("SPACE_ID") is not None
 
78
  if cache_dir is None:
79
  if is_huggingface:
80
  # Use /tmp directory for Hugging Face
81
+ cache_dir = "/tmp/cache_yf"
82
  else:
83
  # Use local directory for other environments
84
+ cache_dir = ".cache_yf"
 
 
 
85
 
86
+ from src.yfinance import YFinanceDataFetcher
 
 
 
87
 
88
+ logger.info(f"Creating YFinance data fetcher with cache dir: {cache_dir}")
89
+ return YFinanceDataFetcher(cache_dir=cache_dir)
 
 
90
 
91
 
92
  # Singleton data fetcher class
 
94
  """Singleton class for data fetcher."""
95
 
96
  _instance = None
97
+ _initialized = False
98
 
99
  @classmethod
100
+ def get_instance(cls, cache_dir=None):
101
  """
102
  Get the singleton instance of the data fetcher.
103
 
 
105
  the application, preventing duplicate initialization.
106
 
107
  Args:
 
 
108
  cache_dir (str, optional): Cache directory. If None, uses default.
 
 
109
 
110
  Returns:
111
  DataFetcherInterface: The singleton data fetcher instance.
 
116
  if cls._instance is not None:
117
  return cls._instance
118
 
 
 
 
 
 
 
 
119
  try:
120
+ logger.info("Initializing YFinance data fetcher")
121
+ cls._instance = create_data_fetcher(cache_dir=cache_dir)
122
 
123
  if cls._instance is None:
124
  raise RuntimeError(
125
  "Data fetcher initialization failed but didn't raise an exception"
126
  )
127
 
128
+ cls._initialized = True
129
  return cls._instance
130
  except ValueError as e:
131
  logger.error(f"Failed to initialize data fetcher: {e}")
 
136
 
137
 
138
  # Convenience function to maintain backward compatibility
139
+ def get_data_fetcher(cache_dir=None, **kwargs):
140
  """
141
  Get the singleton instance of the data fetcher.
142
 
 
144
  for backward compatibility.
145
 
146
  Args:
 
 
147
  cache_dir (str, optional): Cache directory. If None, uses default.
148
+ **kwargs: Additional arguments that are ignored (for backward compatibility)
 
149
 
150
  Returns:
151
  DataFetcherInterface: The singleton data fetcher instance.
152
  """
153
+ return DataFetcherSingleton.get_instance(cache_dir)
154
 
155
 
156
  # Cache management functions
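The `**kwargs` shim added to `get_data_fetcher` above is what keeps old call sites (which passed `source=` and `config=`) working after the removal. A standalone sketch of that compatibility pattern, with a dict standing in for the real fetcher object:

```python
def get_data_fetcher(cache_dir=None, **kwargs):
    # Extra keyword arguments (e.g. the old source= / config=) are accepted
    # and silently ignored so existing call sites don't need to change.
    if kwargs:
        print(f"ignoring legacy arguments: {sorted(kwargs)}")
    return {"source": "yfinance", "cache_dir": cache_dir or ".cache_yf"}

# A pre-refactor call site still works unchanged:
fetcher = get_data_fetcher(source="fmp", config={"app": {}})
print(fetcher["source"], fetcher["cache_dir"])
```

Swallowing unknown kwargs is a deliberate trade-off: it avoids touching every caller in one commit, at the cost of no longer failing loudly on typos in keyword names.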
tests/fetch_sample_data.py DELETED
@@ -1,110 +0,0 @@
1
- """
2
- Script to fetch sample data from the FMP API for testing purposes.
3
-
4
- This script fetches data for a few representative tickers and saves it to JSON files
5
- for reference when creating mock data and tests.
6
- """
7
-
8
- import json
9
- import os
10
- import sys
11
-
12
- # Add the project root to the Python path
13
- sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
14
-
15
-
16
- from src.fmp import DataFetcher
17
-
18
- # Create output directory
19
- OUTPUT_DIR = "tests/test_data"
20
- os.makedirs(OUTPUT_DIR, exist_ok=True)
21
-
22
- # List of tickers to fetch data for
23
- TICKERS = [
24
- "SPY", # S&P 500 ETF (market benchmark)
25
- "AAPL", # High beta tech stock
26
- "GOOGL", # Another high beta tech stock
27
- "SO", # Low volatility utility stock
28
- "TLT", # Treasury ETF (negative correlation with market)
29
- "BIL", # Short-term treasury (very low volatility)
30
- "EFA", # International ETF
31
- "EEM", # Emerging markets ETF
32
- ]
33
-
34
- # Periods to fetch
35
- PERIODS = ["1y", "5y"]
36
-
37
- def main():
38
- """Fetch sample data and save to files."""
39
-
40
- # Initialize data fetcher
41
- fetcher = DataFetcher()
42
-
43
- # Fetch data for each ticker and period
44
- for ticker in TICKERS:
45
- for period in PERIODS:
46
- try:
47
-
48
- # Fetch data
49
- df = fetcher.fetch_data(ticker, period=period)
50
-
51
- if df is not None and not df.empty:
52
- # Save to CSV
53
- csv_path = os.path.join(OUTPUT_DIR, f"{ticker}_{period}.csv")
54
- df.to_csv(csv_path)
55
-
56
- # Save first 5 rows to JSON for reference
57
- json_path = os.path.join(OUTPUT_DIR, f"{ticker}_{period}_sample.json")
58
- sample_data = df.head(5).reset_index().to_dict(orient="records")
59
- with open(json_path, "w") as f:
60
- json.dump(sample_data, f, indent=2, default=str)
61
- else:
62
- pass
63
-
64
- except Exception:
65
- pass
66
-
67
- # Calculate and save beta values
68
- betas = {}
69
-
70
- # Use 5-year data for more accurate beta calculation
71
- market_data = fetcher.fetch_market_data("SPY", period="5y")
72
- market_returns = market_data["Close"].pct_change().dropna()
73
-
74
- for ticker in TICKERS:
75
- try:
76
- # Skip SPY (beta = 1.0 by definition)
77
- if ticker == "SPY":
78
- betas[ticker] = 1.0
79
- continue
80
-
81
- # Fetch data and calculate beta
82
- stock_data = fetcher.fetch_data(ticker, period="5y")
83
- stock_returns = stock_data["Close"].pct_change().dropna()
84
-
85
- # Align data
86
- common_dates = stock_returns.index.intersection(market_returns.index)
87
- if len(common_dates) < 30: # Require at least 30 data points
88
- continue
89
-
90
- aligned_stock = stock_returns.loc[common_dates]
91
- aligned_market = market_returns.loc[common_dates]
92
-
93
- # Calculate beta
94
- covariance = aligned_stock.cov(aligned_market)
95
- market_variance = aligned_market.var()
96
- beta = covariance / market_variance
97
-
98
- betas[ticker] = beta
99
-
100
- except Exception:
101
- pass
102
-
103
- # Save beta values
104
- beta_path = os.path.join(OUTPUT_DIR, "beta_values.json")
105
- with open(beta_path, "w") as f:
106
- json.dump(betas, f, indent=2)
107
-
108
-
109
- if __name__ == "__main__":
110
- main()
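The beta math this deleted script used (and which `get_beta` in `src/folio/utils.py` presumably still relies on) is plain covariance over variance on aligned return series. A self-contained sketch with made-up returns:

```python
def beta(stock, market):
    """Beta = Cov(stock, market) / Var(market), both with sample (n-1) scaling."""
    n = len(stock)
    ms, mm = sum(stock) / n, sum(market) / n
    cov = sum((s - ms) * (m - mm) for s, m in zip(stock, market)) / (n - 1)
    var = sum((m - mm) ** 2 for m in market) / (n - 1)
    return cov / var

market = [0.010, -0.005, 0.007, -0.012, 0.004]
stock = [0.015, -0.008, 0.010, -0.020, 0.006]  # moves roughly 1.5x the market
print(round(beta(stock, market), 2))
```

The removed script also required at least 30 overlapping data points before trusting the estimate; any replacement sample-data tooling should keep a similar floor.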
 
tests/test_data_fetcher.py DELETED
@@ -1,471 +0,0 @@
1
- """
2
- Tests for the DataFetcher class in src/fmp.py
3
-
4
- These tests verify the core functionality of the DataFetcher class, including:
5
- 1. Initialization and configuration
6
- 2. Data fetching and caching
7
- 3. Error handling
8
- 4. Data format and structure
9
-
10
- The tests use mocking to avoid actual API calls and to provide consistent test data.
11
- """
12
-
13
- import os
14
- import sys
15
- import time
16
- from datetime import datetime, timedelta
17
- from unittest.mock import MagicMock, patch
18
-
19
- import pandas as pd
20
- import pytest
21
- import requests
22
-
23
- # Add the project root to the Python path
24
- sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
25
-
26
- from src.fmp import DataFetcher
27
-
28
- # Import mock data utilities
29
- from tests.test_data.mock_stock_data import (
30
- get_mock_raw_data,
31
- get_real_beta,
32
- get_real_data,
33
- )
34
-
35
-
36
- @pytest.fixture
37
- def mock_response():
38
- """Create a mock response object for requests with real data."""
39
- mock = MagicMock()
40
- mock.status_code = 200
41
-
42
- # Use real data structure from our collected samples
43
- mock.json.return_value = get_mock_raw_data("AAPL", "1y")
44
- return mock
45
-
46
-
47
- @pytest.fixture
48
- def mock_spy_response():
49
- """Create a mock response object for SPY data."""
50
- mock = MagicMock()
51
- mock.status_code = 200
52
-
53
- # Use real data structure from our collected samples
54
- mock.json.return_value = get_mock_raw_data("SPY", "1y")
55
- return mock
56
-
57
-
58
- @pytest.fixture
59
- def mock_empty_response():
60
- """Create a mock response with no historical data."""
61
- mock = MagicMock()
62
- mock.status_code = 200
63
- mock.json.return_value = {"symbol": "INVALID", "historical": []}
64
- return mock
65
-
66
-
67
- @pytest.fixture
68
- def mock_error_response():
69
- """Create a mock response with an error status code."""
70
- mock = MagicMock()
71
- mock.status_code = 401
72
- mock.text = "Unauthorized: Invalid API key"
73
- return mock
74
-
75
-
76
- @pytest.fixture
77
- def temp_cache_dir(tmpdir):
78
- """Create a temporary directory for cache files."""
79
- cache_dir = tmpdir.mkdir("test_cache")
80
- return str(cache_dir)
81
-
82
-
83
- @pytest.fixture
84
- def sample_dataframe():
85
- """Create a sample DataFrame with the expected structure using real data."""
86
- return get_real_data("AAPL", "1y").head(5)
87
-
88
-
89
- class TestDataFetcherInitialization:
90
- """Tests for DataFetcher initialization and configuration."""
91
-
92
- def test_init_with_default_cache_dir(self):
93
- """Test initialization with default cache directory."""
94
- with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
95
- fetcher = DataFetcher()
96
- assert fetcher.cache_dir == ".cache_fmp"
97
- assert fetcher.api_key == "test_key"
98
- assert fetcher.cache_ttl == 86400 # Default TTL
99
-
100
- def test_init_with_custom_cache_dir(self, temp_cache_dir):
101
- """Test initialization with custom cache directory."""
102
- with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
103
- fetcher = DataFetcher(cache_dir=temp_cache_dir)
104
- assert fetcher.cache_dir == temp_cache_dir
105
- assert os.path.exists(temp_cache_dir) # Directory should be created
106
-
107
- def test_init_with_config_api_key(self):
108
- """Test initialization with API key from config."""
109
- # Clear environment variable to ensure we use config
110
- with patch.dict(os.environ, {}, clear=True):
111
-             with patch("src.v2.config.config.get", return_value="config_key"):
-                 fetcher = DataFetcher()
-                 assert fetcher.api_key == "config_key"
-
-     def test_init_with_env_api_key_precedence(self):
-         """Test that environment variable takes precedence over config."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "env_key"}):
-             with patch("src.v2.config.config.get", return_value="config_key"):
-                 fetcher = DataFetcher()
-                 assert fetcher.api_key == "env_key"
-
-     def test_init_with_custom_ttl(self):
-         """Test initialization with custom cache TTL from config."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             with patch(
-                 "src.v2.config.config.get",
-                 side_effect=lambda key, default=None: 3600
-                 if key == "app.cache.ttl"
-                 else default,
-             ):
-                 fetcher = DataFetcher()
-                 assert fetcher.cache_ttl == 3600
-
-     def test_init_without_api_key(self):
-         """Test initialization without API key raises ValueError."""
-         with patch.dict(os.environ, {}, clear=True):
-             with patch("src.v2.config.config.get", return_value=None):
-                 with pytest.raises(ValueError, match="No API key found"):
-                     DataFetcher()
-
-
- class TestDataFetching:
-     """Tests for data fetching functionality."""
-
-     def test_fetch_data_api_call(self, mock_response, temp_cache_dir):
-         """Test fetching data from API."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             with patch("requests.get", return_value=mock_response):
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 df = fetcher.fetch_data("AAPL", period="1y")
-
-                 # Check DataFrame structure
-                 assert isinstance(df, pd.DataFrame)
-                 assert len(df) > 0  # Don't check exact length as it may vary
-
-                 # Check that required columns exist
-                 required_columns = ["Open", "High", "Low", "Close", "Volume"]
-                 for col in required_columns:
-                     assert col in df.columns, f"Column {col} not found in DataFrame"
-
-                 assert df.index.name == "date"
-                 assert pd.api.types.is_datetime64_dtype(df.index)
-
-     def test_fetch_data_cache_creation(self, mock_response, temp_cache_dir):
-         """Test that data is cached after fetching."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             with patch("requests.get", return_value=mock_response):
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 fetcher.fetch_data("AAPL", period="1y")
-
-                 # Check that cache file was created
-                 cache_file = os.path.join(temp_cache_dir, "AAPL_1y_1d.csv")
-                 assert os.path.exists(cache_file)
-
-     def test_fetch_data_from_cache(
-         self, mock_response, temp_cache_dir, sample_dataframe
-     ):
-         """Test fetching data from cache."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             # Create cache file
-             cache_file = os.path.join(temp_cache_dir, "AAPL_1y_1d.csv")
-             sample_dataframe.to_csv(cache_file)
-
-             # Set modification time to be recent (within cache TTL)
-             os.utime(cache_file, (time.time(), time.time()))
-
-             with patch("requests.get", return_value=mock_response) as mock_get:
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 df = fetcher.fetch_data("AAPL", period="1y")
-
-                 # API should not be called
-                 mock_get.assert_not_called()
-
-                 # Data should match sample
-                 pd.testing.assert_frame_equal(df, sample_dataframe)
-
-     def test_fetch_data_expired_cache(
-         self, mock_response, temp_cache_dir, sample_dataframe
-     ):
-         """Test fetching data with expired cache."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             # Create cache file
-             cache_file = os.path.join(temp_cache_dir, "AAPL_1y_1d.csv")
-             sample_dataframe.to_csv(cache_file)
-
-             # Set modification time to be old (beyond cache TTL)
-             old_time = time.time() - 100000  # Well beyond default TTL
-             os.utime(cache_file, (old_time, old_time))
-
-             with patch("requests.get", return_value=mock_response) as mock_get:
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 fetcher.fetch_data("AAPL", period="1y")
-
-                 # API should be called
-                 mock_get.assert_called_once()
-
-     def test_fetch_market_data(self, mock_response, temp_cache_dir):
-         """Test fetching market data."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             with patch("requests.get", return_value=mock_response):
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 df = fetcher.fetch_market_data(market_index="SPY", period="1y")
-
-                 # Check DataFrame structure
-                 assert isinstance(df, pd.DataFrame)
-                 assert len(df) > 0  # Don't check exact length as it may vary
-
-                 # Check that required columns exist
-                 required_columns = ["Open", "High", "Low", "Close", "Volume"]
-                 for col in required_columns:
-                     assert col in df.columns, f"Column {col} not found in DataFrame"
-
-
- class TestErrorHandling:
-     """Tests for error handling in DataFetcher."""
-
-     def test_api_error_response(self, mock_error_response, temp_cache_dir):
-         """Test handling of API error responses."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             with patch("requests.get", return_value=mock_error_response):
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 with pytest.raises(ValueError, match="API request failed"):
-                     fetcher.fetch_data("AAPL", period="1y")
-
-     def test_empty_data_response(self, mock_empty_response, temp_cache_dir):
-         """Test handling of empty data responses."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             # Update the mock response to not include 'historical' key
-             mock_empty_response.json.return_value = {"symbol": "INVALID"}
-
-             with patch("requests.get", return_value=mock_empty_response):
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 # Now we expect an empty DataFrame instead of an exception
-                 result = fetcher.fetch_data("INVALID", period="1y")
-                 assert isinstance(result, pd.DataFrame)
-                 assert result.empty or len(result) == 0
-                 assert "Open" in result.columns
-                 assert "Close" in result.columns
-
-     def test_network_error_with_fallback(self, temp_cache_dir, sample_dataframe):
-         """Test fallback to expired cache on network error."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             # Create cache file
-             cache_file = os.path.join(temp_cache_dir, "AAPL_1y_1d.csv")
-             sample_dataframe.to_csv(cache_file)
-
-             # Set modification time to be old (beyond cache TTL)
-             old_time = time.time() - 100000  # Well beyond default TTL
-             os.utime(cache_file, (old_time, old_time))
-
-             # Simulate network error
-             with patch(
-                 "requests.get",
-                 side_effect=requests.exceptions.ConnectionError("Network error"),
-             ):
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 df = fetcher.fetch_data("AAPL", period="1y")
-
-                 # Should fall back to cache
-                 pd.testing.assert_frame_equal(df, sample_dataframe)
-
-     def test_network_error_without_fallback(self, temp_cache_dir):
-         """Test network error without cache fallback raises exception."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             # Simulate network error with no cache
-             with patch(
-                 "requests.get",
-                 side_effect=requests.exceptions.ConnectionError("Network error"),
-             ):
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 with pytest.raises(requests.exceptions.ConnectionError):
-                     fetcher.fetch_data("AAPL", period="1y")
-
-
- class TestDataFormat:
-     """Tests for data format and structure."""
-
-     def test_date_parsing(self, mock_response, temp_cache_dir):
-         """Test that dates are properly parsed and set as index."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             with patch("requests.get", return_value=mock_response):
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 df = fetcher.fetch_data("AAPL", period="1y")
-
-                 # Check index is datetime
-                 assert pd.api.types.is_datetime64_dtype(df.index)
-                 assert df.index.name == "date"
-                 # Don't check exact date as it may vary
-
-     def test_column_renaming(self, mock_response, temp_cache_dir):
-         """Test that columns are properly renamed."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             with patch("requests.get", return_value=mock_response):
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 df = fetcher.fetch_data("AAPL", period="1y")
-
-                 # Check that required columns exist
-                 required_columns = ["Open", "High", "Low", "Close", "Volume"]
-                 for col in required_columns:
-                     assert col in df.columns, f"Column {col} not found in DataFrame"
-
-     def test_data_sorting(self, temp_cache_dir):
-         """Test that data is sorted by date in ascending order."""
-         # Modify mock response to have unsorted dates
-         unsorted_response = MagicMock()
-         unsorted_response.status_code = 200
-         unsorted_response.json.return_value = {
-             "symbol": "AAPL",
-             "historical": [
-                 {
-                     "date": "2023-01-05",
-                     "open": 127.13,
-                     "high": 127.77,
-                     "low": 124.76,
-                     "close": 125.02,
-                     "volume": 80829500,
-                 },
-                 {
-                     "date": "2023-01-03",
-                     "open": 130.28,
-                     "high": 130.9,
-                     "low": 124.17,
-                     "close": 125.07,
-                     "volume": 112117500,
-                 },
-                 {
-                     "date": "2023-01-04",
-                     "open": 126.89,
-                     "high": 128.66,
-                     "low": 125.08,
-                     "close": 126.36,
-                     "volume": 88883500,
-                 },
-             ],
-         }
-
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             with patch("requests.get", return_value=unsorted_response):
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 df = fetcher.fetch_data("AAPL", period="1y")
-
-                 # Check sorting
-                 assert df.index[0] < df.index[1] < df.index[2]
-                 assert df.index[0] == pd.Timestamp("2023-01-03")
-                 assert df.index[1] == pd.Timestamp("2023-01-04")
-                 assert df.index[2] == pd.Timestamp("2023-01-05")
-
-
- class TestPeriodHandling:
-     """Tests for period handling in date range calculation."""
-
-     def test_period_years(self, mock_response, temp_cache_dir):
-         """Test period handling for years."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             with patch("requests.get", return_value=mock_response) as mock_get:
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 fetcher.fetch_data("AAPL", period="2y")
-
-                 # Extract URL from the call
-                 url = mock_get.call_args[0][0]
-
-                 # Check that date range is approximately 2 years
-                 today = datetime.now().strftime("%Y-%m-%d")
-                 two_years_ago = (datetime.now() - timedelta(days=365 * 2)).strftime(
-                     "%Y-%m"
-                 )
-
-                 assert today in url
-                 assert two_years_ago in url
-
-     def test_period_months(self, mock_response, temp_cache_dir):
-         """Test period handling for months."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             with patch("requests.get", return_value=mock_response) as mock_get:
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 fetcher.fetch_data("AAPL", period="6m")
-
-                 # Extract URL from the call
-                 url = mock_get.call_args[0][0]
-
-                 # Check that date range is approximately 6 months
-                 today = datetime.now().strftime("%Y-%m-%d")
-                 six_months_ago = (datetime.now() - timedelta(days=30 * 6)).strftime(
-                     "%Y-%m"
-                 )
-
-                 assert today in url
-                 assert six_months_ago in url
-
-     def test_period_default(self, mock_response, temp_cache_dir):
-         """Test default period handling."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             with patch("requests.get", return_value=mock_response) as mock_get:
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-                 fetcher.fetch_data("AAPL", period="invalid")
-
-                 # Extract URL from the call
-                 url = mock_get.call_args[0][0]
-
-                 # Check that date range is approximately 1 year (default)
-                 today = datetime.now().strftime("%Y-%m-%d")
-                 one_year_ago = (datetime.now() - timedelta(days=365)).strftime("%Y-%m")
-
-                 assert today in url
-                 assert one_year_ago in url
-
-
- class TestBetaCalculation:
-     """Tests for beta calculation using the DataFetcher."""
-
-     def test_beta_calculation(self, mock_response, mock_spy_response, temp_cache_dir):
-         """Test beta calculation with mock data."""
-         with patch.dict(os.environ, {"FMP_API_KEY": "test_key"}):
-             with patch(
-                 "requests.get",
-                 side_effect=lambda url, params=None: mock_spy_response
-                 if "SPY" in url
-                 else mock_response,
-             ):
-                 fetcher = DataFetcher(cache_dir=temp_cache_dir)
-
-                 # Get stock and market data
-                 stock_data = fetcher.fetch_data("AAPL", period="1y")
-                 market_data = fetcher.fetch_market_data("SPY", period="1y")
-
-                 # Calculate beta manually
-                 stock_returns = stock_data["Close"].pct_change().dropna()
-                 market_returns = market_data["Close"].pct_change().dropna()
-
-                 # Align data
-                 common_dates = stock_returns.index.intersection(market_returns.index)
-                 stock_returns = stock_returns.loc[common_dates]
-                 market_returns = market_returns.loc[common_dates]
-
-                 # Calculate beta
-                 covariance = stock_returns.cov(market_returns)
-                 market_variance = market_returns.var()
-                 beta = covariance / market_variance
-
-                 # Compare with expected beta from real data
-                 get_real_beta("AAPL")
-
-                 # Beta should be within a reasonable range of the expected value
-                 # The exact value will differ due to the mock data and date ranges
-                 assert 0.5 < beta < 2.0, f"Beta {beta} is outside reasonable range"
-
-                 # For information only - not a strict test
-
-
- if __name__ == "__main__":
-     pytest.main(["-v", "test_data_fetcher.py"])
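
The cache tests removed above all hinge on one freshness rule: a cached CSV is reused only when its modification time falls within the TTL, and an expired or missing file triggers a refetch. A minimal sketch of that check (a hypothetical helper for illustration, not the actual code in `src/stockdata.py`):

```python
import os
import time


def cache_is_fresh(path: str, ttl_seconds: int) -> bool:
    """Return True if the cache file exists and its mtime is within the TTL."""
    if not os.path.exists(path):
        return False
    age = time.time() - os.path.getmtime(path)
    return age < ttl_seconds
```

The removed tests manipulated exactly this signal with `os.utime`: touching the file with the current time made it fresh, while setting an mtime 100000 seconds in the past forced an API call (or a cache fallback on network error).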
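
The deleted `TestBetaCalculation` computed beta by hand as cov(stock returns, market returns) / var(market returns) over the dates the two series share. The same arithmetic, isolated on a few synthetic closing prices (the numbers below are made up for illustration and imply nothing about real AAPL/SPY betas):

```python
import pandas as pd

# Synthetic daily closes for a stock and a market index (hypothetical values)
dates = pd.date_range("2023-01-03", periods=5, freq="D")
stock = pd.Series([125.07, 126.36, 125.02, 127.10, 126.50], index=dates)
market = pd.Series([380.82, 383.76, 379.38, 385.20, 384.10], index=dates)

# Daily percentage returns; the first row is NaN and gets dropped
stock_returns = stock.pct_change().dropna()
market_returns = market.pct_change().dropna()

# Align on shared dates, then beta = cov(stock, market) / var(market)
common = stock_returns.index.intersection(market_returns.index)
beta = stock_returns.loc[common].cov(market_returns.loc[common]) / market_returns.loc[
    common
].var()
```

With both series moving in the same direction each day, the covariance is positive and so is the resulting beta; the removed test only asserted the looser sanity bound `0.5 < beta < 2.0` because mock data and date ranges made an exact value meaningless.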