Dmitry Beresnev committed
Commit e6b8a0f · 1 Parent(s): d848cbc

add core modules

Files changed (9)
  1. .dockerignore +39 -0
  2. .gitignore +45 -0
  3. Dockerfile +34 -0
  4. README.md +105 -3
  5. app.py +360 -0
  6. formula_generator.py +395 -0
  7. ocr_parser.py +175 -0
  8. portfolio_calculator.py +316 -0
  9. requirements.txt +8 -0
.dockerignore ADDED
@@ -0,0 +1,39 @@
+ # Git
+ .git
+ .gitignore
+ .gitattributes
+
+ # Python cache
+ __pycache__
+ *.pyc
+ *.pyo
+ *.pyd
+ .Python
+
+ # Virtual environments
+ .venv
+ venv
+ env
+
+ # Logs
+ *.log
+
+ # OS files
+ .DS_Store
+ Thumbs.db
+
+ # Documentation (not needed in Docker image)
+ *.md
+ README.md
+
+ # Test files (optional - remove if you want to include test images)
+ test_*.png
+
+ # IDE
+ .vscode
+ .idea
+
+ # Misc
+ *.swp
+ *.swo
+ *~
.gitignore ADDED
@@ -0,0 +1,45 @@
+ # Node modules
+ node_modules/
+ npm-debug.log
+ yarn-error.log
+
+ # Python virtual environment and caches
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.pyd
+ venv/
+ env/
+ .venv/
+ .Python
+
+ # HF Space build artifacts
+ *.log
+ *.lock
+ *.db
+ *.sqlite
+ *.cache
+ /dist/
+ .build/
+
+ # Docker
+ *.env
+ Dockerfile.*.swp
+ docker-compose.override.yml
+
+ # Vault local changes (if you want only committed notes to stay)
+ vault/*.md
+ vault/**/*.md
+
+ # VSCode / IDEs
+ .vscode/
+ .idea/
+ *.sublime-workspace
+ *.sublime-project
+
+ # OS files
+ .DS_Store
+ Thumbs.db
+
+ #
+ test_*
Dockerfile ADDED
@@ -0,0 +1,34 @@
+ FROM python:3.12-slim
+
+ # Install system dependencies for tesseract OCR and image processing
+ RUN apt-get update && apt-get install -y \
+     tesseract-ocr \
+     tesseract-ocr-eng \
+     libtesseract-dev \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Set working directory
+ WORKDIR /app
+
+ # Copy requirements first for better Docker layer caching
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application files
+ COPY . .
+
+ # Expose Streamlit port (HF Spaces default)
+ EXPOSE 7860
+
+ # Set environment variables for Streamlit
+ ENV STREAMLIT_SERVER_PORT=7860
+ ENV STREAMLIT_SERVER_ADDRESS=0.0.0.0
+ ENV STREAMLIT_SERVER_HEADLESS=true
+
+ # Health check (uses Python because curl is not installed in python:3.12-slim)
+ HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:7860/_stcore/health')" || exit 1
+
+ # Run the application
+ CMD ["streamlit", "run", "app.py", "--server.port=7860", "--server.address=0.0.0.0"]
README.md CHANGED
@@ -1,11 +1,113 @@
  ---
- title: Financial Analyst
- emoji: 🐢
+ title: Portfolio Volatility Analyzer
+ emoji: 📊
  colorFrom: blue
  colorTo: green
  sdk: docker
  pinned: false
- short_description: on the way to the financial analytics
+ short_description: Investment portfolio risk analysis with OCR and LaTeX formulas
+ ---
+
+ # 📊 Portfolio Volatility Analyzer
+
+ Analyze your investment portfolio risk using **Modern Portfolio Theory** with OCR, interactive visualizations, and beautiful mathematical formulas.
+
+ ## Features
+
+ - 📸 **OCR Portfolio Parsing**: Upload screenshots of your portfolio and automatically extract tickers and amounts
+ - ✏️ **Editable JSON**: Correct OCR errors with an intuitive JSON editor
+ - 📈 **Historical Data**: Automatically fetch 1 year of price data from Yahoo Finance
+ - 🧮 **Full Calculations**:
+   - Portfolio weights
+   - Log returns
+   - Covariance matrix
+   - Portfolio variance and volatility
+ - 📐 **Beautiful LaTeX Formulas**: See every calculation step with symbolic and numerical formulas
+ - 📊 **Detailed Variance Expansion**: Step-by-step breakdown showing how each asset contributes to portfolio risk
+ - 🎚️ **Interactive Rebalancing**: Adjust portfolio amounts with sliders and see volatility update in real-time
+
+ ## How to Use
+
+ 1. **Upload Portfolio Screenshot**: Take a screenshot of your portfolio (must show ticker symbols and dollar amounts)
+ 2. **Edit Portfolio JSON**: Review and correct any OCR errors in the JSON editor
+ 3. **Validate Portfolio**: Click "Validate Portfolio" to start analysis
+ 4. **View Results**: See historical data, covariance matrix, and detailed formulas
+ 5. **Rebalance**: Use interactive sliders to adjust positions and see impact on volatility
+
+ ## Technical Details
+
+ ### Formula Highlights
+
+ **Portfolio Variance:**
+ ```
+ σ²_p = w^T × Σ × w
+ ```
+
+ Where:
+ - `w` = vector of portfolio weights
+ - `Σ` = covariance matrix (annualized)
+
+ **Portfolio Volatility:**
+ ```
+ σ_p = √(σ²_p)
+ ```
+
+ ### Architecture
+
+ - **Frontend**: Streamlit
+ - **OCR**: Tesseract (pytesseract)
+ - **Financial Data**: yfinance (Yahoo Finance)
+ - **Math**: NumPy, Pandas, SymPy
+ - **Deployment**: Docker on Hugging Face Spaces
+
+ ## Local Development
+
+ ### Prerequisites
+ - Python 3.11+
+ - Tesseract OCR installed
+
+ ### Setup
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run the app
+ streamlit run app.py
+ ```
+
+ ### Docker Build
+ ```bash
+ # Build
+ docker build -t portfolio-analyzer .
+
+ # Run
+ docker run -p 7860:7860 portfolio-analyzer
+ ```
+
+ ## Example Portfolio
+
+ Test the app with this JSON:
+ ```json
+ {
+   "AAPL": 5000,
+   "GOOGL": 3000,
+   "MSFT": 2000
+ }
+ ```
+
+ ## Notes
+
+ - Uses 252 trading days for annualization
+ - Calculates log returns: ln(P_t / P_{t-1})
+ - Smart truncation for portfolios with 4+ tickers
+ - 1-hour cache for historical data to reduce API calls
+
+ ## Built With
+
+ - Modern Portfolio Theory
+ - LaTeX mathematical notation
+ - Real-time financial data
+
  ---
  
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
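Review note on the README formulas: the sketch below works through σ²_p = w^T × Σ × w in its expanded double-sum form, using log returns and the 252-day annualization mentioned in the README notes. It is a standard-library illustration only, not the code in `portfolio_calculator.py` (which is not shown in this part of the diff). The price series are made-up numbers, the helper names (`log_returns`, `sample_cov`, `portfolio_volatility`) are hypothetical, and the sample covariance with an n-1 denominator is an assumption.

```python
import math

TRADING_DAYS = 252  # annualization factor stated in the README notes


def log_returns(prices):
    """Daily log returns ln(P_t / P_{t-1}) for one price series."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def sample_cov(x, y):
    """Sample covariance with an (n - 1) denominator (an assumption)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def portfolio_volatility(prices, amounts):
    """Return (weights, annualized variance, annualized volatility)."""
    tickers = list(amounts)
    total = sum(amounts.values())
    weights = {t: amounts[t] / total for t in tickers}
    returns = {t: log_returns(prices[t]) for t in tickers}
    # Expanded form of w^T Σ w: sum over every pair (i, j) of w_i * w_j * σ_ij
    variance = sum(
        weights[i] * weights[j] * sample_cov(returns[i], returns[j]) * TRADING_DAYS
        for i in tickers
        for j in tickers
    )
    return weights, variance, math.sqrt(variance)


# Made-up prices, for illustration only
prices = {
    "AAPL": [100.0, 101.0, 99.5, 102.0, 103.5],
    "GOOGL": [50.0, 50.5, 50.2, 51.0, 50.8],
    "MSFT": [200.0, 198.0, 201.0, 203.0, 202.0],
}
amounts = {"AAPL": 5000, "GOOGL": 3000, "MSFT": 2000}

weights, variance, volatility = portfolio_volatility(prices, amounts)
print(weights)
print(f"variance={variance:.6f} volatility={volatility * 100:.2f}%")
```

The double sum visits each off-diagonal pair twice, once as (i, j) and once as (j, i), which is why the expanded formula carries both terms.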
app.py ADDED
@@ -0,0 +1,360 @@
+ """
+ Portfolio Volatility Analyzer - Main Streamlit Application
+
+ Features:
+ - OCR parsing of portfolio screenshots
+ - Editable portfolio JSON
+ - Financial calculations (weights, returns, covariance, variance, volatility)
+ - Beautiful LaTeX formula displays for all calculations
+ - Interactive sliders for portfolio rebalancing
+ - Real-time recalculation
+ """
+
+ import streamlit as st
+ from PIL import Image
+ import json
+
+ # Import our modules
+ import ocr_parser
+ import portfolio_calculator
+ import formula_generator
+
+
+ # Page configuration
+ st.set_page_config(
+     page_title="Portfolio Volatility Analyzer",
+     page_icon="📊",
+     layout="wide",
+     initial_sidebar_state="expanded"
+ )
+
+
+ # Initialize session state
+ if 'portfolio_data' not in st.session_state:
+     st.session_state.portfolio_data = None
+ if 'portfolio_validated' not in st.session_state:
+     st.session_state.portfolio_validated = False
+ if 'metrics' not in st.session_state:
+     st.session_state.metrics = None
+ if 'show_all_terms' not in st.session_state:
+     st.session_state.show_all_terms = False
+
+
+ # Main title and description
+ st.title("📊 Portfolio Volatility Analyzer with OCR")
+ st.markdown("""
+ Analyze your investment portfolio risk using **modern portfolio theory**.
+
+ **Features:**
+ - 📸 Upload portfolio screenshot for automatic OCR parsing
+ - ✏️ Edit portfolio data as JSON
+ - 📈 Fetch historical price data automatically
+ - 🧮 Calculate portfolio volatility with detailed mathematical formulas
+ - 🎚️ Interactive sliders for real-time portfolio rebalancing
+ """)
+
+ st.divider()
+
+
+ # ========================================
+ # Section 1: Portfolio Input
+ # ========================================
+
+ st.header("1️⃣ Portfolio Input")
+
+ # Create two columns for upload and manual entry
+ col1, col2 = st.columns([1, 1])
+
+ with col1:
+     st.subheader("📸 Upload Screenshot")
+     uploaded_file = st.file_uploader(
+         "Upload portfolio screenshot (PNG, JPG, JPEG)",
+         type=["png", "jpg", "jpeg"],
+         help="Upload a screenshot of your portfolio with ticker symbols and amounts"
+     )
+
+     if uploaded_file:
+         # Display uploaded image
+         image = Image.open(uploaded_file)
+         st.image(image, caption="Uploaded Portfolio Screenshot", use_container_width=True)
+
+         # OCR processing
+         with st.spinner("Extracting text from image..."):
+             text, error = ocr_parser.extract_text_from_image(image)
+
+         if error:
+             st.error(f"❌ {error}")
+         else:
+             # Show extracted text
+             with st.expander("📄 Extracted Text"):
+                 st.text(text)
+
+             # Parse portfolio
+             portfolio = ocr_parser.parse_portfolio(text)
+
+             if portfolio:
+                 st.success(f"✅ Found {len(portfolio)} tickers")
+                 st.session_state.portfolio_data = portfolio
+             else:
+                 st.warning("⚠️ No valid tickers found. Please edit manually below.")
+                 st.session_state.portfolio_data = {}
+
+ with col2:
+     st.subheader("✏️ Edit Portfolio (JSON)")
+
+     # Get initial JSON value
+     if st.session_state.portfolio_data is not None:
+         initial_json = ocr_parser.format_portfolio_json(st.session_state.portfolio_data)
+     else:
+         # Default example
+         initial_json = json.dumps({
+             "AAPL": 5000,
+             "GOOGL": 3000,
+             "MSFT": 2000
+         }, indent=2)
+
+     # Editable text area
+     edited_json = st.text_area(
+         "Portfolio (JSON format)",
+         value=initial_json,
+         height=300,
+         help="Edit the portfolio in JSON format: {\"TICKER\": amount, ...}"
+     )
+
+     # Validate button
+     if st.button("✅ Validate Portfolio", type="primary"):
+         is_valid, portfolio, error = ocr_parser.validate_portfolio_json(edited_json)
+
+         if is_valid:
+             st.session_state.portfolio_data = portfolio
+             st.session_state.portfolio_validated = True
+             st.success(f"✅ Portfolio validated! {len(portfolio)} tickers ready for analysis.")
+         else:
+             st.error(f"❌ {error}")
+             st.session_state.portfolio_validated = False
+
+ st.divider()
+
+
+ # ========================================
+ # Section 2: Portfolio Analysis
+ # ========================================
+
+ if st.session_state.portfolio_validated and st.session_state.portfolio_data:
+
+     st.header("2️⃣ Portfolio Analysis")
+
+     portfolio = st.session_state.portfolio_data
+     tickers = list(portfolio.keys())
+
+     # Display current portfolio
+     st.subheader("Current Portfolio")
+     col1, col2, col3 = st.columns(3)
+     with col1:
+         st.metric("Tickers", len(tickers))
+     with col2:
+         total_value = sum(portfolio.values())
+         st.metric("Total Value", f"${total_value:,.2f}")
+     with col3:
+         st.metric("Data Period", "1 year")
+
+     # Fetch data and calculate metrics
+     with st.spinner("🔄 Fetching historical data and calculating metrics..."):
+         metrics, error = portfolio_calculator.get_portfolio_metrics(portfolio, period="1y")
+
+     if error:
+         st.error(f"❌ {error}")
+         st.stop()
+
+     # Store metrics in session state
+     st.session_state.metrics = metrics
+
+     st.success("✅ Analysis complete!")
+
+     st.divider()
+
+     # ========================================
+     # Section 3: Data Display
+     # ========================================
+
+     st.header("3️⃣ Historical Data")
+
+     # Portfolio Weights
+     st.subheader("📊 Portfolio Weights")
+     weights_df = [(ticker, f"{weight*100:.2f}%") for ticker, weight in metrics['weights'].items()]
+     st.table(weights_df)
+
+     # Historical Prices
+     st.subheader("📈 Historical Prices (Last 5 Days)")
+     st.dataframe(metrics['prices'].tail(), use_container_width=True)
+
+     # Returns
+     with st.expander("📉 Daily Log Returns (Last 5 Days)"):
+         st.dataframe(metrics['returns'].tail(), use_container_width=True)
+
+     # Covariance Matrix
+     st.subheader("🔢 Covariance Matrix (Annualized)")
+     st.dataframe(metrics['cov_matrix'] * 252, use_container_width=True)
+
+     st.divider()
+
+     # ========================================
+     # Section 4: Mathematical Formulas
+     # ========================================
+
+     st.header("4️⃣ Mathematical Formulas")
+
+     # Generate all formulas
+     formulas = formula_generator.generate_all_formulas(
+         amounts=portfolio,
+         weights=metrics['weights'],
+         cov_matrix=metrics['cov_matrix'],
+         variance=metrics['variance'],
+         volatility=metrics['volatility'],
+         variance_breakdown=metrics['variance_breakdown']
+     )
+
+     # Weight Formulas
+     st.subheader("⚖️ Portfolio Weights")
+     st.markdown("**Symbolic Formula:**")
+     st.latex(formulas['weights_symbolic'])
+     st.markdown("**Numerical Calculation:**")
+     st.latex(formulas['weights_numerical'])
+
+     # Covariance Matrix
+     st.subheader("📊 Covariance Matrix (Annualized)")
+     st.latex(formulas['covariance_matrix'])
+
+     # Correlation Matrix
+     with st.expander("🔗 Correlation Matrix"):
+         st.latex(formulas['correlation_matrix'])
+
+     # Variance Formula
+     st.subheader("📐 Portfolio Variance")
+     st.markdown("**Symbolic Formula:**")
+     st.latex(formulas['variance_symbolic'])
+
+     st.markdown("**Detailed Expansion:**")
+     st.latex(formulas['variance_expanded'])
+
+     # Toggle for full expansion
+     if st.checkbox("🔍 Show all variance terms (no truncation)", value=False):
+         st.markdown("**Complete Expansion (All Terms):**")
+         st.latex(formulas['variance_expanded_full'])
+
+     # Volatility Formula
+     st.subheader("📊 Portfolio Volatility")
+     st.markdown("**Symbolic Formula:**")
+     st.latex(formulas['volatility_symbolic'])
+     st.markdown("**Numerical Result:**")
+     st.latex(formulas['volatility_numerical'])
+
+     st.divider()
+
+     # ========================================
+     # Section 5: Final Results
+     # ========================================
+
+     st.header("5️⃣ Final Results")
+
+     col1, col2, col3 = st.columns(3)
+
+     with col1:
+         st.metric(
+             label="Portfolio Variance",
+             value=f"{metrics['variance']:.6f}",
+             help="Annualized portfolio variance"
+         )
+
+     with col2:
+         st.metric(
+             label="Portfolio Volatility",
+             value=f"{metrics['volatility']:.4f}",
+             help="Annualized portfolio standard deviation (σ)"
+         )
+
+     with col3:
+         st.metric(
+             label="Volatility (%)",
+             value=f"{metrics['volatility']*100:.2f}%",
+             help="Annualized volatility as percentage"
+         )
+
+     st.divider()
+
+     # ========================================
+     # Section 6: Interactive Rebalancing
+     # ========================================
+
+     st.header("6️⃣ Interactive Portfolio Rebalancing")
+
+     st.markdown("""
+     **Adjust portfolio amounts** using the sliders below to see how volatility changes in real-time.
+     """)
+
+     # Create sliders for each ticker
+     new_amounts = {}
+     slider_cols = st.columns(min(len(tickers), 3))  # Max 3 columns
+
+     for idx, ticker in enumerate(tickers):
+         col_idx = idx % len(slider_cols)
+         with slider_cols[col_idx]:
+             original_amount = float(portfolio[ticker])  # st.slider needs min, max, value, and step of one numeric type
+             new_amount = st.slider(
+                 f"{ticker}",
+                 min_value=0.0,
+                 max_value=original_amount * 3,  # Allow up to 3x original
+                 value=original_amount,
+                 step=100.0,
+                 format="$%.0f",
+                 key=f"slider_{ticker}"
+             )
+             new_amounts[ticker] = new_amount
+
+     # Check if amounts changed
+     amounts_changed = any(new_amounts[t] != portfolio[t] for t in tickers)
+
+     if amounts_changed:
+         st.subheader("🔄 Recalculated Metrics")
+
+         # Recalculate with new amounts
+         with st.spinner("Recalculating..."):
+             new_metrics, error = portfolio_calculator.get_portfolio_metrics(new_amounts, period="1y")
+
+         if error:
+             st.error(f"❌ {error}")
+         else:
+             # Display new results
+             col1, col2 = st.columns(2)
+
+             with col1:
+                 st.markdown("**New Portfolio Weights:**")
+                 for ticker, weight in new_metrics['weights'].items():
+                     st.write(f"{ticker}: {weight*100:.2f}%")
+
+             with col2:
+                 st.markdown("**New Volatility:**")
+                 st.metric(
+                     label="Updated Volatility",
+                     value=f"{new_metrics['volatility']*100:.2f}%",
+                     delta=f"{(new_metrics['volatility'] - metrics['volatility'])*100:.2f}%",
+                     delta_color="inverse"  # Lower volatility is better
+                 )
+
+ else:
+     # Show instructions if portfolio not validated
+     st.info("👆 Please upload a portfolio screenshot or enter portfolio data above, then click 'Validate Portfolio' to begin analysis.")
+
+ st.divider()
+
+ # ========================================
+ # Footer
+ # ========================================
+
+ st.markdown("---")
+ st.markdown("""
+ <div style='text-align: center; color: gray;'>
+     <p>Built with ❤️ using Streamlit | Powered by Modern Portfolio Theory</p>
+     <p><small>Data source: Yahoo Finance (yfinance) | OCR: Tesseract</small></p>
+ </div>
+ """, unsafe_allow_html=True)
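Review note on `app.py`: the validate button unpacks three values from `ocr_parser.validate_portfolio_json(edited_json)`, namely `(is_valid, portfolio, error)`. The body of that function is not visible in this part of the diff, so the sketch below is only a guess at what a validator with that return shape might look like. The name `validate_portfolio_json_sketch` and every rule in it (non-empty object, positive numeric amounts, upper-cased tickers, float conversion) are assumptions, not the author's implementation.

```python
import json


def validate_portfolio_json_sketch(raw):
    """Hypothetical validator returning (is_valid, portfolio, error)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, None, f"Invalid JSON: {exc.msg}"

    if not isinstance(data, dict) or not data:
        return False, None, "Portfolio must be a non-empty JSON object"

    portfolio = {}
    for ticker, amount in data.items():
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            return False, None, f"Amount for {ticker} must be a number"
        if amount <= 0:
            return False, None, f"Amount for {ticker} must be positive"
        # Store floats so downstream widgets see one numeric type
        portfolio[ticker.upper()] = float(amount)

    return True, portfolio, None


ok, portfolio, error = validate_portfolio_json_sketch('{"AAPL": 5000, "msft": 2000}')
print(ok, portfolio, error)
```

Returning an error message instead of raising matches how `app.py` handles `extract_text_from_image` and `get_portfolio_metrics`, where the caller checks the error slot and shows it with `st.error`.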
formula_generator.py ADDED
@@ -0,0 +1,395 @@
+ """
+ LaTeX formula generation module using sympy.
+
+ Handles:
+ - Generating symbolic mathematical formulas
+ - Creating LaTeX representations for all calculations
+ - Detailed variance expansion with smart truncation
+ - Both symbolic and numerical formula variants
+ """
+
+ from typing import Dict, List, Tuple
+ import pandas as pd
+ import numpy as np
+ from sympy import symbols, Matrix, sqrt, latex
+
+
+ def generate_weight_formulas(
+     weights: Dict[str, float],
+     amounts: Dict[str, float]
+ ) -> Tuple[str, str]:
+     """
+     Generate weight calculation formulas.
+
+     Returns both symbolic and numerical versions.
+
+     Args:
+         weights: Calculated weights {ticker: weight}
+         amounts: Original amounts {ticker: amount}
+
+     Returns:
+         Tuple of (symbolic_latex, numerical_latex)
+     """
+     tickers = list(weights.keys())
+     total = sum(amounts.values())
+
+     # Symbolic formula
+     symbolic = r"w_i = \frac{\text{amount}_i}{\sum_j \text{amount}_j}"
+
+     # Numerical formula with actual values
+     numerical_lines = []
+     for ticker in tickers:
+         amt = amounts[ticker]
+         wt = weights[ticker]
+         line = f"w_{{{ticker}}} = \\frac{{{amt:.2f}}}{{{total:.2f}}} = {wt:.4f}"
+         numerical_lines.append(line)
+
+     numerical = "\\begin{aligned}\n"
+     numerical += " \\\\\n".join(numerical_lines)
+     numerical += "\n\\end{aligned}"
+
+     return symbolic, numerical
+
+
+ def generate_covariance_matrix_latex(
+     cov_matrix: pd.DataFrame,
+     annualized: bool = True
+ ) -> str:
+     """
+     Generate LaTeX representation of covariance matrix.
+
+     Args:
+         cov_matrix: Covariance matrix DataFrame
+         annualized: Whether to show annualized values
+
+     Returns:
+         LaTeX string for the matrix
+     """
+     tickers = list(cov_matrix.columns)
+     n = len(tickers)
+
+     # Multiply by 252 if annualized
+     if annualized:
+         cov_values = cov_matrix.values * 252
+     else:
+         cov_values = cov_matrix.values
+
+     # Build LaTeX matrix
+     latex_str = r"\Sigma = \begin{bmatrix}" + "\n"
+
+     for i in range(n):
+         row_values = []
+         for j in range(n):
+             value = cov_values[i, j]
+             row_values.append(f"{value:.6f}")
+         latex_str += " & ".join(row_values)
+         if i < n - 1:
+             latex_str += r" \\" + "\n"
+
+     latex_str += "\n" + r"\end{bmatrix}"
+
+     return latex_str
+
+
+ def generate_variance_formula_symbolic(tickers: List[str]) -> str:
+     """
+     Generate symbolic variance formula using matrix notation.
+
+     Formula: σ²_p = w^T × Σ × w
+
+     Args:
+         tickers: List of ticker symbols
+
+     Returns:
+         LaTeX string for symbolic variance formula
+     """
+     # Matrix form
+     matrix_form = r"\sigma_p^2 = \mathbf{w}^T \Sigma \mathbf{w}"
+
+     # Expanded form
+     expanded_form = r"\sigma_p^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j \sigma_{ij}"
+
+     # Combine both
+     latex_str = "\\begin{aligned}\n"
+     latex_str += matrix_form + r" \\" + "\n"
+     latex_str += expanded_form + "\n"
+     latex_str += "\\end{aligned}"
+
+     return latex_str
+
+
+ def generate_variance_formula_expanded(
+     weights: Dict[str, float],
+     cov_matrix: pd.DataFrame,
+     variance_breakdown: List[Tuple[str, str, float, float, float, float]],
+     smart_truncation: bool = True,
+     truncation_threshold: int = 4
+ ) -> str:
+     """
+     Generate detailed variance expansion showing all terms.
+
+     This is the most complex formula generation function.
+
+     Shows:
+     1. Symbolic expansion term by term
+     2. Numerical substitution
+     3. Intermediate calculations
+     4. Final result
+
+     With smart truncation: shows first 3 terms + "..." + last 2 terms for readability
+
+     Args:
+         weights: Portfolio weights
+         cov_matrix: Covariance matrix
+         variance_breakdown: List of (ticker_i, ticker_j, w_i, w_j, cov_ij, contribution)
+         smart_truncation: Whether to truncate long expansions
+         truncation_threshold: Number of tickers before truncation kicks in
+
+     Returns:
+         LaTeX string with full variance expansion
+     """
+     tickers = list(weights.keys())
+     n = len(tickers)
+
+     # Determine if we should truncate
+     should_truncate = smart_truncation and n >= truncation_threshold
+
+     # Step 1: Build symbolic terms
+     symbolic_terms = []
+     for ticker_i, ticker_j, w_i, w_j, cov_ij, contrib in variance_breakdown:
+         if ticker_i == ticker_j:
+             # Diagonal term: w_i^2 × σ_ii
+             term = f"w_{{{ticker_i}}}^2 \\sigma_{{{ticker_i}{ticker_j}}}"
+         else:
+             # Off-diagonal term: w_i × w_j × σ_ij
+             term = f"w_{{{ticker_i}}} w_{{{ticker_j}}} \\sigma_{{{ticker_i}{ticker_j}}}"
+         symbolic_terms.append(term)
+
+     # Step 2: Build numerical substitution terms
+     numerical_terms = []
+     for ticker_i, ticker_j, w_i, w_j, cov_ij, contrib in variance_breakdown:
+         if ticker_i == ticker_j:
+             # Diagonal: (w_i)^2 × cov_ij
+             num = f"({w_i:.4f})^2 \\times {cov_ij:.6f}"
+         else:
+             # Off-diagonal: w_i × w_j × cov_ij
+             num = f"({w_i:.4f}) \\times ({w_j:.4f}) \\times {cov_ij:.6f}"
+         numerical_terms.append(num)
+
+     # Step 3: Build intermediate values
+     intermediate_values = [f"{contrib:.6f}" for (_, _, _, _, _, contrib) in variance_breakdown]
+
+     # Step 4: Calculate total
+     total_variance = sum(contrib for (_, _, _, _, _, contrib) in variance_breakdown)
+
+     # Apply smart truncation if needed
+     if should_truncate:
+         # Show first 3 terms, ..., last 2 terms
+         num_show_start = 3
+         num_show_end = 2
+
+         symbolic_display = (
+             symbolic_terms[:num_show_start]
+             + [r"\cdots"]
+             + symbolic_terms[-num_show_end:]
+         )
+
+         numerical_display = (
+             numerical_terms[:num_show_start]
+             + [r"\cdots"]
+             + numerical_terms[-num_show_end:]
+         )
+
+         intermediate_display = (
+             intermediate_values[:num_show_start]
+             + [r"\cdots"]
+             + intermediate_values[-num_show_end:]
+         )
+     else:
+         symbolic_display = symbolic_terms
+         numerical_display = numerical_terms
+         intermediate_display = intermediate_values
+
+     # Build the aligned LaTeX
+     latex_str = "\\begin{aligned}\n"
+
+     # Line 1: Symbolic expansion
+     latex_str += r"\sigma_p^2 &= " + " + ".join(symbolic_display) + r" \\" + "\n"
+
+     # Line 2: Numerical substitution
+     latex_str += r" &= " + " + ".join(numerical_display) + r" \\" + "\n"
+
+     # Line 3: Intermediate calculations
+     latex_str += r" &= " + " + ".join(intermediate_display) + r" \\" + "\n"
+
+     # Line 4: Final result
+     latex_str += f" &= {total_variance:.6f}\n"
+
+     latex_str += "\\end{aligned}"
+
+     return latex_str
+
+
+ def generate_variance_formula_expanded_full(
+     weights: Dict[str, float],
+     cov_matrix: pd.DataFrame,
+     variance_breakdown: List[Tuple[str, str, float, float, float, float]]
+ ) -> str:
+     """
+     Generate FULL variance expansion without truncation.
+
+     Use this for "Show all terms" toggle.
+
+     Args:
+         weights: Portfolio weights
+         cov_matrix: Covariance matrix
+         variance_breakdown: List of (ticker_i, ticker_j, w_i, w_j, cov_ij, contribution)
+
+     Returns:
+         LaTeX string with complete variance expansion
+     """
+     # Just call the main function with truncation disabled
+     return generate_variance_formula_expanded(
+         weights,
+         cov_matrix,
+         variance_breakdown,
+         smart_truncation=False
+     )
+
+
+ def generate_volatility_formulas(
+     variance: float,
+     volatility: float
+ ) -> Tuple[str, str]:
+     """
+     Generate volatility calculation formulas.
+
+     Returns both symbolic and numerical versions.
+
+     Args:
+         variance: Calculated portfolio variance
+         volatility: Calculated portfolio volatility
+
+     Returns:
+         Tuple of (symbolic_latex, numerical_latex)
+     """
+     # Symbolic formula
+     symbolic = r"\sigma_p = \sqrt{\sigma_p^2}"
+
+     # Numerical formula
+     numerical = f"\\sigma_p = \\sqrt{{{variance:.6f}}} = {volatility:.6f} = {volatility*100:.2f}\\%"
+
+     return symbolic, numerical
+
+
+ def generate_correlation_matrix_latex(cov_matrix: pd.DataFrame) -> str:
+     """
+     Generate correlation matrix from covariance matrix.
+
+     Correlation: ρ_ij = σ_ij / (σ_i × σ_j)
+
+     Args:
+         cov_matrix: Covariance matrix
+
+     Returns:
+         LaTeX string for correlation matrix
+     """
+     # Calculate correlation matrix
+     std_devs = np.sqrt(np.diag(cov_matrix))
+     corr_matrix = cov_matrix / np.outer(std_devs, std_devs)
+
+     tickers = list(cov_matrix.columns)
+     n = len(tickers)
+
+     # Build LaTeX matrix
+     latex_str = r"\text{Correlation Matrix} = \begin{bmatrix}" + "\n"
+
+     for i in range(n):
+         row_values = []
+         for j in range(n):
+             value = corr_matrix.iloc[i, j]
+             row_values.append(f"{value:.4f}")
+         latex_str += " & ".join(row_values)
+         if i < n - 1:
+             latex_str += r" \\" + "\n"
+
+     latex_str += "\n" + r"\end{bmatrix}"
+
+     return latex_str
+
+
+ def generate_all_formulas(
+     amounts: Dict[str, float],
+     weights: Dict[str, float],
+     cov_matrix: pd.DataFrame,
+     variance: float,
+     volatility: float,
+     variance_breakdown: List[Tuple[str, str, float, float, float, float]]
+ ) -> Dict[str, str]:
+     """
+     Generate all LaTeX formulas for the portfolio analysis.
+
+     This is the orchestrator function that generates all formula variants.
+
+     Args:
+         amounts: Portfolio amounts {ticker: amount}
+         weights: Portfolio weights {ticker: weight}
+         cov_matrix: Covariance matrix
+         variance: Portfolio variance
+         volatility: Portfolio volatility
+         variance_breakdown: Detailed variance breakdown
+
+     Returns:
+         Dictionary of LaTeX strings:
+         {
+             'weights_symbolic': str,
+             'weights_numerical': str,
+             'covariance_matrix': str,
+             'correlation_matrix': str,
+             'variance_symbolic': str,
+             'variance_expanded': str,
+             'variance_expanded_full': str,
+             'volatility_symbolic': str,
+             'volatility_numerical': str
+         }
+     """
+     tickers = list(weights.keys())
+
+     # Generate all formula components
+     weights_symbolic, weights_numerical = generate_weight_formulas(weights, amounts)
+
+     covariance_matrix = generate_covariance_matrix_latex(cov_matrix, annualized=True)
+
+     correlation_matrix = generate_correlation_matrix_latex(cov_matrix)
+
+     variance_symbolic = generate_variance_formula_symbolic(tickers)
+
+     variance_expanded = generate_variance_formula_expanded(
+         weights,
+         cov_matrix,
+         variance_breakdown,
+         smart_truncation=True
+     )
+
+     variance_expanded_full = generate_variance_formula_expanded_full(
+         weights,
+         cov_matrix,
+         variance_breakdown
+     )
+
+     volatility_symbolic, volatility_numerical = generate_volatility_formulas(
+         variance,
+         volatility
+     )
+
+     return {
+         'weights_symbolic': weights_symbolic,
+         'weights_numerical': weights_numerical,
+         'covariance_matrix': covariance_matrix,
+         'correlation_matrix': correlation_matrix,
+         'variance_symbolic': variance_symbolic,
+         'variance_expanded': variance_expanded,
+         'variance_expanded_full': variance_expanded_full,
+         'volatility_symbolic': volatility_symbolic,
+         'volatility_numerical': volatility_numerical,
+     }
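Review note on `formula_generator.py`: the "smart truncation" in `generate_variance_formula_expanded` keeps the first 3 terms, inserts `\cdots`, and keeps the last 2. The sketch below isolates that step as a standalone helper. It is an illustrative variant, not the module's code: the name `truncate_terms` is hypothetical, and it decides whether to truncate from the number of terms, whereas the module decides from the number of tickers (`n >= truncation_threshold`).

```python
def truncate_terms(terms, head=3, tail=2, ellipsis=r"\cdots"):
    """Keep the first `head` and last `tail` terms, eliding the middle."""
    terms = list(terms)
    if len(terms) <= head + tail:
        # Nothing would be hidden, so return the list unchanged
        return terms
    # Slice from an explicit start index so that tail=0 keeps no trailing terms
    return terms[:head] + [ellipsis] + terms[len(terms) - tail:]


# Nine symbolic terms for a three-asset portfolio with placeholder tickers
tickers = ("A", "B", "C")
symbolic = [f"w_{{{i}}} w_{{{j}}} \\sigma_{{{i}{j}}}" for i in tickers for j in tickers]

line = r"\sigma_p^2 &= " + " + ".join(truncate_terms(symbolic))
print(line)
```

Guarding on the term count avoids one edge case: slicing a short list with `[:3]` and `[-2:]` can repeat the same term on both sides of the ellipsis.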
ocr_parser.py ADDED
@@ -0,0 +1,175 @@
1
+ """
2
+ OCR and portfolio parsing module.
3
+
4
+ Handles:
5
+ - Text extraction from portfolio screenshots using Tesseract OCR
6
+ - Parsing tickers and amounts using regex
7
+ - JSON validation for user-edited portfolio data
8
+ """
9
+
10
+ import re
11
+ import json
12
+ from typing import Dict, Tuple, Optional
13
+ from PIL import Image
14
+ import pytesseract
15
+
16
+
17
+ # Regex pattern for ticker extraction: ([A-Z]{1,5})\s+([\d,.]+)
18
+ # Matches: 1-5 uppercase letters, then whitespace, then a run of digits, commas, and periods
19
+ TICKER_PATTERN = r'([A-Z]{1,5})\s+([\d,.]+)'
20
+
21
+
22
+ def extract_text_from_image(image: Image.Image) -> Tuple[Optional[str], Optional[str]]:
23
+ """
24
+ Extract text from uploaded portfolio screenshot using Tesseract OCR.
25
+
26
+ Args:
27
+ image: PIL Image object
28
+
29
+ Returns:
30
+ Tuple of (extracted_text, error_message)
31
+ - If successful: (text, None)
32
+ - If failed: (None, error_message)
33
+ """
34
+ try:
35
+ # Verify tesseract is available
36
+ pytesseract.get_tesseract_version()
37
+
38
+ # Extract text
39
+ text = pytesseract.image_to_string(image)
40
+
41
+ # Check if any text was detected
42
+ if not text.strip():
43
+ return None, "No text detected in image. Please upload a clearer screenshot."
44
+
45
+ return text, None
46
+
47
+ except pytesseract.TesseractNotFoundError:
48
+ return None, "OCR engine (Tesseract) not available. Please check installation."
49
+ except Exception as e:
50
+ return None, f"OCR failed: {str(e)}"
51
+
52
+
53
+ def parse_portfolio(text: str) -> Dict[str, float]:
54
+ """
55
+ Parse portfolio from extracted text using regex.
56
+
57
+ Pattern: ([A-Z]{1,5})\\s+([\\d,.]+)
58
+ Extracts ticker symbols (1-5 uppercase letters) and amounts (digits with optional commas and a decimal point).
59
+
60
+ Args:
61
+ text: Extracted text from OCR
62
+
63
+ Returns:
64
+ Dictionary mapping tickers to amounts: {ticker: amount}
65
+ Returns empty dict if no valid tickers found
66
+ """
67
+ if not text:
68
+ return {}
69
+
70
+ # Find all matches of pattern
71
+ matches = re.findall(TICKER_PATTERN, text)
72
+
73
+ if not matches:
74
+ return {}
75
+
76
+ portfolio = {}
77
+
78
+ for ticker, amount_str in matches:
79
+ try:
80
+ # Remove commas from numbers (e.g., "1,234.56" -> "1234.56")
81
+ clean_amount = amount_str.replace(",", "")
82
+ amount = float(clean_amount)
83
+
84
+ # Only include positive amounts
85
+ if amount > 0:
86
+ portfolio[ticker] = amount
87
+
88
+ except ValueError:
89
+ # Skip invalid number formats
90
+ continue
91
+
92
+ return portfolio
93
+
94
+
95
+ def validate_portfolio_json(json_str: str) -> Tuple[bool, Optional[Dict[str, float]], str]:
96
+ """
97
+ Validate user-edited portfolio JSON.
98
+
99
+ Expected format: {"AAPL": 5000, "GOOGL": 3000, ...}
100
+
101
+ Args:
102
+ json_str: JSON string to validate
103
+
104
+ Returns:
105
+ Tuple of (is_valid, parsed_dict, error_message)
106
+ - If valid: (True, portfolio_dict, "")
107
+ - If invalid: (False, None, error_message)
108
+ """
109
+ if not json_str or not json_str.strip():
110
+ return False, None, "JSON is empty"
111
+
112
+ try:
113
+ # Parse JSON
114
+ data = json.loads(json_str)
115
+
116
+ # Validate it's a dictionary
117
+ if not isinstance(data, dict):
118
+ return False, None, "JSON must be a dictionary/object, not a list or other type"
119
+
120
+ # Validate all keys are strings and all values are numbers
121
+ portfolio = {}
122
+ for ticker, amount in data.items():
123
+ # Check ticker is string
124
+ if not isinstance(ticker, str):
125
+ return False, None, f"Ticker '{ticker}' must be a string"
126
+
127
+ # Check ticker is uppercase (optional validation)
128
+ if not ticker.isupper():
129
+ return False, None, f"Ticker '{ticker}' should be uppercase (e.g., 'AAPL' not 'aapl')"
130
+
131
+ # Check ticker length (1-5 characters is typical; up to 10 are accepted)
132
+ if len(ticker) < 1 or len(ticker) > 10:
133
+ return False, None, f"Ticker '{ticker}' length should be between 1-10 characters"
134
+
135
+ # Check amount is numeric
136
+ try:
137
+ amount_float = float(amount)
138
+ except (TypeError, ValueError):
139
+ return False, None, f"Amount for {ticker} must be a number, got: {amount}"
140
+
141
+ # Check amount is positive and finite (json.loads accepts NaN and Infinity)
142
+ if not (0 < amount_float < float("inf")):
143
+ return False, None, f"Amount for {ticker} must be a positive finite number, got: {amount_float}"
144
+
145
+ portfolio[ticker] = amount_float
146
+
147
+ # Check we have at least one ticker
148
+ if len(portfolio) == 0:
149
+ return False, None, "Portfolio must contain at least one ticker"
150
+
151
+ # Check we don't exceed maximum tickers (optional limit)
152
+ MAX_TICKERS = 20
153
+ if len(portfolio) > MAX_TICKERS:
154
+ return False, None, f"Portfolio exceeds maximum of {MAX_TICKERS} tickers"
155
+
156
+ return True, portfolio, ""
157
+
158
+ except json.JSONDecodeError as e:
159
+ return False, None, f"Invalid JSON format: {str(e)}"
160
+ except Exception as e:
161
+ return False, None, f"Validation error: {str(e)}"
162
+
163
+
164
+ def format_portfolio_json(portfolio: Dict[str, float], indent: int = 2) -> str:
165
+ """
166
+ Format portfolio dictionary as pretty-printed JSON.
167
+
168
+ Args:
169
+ portfolio: Dictionary of {ticker: amount}
170
+ indent: Number of spaces for indentation
171
+
172
+ Returns:
173
+ Formatted JSON string
174
+ """
175
+ return json.dumps(portfolio, indent=indent, sort_keys=True)
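Review note: a minimal sketch of how `TICKER_PATTERN` behaves on OCR-like text. The sample string and the helper name `parse_amounts` are made up for illustration. The loop mirrors `parse_portfolio` from the diff above rather than importing it.

```python
import re

# Same pattern as TICKER_PATTERN in ocr_parser.py.
TICKER_PATTERN = r'([A-Z]{1,5})\s+([\d,.]+)'


def parse_amounts(text):
    """Hypothetical mirror of parse_portfolio: keep positive, parsable amounts."""
    portfolio = {}
    for ticker, amount_str in re.findall(TICKER_PATTERN, text):
        try:
            # "1,234.56" -> 1234.56
            amount = float(amount_str.replace(",", ""))
        except ValueError:
            # e.g. "1.2.3" is matched by the regex but is not a valid float
            continue
        if amount > 0:
            portfolio[ticker] = amount
    return portfolio


# Invented sample: two positions plus a summary row.
sample = "AAPL 1,234.56\nMSFT 3000\nTOTAL 4,234.56"
result = parse_amounts(sample)
print(result)
```

Any run of 1-5 capital letters followed by a number matches, so a label such as `TOTAL` is picked up as if it were a ticker. The pattern has no word boundary, so a longer all-caps word can also match on its trailing five letters. The JSON editing step is where such entries can be removed by hand.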
portfolio_calculator.py ADDED
@@ -0,0 +1,316 @@
1
+ """
2
+ Portfolio financial calculations module.
3
+
4
+ Handles:
5
+ - Fetching historical price data from yfinance
6
+ - Calculating portfolio weights
7
+ - Calculating log returns
8
+ - Computing covariance matrix
9
+ - Calculating portfolio variance and volatility
10
+ - Generating variance breakdown for detailed formulas
11
+ """
12
+
13
+ from typing import Dict, List, Tuple, Optional
14
+ import numpy as np
15
+ import pandas as pd
16
+ import yfinance as yf
17
+ import streamlit as st
18
+
19
+
20
+ # Constants
21
+ TRADING_DAYS_PER_YEAR = 252
22
+ MIN_DATA_POINTS = 30
23
+ MAX_TICKERS = 20
24
+
25
+
26
+ @st.cache_data(ttl=3600) # Cache for 1 hour
27
+ def fetch_historical_data(
28
+ tickers: Tuple[str, ...], # Tuple for hashability (caching requirement)
29
+ period: str = "1y"
30
+ ) -> Tuple[Optional[pd.DataFrame], Optional[str]]:
31
+ """
32
+ Fetch historical price data using yfinance.
33
+
34
+ Args:
35
+ tickers: Tuple of ticker symbols (e.g., ('AAPL', 'GOOGL', 'MSFT'))
36
+ period: Time period for historical data (default: '1y')
37
+
38
+ Returns:
39
+ Tuple of (prices_dataframe, error_message)
40
+ - If successful: (DataFrame, None)
41
+ - If failed: (None, error_message)
42
+ """
43
+ try:
44
+ # Convert tuple back to list for yfinance
45
+ ticker_list = list(tickers)
46
+
47
+ # Download data (progress=False to avoid console output in Streamlit)
48
+ data = yf.download(ticker_list, period=period, progress=False)
49
+
50
+ # Check if data was returned
51
+ if data.empty:
52
+ return None, "No data returned from yfinance. Please check ticker symbols."
53
+
54
+ # Extract 'Adj Close' prices
55
+ if len(ticker_list) == 1:
56
+ # Single ticker: yfinance returns different structure
57
+ prices = data[['Adj Close']].copy()
58
+ prices.columns = ticker_list
59
+ else:
60
+ # Multiple tickers
61
+ prices = data['Adj Close'].copy()
62
+
63
+ # Check for missing data
64
+ missing_count = prices.isnull().sum()
65
+ if missing_count.sum() > 0:
66
+ missing_tickers = missing_count[missing_count > 0]
67
+ warning = f"Warning: Missing data detected - {dict(missing_tickers)}"
68
+ # Don't fail, just warn
69
+ st.warning(warning)
70
+
71
+ # Drop rows with NaN values
72
+ prices = prices.dropna()
73
+
74
+ # Check we have enough data points
75
+ if len(prices) < MIN_DATA_POINTS:
76
+ return None, f"Insufficient data: only {len(prices)} days available (minimum {MIN_DATA_POINTS} required)"
77
+
78
+ return prices, None
79
+
80
+ except Exception as e:
81
+ return None, f"Failed to fetch data: {str(e)}"
82
+
83
+
84
+ def calculate_log_returns(prices: pd.DataFrame) -> pd.DataFrame:
85
+ """
86
+ Calculate log returns from price data.
87
+
88
+ Formula: r_t = ln(P_t / P_{t-1})
89
+
90
+ Args:
91
+ prices: DataFrame of historical prices (columns = tickers, index = dates)
92
+
93
+ Returns:
94
+ DataFrame of log returns (the first row is dropped because it has no prior price)
95
+ """
96
+ # Calculate log returns: ln(price_t / price_{t-1})
97
+ returns = np.log(prices / prices.shift(1))
98
+
99
+ # Drop the first row (NaN)
100
+ returns = returns.dropna()
101
+
102
+ return returns
103
+
104
+
105
+ def calculate_portfolio_weights(amounts: Dict[str, float]) -> Dict[str, float]:
106
+ """
107
+ Calculate portfolio weights from position amounts.
108
+
109
+ Formula: w_i = amount_i / sum(amounts)
110
+
111
+ Args:
112
+ amounts: Dictionary mapping tickers to dollar amounts
113
+
114
+ Returns:
115
+ Dictionary mapping tickers to weights (fractions that sum to 1.0)
116
+ """
117
+ total = sum(amounts.values())
118
+
119
+ if total <= 0:
120
+ raise ValueError("Total portfolio amount must be positive")
121
+
122
+ weights = {ticker: amount / total for ticker, amount in amounts.items()}
123
+
124
+ # Validate weights sum to 1.0 (accounting for floating point errors)
125
+ weight_sum = sum(weights.values())
126
+ if not np.isclose(weight_sum, 1.0, atol=1e-6):
127
+ # Normalize to ensure exact sum = 1.0
128
+ weights = {ticker: w / weight_sum for ticker, w in weights.items()}
129
+
130
+ return weights
131
+
132
+
133
+ def calculate_covariance_matrix(returns: pd.DataFrame, annualized: bool = False) -> pd.DataFrame:
134
+ """
135
+ Calculate covariance matrix of returns.
136
+
137
+ Args:
138
+ returns: DataFrame of log returns
139
+ annualized: If True, multiply by TRADING_DAYS_PER_YEAR (default: False)
140
+
141
+ Returns:
142
+ DataFrame of covariance matrix (tickers × tickers)
143
+ """
144
+ cov_matrix = returns.cov()
145
+
146
+ if annualized:
147
+ cov_matrix = cov_matrix * TRADING_DAYS_PER_YEAR
148
+
149
+ return cov_matrix
150
+
151
+
152
+ def calculate_portfolio_variance(
153
+ weights: Dict[str, float],
154
+ cov_matrix: pd.DataFrame,
155
+ annualized: bool = True
156
+ ) -> float:
157
+ """
158
+ Calculate portfolio variance.
159
+
160
+ Formula: σ²_p = w^T × Σ × w
161
+
162
+ Where:
163
+ - w = vector of weights
164
+ - Σ = covariance matrix (annualized)
165
+
166
+ Args:
167
+ weights: Dictionary of portfolio weights
168
+ cov_matrix: Covariance matrix (daily, will be annualized if annualized=True)
169
+ annualized: If True, annualize the covariance matrix (default: True)
170
+
171
+ Returns:
172
+ Portfolio variance (annualized if annualized=True)
173
+ """
174
+ # Ensure tickers are in same order
175
+ tickers = list(weights.keys())
176
+
177
+ # Create weight vector (as numpy array)
178
+ w = np.array([weights[ticker] for ticker in tickers])
179
+
180
+ # Get covariance matrix for these tickers
181
+ cov = cov_matrix.loc[tickers, tickers].values
182
+
183
+ # Annualize if requested
184
+ if annualized:
185
+ cov = cov * TRADING_DAYS_PER_YEAR
186
+
187
+ # Calculate variance: w^T × Σ × w
188
+ variance = w @ cov @ w
189
+
190
+ return float(variance)
191
+
192
+
193
+ def calculate_portfolio_volatility(variance: float) -> float:
194
+ """
195
+ Calculate portfolio volatility (standard deviation).
196
+
197
+ Formula: σ_p = √(σ²_p)
198
+
199
+ Args:
200
+ variance: Portfolio variance
201
+
202
+ Returns:
203
+ Portfolio volatility (standard deviation)
204
+ """
205
+ return float(np.sqrt(variance))
206
+
207
+
208
+ def get_variance_breakdown(
209
+ weights: Dict[str, float],
210
+ cov_matrix: pd.DataFrame,
211
+ annualized: bool = True
212
+ ) -> List[Tuple[str, str, float, float, float, float]]:
213
+ """
214
+ Generate detailed breakdown of variance calculation.
215
+
216
+ Returns a list of all variance components for the detailed formula expansion.
217
+
218
+ Args:
219
+ weights: Dictionary of portfolio weights
220
+ cov_matrix: Covariance matrix (daily)
221
+ annualized: If True, use annualized covariance (default: True)
222
+
223
+ Returns:
224
+ List of tuples: (ticker_i, ticker_j, w_i, w_j, cov_ij, contribution)
225
+ where contribution = w_i × w_j × cov_ij
226
+ """
227
+ tickers = list(weights.keys())
228
+ n = len(tickers)
229
+
230
+ breakdown = []
231
+
232
+ for i, ticker_i in enumerate(tickers):
233
+ for j, ticker_j in enumerate(tickers):
234
+ w_i = weights[ticker_i]
235
+ w_j = weights[ticker_j]
236
+
237
+ # Get covariance value
238
+ cov_ij = cov_matrix.loc[ticker_i, ticker_j]
239
+
240
+ # Annualize if requested
241
+ if annualized:
242
+ cov_ij = cov_ij * TRADING_DAYS_PER_YEAR
243
+
244
+ # Calculate contribution to total variance
245
+ contribution = w_i * w_j * cov_ij
246
+
247
+ breakdown.append((ticker_i, ticker_j, w_i, w_j, cov_ij, contribution))
248
+
249
+ return breakdown
250
+
251
+
252
+ def get_portfolio_metrics(
253
+ amounts: Dict[str, float],
254
+ period: str = "1y"
255
+ ) -> Tuple[Optional[Dict], Optional[str]]:
256
+ """
257
+ Calculate all portfolio metrics in one go.
258
+
259
+ This is a convenience function that orchestrates all calculations.
260
+
261
+ Args:
262
+ amounts: Dictionary of {ticker: amount}
263
+ period: Historical data period (default: '1y')
264
+
265
+ Returns:
266
+ Tuple of (metrics_dict, error_message)
267
+
268
+ metrics_dict contains:
269
+ - weights: Dict[str, float]
270
+ - prices: pd.DataFrame
271
+ - returns: pd.DataFrame
272
+ - cov_matrix: pd.DataFrame
273
+ - variance: float
274
+ - volatility: float
275
+ - variance_breakdown: List[Tuple]
276
+ """
277
+ try:
278
+ tickers = list(amounts.keys())
279
+
280
+ # 1. Calculate weights
281
+ weights = calculate_portfolio_weights(amounts)
282
+
283
+ # 2. Fetch historical data (convert to tuple for caching)
284
+ prices, error = fetch_historical_data(tuple(tickers), period)
285
+ if error:
286
+ return None, error
287
+
288
+ # 3. Calculate returns
289
+ returns = calculate_log_returns(prices)
290
+
291
+ # 4. Calculate covariance matrix
292
+ cov_matrix = calculate_covariance_matrix(returns, annualized=False)
293
+
294
+ # 5. Calculate variance
295
+ variance = calculate_portfolio_variance(weights, cov_matrix, annualized=True)
296
+
297
+ # 6. Calculate volatility
298
+ volatility = calculate_portfolio_volatility(variance)
299
+
300
+ # 7. Get variance breakdown
301
+ variance_breakdown = get_variance_breakdown(weights, cov_matrix, annualized=True)
302
+
303
+ metrics = {
304
+ 'weights': weights,
305
+ 'prices': prices,
306
+ 'returns': returns,
307
+ 'cov_matrix': cov_matrix,
308
+ 'variance': variance,
309
+ 'volatility': volatility,
310
+ 'variance_breakdown': variance_breakdown,
311
+ }
312
+
313
+ return metrics, None
314
+
315
+ except Exception as e:
316
+ return None, f"Error calculating portfolio metrics: {str(e)}"
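Review note: a small worked example of the variance formula σ²_p = w^T × Σ × w used by `calculate_portfolio_variance`, checked against the term-by-term sum that `get_variance_breakdown` produces. The tickers, weights, and covariance values are invented for illustration. This sketch uses plain NumPy arrays instead of the DataFrame the module works with.

```python
import numpy as np

TRADING_DAYS_PER_YEAR = 252  # same constant as in portfolio_calculator.py

# Hypothetical two-asset portfolio; all numbers are made up.
tickers = ["AAA", "BBB"]
weights = {"AAA": 0.6, "BBB": 0.4}
daily_cov = np.array([
    [0.0004, 0.0001],
    [0.0001, 0.0009],
])

# Matrix form: annualize the daily covariance, then compute w^T Σ w.
w = np.array([weights[t] for t in tickers])
annual_cov = daily_cov * TRADING_DAYS_PER_YEAR
variance = float(w @ annual_cov @ w)

# Term-by-term form: one contribution w_i * w_j * cov_ij per (i, j) pair.
contributions = [
    w[i] * w[j] * annual_cov[i, j]
    for i in range(len(tickers))
    for j in range(len(tickers))
]

# Volatility is the square root of the variance.
volatility = float(np.sqrt(variance))

print(f"variance={variance:.6f}, volatility={volatility:.4f}")
```

The daily variance here is 0.36 × 0.0004 + 2 × 0.6 × 0.4 × 0.0001 + 0.16 × 0.0009 = 0.000336, which annualizes to 0.084672. The four contributions sum to the same value, so the expanded formula and the matrix product agree.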
requirements.txt ADDED
@@ -0,0 +1,8 @@
1
+ streamlit==1.32.0
2
+ pytesseract==0.3.10
3
+ Pillow==10.2.0
4
+ yfinance==0.2.36
5
+ pandas==2.2.0
6
+ numpy==1.26.3
7
+ sympy==1.12
8
+ matplotlib==3.8.2