FinanceCoach / create_finance_knowledge_base.py
Ralitza Mondal
Add Finance Coach multi-agent system
6d2558a
"""
Create Finance Knowledge Base
Builds a FAISS vector database with comprehensive financial knowledge including:
- Financial concepts (stocks, bonds, ETFs, mutual funds, etc.)
- Investment strategies and portfolio management
- Tax concepts and regulations
- Market analysis and indicators
- Financial planning and goal setting
"""
import os
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
load_dotenv()
# Initialize embeddings
embeddings = OpenAIEmbeddings(api_key=os.getenv("OPENAI_API_KEY"))
# ==================== KNOWLEDGE BASE CONTENT ====================
# Financial Concepts
financial_concepts = [
"""
**Stock** is a type of security that represents ownership in a corporation.
- Stock owners (shareholders) are entitled to a portion of the company's assets and earnings
- Two main types: Common stock (voting rights) and Preferred stock (priority dividends)
- Traded on stock exchanges like NYSE, NASDAQ
- Price determined by supply and demand, company performance, market conditions
- Capital gains occur when stock price increases; dividends provide regular income
- Risk: Stock prices can be volatile and may lose value
""",
"""
**Bond** is a fixed-income investment representing a loan made by an investor to a borrower.
- Issuer (borrower) pays periodic interest (coupon) and returns principal at maturity
- Types: Government bonds (Treasury), Corporate bonds, Municipal bonds
- Generally lower risk than stocks but lower potential returns
- Bond prices inversely related to interest rates
- Credit rating affects bond yield and risk (AAA is highest quality)
- Used for income generation and portfolio diversification
""",
"""
**ETF (Exchange-Traded Fund)** is a basket of securities that trades on an exchange like a stock.
- Offers diversification by holding multiple stocks, bonds, or other assets
- Lower fees than mutual funds (typically 0.03% - 0.25%)
- Can be bought/sold throughout trading day at market price
- Popular types: S&P 500 ETFs (SPY, VOO), Bond ETFs, Sector ETFs
- Tax-efficient due to creation/redemption mechanism
- Examples: VOO (Vanguard S&P 500), QQQ (Nasdaq 100), AGG (Bond Aggregate)
""",
"""
**Mutual Fund** is a pooled investment vehicle managed by professional fund managers.
- Investors buy shares of the fund, which invests in diversified portfolio
- Priced once per day after market close (NAV - Net Asset Value)
- Higher fees than ETFs (expense ratios typically 0.5% - 2%)
- Active management (managers pick stocks) or passive (index funds)
- Minimum investment requirements (often $1,000 - $3,000)
- Examples: Vanguard 500 Index Fund, Fidelity Contrafund
""",
"""
**Diversification** is the practice of spreading investments across different assets to reduce risk.
- "Don't put all eggs in one basket" - reduces exposure to any single investment
- Diversify across asset classes (stocks, bonds, real estate, commodities)
- Diversify within asset classes (different sectors, company sizes, countries)
- Modern Portfolio Theory: Optimal diversification improves risk-adjusted returns
- Reduces unsystematic risk (company-specific) but not systematic risk (market-wide)
- Recommended: 60/40 stock/bond allocation for moderate risk, adjust based on age
""",
"""
**Asset Allocation** is the strategy of dividing investment portfolio among different asset categories.
- Primary asset classes: Stocks (equities), Bonds (fixed income), Cash, Real estate
- Determines 90% of portfolio performance variation over time
- Age-based rule: Stock allocation = 110 - your age (e.g., age 30 β†’ 80% stocks)
- Aggressive: 80-100% stocks, Moderate: 60% stocks/40% bonds, Conservative: 40% stocks/60% bonds
- Rebalance annually to maintain target allocation
- Adjust based on risk tolerance, time horizon, financial goals
""",
"""
**Dollar-Cost Averaging (DCA)** is investing fixed amounts at regular intervals regardless of price.
- Reduces impact of market volatility and timing risk
- Example: Invest $500 monthly instead of $6,000 lump sum
- Buys more shares when prices low, fewer when prices high
- Psychologically easier than trying to time the market
- Ideal for 401(k) contributions and regular investing
- May underperform lump-sum investing in rising markets
""",
"""
**Index Fund** is a mutual fund or ETF designed to track a specific market index.
- Passive management: Simply matches index holdings, no active stock picking
- Very low fees (often 0.03% - 0.20%)
- Examples: S&P 500 index funds, Total Stock Market index funds
- Consistently outperforms most actively managed funds over long term
- Warren Buffett recommends index funds for most investors
- Popular: Vanguard Total Stock Market (VTI), S&P 500 (VOO, SPY)
""",
"""
**401(k)** is an employer-sponsored retirement savings plan with tax advantages.
- Contributions reduce taxable income (traditional) or grow tax-free (Roth)
- 2024 contribution limit: $23,000 ($30,500 if age 50+)
- Many employers offer matching contributions (free money!)
- Withdrawal penalty if taken before age 59Β½ (10% + taxes)
- Required Minimum Distributions (RMDs) start at age 73
- Always contribute enough to get full employer match
""",
"""
**IRA (Individual Retirement Account)** is a tax-advantaged retirement account.
- Traditional IRA: Tax-deductible contributions, taxed upon withdrawal
- Roth IRA: After-tax contributions, tax-free qualified withdrawals
- 2024 contribution limit: $7,000 ($8,000 if age 50+)
- Roth IRA income limits: $161,000 single, $240,000 married (2024)
- Withdrawals before 59Β½ may incur penalties
- Roth IRA has no RMDs during owner's lifetime
""",
"""
**Compound Interest** is earning interest on both principal and accumulated interest.
- "The most powerful force in the universe" - Einstein (allegedly)
- Formula: A = P(1 + r/n)^(nt) where P=principal, r=rate, n=compounds/year, t=years
- Example: $10,000 at 8% for 30 years = $100,627
- Time is crucial: Starting 10 years earlier can double retirement savings
- Applies to investments, savings, and debt (credit cards compound against you)
- Key to long-term wealth building
""",
"""
**Expense Ratio** is the annual fee charged by mutual funds and ETFs.
- Expressed as percentage of assets (e.g., 0.10% = $10 per $10,000 invested)
- Directly reduces investment returns every year
- 1% difference in fees can cost hundreds of thousands over lifetime
- Index funds: 0.03% - 0.20%, Active funds: 0.50% - 2.00%
- Lower is better - every 0.10% matters over decades
- Total market ETFs often have lowest ratios (0.03% - 0.04%)
""",
"""
**Capital Gains** is profit from selling an asset for more than purchase price.
- Short-term: Held ≀1 year, taxed as ordinary income (10%-37%)
- Long-term: Held >1 year, preferential tax rates (0%, 15%, or 20%)
- 2024 long-term rates: 0% up to $44k single/$89k married, 15% up to $492k/$553k, 20% above
- Capital losses can offset gains (tax-loss harvesting strategy)
- $3,000 annual limit for deducting losses against ordinary income
- Hold investments >1 year when possible for tax efficiency
""",
"""
**Dividend** is a portion of company profits paid to shareholders.
- Paid quarterly by many large, established companies
- Dividend Yield = Annual dividend / Stock price (e.g., 3% yield)
- Qualified dividends taxed at capital gains rates (0%, 15%, 20%)
- Dividend aristocrats: Companies increasing dividends 25+ consecutive years
- Dividend stocks provide income but may have lower growth
- Examples: Coca-Cola, Johnson & Johnson, Procter & Gamble
""",
"""
**P/E Ratio (Price-to-Earnings)** measures stock valuation relative to earnings.
- Formula: Stock Price / Earnings Per Share
- S&P 500 historical average: ~16-17
- High P/E (>25): Stock may be overvalued or high-growth expectations
- Low P/E (<15): Stock may be undervalued or facing challenges
- Compare P/E to industry peers and company's historical average
- Limitations: Doesn't account for growth, debt, or industry differences
""",
"""
**Market Capitalization** is total value of company's outstanding shares.
- Formula: Current Stock Price Γ— Total Shares Outstanding
- Large-cap: >$10B (Microsoft, Apple), more stable, lower growth
- Mid-cap: $2B-$10B, balance of growth and stability
- Small-cap: $300M-$2B, higher growth potential, higher volatility
- Micro-cap: <$300M, very high risk
- Diversify across market caps for balanced portfolio
- S&P 500 contains large-cap stocks
""",
"""
**Bull Market** is a period of rising stock prices and investor optimism.
- Typically defined as 20%+ increase from recent low
- Characterized by strong economy, low unemployment, investor confidence
- Average bull market lasts 4-5 years (historically)
- Strategy: Stay invested, maintain discipline, avoid overconfidence
- Don't try to predict the top - remain diversified
- Historic examples: 2009-2020 (longest bull market)
""",
"""
**Bear Market** is a period of falling stock prices and pessimism.
- Typically defined as 20%+ decline from recent high
- Caused by recession, high inflation, economic crisis, pandemic
- Average bear market lasts 9-18 months
- Strategy: Don't panic sell, continue investing (stocks on sale)
- Historically, markets always recover and reach new highs
- Historic examples: 2008 financial crisis, 2020 COVID crash
""",
"""
**Recession** is a significant decline in economic activity lasting several months.
- Technical definition: Two consecutive quarters of negative GDP growth
- Indicators: Rising unemployment, declining consumer spending, business closures
- Average recession lasts 10-18 months
- Stock market often declines before recession begins (leading indicator)
- Investment strategy: Maintain course, consider defensive stocks, bonds
- Historically occurs every 5-10 years
""",
"""
**Inflation** is the rate at which general prices for goods and services rise.
- Measured by CPI (Consumer Price Index) and PCE (Personal Consumption Expenditures)
- Federal Reserve targets 2% annual inflation
- High inflation (>4%) erodes purchasing power and savings
- Deflation (negative inflation) can harm economy
- Stocks and real estate historically outpace inflation long-term
- TIPS (Treasury Inflation-Protected Securities) adjust for inflation
""",
"""
**Federal Reserve (The Fed)** is the central banking system of the United States.
- Sets monetary policy to promote maximum employment and price stability
- Primary tool: Federal Funds Rate (interest rate for bank lending)
- Raising rates: Slows economy, fights inflation, may lower stock prices
- Lowering rates: Stimulates economy, may boost stocks, increases inflation risk
- FOMC (Federal Open Market Committee) meets 8 times per year
- Fed decisions significantly impact all financial markets
""",
"""
**Interest Rate** is the cost of borrowing money or return on savings/bonds.
- Set by Federal Reserve (Fed Funds Rate) influences all other rates
- Higher rates: Savings accounts pay more, bonds attractive, stocks may fall
- Lower rates: Cheaper borrowing, stocks more attractive, savings earn less
- Mortgage rates, auto loans, credit cards all follow Fed rate direction
- Bond prices inversely related to rates (rates ↑, bond prices ↓)
- Affects your investment returns, loan costs, retirement planning
""",
"""
**Risk Tolerance** is your ability and willingness to endure investment losses.
- Factors: Age, income, financial goals, personality, investment timeline
- High risk tolerance: Can handle volatility, longer time horizon (20+ years)
- Low risk tolerance: Prefer stability, shorter timeline, near retirement
- Questionnaire helps determine: conservative, moderate, or aggressive
- Risk capacity (financial ability) may differ from risk appetite (emotional comfort)
- Match portfolio allocation to risk tolerance (aggressive = more stocks)
""",
"""
**Emergency Fund** is savings reserved for unexpected expenses or income loss.
- Recommended: 3-6 months of living expenses
- Keep in high-yield savings account (HYSA) - currently 4-5% APY
- Must be liquid (easily accessible) - not invested in stocks
- First financial priority before investing
- Prevents needing to sell investments at wrong time or use high-interest debt
- Examples: Job loss, medical emergency, car/home repairs
""",
"""
**Rebalancing** is adjusting portfolio back to target asset allocation.
- Example: 60/40 stocks/bonds becomes 70/30 after stocks rise β†’ sell stocks, buy bonds
- Enforces "buy low, sell high" discipline
- Recommended frequency: Annually or when allocation drifts 5%+
- Methods: Sell winners/buy losers, or direct new contributions to lagging assets
- In tax-advantaged accounts (401k, IRA) to avoid capital gains taxes
- Maintains desired risk level and prevents overexposure
""",
"""
**Tax-Loss Harvesting** is selling investments at a loss to offset capital gains.
- Reduces tax burden by offsetting gains with losses
- Can deduct up to $3,000 in net losses against ordinary income annually
- Excess losses carry forward to future years indefinitely
- Wash sale rule: Can't buy "substantially identical" security within 30 days
- Best done in taxable brokerage accounts (not IRAs/401ks)
- Automated by robo-advisors like Betterment, Wealthfront
""",
"""
**Roth Conversion** is transferring money from traditional IRA to Roth IRA.
- Pay taxes now on converted amount at current tax rate
- Future withdrawals completely tax-free (qualified distributions)
- Strategic during low-income years or when tax rates low
- No income limits for conversions (unlike Roth IRA contributions)
- Consider tax bracket impact - may push into higher bracket
- Five-year rule: Each conversion has 5-year clock for penalty-free withdrawal
""",
"""
**HSA (Health Savings Account)** is a triple-tax-advantaged medical savings account.
- Requires high-deductible health plan (HDHP)
- 2024 limits: $4,150 individual, $8,300 family (+$1,000 if 55+)
- Triple tax benefit: Tax-deductible contributions, tax-free growth, tax-free medical withdrawals
- Can invest HSA funds in stocks/bonds like IRA
- No "use it or lose it" - funds roll over indefinitely
- After 65, can withdraw for non-medical (taxed like traditional IRA)
""",
"""
**Backdoor Roth IRA** is a strategy for high-income earners to contribute to Roth IRA.
- Step 1: Contribute to traditional IRA (non-deductible)
- Step 2: Immediately convert to Roth IRA
- Bypasses Roth IRA income limits ($161k single, $240k married in 2024)
- Pro-rata rule: If you have pre-tax IRA money, conversion partially taxable
- Requires careful execution and tax reporting
- Legal strategy explicitly allowed by IRS
""",
"""
**Mega Backdoor Roth** is advanced strategy to contribute large amounts to Roth accounts.
- Use after-tax 401(k) contributions beyond normal $23,000 limit
- Total 401(k) limit: $69,000 in 2024 (including employer match)
- Convert after-tax contributions to Roth 401(k) or Roth IRA
- Not all 401(k) plans allow this - check plan rules
- Can contribute $46,000+ additional to Roth annually if available
- Complex but powerful for high earners with access
""",
]
# Investment Strategies
investment_strategies = [
"""
**Buy and Hold Strategy** is investing for long term without frequent trading.
- Core principle: Time in market beats timing the market
- Reduces transaction costs, taxes, and emotional decisions
- Ignore short-term volatility and market noise
- S&P 500 historical return: ~10% annually over long periods
- Best for index fund investors and retirement accounts
- Warren Buffett's favorite strategy
""",
"""
**Three-Fund Portfolio** is a simple diversification strategy using just 3 index funds.
- Fund 1: Total US Stock Market (e.g., VTI, VTSAX) - 60%
- Fund 2: Total International Stock (e.g., VXUS, VTIAX) - 30%
- Fund 3: Total Bond Market (e.g., BND, VBTLX) - 10%
- Adjust percentages based on age and risk tolerance
- Extremely low cost, simple to manage, well-diversified
- Popularized by Bogleheads investment philosophy
""",
"""
**Target-Date Fund** is an all-in-one fund that adjusts allocation as target date approaches.
- Automatically becomes more conservative over time (glide path)
- Example: 2060 Target Date fund (for ~2060 retirement)
- Young: 90% stocks, 10% bonds β†’ Near retirement: 30% stocks, 70% bonds
- Extremely simple - one fund for entire retirement
- Slightly higher fees than DIY three-fund portfolio (0.12% vs 0.04%)
- Popular in 401(k) plans - good "set and forget" option
""",
"""
**Value Investing** is buying undervalued stocks trading below intrinsic value.
- Focus on low P/E ratios, high dividend yields, strong fundamentals
- Philosophy: Market overreacts, creating opportunities
- Famous practitioners: Warren Buffett, Benjamin Graham
- Look for strong balance sheets, consistent earnings, competitive advantages
- Requires patience - may underperform during growth stock rallies
- Value stocks historically outperform long-term but more cyclical
""",
"""
**Growth Investing** is buying companies expected to grow faster than market average.
- Characteristics: High P/E ratios, revenue growth, innovation
- Often tech companies (historically): Amazon, Apple, Google, Tesla
- Higher risk and volatility than value stocks
- May not pay dividends (reinvest profits for growth)
- Can outperform dramatically in bull markets
- More sensitive to interest rate changes
""",
"""
**4% Rule** is retirement withdrawal strategy to make savings last 30 years.
- Withdraw 4% of portfolio in first year of retirement
- Adjust subsequent withdrawals for inflation
- Example: $1M portfolio β†’ $40,000 first year withdrawal
- Based on historical success rates (95% success over 30 years)
- More conservative: 3-3.5% rule for longer retirements
- Adjust based on market conditions and spending needs
""",
]
# Tax Concepts
tax_concepts = [
"""
**Standard Deduction** is the fixed amount reducing taxable income without itemizing.
- 2024: $14,600 single, $29,200 married filing jointly
- Most taxpayers use standard deduction (simpler)
- Itemize only if deductions exceed standard deduction
- Common itemized deductions: Mortgage interest, state taxes (SALT), charitable donations
- Standard deduction indexed to inflation annually
- Simplifies tax filing for most Americans
""",
"""
**Tax Brackets** are ranges of income taxed at increasing rates (progressive taxation).
- 2024 Federal brackets: 10%, 12%, 22%, 24%, 32%, 35%, 37%
- Only income within bracket taxed at that rate (marginal tax rate)
- Example: Single filer earning $50k pays 10% on first $11k, 12% on $11k-$47k, 22% on rest
- Moving into higher bracket doesn't increase tax on lower income
- Effective tax rate (average) always lower than marginal rate
- Long-term capital gains have separate preferential brackets
""",
"""
**Tax-Advantaged Accounts** are investment accounts with special tax benefits.
- Traditional 401(k)/IRA: Tax deduction now, taxed later
- Roth 401(k)/IRA: Taxed now, tax-free forever
- HSA: Triple tax advantage (deductible, grows tax-free, withdrawals tax-free for medical)
- 529: Tax-free growth and withdrawals for education
- Priority: Max 401(k) match > IRA/Roth IRA > Max 401(k) > HSA > Taxable brokerage
- Massive long-term tax savings through compound growth
""",
"""
**FICA Taxes** are payroll taxes funding Social Security and Medicare.
- Social Security: 6.2% employee + 6.2% employer (12.4% total)
- Medicare: 1.45% employee + 1.45% employer (2.9% total)
- 2024 Social Security wage cap: $168,600 (Medicare uncapped)
- Additional 0.9% Medicare tax on high earners ($200k+ single, $250k+ married)
- Self-employed pay full 15.3% but get deduction for half
- Separate from income tax - automatically withheld from paychecks
""",
]
# Market Indicators
market_indicators = [
"""
**S&P 500** is stock market index tracking 500 largest US companies.
- Represents ~80% of total US stock market value
- Market-cap weighted (larger companies have more influence)
- Top holdings: Apple, Microsoft, Amazon, Google, NVIDIA
- Considered best gauge of overall US stock market health
- Historical return: ~10% annually including dividends
- Popular investment vehicles: SPY, VOO, IVV index funds
""",
"""
**Dow Jones Industrial Average (DJIA)** tracks 30 large US blue-chip companies.
- Price-weighted (higher stock price = more influence, unusual)
- Oldest US market index (created 1896)
- Less representative than S&P 500 (only 30 stocks)
- Includes: Apple, Microsoft, Boeing, Disney, Goldman Sachs
- Frequently cited in media but less useful for investors
- Popular ETF: DIA
""",
"""
**NASDAQ Composite** is index of all stocks listed on NASDAQ exchange.
- Heavy technology concentration (~50% tech stocks)
- Includes: Apple, Microsoft, Amazon, Google, Tesla, Meta
- More volatile than S&P 500 due to growth stock concentration
- Strong indicator of technology sector performance
- Popular tech-focused ETF: QQQ (Nasdaq-100)
- Higher growth potential but higher risk
""",
"""
**VIX (Volatility Index)** measures expected stock market volatility.
- Called "fear gauge" or "fear index"
- Based on S&P 500 options prices (30-day forward looking)
- Normal: 12-20, Elevated: 20-30, High: 30+, Extreme: 40+
- Typically spikes during market crashes and uncertainly
- 2020 COVID: VIX hit 80+, 2008 crisis: 80+
- Inverse correlation with stocks (VIX up = stocks down usually)
""",
"""
**Treasury Yield Curve** plots yields of US Treasury bonds across maturities.
- Normal curve: Long-term yields higher than short-term (upward sloping)
- Inverted curve: Short-term yields exceed long-term (downward sloping)
- Inverted curve predicts recession within 6-24 months (historically accurate)
- 2s/10s spread most watched (2-year vs 10-year Treasury)
- Reflects investor expectations for interest rates and economy
- Fed controls short end, market controls long end
""",
]
# Financial Planning
financial_planning = [
"""
**FIRE (Financial Independence, Retire Early)** is movement to achieve financial independence through aggressive saving.
- Save 50-70% of income, invest in low-cost index funds
- Target: 25x annual expenses invested (4% rule)
- Example: Need $40k/year β†’ Save $1M β†’ Retire early
- Variations: Lean FIRE (<$40k/year), Fat FIRE ($100k+/year), Barista FIRE (part-time work)
- Requires high income, low expenses, aggressive investing
- Typical timeline: 10-20 years depending on savings rate
""",
"""
**Net Worth** is total assets minus total liabilities.
- Assets: Cash, investments, home equity, retirement accounts
- Liabilities: Mortgage, student loans, credit cards, car loans
- Formula: Net Worth = Assets - Liabilities
- Track quarterly or annually to measure financial progress
- Average US net worth: $192k (median: $121k)
- Focus on increasing assets and decreasing liabilities over time
""",
"""
**Debt Avalanche** is debt repayment strategy targeting highest interest rate first.
- Pay minimums on all debts, extra to highest interest rate debt
- Mathematically optimal - saves most on interest
- Example: Pay off 20% credit card before 6% car loan before 4% student loan
- Slower psychological wins than debt snowball method
- Best for financially disciplined individuals
- Can save thousands in interest over time
""",
"""
**Debt Snowball** is debt repayment strategy targeting smallest balance first.
- Pay minimums on all debts, extra to smallest balance
- Quick psychological wins motivate continued progress
- Example: Pay off $500 credit card before $5,000 car loan before $30,000 student loan
- May cost more in interest than avalanche method
- Popularized by Dave Ramsey
- Better for those needing motivation and quick wins
""",
"""
**Credit Score** is numerical representation of creditworthiness (300-850).
- Factors: Payment history (35%), amounts owed (30%), length of history (15%), new credit (10%), types (10%)
- Excellent: 750+, Good: 700-749, Fair: 650-699, Poor: <650
- Affects: Loan approval, interest rates, insurance premiums, apartment rentals
- Improve by: Pay on time, keep utilization <30%, don't close old cards, limit new applications
- Check free annually: AnnualCreditReport.com (official site)
- Takes time to build - length of history matters
""",
]
# Combine all content
all_documents = (
financial_concepts +
investment_strategies +
tax_concepts +
market_indicators +
financial_planning
)
# Create Document objects
documents = [
Document(page_content=content.strip(), metadata={"source": "finance_knowledge"})
for content in all_documents
]
print(f"πŸ“š Creating knowledge base with {len(documents)} documents...")
# Create FAISS vector store
db = FAISS.from_documents(documents, embeddings)
# Save to disk
output_path = "./knowledge_base/faiss_index"
os.makedirs(output_path, exist_ok=True)
db.save_local(output_path)
print(f"βœ… Knowledge base created successfully!")
print(f"πŸ“ Saved to: {output_path}")
print(f"πŸ“Š Total documents: {len(documents)}")
print(f"πŸ’Ύ Vector database ready for use!")