IRIS-AI Ticker Validation System
Developer reference for the multi-layer ticker validation system introduced in the
`feat: add comprehensive error codes, edge case handling, and graceful degradation`
commit series.
Table of Contents
- Architecture Overview
- Validation Flow
- Error Codes
- API Reference
- Local Ticker Database
- Configuration
- Troubleshooting
- Testing
Architecture Overview
User Input
                       │
                       ▼
┌─────────────────────────────────────────────┐
│  Layer 0: Input Sanitisation                │
│  ticker_validator.sanitize_ticker_input()   │
│  • Strip $/#/ticker: prefixes               │
│  • Remove trailing "stock"/"etf"/"shares"   │
│  • Collapse internal whitespace             │
│  • Enforce 20-char hard cap                 │
│  • Uppercase                                │
└──────────────────────┬──────────────────────┘
                       │ cleaned string
                       ▼
┌─────────────────────────────────────────────┐
│  Layer 1: Format Validation (instant)       │
│  ticker_validator.validate_ticker_format()  │
│  • Regex: ^[A-Z]{1,5}(\.[A-Z]{1,2})?$       │
│  • Rejects crypto tickers (BTC, ETH, …)     │
│  • Rejects reserved words (NULL, TEST, …)   │
│  • No network I/O, always fast              │
└──────────────────────┬──────────────────────┘
                       │ valid format
                       ▼
┌─────────────────────────────────────────────┐
│  Layer 2: Local SEC Database                │
│  ticker_db.is_known_ticker()                │
│  • In-memory set loaded from                │
│    data/valid_tickers.json                  │
│  • ~13 000 SEC-registered tickers           │
│  • Refreshed every 24 h in the background   │
│  • Thread-safe reads (threading.RLock)      │
└──────────────────────┬──────────────────────┘
                       │ lookup result
                       ▼
┌─────────────────────────────────────────────┐
│  Layer 3: Live yfinance API (cached)        │
│  ticker_validator._cached_api_lookup()      │
│  • lru_cache(maxsize=512)                   │
│  • Fetches info + 5-day history             │
│  • Detects OTC / pink-sheet listings        │
│  • Graceful degradation if API is down      │
└──────────────────────┬──────────────────────┘
                       │ TickerValidationResult
                       ▼
┌─────────────────────────────────────────────┐
│  Layer 4: Data Guardrails                   │
│  data_fetcher.fetch_market_data()           │
│  prompt_builder.build_risk_analysis_prompt()│
│  • Anchors LLM to real price / market-cap   │
│  • Sanity-checks LLM output post-generation │
└─────────────────────────────────────────────┘
Files:
| File | Role |
|---|---|
| `ticker_validator.py` | Layers 0-3: sanitisation, format check, DB probe, API lookup |
| `ticker_db.py` | Local SEC ticker database: load, refresh, search, similarity |
| `ticker_scheduler.py` | Background 24 h refresh timer |
| `data_fetcher.py` | Layer 4: real market data for LLM grounding |
| `prompt_builder.py` | Layer 4: grounded prompt construction + output sanity check |
| `app.py` | Flask wiring, rate limiter, API endpoints |
| `static/tickerValidation.js` | Client-side Layers 0-1 (mirrors Python, no network) |
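The two network-free layers (0 and 1) can be sketched as follows. This is a minimal illustration, not the code in `ticker_validator.py`: the function names, the regex, and the 20-character cap come from this document, while the function bodies, the `(ok, code)` return shape, the prefix/suffix patterns, and the sample contents of `_CRYPTO_TICKERS` and `_RESERVED_WORDS` are assumptions.

```python
import re
from typing import Tuple

_MAX_RAW_LENGTH = 20
_TICKER_RE = re.compile(r"^[A-Z]{1,5}(\.[A-Z]{1,2})?$")
# Assumed sample entries; the real sets live in ticker_validator.py.
_CRYPTO_TICKERS = {"BTC", "ETH"}
_RESERVED_WORDS = {"NULL", "TEST"}
# Assumed patterns for the "$", "#", "ticker:" prefixes and trailing noise words.
_PREFIX_RE = re.compile(r"^(?:\$|#|ticker:)\s*", re.IGNORECASE)
_SUFFIX_RE = re.compile(r"\s+(?:stock|etf|shares)$", re.IGNORECASE)


def sanitize_ticker_input(raw: str) -> str:
    """Layer 0: cap raw length, strip decorations, collapse whitespace, uppercase."""
    text = raw.strip()[:_MAX_RAW_LENGTH]  # hard cap applied to the raw input
    text = _PREFIX_RE.sub("", text)
    text = _SUFFIX_RE.sub("", text)
    text = " ".join(text.split())  # collapse runs of internal whitespace
    return text.upper()


def validate_ticker_format(ticker: str) -> Tuple[bool, str]:
    """Layer 1: return (ok, error_code) without any network I/O."""
    if not ticker:
        return False, "EMPTY_INPUT"
    if not _TICKER_RE.match(ticker):
        return False, "INVALID_FORMAT"
    if ticker in _CRYPTO_TICKERS or ticker in _RESERVED_WORDS:
        return False, "RESERVED_WORD"
    return True, ""


cleaned = sanitize_ticker_input("  ticker: brk.b stock ")
print(cleaned, validate_ticker_format(cleaned))
```

Keeping both layers as pure functions is what lets `static/tickerValidation.js` mirror them on the client without a network round trip.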
Validation Flow
When a user types a ticker and presses Enter
Client-side (instant)
- `sanitizeTicker(raw)` strips prefixes/spaces, uppercases.
- `validateTickerFormat(cleaned)` runs the regex and crypto/reserved-word checks.
- If format fails: inline hint shown immediately, no network call.
Server-side
- `POST /api/validate-ticker`
  - Rate limit checked (30 req / 60 s per IP).
  - `validate_ticker()` runs all four layers.
  - Response includes `valid`, `ticker`, `company_name`, `warning`, `code`, `suggestions`.
- If valid, `GET /api/analyze?ticker=AAPL`
  - Validation runs again server-side (defence-in-depth).
  - `fetch_market_data(ticker)` gets live price, market cap, P/E, 52-week range.
  - `build_risk_analysis_prompt(ticker, company_name, market_data)` produces a data-grounded LLM prompt.
  - `iris_app.run_one_ticker(ticker)` runs the full analysis pipeline.
  - `validate_llm_output(text, market_data)` sanity-checks any pre-built insights.
  - Response includes `market_data` and `grounded_prompt` alongside analysis.
Graceful degradation fallback chain
yfinance OK?
  YES → return API result
  NO  ↓
    ticker in local DB?
      YES → return valid + warning("verified offline")
      NO  ↓
        local DB available?
          YES → return error(API_TIMEOUT / API_ERROR)
          NO  → return error(API_ERROR, "both services unavailable")
Error Codes
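All rejection responses carry a `code` field for structured handling. For the validation codes, the HTTP column shows the status returned by `POST /api/validate-ticker`; `GET /api/analyze` reports the same validation failures with HTTP 422 (see API Reference).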
| Code | HTTP | Meaning | Typical user-facing message |
|---|---|---|---|
| `EMPTY_INPUT` | 200 | Ticker string is empty after sanitisation | "Please enter a stock ticker symbol." |
| `INVALID_FORMAT` | 200 | Doesn't match `^[A-Z]{1,5}(\.[A-Z]{1,2})?$` | "Tickers are 1-5 letters, with an optional class suffix (e.g., BRK.B)." |
| `RESERVED_WORD` | 200 | Crypto ticker or reserved word (NULL, TEST, …) | "IRIS-AI analyzes stocks and ETFs. For cryptocurrency analysis, please use a crypto-specific platform." |
| `TICKER_NOT_FOUND` | 200 | Passes format but unknown to both DB and API | "Ticker was not found. Please check the symbol and try again." |
| `TICKER_DELISTED` | 200 | Company found but no recent trading data | "Appears to be delisted or has no recent trading data." |
| `API_TIMEOUT` | 200 | yfinance timed out, ticker not in local DB | "Cannot verify this ticker right now. Please try again." |
| `API_ERROR` | 200 | Network error or both services down | "Validation services are temporarily unavailable." |
| `RATE_LIMITED` | 429 | IP exceeded 30 requests / 60 s | "Too many requests. Please wait before trying again." |
| `DATA_FETCH_FAILED` | 502 | Market data fetch failed before LLM call | "Could not retrieve market data. Please try again later." |
| `INTERNAL_ERROR` | 500 | Unhandled exception in analysis pipeline | "An internal error occurred during analysis." |
Python constant: ticker_validator.ErrorCode.FIELD_NAME
JavaScript constant: TickerValidation.ErrorCodes.FIELD_NAME
API Reference
POST /api/validate-ticker
Real-time ticker validation. Returns HTTP 200 for both valid and invalid results (only 429 on rate-limit).
Request
{ "ticker": "AAPL" }
Response: valid
{
"valid": true,
"ticker": "AAPL",
"company_name": "Apple Inc.",
"warning": ""
}
Response: invalid
{
"valid": false,
"error": "Ticker \"XYZZY\" was not found. Please check the symbol and try again.",
"code": "TICKER_NOT_FOUND",
"suggestions": ["XYZ", "XYZT"]
}
Response: rate limited (HTTP 429)
{
"error": "Too many requests. Please wait before trying again.",
"code": "RATE_LIMITED"
}
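The `suggestions` field in the invalid response lists near-miss tickers. This document does not show how `ticker_db.py` computes similarity, so the sketch below uses the standard library's `difflib.get_close_matches` as a stand-in. The `suggest_tickers` helper, its `limit` and cutoff values, and the sample ticker list are hypothetical.

```python
from difflib import get_close_matches
from typing import Iterable, List


def suggest_tickers(query: str, known: Iterable[str], limit: int = 3) -> List[str]:
    """Return up to `limit` known tickers that look similar to `query`, best first."""
    return get_close_matches(query.upper(), list(known), n=limit, cutoff=0.6)


# Hypothetical sample of the local ticker set.
known_tickers = ["AAPL", "APP", "MSFT", "GOOG"]
print(suggest_tickers("AAPPL", known_tickers))
```

A cutoff of 0.6 is `difflib`'s default; raising it yields fewer but closer suggestions.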
GET /api/analyze?ticker=AAPL
Full analysis endpoint. Runs all validation layers, fetches market data, and calls the LLM pipeline.
Query parameters
| Parameter | Default | Description |
|---|---|---|
| `ticker` | (required) | Stock ticker symbol |
| `timeframe` | (none) | Preset: 1D, 5D, 1M, 6M, YTD, 1Y, 5Y |
| `period` | `60d` | yfinance period string (used when `timeframe` is absent) |
| `interval` | `1d` | yfinance interval string |
Response: success (200)
{
"ticker": "AAPL",
"risk_score": 42,
"llm_insights": { ... },
"market_data": {
"ticker": "AAPL",
"company_name": "Apple Inc.",
"current_price": 185.50,
"market_cap": 2900000000000,
"pe_ratio": 28.5,
"52_week_high": 199.62,
"52_week_low": 124.17
},
"grounded_prompt": "Analyze AAPL (Apple Inc.). Current price: $185.5. ..."
}
Response: validation failure (422)
{
"valid": false,
"error": "...",
"code": "TICKER_NOT_FOUND",
"suggestions": ["..."]
}
GET /api/tickers/search?q=APP
Typeahead autocomplete. Returns up to 8 matching tickers from the local DB.
Response
[
{ "ticker": "AAPL", "name": "Apple Inc.", "exchange": "Nasdaq" },
{ "ticker": "APP", "name": "Applovin Corp", "exchange": "Nasdaq" }
]
GET /api/health
Service health check. Reports ticker DB status, age, and staleness.
Response (200)
{
"status": "healthy",
"ticker_db_loaded": true,
"ticker_count": 13247,
"ticker_db_age_hours": 3.2,
"ticker_db_stale": false
}
POST /api/admin/refresh-ticker-db
Trigger a manual ticker database refresh (downloads fresh SEC data).
Response
{
"status": "success",
"added": 12,
"removed": 3,
"total": 13256
}
Local Ticker Database
How it works
The database is a flat JSON array of uppercase ticker symbols downloaded from the SEC EDGAR company tickers endpoint. It covers all SEC-registered companies (~13 000 symbols).
On first startup, run_startup_checks() detects a missing or severely outdated file and
triggers a download. Subsequent refreshes run in a background daemon thread every 24 hours
via ticker_scheduler.py.
File locations
| File | Purpose |
|---|---|
| `data/valid_tickers.json` | Canonical ticker set (sorted JSON array) |
| `data/valid_tickers.lock` | `filelock` lock file; prevents concurrent writes |
Thread safety
- Reads are protected by `threading.RLock` (`_cache_lock` in `ticker_db.py`).
- Writes use a temp file + `os.replace()` atomic rename, so a crash mid-write never leaves a corrupt file.
- The `filelock.FileLock` prevents two processes from writing simultaneously (relevant when running multiple workers under gunicorn).
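The write and read paths described above can be sketched with the standard library alone. This is a minimal illustration, not the code in `ticker_db.py`: `save_tickers_atomically` and `load_tickers` are hypothetical names, and the cross-process `filelock.FileLock` is left out to keep the example dependency-free.

```python
import json
import os
import tempfile
import threading
from typing import Iterable, Set

_cache_lock = threading.RLock()
_ticker_cache: Set[str] = set()


def save_tickers_atomically(tickers: Iterable[str], path: str) -> None:
    """Write a sorted JSON array to `path` via temp file + atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temp file lives in the same directory so os.replace() stays atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(sorted(set(tickers)), handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, remove the temp file; the original file is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_tickers(path: str) -> int:
    """Load the JSON array into the in-memory set while holding the lock."""
    with open(path, encoding="utf-8") as handle:
        loaded = {str(t).upper() for t in json.load(handle)}
    with _cache_lock:
        _ticker_cache.clear()
        _ticker_cache.update(loaded)
        return len(_ticker_cache)


def is_known_ticker(ticker: str) -> bool:
    """Thread-safe membership test against the in-memory set."""
    with _cache_lock:
        return ticker.upper() in _ticker_cache
```

Readers only ever see the old file or the complete new one, because the rename is the single step that makes the new content visible.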
Manually refreshing
# Via API (running server)
curl -X POST http://localhost:5000/api/admin/refresh-ticker-db
# Via Python
from ticker_db import refresh_ticker_db
result = refresh_ticker_db()
print(result) # {'added': 5, 'removed': 2, 'total': 13250}
Startup integrity checks (run_startup_checks)
| Condition | Action |
|---|---|
| `valid_tickers.json` missing | Synchronous download (blocks startup briefly) |
| File older than 7 days | Background refresh (non-blocking) |
| Fewer than 5 000 tickers loaded | Background re-initialisation |
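The table maps naturally onto a small decision function. The sketch below is an assumption-laden illustration: `StartupAction`, `decide_startup_action`, and the order in which the conditions are checked are hypothetical, and `run_startup_checks()` itself may be organised differently.

```python
from enum import Enum
from typing import Optional

_MAX_AGE_DAYS = 7
_MIN_TICKER_COUNT = 5000


class StartupAction(Enum):
    NONE = "none"
    SYNC_DOWNLOAD = "sync_download"
    BACKGROUND_REFRESH = "background_refresh"
    BACKGROUND_REINIT = "background_reinit"


def decide_startup_action(
    file_age_days: Optional[float], ticker_count: int
) -> StartupAction:
    """Pick an action; `file_age_days` is None when the file is missing."""
    if file_age_days is None:
        return StartupAction.SYNC_DOWNLOAD  # blocks startup briefly
    if file_age_days > _MAX_AGE_DAYS:
        return StartupAction.BACKGROUND_REFRESH  # non-blocking
    if ticker_count < _MIN_TICKER_COUNT:
        return StartupAction.BACKGROUND_REINIT
    return StartupAction.NONE


print(decide_startup_action(file_age_days=10.0, ticker_count=13000))
```

Separating the decision from the side effects (download, thread start) keeps the rules testable without touching the network or the filesystem.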
Configuration
All constants are defined in their respective source files. There are no environment variables specific to the validation system.
ticker_validator.py
| Constant | Value | Description |
|---|---|---|
| `_MAX_RAW_LENGTH` | `20` | Hard cap on raw input length before sanitisation |
| `lru_cache(maxsize=...)` | `512` | Maximum cached yfinance lookups |
| `_TICKER_RE` | `^[A-Z]{1,5}(\.[A-Z]{1,2})?$` | Valid ticker format regex |
To add a new crypto or reserved word, extend _CRYPTO_TICKERS or _RESERVED_WORDS
in ticker_validator.py and the matching sets in static/tickerValidation.js.
app.py
| Constant | Value | Description |
|---|---|---|
| `_RATE_LIMIT_MAX` | `30` | Max requests per IP per window |
| `_RATE_LIMIT_WINDOW` | `60` | Window size in seconds |
ticker_scheduler.py
| Constant | Value | Description |
|---|---|---|
| `_REFRESH_INTERVAL_SECONDS` | `86400` (24 h) | Background DB refresh interval |
ticker_db.py
| Constant | Value | Description |
|---|---|---|
| `_SEC_URL` | SEC EDGAR endpoint | Source for ticker data |
| `_DATA_FILE` | `data/valid_tickers.json` | Local cache path |
| `is_db_stale(threshold_hours=48.0)` | 48 h | Age at which DB is considered stale |
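A staleness check based on the file's modification time might look like the sketch below. Only the `threshold_hours=48.0` default comes from this document. The `path` and `now` parameters are hypothetical additions that make the example self-contained, and treating a missing file as stale is an assumption.

```python
import os
import time
from typing import Optional


def is_db_stale(
    path: str, threshold_hours: float = 48.0, now: Optional[float] = None
) -> bool:
    """Return True if `path` is missing or its mtime is older than the threshold."""
    try:
        modified = os.path.getmtime(path)
    except OSError:
        return True  # assumption: a missing or unreadable file counts as stale
    now = time.time() if now is None else now
    age_hours = (now - modified) / 3600.0
    return age_hours > threshold_hours


print(is_db_stale("data/valid_tickers.json"))
```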
Troubleshooting
"Validation services are temporarily unavailable"
Both yfinance and the local DB failed. This is rare. Check:
- `data/valid_tickers.json` exists and is readable.
- No other process is holding `data/valid_tickers.lock` indefinitely.
- Network connectivity to `sec.gov` and `query1.finance.yahoo.com`.
Ticker DB not loading on startup
startup checks failed: [Errno 13] Permission denied: 'data/valid_tickers.json'
Ensure the process user has read/write access to the data/ directory.
"BTC is not found" instead of crypto rejection message
The `_CRYPTO_TICKERS` set in `ticker_validator.py` may be out of sync with
`static/tickerValidation.js`. Both sets must be kept identical: add or remove
symbols in both files.
yfinance API returning empty info for real tickers
yfinance occasionally returns {} for valid tickers during outages or rate limiting.
When this happens and the ticker is in the local DB, the system degrades gracefully
and returns valid: true with a warning field. The frontend renders this as a
yellow advisory rather than an error.
Rate limit hit during automated testing
The rate limiter is per-IP and in-memory. In tests, clear app._rate_limit_store
in setUp:
from app import _rate_limit_store
_rate_limit_store.clear()
LRU cache serving stale results in tests
Clear the yfinance lookup cache in setUp:
from ticker_validator import _cached_api_lookup
_cached_api_lookup.cache_clear()
Testing
Test files
| File | What it tests | Network required? |
|---|---|---|
| `tests/test_validation_edge_cases.py` | Unit tests: sanitisation, format, error codes, graceful degradation | No (all mocked) |
| `tests/test_e2e_validation.py` | End-to-end: full request → validation → response flow via Flask test client | No (all mocked) |
Running the tests
# All validation tests
python -m unittest tests/test_validation_edge_cases.py tests/test_e2e_validation.py -v
# Edge cases only
python -m unittest tests/test_validation_edge_cases.py -v
# E2E only
python -m unittest tests/test_e2e_validation.py -v
What each E2E test covers
| Test | Scenario |
|---|---|
| `test_e2e_valid_ticker_full_flow` | AAPL passes all layers; market data and grounded prompt appear in response |
| `test_e2e_invalid_ticker_blocked` | XYZZY blocked at Layer 3; LLM never called |
| `test_e2e_format_error_never_hits_backend` | 123!!! blocked at Layer 1; yfinance never called |
| `test_e2e_suggestion_is_valid` | Typo "AAPPL" returns suggestions; first suggestion itself passes validation |
| `test_e2e_concurrent_requests` | 10 simultaneous requests via `asyncio.gather`; all succeed without race conditions |
| `test_e2e_rate_limiting` | 35 rapid requests; first 30 return 200, next 5 return 429 |
Writing new tests
Follow these conventions:
- Always clear shared state in `setUp`:
      from app import _rate_limit_store
      from ticker_validator import _cached_api_lookup
      def setUp(self):
          _rate_limit_store.clear()
          _cached_api_lookup.cache_clear()
- Mock at the module boundary, not inside the function:
      # Correct: patches what ticker_validator.py imports
      with patch("ticker_validator.yf.Ticker", return_value=mock):
          ...
      # Wrong: patches yfinance globally
      with patch("yfinance.Ticker", return_value=mock):
          ...
- Always mock `ticker_validator.is_known_ticker` alongside `yf.Ticker` to control which layer the test exercises.
- For analyze-endpoint tests, mock `app.iris_app` to avoid spinning up the full IRIS pipeline (slow, requires model files):
      mock_iris = MagicMock()
      mock_iris.run_one_ticker.return_value = {"ticker": "AAPL", ...}
      with patch("app.iris_app", mock_iris):
          ...