# IRIS-AI Ticker Validation System

Developer reference for the multi-layer ticker validation system introduced in the
`feat: add comprehensive error codes, edge case handling, and graceful degradation`
commit series.

---

## Table of Contents

1. [Architecture Overview](#architecture-overview)
2. [Validation Flow](#validation-flow)
3. [Error Codes](#error-codes)
4. [API Reference](#api-reference)
5. [Local Ticker Database](#local-ticker-database)
6. [Configuration](#configuration)
7. [Troubleshooting](#troubleshooting)
8. [Testing](#testing)

---

## Architecture Overview

```
User Input
    │
    ▼
┌─────────────────────────────────────────────┐
│  Layer 0 – Input Sanitisation               │
│  ticker_validator.sanitize_ticker_input()   │
│  • Strip $/#/ticker: prefixes               │
│  • Remove trailing "stock"/"etf"/"shares"   │
│  • Collapse internal whitespace             │
│  • Enforce 20-char hard cap                 │
│  • Uppercase                                │
└──────────────────────┬──────────────────────┘
                       │ cleaned string
                       ▼
┌─────────────────────────────────────────────┐
│  Layer 1 – Format Validation  (instant)     │
│  ticker_validator.validate_ticker_format()  │
│  • Regex: ^[A-Z]{1,5}(\.[A-Z]{1,2})?$       │
│  • Rejects crypto tickers (BTC, ETH, …)     │
│  • Rejects reserved words (NULL, TEST, …)   │
│  • No network I/O, so always fast           │
└──────────────────────┬──────────────────────┘
                       │ valid format
                       ▼
┌─────────────────────────────────────────────┐
│  Layer 2 – Local SEC Database               │
│  ticker_db.is_known_ticker()                │
│  • In-memory set loaded from                │
│    data/valid_tickers.json                  │
│  • ~13 000 SEC-registered tickers           │
│  • Refreshed every 24 h in the background   │
│  • Thread-safe reads (threading.RLock)      │
└──────────────────────┬──────────────────────┘
                       │ lookup result
                       ▼
┌─────────────────────────────────────────────┐
│  Layer 3 – Live yfinance API  (cached)      │
│  ticker_validator._cached_api_lookup()      │
│  • lru_cache(maxsize=512)                   │
│  • Fetches info + 5-day history             │
│  • Detects OTC / pink-sheet listings        │
│  • Graceful degradation if API is down      │
└──────────────────────┬──────────────────────┘
                       │ TickerValidationResult
                       ▼
┌─────────────────────────────────────────────┐
│  Layer 4 – Data Guardrails                  │
│  data_fetcher.fetch_market_data()           │
│  prompt_builder.build_risk_analysis_prompt()│
│  • Anchors LLM to real price / market-cap   │
│  • Sanity-checks LLM output post-generation │
└─────────────────────────────────────────────┘
```
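Layers 0 and 1 are pure string processing, so they are easy to reason about in isolation. The sketch below is a minimal Python approximation of those two layers, built from the rules listed in the diagram. The exact prefix and suffix regexes, the choice to truncate (rather than reject) input longer than 20 characters, and the tiny crypto/reserved sets are illustrative assumptions, not the contents of `ticker_validator.py`.

```python
import re
from typing import Optional

# Values taken from this document; the two sets are small illustrative
# subsets, NOT the real _CRYPTO_TICKERS / _RESERVED_WORDS.
MAX_RAW_LENGTH = 20
TICKER_RE = re.compile(r"^[A-Z]{1,5}(\.[A-Z]{1,2})?$")
CRYPTO_TICKERS = {"BTC", "ETH"}
RESERVED_WORDS = {"NULL", "TEST"}

# Assumed patterns for the prefixes and trailing words named under Layer 0.
_PREFIX_RE = re.compile(r"^(?:\$|#|ticker:)\s*", re.IGNORECASE)
_SUFFIX_RE = re.compile(r"\s+(?:stock|etf|shares)$", re.IGNORECASE)


def sanitize_ticker_input(raw: str) -> str:
    """Layer 0 sketch: cap length, strip decorations, collapse spaces, uppercase."""
    text = raw.strip()[:MAX_RAW_LENGTH]  # assumption: truncate rather than reject
    text = _PREFIX_RE.sub("", text)      # "$aapl", "#aapl", "ticker: aapl"
    text = _SUFFIX_RE.sub("", text)      # "aapl stock", "spy ETF"
    text = " ".join(text.split())        # collapse runs of internal whitespace
    return text.upper()


def validate_ticker_format(cleaned: str) -> Optional[str]:
    """Layer 1 sketch: return an error code string, or None if the format is OK."""
    if not cleaned:
        return "EMPTY_INPUT"
    if not TICKER_RE.match(cleaned):
        return "INVALID_FORMAT"
    if cleaned in CRYPTO_TICKERS or cleaned in RESERVED_WORDS:
        return "RESERVED_WORD"
    return None
```

In this sketch the regex runs before the crypto and reserved-word checks, so `BTC` (a well-formed three-letter string) is reported as `RESERVED_WORD` rather than `INVALID_FORMAT`.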

**Files:**

| File | Role |
|---|---|
| `ticker_validator.py` | Layers 0–3: sanitisation, format check, DB probe, API lookup |
| `ticker_db.py` | Local SEC ticker database: load, refresh, search, similarity |
| `ticker_scheduler.py` | Background 24 h refresh timer |
| `data_fetcher.py` | Layer 4: real market data for LLM grounding |
| `prompt_builder.py` | Layer 4: grounded prompt construction + output sanity check |
| `app.py` | Flask wiring, rate limiter, API endpoints |
| `static/tickerValidation.js` | Client-side Layers 0–1 (mirrors Python, no network) |

---

## Validation Flow

### When a user types a ticker and presses Enter

1. **Client-side (instant)**
   - `sanitizeTicker(raw)` strips prefixes/spaces, uppercases.
   - `validateTickerFormat(cleaned)` runs the regex and crypto/reserved-word checks.
   - If the format check fails, an inline hint is shown immediately and no network call is made.

2. **Server-side `POST /api/validate-ticker`**
   - The rate limit is checked (30 requests per 60 s per IP).
   - `validate_ticker()` runs Layers 0–3.
   - Valid responses carry `valid`, `ticker`, `company_name`, and `warning`; rejections carry
     `valid`, `error`, `code`, and `suggestions`.

3. **If valid, `GET /api/analyze?ticker=AAPL`**
   - Validation runs again server-side (defence-in-depth).
   - `fetch_market_data(ticker)` gets live price, market cap, P/E, 52-week range.
   - `build_risk_analysis_prompt(ticker, company_name, market_data)` produces a
     data-grounded LLM prompt.
   - `iris_app.run_one_ticker(ticker)` runs the full analysis pipeline.
   - `validate_llm_output(text, market_data)` sanity-checks any pre-built insights.
   - The response includes `market_data` and `grounded_prompt` alongside the analysis results.
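The Layer 4 grounding step can be pictured as follows. This is a hypothetical sketch, not `build_risk_analysis_prompt` or `validate_llm_output`: the helper names, the prompt wording beyond the `Analyze AAPL (Apple Inc.). Current price: $185.5.` prefix shown in the API reference, and the 50 % price-deviation tolerance are all assumptions.

```python
import re
from typing import Any, Dict, List


def build_grounded_prompt(ticker: str, company_name: str, market_data: Dict[str, Any]) -> str:
    """Open the prompt with real figures so the model does not have to guess them."""
    parts = [f"Analyze {ticker} ({company_name})."]
    if market_data.get("current_price") is not None:
        parts.append(f"Current price: ${market_data['current_price']}.")
    if market_data.get("market_cap") is not None:
        parts.append(f"Market cap: ${market_data['market_cap']:,}.")
    if market_data.get("pe_ratio") is not None:
        parts.append(f"P/E ratio: {market_data['pe_ratio']}.")
    return " ".join(parts)


# Matches "$186", "$1,850", "$185.50" and captures the numeric part.
_DOLLAR_RE = re.compile(r"\$([0-9][0-9,]*(?:\.[0-9]+)?)")


def check_llm_prices(text: str, market_data: Dict[str, Any], tolerance: float = 0.5) -> List[str]:
    """Flag dollar figures more than `tolerance` (a fraction) away from the live price."""
    price = market_data.get("current_price")
    if not price:
        return []  # nothing to anchor against
    warnings = []
    for match in _DOLLAR_RE.finditer(text):
        value = float(match.group(1).replace(",", ""))
        if abs(value - price) / price > tolerance:
            warnings.append(f"${match.group(1)} is far from the live price ${price}")
    return warnings
```

The price check is deliberately crude: it treats every dollar figure as a share price, so a correctly quoted market cap would also be flagged. A production check needs to know which quantity each figure refers to.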

### Graceful degradation fallback chain

```
yfinance OK?
  YES → return API result
  NO  →
        ticker in local DB?
          YES → return valid + warning("verified offline")
          NO  →
                local DB available?
                  YES → return error(API_TIMEOUT / API_ERROR)
                  NO  → return error(API_ERROR, "both services unavailable")
```
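The fallback chain maps directly onto a small function. In this sketch the three callables stand in for the yfinance lookup and the two local-DB probes; their names, the `ApiTimeout` exception, and the `Result` dataclass are hypothetical. It assumes an API failure surfaces as an exception (an empty `info` dict from yfinance would have to be converted into one first).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    """Hypothetical stand-in for TickerValidationResult."""
    valid: bool
    warning: str = ""
    code: Optional[str] = None
    error: str = ""


class ApiTimeout(Exception):
    """Hypothetical exception raised when the live lookup times out."""


def validate_with_fallback(
    ticker: str,
    api_lookup: Callable[[str], Result],
    db_contains: Callable[[str], bool],
    db_available: Callable[[], bool],
) -> Result:
    try:
        return api_lookup(ticker)  # yfinance OK: use the API result as-is
    except Exception as exc:       # any API failure falls through to the local DB
        api_error = exc

    db_ok = db_available()
    if db_ok and db_contains(ticker):
        return Result(valid=True, warning="verified offline")
    if db_ok:
        code = "API_TIMEOUT" if isinstance(api_error, ApiTimeout) else "API_ERROR"
        return Result(valid=False, code=code, error="Cannot verify this ticker right now.")
    return Result(valid=False, code="API_ERROR",
                  error="Validation services are temporarily unavailable.")
```

Injecting the lookups as callables keeps each branch testable without patching yfinance or touching the data file.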

---

## Error Codes

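All rejection responses carry a `code` field for structured handling. The HTTP column shows the
status used by `POST /api/validate-ticker`; `GET /api/analyze` reports validation failures with
HTTP 422 instead (see the API reference).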

| Code | HTTP | Meaning | Typical user-facing message |
|---|---|---|---|
| `EMPTY_INPUT` | 200 | Ticker string is empty after sanitisation | "Please enter a stock ticker symbol." |
| `INVALID_FORMAT` | 200 | Doesn't match `^[A-Z]{1,5}(\.[A-Z]{1,2})?$` | "Tickers are 1–5 letters, with an optional class suffix (e.g., BRK.B)." |
| `RESERVED_WORD` | 200 | Crypto ticker or reserved word (NULL, TEST…) | "IRIS-AI analyzes stocks and ETFs. For cryptocurrency analysis, please use a crypto-specific platform." |
| `TICKER_NOT_FOUND` | 200 | Passes format but unknown to both DB and API | "Ticker was not found. Please check the symbol and try again." |
| `TICKER_DELISTED` | 200 | Company found but no recent trading data | "Appears to be delisted or has no recent trading data." |
| `API_TIMEOUT` | 200 | yfinance timed out, ticker not in local DB | "Cannot verify this ticker right now. Please try again." |
| `API_ERROR` | 200 | Network error or both services down | "Validation services are temporarily unavailable." |
| `RATE_LIMITED` | 429 | IP exceeded 30 requests / 60 s | "Too many requests. Please wait before trying again." |
| `DATA_FETCH_FAILED` | 502 | Market data fetch failed before the LLM call | "Could not retrieve market data. Please try again later." |
| `INTERNAL_ERROR` | 500 | Unhandled exception in analysis pipeline | "An internal error occurred during analysis." |

- **Python constant:** `ticker_validator.ErrorCode.FIELD_NAME`
- **JavaScript constant:** `TickerValidation.ErrorCodes.FIELD_NAME`
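If you want a typed view of these codes in Python, a `str`-backed `Enum` keeps each member JSON-serialisable while still giving editors autocompletion. This is a sketch only: the real `ticker_validator.ErrorCode` may well be a plain class of string constants, and `http_status` and `error_body` are hypothetical helpers. The status values are copied from the table above.

```python
import json
from enum import Enum


class ErrorCode(str, Enum):
    """Hypothetical typed mirror of the codes in the table above."""
    EMPTY_INPUT = "EMPTY_INPUT"
    INVALID_FORMAT = "INVALID_FORMAT"
    RESERVED_WORD = "RESERVED_WORD"
    TICKER_NOT_FOUND = "TICKER_NOT_FOUND"
    TICKER_DELISTED = "TICKER_DELISTED"
    API_TIMEOUT = "API_TIMEOUT"
    API_ERROR = "API_ERROR"
    RATE_LIMITED = "RATE_LIMITED"
    DATA_FETCH_FAILED = "DATA_FETCH_FAILED"
    INTERNAL_ERROR = "INTERNAL_ERROR"


# Non-200 statuses from the table; every other code is returned with HTTP 200.
_NON_200 = {
    ErrorCode.RATE_LIMITED: 429,
    ErrorCode.DATA_FETCH_FAILED: 502,
    ErrorCode.INTERNAL_ERROR: 500,
}


def http_status(code: ErrorCode) -> int:
    return _NON_200.get(code, 200)


def error_body(code: ErrorCode, message: str) -> str:
    """Serialise a rejection; the str mixin makes the member encode as its value."""
    return json.dumps({"error": message, "code": code})
```

Because each member is also a `str`, `ErrorCode.TICKER_NOT_FOUND == "TICKER_NOT_FOUND"` holds, so existing string comparisons keep working.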

---

## API Reference

### `POST /api/validate-ticker`

Real-time ticker validation. Returns HTTP 200 for both valid and invalid results; the only
other status is 429, returned when the rate limit is exceeded.

**Request**
```json
{ "ticker": "AAPL" }
```

**Response: valid**
```json
{
  "valid": true,
  "ticker": "AAPL",
  "company_name": "Apple Inc.",
  "warning": ""
}
```

**Response: invalid**
```json
{
  "valid": false,
  "error": "Ticker \"XYZZY\" was not found. Please check the symbol and try again.",
  "code": "TICKER_NOT_FOUND",
  "suggestions": ["XYZ", "XYZT"]
}
```

**Response: rate limited (HTTP 429)**
```json
{
  "error": "Too many requests. Please wait before trying again.",
  "code": "RATE_LIMITED"
}
```
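A client can drive its UI from the `valid`, `warning`, `code`, and `suggestions` fields alone. The sketch below uses only the standard library: it builds the POST request (without sending it) and maps a decoded response onto a display state. The base URL matches the `curl` example elsewhere in this document; the function and state names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:5000"  # assumption: local development server


def build_validate_request(ticker: str) -> urllib.request.Request:
    """Prepare POST /api/validate-ticker; send it with urllib.request.urlopen()."""
    body = json.dumps({"ticker": ticker}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/validate-ticker",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ui_state(status: int, payload: Dict[str, Any]) -> str:
    """Map an HTTP status and decoded JSON body onto a hypothetical UI state."""
    if status == 429 or payload.get("code") == "RATE_LIMITED":
        return "retry-later"
    if payload.get("valid"):
        # A non-empty warning signals a degraded result, e.g. "verified offline".
        return "advisory" if payload.get("warning") else "ok"
    return "error-with-suggestions" if payload.get("suggestions") else "error"
```

Note that `urlopen()` raises `urllib.error.HTTPError` for the 429 response, so a real client would catch it and read the JSON body from the exception.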

---

### `GET /api/analyze?ticker=AAPL`

Full analysis endpoint. Runs all validation layers, fetches market data, and calls the LLM pipeline.

**Query parameters**

| Parameter | Default | Description |
|---|---|---|
| `ticker` | *(required)* | Stock ticker symbol |
| `timeframe` | *(none)* | Preset: `1D`, `5D`, `1M`, `6M`, `YTD`, `1Y`, `5Y` |
| `period` | `60d` | yfinance period string (used when `timeframe` is absent) |
| `interval` | `1d` | yfinance interval string |

**Response: success (200)**
```json
{
  "ticker": "AAPL",
  "risk_score": 42,
  "llm_insights": { ... },
  "market_data": {
    "ticker": "AAPL",
    "company_name": "Apple Inc.",
    "current_price": 185.50,
    "market_cap": 2900000000000,
    "pe_ratio": 28.5,
    "52_week_high": 199.62,
    "52_week_low": 124.17
  },
  "grounded_prompt": "Analyze AAPL (Apple Inc.). Current price: $185.5. ..."
}
```

**Response: validation failure (422)**
```json
{
  "valid": false,
  "error": "...",
  "code": "TICKER_NOT_FOUND",
  "suggestions": ["..."]
}
```

---

### `GET /api/tickers/search?q=APP`

Typeahead autocomplete. Returns up to 8 matching tickers from the local DB.

**Response**
```json
[
  { "ticker": "AAPL", "name": "Apple Inc.", "exchange": "Nasdaq" },
  { "ticker": "APP",  "name": "Applovin Corp", "exchange": "Nasdaq" }
]
```
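One way to implement typeahead and typo suggestions over a sorted in-memory ticker list is a prefix scan with `bisect`, plus `difflib.get_close_matches` for near misses. This is an assumed design, not necessarily what `ticker_db.py` does. In particular, the sample response above returns `AAPL` for `q=APP`, so the real search probably also matches on company names or similarity, which this prefix-only sketch does not.

```python
import bisect
import difflib
from typing import List, Sequence


def search_tickers(query: str, tickers: Sequence[str], limit: int = 8) -> List[str]:
    """Prefix search over a *sorted* sequence of uppercase tickers."""
    prefix = query.strip().upper()
    if not prefix:
        return []
    results: List[str] = []
    i = bisect.bisect_left(tickers, prefix)  # index of the first symbol >= prefix
    while i < len(tickers) and len(results) < limit and tickers[i].startswith(prefix):
        results.append(tickers[i])
        i += 1
    return results


def suggest_tickers(typo: str, tickers: Sequence[str], n: int = 3) -> List[str]:
    """Closest known symbols to a mistyped ticker, best match first."""
    return difflib.get_close_matches(typo.strip().upper(), tickers, n=n, cutoff=0.6)
```

Because all matching symbols sit next to each other in sorted order, the prefix scan touches only the results plus one extra element after the binary search.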

---

### `GET /api/health`

Service health check. Reports ticker DB status, age, and staleness.

**Response (200)**
```json
{
  "status": "healthy",
  "ticker_db_loaded": true,
  "ticker_count": 13247,
  "ticker_db_age_hours": 3.2,
  "ticker_db_stale": false
}
```
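The `ticker_db_age_hours` and `ticker_db_stale` fields could be derived from the data file's modification time. The sketch assumes age is measured from the file's mtime, which this document does not specify; `db_age_hours` and `is_stale` are hypothetical helpers, with the 48-hour default taken from `is_db_stale(threshold_hours=48.0)` in the Configuration section.

```python
import os
import time
from typing import Optional


def db_age_hours(path: str, now: Optional[float] = None) -> Optional[float]:
    """Hours since the ticker file was last modified, or None if it is missing."""
    try:
        modified = os.path.getmtime(path)
    except OSError:
        return None
    current = time.time() if now is None else now
    return (current - modified) / 3600.0


def is_stale(path: str, threshold_hours: float = 48.0, now: Optional[float] = None) -> bool:
    """A missing file counts as stale; otherwise compare its age to the threshold."""
    age = db_age_hours(path, now)
    return age is None or age > threshold_hours
```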

---

### `POST /api/admin/refresh-ticker-db`

Trigger a manual ticker database refresh (downloads fresh SEC data).

**Response**
```json
{
  "status": "success",
  "added": 12,
  "removed": 3,
  "total": 13256
}
```
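The `added`, `removed`, and `total` counts amount to two set differences between the old and new ticker sets. A minimal sketch, assuming the counts are computed that way; `diff_ticker_sets` is a hypothetical name.

```python
from typing import Dict, Iterable


def diff_ticker_sets(old: Iterable[str], new: Iterable[str]) -> Dict[str, int]:
    """Counts in the shape of the refresh response (without "status")."""
    old_set, new_set = set(old), set(new)
    return {
        "added": len(new_set - old_set),    # symbols that appeared
        "removed": len(old_set - new_set),  # symbols that disappeared
        "total": len(new_set),
    }
```

For example, going from `{A, B, C}` to `{B, C, D, E}` yields two additions (`D`, `E`), one removal (`A`), and a new total of 4.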

---

## Local Ticker Database

### How it works

The database is a flat JSON array of uppercase ticker symbols extracted from the
[SEC EDGAR company tickers endpoint](https://www.sec.gov/files/company_tickers.json).
It covers SEC-registered companies that have an assigned ticker (~13 000 symbols).
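The SEC file is not itself a flat array: it is a JSON object whose values are records with `cik_str`, `ticker`, and `title` fields. The sketch below shows one way to reduce it to a sorted, de-duplicated array like `data/valid_tickers.json`. The function names are hypothetical, and the `User-Agent` header reflects SEC's request that automated clients identify themselves (substitute your own contact string).

```python
import json
import urllib.request
from typing import Any, Dict, List

SEC_URL = "https://www.sec.gov/files/company_tickers.json"


def fetch_sec_payload(user_agent: str, timeout: float = 30.0) -> Dict[str, Dict[str, Any]]:
    """Download the raw SEC file (requires network access)."""
    req = urllib.request.Request(SEC_URL, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def extract_tickers(payload: Dict[str, Dict[str, Any]]) -> List[str]:
    """Collapse {"0": {"ticker": "AAPL", ...}, ...} into sorted unique symbols."""
    symbols = {
        str(entry.get("ticker", "")).strip().upper()
        for entry in payload.values()
    }
    symbols.discard("")  # drop records without a ticker
    return sorted(symbols)
```

Several CIKs can share a ticker or a company can appear more than once, which is why the symbols pass through a set before sorting.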

On startup, `run_startup_checks()` downloads the file synchronously if it is missing and
schedules a background refresh if it is outdated (see the table below). After that, refreshes
run in a background daemon thread every 24 hours via `ticker_scheduler.py`.

### File locations

| File | Purpose |
|---|---|
| `data/valid_tickers.json` | Canonical ticker set (sorted JSON array) |
| `data/valid_tickers.lock` | `filelock` lock file that prevents concurrent writes |

### Thread safety

- **Reads** are protected by `threading.RLock` (`_cache_lock` in `ticker_db.py`).
- **Writes** use a temp file + `os.replace()` atomic rename, so a crash mid-write never
  leaves a corrupt file.
- The `filelock.FileLock` prevents two processes from writing simultaneously (relevant when
  running multiple workers under gunicorn).
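The write path (temp file plus `os.replace()`) and the lock-guarded reads can be sketched with the standard library alone. This sketch omits the cross-process `filelock.FileLock`; the module-level set and the helper names are hypothetical stand-ins for `_cache_lock` and the real loader in `ticker_db.py`.

```python
import json
import os
import tempfile
import threading
from typing import Iterable, Set

_cache_lock = threading.RLock()
_tickers: Set[str] = set()


def save_tickers_atomic(path: str, tickers: Iterable[str]) -> None:
    """Write a sorted JSON array so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for os.replace() to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(sorted(set(tickers)), fh)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # never leave a half-written temp file behind
        raise


def load_tickers(path: str) -> None:
    """Parse outside the lock, then swap the in-memory set while holding it."""
    global _tickers
    with open(path, encoding="utf-8") as fh:
        fresh = set(json.load(fh))
    with _cache_lock:
        _tickers = fresh


def contains_ticker(ticker: str) -> bool:
    with _cache_lock:
        return ticker.upper() in _tickers
```

Parsing before taking the lock keeps the critical section to a single reference swap, so lookups are never blocked for the duration of a JSON parse.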

### Manually refreshing

```bash
# Via API (server must be running)
curl -X POST http://localhost:5000/api/admin/refresh-ticker-db
```

```python
# Via Python
from ticker_db import refresh_ticker_db

result = refresh_ticker_db()
print(result)  # e.g. {'added': 5, 'removed': 2, 'total': 13250}
```

### Startup integrity checks (`run_startup_checks`)

| Condition | Action |
|---|---|
| `valid_tickers.json` missing | Synchronous download (blocks startup briefly) |
| File older than 7 days | Background refresh (non-blocking) |
| Fewer than 5 000 tickers loaded | Background re-initialisation |
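The table reads as an ordered series of checks. The sketch below encodes it as a pure function; the precedence when several conditions hold at once (age checked before count) and the returned action labels are assumptions, not documented behaviour of `run_startup_checks()`.

```python
from typing import Optional

MAX_AGE_DAYS = 7      # "File older than 7 days"
MIN_TICKERS = 5000    # "Fewer than 5 000 tickers loaded"


def startup_action(file_exists: bool, age_days: Optional[float], ticker_count: int) -> str:
    """Return which action the startup checks would take (hypothetical labels)."""
    if not file_exists:
        return "download-sync"       # blocks startup briefly
    if age_days is not None and age_days > MAX_AGE_DAYS:
        return "refresh-background"  # non-blocking
    if ticker_count < MIN_TICKERS:
        return "reinit-background"
    return "none"
```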

---

## Configuration

All constants are defined in their respective source files. There are no environment
variables specific to the validation system.

### `ticker_validator.py`

| Constant | Value | Description |
|---|---|---|
| `_MAX_RAW_LENGTH` | `20` | Hard cap on raw input length before sanitisation |
| `lru_cache(maxsize=...)` | `512` | Maximum cached yfinance lookups |
| `_TICKER_RE` | `^[A-Z]{1,5}(\.[A-Z]{1,2})?$` | Valid ticker format regex |

To add a new crypto or reserved word, extend `_CRYPTO_TICKERS` or `_RESERVED_WORDS`
in `ticker_validator.py` and the matching sets in `static/tickerValidation.js`.

### `app.py`

| Constant | Value | Description |
|---|---|---|
| `_RATE_LIMIT_MAX` | `30` | Max requests per IP per window |
| `_RATE_LIMIT_WINDOW` | `60` | Window size in seconds |
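This document does not say which algorithm sits behind `_rate_limit_store`. A sliding-window log is one plausible fit for "30 requests per IP per 60 s"; the sketch below assumes it, and the class and method names are hypothetical. The injectable clock makes it testable without sleeping.

```python
import threading
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

RATE_LIMIT_MAX = 30     # mirrors _RATE_LIMIT_MAX
RATE_LIMIT_WINDOW = 60  # seconds, mirrors _RATE_LIMIT_WINDOW


class SlidingWindowLimiter:
    """Per-IP sliding-window log kept in process memory (hypothetical)."""

    def __init__(self, max_requests: int = RATE_LIMIT_MAX,
                 window: float = RATE_LIMIT_WINDOW,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._lock = threading.Lock()  # requests may be served on several threads

    def allow(self, ip: str) -> bool:
        now = self.clock()
        with self._lock:
            hits = self._hits[ip]
            # Drop timestamps that have slid out of the window.
            while hits and now - hits[0] >= self.window:
                hits.popleft()
            if len(hits) >= self.max_requests:
                return False  # caller answers with HTTP 429 / RATE_LIMITED
            hits.append(now)
            return True

    def clear(self) -> None:
        """Plays the role of _rate_limit_store.clear() in a test setUp."""
        with self._lock:
            self._hits.clear()
```

Rejected requests are not recorded, so a client that keeps retrying is let through again once its oldest accepted request ages out. Because the store lives in process memory, each gunicorn worker would keep its own independent counts.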

### `ticker_scheduler.py`

| Constant | Value | Description |
|---|---|---|
| `_REFRESH_INTERVAL_SECONDS` | `86400` (24 h) | Background DB refresh interval |
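A fixed-interval background refresh can be built from a daemon thread and a `threading.Event`, which doubles as an interruptible sleep. This is a sketch under that assumption; `ticker_scheduler.py` may use `threading.Timer` or another mechanism, and `start_refresh_loop` is a hypothetical name.

```python
import logging
import threading
from typing import Callable

REFRESH_INTERVAL_SECONDS = 86400  # 24 h, as in the table above

logger = logging.getLogger(__name__)


def start_refresh_loop(refresh: Callable[[], object],
                       interval: float = REFRESH_INTERVAL_SECONDS) -> threading.Event:
    """Call `refresh` every `interval` seconds until the returned event is set."""
    stop = threading.Event()

    def _loop() -> None:
        # wait() returns False on timeout (time to refresh) and True once stop is set.
        while not stop.wait(interval):
            try:
                refresh()
            except Exception:
                logger.exception("ticker DB refresh failed; retrying next interval")

    threading.Thread(target=_loop, name="ticker-db-refresh", daemon=True).start()
    return stop
```

Because the thread is a daemon, it does not keep the process alive at shutdown; calling `stop.set()` ends the loop promptly instead of waiting out the remaining interval.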

### `ticker_db.py`

| Constant | Value | Description |
|---|---|---|
| `_SEC_URL` | SEC EDGAR endpoint | Source for ticker data |
| `_DATA_FILE` | `data/valid_tickers.json` | Local cache path |
| `is_db_stale(threshold_hours=48.0)` | 48 h | Age at which DB is considered stale |

---

## Troubleshooting

### "Validation services are temporarily unavailable"

Both yfinance **and** the local DB failed. This is rare. Check:
- `data/valid_tickers.json` exists and is readable.
- No other process is holding `data/valid_tickers.lock` indefinitely.
- Network connectivity to `sec.gov` and `query1.finance.yahoo.com`.

### Ticker DB not loading on startup

```
startup checks failed: [Errno 13] Permission denied: 'data/valid_tickers.json'
```

Ensure the process user has read/write access to the `data/` directory.

### "BTC is not found" instead of crypto rejection message

The `_CRYPTO_TICKERS` set in `ticker_validator.py` may be out of sync with the matching set in
`static/tickerValidation.js`. Both sets must be kept identical: add or remove
symbols in both files.

### yfinance API returning empty info for real tickers

yfinance occasionally returns `{}` for valid tickers during outages or rate limiting.
When this happens and the ticker is in the local DB, the system degrades gracefully
and returns `valid: true` with a `warning` field. The frontend renders this as a
yellow advisory rather than an error.

### Rate limit hit during automated testing

The rate limiter is per-IP and in-memory. In tests, clear `app._rate_limit_store`
in `setUp`:

```python
from app import _rate_limit_store
_rate_limit_store.clear()
```

### LRU cache serving stale results in tests

Clear the yfinance lookup cache in `setUp`:

```python
from ticker_validator import _cached_api_lookup
_cached_api_lookup.cache_clear()
```

---

## Testing

### Test files

| File | What it tests | Network required? |
|---|---|---|
| `tests/test_validation_edge_cases.py` | Unit tests: sanitisation, format, error codes, graceful degradation | No (all mocked) |
| `tests/test_e2e_validation.py` | End-to-end: full request→validation→response flow via Flask test client | No (all mocked) |

### Running the tests

```bash
# All validation tests
python -m unittest tests/test_validation_edge_cases.py tests/test_e2e_validation.py -v

# Edge cases only
python -m unittest tests/test_validation_edge_cases.py -v

# E2E only
python -m unittest tests/test_e2e_validation.py -v
```

### What each E2E test covers

| Test | Scenario |
|---|---|
| `test_e2e_valid_ticker_full_flow` | AAPL passes all layers; market data and grounded prompt appear in response |
| `test_e2e_invalid_ticker_blocked` | XYZZY blocked at Layer 3; LLM never called |
| `test_e2e_format_error_never_hits_backend` | `123!!!` blocked at Layer 1; yfinance never called |
| `test_e2e_suggestion_is_valid` | Typo "AAPPL" returns suggestions; first suggestion itself passes validation |
| `test_e2e_concurrent_requests` | 10 simultaneous requests via `asyncio.gather`; all succeed without race conditions |
| `test_e2e_rate_limiting` | 35 rapid requests; first 30 return 200, next 5 return 429 |

### Writing new tests

Follow these conventions:

1. **Always clear shared state in `setUp`:**
   ```python
   from app import _rate_limit_store
   from ticker_validator import _cached_api_lookup

   def setUp(self):
       _rate_limit_store.clear()
       _cached_api_lookup.cache_clear()
   ```

2. **Mock at the module boundary, not inside the function:**
   ```python
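   from unittest.mock import MagicMock, patch

   mock = MagicMock()

   # Preferred: names the lookup path that ticker_validator.py uses
   with patch("ticker_validator.yf.Ticker", return_value=mock):
       ...

   # Avoid: if ticker_validator does `import yfinance as yf`, this patches the
   # very same attribute, but it hides which module the test is targeting
   with patch("yfinance.Ticker", return_value=mock):
       ...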
   ```

3. **Always mock `ticker_validator.is_known_ticker`** alongside `yf.Ticker`
   to control which layer the test exercises.

4. **For analyze-endpoint tests, mock `app.iris_app`** to avoid spinning up
   the full IRIS pipeline (slow, requires model files):
   ```python
   mock_iris = MagicMock()
   mock_iris.run_one_ticker.return_value = {"ticker": "AAPL", ...}
   with patch("app.iris_app", mock_iris):
       ...
   ```