Update app.py
app.py
CHANGED
|
@@ -426,6 +426,158 @@ def create_interface():
| 426 |
- E-commerce product pages
|
| 427 |
- Financial data sites (Yahoo Finance, MarketWatch)
|
| 428 |
- Research papers and academic sites
|
| 429 |
+
|
| 430 |
+
## 🧪 **Test Scenarios**
|
| 431 |
+
|
| 432 |
+
### **1. News & Media Sites**
|
| 433 |
+
```
|
| 434 |
+
URL: https://www.bbc.com/news
|
| 435 |
+
Query: Extract the top 5 news headlines with their summaries and create a table with columns: Headline, Category, Summary
|
| 436 |
+
```
|
| 437 |
+
|
| 438 |
+
```
|
| 439 |
+
URL: https://edition.cnn.com
|
| 440 |
+
Query: Find all breaking news items and organize them by topic/region in a structured format
|
| 441 |
+
```
|
| 442 |
+
|
| 443 |
+
### **2. Financial Data Sites**
|
| 444 |
+
```
|
| 445 |
+
URL: https://finance.yahoo.com/quote/AAPL
|
| 446 |
+
Query: Extract Apple stock information including current price, daily change, market cap, and any financial metrics into a summary table
|
| 447 |
+
```
|
| 448 |
+
|
| 449 |
+
```
|
| 450 |
+
URL: https://www.marketwatch.com/investing/stock/tsla
|
| 451 |
+
Query: Create a table with Tesla's key financial metrics: price, change, volume, market cap, P/E ratio
|
| 452 |
+
```
|
| 453 |
+
|
| 454 |
+
### **3. E-commerce & Product Pages**
|
| 455 |
+
```
|
| 456 |
+
URL: https://www.amazon.com/dp/B08N5WRWNW
|
| 457 |
+
Query: Extract product details including name, price, ratings, key features, and specifications in a structured format
|
| 458 |
+
```
|
| 459 |
+
|
| 460 |
+
```
|
| 461 |
+
URL: https://www.ebay.com/itm/123456789
|
| 462 |
+
Query: Extract item details, price, seller information, and shipping details into a comparison-ready table
|
| 463 |
+
```
|
| 464 |
+
|
| 465 |
+
### **4. Educational & Reference Sites**
|
| 466 |
+
```
|
| 467 |
+
URL: https://en.wikipedia.org/wiki/Artificial_intelligence
|
| 468 |
+
Query: Extract the main definition, history timeline, and applications of AI. Create separate sections for each topic.
|
| 469 |
+
```
|
| 470 |
+
|
| 471 |
+
```
|
| 472 |
+
URL: https://en.wikipedia.org/wiki/List_of_countries_by_population
|
| 473 |
+
Query: Extract the population data table and create a new table showing top 10 most populous countries with their population and growth rate
|
| 474 |
+
```
|
| 475 |
+
|
| 476 |
+
### **5. Government & Official Statistics**
|
| 477 |
+
```
|
| 478 |
+
URL: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports
|
| 479 |
+
Query: Extract the latest COVID-19 statistics and create a summary table with key global figures
|
| 480 |
+
```
|
| 481 |
+
|
| 482 |
+
```
|
| 483 |
+
URL: https://www.census.gov/quickfacts
|
| 484 |
+
Query: Extract key demographic statistics for the United States and organize them into categories: Population, Economy, Geography
|
| 485 |
+
```
|
| 486 |
+
|
| 487 |
+
### **6. Technology & Business News**
|
| 488 |
+
```
|
| 489 |
+
URL: https://techcrunch.com
|
| 490 |
+
Query: Find the latest startup funding news and create a table with: Company Name, Funding Amount, Investors, Industry
|
| 491 |
+
```
|
| 492 |
+
|
| 493 |
+
```
|
| 494 |
+
URL: https://www.reuters.com/technology
|
| 495 |
+
Query: Extract top technology news and summarize each story in 2-3 sentences with key points
|
| 496 |
+
```
|
| 497 |
+
|
| 498 |
+
### **7. Scientific & Research Sites**
|
| 499 |
+
```
|
| 500 |
+
URL: https://www.nature.com/articles
|
| 501 |
+
Query: Extract recent scientific article titles, authors, and abstracts. Create a summary table organized by research field
|
| 502 |
+
```
|
| 503 |
+
|
| 504 |
+
```
|
| 505 |
+
URL: https://pubmed.ncbi.nlm.nih.gov/trending
|
| 506 |
+
Query: Find trending medical research topics and create a list with brief descriptions of each study's findings
|
| 507 |
+
```
|
| 508 |
+
|
| 509 |
+
### **8. Sports & Entertainment**
|
| 510 |
+
```
|
| 511 |
+
URL: https://www.espn.com/nba/standings
|
| 512 |
+
Query: Extract NBA team standings and create a table with: Team, Wins, Losses, Win Percentage, Conference Position
|
| 513 |
+
```
|
| 514 |
+
|
| 515 |
+
```
|
| 516 |
+
URL: https://www.imdb.com/chart/top
|
| 517 |
+
Query: Extract the top 10 movies from IMDb's top 250 list with ratings, year, and brief description
|
| 518 |
+
```
|
| 519 |
+
|
| 520 |
+
### **9. Weather & Environmental Data**
|
| 521 |
+
```
|
| 522 |
+
URL: https://weather.com/weather/today
|
| 523 |
+
Query: Extract current weather conditions and forecast data. Create a summary with temperature, conditions, and weekly outlook
|
| 524 |
+
```
|
| 525 |
+
|
| 526 |
+
### **10. Real Estate & Property**
|
| 527 |
+
```
|
| 528 |
+
URL: https://www.zillow.com/homes/for_sale
|
| 529 |
+
Query: Extract property listings with prices, locations, square footage, and key features into a comparison table
|
| 530 |
+
```
|
| 531 |
+
|
| 532 |
+
## 🎯 **Quick Test Samples (Copy & Paste Ready)**
|
| 533 |
+
|
| 534 |
+
### **Simple Test:**
|
| 535 |
+
```
|
| 536 |
+
URL: https://httpbin.org/html
|
| 537 |
+
Query: Extract all text content and identify the page structure
|
| 538 |
+
```
|
| 539 |
+
|
| 540 |
+
### **Table Extraction Test:**
|
| 541 |
+
```
|
| 542 |
+
URL: https://www.w3schools.com/html/html_tables.asp
|
| 543 |
+
Query: Find all HTML tables on this page and convert them to a structured format with proper headers
|
| 544 |
+
```
|
| 545 |
+
|
| 546 |
+
### **Complex Analysis Test:**
|
| 547 |
+
```
|
| 548 |
+
URL: https://www.sec.gov/edgar/browse/?CIK=320193
|
| 549 |
+
Query: Extract Apple Inc.'s recent SEC filings and create a table with: Filing Date, Document Type, Description
|
| 550 |
+
```
|
| 551 |
+
|
| 552 |
+
### **International Site Test:**
|
| 553 |
+
```
|
| 554 |
+
URL: https://www.bbc.co.uk/weather
|
| 555 |
+
Query: Extract UK weather information and create a regional breakdown of current conditions
|
| 556 |
+
```
|
| 557 |
+
|
| 558 |
+
## **Testing Tips:**
|
| 559 |
+
|
| 560 |
+
1. **Start Simple**: Begin with basic sites like Wikipedia or news sites
|
| 561 |
+
2. **Test Error Handling**: Try invalid URLs to see error messages
|
| 562 |
+
3. **Check Timeouts**: Use slow-loading sites to test timeout handling
|
| 563 |
+
4. **Verify Tables**: Test sites with different table structures
|
| 564 |
+
5. **Content Variety**: Try different content types (news, data, products)
|
| 565 |
+
|
| 566 |
+
## 🚨 **Sites That May Have Issues:**
|
| 567 |
+
- Social media sites (require login)
|
| 568 |
+
- Sites with heavy JavaScript (may have limited content)
|
| 569 |
+
- Sites with aggressive bot protection
|
| 570 |
+
- Password-protected pages
|
| 571 |
+
|
| 572 |
+
## ✅ **Reliable Test Sites:**
|
| 573 |
+
- Wikipedia (excellent for tables and structured content)
|
| 574 |
+
- BBC News (good for text extraction)
|
| 575 |
+
- Government sites (.gov domains)
|
| 576 |
+
- W3Schools (great for HTML table testing)
|
| 577 |
+
- HttpBin (perfect for testing basic functionality)
|
| 578 |
+
|
| 579 |
+
Start with the simpler tests and gradually move to more complex scenarios to fully evaluate your tool's capabilities!
|
| 580 |
+
|
| 581 |
""")
|
| 582 |
|
| 583 |
# Event handlers
|
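The help text added in this hunk lists every scenario as a `URL:` line followed by a `Query:` line, so the pairs could be pulled out and screened before being pasted into the app. The following is a minimal sketch of that idea, not code from `app.py`: `SAMPLE`, `parse_scenarios`, and `is_fetchable_url` are hypothetical names, and the third sample entry is a deliberately malformed URL added here to illustrate the "Test Error Handling" tip.

```python
import re
from urllib.parse import urlparse

# Hypothetical excerpt in the "URL: ... / Query: ..." layout used by the help text.
# The last entry is intentionally malformed (an assumption for illustration only).
SAMPLE = """\
URL: https://httpbin.org/html
Query: Extract all text content and identify the page structure

URL: https://www.w3schools.com/html/html_tables.asp
Query: Find all HTML tables on this page and convert them to a structured format with proper headers

URL: not-a-valid-url
Query: Deliberately malformed entry for the error-handling tip
"""

# A "URL:" line immediately followed by a "Query:" line.
PAIR_RE = re.compile(
    r"^URL:[ \t]*(?P<url>\S+)[ \t]*\nQuery:[ \t]*(?P<query>.+)$",
    re.MULTILINE,
)


def parse_scenarios(text):
    """Return a list of (url, query) tuples found in the text."""
    return [(m["url"], m["query"].strip()) for m in PAIR_RE.finditer(text)]


def is_fetchable_url(url):
    """Syntactic check only: an http(s) scheme and a non-empty host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    for url, query in parse_scenarios(SAMPLE):
        status = "ok" if is_fetchable_url(url) else "invalid URL"
        print(f"[{status}] {url} :: {query}")
```

The check is purely syntactic and makes no network request, so it says nothing about timeouts, bot protection, or JavaScript-rendered content; those still need to be exercised against the running app.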