---
language: en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- mistral
- lora
- merged
- gguf
- text-generation
base_model: mistralai/Mistral-7B-Instruct-v0.2
---

# 🌍 Geopolitical Analysis Agent

**Advanced strategic forecasting and simulation engine combining RAG, SQLite, ChromaDB, and Claude AI**

## What Is This?

A production-ready geopolitical analysis system that:
- **Answers complex "what-if" questions** about world events
- **Models quantitative scenarios** (tank stocks, production rates, timelines)
- **Combines structured data and unstructured knowledge** via RAG
- **Provides rigorous analysis** in the style of a think-tank war-games coordinator
- **Prepares training data** for fine-tuning specialized models

## Key Features

### 🧠 Intelligent RAG Architecture
- **Vector search** with ChromaDB for semantic retrieval
- **Structured database** with SQLite for facts, metrics, inventories
- **Hybrid retrieval** combining both sources for comprehensive context

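As a rough illustration of hybrid retrieval, the sketch below merges structured rows (as they might come back from SQLite) with scored text chunks (as they might come back from a vector search) into a single prompt context. The `Chunk` class, the `build_context` function, and the sample data are hypothetical and are not taken from this project's `rag_service.py`.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A retrieved text chunk with a relevance score (hypothetical shape)."""
    text: str
    score: float


def build_context(facts: list[dict], chunks: list[Chunk], top_k: int = 5) -> str:
    """Merge structured rows and the best-scoring chunks into one prompt context."""
    lines = ["## Structured facts"]
    for row in facts:
        # Render each row as key=value pairs so numbers stay unambiguous.
        lines.append(", ".join(f"{key}={value}" for key, value in row.items()))
    lines.append("## Retrieved passages")
    # Keep only the top_k chunks, highest score first.
    best = sorted(chunks, key=lambda c: c.score, reverse=True)[:top_k]
    for chunk in best:
        lines.append(f"[{chunk.score:.2f}] {chunk.text}")
    return "\n".join(lines)


# Placeholder data, invented for illustration only.
facts = [{"country": "Exampleland", "asset_type": "tank", "quantity": 1200}]
chunks = [
    Chunk("Report A on production capacity.", 0.91),
    Chunk("Loosely related commentary.", 0.42),
    Chunk("Report B on attrition.", 0.83),
]
context = build_context(facts, chunks, top_k=2)
print(context)
```

Keeping the structured facts in their own labeled block is one way to let the model quote exact figures while still drawing on the retrieved passages for background.
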
### 📊 Quantitative Modeling
- Project military inventories over time
- Calculate attrition rates and production capacities
- Model economic sustainability scenarios
- Compare alternative pathways

### 💾 Production-Ready Stack
- FastAPI backend with async support
- SQLAlchemy ORM for database management
- Sentence Transformers for embeddings
- Claude Sonnet 4 for analysis
- Clean HTML/JS frontend

### 🎯 Example Queries

```
"Where will Russia's tank stock be in 5 years with 15% annual
losses and 200 tanks/year production?"

"What's China's timeline to semiconductor parity with Taiwan
if sanctions continue vs. if they're lifted?"

"How long can Iran sustain its proxy network at $60/barrel
vs $100/barrel oil prices?"

"Model European energy security in 2030 under three scenarios:
diversified LNG, accelerated renewables, or partial Russian
reconciliation"
```

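The first query above reduces to a simple recurrence: each year the stock shrinks by the attrition rate and grows by fixed production, `stock[t+1] = stock[t] * (1 - attrition) + production`. The sketch below works through it with the 15% and 200/year figures from the query. The starting inventory of 3,000 is a made-up placeholder, and `project_stock` is a hypothetical helper, not part of this project's analysis service.

```python
def project_stock(initial: float, years: int, attrition_rate: float,
                  production_per_year: float) -> list[float]:
    """Return the stock at the end of each year, starting with year 0."""
    stock = [float(initial)]
    for _ in range(years):
        # Losses are applied to the current stock, then new production is added.
        stock.append(stock[-1] * (1 - attrition_rate) + production_per_year)
    return stock


# The starting inventory of 3,000 is a placeholder, not a real estimate.
trajectory = project_stock(initial=3000, years=5, attrition_rate=0.15,
                           production_per_year=200)
for year, value in enumerate(trajectory):
    print(f"year {year}: {value:,.0f}")

# Long-run level at which yearly losses equal yearly production.
equilibrium = 200 / 0.15
print(f"equilibrium: {equilibrium:,.0f}")
```

With these placeholder inputs the stock falls from 3,000 to roughly 2,073 after five years and keeps drifting toward the equilibrium of about 1,333, where 15% losses exactly offset 200 new units per year.
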
## Quick Start

### 1. Install

```bash
cd geopolitical-agent/backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

### 2. Configure

Create a `.env` file:
```bash
ANTHROPIC_API_KEY=your_key_here
```

### 3. Initialize

```bash
python -c "from models.database import init_db; init_db()"
```

### 4. Run

```bash
python app.py
```

The server starts on http://localhost:8000.

### 5. Open Frontend

Open `frontend/index.html` in a browser, or serve it locally:
```bash
# Run from the geopolitical-agent/ root, in a second terminal
cd frontend
python -m http.server 8080
```

### 6. Load Sample Data

Click the "Load Sample Data" button in the UI, or run:
```bash
curl -X POST http://localhost:8000/api/data/load-sample-data
```

## Architecture

```
┌─────────────────────────────────────────────────┐
│               Frontend (HTML/JS)                │
└─────────────────┬───────────────────────────────┘
                  │ REST API
┌─────────────────┴───────────────────────────────┐
│                 FastAPI Backend                 │
│                                                 │
│  ┌──────────────────────────────────────────┐   │
│  │     Analysis Service (Claude + RAG)      │   │
│  └─────────┬────────────────────┬───────────┘   │
│            │                    │               │
│   ┌────────┴────────┐    ┌──────┴───────────┐   │
│   │   RAG Service   │    │  Data Ingestion  │   │
│   └────────┬────────┘    └──────┬───────────┘   │
│            │                    │               │
│   ┌────────┴────────┐    ┌──────┴───────────┐   │
│   │    ChromaDB     │    │      SQLite      │   │
│   │    (Vectors)    │    │   (Structured)   │   │
│   └─────────────────┘    └──────────────────┘   │
└─────────────────────────────────────────────────┘
```

## Project Structure

```
geopolitical-agent/
├── backend/
│   ├── app.py                    # Main FastAPI app
│   ├── config.py                 # Configuration
│   ├── requirements.txt          # Dependencies
│   ├── models/
│   │   ├── database.py           # SQLAlchemy models
│   │   └── embeddings.py         # ChromaDB manager
│   ├── services/
│   │   ├── rag_service.py        # RAG orchestration
│   │   ├── analysis_service.py   # Analysis engine
│   │   └── data_ingestion.py     # Data loading
│   ├── routes/
│   │   ├── query.py              # Query endpoints
│   │   └── data.py               # Data endpoints
│   └── data/
│       ├── geopolitical.db       # SQLite database
│       └── chroma_db/            # Vector store
├── frontend/
│   └── index.html                # Web interface
├── data/
│   ├── sample_data/              # Sample datasets
│   └── training/                 # Fine-tuning prep
└── docs/
    ├── SETUP.md                  # Setup guide
    └── API.md                    # API documentation
```

## Database Schema

### Countries
- Basic country attributes
- GDP, population, military budget
- Regional categorization

### Military Assets
- Equipment inventories (tanks, aircraft, etc.)
- Operational rates
- Production and attrition rates

### Geopolitical Events
- Timeline of significant events
- Impact scoring
- Related countries tracking

### Metrics Time Series
- Economic indicators
- Production statistics
- Any quantitative metric over time

### Knowledge Sources
- Document provenance tracking
- Credibility scoring
- Source metadata

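To make the first two tables more concrete, here is a minimal sketch in plain `sqlite3`. The column names follow the `Country` and `MilitaryAsset` examples in the "Extending the System" section of this README; the table names, column types, and constraints are assumptions, and the project itself defines its schema through SQLAlchemy in `models/database.py`.

```python
import sqlite3

# Column names follow the Country / MilitaryAsset examples in this README;
# table names, types, and constraints are assumptions for illustration.
SCHEMA = """
CREATE TABLE countries (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    iso_code TEXT UNIQUE,
    region TEXT,
    population INTEGER,
    gdp_usd REAL,
    military_budget_usd REAL
);
CREATE TABLE military_assets (
    id INTEGER PRIMARY KEY,
    country_id INTEGER NOT NULL REFERENCES countries(id),
    asset_type TEXT,
    asset_name TEXT,
    quantity INTEGER,
    operational_rate REAL,
    production_rate_yearly REAL,
    attrition_rate_yearly REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO countries (name, iso_code, region) VALUES (?, ?, ?)",
    ("Pakistan", "PAK", "South Asia"),
)
conn.execute(
    "INSERT INTO military_assets "
    "(country_id, asset_type, asset_name, quantity, operational_rate) "
    "VALUES (1, 'Fighter Aircraft', 'JF-17 Thunder', 150, 0.75)"
)

# Join the two tables to estimate how many units are operational.
row = conn.execute(
    "SELECT c.name, a.asset_name, a.quantity * a.operational_rate "
    "FROM military_assets a JOIN countries c ON c.id = a.country_id"
).fetchone()
print(row)
```
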
## API Examples

### Analyze Query
```bash
curl -X POST http://localhost:8000/api/query/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Your geopolitical question here",
    "use_cache": true
  }'
```

### Add Knowledge
```bash
curl -X POST http://localhost:8000/api/data/add-document \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your geopolitical knowledge document",
    "metadata": {"type": "report", "country": "China"}
  }'
```

Full API documentation: `docs/API.md`

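The analyze call can also be made from Python. The sketch below builds the same POST request as the `curl` example using only the standard library. `build_analyze_request` is a hypothetical helper, and the assumption that the endpoint replies with JSON is not confirmed here; check `docs/API.md` for the actual response shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from the Quick Start


def build_analyze_request(query: str, use_cache: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/query/analyze."""
    payload = json.dumps({"query": query, "use_cache": use_cache}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/query/analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request("Your geopolitical question here")
print(request.get_method(), request.full_url)

# With the backend running, send the request and decode the reply
# (assumed to be JSON):
# with urllib.request.urlopen(request, timeout=120) as response:
#     result = json.loads(response.read().decode("utf-8"))
```
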
## Fine-Tuning Preparation

### Export Training Data

```python
import json

from models.database import SessionLocal, AnalysisCache

# Load every cached analysis, then release the session
db = SessionLocal()
analyses = db.query(AnalysisCache).all()

# Convert each cached query/answer pair into a chat-style record
training_data = []
for analysis in analyses:
    training_data.append({
        "messages": [
            {
                "role": "system",
                "content": "You are a geopolitical analysis expert..."
            },
            {
                "role": "user",
                "content": analysis.query_text
            },
            {
                "role": "assistant",
                "content": analysis.analysis_result
            }
        ]
    })
db.close()

# Write one JSON object per line (JSONL)
with open("training_data.jsonl", "w", encoding="utf-8") as f:
    for item in training_data:
        f.write(json.dumps(item) + "\n")
```

### LoRA Training

Use the exported data to fine-tune a LoRA adapter on geopolitical data:

1. Export queries/responses from `analysis_cache` table
2. Format as JSONL for LoRA training
3. Train LoRA adapter on domain-specific data
4. Deploy fine-tuned model for specialized analysis

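Before step 3, it can help to sanity-check the exported file. The sketch below validates that a JSONL line holds a system/user/assistant exchange with non-empty text, matching the record shape produced by the export script above. `is_valid_record` and the sample lines are hypothetical and not part of this project.

```python
import json

EXPECTED_ROLES = ["system", "user", "assistant"]


def is_valid_record(line: str) -> bool:
    """Return True if a JSONL line is a complete system/user/assistant exchange."""
    try:
        messages = json.loads(line)["messages"]
        roles = [message["role"] for message in messages]
        texts = [message["content"] for message in messages]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Malformed JSON, missing keys, or an unexpected structure.
        return False
    has_text = all(isinstance(text, str) and text.strip() for text in texts)
    return roles == EXPECTED_ROLES and has_text


# Sample lines, invented for illustration only.
good = json.dumps({"messages": [
    {"role": "system", "content": "You are a geopolitical analysis expert..."},
    {"role": "user", "content": "Sample question"},
    {"role": "assistant", "content": "Sample answer"},
]})
bad = json.dumps({"messages": [
    {"role": "user", "content": "Sample question"},
    {"role": "assistant", "content": ""},
]})
results = [is_valid_record(good), is_valid_record(bad), is_valid_record("not json")]
print(results)
```

To check a real export, one option is to read `training_data.jsonl` line by line and count how many lines fail `is_valid_record` before starting a training run.
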
## Extending the System

### Add New Countries

```python
from models.database import SessionLocal, Country

db = SessionLocal()
country = Country(
    name="Pakistan",
    iso_code="PAK",
    region="South Asia",
    population=235_000_000,
    gdp_usd=376_000_000_000,
    military_budget_usd=11_000_000_000
)
db.add(country)
db.commit()
```

### Add Military Assets

```python
from models.database import MilitaryAsset

# Reuses `db` and `country` from the previous example;
# `country.id` is populated once that commit has run.
asset = MilitaryAsset(
    country_id=country.id,
    asset_type="Fighter Aircraft",
    asset_name="JF-17 Thunder",
    quantity=150,
    operational_rate=0.75,
    production_rate_yearly=25,
    attrition_rate_yearly=0.05
)
db.add(asset)
db.commit()
```

### Add Knowledge Documents

```python
from services.data_ingestion import DataIngestionService

service = DataIngestionService()
service.add_knowledge_document(
    text="Your geopolitical analysis or fact...",
    metadata={
        "type": "intelligence_assessment",
        "country": "Iran",
        "classification": "open_source"
    }
)
```

## Configuration

Edit `backend/config.py`:

```python
# Embedding model (smaller = faster, larger = better)
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

# RAG retrieval settings
TOP_K_RESULTS = 5           # Number of relevant chunks
SIMILARITY_THRESHOLD = 0.7  # Minimum relevance score

# Claude settings
DEFAULT_MODEL = "claude-sonnet-4-20250514"
MAX_TOKENS = 4000
TEMPERATURE = 0.3  # Lower = more deterministic, focused output
```

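The two retrieval settings interact: the threshold drops weak matches first, and `TOP_K_RESULTS` then caps how many survivors are passed on. The sketch below shows that interaction on made-up scores. `select_chunks` is a hypothetical helper, and it assumes a higher score means a closer match; if the vector store reports distances instead, the comparison would need to be inverted.

```python
TOP_K_RESULTS = 5
SIMILARITY_THRESHOLD = 0.7


def select_chunks(scored, top_k=TOP_K_RESULTS, threshold=SIMILARITY_THRESHOLD):
    """Keep chunks scoring at or above the threshold, best first, at most top_k."""
    kept = [(text, score) for text, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Made-up (text, score) pairs; assumes higher score = more relevant.
candidates = [("a", 0.92), ("b", 0.65), ("c", 0.71), ("d", 0.88)]
selected = select_chunks(candidates, top_k=2)
print(selected)  # [('a', 0.92), ('d', 0.88)]
```

Raising the threshold trades recall for precision, while raising `top_k` sends more context to the model at the cost of a longer prompt.
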
## Performance Tips

1. **Adjust retrieval**: Tune `TOP_K_RESULTS` and `SIMILARITY_THRESHOLD` in `backend/config.py`
2. **Enable caching**: Send `"use_cache": true` with repeated queries
3. **Batch document ingestion**: Use bulk-add for multiple documents
4. **Index optimization**: Add SQLite indexes on columns used by frequent queries

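For the index tip, the sketch below creates a composite index and uses `EXPLAIN QUERY PLAN` to confirm that SQLite actually uses it. The `metrics` table and its columns are hypothetical stand-ins; the real table and column names are defined in `models/database.py`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the project's real schema lives in models/database.py.
conn.execute(
    "CREATE TABLE metrics (country_id INTEGER, metric_name TEXT, year INTEGER, value REAL)"
)
# Composite index covering the two columns used in the WHERE clause below.
conn.execute("CREATE INDEX idx_metrics_lookup ON metrics (country_id, metric_name)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT value FROM metrics WHERE country_id = ? AND metric_name = ?",
    (1, "gdp_usd"),
).fetchall()
# The last column of each plan row is the human-readable detail string.
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```

If the printed plan mentions a full scan rather than the index name, the index does not match the query's filter columns and is worth revisiting.
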
## Use Cases

### Strategic Planning
- War-game scenario modeling
- Resource sustainability analysis
- Timeline projections

### Intelligence Analysis
- Capability gap assessments
- Economic constraint modeling
- Production capacity tracking

### Academic Research
- Geopolitical trend analysis
- Historical pattern recognition
- Comparative case studies

### Policy Analysis
- Sanctions impact modeling
- Alliance dynamics assessment
- Economic leverage analysis

## Roadmap

- [ ] Real-time data ingestion from news sources
- [ ] Multi-agent debate for competing analyses
- [ ] Temporal reasoning for historical patterns
- [ ] Export to PDF reports
- [ ] WebSocket streaming for long analyses
- [ ] Named Entity Recognition for auto-tagging
- [ ] Graph database for relationship modeling

## Contributing

Areas for contribution:
1. **Data**: Add domain-specific geopolitical datasets
2. **Models**: Integrate specialized embedding models
3. **Analysis**: Enhance quantitative modeling functions
4. **UI**: Improve frontend visualization
5. **Documentation**: Add tutorials and examples

## License

MIT License - See LICENSE file

## Citation

If you use this system in research:

```bibtex
@software{geopolitical_analysis_agent,
  title={Geopolitical Analysis Agent: RAG-based Strategic Forecasting},
  author={[Your Name]},
  year={2025},
  url={https://github.com/yourusername/geopolitical-agent}
}
```

## Support

- Documentation: `docs/`
- API Reference: `docs/API.md`
- Setup Guide: `docs/SETUP.md`
- Issues: GitHub Issues

## Acknowledgments

Built with:
- [FastAPI](https://fastapi.tiangolo.com/)
- [ChromaDB](https://www.trychroma.com/)
- [Anthropic Claude](https://www.anthropic.com/)
- [Sentence Transformers](https://www.sbert.net/)

---

**Ready to analyze the world? Start with `python app.py`** 🚀