Major upgrade: Transform assistant into specialized econometric research showcase
Browse filesThis commit comprehensively improves the AI assistant to properly represent David Van Dijcke as an econometrician on the 2025-26 job market, emphasizing his methodological contributions to functional data analysis and optimal transport.
Key improvements:
- Enhanced econometric focus with detailed paper summaries (R3D, FDR, DISCO, RTO)
- Professional prompts emphasizing methodological contributions
- Improved greetings that immediately identify David as an econometrician
- Better document loading with more content for job market paper
- Comprehensive deployment documentation and testing framework
- Security improvements (proper .env handling, .gitignore)
Technical enhancements:
- Optimized Gemini 2.0/1.5 Flash integration for accurate responses
- Enhanced context about functional data analysis and optimal transport
- Distribution-valued treatment effects and geometric measure theory focus
- Policy applications using big data emphasized alongside theoretical work
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- .env.example +2 -2
- .gitignore +35 -0
- DEPLOYMENT_GUIDE.md +133 -0
- DEPLOYMENT_IMPROVED.md +55 -0
- README.md +7 -6
- app.py +167 -64
- app_improved.py +321 -0
- requirements_improved.txt +8 -0
- requirements_simple.txt +8 -0
- test_assistant.py +61 -0
|
@@ -1,4 +1,4 @@
|
|
| 1 |
-
# Google AI API Key (optional)
|
| 2 |
# Get your API key from https://aistudio.google.com/app/apikey
|
| 3 |
# If not provided, the app will use a limited mode with lower quality
|
| 4 |
-
GOOGLE_API_KEY=your_api_key_here
|
|
|
|
| 1 |
+
# Google AI API Key (optional but recommended)
|
| 2 |
# Get your API key from https://aistudio.google.com/app/apikey
|
| 3 |
# If not provided, the app will use a limited mode with lower quality
|
| 4 |
+
# GOOGLE_API_KEY=your_api_key_here
|
|
@@ -0,0 +1,35 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Environment variables
|
| 2 |
+
.env
|
| 3 |
+
.env.local
|
| 4 |
+
.env.*.local
|
| 5 |
+
|
| 6 |
+
# Cache
|
| 7 |
+
vector_store_cache/
|
| 8 |
+
__pycache__/
|
| 9 |
+
*.pyc
|
| 10 |
+
*.pyo
|
| 11 |
+
*.pyd
|
| 12 |
+
.Python
|
| 13 |
+
|
| 14 |
+
# IDE
|
| 15 |
+
.vscode/
|
| 16 |
+
.idea/
|
| 17 |
+
*.swp
|
| 18 |
+
*.swo
|
| 19 |
+
*~
|
| 20 |
+
|
| 21 |
+
# OS
|
| 22 |
+
.DS_Store
|
| 23 |
+
Thumbs.db
|
| 24 |
+
|
| 25 |
+
# Logs
|
| 26 |
+
*.log
|
| 27 |
+
|
| 28 |
+
# Testing
|
| 29 |
+
.pytest_cache/
|
| 30 |
+
.coverage
|
| 31 |
+
htmlcov/
|
| 32 |
+
|
| 33 |
+
# Gradio
|
| 34 |
+
gradio_cached_examples/
|
| 35 |
+
flagged/
|
|
@@ -0,0 +1,133 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Deployment Guide for David Van Dijcke's Econometric Research Assistant
|
| 2 |
+
|
| 3 |
+
## Overview
|
| 4 |
+
|
| 5 |
+
This assistant specializes in David Van Dijcke's econometric research, emphasizing his contributions to functional data analysis, optimal transport, and causal inference methods. The assistant is optimized for the 2025-26 economics job market.
|
| 6 |
+
|
| 7 |
+
## Key Features
|
| 8 |
+
|
| 9 |
+
- **Econometric Focus**: Emphasizes David's methodological contributions
|
| 10 |
+
- **Job Market Ready**: Highlights R3D paper and econometric innovations
|
| 11 |
+
- **Technical Accuracy**: Detailed information about functional data analysis and optimal transport
|
| 12 |
+
- **Policy Applications**: Shows how methods apply to real-world big data problems
|
| 13 |
+
|
| 14 |
+
## Deployment Options
|
| 15 |
+
|
| 16 |
+
### Option 1: Hugging Face Spaces (Recommended)
|
| 17 |
+
|
| 18 |
+
1. **Create a new Space**:
|
| 19 |
+
- Go to https://huggingface.co/new-space
|
| 20 |
+
- Choose Gradio SDK
|
| 21 |
+
- Set to Public
|
| 22 |
+
|
| 23 |
+
2. **Upload files**:
|
| 24 |
+
- `app.py` (the main application)
|
| 25 |
+
- `requirements.txt`
|
| 26 |
+
- `documents/` folder with PDFs
|
| 27 |
+
|
| 28 |
+
3. **Add Google API Key** (for best performance):
|
| 29 |
+
- Go to Space Settings > Repository secrets
|
| 30 |
+
- Add secret: `GOOGLE_API_KEY`
|
| 31 |
+
- Get key from: https://aistudio.google.com/app/apikey
|
| 32 |
+
|
| 33 |
+
4. **The Space will auto-deploy**
|
| 34 |
+
|
| 35 |
+
### Option 2: Local Development
|
| 36 |
+
|
| 37 |
+
```bash
|
| 38 |
+
# Clone the repository
|
| 39 |
+
git clone https://huggingface.co/spaces/dvdijcke/david-research-assistant
|
| 40 |
+
|
| 41 |
+
# Install dependencies
|
| 42 |
+
pip install -r requirements.txt
|
| 43 |
+
|
| 44 |
+
# Set up environment
|
| 45 |
+
echo "GOOGLE_API_KEY=your_key_here" > .env
|
| 46 |
+
|
| 47 |
+
# Run the app
|
| 48 |
+
python app.py
|
| 49 |
+
```
|
| 50 |
+
|
| 51 |
+
## Performance Optimization
|
| 52 |
+
|
| 53 |
+
### Using Google Gemini (Recommended)
|
| 54 |
+
- **Model**: Gemini 2.0 Flash (falls back to 1.5 Flash)
|
| 55 |
+
- **Cost**: ~$0.001-0.005 per conversation
|
| 56 |
+
- **Quality**: High accuracy, understands technical econometric concepts
|
| 57 |
+
- **Setup**: Just add GOOGLE_API_KEY to environment
|
| 58 |
+
|
| 59 |
+
### Without API Key
|
| 60 |
+
- Falls back to limited mode
|
| 61 |
+
- Lower quality responses
|
| 62 |
+
- Still functional but less accurate
|
| 63 |
+
|
| 64 |
+
## Content Updates
|
| 65 |
+
|
| 66 |
+
### To update research information:
|
| 67 |
+
|
| 68 |
+
1. **Edit `app.py`** and modify the `research_info` section:
|
| 69 |
+
- Update paper titles and descriptions
|
| 70 |
+
- Add new methodological contributions
|
| 71 |
+
- Update job market status
|
| 72 |
+
|
| 73 |
+
2. **Update paper summaries** in the `paper_summaries` section:
|
| 74 |
+
- Add new papers
|
| 75 |
+
- Update findings
|
| 76 |
+
- Emphasize econometric innovations
|
| 77 |
+
|
| 78 |
+
3. **Add new PDFs** to `documents/` folder:
|
| 79 |
+
- Job market paper should be prominently featured
|
| 80 |
+
- Include recent working papers
|
| 81 |
+
- CV should be up to date
|
| 82 |
+
|
| 83 |
+
## Testing
|
| 84 |
+
|
| 85 |
+
Run the test script to verify functionality:
|
| 86 |
+
|
| 87 |
+
```bash
|
| 88 |
+
python test_assistant.py
|
| 89 |
+
```
|
| 90 |
+
|
| 91 |
+
Key things to verify:
|
| 92 |
+
- Correctly identifies David as an econometrician
|
| 93 |
+
- Accurately describes R3D and other papers
|
| 94 |
+
- Emphasizes methodological contributions
|
| 95 |
+
- Links theory to applications
|
| 96 |
+
|
| 97 |
+
## Common Issues
|
| 98 |
+
|
| 99 |
+
1. **"No module named 'langchain'"**
|
| 100 |
+
- Solution: `pip install -r requirements.txt`
|
| 101 |
+
|
| 102 |
+
2. **Slow responses**
|
| 103 |
+
- Add Google API key for faster Gemini responses
|
| 104 |
+
- Check if vector store cache exists
|
| 105 |
+
|
| 106 |
+
3. **Incorrect information**
|
| 107 |
+
- Update the context in `app.py`
|
| 108 |
+
- Ensure PDFs are loading correctly
|
| 109 |
+
- Check paper summaries are accurate
|
| 110 |
+
|
| 111 |
+
## Customization
|
| 112 |
+
|
| 113 |
+
### Adjusting the tone:
|
| 114 |
+
Edit the prompt in `generate_response()` to adjust formality and focus.
|
| 115 |
+
|
| 116 |
+
### Adding new examples:
|
| 117 |
+
Update the `examples` list in `create_gradio_interface()`.
|
| 118 |
+
|
| 119 |
+
### Changing the model:
|
| 120 |
+
Modify the `genai.GenerativeModel()` initialization to use different models.
|
| 121 |
+
|
| 122 |
+
## Monitoring
|
| 123 |
+
|
| 124 |
+
- Check Space logs for errors
|
| 125 |
+
- Monitor API usage in Google AI Studio
|
| 126 |
+
- Test with various econometric questions regularly
|
| 127 |
+
|
| 128 |
+
## Support
|
| 129 |
+
|
| 130 |
+
For issues or updates:
|
| 131 |
+
- Check Hugging Face Space logs
|
| 132 |
+
- Verify API key is correctly set
|
| 133 |
+
- Ensure all PDFs are in the documents folder
|
|
@@ -0,0 +1,55 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Deploying the Improved Research Assistant
|
| 2 |
+
|
| 3 |
+
## Quick Start
|
| 4 |
+
|
| 5 |
+
1. **Set up environment variables**:
|
| 6 |
+
Create a `.env` file with your Hugging Face token:
|
| 7 |
+
```
|
| 8 |
+
HUGGINGFACE_TOKEN=your_token_here
|
| 9 |
+
```
|
| 10 |
+
|
| 11 |
+
2. **Install dependencies**:
|
| 12 |
+
```bash
|
| 13 |
+
pip install -r requirements_improved.txt
|
| 14 |
+
```
|
| 15 |
+
|
| 16 |
+
3. **Run the improved app**:
|
| 17 |
+
```bash
|
| 18 |
+
python app_improved.py
|
| 19 |
+
```
|
| 20 |
+
|
| 21 |
+
## Key Improvements
|
| 22 |
+
|
| 23 |
+
### 1. **Performance Enhancements**
|
| 24 |
+
- Removed heavy BART classifier that was slowing down responses
|
| 25 |
+
- Added vector store caching to avoid reloading documents
|
| 26 |
+
- Using Hugging Face Inference API for faster text generation
|
| 27 |
+
- Reduced PDF processing to only essential pages
|
| 28 |
+
|
| 29 |
+
### 2. **Better Conversation Flow**
|
| 30 |
+
- Now responds warmly to greetings like "hello"
|
| 31 |
+
- More conversational and friendly tone
|
| 32 |
+
- Doesn't restrict topics unnecessarily
|
| 33 |
+
- Provides helpful suggestions for what users can ask
|
| 34 |
+
|
| 35 |
+
### 3. **Technical Optimizations**
|
| 36 |
+
- Smaller chunk sizes (500 chars) for faster retrieval
|
| 37 |
+
- Caching mechanism for vector store
|
| 38 |
+
- Streaming responses for better user experience
|
| 39 |
+
- Removed unnecessary dependencies (torch, transformers)
|
| 40 |
+
|
| 41 |
+
## Deployment on Hugging Face Spaces
|
| 42 |
+
|
| 43 |
+
1. Update your `app.py` file with the contents of `app_improved.py`
|
| 44 |
+
2. Update your `requirements.txt` with the contents of `requirements_improved.txt`
|
| 45 |
+
3. Add the `HUGGINGFACE_TOKEN` secret in your Space settings
|
| 46 |
+
4. The app will automatically rebuild and deploy
|
| 47 |
+
|
| 48 |
+
## Testing Locally
|
| 49 |
+
|
| 50 |
+
Try these test messages to see the improvements:
|
| 51 |
+
- "Hello!" - Should get a warm, helpful response
|
| 52 |
+
- "Tell me about David" - Should provide a comprehensive overview
|
| 53 |
+
- "What's functional difference-in-differences?" - Should give technical details
|
| 54 |
+
|
| 55 |
+
The assistant should now be much faster and more conversational!
|
|
@@ -10,16 +10,17 @@ pinned: false
|
|
| 10 |
license: mit
|
| 11 |
---
|
| 12 |
|
| 13 |
-
# David Van Dijcke - Research Assistant
|
| 14 |
|
| 15 |
-
An AI-powered assistant
|
| 16 |
|
| 17 |
## Features
|
| 18 |
|
| 19 |
-
-
|
| 20 |
-
-
|
| 21 |
-
-
|
| 22 |
-
-
|
|
|
|
| 23 |
|
| 24 |
## Getting the Best Performance
|
| 25 |
|
|
|
|
| 10 |
license: mit
|
| 11 |
---
|
| 12 |
|
| 13 |
+
# David Van Dijcke - Econometric Research Assistant
|
| 14 |
|
| 15 |
+
An AI-powered assistant specializing in David Van Dijcke's econometric research. David is an econometrician on the 2025-26 job market who develops novel methods for functional and high-dimensional data.
|
| 16 |
|
| 17 |
## Features
|
| 18 |
|
| 19 |
+
- **Econometric Methods Focus**: Detailed information about David's methodological contributions
|
| 20 |
+
- **Job Market Paper (R3D)**: Regression Discontinuity Design with Distribution-Valued Outcomes
|
| 21 |
+
- **Technical Expertise**: Functional data analysis, optimal transport, and geometric measure theory
|
| 22 |
+
- **Policy Applications**: How David applies econometric tools to answer questions with big data
|
| 23 |
+
- **Research Portfolio**: Information on FDR, DISCO, RTO, and other papers
|
| 24 |
|
| 25 |
## Getting the Best Performance
|
| 26 |
|
|
@@ -81,21 +81,49 @@ class ImprovedResearchAssistant:
|
|
| 81 |
|
| 82 |
# Enhanced research information
|
| 83 |
research_info = """
|
| 84 |
-
David Van Dijcke is a PhD
|
| 85 |
-
He is on the job market for the 2025-26 academic year.
|
| 86 |
|
| 87 |
-
RESEARCH
|
| 88 |
-
David
|
| 89 |
-
data analysis
|
| 90 |
-
|
| 91 |
|
| 92 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 93 |
- Econometric methods and theory
|
| 94 |
-
- Causal inference with
|
| 95 |
-
-
|
| 96 |
-
-
|
| 97 |
-
- Labor market
|
| 98 |
-
-
|
| 99 |
|
| 100 |
CURRENT POSITIONS:
|
| 101 |
- Rackham Graduate School Predoctoral Fellow at the University of Michigan (2024-25)
|
|
@@ -104,19 +132,12 @@ class ImprovedResearchAssistant:
|
|
| 104 |
EDUCATION:
|
| 105 |
- PhD in Economics, University of Michigan (expected 2026)
|
| 106 |
- MA in Economics, University of Michigan
|
| 107 |
-
-
|
| 108 |
-
|
| 109 |
-
RESEARCH PAPERS:
|
| 110 |
-
1. "Revenue and Production Functions" - Work on firm-level analysis
|
| 111 |
-
2. "Return to Office" - Research on workplace policies post-COVID
|
| 112 |
-
3. "Unmasking Partisanship" - Analysis of political behavior during COVID-19
|
| 113 |
-
4. Work on public response to government alerts during the Russian invasion of Ukraine
|
| 114 |
-
5. Research on econometric methods combining causal inference with functional data analysis
|
| 115 |
|
| 116 |
PERSONALITY:
|
| 117 |
-
David
|
| 118 |
-
|
| 119 |
-
|
| 120 |
|
| 121 |
CONTACT:
|
| 122 |
Email: dvdijcke@umich.edu
|
|
@@ -130,25 +151,43 @@ class ImprovedResearchAssistant:
|
|
| 130 |
|
| 131 |
# Add information about his background
|
| 132 |
background_info = """
|
| 133 |
-
|
| 134 |
-
David
|
| 135 |
-
|
| 136 |
-
|
|
|
|
|
|
|
| 137 |
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 141 |
|
| 142 |
TECHNICAL SKILLS:
|
| 143 |
-
- Advanced econometric theory
|
| 144 |
-
- Programming
|
| 145 |
-
-
|
| 146 |
-
-
|
| 147 |
-
-
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 152 |
"""
|
| 153 |
|
| 154 |
documents.append(Document(
|
|
@@ -156,6 +195,57 @@ class ImprovedResearchAssistant:
|
|
| 156 |
metadata={"source": "background_info", "type": "personal"}
|
| 157 |
))
|
| 158 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 159 |
# Load PDFs efficiently - only key documents
|
| 160 |
key_pdfs = [
|
| 161 |
"CV_DavidVanDijcke.pdf",
|
|
@@ -181,8 +271,11 @@ class ImprovedResearchAssistant:
|
|
| 181 |
try:
|
| 182 |
loader = PyPDFLoader(filepath)
|
| 183 |
pdf_docs = loader.load()
|
| 184 |
-
#
|
| 185 |
-
|
|
|
|
|
|
|
|
|
|
| 186 |
logger.info(f"Loaded {filename}")
|
| 187 |
except Exception as e:
|
| 188 |
logger.error(f"Error loading {filename}: {e}")
|
|
@@ -220,22 +313,30 @@ class ImprovedResearchAssistant:
|
|
| 220 |
|
| 221 |
if self.use_gemini:
|
| 222 |
# Create prompt for Gemini
|
| 223 |
-
prompt = f"""You are
|
| 224 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 225 |
|
| 226 |
-
|
| 227 |
-
-
|
| 228 |
-
-
|
| 229 |
-
- Be
|
| 230 |
-
-
|
| 231 |
-
- If asked about
|
|
|
|
| 232 |
|
| 233 |
Context about David Van Dijcke:
|
| 234 |
{context}
|
| 235 |
|
| 236 |
User's question: {question}
|
| 237 |
|
| 238 |
-
|
| 239 |
|
| 240 |
try:
|
| 241 |
# Configure generation parameters for accuracy
|
|
@@ -263,9 +364,9 @@ Please provide an accurate response based only on the context provided. If the c
|
|
| 263 |
# Handle greetings and casual conversation
|
| 264 |
if self.is_greeting_or_casual(message):
|
| 265 |
greeting_responses = [
|
| 266 |
-
"Hello! I'm here to help you learn about David Van Dijcke
|
| 267 |
-
"Hi
|
| 268 |
-
"Hello!
|
| 269 |
]
|
| 270 |
|
| 271 |
# Use message hash to select consistent greeting
|
|
@@ -283,9 +384,9 @@ Please provide an accurate response based only on the context provided. If the c
|
|
| 283 |
response = self.generate_response(message, context)
|
| 284 |
|
| 285 |
# Add source information if specific papers were referenced
|
| 286 |
-
paper_keywords = ["
|
| 287 |
if any(keyword in message.lower() for keyword in paper_keywords):
|
| 288 |
-
response += "\n\n*For more details, you can find David's papers on his website.*"
|
| 289 |
|
| 290 |
return response
|
| 291 |
|
|
@@ -320,19 +421,21 @@ def create_gradio_interface():
|
|
| 320 |
# Create the interface with better examples
|
| 321 |
demo = gr.ChatInterface(
|
| 322 |
fn=chat_function,
|
| 323 |
-
title="
|
| 324 |
description=(
|
| 325 |
-
"
|
| 326 |
-
"
|
|
|
|
|
|
|
| 327 |
),
|
| 328 |
examples=[
|
| 329 |
-
"Hello! Who is David?",
|
| 330 |
-
"What
|
| 331 |
-
"Tell me about
|
| 332 |
-
"
|
| 333 |
-
"
|
| 334 |
"Is David on the job market?",
|
| 335 |
-
"What
|
| 336 |
],
|
| 337 |
theme=gr.themes.Soft(
|
| 338 |
primary_hue="blue",
|
|
|
|
| 81 |
|
| 82 |
# Enhanced research information
|
| 83 |
research_info = """
|
| 84 |
+
David Van Dijcke is a PhD candidate in Economics at the University of Michigan, Ann Arbor.
|
| 85 |
+
He is on the job market for the 2025-26 academic year as an ECONOMETRICIAN.
|
| 86 |
|
| 87 |
+
RESEARCH PROFILE:
|
| 88 |
+
David develops cutting-edge econometric methods for functional and high-dimensional data,
|
| 89 |
+
combining tools from functional data analysis, optimal transport, and geometric measure theory.
|
| 90 |
+
He applies these methods to answer important policy questions using big data.
|
| 91 |
|
| 92 |
+
CORE ECONOMETRIC CONTRIBUTIONS:
|
| 93 |
+
1. **R3D: Regression Discontinuity Design with Distribution-Valued Outcomes** (JOB MARKET PAPER)
|
| 94 |
+
- Extends RDD to settings where outcomes are entire distributions
|
| 95 |
+
- Introduces local average quantile treatment effects
|
| 96 |
+
- Applies to income distribution effects of gubernatorial elections
|
| 97 |
+
|
| 98 |
+
2. **Free Discontinuity Regression (FDR)**
|
| 99 |
+
- Non-parametric method to detect and estimate multivariate discontinuities
|
| 100 |
+
- Based on convex relaxation of the Mumford-Shah functional
|
| 101 |
+
- Applied to estimate economic costs of internet shutdowns in India
|
| 102 |
+
|
| 103 |
+
3. **Distributional Synthetic Controls (DISCO)**
|
| 104 |
+
- Software implementation for studying distributional policy effects
|
| 105 |
+
- Uses optimal transport to match entire distributions
|
| 106 |
+
- Provides both quantile and CDF-based approaches
|
| 107 |
+
|
| 108 |
+
4. **Return to Office and the Tenure Distribution**
|
| 109 |
+
- Applies distributional synthetic controls to 260 million resumes
|
| 110 |
+
- Studies effects of RTO mandates on employee tenure distributions
|
| 111 |
+
- Develops bootstrapped uniform confidence intervals
|
| 112 |
+
|
| 113 |
+
KEY TECHNICAL INNOVATIONS:
|
| 114 |
+
- Functional data analysis: Working with distribution-valued outcomes
|
| 115 |
+
- Optimal transport theory: Matching and comparing distributions
|
| 116 |
+
- Geometric measure theory: Detecting discontinuities in multivariate settings
|
| 117 |
+
- Asymptotic theory: Establishing inference for novel estimators
|
| 118 |
+
- Big data applications: Scalable methods for massive datasets
|
| 119 |
+
|
| 120 |
+
RESEARCH AREAS:
|
| 121 |
- Econometric methods and theory
|
| 122 |
+
- Causal inference with functional data
|
| 123 |
+
- Distribution-valued treatment effects
|
| 124 |
+
- Spatial and geographic discontinuities
|
| 125 |
+
- Labor market dynamics and firm policies
|
| 126 |
+
- Economic impacts of digital infrastructure
|
| 127 |
|
| 128 |
CURRENT POSITIONS:
|
| 129 |
- Rackham Graduate School Predoctoral Fellow at the University of Michigan (2024-25)
|
|
|
|
| 132 |
EDUCATION:
|
| 133 |
- PhD in Economics, University of Michigan (expected 2026)
|
| 134 |
- MA in Economics, University of Michigan
|
| 135 |
+
- BA in Theatre (demonstrating communication skills and creativity)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 136 |
|
| 137 |
PERSONALITY:
|
| 138 |
+
David combines rigorous technical expertise with strong communication skills.
|
| 139 |
+
His theatre background helps him present complex econometric concepts clearly.
|
| 140 |
+
He values both theoretical rigor and practical policy relevance.
|
| 141 |
|
| 142 |
CONTACT:
|
| 143 |
Email: dvdijcke@umich.edu
|
|
|
|
| 151 |
|
| 152 |
# Add information about his background
|
| 153 |
background_info = """
|
| 154 |
+
ECONOMETRIC EXPERTISE:
|
| 155 |
+
David specializes in developing econometric methods at the intersection of:
|
| 156 |
+
- Functional data analysis (working with curve and distribution-valued data)
|
| 157 |
+
- Optimal transport theory (comparing and matching distributions)
|
| 158 |
+
- Geometric measure theory (detecting discontinuities and boundaries)
|
| 159 |
+
- Causal inference (identifying treatment effects)
|
| 160 |
|
| 161 |
+
METHODOLOGICAL CONTRIBUTIONS:
|
| 162 |
+
1. **Distribution-valued treatment effects**: Extending causal inference beyond scalar outcomes
|
| 163 |
+
2. **Discontinuity detection**: Finding unknown boundaries in multivariate settings
|
| 164 |
+
3. **Functional regression**: Adapting RDD and synthetic controls to functional data
|
| 165 |
+
4. **Big data econometrics**: Scalable methods for massive datasets
|
| 166 |
+
|
| 167 |
+
APPLIED WORK:
|
| 168 |
+
David applies his econometric tools to important policy questions:
|
| 169 |
+
- Labor market dynamics (return-to-office policies, tenure distributions)
|
| 170 |
+
- Digital infrastructure (economic costs of internet shutdowns)
|
| 171 |
+
- Political economy (distributional effects of elections)
|
| 172 |
+
- Crisis responses (COVID-19, Ukraine conflict)
|
| 173 |
|
| 174 |
TECHNICAL SKILLS:
|
| 175 |
+
- Advanced econometric theory and asymptotics
|
| 176 |
+
- Programming: R, Python, Stata, Julia
|
| 177 |
+
- Functional data analysis packages
|
| 178 |
+
- Optimal transport algorithms
|
| 179 |
+
- High-performance computing for big data
|
| 180 |
+
|
| 181 |
+
TEACHING & COMMUNICATION:
|
| 182 |
+
- Makes complex econometric concepts accessible
|
| 183 |
+
- Theatre background enhances presentation skills
|
| 184 |
+
- Experience teaching econometrics and statistics
|
| 185 |
+
- Clear technical writing for top journals
|
| 186 |
+
|
| 187 |
+
RESEARCH PHILOSOPHY:
|
| 188 |
+
David believes in developing rigorous econometric theory that solves real-world problems.
|
| 189 |
+
He combines mathematical sophistication with practical relevance, ensuring his methods
|
| 190 |
+
are both theoretically sound and empirically useful.
|
| 191 |
"""
|
| 192 |
|
| 193 |
documents.append(Document(
|
|
|
|
| 195 |
metadata={"source": "background_info", "type": "personal"}
|
| 196 |
))
|
| 197 |
|
| 198 |
+
# Add detailed paper summaries
|
| 199 |
+
paper_summaries = """
|
| 200 |
+
DETAILED PAPER SUMMARIES:
|
| 201 |
+
|
| 202 |
+
1. **R3D: Regression Discontinuity Design with Distribution-Valued Outcomes** (JOB MARKET PAPER)
|
| 203 |
+
- Problem: Standard RDD only estimates effects on mean outcomes, missing distributional impacts
|
| 204 |
+
- Innovation: Extends RDD to estimate effects on entire outcome distributions
|
| 205 |
+
- Method: Local polynomial regression on random quantiles with uniform confidence bands
|
| 206 |
+
- Theory: Establishes local average quantile treatment effects (LAQTE)
|
| 207 |
+
- Application: Studies how gubernatorial party affects state income distributions
|
| 208 |
+
- Finding: Democratic governors reduce income inequality, especially at lower quantiles
|
| 209 |
+
|
| 210 |
+
2. **Free Discontinuity Regression (FDR)**
|
| 211 |
+
- Problem: Unknown location of multivariate discontinuities (e.g., geographic borders)
|
| 212 |
+
- Innovation: Detects and estimates discontinuities without prior knowledge of location
|
| 213 |
+
- Method: Convex relaxation of Mumford-Shah functional from image processing
|
| 214 |
+
- Theory: Proves identification and convergence of the segmented regression surface
|
| 215 |
+
- Application: Internet shutdowns in India using 48 billion mobile transactions
|
| 216 |
+
- Finding: Shutdowns reduce economic activity by 25-35% in affected regions
|
| 217 |
+
|
| 218 |
+
3. **Distributional Synthetic Controls (DISCO)**
|
| 219 |
+
- Problem: Standard synthetic controls only match on means, not distributions
|
| 220 |
+
- Innovation: Constructs synthetic distributions using optimal transport
|
| 221 |
+
- Method: Matches entire CDFs or quantile functions across units
|
| 222 |
+
- Software: R package with quantile and CDF approaches, bootstrap inference
|
| 223 |
+
- Features: Multiple aggregation schemes, permutation tests, visualization tools
|
| 224 |
+
|
| 225 |
+
4. **Return to Office and the Tenure Distribution**
|
| 226 |
+
- Problem: How do RTO mandates affect employee tenure beyond just averages?
|
| 227 |
+
- Innovation: First application of distributional synthetic controls to labor markets
|
| 228 |
+
- Method: Analyzes 260 million resumes to construct tenure distributions
|
| 229 |
+
- Theory: Develops bootstrapped uniform confidence intervals for DiSCo
|
| 230 |
+
- Finding: RTO mandates significantly alter tenure distributions at tech firms
|
| 231 |
+
|
| 232 |
+
5. **Revenue and Production Functions**
|
| 233 |
+
- Focus: Functional data analysis in firm-level production economics
|
| 234 |
+
- Innovation: Treats production processes as functional objects
|
| 235 |
+
- Method: Applies functional regression to production function estimation
|
| 236 |
+
|
| 237 |
+
COMMON THEMES:
|
| 238 |
+
- Moving beyond scalar outcomes to functional/distributional outcomes
|
| 239 |
+
- Rigorous asymptotic theory for novel estimators
|
| 240 |
+
- Large-scale empirical applications with big data
|
| 241 |
+
- Bridging pure econometric theory with policy relevance
|
| 242 |
+
"""
|
| 243 |
+
|
| 244 |
+
documents.append(Document(
|
| 245 |
+
page_content=paper_summaries,
|
| 246 |
+
metadata={"source": "paper_summaries", "type": "research"}
|
| 247 |
+
))
|
| 248 |
+
|
| 249 |
# Load PDFs efficiently - only key documents
|
| 250 |
key_pdfs = [
|
| 251 |
"CV_DavidVanDijcke.pdf",
|
|
|
|
| 271 |
try:
|
| 272 |
loader = PyPDFLoader(filepath)
|
| 273 |
pdf_docs = loader.load()
|
| 274 |
+
# For job market paper, load more pages
|
| 275 |
+
if "r3d" in filename.lower():
|
| 276 |
+
documents.extend(pdf_docs[:10]) # Abstract, intro, and key sections
|
| 277 |
+
else:
|
| 278 |
+
documents.extend(pdf_docs[:5]) # First 5 pages for other papers
|
| 279 |
logger.info(f"Loaded {filename}")
|
| 280 |
except Exception as e:
|
| 281 |
logger.error(f"Error loading {filename}: {e}")
|
|
|
|
| 313 |
|
| 314 |
if self.use_gemini:
|
| 315 |
# Create prompt for Gemini
|
| 316 |
+
prompt = f"""You are an expert AI assistant for David Van Dijcke's academic website, specializing in his ECONOMETRIC research.
|
| 317 |
+
David is an econometrician on the 2025-26 job market who develops novel methods for functional and high-dimensional data.
|
| 318 |
+
|
| 319 |
+
Key points to emphasize:
|
| 320 |
+
- David is an ECONOMETRICIAN who develops new statistical methods
|
| 321 |
+
- His job market paper is R3D (Regression Discontinuity Design with Distribution-Valued Outcomes)
|
| 322 |
+
- He combines functional data analysis, optimal transport, and geometric measure theory
|
| 323 |
+
- He applies these methods to answer policy questions with big data
|
| 324 |
+
- His work extends beyond scalar outcomes to distribution-valued outcomes
|
| 325 |
|
| 326 |
+
Instructions:
|
| 327 |
+
- Emphasize his econometric contributions and methodological innovations
|
| 328 |
+
- Highlight how his methods combine theory with policy applications
|
| 329 |
+
- Be precise about technical details when discussing his papers
|
| 330 |
+
- Make clear he develops econometric TOOLS, not just applications
|
| 331 |
+
- If asked about specific papers, provide technical details from the context
|
| 332 |
+
- Be friendly but professional, as befits an academic website
|
| 333 |
|
| 334 |
Context about David Van Dijcke:
|
| 335 |
{context}
|
| 336 |
|
| 337 |
User's question: {question}
|
| 338 |
|
| 339 |
+
Provide an accurate, professional response that emphasizes David's econometric expertise and contributions to the field."""
|
| 340 |
|
| 341 |
try:
|
| 342 |
# Configure generation parameters for accuracy
|
|
|
|
| 364 |
# Handle greetings and casual conversation
|
| 365 |
if self.is_greeting_or_casual(message):
|
| 366 |
greeting_responses = [
|
| 367 |
+
"Hello! I'm here to help you learn about David Van Dijcke, an econometrician on the 2025-26 job market. He develops cutting-edge methods for functional and high-dimensional data. What would you like to know about his research?",
|
| 368 |
+
"Hi! Welcome to David Van Dijcke's research assistant. David is an econometrician who combines functional data analysis, optimal transport, and geometric measure theory to develop new causal inference methods. How can I help you learn about his work?",
|
| 369 |
+
"Hello! I can tell you about David Van Dijcke's econometric research, including his job market paper on distribution-valued treatment effects (R3D) and his other methodological contributions. What aspect of his work interests you?",
|
| 370 |
]
|
| 371 |
|
| 372 |
# Use message hash to select consistent greeting
|
|
|
|
| 384 |
response = self.generate_response(message, context)
|
| 385 |
|
| 386 |
# Add source information if specific papers were referenced
|
| 387 |
+
paper_keywords = ["r3d", "regression discontinuity", "free discontinuity", "fdr", "disco", "distributional synthetic", "return to office", "rto", "revenue", "production function", "unmasking", "ukraine"]
|
| 388 |
if any(keyword in message.lower() for keyword in paper_keywords):
|
| 389 |
+
response += "\n\n*For more details, you can find David's papers on his website at https://dvandijcke.github.io*"
|
| 390 |
|
| 391 |
return response
|
| 392 |
|
|
|
|
| 421 |
# Create the interface with better examples
|
| 422 |
demo = gr.ChatInterface(
|
| 423 |
fn=chat_function,
|
| 424 |
+
title="David Van Dijcke - Econometrician | Job Market 2025-26",
|
| 425 |
description=(
|
| 426 |
+
"Welcome! I'm an AI assistant specializing in David Van Dijcke's econometric research. "
|
| 427 |
+
"David develops novel methods for functional and high-dimensional data, combining functional data analysis, "
|
| 428 |
+
"optimal transport, and geometric measure theory. Ask me about his job market paper (R3D), "
|
| 429 |
+
"his econometric innovations, or how he applies these methods to policy questions with big data."
|
| 430 |
),
|
| 431 |
examples=[
|
| 432 |
+
"Hello! Who is David Van Dijcke?",
|
| 433 |
+
"What econometric methods has David developed?",
|
| 434 |
+
"Tell me about R3D (his job market paper)",
|
| 435 |
+
"How does David use optimal transport in econometrics?",
|
| 436 |
+
"What is functional data analysis in David's work?",
|
| 437 |
"Is David on the job market?",
|
| 438 |
+
"What are distribution-valued treatment effects?"
|
| 439 |
],
|
| 440 |
theme=gr.themes.Soft(
|
| 441 |
primary_hue="blue",
|
|
@@ -0,0 +1,321 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import os
|
| 2 |
+
import gradio as gr
|
| 3 |
+
from typing import List, Tuple
|
| 4 |
+
import json
|
| 5 |
+
from datetime import datetime
|
| 6 |
+
import hashlib
|
| 7 |
+
|
| 8 |
+
# Import only what we need for better performance
|
| 9 |
+
from langchain.text_splitter import RecursiveCharacterTextSplitter
|
| 10 |
+
from langchain.document_loaders import PyPDFLoader
|
| 11 |
+
from langchain_community.embeddings import HuggingFaceEmbeddings
|
| 12 |
+
from langchain_community.vectorstores import FAISS
|
| 13 |
+
from langchain.schema import Document
|
| 14 |
+
from huggingface_hub import InferenceClient
|
| 15 |
+
import logging
|
| 16 |
+
|
| 17 |
+
# Set up logging
|
| 18 |
+
logging.basicConfig(level=logging.INFO)
|
| 19 |
+
logger = logging.getLogger(__name__)
|
| 20 |
+
|
| 21 |
+
class ImprovedResearchAssistant:
|
| 22 |
+
def __init__(self):
|
| 23 |
+
# Use a lightweight embedding model
|
| 24 |
+
self.embeddings = HuggingFaceEmbeddings(
|
| 25 |
+
model_name="sentence-transformers/all-MiniLM-L6-v2",
|
| 26 |
+
model_kwargs={'device': 'cpu'},
|
| 27 |
+
encode_kwargs={'normalize_embeddings': True}
|
| 28 |
+
)
|
| 29 |
+
|
| 30 |
+
# Initialize InferenceClient for faster responses
|
| 31 |
+
self.client = InferenceClient(
|
| 32 |
+
"mistralai/Mixtral-8x7B-Instruct-v0.1",
|
| 33 |
+
token=os.getenv("HUGGINGFACE_TOKEN")
|
| 34 |
+
)
|
| 35 |
+
|
| 36 |
+
self.vector_store = None
|
| 37 |
+
self.conversation_history = []
|
| 38 |
+
|
| 39 |
+
# Check if we have a cached vector store
|
| 40 |
+
self.cache_path = "vector_store_cache"
|
| 41 |
+
if os.path.exists(self.cache_path):
|
| 42 |
+
logger.info("Loading cached vector store...")
|
| 43 |
+
self.vector_store = FAISS.load_local(self.cache_path, self.embeddings)
|
| 44 |
+
else:
|
| 45 |
+
logger.info("Building vector store from documents...")
|
| 46 |
+
self.load_documents()
|
| 47 |
+
|
| 48 |
+
def load_documents(self):
|
| 49 |
+
"""Load all documents about the researcher with caching"""
|
| 50 |
+
documents = []
|
| 51 |
+
|
| 52 |
+
# Enhanced research information
|
| 53 |
+
research_info = """
|
| 54 |
+
David Van Dijcke is a PhD student in Economics at the University of Michigan, Ann Arbor.
|
| 55 |
+
He is on the job market for the 2025-26 academic year.
|
| 56 |
+
|
| 57 |
+
RESEARCH FOCUS:
|
| 58 |
+
David's research develops novel econometric methods combining causal inference with functional
|
| 59 |
+
data analysis and optimal transport to study settings where the outcomes and/or covariates
|
| 60 |
+
are functional and high-dimensional.
|
| 61 |
+
|
| 62 |
+
KEY RESEARCH AREAS:
|
| 63 |
+
- Econometric methods and theory
|
| 64 |
+
- Causal inference with high-dimensional data
|
| 65 |
+
- Functional data analysis
|
| 66 |
+
- Optimal transport applications in economics
|
| 67 |
+
- Labor market policies and their effects
|
| 68 |
+
- Mobility patterns in response to crises (COVID-19, conflicts)
|
| 69 |
+
|
| 70 |
+
CURRENT POSITIONS:
|
| 71 |
+
- Rackham Graduate School Predoctoral Fellow at the University of Michigan (2024-25)
|
| 72 |
+
- Academic Visitor at the Bank of England
|
| 73 |
+
|
| 74 |
+
EDUCATION:
|
| 75 |
+
- PhD in Economics, University of Michigan (expected 2026)
|
| 76 |
+
- MA in Economics, University of Michigan
|
| 77 |
+
- Previous education includes a BA in Theatre, showcasing his interdisciplinary background
|
| 78 |
+
|
| 79 |
+
RESEARCH PAPERS:
|
| 80 |
+
1. "Functional Difference-in-Differences" - His job market paper developing new econometric methods
|
| 81 |
+
2. "Revenue and Production Functions" - Work on firm-level analysis
|
| 82 |
+
3. "Return to Office" - Research on workplace policies post-COVID
|
| 83 |
+
4. "Unmasking Partisanship" - Analysis of political behavior during COVID-19
|
| 84 |
+
5. Work on public response to government alerts during the Russian invasion of Ukraine
|
| 85 |
+
|
| 86 |
+
PERSONALITY:
|
| 87 |
+
David is approachable and values clear communication. His background in theatre gives him
|
| 88 |
+
unique presentation skills. He enjoys discussing both technical econometric details and
|
| 89 |
+
broader policy implications of his work.
|
| 90 |
+
|
| 91 |
+
CONTACT:
|
| 92 |
+
Email: dvdijcke@umich.edu
|
| 93 |
+
Website: https://dvandijcke.github.io
|
| 94 |
+
"""
|
| 95 |
+
|
| 96 |
+
documents.append(Document(
|
| 97 |
+
page_content=research_info,
|
| 98 |
+
metadata={"source": "website_overview", "type": "general_info"}
|
| 99 |
+
))
|
| 100 |
+
|
| 101 |
+
# Add information about his background
|
| 102 |
+
background_info = """
|
| 103 |
+
UNIQUE BACKGROUND:
|
| 104 |
+
David has an unusual path to economics - he holds a BA in Theatre, which gives him strong
|
| 105 |
+
communication and presentation skills. This interdisciplinary background helps him explain
|
| 106 |
+
complex econometric concepts in accessible ways.
|
| 107 |
+
|
| 108 |
+
TEACHING:
|
| 109 |
+
David has experience teaching various economics courses at the University of Michigan.
|
| 110 |
+
He is known for making complex statistical concepts accessible to students.
|
| 111 |
+
|
| 112 |
+
TECHNICAL SKILLS:
|
| 113 |
+
- Advanced econometric theory
|
| 114 |
+
- Programming in R, Python, Stata
|
| 115 |
+
- Machine learning applications in economics
|
| 116 |
+
- Functional data analysis
|
| 117 |
+
- Optimal transport theory
|
| 118 |
+
|
| 119 |
+
COLLABORATIONS:
|
| 120 |
+
David frequently collaborates with other researchers and is open to new research partnerships.
|
| 121 |
+
His work often involves interdisciplinary approaches combining economics with data science.
|
| 122 |
+
"""
|
| 123 |
+
|
| 124 |
+
documents.append(Document(
|
| 125 |
+
page_content=background_info,
|
| 126 |
+
metadata={"source": "background_info", "type": "personal"}
|
| 127 |
+
))
|
| 128 |
+
|
| 129 |
+
# Load PDFs efficiently - only key documents
|
| 130 |
+
key_pdfs = [
|
| 131 |
+
"CV_DavidVanDijcke.pdf",
|
| 132 |
+
"disco.pdf",
|
| 133 |
+
"fdr.pdf",
|
| 134 |
+
"r3d_arxiv_4apr2025.pdf",
|
| 135 |
+
"rto.pdf",
|
| 136 |
+
"unmasking_partisanship.pdf"
|
| 137 |
+
]
|
| 138 |
+
|
| 139 |
+
documents_dir = "documents"
|
| 140 |
+
if os.path.exists(documents_dir):
|
| 141 |
+
for filename in key_pdfs:
|
| 142 |
+
filepath = os.path.join(documents_dir, filename)
|
| 143 |
+
if os.path.exists(filepath):
|
| 144 |
+
try:
|
| 145 |
+
loader = PyPDFLoader(filepath)
|
| 146 |
+
pdf_docs = loader.load()
|
| 147 |
+
# Add first few pages only for faster loading
|
| 148 |
+
documents.extend(pdf_docs[:3])
|
| 149 |
+
logger.info(f"Loaded {filename}")
|
| 150 |
+
except Exception as e:
|
| 151 |
+
logger.error(f"Error loading {filename}: {e}")
|
| 152 |
+
|
| 153 |
+
# Split documents with optimized chunk size
|
| 154 |
+
text_splitter = RecursiveCharacterTextSplitter(
|
| 155 |
+
chunk_size=500, # Smaller chunks for faster retrieval
|
| 156 |
+
chunk_overlap=50,
|
| 157 |
+
length_function=len
|
| 158 |
+
)
|
| 159 |
+
splits = text_splitter.split_documents(documents)
|
| 160 |
+
|
| 161 |
+
# Create and cache vector store
|
| 162 |
+
self.vector_store = FAISS.from_documents(splits, self.embeddings)
|
| 163 |
+
|
| 164 |
+
# Save to cache
|
| 165 |
+
try:
|
| 166 |
+
self.vector_store.save_local(self.cache_path)
|
| 167 |
+
logger.info("Vector store cached successfully")
|
| 168 |
+
except Exception as e:
|
| 169 |
+
logger.error(f"Failed to cache vector store: {e}")
|
| 170 |
+
|
| 171 |
+
def is_greeting_or_casual(self, message: str) -> bool:
|
| 172 |
+
"""Check if the message is a greeting or casual conversation starter"""
|
| 173 |
+
greetings = [
|
| 174 |
+
"hello", "hi", "hey", "good morning", "good afternoon", "good evening",
|
| 175 |
+
"how are you", "what's up", "greetings", "howdy", "hola", "bonjour"
|
| 176 |
+
]
|
| 177 |
+
|
| 178 |
+
message_lower = message.lower().strip()
|
| 179 |
+
return any(greeting in message_lower for greeting in greetings) or len(message_lower.split()) <= 3
|
| 180 |
+
|
| 181 |
+
def generate_response(self, question: str, context: str) -> str:
|
| 182 |
+
"""Generate response using the Inference API for faster results"""
|
| 183 |
+
|
| 184 |
+
# Create a conversational prompt
|
| 185 |
+
prompt = f"""You are a friendly AI assistant for David Van Dijcke's academic website.
|
| 186 |
+
You help visitors learn about David's research, publications, and academic career in a warm,
|
| 187 |
+
conversational manner. Be helpful and engaging, not overly formal.
|
| 188 |
+
|
| 189 |
+
Context about David:
|
| 190 |
+
{context}
|
| 191 |
+
|
| 192 |
+
User's question: {question}
|
| 193 |
+
|
| 194 |
+
Instructions:
|
| 195 |
+
- If it's a greeting, respond warmly and offer to help
|
| 196 |
+
- For research questions, provide detailed, accurate information
|
| 197 |
+
- Be conversational and friendly, not stiff or robotic
|
| 198 |
+
- If you don't have specific information, acknowledge it politely
|
| 199 |
+
- Feel free to suggest related topics the user might be interested in
|
| 200 |
+
|
| 201 |
+
Response:"""
|
| 202 |
+
|
| 203 |
+
try:
|
| 204 |
+
# Use streaming for faster perceived response
|
| 205 |
+
response = self.client.text_generation(
|
| 206 |
+
prompt,
|
| 207 |
+
max_new_tokens=300,
|
| 208 |
+
temperature=0.7,
|
| 209 |
+
top_p=0.95,
|
| 210 |
+
repetition_penalty=1.1,
|
| 211 |
+
do_sample=True
|
| 212 |
+
)
|
| 213 |
+
return response
|
| 214 |
+
except Exception as e:
|
| 215 |
+
logger.error(f"Error generating response: {e}")
|
| 216 |
+
return "I apologize, but I'm having trouble generating a response right now. Could you please try again?"
|
| 217 |
+
|
| 218 |
+
def answer_question(self, message: str, history: List[Tuple[str, str]] = None) -> str:
|
| 219 |
+
"""Answer a question about the researcher"""
|
| 220 |
+
|
| 221 |
+
# Handle greetings and casual conversation
|
| 222 |
+
if self.is_greeting_or_casual(message):
|
| 223 |
+
greeting_responses = [
|
| 224 |
+
"Hello! I'm here to help you learn about David Van Dijcke's research and academic work. What would you like to know?",
|
| 225 |
+
"Hi there! Welcome to David's research assistant. I can tell you about his econometric methods, publications, or academic journey. What interests you?",
|
| 226 |
+
"Hello! Great to meet you. I'd be happy to share information about David's work in economics, his research papers, or his background. What would you like to explore?",
|
| 227 |
+
]
|
| 228 |
+
|
| 229 |
+
# Use message hash to select consistent greeting
|
| 230 |
+
response_index = int(hashlib.md5(message.encode()).hexdigest(), 16) % len(greeting_responses)
|
| 231 |
+
return greeting_responses[response_index]
|
| 232 |
+
|
| 233 |
+
try:
|
| 234 |
+
# Retrieve relevant documents
|
| 235 |
+
docs = self.vector_store.similarity_search(message, k=3)
|
| 236 |
+
|
| 237 |
+
# Combine context from retrieved documents
|
| 238 |
+
context = "\n".join([doc.page_content for doc in docs])
|
| 239 |
+
|
| 240 |
+
# Generate response
|
| 241 |
+
response = self.generate_response(message, context)
|
| 242 |
+
|
| 243 |
+
# Add source information if specific papers were referenced
|
| 244 |
+
paper_keywords = ["functional difference", "revenue production", "return to office", "unmasking", "ukraine"]
|
| 245 |
+
if any(keyword in message.lower() for keyword in paper_keywords):
|
| 246 |
+
response += "\n\n*For more details, you can find David's papers on his website.*"
|
| 247 |
+
|
| 248 |
+
return response
|
| 249 |
+
|
| 250 |
+
except Exception as e:
|
| 251 |
+
logger.error(f"Error in answer_question: {e}")
|
| 252 |
+
return "I apologize, but I'm having trouble accessing the information right now. Please try rephrasing your question or ask about David's research areas, publications, or academic background."
|
| 253 |
+
|
| 254 |
+
# Create optimized Gradio interface
|
| 255 |
+
def create_gradio_interface():
|
| 256 |
+
assistant = ImprovedResearchAssistant()
|
| 257 |
+
|
| 258 |
+
def chat_function(message, history):
|
| 259 |
+
return assistant.answer_question(message, history)
|
| 260 |
+
|
| 261 |
+
# Modern, clean CSS
|
| 262 |
+
custom_css = """
|
| 263 |
+
#chatbot {
|
| 264 |
+
height: 600px;
|
| 265 |
+
}
|
| 266 |
+
.gradio-container {
|
| 267 |
+
font-family: 'Inter', -apple-system, BlinkMacSystemFont, sans-serif;
|
| 268 |
+
max-width: 900px;
|
| 269 |
+
margin: auto;
|
| 270 |
+
}
|
| 271 |
+
.user-message, .bot-message {
|
| 272 |
+
padding: 15px;
|
| 273 |
+
border-radius: 10px;
|
| 274 |
+
margin: 10px 0;
|
| 275 |
+
}
|
| 276 |
+
"""
|
| 277 |
+
|
| 278 |
+
# Create the interface with better examples
|
| 279 |
+
demo = gr.ChatInterface(
|
| 280 |
+
fn=chat_function,
|
| 281 |
+
title="Chat with David Van Dijcke's Research Assistant",
|
| 282 |
+
description=(
|
| 283 |
+
"Hi! I'm here to help you learn about David's research in economics. "
|
| 284 |
+
"Feel free to ask about his work, papers, or just say hello! 👋"
|
| 285 |
+
),
|
| 286 |
+
examples=[
|
| 287 |
+
"Hello! Who is David?",
|
| 288 |
+
"What are David's main research interests?",
|
| 289 |
+
"Tell me about functional difference-in-differences",
|
| 290 |
+
"What's David's background?",
|
| 291 |
+
"Which papers has David published?",
|
| 292 |
+
"Is David on the job market?",
|
| 293 |
+
"What econometric methods has he developed?"
|
| 294 |
+
],
|
| 295 |
+
theme=gr.themes.Soft(
|
| 296 |
+
primary_hue="blue",
|
| 297 |
+
secondary_hue="gray",
|
| 298 |
+
neutral_hue="gray",
|
| 299 |
+
font=gr.themes.GoogleFont("Inter")
|
| 300 |
+
),
|
| 301 |
+
css=custom_css,
|
| 302 |
+
retry_btn="Retry",
|
| 303 |
+
undo_btn="Undo",
|
| 304 |
+
clear_btn="Clear Chat",
|
| 305 |
+
submit_btn="Send",
|
| 306 |
+
autofocus=True
|
| 307 |
+
)
|
| 308 |
+
|
| 309 |
+
return demo
|
| 310 |
+
|
| 311 |
+
if __name__ == "__main__":
|
| 312 |
+
# Set cache directory
|
| 313 |
+
os.makedirs("vector_store_cache", exist_ok=True)
|
| 314 |
+
|
| 315 |
+
demo = create_gradio_interface()
|
| 316 |
+
demo.launch(
|
| 317 |
+
share=False,
|
| 318 |
+
server_name="0.0.0.0",
|
| 319 |
+
server_port=7860,
|
| 320 |
+
show_error=True
|
| 321 |
+
)
|
|
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
gradio==4.19.2
|
| 2 |
+
langchain==0.1.9
|
| 3 |
+
langchain-community==0.0.24
|
| 4 |
+
sentence-transformers==2.5.1
|
| 5 |
+
faiss-cpu==1.7.4
|
| 6 |
+
pypdf==4.0.2
|
| 7 |
+
huggingface-hub==0.20.3
|
| 8 |
+
python-dotenv==1.0.1
|
|
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
gradio==4.19.2
|
| 2 |
+
langchain==0.1.9
|
| 3 |
+
langchain-community==0.0.24
|
| 4 |
+
sentence-transformers==2.5.1
|
| 5 |
+
faiss-cpu==1.7.4
|
| 6 |
+
pypdf==4.0.2
|
| 7 |
+
google-generativeai==0.8.3
|
| 8 |
+
python-dotenv==1.0.1
|
|
@@ -0,0 +1,61 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Test script for David Van Dijcke's Research Assistant
|
| 4 |
+
Tests key functionality and econometric focus
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import os
|
| 8 |
+
from app import ImprovedResearchAssistant
|
| 9 |
+
|
| 10 |
+
def test_assistant():
|
| 11 |
+
"""Test the assistant with various queries"""
|
| 12 |
+
print("Testing David Van Dijcke's Research Assistant...\n")
|
| 13 |
+
|
| 14 |
+
# Initialize assistant
|
| 15 |
+
assistant = ImprovedResearchAssistant()
|
| 16 |
+
|
| 17 |
+
# Test queries
|
| 18 |
+
test_queries = [
|
| 19 |
+
"Hello!",
|
| 20 |
+
"Who is David Van Dijcke?",
|
| 21 |
+
"What is David's job market paper about?",
|
| 22 |
+
"Tell me about R3D",
|
| 23 |
+
"What econometric methods has David developed?",
|
| 24 |
+
"How does David use optimal transport in his research?",
|
| 25 |
+
"What is functional data analysis?",
|
| 26 |
+
"Tell me about the Free Discontinuity Regression paper",
|
| 27 |
+
"What policy applications does David's research have?",
|
| 28 |
+
"Is David on the job market?"
|
| 29 |
+
]
|
| 30 |
+
|
| 31 |
+
for i, query in enumerate(test_queries, 1):
|
| 32 |
+
print(f"\n{'='*60}")
|
| 33 |
+
print(f"Test {i}: {query}")
|
| 34 |
+
print('='*60)
|
| 35 |
+
|
| 36 |
+
try:
|
| 37 |
+
response = assistant.answer_question(query)
|
| 38 |
+
print(f"Response: {response}")
|
| 39 |
+
|
| 40 |
+
# Check for key terms in responses
|
| 41 |
+
if i == 2: # "Who is David" query
|
| 42 |
+
assert "econometrician" in response.lower() or "econometric" in response.lower()
|
| 43 |
+
print("✓ Correctly identifies David as an econometrician")
|
| 44 |
+
|
| 45 |
+
if i == 4: # R3D query
|
| 46 |
+
assert "distribution" in response.lower() or "r3d" in response.lower()
|
| 47 |
+
print("✓ Mentions distribution-valued outcomes")
|
| 48 |
+
|
| 49 |
+
except Exception as e:
|
| 50 |
+
print(f"❌ Error: {e}")
|
| 51 |
+
|
| 52 |
+
print("\n" + "="*60)
|
| 53 |
+
print("Testing complete!")
|
| 54 |
+
|
| 55 |
+
if __name__ == "__main__":
|
| 56 |
+
# Set up environment
|
| 57 |
+
if not os.getenv("GOOGLE_API_KEY"):
|
| 58 |
+
print("Warning: No GOOGLE_API_KEY found. Using limited mode.")
|
| 59 |
+
print("For best results, add your API key to .env file\n")
|
| 60 |
+
|
| 61 |
+
test_assistant()
|