smokxy committed on
Commit
cf68bed
·
1 Parent(s): 6f2728d

update readme and add .env.example

Files changed (2)
  1. .env.example +18 -0
  2. README.md +88 -29
.env.example ADDED
@@ -0,0 +1,18 @@
+ # Gemini
+ GEMINI_API_KEY1=
+ GEMINI_API_KEY2=
+ ...
+ GEMINI_API_KEYN=
+
+ # MongoDB
+ MONGODB_URI = ""
+ DB_NAME = "papers_summary_database"
+ COLLECTION_NAME = "papers"
+ METADATA_COLLECTION = "metadata"
+
+ # API and URL configurations
+ HF_API_URL = "https://huggingface.co/api/daily_papers"
+ PDF_BASE_URL = "https://arxiv.org/pdf/{id}.pdf"
+
+ # Storage configurations
+ TEMP_DIR = "temp_papers"
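The template above mixes two styles (`KEY=` and `KEY = "value"`) and numbers the Gemini keys. The sketch below shows one way such a file could be read. It is a hand-rolled illustration, not the project's loader: `parse_env` and `gemini_keys` are hypothetical names, and the commit does not show how the application actually loads its settings.

```python
import re
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, tolerating spaces around '=' and double quotes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and placeholder lines such as "..."
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def gemini_keys(env: Dict[str, str]) -> List[str]:
    """Return non-empty GEMINI_API_KEY<n> values ordered by numeric suffix."""
    numbered = []
    for name, value in env.items():
        match = re.fullmatch(r"GEMINI_API_KEY(\d+)", name)
        if match and value:
            numbered.append((int(match.group(1)), value))
    return [value for _, value in sorted(numbered)]
```

Under these assumptions the literal `GEMINI_API_KEYN` placeholder is ignored, because its suffix is not a number, and keys left empty are dropped.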
README.md CHANGED
@@ -1,31 +1,90 @@
  ```
- paperflux/
- ├── .env.example
- ├── pyproject.toml
- ├── poetry.lock
- ├── README.md
- ├── .gitignore
- ├── src/
- │   ├── __init__.py
- │   ├── tools/
- │   │   ├── __init__.py
- │   │   ├── hf_tools/
- │   │   │   ├── __init__.py
- │   │   │   ├── paper_pdf_tool.py
- │   │   │   └── summarization_tool.py
- │   │   ├── cache/
- │   │   │   ├── __init__.py
- │   │   │   ├── redis_client.py # Core Redis operations
- │   │   │   └── cache_interface.py # Abstract base class
- │   │   └── cache_manager.py # High-level cache operations
- │   ├── agents/
- │   │   ├── __init__.py
- │   │   └── agent.py
- │   ├── models/
- │   │   ├── __init__.py
- │   │   └── model.py # Pydantic models for data validation
- │   │── scheduler.py # Scheduled cache updates
- |   └── app.py # gradio web app
- ```
- ``` Above is agentic workflow design, initial workflow will be using gemini api key and will be extended to agentic system ```
+ # PaperFlux: AI Research Paper Insights
+
+ PaperFlux is a Streamlit based web application powered by Gemini that automatically fetches, analyzes, and explains the latest AI research papers from Hugging Face's daily curated list. Using Google's Gemini Pro AI, it provides in-depth explanations and technical breakdowns of complex research papers, making cutting-edge AI research more accessible.
+
+ ## Features
+
+ - **Daily Updates**: Automatically fetches and processes new papers every weekday at ```8:00 AM UTC```
+ - **AI-Powered Analysis**: Uses Google's ```Gemini Pro``` to provide detailed explanations of complex research
+ - **Paper Library**: Browse through all processed papers with easy navigation
+ - **Technical Breakdowns**: Get in-depth explanations of mathematical concepts and methodologies
+ - **Critical Assessment**: Read AI-generated critical analysis of each paper
+ - **Responsive Interface**: User-friendly interface built with Streamlit
+
+ ## System Architecture
+
+ PaperFlux follows a robust architecture for fetching, processing, and displaying research papers:
+
+ ```mermaid
+ flowchart TD
+ A[Scheduler] -->|Daily trigger| B[Paper Processor]
+ B -->|Fetch papers| C[Hugging Face API]
+ B -->|Download PDFs| D[arXiv]
+ B -->|Analyze content| E[Gemini Pro API]
+ B -->|Store data| F[(MongoDB)]
+ G[Streamlit UI] -->|Display papers| F
+ H[User] -->|View papers| G
+
  ```
+ ## System Flow
+
+ 1. **Scheduled Polling**: Every weekday at 8:00 AM UTC, the scheduler checks if papers need to be processed
+ 2. **Data Collection**: The application fetches the latest papers from Hugging Face's API
+ 3. **PDF Processing**: Papers are downloaded from arXiv and stored temporarily
+ 4. **AI Analysis**: Each paper is analyzed using Google's Gemini Pro API
+ 5. **Data Storage**: Results are stored in MongoDB for quick access
+ 6. **User Interface**: Users can browse all processed papers through the Streamlit interface
+
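Steps 1 and 3 of the flow above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the scheduler shipped in the repository: `next_run` and `pdf_url` are hypothetical helpers, and only the `PDF_BASE_URL` template is taken from `.env.example`.

```python
from datetime import datetime, timedelta, timezone

# Template copied from .env.example; "{id}" is replaced by the arXiv identifier.
PDF_BASE_URL = "https://arxiv.org/pdf/{id}.pdf"


def next_run(now: datetime) -> datetime:
    """Return the next weekday 08:00 UTC strictly after `now`.

    `now` is expected to be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=8, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate


def pdf_url(paper_id: str) -> str:
    """Build the arXiv PDF URL for one paper identifier."""
    return PDF_BASE_URL.format(id=paper_id)
```

For example, a call made on a Friday after 08:00 UTC would yield the following Monday at 08:00 UTC, since the weekend days are skipped.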
+ ## Installation
+
+ ### Prerequisites
+
+ - Python 3.8 or higher
+ - MongoDB database
+ - Google Gemini Pro API key(s)
+ - Poetry (dependency management)
+
+ ### Local Setup with Poetry
+
+ 1. Clone the repository:
+ ```bash
+ git clone https://github.com/yourusername/paperflux.git
+ cd paperflux
+ ```
+
+ 2. Install dependencies using Poetry:
+ ```bash
+ # Install Poetry if you haven't already
+ # curl -sSL https://install.python-poetry.org | python3 -
+
+ # Install dependencies
+ poetry install
+ ```
+
+ 3. Create a `.env` file with your credentials (copy from `.env.example`):
+ ```bash
+ cp .env.example .env
+ # Edit .env with your credentials
+ ```
+
+ 4. Configure your environment variables:
+ ```
+ MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/paperflux
+ GEMINI_API_KEY1=your_gemini_api_key_1
+ GEMINI_API_KEY2=your_gemini_api_key_2
+ # Add more API keys as needed for load balancing
+ ```
+
+ 5. Run the Streamlit app with Poetry:
+ ```bash
+ poetry run streamlit run app.py
+ ```
+
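Step 4 mentions adding several API keys "for load balancing" without saying how requests are spread across them. One simple possibility is round-robin rotation, sketched below. The strategy and the `key_rotation` helper are assumptions for illustration; the commit does not show the project's actual key-selection logic.

```python
from itertools import cycle
from typing import Iterator, List


def key_rotation(keys: List[str]) -> Iterator[str]:
    """Yield the configured API keys in order, restarting after the last one."""
    if not keys:
        raise ValueError("at least one Gemini API key is required")
    return cycle(keys)


# Usage: take the next key before each request to the API.
rotation = key_rotation(["your_gemini_api_key_1", "your_gemini_api_key_2"])
first = next(rotation)   # "your_gemini_api_key_1"
second = next(rotation)  # "your_gemini_api_key_2"
third = next(rotation)   # back to "your_gemini_api_key_1"
```

Round-robin keeps the per-key request count even but does not react to rate-limit errors; a real deployment might also skip a key temporarily after a quota failure.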
+ ## Contributing
+
+ Contributions are welcome! Please feel free to submit a Pull Request.
+
+ ## License
+
+ This project is licensed under the MIT License - see the LICENSE file for details.