---
title: Planning AI
sdk: streamlit
app_file: app.py
pinned: false
---

# Planning AI
Planning AI is a tool designed to process and analyse responses to local government policy documents. It uses advanced natural language processing techniques to summarise and categorise feedback, providing insights into public opinion on proposed developments.
```mermaid
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
    __start__([<p>__start__</p>]):::first
    add_entities(add_entities)
    generate_summary(generate_summary)
    check_hallucination(check_hallucination)
    fix_hallucination(fix_hallucination)
    generate_final_report(generate_final_report)
    __end__([<p>__end__</p>]):::last
    __start__ --> add_entities;
    check_hallucination --> generate_final_report;
    generate_final_report --> __end__;
    add_entities -.-> generate_summary;
    generate_summary -.-> check_hallucination;
    check_hallucination -.-> fix_hallucination;
    fix_hallucination -.-> check_hallucination;
    classDef default fill:#f2f0ff,line-height:1.2
    classDef first fill-opacity:0
    classDef last fill:#bfb6fc
```
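The diagram above shows the pipeline's control flow, including the loop between `check_hallucination` and `fix_hallucination`. As an illustrative-only sketch (plain Python rather than the project's actual LangGraph code; only the node names come from the diagram, everything else is assumed), that flow looks like:

```python
# Hypothetical sketch of the control flow in the graph above. Node names
# match the diagram; the state shape, placeholder outputs, and retry cap
# are illustrative assumptions, not the project's real implementation.

MAX_FIXES = 3  # assumed cap so the fix loop cannot run forever

def add_entities(state):
    state["entities"] = ["<extracted entities>"]  # placeholder
    return state

def generate_summary(state):
    state["summary"] = "<draft summary>"  # placeholder for an LLM call
    return state

def check_hallucination(state) -> bool:
    # Placeholder grounding check: real code would grade the summary
    # against the source document. Here we pass after one fix, for demo.
    return state.get("fixes", 0) >= 1

def fix_hallucination(state):
    state["fixes"] = state.get("fixes", 0) + 1
    state["summary"] = "<revised summary>"
    return state

def generate_final_report(state):
    state["report"] = f"Report based on: {state['summary']}"
    return state

def run(state):
    state = add_entities(state)
    state = generate_summary(state)
    # Loop back to fix_hallucination until the check passes, as in the diagram.
    while not check_hallucination(state) and state.get("fixes", 0) < MAX_FIXES:
        state = fix_hallucination(state)
    return generate_final_report(state)

result = run({})
print(result["report"])
```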
## Features

- Document Processing: Extracts and processes text from `.json` and `.pdf` files.
- Summarisation: Generates concise summaries of each response, highlighting key points and how they relate to policies.
- Thematic Analysis: Breaks down responses into themes.
- Reporting: Aggregates response summaries to produce an extensive final overview and a summary document.
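To give a feel for the thematic analysis step, here is an illustrative-only sketch using keyword matching; the real pipeline categorises responses with LLM calls against the themes defined in `planning_ai/themes.py`, and the theme names and keywords below are invented examples:

```python
# Hypothetical themes and keywords -- NOT the project's actual theme list.
THEMES = {
    "Housing": ["housing", "homes", "affordable"],
    "Transport": ["traffic", "bus", "cycling"],
    "Environment": ["green", "wildlife", "flood"],
}

def assign_themes(response: str) -> list[str]:
    """Return every theme whose keywords appear in the response text."""
    text = response.lower()
    return [theme for theme, kws in THEMES.items()
            if any(kw in text for kw in kws)]

print(assign_themes("More affordable homes, but protect the green belt."))
```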
## Project Tree

```
planning_ai/
├── chains          # llm calls with prompts using langchain
├── common          # shared utility functions
├── documents       # processing for final documents
├── eval            # evaluation functions to compare summaries to manual summaries
├── graph.py        # main langgraph functions
├── llms            # openai llm definitions
├── logging.py      # shared logging functions
├── main.py         # calls langgraph functions and document processing
├── nodes           # langgraph nodes that use chains to modify graph state
├── preprocessing   # functions for processing .json and .pdf files
├── states.py       # define the parameters used by graph states
└── themes.py       # defines main themes and policies
```
## Installation

To set up the project, ensure you have Python >3.10 installed. Then, clone the repository and install the required dependencies:

```bash
git clone https://github.com/cjber/planning-ai.git
cd planning-ai
pip install .  # (or uv sync)
```
## Usage

This project uses Streamlit to provide a simple frontend to the system. Run it using:

```bash
streamlit run app.py
```

The project is hosted on HuggingFace Spaces; to push new updates you must run:

```bash
git push hf main
```
Alternatively, run everything manually:

Preprocessing: Run the preprocessing scripts to convert raw data into a format suitable for analysis.

```bash
python -m planning_ai.preprocessing.gcpt3
python -m planning_ai.preprocessing.azure_doc
```

Run Graph: Execute the main script to process the documents and generate summary documents.

```bash
python -m planning_ai.main
```
## Configuration

- Environment Variables: Use a `.env` file to store sensitive information like API keys. `OPENAI_API_KEY` is required for summarisation; `AZURE_API_KEY` and `AZURE_API_ENDPOINT` are needed to process `.pdf` files.
- Constants: Adjust `Consts` in `planning_ai/common/utils.py` to modify token limits and other settings.
- The document output format may be altered using files in `planning_ai/documents`.
- There are several system dependencies required to run this project locally:
  - texlive-latex-extra
  - fonts-liberation
  - cm-super
  - dvipng
  - pandoc
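A minimal `.env` file might look like the following (all values are placeholders, and the endpoint format is an assumption based on typical Azure resource URLs):

```
OPENAI_API_KEY=sk-...
AZURE_API_KEY=...
AZURE_API_ENDPOINT=https://<your-resource>.cognitiveservices.azure.com/
```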
## Workflow

- Data Loading: Documents are loaded from the staging directory using the `DirectoryLoader`.
- Text Splitting: Documents are split into manageable chunks using the `CharacterTextSplitter`.
- Graph Processing: The `StateGraph` orchestrates the flow of data through various nodes, including mapping and reducing summaries.
- Summarisation: The `map_chain` and `reduce_chain` are used to generate and refine summaries using LLMs.
- Output: Final summaries and thematic breakdowns are used to produce a final report. Citations within the final report correspond with the document IDs attributed to responses in the summaries document.
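The map/reduce summarisation pattern described above can be sketched in plain Python. This is a minimal illustration only: the `summarise` stub stands in for the project's actual LLM chains, and the splitter, chunk size, and function names are all assumptions, not the real `map_chain`/`reduce_chain` code:

```python
def split_text(text: str, chunk_size: int = 200) -> list[str]:
    """Naive fixed-width splitter standing in for CharacterTextSplitter."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def summarise(chunk: str) -> str:
    """Stub for an LLM call: real code would prompt a model for a summary."""
    return chunk[:50]  # pretend the first 50 characters are the 'summary'

def map_reduce_summary(document: str) -> str:
    # Map step: summarise each chunk independently (the role of map_chain).
    partials = [summarise(c) for c in split_text(document)]
    # Reduce step: combine partial summaries into one (the role of reduce_chain).
    return summarise(" ".join(partials))

doc = "word " * 200
print(map_reduce_summary(doc))
```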
