---
title: Planning AI
sdk: streamlit
app_file: app.py
pinned: false
---


Planning AI is a tool designed to process and analyse responses to local government policy documents. It uses advanced natural language processing techniques to summarise and categorise feedback, providing insights into public opinion on proposed developments.

```mermaid
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
        __start__([<p>__start__</p>]):::first
        add_entities(add_entities)
        generate_summary(generate_summary)
        check_hallucination(check_hallucination)
        fix_hallucination(fix_hallucination)
        generate_final_report(generate_final_report)
        __end__([<p>__end__</p>]):::last
        __start__ --> add_entities;
        check_hallucination --> generate_final_report;
        generate_final_report --> __end__;
        add_entities -.-> generate_summary;
        generate_summary -.-> check_hallucination;
        check_hallucination -.-> fix_hallucination;
        fix_hallucination -.-> check_hallucination;
        classDef default fill:#f2f0ff,line-height:1.2
        classDef first fill-opacity:0
        classDef last fill:#bfb6fc
```
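The control flow in the diagram above can be sketched in plain Python. The real project builds this graph with LangGraph's `StateGraph`; the node names below mirror the diagram, but each node body is an invented placeholder, not the project's implementation.

```python
# A minimal pure-Python sketch of the control flow in the diagram above.
# The real project builds this graph with LangGraph's StateGraph; the node
# names mirror the diagram, but each node body here is a placeholder.

def add_entities(state: dict) -> dict:
    state["entities"] = ["example entity"]  # placeholder entity extraction
    return state

def generate_summary(state: dict) -> dict:
    state["summary"] = "Summary covering " + ", ".join(state["entities"])
    return state

def check_hallucination(state: dict) -> dict:
    # Treat the summary as grounded if it mentions every extracted entity.
    state["grounded"] = all(e in state["summary"] for e in state["entities"])
    return state

def fix_hallucination(state: dict) -> dict:
    state["summary"] += " (revised)"  # placeholder revision step
    return state

def generate_final_report(state: dict) -> dict:
    state["report"] = state["summary"]
    return state

def run_graph(state: dict, max_retries: int = 3) -> dict:
    state = generate_summary(add_entities(state))
    state = check_hallucination(state)
    retries = 0
    # Dashed edges in the diagram: loop between fix and check until grounded.
    while not state["grounded"] and retries < max_retries:
        state = check_hallucination(fix_hallucination(state))
        retries += 1
    return generate_final_report(state)
```

The retry loop bounds the `fix_hallucination` → `check_hallucination` cycle so an ungroundable summary cannot loop forever.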

Features

  • Document Processing: Extracts and processes text from .json and .pdf files.
  • Summarisation: Generates concise summaries of each response, highlighting key points and how they relate to policies.
  • Thematic Analysis: Breaks down responses into themes.
  • Reporting: Aggregates response summaries to produce a comprehensive final overview and summary document.

Project Tree

```
planning_ai/
├── chains  # LLM calls with prompts using LangChain
├── common  # shared utility functions
├── documents  # processing for final documents
├── eval  # evaluation functions to compare summaries with manual summaries
├── graph.py  # main LangGraph functions
├── llms  # OpenAI LLM definitions
├── logging.py  # shared logging functions
├── main.py  # calls LangGraph functions and document processing
├── nodes  # LangGraph nodes that use chains to modify graph state
├── preprocessing  # functions for processing .json and .pdf files
├── states.py  # defines the parameters used by graph states
└── themes.py  # defines main themes and policies
```

Installation

To set up the project, ensure you have Python >3.10 installed. Then, clone the repository and install the required dependencies:

```bash
git clone https://github.com/cjber/planning-ai.git
cd planning-ai
pip install .  # or: uv sync
```

Usage

This project uses Streamlit to provide a simple frontend to the system. Run using:

```bash
streamlit run app.py
```

The project is hosted on Hugging Face Spaces; to push new updates, run:

```bash
git push hf main
```

Alternatively, run everything manually:

  1. Preprocessing: Run the preprocessing scripts to convert raw data into a format suitable for analysis.

    python -m planning_ai.preprocessing.gcpt3
    python -m planning_ai.preprocessing.azure_doc
    
  2. Run Graph: Execute the main script to process the documents and generate the summary documents.

    python -m planning_ai.main
    

Configuration

  • Environment Variables: Use a .env file to store sensitive information like API keys.
    • OPENAI_API_KEY required for summarisation.
    • AZURE_API_KEY and AZURE_API_ENDPOINT needed to process .pdf files.
  • Constants: Adjust Consts in planning_ai/common/utils.py to modify token limits and other settings.
  • The document output format may be altered using files in planning_ai/documents.
  • Several system dependencies are required to run this project locally:
    • texlive-latex-extra
    • fonts-liberation
    • cm-super
    • dvipng
    • pandoc
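A startup check for the environment variables listed above might look like the following. The function and constant names here are invented for illustration and are not part of the codebase; only the variable names come from this README.

```python
import os

# Variable names are taken from this README; the helper itself is
# hypothetical, not part of the planning_ai codebase.
REQUIRED_FOR_SUMMARISATION = ["OPENAI_API_KEY"]
REQUIRED_FOR_PDFS = ["AZURE_API_KEY", "AZURE_API_ENDPOINT"]

def missing_env_vars(require_pdf: bool = False) -> list[str]:
    """Return the names of required environment variables that are unset."""
    required = list(REQUIRED_FOR_SUMMARISATION)
    if require_pdf:
        required += REQUIRED_FOR_PDFS
    return [name for name in required if not os.environ.get(name)]
```

In practice, values from the `.env` file need to reach `os.environ` first, for example by calling python-dotenv's `load_dotenv()` at startup.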

Workflow

  1. Data Loading: Documents are loaded from the staging directory using the DirectoryLoader.
  2. Text Splitting: Documents are split into manageable chunks using CharacterTextSplitter.
  3. Graph Processing: The StateGraph orchestrates the flow of data through various nodes, including mapping and reducing summaries.
  4. Summarisation: The map_chain and reduce_chain are used to generate and refine summaries using LLMs.
  5. Output: Final summaries and thematic breakdowns are used to produce a final report.
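The text-splitting step (2) can be illustrated with a simplified, self-contained stand-in for LangChain's CharacterTextSplitter. The chunk sizes below are illustrative defaults, not the project's settings.

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping character chunks: a simplified sketch of
    what CharacterTextSplitter does in the real pipeline."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    # Each chunk starts `step` characters after the previous one, so adjacent
    # chunks share `overlap` characters of context.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap preserves context across chunk boundaries so a sentence split mid-chunk still appears whole in one of the two neighbouring chunks.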

Citations within the final report correspond to the document IDs attributed to responses in the summaries document.