semmyk committed on
Commit
dc56f4d
·
1 Parent(s): 3ff6af7

v0.2.8.5: Baseline 02 - upload markdown files/folder, accumulate_dir - sidebar (settings) - visualise KG - reset working folder files - updated README

Files changed (5)
  1. README.md +62 -192
  2. app.py +77 -37
  3. app_gradio_lightrag.py +72 -107
  4. utils/file_utils.py +37 -1
  5. utils/llm_login.py +1 -1
README.md CHANGED
@@ -34,220 +34,90 @@ requires-python: ">=3.12"
34
  #---
35
  ---
36
 
37
- # semmyKG[lightrag] - LightRAG-based Knowledge Graph Toolkit
38
 
39
- A modular, sophisticated Gradio application for Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG) using the [LightRAG][1] framework.
40
 
41
- ## Overview
 
42
 
43
- semmyKG gears towards a comprehensive solution that combines the power of LightRAG with modern web interfaces to create, query, and visualise knowledge graphs from markdown documents.
44
- The toolkit enables intelligent document processing, semantic search, and interactive knowledge graph visualisation with support for multiple LLM backends. It supports OpenAI and Ollama LLM backends.
45
 
46
- ## Key Features
47
-
48
- ### 🔍 Intelligent Document processing and RAG Capabilities
49
- - **Dual-level KG-RAG**: Combines traditional RAG with knowledge graph reasoning (powered by LightRAG)
50
- - **Multi-modal LLM Support**: OpenAI, Ollama, and Google GenAI backends. Full GenAI support coming soon.
51
- - **Semantic Search**: Vector-based document retrieval with embedding models (powered by LightRAG)
52
- - **Multi-format Support**: Markdown ingestion with ParserPDF ([GitHub][3] | [HF Space][4]) integration for PDF, Word, and HTML conversion. Full integration coming soon.
53
- - **Markdown Ingestion**: Process and index markdown files from specified directories
54
- - **Knowledge Graph Construction**: Automatically builds entity-relationship graphs after indexing
55
- - **Interactive Visualisation**: Real-time KG exploration
56
-
57
- ### Technical Excellence
58
- - **Modular Architecture**: Clean, maintainable code structure
59
- - **Async Operations**: Efficient handling of large document collections
60
- - **Robust Error Handling**: Comprehensive logging and exception management
61
-
62
- ## Installation & Setup
63
-
64
- ### Method 1: Using UV (Recommended)
65
  ```bash
66
  git clone https://github.com/semmyk-research/semmyKG
67
  cd semmyKG
68
 
69
- # Create virtual environment and install dependencies
70
- uv venv .venv
71
- source .venv/bin/activate # Linux/MacOS
72
- # .venv\Scripts\activate on Windows
73
 
74
- # Sync dependencies
75
- uv pip sync
76
- ```
77
-
78
- ### Method 2: Traditional Python Setup
79
- ```bash
80
- git clone https://github.com/semmyk-research/semmyKG
81
- cd semmyKG
82
-
83
- # Create virtual environment
84
  python -m venv .venv
85
- source .venv/bin/activate # Linux/MacOS
86
- # .venv\Scripts\activate on Windows
87
-
88
- # Install dependencies
89
  pip install -r requirements.txt
90
  ```
91
 
92
- ## 🔧 Configuration
93
-
94
- ### Environment Variables Setup
95
- Copy `.env.example` to `.env` and configure your settings:
96
-
97
- ```env
98
- # API Configuration
99
  OPENAI_API_KEY=your-openai-api-key
100
-
101
- # Model Selection (format: provider/model-identifier)
102
- LLM_MODEL=openai/gpt-oss-120b
103
-
104
- # LLM Inference Endpoints
105
- OPENAI_API_BASE=your-llm-provider-endpoint
106
- # For local inference servers: http://localhost:1234/v1
107
-
108
- # Embedding Configuration
109
- OPENAI_API_EMBED_BASE=your-embedding-provider-endpoint
110
- # Note: For local embedding services, do not include /embedding in URL
111
- LLM_MODEL_EMBED=your-embedding-model
112
-
113
- # Ollama/Local hosting Configuration
114
  OLLAMA_HOST=http://localhost:11434
115
- OLLAMA_API_KEY=your-ollama-api-key-if-required
116
- #[For LMStudio] OLLAMA_API_KEY=lmstudio
117
-
118
- ## Alternative: Direct Web UI Configuration
119
- # If .env is not set, you can enter credentials directly in the web interface
120
- ```
121
-
122
- ## Quick Start
123
-
124
- ### 1. Initialise the Application
125
- ```bash
126
- python app.py
127
- ```
128
-
129
- ### 2. Web Interface Workflow
130
- 1. **Select Data Folder**: Choose your markdown documents directory (default: `dataset/data/docs`)
131
- 2. **Configure Settings**:
132
- - **Choose LLM Backend**: Select between OpenAI, Ollama, or GenAI
133
- - Select or input other configuration in the Settings pane,
134
- 3. **Activate**: Activate the lightRAG constructor
135
- 4. **Process Documents**: Click 'Index Documents' to process your files
136
- 5. **Query the System**: Enter your questions and select query mode
137
- 6. **Visualise Results**: Click 'Show Knowledge Graph' to finalise building Knowledge Graph and for interactive exploration
138
-
139
- ## 📁 Project Structure
140
-
141
- ```
142
- semmyKG/
143
- ├── app_gradio_lightrag.py # Central Gradio coordinating processing
144
- ├── app.py # Main Gradio app entry point
145
- ├── requirements.txt # Project dependencies
146
- ├── .env.example # Environment template
147
- ├── dataset/
148
- │ └── data/
149
- │ └── docs/ # Default document directory
150
- ├── utils/
151
- │ ├── utils.py # Utility functions
152
- │ ├── file_utils.py # File operations
153
- │ ├── logger.py # Logging configuration
154
- └── logs/ # Application logs
155
- ```
156
 
157
- ## Deployment Options
158
-
159
- ### Local Development
160
  ```bash
161
- python app.py
162
- ```
163
-
164
- ### HuggingFace Spaces
165
- - **Requirements**: Ensure all dependencies in `requirements.txt`
166
- - **Environment**: Configure via web UI or Space secrets
167
-
168
- ### Google Colab
169
- - **Quick Setup**: Install requirements and configure tokens in 'Secret'
170
- - **Run**: Copy to `Files`, following the folder structure, and run app cells as appropriate
171
-
172
- ### 📋 System Requirements
173
 
174
- - **Python**: 3.12+
175
- - **Memory**: 8GB+ vRAM recommended for large document sets
176
- - **Storage**: Sufficient space for document collections and vector databases
177
-
178
- ### 🔌 Supported LLM Backends
179
-
180
- #### OpenAI Compatible and Google GenAI
181
- - **Models**: Frontline providers (Openai, Deepseek ...) and custom models
182
- - **Gemini Models**: Access to Google's latest AI models
183
- - **Endpoints**: Local inference servers (LMStudio, Jan.ai, ollama ...)
184
- - **Embedding Models**: Multiple sentence transformer models and inference providers
185
-
186
- #### Ollama Integration
187
- - **Local Models**: Access to Ollama's model ecosystem
188
- - **Self-hosted**: Complete data privacy and control
189
-
190
-
191
- ### Document Ingestion
192
- - **Format Support**: Markdown files only (use ParserPDF for other formats)
193
  ```python
194
- # The system automatically processes markdown files from:
195
- # - dataset/data/docs/ (default)
196
  ```
197
 
198
- ### Query Modes
199
- - **Semantic Search**: Vector-based similarity matching
200
- - **KG-enhanced RAG**: Combines traditional RAG with graph reasoning
201
-
202
- ### Interactive Visualisation
203
- - **Real-time Exploration**: Dynamic graph manipulation
204
- - **Entity Highlighting**: Focus on specific nodes and relationships
205
-
206
- ### 📈 Performance Optimisation: Batch Processing
207
- - **Parallel Insertion**: Configurable batch sizes
208
- - **Rate Limiting**: Built-in delays to prevent API throttling
209
-
210
- ### 📊 Custom System Prompts: Domain-Specific Expertise
211
- - **Domain Adaptation**: Modify prompts for specific use cases and customised NER (Named Entity Recognition) domain-specific entity rules
212
- - **Specialised Processing**: Tailored entity recognition for security domains
213
- - **Legislation Awareness**: Built-in understanding of legal frameworks
214
-
215
-
216
- ## 🔍 Troubleshooting
217
-
218
- ### Common Issues
219
- - **Module Import Errors**: Ensure all dependencies are installed
220
- - **API Connection Issues**: Verify endpoint URLs and API keys
221
- - **Memory Management**: Monitor resource usage during large-scale indexing
222
-
223
- ### Notes
224
- - All user-facing text are in UK English
225
- - For advanced configuration, see LightRAG documentation
226
- Pending full integration, use our ParserPDF tool ([GitHub][3] | [HF Space][4]) to generate markdown from documents (PDF, Word, html)
227
-
228
- ## 🤝 Contributing
229
-
230
- We welcome contributions! Please see our contributing guidelines for more information.
231
-
232
- ## 🛣️ Roadmap (no defined timeline)
233
- - Integrate Huggingface log in (in progress)
234
  - [ParserPDF][3] integration
235
- - Pre and post processing document viewer
236
- - Modal platform support
237
- - Connected UX refactoring
238
-
239
- ## 📄 License
240
-
241
- This project is licensed under the [MIT License][2].
242
-
243
- ## 🔗 References
244
-
245
- - [LightRAG Framework][1]
246
- - [ParserPDF Tool][3] for document conversion
247
- - [HuggingFace Space][4] for ParserPDF
248
 
 
 
249
 
250
- [1]: https://github.com/HKUDS/LightRAG "LightRAG GitHub Repository"
251
  [2]: https://opensource.org/license/mit "MIT License"
252
- [3]: https://github.com/semmyk-research/parserPDF "ParserPDF GitHub Repository"
253
- [4]: https://huggingface.co/spaces/semmyk/parserPDF "ParserPDF HuggingFace Space"
 
34
  #---
35
  ---
36
 
37
+ # LightRAG Gradio App
38
 
39
+ A modern, modular Gradio app for knowledge-graph-based Retrieval-Augmented Generation (RAG) using [LightRAG][1]. Supports OpenAI and Ollama LLM backends, markdown document ingestion, and interactive knowledge graph visualisation. Our ParserPDF ([GitHub][3] | [HF Space][4]) pipeline generates markdown from documents (PDF, Word, HTML).
40
 
41
+ ## Features
42
+ - LightRAG for dual-level RAG and knowledge graph (KG) construction
43
+ - Ingest markdown files from a folder (default: `dataset/data/docs`).
44
+ - Query with OpenAI or Ollama backend (user-selectable)
45
+ - Visualise KG interactively in-browser
46
+ - Deployable to venv, Colab, or HuggingFace Spaces
47
+ - Robust, pythonic, modular code (UK English)
48
 
49
+ ## Setup
 
50
 
51
+ ### 1. Clone and create venv
 
52
  ```bash
53
  git clone https://github.com/semmyk-research/semmyKG
54
  cd semmyKG
55
 
56
+ uv venv .venv # ensure you have the uv package
57
+ source .venv/bin/activate # or .venv\Scripts\activate on Windows
58
+ uv pip sync # or uv pip sync requirements.txt
 
59
 
60
+ # or, with plain venv:
61
  python -m venv .venv
62
+ source .venv/bin/activate # or .venv\Scripts\activate on Windows
 
63
  pip install -r requirements.txt
64
  ```
65
 
66
+ ### 2. Configure environment
67
+ Copy `.env.example` to `.env` and fill in your keys:
68
+ ```env
69
  OPENAI_API_KEY=your-openai-api-key
70
+ LLM_MODEL=your-LLM-model-Name
71
+ ##(in the format: provider/model-identifier)
72
+ OPENAI_API_BASE=your-LLM-inference-provider-endpoint
73
+ ##(for locally hosted llm inference server like LMStudio or Jan.ai, follow ollama host adding /v1: http://localhost:1234/v1)
74
+ OPENAI_API_EMBED_BASE=your-embedding-provider-endpoint
75
+ ##(for locally hosted, do not include /embedding)
76
+ LLM_MODEL_EMBED=your-embedding-model ##(in the format: provider/embedding-name)
77
  OLLAMA_HOST=http://localhost:11434
78
+ OLLAMA_API_KEY= ##(include if required)
79
+ ```
80
+ If `.env` is not set, you can enter credentials directly in the web UI. <br>
81
+ Likewise, values entered directly in the web UI override `.env`.
82
 
83
+ ### 3. Run the app
 
 
84
  ```bash
85
+ python app_gradio_lightrag.py
86
+ ```
87
+ For faster development, use Gradio's reload mode:
88
 
89
  ```bash
90
+ ##SMY: assist: https://www.gradio.app/guides/developing-faster-with-reload-mode
91
+ gradio app_gradio_lightrag.py --demo-name=gradio_ui
92
  ```
93
 
94
+ ### 4. Colab/Spaces
95
+ - For HuggingFace Spaces: ensure all dependencies are in `requirements.txt` and `.env` is set via the web UI or Space secrets.
96
+ - For Colab: install requirements and run the app cell.
97
+
98
+ ## Usage
99
+ - Select your data folder (default: `dataset/data/docs`)
100
+ - Choose an LLM backend (OpenAI or Ollama). GenAI currently has a bug yielding role: 'assistant' instead of 'user' when updating history.
101
+ - Activate the RAG constructor
102
+ - Click 'Index Documents' to build the KG entities
103
+ - Click 'Query' to get answers
104
+   - Enter your query and select a query mode
105
+ - Click 'Show Knowledge Graph' to visualise the KG
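
The query step accepts one of five modes, matching the `mode_dd` dropdown in `app.py`. A minimal sketch of validating the UI selection before it is handed to LightRAG (the helper name and return shape are illustrative, not part of the codebase):

```python
# Modes offered by the mode_dd dropdown in app.py
VALID_MODES = ("naive", "local", "global", "hybrid", "mix")

def validate_query(query: str, mode: str = "hybrid") -> dict:
    """Check the UI selections before they reach the LightRAG query call."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown query mode {mode!r}; expected one of {VALID_MODES}")
    if not query.strip():
        raise ValueError("query must not be empty")
    return {"query": query.strip(), "mode": mode}
```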
106
+
107
+ ## Notes
108
+ - Only markdown files are supported for ingestion (images in `/images` subfolders are ignored for now). <br>NB: other formats (PDF, TXT, HTML, ...) will be enabled later.
109
+ - To generate markdown from documents (PDF, Word, HTML), use our ParserPDF tool ([GitHub][3] | [HF Space][4]).
110
+ - All user-facing text is in UK English
111
+ - For advanced configuration, see LightRAG documentation
112
+
113
+ ## Roadmap (no defined timeline)
114
+ - HuggingFace log-in (in progress)
115
  - [ParserPDF][3] integration
116
 
117
+ ## License
118
+ [MIT][2]
119
 
120
+ [1]: https://github.com/HKUDS/LightRAG "LightRAG GitHub"
121
  [2]: https://opensource.org/license/mit "MIT License"
122
+ [3]: https://github.com/semmyk-research/parserPDF "ParserPDF (GitHub)"
123
+ [4]: https://huggingface.co/spaces/semmyk/parserPDF "ParserPDF (HF Space)"
app.py CHANGED
@@ -6,6 +6,7 @@ import gradio as gr
6
  #from watchfiles import run_process ##gradio reload watch
7
  from app_gradio_lightrag import LightRAGApp ##SMY lightrag logging
8
  from utils.llm_login import get_login_token
 
9
 
10
  import asyncio
11
  import nest_asyncio
@@ -17,6 +18,7 @@ from dotenv import load_dotenv
17
  # Load environment variables
18
  load_dotenv()
19
 
 
20
  # Pythonic error handling decorator
21
  def handle_errors(func):
22
  def wrapper(*args, **kwargs):
@@ -25,6 +27,7 @@ def handle_errors(func):
25
  except Exception as e:
26
  return gr.update(value=f"Error: {e}")
27
  return wrapper
 
28
 
29
  # Instantiate app logic
30
  #app_logic = LightRAGApp() ## See main()
@@ -67,39 +70,64 @@ def gradio_ui(app_logic: LightRAGApp):
67
  """)
68
 
69
  # Step 0: Section 1
 
 
 
70
  # Define openai_api textbox initial value
71
  openai_api_key_init = os.getenv("OPENAI_API_KEY", "jan-ai")
72
- with gr.Accordion(label="🛞 LLM settings", open=False):
73
- with gr.Row():
74
- data_folder_tb = gr.Textbox(value="dataset/data/docs2", label="Data Folder (markdown only)", show_copy_button=True)
75
- working_dir_tb = gr.Textbox(value="./working_folder1", label="lightRAG working folder", show_copy_button=True)
76
- llm_backend_cb = gr.Radio(["OpenAI", "Ollama", "GenAI"], value="OpenAI", label="LLM Backend: OpenAI, Local or GenAI")
77
- llm_model_name_tb = gr.Textbox(value=os.getenv("LLM_MODEL", "openai/gpt-oss-120b"), label="LLM Model Name", show_copy_button=True) #.split('/')[1], label="LLM Model Name") "meta-llama/Llama-4-Maverick-17B-128E-Instruct")), #image-Text-to-Text #"openai/gpt-oss-120b",
78
- with gr.Row():
79
- with gr.Row(): #elem_classes="password-box"):
80
- #openai_key_tb = gr.Textbox(value=os.getenv("OPENAI_API_KEY", "jan-ai"), label="OpenAI API Key",
81
- # type="password", elem_classes="password-box", container=False, interactive=True, info="OpenAI API Key") #, show_copy_button=True)
82
- openai_key_tb = gr.Textbox(value=openai_api_key_init, label="OpenAI API Key",
83
- type="password", elem_classes="password-box", container=False, interactive=True, info="OpenAI API Key") #, show_copy_button=True)
84
- toggle_btn_openai_key = gr.Button(
85
- value="👁️", # Initial eye icon
86
- elem_classes="icon-button", size="sm") #, min_width=50)
87
- openai_baseurl_tb = gr.Textbox(value=os.getenv("OPENAI_API_BASE", "https://router.huggingface.co/v1"), label="OpenAI baseurl", show_copy_button=True)
88
- ollama_host_tb = gr.Textbox(value=os.getenv("OLLAMA_HOST", "http://localhost:1234/v1"), label="Ollama Host", show_copy_button=True)
89
- #ollama_host_tb = gr.Textbox(value=os.getenv("OPENAI_API_EMBED_BASE", ""), label="Ollama Host")
90
- with gr.Row():
91
- embed_backend_dd = gr.Dropdown(choices=["Transformer", "Provider"], value="Provider", label="Embedding Type")
92
- openai_baseurl_embed_tb = gr.Textbox(value=os.getenv("OPENAI_API_EMBED_BASE", "http://localhost:1234/v1"), label="LLM Embed baseurl", show_copy_button=True)
93
- llm_model_embed_tb = gr.Textbox(value=os.getenv("LLM_MODEL_EMBED","text-embedding-bge-m3"), label="LLM Embedding Model", show_copy_button=True) #.split('/')[1], label="Embedding Model")
94
- with gr.Row(): #elem_classes="password-box"):
95
- openai_key_embed_tb = gr.Textbox(value=os.getenv("OPENAI_API_KEY_EMBED", "jan-ai"), label="LLM API Key Embed", #lm-studio
96
- type="password", elem_classes="password-box", container=False, interactive=True, info="LLM API Key Embed") #, show_copy_button=True)
97
- toggle_btn_openai_key_embed = gr.Button(
98
- value="👁️", # Initial eye icon
99
- elem_classes="icon-button", size="sm") #, min_width=50)
100
- #openai_key_embed_tb = gr.Textbox(value=os.getenv("OPENAI_API_KEY_EMBED", "jan-ai"), label="OpenAI API Key Embed", type="password", show_copy_button=True) #("OLLAMA_API_KEY", ""), label="OpenAI API Key Embed", type="password")
 
101
 
102
  # Step 1: Section 2
103
  with gr.Accordion("🤗 HuggingFace Client Control", open=True): #, open=False):
104
  # HuggingFace controls
105
  hf_login_logout_btn = gr.LoginButton(value="Sign in to HuggingFace 🤗", logout_value="Logout of HF: ({}) 🤗", variant="huggingface")
@@ -113,7 +141,7 @@ def gradio_ui(app_logic: LightRAGApp):
113
  gr.HTML("<hr>") #gr.Markdown("---")
114
 
115
  with gr.Row():
116
- index_btn = gr.Button("Index Documents")
117
  stop_btn = gr.Button("Stop", variant="stop") ## Add cancel event button
118
  query_text_tb = gr.Textbox(label="Your Query")
119
  mode_dd = gr.Dropdown(["naive", "local", "global", "hybrid", "mix"], value="hybrid", label="Query Mode")
@@ -135,6 +163,7 @@ def gradio_ui(app_logic: LightRAGApp):
135
  st_openai_key = gr.State(value=openai_api_key_init) #gr.State("")
136
  st_password1 = gr.State(value="password")
137
  st_password2 = gr.State(value="password")
 
138
 
139
 
140
  ### Change handling
@@ -217,9 +246,9 @@ def gradio_ui(app_logic: LightRAGApp):
217
 
218
 
219
  # Button logic with async handling
220
- async def setup_wrapper(df, wd, llm_back, embed_back, oai, base, base_embed, model, model_embed, host, embedkey):
221
- return await app_logic.setup(df, wd, llm_back, embed_back, oai,
222
- base, base_embed, model, model_embed, host, embedkey)
223
 
224
  async def index_wrapper(df):
225
  return await app_logic.index_documents(df)
@@ -241,6 +270,13 @@ def gradio_ui(app_logic: LightRAGApp):
241
  #hf_login_logout_btn.click(update_state_stored_value, inputs=openai_key_tb, outputs=st_openai_key)
242
  hf_login_logout_btn.click(fn=custom_do_logout, inputs=openai_key_tb, outputs=[hf_login_logout_btn, st_openai_key])
243
 
244
  toggle_btn_openai_key.click(
245
  fn=toggle_password,
246
  inputs=[st_password1],
@@ -253,10 +289,14 @@ def gradio_ui(app_logic: LightRAGApp):
253
  outputs=[openai_key_embed_tb, toggle_btn_openai_key_embed, st_password2],
254
  show_progress="hidden"
255
  )
256
-
257
- inputs_arg = [data_folder_tb, working_dir_tb, llm_backend_cb, embed_backend_dd, st_openai_key, #openai_key_tb,
258
  openai_baseurl_tb, openai_baseurl_embed_tb, llm_model_name_tb, llm_model_embed_tb,
259
- ollama_host_tb, openai_key_embed_tb]
260
 
261
  setup_btn.click(
262
  fn=setup_wrapper,
@@ -267,7 +307,7 @@ def gradio_ui(app_logic: LightRAGApp):
267
  )
268
  index_btn.click(
269
  fn=index_wrapper,
270
- inputs=[data_folder_tb],
271
  outputs=[status_box, progress_tb],
272
  show_progress=True
273
  )
 
6
  #from watchfiles import run_process ##gradio reload watch
7
  from app_gradio_lightrag import LightRAGApp ##SMY lightrag logging
8
  from utils.llm_login import get_login_token
9
+ from utils.file_utils import accumulate_dir
10
 
11
  import asyncio
12
  import nest_asyncio
 
18
  # Load environment variables
19
  load_dotenv()
20
 
21
+ '''
22
  # Pythonic error handling decorator
23
  def handle_errors(func):
24
  def wrapper(*args, **kwargs):
 
27
  except Exception as e:
28
  return gr.update(value=f"Error: {e}")
29
  return wrapper
30
+ '''
31
 
32
  # Instantiate app logic
33
  #app_logic = LightRAGApp() ## See main()
 
70
  """)
71
 
72
  # Step 0: Section 1
73
+
74
+ # Define ext type (in lieu of getting from global var)
75
+ #ext = (".md", "md") #SMY disused: 'tuple' object has no attribute '_id'
76
  # Define openai_api textbox initial value
77
  openai_api_key_init = os.getenv("OPENAI_API_KEY", "jan-ai")
78
+
79
+ with gr.Sidebar(position="right"):
80
+ system_prompt_tb = gr.Textbox(
81
+ value="You are a helpful assistant. You answer questions based on the provided context.", # If you don't know the answer, just say so. Don't make up information.",
82
+ label="System Prompt",
83
+ lines=3,
84
+ interactive=True,
85
+ show_copy_button=True,
86
+ )
87
+
88
+ with gr.Accordion(label="🛞 LLM settings", open=False):
89
+ with gr.Row():
90
+ llm_backend_cb = gr.Radio(["OpenAI", "Ollama", "GenAI"], value="OpenAI", label="LLM Backend: OpenAI, Local or GenAI")
91
+ llm_model_name_tb = gr.Textbox(value=os.getenv("LLM_MODEL", "openai/gpt-oss-120b"), label="LLM Model Name", show_copy_button=True) #.split('/')[1], label="LLM Model Name") "meta-llama/Llama-4-Maverick-17B-128E-Instruct")), #image-Text-to-Text #"openai/gpt-oss-120b",
92
+ with gr.Row():
93
+ with gr.Row(): #elem_classes="password-box"):
94
+ #openai_key_tb = gr.Textbox(value=os.getenv("OPENAI_API_KEY", "jan-ai"), label="OpenAI API Key",
95
+ # type="password", elem_classes="password-box", container=False, interactive=True, info="OpenAI API Key") #, show_copy_button=True)
96
+ openai_key_tb = gr.Textbox(value=openai_api_key_init, label="OpenAI API Key",
97
+ type="password", elem_classes="password-box", container=False, interactive=True, info="OpenAI API Key") #, show_copy_button=True)
98
+ toggle_btn_openai_key = gr.Button(
99
+ value="👁️", # Initial eye icon
100
+ elem_classes="icon-button", size="sm") #, min_width=50)
101
+ with gr.Row():
102
+ openai_baseurl_tb = gr.Textbox(value=os.getenv("OPENAI_API_BASE", "https://router.huggingface.co/v1"), label="OpenAI baseurl", show_copy_button=True)
103
+ ollama_host_tb = gr.Textbox(value=os.getenv("OLLAMA_HOST", "http://localhost:1234/v1"), label="Ollama Host", show_copy_button=True)
104
+ #ollama_host_tb = gr.Textbox(value=os.getenv("OPENAI_API_EMBED_BASE", ""), label="Ollama Host")
105
+ with gr.Row():
106
+ openai_baseurl_embed_tb = gr.Textbox(value=os.getenv("OPENAI_API_EMBED_BASE", "http://localhost:1234/v1"), label="LLM Embed baseurl", show_copy_button=True)
107
+ llm_model_embed_tb = gr.Textbox(value=os.getenv("LLM_MODEL_EMBED","text-embedding-bge-m3"), label="LLM Embedding Model", show_copy_button=True) #.split('/')[1], label="Embedding Model")
108
+ with gr.Row():
109
+ embed_backend_dd = gr.Dropdown(choices=["Transformer", "Provider"], value="Provider", label="Embedding Type")
110
+ with gr.Row(): #elem_classes="password-box"):
111
+ openai_key_embed_tb = gr.Textbox(value=os.getenv("OPENAI_API_KEY_EMBED", "jan-ai"), label="LLM API Key Embed", #lm-studio
112
+ type="password", elem_classes="password-box", container=False, interactive=True, info="LLM API Key Embed") #, show_copy_button=True)
113
+ toggle_btn_openai_key_embed = gr.Button(
114
+ value="👁️", # Initial eye icon
115
+ elem_classes="icon-button", size="sm") #, min_width=50)
116
+ #openai_key_embed_tb = gr.Textbox(value=os.getenv("OPENAI_API_KEY_EMBED", "jan-ai"), label="OpenAI API Key Embed", type="password", show_copy_button=True) #("OLLAMA_API_KEY", ""), label="OpenAI API Key Embed", type="password")
117
 
118
  # Step 1: Section 2
119
+ with gr.Row():
120
+ with gr.Column():
121
+ #data_folder_tb = gr.Textbox(value="dataset/data/docs2", label="Data Folder (markdown only)", show_copy_button=True)
122
+ dir_btn = gr.UploadButton(
123
+ #value='dataset/data/', #docs2 #[Errno 13] Permission denied
124
+ label="📁 Upload Folder",
125
+ #file_types=ext, #["file"],
126
+ file_count="directory",
127
+ )
128
+ upload_count_md = gr.Markdown(visible=False)
129
+ working_dir_tb = gr.Textbox(value="./working_folder1", label="lightRAG working folder", show_copy_button=True)
130
+ working_dir_reset_cb = gr.Checkbox(value=False, label="Reset working files?")
131
  with gr.Accordion("🤗 HuggingFace Client Control", open=True): #, open=False):
132
  # HuggingFace controls
133
  hf_login_logout_btn = gr.LoginButton(value="Sign in to HuggingFace 🤗", logout_value="Logout of HF: ({}) 🤗", variant="huggingface")
 
141
  gr.HTML("<hr>") #gr.Markdown("---")
142
 
143
  with gr.Row():
144
+ index_btn = gr.Button("Index Documents", interactive=False)
145
  stop_btn = gr.Button("Stop", variant="stop") ## Add cancel event button
146
  query_text_tb = gr.Textbox(label="Your Query")
147
  mode_dd = gr.Dropdown(["naive", "local", "global", "hybrid", "mix"], value="hybrid", label="Query Mode")
 
163
  st_openai_key = gr.State(value=openai_api_key_init) #gr.State("")
164
  st_password1 = gr.State(value="password")
165
  st_password2 = gr.State(value="password")
166
+ state_uploaded_file_list = gr.State(value=[])
167
 
168
 
169
  ### Change handling
 
246
 
247
 
248
  # Button logic with async handling
249
+ async def setup_wrapper(df, wd, wd_reset, llm_back, embed_back, oai, base, base_embed, model, model_embed, host, embedkey, sys_prompt):
250
+ return await app_logic.setup(df, wd, wd_reset, llm_back, embed_back, oai,
251
+ base, base_embed, model, model_embed, host, embedkey, sys_prompt)
252
 
253
  async def index_wrapper(df):
254
  return await app_logic.index_documents(df)
 
270
  #hf_login_logout_btn.click(update_state_stored_value, inputs=openai_key_tb, outputs=st_openai_key)
271
  hf_login_logout_btn.click(fn=custom_do_logout, inputs=openai_key_tb, outputs=[hf_login_logout_btn, st_openai_key])
272
 
273
+ dir_btn.upload(
274
+ fn=accumulate_dir,
275
+ inputs=[dir_btn, state_uploaded_file_list],
276
+ outputs=[state_uploaded_file_list, index_btn, upload_count_md, status_box],
277
+ show_progress="hidden"
278
+ )
279
+
280
  toggle_btn_openai_key.click(
281
  fn=toggle_password,
282
  inputs=[st_password1],
 
289
  outputs=[openai_key_embed_tb, toggle_btn_openai_key_embed, st_password2],
290
  show_progress="hidden"
291
  )
292
+ '''
293
+ async def setup(self, data_folder: str, working_dir: str, wdir_reset: bool, llm_backend: str, embed_backend: str, openai_key: str,
294
+ openai_baseurl: str, openai_baseurl_embed: str, llm_model_name: str, llm_model_embed: str,
295
+ ollama_host: str, embed_key: str, system_prompt: str) -> str:
296
+ '''
297
+ inputs_arg = [state_uploaded_file_list, working_dir_tb, working_dir_reset_cb, llm_backend_cb, embed_backend_dd, st_openai_key, #openai_key_tb,
298
  openai_baseurl_tb, openai_baseurl_embed_tb, llm_model_name_tb, llm_model_embed_tb,
299
+ ollama_host_tb, openai_key_embed_tb, system_prompt_tb] #data_folder_tb,
300
 
301
  setup_btn.click(
302
  fn=setup_wrapper,
 
307
  )
308
  index_btn.click(
309
  fn=index_wrapper,
310
+ inputs=state_uploaded_file_list, #[data_folder_tb],
311
  outputs=[status_box, progress_tb],
312
  show_progress=True
313
  )
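
The new `accumulate_dir` upload handler wired into `app.py` above collects uploaded paths into `state_uploaded_file_list` and enables the index button once files arrive. A minimal sketch of the accumulation logic — the signature and return shape here are assumptions; the real handler in `utils/file_utils.py` also returns Gradio component updates:

```python
from pathlib import Path

def accumulate_dir(new_files: list[str], file_list: list[str]) -> tuple[list[str], int]:
    """Append newly uploaded markdown paths to the session state, skipping
    non-markdown files and duplicates; returns the list and its size."""
    for f in new_files or []:
        if Path(f).suffix.lower() == ".md" and f not in file_list:
            file_list.append(f)
    return file_list, len(file_list)
```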
app_gradio_lightrag.py CHANGED
@@ -10,7 +10,7 @@ import random
10
  from functools import partial
11
  from typing import Tuple, Optional, Any, List, Union
12
 
13
- import inspect ##SMY lightrag_openai_compatible_demo.py
14
 
15
  def install(package):
16
  import subprocess
@@ -86,9 +86,7 @@ def configure_logging():
86
  # Get log directory path from environment variable or use current directory
87
  #log_dir = os.getenv("LOG_DIR", os.getcwd())
88
  log_dir = os.getenv("LOG_DIR", "logs")
89
- '''log_file_path = os.path.abspath(
90
- os.path.join(log_dir, "lightrag_compatible_demo.log")
91
- )'''
92
  if log_dir:
93
  log_file_path = Path(log_dir) / "lightrag_logs.log"
94
  else:
@@ -165,15 +163,25 @@ def visualise_graphml(graphml_path: str, working_dir: str) -> str:
165
  ## Load the GraphML file
166
  G = nx.read_graphml(graphml_path)
167
 
168
  ## Create a Pyvis network
169
  #net = Network(height="100vh", notebook=True)
170
- net = Network(notebook=True, width="100%", height="600px") #, heading=f"Knowledge Graph Visualisation") #(noteboot=False)
171
- ## Convert NetworkX graph to Pyvis network
172
  net.from_nx(G)
173
 
174
  # Add colors and title to nodes
175
  for node in net.nodes:
176
- node["color"] = "#{:06x}".format(random.randint(0, 0xFFFFFF))
 
177
  if "description" in node:
178
  node["title"] = node["description"]
179
 
@@ -184,22 +192,32 @@ def visualise_graphml(graphml_path: str, working_dir: str) -> str:
184
 
185
  ## Set the 'physics' attribute to repulsion
186
  net.repulsion(node_distance=120, spring_length=200)
187
- net.show_buttons(filter_=['physics']) ##SMY: dynamically modify the network
188
  #net.show_buttons()
189
 
190
  ## graph path
191
- kg_viz_html_file = "kg_viz.html"
192
- #html_path = os.path.join(working_dir, kg_viz_html_file)
193
  html_path = Path(working_dir) / kg_viz_html_file
194
 
195
- #net.save_graph(html_path)
196
  ## Save and display the generated KG network html
 
197
  #net.show(html_path)
198
  net.show(str(html_path), local=True, notebook=False)
199
 
200
- ##SMY read and display generated KG html
201
- #with open(html_path, "r", encoding="utf-8") as f:
202
- # return f.read() ## html
 
 
203
 
204
  # Utility: Get all markdown files in a folder
205
  def get_markdown_files(folder: str) -> list[str]:
@@ -242,6 +260,7 @@ class LightRAGApp:
242
 
243
  if custom_system_prompt:
244
  self.system_prompt = custom_system_prompt
 
245
  else:
246
  self.system_prompt = """
247
  You are a domain expert on Cybersecurity, the South Africa landscape and South African legislation.
@@ -269,8 +288,9 @@ class LightRAGApp:
269
  - For instance, maintain a single node for Protection of Information Act, Protection of Information Act, 1982, Protection of Information Act No 84, 1982.
270
- However, have a separate node for Protection of Personal Information Act, 2013; as it is a separate legislation.
271
- Also take note that 'Republic of South Africa' is an official geo entity while 'South Africa' is a referred-to place, although also a geo entity:
272
- - Always watch the context and becareful of lumping them together.
273
  """
 
274
 
275
  return self.system_prompt
276
 
@@ -358,7 +378,7 @@ class LightRAGApp:
358
  logger.debug(f"Sending messages to Gemini: Model: {self.llm_model_name.rpartition('/')[-1]} \n~ Message: {prompt}")
359
  logger_kg.log(level=20, msg=f"Sending messages to Gemini: Model: {self.llm_model_name.rpartition('/')[-1]} \n~ Message: {prompt}")
360
 
361
- # 2. Initialize the GenAI Client with Gemini API Key
362
  client = Client(api_key=self.llm_api_key) #api_key=gemini_api_key
363
  #aclient = genai.Client(api_key=self.llm_api_key).aio # use AsyncClient
364
 
@@ -454,14 +474,30 @@ class LightRAGApp:
454
 
455
  def _ensure_working_dir(self) -> str:
456
  """Ensure working directory exists and return status message"""
457
- '''if not os.path.exists(self.working_dir):
458
- os.makedirs(self.working_dir, exist_ok=True)
459
- return f"Created working directory: {self.working_dir}"'''
460
  if not Path(self.working_dir).exists():
461
  check_create_dir(self.working_dir)
462
  return f"Created working directory: {self.working_dir}"
463
  return f"Working directory exists: {self.working_dir}"
464
 
 
 
465
 
466
  async def _initialise_storages(self) -> str:
467
  #def _initialise_storages(self) -> str:
@@ -487,31 +523,11 @@ class LightRAGApp:
487
  #print(f"_embedding_func: llm_api_key_embed: {self.llm_api_key_embed}")
488
  #print(f"_embedding_func: llm_baseurl_embed: {self.llm_baseurl_embed}")
489
 
490
- # Clear old data files
491
- #wrap_async(self._clear_old_data_files)
492
- #await self._clear_old_data_files()
493
- """Clear old data files"""
494
- files_to_delete = [
495
- "graph_chunk_entity_relation.graphml",
496
- "kv_store_doc_status.json",
497
- "kv_store_full_docs.json",
498
- "kv_store_text_chunks.json",
499
- "vdb_chunks.json",
500
- "vdb_entities.json",
501
- "vdb_relationships.json",
502
- ]
503
-
504
- for file in files_to_delete:
505
- '''file_path = os.path.join(self.working_dir, file)
506
- if os.path.exists(file_path):
507
- os.remove(file_path)
508
- print(f"Deleting old file:: {file_path}")'''
509
- file_path = Path(self.working_dir) / file
510
- if file_path.exists():
511
- file_path.unlink()
512
- logger_kg.log(level=20, msg=f"LightRAG class: Deleting old files", extra={"filepath": file_path.name})
513
-
514
 
515
  # Get embedding
516
  if self.embed_backend == "Transformer" or self.embed_backend[0] == "Transformer":
517
  logger_kg.log(level=20, msg=f"Getting embeddings dynamically through _embedding_func: ",
@@ -549,7 +565,7 @@ class LightRAGApp:
549
  await self._initialise_storages()
550
 
551
  #await rag.initialize_storages()
552
- #await initialize_pipeline_status() ##SMY: still relevant in updated lightRAG? - """Asynchronously finalize the storages"""
553
 
554
  self.status = f"Storages and pipeline initialised successfully" ##SMY: debug
555
  logger_kg.log(level=20, msg=f"Storages and pipeline initialised successfully")
@@ -561,9 +577,9 @@ class LightRAGApp:
561
 
562
  @handle_errors
563
  #def setup(self, data_folder: str, working_dir: str, llm_backend: str,
564
- async def setup(self, data_folder: str, working_dir: str, llm_backend: str, embed_backend: str,
565
  openai_key: str, openai_baseurl: str, openai_baseurl_embed: str, llm_model_name: str,
566
- llm_model_embed: str, ollama_host: str, embed_key: str) -> str:
567
  """Set up LightRAG with specified configuration"""
568
  # Configure environment
569
  #os.environ["OPENAI_API_KEY"] = openai_key or os.getenv("OPENAI_API_KEY", "")
@@ -573,8 +589,9 @@ class LightRAGApp:
573
  #os.environ["OPENAI_API_EMBED_BASE"] = openai_baseurl_embed or os.getenv("OPENAI_API_EMBED_BASE") #, "http://localhost:1234/v1/embeddings")
574
 
575
  # Update instance state
576
- self.data_folder = data_folder
577
  self.working_dir = working_dir
 
578
  self.llm_backend = llm_backend
579
  self.embed_backend = embed_backend if isinstance(embed_backend, str) else embed_backend[0],
580
  self.llm_model_name = llm_model_name
@@ -592,7 +609,7 @@ class LightRAGApp:
592
  except Exception as e:
593
  self.status = f"LightRAG initialisation.setup: working dir err | {str(e)}"
594
 
595
- # Initialize lightRAG with storages
596
  try:
597
  #self.rag = wrap_async( self._initialise_rag)
598
  self.rag = await self._initialise_rag()
@@ -628,15 +645,18 @@ class LightRAGApp:
628
  '''
629
 
630
  @handle_errors
631
- async def index_documents(self, data_folder: str) -> Tuple[str, str]:
632
  #def index_documents(self, data_folder: str) -> Tuple[str, str]:
633
  """Index markdown documents with progress tracking"""
634
  if not self._is_initialised or self.rag is None:
635
  return "Please initialise LightRAG first using the 'Initialise App' button.", "Not started"
636
 
637
- md_files = get_markdown_files(data_folder)
638
  if not md_files:
639
- return f"No markdown files found in {data_folder}:", "No files"
640
 
641
  try:
642
  total_files = len(md_files)
@@ -739,9 +759,7 @@ class LightRAGApp:
739
  """Display knowledge graph visualisation"""
740
  ## graphml_path: defaults to lightRAG's generated graph_chunk_entity_relation.graphml
741
  ## working_dir: lightRAG's working directory set by user
742
- '''graphml_path = os.path.join(self.working_dir, "graph_chunk_entity_relation.graphml")
743
- if not os.path.exists(graphml_path):
744
- return "Knowledge graph file not found. Please index documents first to generate Knowledge Graph."'''
745
  graphml_path = Path(self.working_dir) / "graph_chunk_entity_relation.graphml"
746
  if not Path(graphml_path).exists():
747
  return "Knowledge graph file not found. Please index documents first to generate Knowledge Graph."
@@ -759,59 +777,6 @@ class LightRAGApp:
759
 
760
 
761
  ############
762
- '''
763
- ##SMY: //TODO: Gradio toggle button
764
- def _clear_old_data_files(self):
765
- """Clear old data files"""
766
- files_to_delete = [
767
- "graph_chunk_entity_relation.graphml",
768
- "kv_store_doc_status.json",
769
- "kv_store_full_docs.json",
770
- "kv_store_text_chunks.json",
771
- "vdb_chunks.json",
772
- "vdb_entities.json",
773
- "vdb_relationships.json",
774
- ]
775
-
776
- for file in files_to_delete:
777
- file_path = Path(self.working_dir) / file
778
- if file_path.exists():
779
- file_path.unlink()
780
- logger_kg.log(level=20, msg=f"LightRAG class: Deleting old files", extra={"filepath": file_path.name})'''
781
- '''
782
-
783
- async def _get_llm_functions(self) -> Tuple[callable, callable]:
784
- #def _get_llm_functions(self) -> Tuple[callable, callable]:
785
- """Get LLM and embedding functions based on backend"""
786
- try:
787
- # Get embedding dimension dynamically
788
- try:
789
- embedding_dimension = await self._get_embedding_dim()
790
- self.status = f"Using embedding dimension: {embedding_dimension}"
791
- logger_kg.log(level=20, msg=f"Using embedding dimension: {embedding_dimension}")
792
- except Exception as e:
793
- # feedback dimensions error
794
- self.status = f"_get_llm_function: embedding_dim error with fallback: {str(e)}"
795
-
796
- # Create embedding function wrapper: # Wrap with EmbeddingFunc to provide required attributes
797
- embed_func = EmbeddingFunc(
798
- embedding_dim=embedding_dimension,
799
- max_token_size=8192, #4096, #8192, # Conservative default | #ollama
800
- func=self._embedding_func
801
- )
802
-
803
- # Get LLM function
804
- #llm_func = await self._llm_model_func ##SMY: not used
805
-
806
- # return LLM and embed functions
807
- #return llm_func, embed_func
808
- return await self._llm_model_func(), embed_func
809
-
810
- except Exception as e:
811
- self.status = f"{self.status} \n| _get_llm_functions error: {str(e)}"
812
- logger_kg.log(level=30, msg=f"{self.status} \n| _get_llm_functions error: {str(e)}")
813
- raise # Re-raise to be caught by the setup method
814
- '''
815
 
816
  '''
817
  ##SMY: record only. for deletion
 
10
  from functools import partial
11
  from typing import Tuple, Optional, Any, List, Union
12
 
13
+ from utils.utils import get_time_now_str ##SMY lightrag_openai_compatible_demo.py
14
 
15
  def install(package):
16
  import subprocess
 
86
  # Get log directory path from environment variable or use current directory
87
  #log_dir = os.getenv("LOG_DIR", os.getcwd())
88
  log_dir = os.getenv("LOG_DIR", "logs")
89
+
 
 
90
  if log_dir:
91
  log_file_path = Path(log_dir) / "lightrag_logs.log"
92
  else:
 
163
  ## Load the GraphML file
164
  G = nx.read_graphml(graphml_path)
165
 
166
+ ## Dynamically size nodes
167
+ # Calculate node attributes for sizing
168
+ node_degrees = dict(G.degree())
169
+
170
+ # Scale node degrees for better visual differentiation
171
+ max_degree = max(node_degrees.values(), default=1) or 1  # guard against empty or edgeless graphs
172
+ for node, degree in node_degrees.items():
173
+ G.nodes[node]['size'] = 10 + (degree / max_degree) * 80 #40 # scaling
174
+
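The linear scaling above maps each node's degree into a size range of roughly 10–90. As a standalone sketch (the `size_by_degree` helper is illustrative, not part of the app, and it adds a guard for empty or edgeless graphs):

```python
def size_by_degree(degrees: dict, base: float = 10, span: float = 80) -> dict:
    """Scale node degrees linearly into visual sizes in [base, base + span]."""
    # `default=1` handles an empty graph; `or 1` handles all-zero degrees
    max_degree = max(degrees.values(), default=1) or 1
    return {node: base + (degree / max_degree) * span
            for node, degree in degrees.items()}
```

With pyvis, the resulting values would be written to `G.nodes[node]['size']` before `net.from_nx(G)` so the network picks them up.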
175
  ## Create a Pyvis network
176
  #net = Network(height="100vh", notebook=True)
177
+ net = Network(notebook=True, width="100%", height="100vh") #, heading=f"Knowledge Graph Visualisation") #(noteboot=False) height="600px",
178
+ # Convert NetworkX graph to Pyvis network
179
  net.from_nx(G)
180
 
181
  # Add colors and title to nodes
182
  for node in net.nodes:
183
+ #node["color"] = "#{:06x}".format(random.randint(0, 0xFFFFFF))
184
+ node["color"] = "#{:06x}".format(random.randint(0, 0xFFFFFF))  # zero-pad to six hex digits for a valid CSS colour
185
  if "description" in node:
186
  node["title"] = node["description"]
187
 
 
192
 
193
  ## Set the 'physics' attribute to repulsion
194
  net.repulsion(node_distance=120, spring_length=200)
195
+ net.show_buttons(filter_=['physics', 'layout']) ##SMY: dynamically modify the network
196
  #net.show_buttons()
197
 
198
  ## graph path
199
+ kg_viz_html_file = f"kg_viz_{get_time_now_str(date_format='%Y-%m-%d')}.html"
 
200
  html_path = Path(working_dir) / kg_viz_html_file
201
 
 
202
  ## Save and display the generated KG network html
203
+ #net.save_graph(html_path)
204
  #net.show(html_path)
205
  net.show(str(html_path), local=True, notebook=False)
206
 
207
+ # get HTML content
208
+ html_iframe = net.generate_html(str(html_path), local=True, notebook=False)
209
+ ## need to remove ' from HTML ##assist: https://huggingface.co/spaces/simonduerr/pyvisdemo/blob/main/app.py
210
+ html_iframe = html_iframe.replace("'", "\"")
211
+
212
+ ##SMY display generated KG html
213
+ #'''
214
+ return gr.update(show_label=True, container=True, value=f"""<iframe style="width: 100%; height: 100vh;margin:0 auto" name="result" allow="midi; geolocation; microphone; camera;
215
+ display-capture; encrypted-media;" sandbox="allow-modals allow-forms
216
+ allow-scripts allow-same-origin allow-popups
217
+ allow-top-navigation-by-user-activation allow-downloads" allowfullscreen=""
218
+ allowpaymentrequest="" frameborder="0" srcdoc='{html_iframe}'></iframe>"""
219
+ )
220
+ #'''
221
 
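The `replace("'", "\"")` step above exists because the generated page is injected through the iframe's `srcdoc` attribute, which is delimited with single quotes in the returned markup. A minimal sketch of that wrapping (the `to_srcdoc_iframe` name and the trimmed sandbox list are illustrative, not the app's exact markup):

```python
def to_srcdoc_iframe(html: str, height: str = "100vh") -> str:
    """Wrap generated HTML for embedding via an iframe's srcdoc attribute.

    srcdoc is delimited with single quotes below, so any single quotes
    inside the document are swapped for double quotes first, mirroring
    the replace() call in the app.
    """
    safe = html.replace("'", '"')
    return (
        f'<iframe style="width: 100%; height: {height}; margin: 0 auto" '
        'sandbox="allow-scripts allow-same-origin" frameborder="0" '
        f"srcdoc='{safe}'></iframe>"
    )
```

In the app, the wrapped string is handed to `gr.update(value=...)` on an HTML component.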
222
  # Utility: Get all markdown files in a folder
223
  def get_markdown_files(folder: str) -> list[str]:
 
260
 
261
  if custom_system_prompt:
262
  self.system_prompt = custom_system_prompt
263
+ ''' ## system_prompt now in gradio ui
264
  else:
265
  self.system_prompt = """
266
  You are a domain expert on Cybersecurity, the South Africa landscape and South African legislation.
 
288
  - For instance, maintain a single node for Protection of Information Act, Protection of Information Act, 1982, Protection of Information Act No 84, 1982.
289
  - However, have a separate node for Protection of Personal Information Act, 2013; as it is a separate legislation.
290
  - Also take note that 'Republic of South Africa' is an official geo entity while 'South Africa' is a referred-to place, although also a geo entity:
291
+ - Always watch the context and be careful of lumping them together.
292
  """
293
+ '''
294
 
295
  return self.system_prompt
296
 
 
378
  logger.debug(f"Sending messages to Gemini: Model: {self.llm_model_name.rpartition('/')[-1]} \n~ Message: {prompt}")
379
  logger_kg.log(level=20, msg=f"Sending messages to Gemini: Model: {self.llm_model_name.rpartition('/')[-1]} \n~ Message: {prompt}")
380
 
381
+ # 2. Initialise the GenAI Client with Gemini API Key
382
  client = Client(api_key=self.llm_api_key) #api_key=gemini_api_key
383
  #aclient = genai.Client(api_key=self.llm_api_key).aio # use AsyncClient
384
 
 
474
 
475
  def _ensure_working_dir(self) -> str:
476
  """Ensure working directory exists and return status message"""
477
+
 
 
478
  if not Path(self.working_dir).exists():
479
  check_create_dir(self.working_dir)
480
  return f"Created working directory: {self.working_dir}"
481
  return f"Working directory exists: {self.working_dir}"
482
 
483
+ ##SMY: //TODO: Gradio toggle button
484
+ async def _clear_old_data_files(self):
485
+ """Clear old data files"""
486
+ files_to_delete = [
487
+ "graph_chunk_entity_relation.graphml",
488
+ "kv_store_doc_status.json",
489
+ "kv_store_full_docs.json",
490
+ "kv_store_text_chunks.json",
491
+ "vdb_chunks.json",
492
+ "vdb_entities.json",
493
+ "vdb_relationships.json",
494
+ ]
495
+
496
+ for file in files_to_delete:
497
+ file_path = Path(self.working_dir) / file
498
+ if file_path.exists():
499
+ file_path.unlink()
500
+ logger_kg.log(level=20, msg=f"LightRAG class: Deleting old files", extra={"filepath": file_path.name})
501
 
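The reset added above can be exercised in isolation. This sketch mirrors the filenames listed in the diff but uses a plain synchronous function in place of the app's async method:

```python
from pathlib import Path

# Artefacts LightRAG writes into its working directory (as listed above)
LIGHTRAG_DATA_FILES = [
    "graph_chunk_entity_relation.graphml",
    "kv_store_doc_status.json",
    "kv_store_full_docs.json",
    "kv_store_text_chunks.json",
    "vdb_chunks.json",
    "vdb_entities.json",
    "vdb_relationships.json",
]

def clear_working_dir(working_dir: str) -> list:
    """Delete known LightRAG artefacts from working_dir; return the names removed."""
    removed = []
    for name in LIGHTRAG_DATA_FILES:
        path = Path(working_dir) / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed
```

Because only known filenames are touched, any user-generated HTML visualisations in the same directory survive the reset.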
502
  async def _initialise_storages(self) -> str:
503
  #def _initialise_storages(self) -> str:
 
523
  #print(f"_embedding_func: llm_api_key_embed: {self.llm_api_key_embed}")
524
  #print(f"_embedding_func: llm_baseurl_embed: {self.llm_baseurl_embed}")
525
 
526
 
527
+ if self.working_dir_reset:
528
+ # Clear old data files
529
+ await self._clear_old_data_files()
530
+
531
  # Get embedding
532
  if self.embed_backend == "Transformer" or self.embed_backend[0] == "Transformer":
533
  logger_kg.log(level=20, msg=f"Getting embeddings dynamically through _embedding_func: ",
 
565
  await self._initialise_storages()
566
 
567
  #await rag.initialize_storages()
568
+ #await initialize_pipeline_status() ##SMY: still relevant in updated lightRAG? - """Asynchronously finalise the storages"""
569
 
570
  self.status = f"Storages and pipeline initialised successfully" ##SMY: debug
571
  logger_kg.log(level=20, msg=f"Storages and pipeline initialised successfully")
 
577
 
578
  @handle_errors
579
  #def setup(self, data_folder: str, working_dir: str, llm_backend: str,
580
+ async def setup(self, data_folder: str, working_dir: str, wdir_reset: bool, llm_backend: str, embed_backend: str,
581
  openai_key: str, openai_baseurl: str, openai_baseurl_embed: str, llm_model_name: str,
582
+ llm_model_embed: str, ollama_host: str, embed_key: str, system_prompt: str) -> str:
583
  """Set up LightRAG with specified configuration"""
584
  # Configure environment
585
  #os.environ["OPENAI_API_KEY"] = openai_key or os.getenv("OPENAI_API_KEY", "")
 
589
  #os.environ["OPENAI_API_EMBED_BASE"] = openai_baseurl_embed or os.getenv("OPENAI_API_EMBED_BASE") #, "http://localhost:1234/v1/embeddings")
590
 
591
  # Update instance state
592
+ self.data_folder = data_folder ##SMY: redundant
593
  self.working_dir = working_dir
594
+ self.working_dir_reset = wdir_reset
595
  self.llm_backend = llm_backend
596
  self.embed_backend = embed_backend if isinstance(embed_backend, str) else embed_backend[0],
597
  self.llm_model_name = llm_model_name
 
609
  except Exception as e:
610
  self.status = f"LightRAG initialisation.setup: working dir err | {str(e)}"
611
 
612
+ # Initialise lightRAG with storages
613
  try:
614
  #self.rag = wrap_async( self._initialise_rag)
615
  self.rag = await self._initialise_rag()
 
645
  '''
646
 
647
  @handle_errors
648
+ async def index_documents(self, data_folder: Union[list[str], str]) -> Tuple[str, str]:
649
  #def index_documents(self, data_folder: str) -> Tuple[str, str]:
650
  """Index markdown documents with progress tracking"""
651
  if not self._is_initialised or self.rag is None:
652
  return "Please initialise LightRAG first using the 'Initialise App' button.", "Not started"
653
 
654
+ #md_files = get_markdown_files(data_folder) #data_folder is now a list of uploaded files
655
+ #if not md_files:
656
+ # return f"No markdown files found in {data_folder}:", "No files"
657
+ md_files = data_folder
658
  if not md_files:
659
+ return f"No markdown files uploaded: {data_folder}", "No files"
660
 
661
  try:
662
  total_files = len(md_files)
 
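Since `index_documents` now receives the uploaded paths directly, the per-file work reduces to iterating that list. A rough sketch of the read-with-progress step (`read_markdown_files` is a hypothetical helper, not the app's method):

```python
from pathlib import Path

def read_markdown_files(paths: list):
    """Yield (filename, text, progress) for each uploaded markdown path."""
    total = len(paths)
    for i, path in enumerate(paths, start=1):
        text = Path(path).read_text(encoding="utf-8")
        yield Path(path).name, text, f"{i}/{total}"
```

In the app, each `text` would feed the RAG insert call while the progress string updates the status display.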
759
  """Display knowledge graph visualisation"""
760
  ## graphml_path: defaults to lightRAG's generated graph_chunk_entity_relation.graphml
761
  ## working_dir: lightRAG's working directory set by user
762
+
 
 
763
  graphml_path = Path(self.working_dir) / "graph_chunk_entity_relation.graphml"
764
  if not Path(graphml_path).exists():
765
  return "Knowledge graph file not found. Please index documents first to generate Knowledge Graph."
 
777
 
778
 
779
  ############
 
780
 
781
  '''
782
  ##SMY: record only. for deletion
utils/file_utils.py CHANGED
@@ -131,6 +131,40 @@ def create_temp_folder(tempfolder: Optional[str | Path] = '', program_name: str
131
 
132
  return output_dir
133
 
134
 
135
  ##=========
136
  def find_file(file_name: str) -> Path: #configparser.ConfigParser:
@@ -194,6 +228,8 @@ def resolve_grandparent_object(gp_object:str):
194
  ###
195
  # Create a Path object based on current file's location, resolve it to an absolute path,
196
  # and then get its parent's parent using chained .parent calls or the parents[] attribute.
 
 
197
 
198
  # 1. Get the current script's path, its parent and its grandparent directory
199
  try:
@@ -339,7 +375,7 @@ def accumulate_files(uploaded_files, current_state):
339
 
340
  from globals_config import config_load
341
  import gradio as gr
342
- # Initialize state if it's the first run
343
  if current_state is None:
344
  current_state = []
345
 
 
131
 
132
  return output_dir
133
 
134
+ def accumulate_dir(uploaded_files, current_state, ext: Union[str, tuple] = (".md", "md")):
135
+ """Accumulate newly uploaded files matching ext into the existing state."""
136
+
137
+ import gradio as gr
138
+
139
+ # Initialise state if it's the first run
140
+ if current_state is None:
141
+ current_state = []
142
+
143
+ # Check if files were uploaded in the current iteration, return the current state.
144
+ if not uploaded_files:
145
+ return current_state, gr.update(), gr.update(visible=True, value="No new files uploaded"), gr.update(value="No new files uploaded")
146
+
147
+ # call is_file_with_extension to check if pathlib.Path object is a file and has a non-empty extension
148
+ #new_file_paths = [f.name for f in uploaded_files if is_file_with_extension(Path(f.name))] #Path(f.name) and Path(f.name).is_file() and bool(Path(f.name).suffix)] #Path(f.name).suffix.lower() !=""]
149
+ new_file_paths = [f.name for f in uploaded_files if is_file_with_extension(Path(f.name)) and f.name.endswith(ext)]
150
+
151
+ # Concatenate the new files with the existing ones in the state
152
+ updated_files = current_state + new_file_paths
153
+ updated_filenames = [Path(f).name for f in updated_files] ##SMY: filenames only
154
+
155
+ updated_files_count = len(updated_files)
156
+
157
+ # Return the updated state and a message to the user
158
+ filename_info = "\n".join(updated_filenames) ##SMY: newline-joined filenames for display
159
+ #message = f"Accumulated {len(updated_files)} file(s) total: \n{filename_info}"
160
+ message_count = f"Accumulated {updated_files_count} file(s) total."
161
+ message = f"Accumulated {updated_files_count} file(s) total: \n{filename_info}"
162
+
163
+
164
+ #outputs=[state_uploaded_file_list, dir_btn, upload_count_md, status_box],
165
+ #return updated_files, updated_files_count, message, gr.update(interactive=True), gr.update(interactive=True)
166
+ return updated_files, gr.update(interactive=True,), gr.update(visible=True, value=message_count), gr.update(value=message)
167
+
168
 
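Stripped of the `gr.update()` plumbing, the accumulation in `accumulate_dir` reduces to a pure merge keyed on file extension. A sketch under that assumption (`accumulate_paths` is illustrative, not the repo function):

```python
from pathlib import Path

def accumulate_paths(uploaded, state, ext: tuple = (".md",)):
    """Merge newly uploaded paths into the running state, keeping only
    files whose suffix is in ext; return the new state and a summary."""
    state = list(state or [])
    new_paths = [p for p in (uploaded or []) if Path(p).suffix.lower() in ext]
    merged = state + new_paths
    return merged, f"Accumulated {len(merged)} file(s) total."
```

The real function additionally returns `gr.update()` objects to enable the directory button and surface the running count and filename list in the UI.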
169
  ##=========
170
  def find_file(file_name: str) -> Path: #configparser.ConfigParser:
 
228
  ###
229
  # Create a Path object based on current file's location, resolve it to an absolute path,
230
  # and then get its parent's parent using chained .parent calls or the parents[] attribute.
231
+
232
+ #import sys
233
 
234
  # 1. Get the current script's path, its parent and its grandparent directory
235
  try:
 
375
 
376
  from globals_config import config_load
377
  import gradio as gr
378
+ # Initialise state if it's the first run
379
  if current_state is None:
380
  current_state = []
381
 
utils/llm_login.py CHANGED
@@ -29,7 +29,7 @@ def get_login_token( api_token_arg, oauth_token):
29
 
30
  def login_huggingface(token: Optional[str] = None):
31
  """
32
- Login to Hugging Face account. Prioritize CLI login for privacy and determinism.
33
 
34
  Attempts to log in to Hugging Face Hub.
35
  First, it tries to log in interactively via the Hugging Face CLI.
 
29
 
30
  def login_huggingface(token: Optional[str] = None):
31
  """
32
+ Login to Hugging Face account. Prioritise CLI login for privacy and determinism.
33
 
34
  Attempts to log in to Hugging Face Hub.
35
  First, it tries to log in interactively via the Hugging Face CLI.