Em4e/chunk-based-text-editor

Em4e committed on Jun 9, 2025

Commit 7926d85 (verified) · 1 parent: 223e4c3

Update app.py

Files changed (1)

app.py +22 -23

@@ -213,29 +213,28 @@ st.caption("A tool to fetch, chunk, and refine web content.")
 st.markdown(
     "Developed by [Emilija Gjorgjevska](https://www.linkedin.com/in/emilijagjorgjevska/). "
     "Inspired by Andrea Volpini's [work on content chunking](https://www.linkedin.com/pulse/understanding-chunking-google-ai-mode-practical-content-volpini-zseaf/)")
-st.info(
-    """
-    • **App version:** v0.0 (alpha) — this is the very first public release, so you may run into bugs or incomplete features.
-    • **Server policy warning:** this app relies on automated requests (“bots”) under the hood.
-      If the target server enforces a restrictive bot policy (e.g., rate-limits requests, blocks unknown user-agents or IP addresses), parts of the app **may not work** as expected.
-    **What to do if you hit an issue:**
-    1. Check the server’s logs or policy settings to ensure it allows automated clients.
-    2. Keep an eye out for updates — v0.x → v1.0 is coming soon!
-    ---
-    **How Layout-Based Chunking is Implemented Here**
-    This app uses a sophisticated, two-step process to create meaningful chunks based on the document’s visual and semantic structure:
-    1. **Structural Preservation (HTML → Markdown):**
-       Converts the webpage’s HTML into Markdown, translating tags (`<h1>`, `<p>`, `<ul>`) into their Markdown equivalents (`#`, paragraph breaks, `*`) to preserve layout and hierarchy.
-    2. **Layout-Aware Parsing (`MarkdownNodeParser`):**
-       Uses LlamaIndex’s `MarkdownNodeParser` to read the structured Markdown and split it at logical boundaries (headers like `#`, `##`, etc.), yielding context-aware chunks that respect original sections.
-    _Note: Some websites may block content scraping. This is an early version, so you might encounter bugs._
-    """,
-    icon="ℹ️"
-)
 url_input = st.text_input("Enter a webpage URL to start", key="url_input")

 st.markdown(
     "Developed by [Emilija Gjorgjevska](https://www.linkedin.com/in/emilijagjorgjevska/). "
     "Inspired by Andrea Volpini's [work on content chunking](https://www.linkedin.com/pulse/understanding-chunking-google-ai-mode-practical-content-volpini-zseaf/)")
+with st.expander("ℹ️ App Information & Chunking Details", expanded=False):
+    st.info(
+        """
+        • **App version:** v0.0 (alpha) — this is the very first public release, so you may run into bugs or incomplete features.
+        • **Server policy warning:** this app relies on automated requests (“bots”) under the hood.
+          If the target server enforces a restrictive bot policy (e.g., rate-limits requests, blocks unknown user-agents or IP addresses), parts of the app **may not work** as expected.
+        **What to do if you hit an issue:**
+        1. Check the server’s logs or policy settings to ensure it allows automated clients.
+        2. Keep an eye out for updates — v0.x → v1.0 is coming soon!
+        ---
+        **How Layout-Based Chunking is Implemented Here**
+        This app uses a sophisticated, two-step process to create meaningful chunks based on the document’s visual and semantic structure:
+        1. **Structural Preservation (HTML → Markdown):**
+           Converts the webpage’s HTML into Markdown, translating tags (`<h1>`, `<p>`, `<ul>`) into their Markdown equivalents (`#`, paragraph breaks, `*`) to preserve layout and hierarchy.
+        2. **Layout-Aware Parsing (`MarkdownNodeParser`):**
+           Uses LlamaIndex’s `MarkdownNodeParser` to read the structured Markdown and split it at logical boundaries (headers like `#`, `##`, etc.), yielding context-aware chunks that respect original sections.
+        _Note: Some websites may block content scraping. This is an early version, so you might encounter bugs._
+        """
+    , icon="ℹ️")
 url_input = st.text_input("Enter a webpage URL to start", key="url_input")
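
The info text in this hunk describes a two-step process: convert HTML to Markdown, then split the Markdown at header boundaries. As a rough illustration only, the sketch below reproduces that idea with the Python standard library. It is not the code in app.py and it does not use LlamaIndex's `MarkdownNodeParser`; the names `SimpleMarkdownConverter`, `html_to_markdown`, and `split_on_headers` are hypothetical stand-ins, and only a few flat, non-nested tags are handled.

```python
import re
from html.parser import HTMLParser

# Hypothetical tag-to-prefix mapping; covers only a few simple tags.
_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "* ", "p": ""}


class SimpleMarkdownConverter(HTMLParser):
    """Stand-in for step 1: turn a small subset of HTML into Markdown."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished Markdown blocks
        self._prefix = None  # prefix of the block being collected, if any
        self._text = []      # text fragments of the current block

    def handle_starttag(self, tag, attrs):
        if tag in _PREFIX:
            self._prefix = _PREFIX[tag]
            self._text = []

    def handle_data(self, data):
        # Text outside the known tags is ignored in this sketch.
        if self._prefix is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in _PREFIX and self._prefix is not None:
            text = " ".join("".join(self._text).split())
            if text:
                self.blocks.append(self._prefix + text)
            self._prefix = None


def html_to_markdown(html):
    """Convert HTML to Markdown blocks separated by blank lines."""
    parser = SimpleMarkdownConverter()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


def split_on_headers(markdown):
    """Stand-in for step 2: start a new chunk at every Markdown header line."""
    chunks, current = [], []
    for line in markdown.splitlines():
        if re.match(r"#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]
```

Calling `split_on_headers(html_to_markdown(page_html))` on a page with one `<h1>` and one `<h2>` yields two chunks, each beginning with its header, which is the "respect original sections" behaviour the info text describes. A real page needs a fuller converter and parser; this sketch ignores nesting, links, code blocks, and `#` characters inside fenced code.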