Update app.py
app.py
CHANGED
|
@@ -199,29 +199,38 @@ processor = st.session_state.processor
|
|
| 199 |
manager = st.session_state.manager
|
| 200 |
|
| 201 |
st.title("Chunk Webpage Content Editor")
|
| 202 |
-
st.caption("A tool to fetch, chunk, and refine web content.")
|
| 203 |
st.markdown(
|
| 204 |
"Developed by [Emilija Gjorgjevska](https://www.linkedin.com/in/emilijagjorgjevska/). "
|
| 205 |
"Inspired by Andrea Volpini's [work on content chunking](https://www.linkedin.com/pulse/understanding-chunking-google-ai-mode-practical-content-volpini-zseaf/)")
|
| 206 |
|
| 207 |
-
|
|
|
|
| 208 |
st.info(
|
| 209 |
"""
|
| 210 |
-
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
**
|
| 214 |
-
|
| 215 |
-
2. Keep an eye out for updates — v0.x → v1.0 is coming soon!
|
| 216 |
---
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
|
| 224 |
-
|
| 225 |
|
| 226 |
url_input = st.text_input("Enter a webpage URL to start", key="url_input")
|
| 227 |
|
|
@@ -276,7 +285,7 @@ with tab1:
|
|
| 276 |
selected_chunk = manager.get_chunk_by_id(st.session_state.selected_chunk_id)
|
| 277 |
|
| 278 |
if selected_chunk:
|
| 279 |
-
# ---
|
| 280 |
editor_col, preview_col = st.columns(2)
|
| 281 |
|
| 282 |
with editor_col:
|
|
|
|
| 199 |
manager = st.session_state.manager
|
| 200 |
|
| 201 |
st.title("Chunk Webpage Content Editor")
|
| 202 |
+
st.caption("A tool to fetch, chunk, and refine web content for AI synthesis.")
|
| 203 |
st.markdown(
|
| 204 |
"Developed by [Emilija Gjorgjevska](https://www.linkedin.com/in/emilijagjorgjevska/). "
|
| 205 |
"Inspired by Andrea Volpini's [work on content chunking](https://www.linkedin.com/pulse/understanding-chunking-google-ai-mode-practical-content-volpini-zseaf/)")
|
| 206 |
|
| 207 |
+
# --- MODIFIED: Added concise guidelines to the expander ---
|
| 208 |
+
with st.expander("ℹ️ App Information & AI Writing Guidelines", expanded=False):
|
| 209 |
st.info(
|
| 210 |
"""
|
| 211 |
+
### How Layout-Based Chunking is Implemented Here
|
| 212 |
+
This app uses a two-step process to create meaningful chunks based on a document’s structure:
|
| 213 |
+
1. **Structural Preservation (HTML → Markdown):** It converts a webpage’s HTML into Markdown, preserving the original layout and hierarchy (e.g., `<h1>` becomes `#`).
|
| 214 |
+
2. **Layout-Aware Parsing (`MarkdownNodeParser`):** It then uses LlamaIndex’s `MarkdownNodeParser` to split the Markdown at logical boundaries (like headers), yielding context-aware chunks that respect the original sections.
|
| 215 |
+
|
|
|
|
| 216 |
---
|
| 217 |
+
|
| 218 |
+
### Writing for AI Verifiability: A Quick Guide
|
| 219 |
+
To ensure your content is selected and cited by AI, focus on making each chunk clear, coherent, and verifiable.
|
| 220 |
+
|
| 221 |
+
* **Structure with Headers:** Use a logical hierarchy of headings (H1, H2, H3) in your source content. The app uses these to create chunks.
|
| 222 |
+
* **Write for Clarity:**
|
| 223 |
+
* Use short, direct sentences.
|
| 224 |
+
* State facts explicitly—don't make the AI guess.
|
| 225 |
+
* Follow the "one idea per paragraph" rule to create self-contained, meaningful chunks.
|
| 226 |
+
* **Create Verifiable Blocks:** Format content as direct definitions, Q&A sections, or step-by-step guides. These are ideal formats for AI to extract and use as answers.
|
| 227 |
+
* **Use the Editor's Metrics:** In the "Chunk Editor" tab, use the real-time stats to guide your writing.
|
| 228 |
+
* **Reading Ease:** Aim for a score **above 60**.
|
| 229 |
+
* **Word Count:** Keep chunks within the target range (e.g., 40-600 words).
|
| 230 |
+
* The colors (red/green) will show if you are meeting the targets set in the "Settings" tab.
|
| 231 |
+
"""
|
| 232 |
+
, icon="💡")
|
| 233 |
+
|
| 234 |
|
| 235 |
url_input = st.text_input("Enter a webpage URL to start", key="url_input")
|
| 236 |
|
|
|
|
| 285 |
selected_chunk = manager.get_chunk_by_id(st.session_state.selected_chunk_id)
|
| 286 |
|
| 287 |
if selected_chunk:
|
| 288 |
+
# --- Side-by-side layout for editor and live preview ---
|
| 289 |
editor_col, preview_col = st.columns(2)
|
| 290 |
|
| 291 |
with editor_col:
|
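Note on the new expander text: it describes splitting the converted Markdown at header boundaries and checking each chunk against a word-count target. The following is a minimal sketch of that idea using only the standard library. It is not LlamaIndex's `MarkdownNodeParser` and not the code in `app.py`. The names `split_markdown_by_headers` and `within_word_target` are hypothetical, and the 40-600 default is taken from the example range quoted in the guidelines.

```python
import re

# ATX-style Markdown headers: one to six '#' characters followed by text.
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # code-fence marker, built this way to avoid nesting fences


def _has_content(chunk: dict) -> bool:
    """A chunk is worth keeping if it has a header or any non-blank line."""
    return chunk["header"] is not None or any(l.strip() for l in chunk["lines"])


def split_markdown_by_headers(markdown: str) -> list[dict]:
    """Split Markdown into chunks, starting a new chunk at every header.

    Illustrative stand-in for a layout-aware parser (assumption: one chunk
    per header section, headers inside fenced code blocks are ignored).
    """
    chunks = []
    current = {"header": None, "lines": []}
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            if _has_content(current):
                chunks.append(current)
            current = {"header": match.group(2).strip(), "lines": []}
        else:
            current["lines"].append(line)
    if _has_content(current):
        chunks.append(current)
    return [
        {"header": c["header"], "text": "\n".join(c["lines"]).strip()}
        for c in chunks
    ]


def within_word_target(text: str, min_words: int = 40, max_words: int = 600) -> bool:
    """Return True if the whitespace-delimited word count is inside the range."""
    return min_words <= len(text.split()) <= max_words
```

Each returned chunk keeps its header text, so a chunk editor could show the section title next to the word-count indicator. The actual thresholds in the app are set in its "Settings" tab and may differ from the defaults assumed here.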