Em4e committed · Commit de55a7c · verified · 1 Parent(s): 3217d2c

Update app.py

Files changed (1)
  1. app.py +26 -17
app.py CHANGED
@@ -199,29 +199,38 @@ processor = st.session_state.processor
  manager = st.session_state.manager

  st.title("Chunk Webpage Content Editor")
- st.caption("A tool to fetch, chunk, and refine web content.")
+ st.caption("A tool to fetch, chunk, and refine web content for AI synthesis.")
  st.markdown(
  "Developed by [Emilija Gjorgjevska](https://www.linkedin.com/in/emilijagjorgjevska/). "
  "Inspired by Andrea Volpini's [work on content chunking](https://www.linkedin.com/pulse/understanding-chunking-google-ai-mode-practical-content-volpini-zseaf/)")

- with st.expander("ℹ️ App Information & Chunking Details", expanded=False):
+ # --- MODIFIED: Added concise guidelines to the expander ---
+ with st.expander("ℹ️ App Information & AI Writing Guidelines", expanded=False):
  st.info(
  """
- **App version:** v0.0 (alpha) — this is the very first public release, so you may run into bugs or incomplete features.
- **Server policy warning:** this app relies on automated requests (“bots”) under the hood.
- If the target server enforces a restrictive bot policy (e.g., rate-limits requests, blocks unknown user-agents or IP addresses), parts of the app **may not work** as expected.
- **What to do if you hit an issue:**
- 1. Check the server’s logs or policy settings to ensure it allows automated clients.
- 2. Keep an eye out for updates — v0.x → v1.0 is coming soon!
+ ### How Layout-Based Chunking is Implemented Here
+ This app uses a two-step process to create meaningful chunks based on a document’s structure:
+ 1. **Structural Preservation (HTML Markdown):** It converts a webpage’s HTML into Markdown, preserving the original layout and hierarchy (e.g., `<h1>` becomes `#`).
+ 2. **Layout-Aware Parsing (`MarkdownNodeParser`):** It then uses LlamaIndex’s `MarkdownNodeParser` to split the Markdown at logical boundaries (like headers), yielding context-aware chunks that respect the original sections.
+
  ---
- **How Layout-Based Chunking is Implemented Here**
- This app uses a sophisticated, two-step process to create meaningful chunks based on the document’s visual and semantic structure:
- 1. **Structural Preservation (HTML Markdown):**
- Converts the webpage’s HTML into Markdown, translating tags (`<h1>`, `<p>`, `<ul>`) into their Markdown equivalents (`#`, paragraph breaks, `*`) to preserve layout and hierarchy.
- 2. **Layout-Aware Parsing (`MarkdownNodeParser`):**
- Uses LlamaIndex’s `MarkdownNodeParser` to read the structured Markdown and split it at logical boundaries (headers like `#`, `##`, etc.), yielding context-aware chunks that respect original sections.
- """
- , icon="ℹ️")
+
+ ### Writing for AI Verifiability: A Quick Guide
+ To ensure your content is selected and cited by AI, focus on making each chunk clear, coherent, and verifiable.
+
+ * **Structure with Headers:** Use a logical hierarchy of headings (H1, H2, H3) in your source content. The app uses these to create chunks.
+ * **Write for Clarity:**
+ * Use short, direct sentences.
+ * State facts explicitly—don't make the AI guess.
+ * Follow the "one idea per paragraph" rule to create self-contained, meaningful chunks.
+ * **Create Verifiable Blocks:** Format content as direct definitions, Q&A sections, or step-by-step guides. These are ideal formats for AI to extract and use as answers.
+ * **Use the Editor's Metrics:** In the "Chunk Editor" tab, use the real-time stats to guide your writing.
+ * **Reading Ease:** Aim for a score **above 60**.
+ * **Word Count:** Keep chunks within the target range (e.g., 40-600 words).
+ * The colors (red/green) will show if you are meeting the targets set in the "Settings" tab.
+ """
+ , icon="💡")
+

  url_input = st.text_input("Enter a webpage URL to start", key="url_input")

@@ -276,7 +285,7 @@ with tab1:
  selected_chunk = manager.get_chunk_by_id(st.session_state.selected_chunk_id)

  if selected_chunk:
- # --- RESTORED: Side-by-side layout for editor and live preview ---
+ # --- Side-by-side layout for editor and live preview ---
  editor_col, preview_col = st.columns(2)

  with editor_col:
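
Note on the chunking behaviour described in the expander text: the diff says the app splits Markdown at header boundaries using LlamaIndex's `MarkdownNodeParser`, but the parsing code itself is not part of this commit. The sketch below is a standard-library stand-in for that idea, not the app's implementation and not the LlamaIndex API. The names `split_markdown_by_headers` and `within_word_range` are hypothetical, and the 40-600 default is taken from the example range quoted in the new guidelines.

```python
from __future__ import annotations

import re

# ATX-style Markdown header: one to six '#' characters, a space, then the title.
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_by_headers(markdown: str) -> list[dict]:
    """Start a new chunk at every header line (hypothetical helper).

    Lines inside fenced code blocks are never treated as headers.
    Text before the first header becomes a chunk whose header is None.
    """
    chunks: list[dict] = []
    current = {"header": None, "lines": []}
    in_fence = False

    for line in markdown.splitlines():
        if line.strip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADER_RE.match(line)

        # A header closes the chunk collected so far, if there is one.
        if match and (current["header"] is not None or current["lines"]):
            chunks.append(current)
            current = {"header": None, "lines": []}
        if match:
            current["header"] = match.group(2).strip()
        current["lines"].append(line)

    if current["header"] is not None or current["lines"]:
        chunks.append(current)

    # Drop chunks that contain only whitespace.
    result = []
    for chunk in chunks:
        text = "\n".join(chunk["lines"]).strip()
        if text:
            result.append({"header": chunk["header"], "text": text})
    return result


def within_word_range(text: str, low: int = 40, high: int = 600) -> bool:
    """Check a chunk against a word-count target (assumed defaults)."""
    return low <= len(text.split()) <= high
```

Each returned chunk keeps its own header line in `text`, so a section stays readable on its own, which is the property the "Structure with Headers" guideline relies on.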