Spaces: garywelz/shadow

garywelz committed on Apr 20

Commit 5d18cec (1 parent: d1e0a5c)

Improve Shadow Space manuscript workflow

Add manuscript browser/search and Audrey-first workflow controls, plus adjustable context settings for completion generation.

Made-with: Cursor

Files changed (4)

README.md +28 -3
app.py +171 -20
llm_completion.py +63 -13
requirements.txt +0 -1

README.md CHANGED

@@ -15,6 +15,8 @@ pinned: false
 This Hugging Face Space is dedicated to completing **The Shadow of Lillya**, a novel by the late **Audrey Berger Welz**. This work serves as both a sequel and prequel to her previous novel, **Circus of the Queens**.
 ## Project Mission
 **Our primary goal is to use as much of Audrey Berger Welz's original material and original intent as possible.**
@@ -51,6 +53,27 @@ Our approach prioritizes Audrey's original material:
 See `WORKFLOW_AUDREY_FIRST.md` for detailed workflow instructions.
 ## Contributing
 This is a deeply personal project honoring Audrey's literary legacy. While this space serves as a memorial and completion effort, we welcome respectful engagement with the material.
@@ -58,9 +81,11 @@ This is a deeply personal project honoring Audrey's literary legacy. While this
 ## Key Files and Tools
 ### Material Organization
-- `manuscripts/Shadow_of_Lillya/audrey_original/` - Extracted original material
-- `manuscripts/Shadow_of_Lillya/audrey_edited/` - Edited for clarity (minimal changes)
-- `manuscripts/Shadow_of_Lillya/final_compilation/` - Final manuscript with attribution
 ### Workflow Tools
 - `extract_audrey_material.py` - Extract only Audrey's original writing

 This Hugging Face Space is dedicated to completing **The Shadow of Lillya**, a novel by the late **Audrey Berger Welz**. This work serves as both a sequel and prequel to her previous novel, **Circus of the Queens**.
+This is an ongoing **research + craft** project: we build tooling that helps preserve Audrey's authentic voice, organize source material, and (optionally) use LLMs to propose clearly attributed continuations.
 ## Project Mission
 **Our primary goal is to use as much of Audrey Berger Welz's original material and original intent as possible.**
 See `WORKFLOW_AUDREY_FIRST.md` for detailed workflow instructions.
+## How the Space works
+The Streamlit app (`app.py`) provides:
+- **Manuscript browser**: view and download the Markdown manuscript files under `manuscripts/`
+- **Search**: keyword/phrase search across manuscripts with contextual snippets
+- **Audrey-first tools**: buttons to run:
+  - `extract_audrey_material.py`
+  - `edit_audrey_material.py`
+  - `compile_final_manuscript.py`
+- **LLM completion generator** (optional): runs `llm_completion.py` with adjustable context sizes
+## LLM keys (optional)
+If you want to generate completions inside the Space, configure Space secrets:
+- `OPENAI_API_KEY` for OpenAI models
+- `ANTHROPIC_API_KEY` for Anthropic models
+You can also paste a key into the UI for a single run, but Space secrets are preferred.
 ## Contributing
 This is a deeply personal project honoring Audrey's literary legacy. While this space serves as a memorial and completion effort, we welcome respectful engagement with the material.
 ## Key Files and Tools
 ### Material Organization
+- `manuscripts/Shadow_of_Lillya/edited_version/` - Draft sent to editor (reference)
+- `manuscripts/Shadow_of_Lillya/unedited_material/` - Other draft material (reference)
+- `manuscripts/Shadow_of_Lillya/audrey_original/` - Extracted original material (generated)
+- `manuscripts/Shadow_of_Lillya/audrey_edited/` - Edited for clarity (generated)
+- `manuscripts/Shadow_of_Lillya/final_compilation/` - Final manuscript with attribution (generated)
 ### Workflow Tools
 - `extract_audrey_material.py` - Extract only Audrey's original writing
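The README's key guidance (a key pasted into the UI applies to a single run; otherwise the Space secrets are used) amounts to a simple precedence rule. The following is a minimal sketch under that reading, not code from this commit: `resolve_api_key` and `SECRET_NAMES` are hypothetical names, and only the secret names `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` come from the README.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from the UI provider label to the Space secret name.
# The secret names come from the README; the mapping itself is illustrative.
SECRET_NAMES = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def resolve_api_key(
    provider: str,
    ui_key: str = "",
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the key to use: a non-blank UI key wins, else the Space secret."""
    if ui_key.strip():
        return ui_key.strip()
    env = os.environ if env is None else env
    name = SECRET_NAMES.get(provider.lower())
    if name is None:
        raise ValueError(f"Unknown provider: {provider!r}")
    value = env.get(name, "").strip()
    return value or None  # None means "no key configured"


if __name__ == "__main__":
    fake_env = {"OPENAI_API_KEY": "sk-from-secret"}
    print(resolve_api_key("OpenAI", env=fake_env))               # sk-from-secret
    print(resolve_api_key("OpenAI", "sk-pasted", env=fake_env))  # sk-pasted
    print(resolve_api_key("Anthropic", env=fake_env))            # None
```

Passing `env` explicitly keeps the function testable without touching the real process environment.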

app.py CHANGED

@@ -2,6 +2,9 @@ import streamlit as st
 import pandas as pd
 from pathlib import Path
 import os
 # Page configuration
 st.set_page_config(
@@ -45,6 +48,54 @@ st.markdown("""
 </style>
 """, unsafe_allow_html=True)
 def main():
     # Header
     st.markdown('<h1 class="main-header">The Shadow of Lillya</h1>', unsafe_allow_html=True)
@@ -135,20 +186,107 @@ def show_home_page():
 def show_manuscripts_page():
     st.markdown('<h2 class="section-header">Original Manuscripts</h2>', unsafe_allow_html=True)
-    st.info("📚 Manuscript files will be uploaded here. Please check back soon for the complete texts.")
-    # Placeholder for manuscript display
-    st.markdown("""
-    ### Available Documents
-    Once uploaded, this section will contain:
-    - **Circus of the Queens** - The complete original novel
-    - **The Shadow of Lillya Draft** - Audrey's manuscript draft
-    - **Character Notes** - Any additional character development materials
-    - **Plot Outlines** - Story structure and planned developments
-    """)
 def show_generate_page():
     st.markdown('<h2 class="section-header">Generate Completion</h2>', unsafe_allow_html=True)
@@ -163,26 +301,35 @@ def show_generate_page():
     with col1:
         provider = st.selectbox("LLM Provider", ["OpenAI", "Anthropic"])
         model_name = st.text_input("Model Name", value="gpt-4" if provider == "OpenAI" else "claude-3-opus-20240229")
-        api_key = st.text_input("API Key", type="password", help="Or set OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable")
         max_tokens = st.slider("Max Tokens", 500, 4000, 2000)
     with col2:
         continuation_point = st.text_area("Continuation Point",
                                          placeholder="Optional: Specify where to continue from. Leave blank to continue from end of manuscript.",
                                          height=100)
-        st.info("💡 The system will automatically load all manuscripts as context for the completion.")
     if st.button("Generate Completion", type="primary"):
         with st.spinner("Generating completion..."):
             try:
-                import subprocess
                 import sys
                 # Build command
                 cmd = [sys.executable, "llm_completion.py",
                        "--model", provider.lower(),
                        "--model-name", model_name,
-                       "--max-tokens", str(max_tokens)]
                 if api_key:
                     cmd.extend(["--api-key", api_key])
@@ -200,9 +347,13 @@ def show_generate_page():
                 if result.returncode == 0:
                     st.success("✅ Completion generated successfully!")
-                    st.code(result.stdout)
                 else:
-                    st.error(f"❌ Error generating completion:\n{result.stderr}")
             except Exception as e:
                 st.error(f"❌ Error: {e}")

 import pandas as pd
 from pathlib import Path
 import os
+import re
+import subprocess
+from dataclasses import dataclass
 # Page configuration
 st.set_page_config(
 </style>
 """, unsafe_allow_html=True)
+MANUSCRIPTS_DIR = Path("manuscripts")
+@dataclass(frozen=True)
+class DocInfo:
+    label: str
+    path: Path
+    bytes: int
+    words: int
+@st.cache_data(show_spinner=False)
+def _read_text(path: str) -> str:
+    p = Path(path)
+    return p.read_text(encoding="utf-8", errors="ignore")
+def _count_words(text: str) -> int:
+    return len(re.findall(r"\b\w+\b", text))
+def _discover_markdown_docs() -> list[DocInfo]:
+    if not MANUSCRIPTS_DIR.exists():
+        return []
+    docs: list[DocInfo] = []
+    for p in sorted(MANUSCRIPTS_DIR.rglob("*.md")):
+        # Skip files inside hidden directories and hidden files (e.g. .git, .ipynb_checkpoints).
+        if any(part.startswith(".") for part in p.parts):
+            continue
+        try:
+            text = _read_text(str(p))
+            size = p.stat().st_size
+            docs.append(
+                DocInfo(
+                    label=str(p.relative_to(MANUSCRIPTS_DIR)),
+                    path=p,
+                    bytes=size,
+                    words=_count_words(text),
+                )
+            )
+        except Exception:
+            continue
+    return docs
+def _script_exists(name: str) -> bool:
+    return Path(name).exists()
+def _run_python_script(script: str, args: list[str], env: dict[str, str]) -> tuple[int, str, str]:
+    import sys  # run scripts with the same interpreter as the Streamlit app
+    cmd = [sys.executable, script, *args]
+    proc = subprocess.run(cmd, capture_output=True, text=True, env=env)
+    return proc.returncode, proc.stdout, proc.stderr
 def main():
     # Header
     st.markdown('<h1 class="main-header">The Shadow of Lillya</h1>', unsafe_allow_html=True)
 def show_manuscripts_page():
     st.markdown('<h2 class="section-header">Original Manuscripts</h2>', unsafe_allow_html=True)
+    if not MANUSCRIPTS_DIR.exists():
+        st.warning("No `manuscripts/` directory found in the Space repository.")
+        return
+    docs = _discover_markdown_docs()
+    if not docs:
+        st.warning("No manuscript markdown files found under `manuscripts/`.")
+        return
+    st.success(f"Found {len(docs)} manuscript file(s).")
+    # Project workflow status
+    st.markdown("### Audrey-first workflow status")
+    col_a, col_b, col_c = st.columns(3)
+    with col_a:
+        st.write("**Extracted original material**")
+        st.code("manuscripts/Shadow_of_Lillya/audrey_original/audrey_original_compiled.md", language=None)
+        st.write("✅" if Path("manuscripts/Shadow_of_Lillya/audrey_original/audrey_original_compiled.md").exists() else "— not generated yet")
+    with col_b:
+        st.write("**Edited for clarity**")
+        st.code("manuscripts/Shadow_of_Lillya/audrey_edited/audrey_edited_clean.md", language=None)
+        st.write("✅" if Path("manuscripts/Shadow_of_Lillya/audrey_edited/audrey_edited_clean.md").exists() else "— not generated yet")
+    with col_c:
+        st.write("**Final compilation**")
+        st.code("manuscripts/Shadow_of_Lillya/final_compilation/shadow_of_lillya_final.md", language=None)
+        st.write("✅" if Path("manuscripts/Shadow_of_Lillya/final_compilation/shadow_of_lillya_final.md").exists() else "— not generated yet")
+    st.markdown("### Run workflow tools (optional)")
+    tool_env = os.environ.copy()
+    # These scripts are local and do not require API keys.
+    col1, col2, col3 = st.columns(3)
+    with col1:
+        if st.button("1) Extract Audrey original", use_container_width=True, disabled=not _script_exists("extract_audrey_material.py")):
+            with st.spinner("Running extract_audrey_material.py..."):
+                rc, out, err = _run_python_script("extract_audrey_material.py", [], tool_env)
+            (st.success if rc == 0 else st.error)(f"Exit code: {rc}")
+            if out:
+                st.text_area("stdout", out, height=220)
+            if err:
+                st.text_area("stderr", err, height=220)
+    with col2:
+        if st.button("2) Edit for clarity", use_container_width=True, disabled=not _script_exists("edit_audrey_material.py")):
+            with st.spinner("Running edit_audrey_material.py..."):
+                rc, out, err = _run_python_script("edit_audrey_material.py", [], tool_env)
+            (st.success if rc == 0 else st.error)(f"Exit code: {rc}")
+            if out:
+                st.text_area("stdout", out, height=220)
+            if err:
+                st.text_area("stderr", err, height=220)
+    with col3:
+        if st.button("3) Compile final manuscript", use_container_width=True, disabled=not _script_exists("compile_final_manuscript.py")):
+            with st.spinner("Running compile_final_manuscript.py..."):
+                rc, out, err = _run_python_script("compile_final_manuscript.py", [], tool_env)
+            (st.success if rc == 0 else st.error)(f"Exit code: {rc}")
+            if out:
+                st.text_area("stdout", out, height=220)
+            if err:
+                st.text_area("stderr", err, height=220)
+    st.markdown("---")
+    st.markdown("### Browse manuscripts")
+    doc_by_label = {d.label: d for d in docs}
+    selected = st.selectbox("Select a document", options=list(doc_by_label.keys()))
+    info = doc_by_label[selected]
+    st.caption(f"`{info.path}` • {info.words:,} words • {info.bytes/1024/1024:.2f} MB")
+    max_chars = st.slider("Preview length (characters)", min_value=2_000, max_value=80_000, value=10_000, step=2_000)
+    text = _read_text(str(info.path))
+    preview = text[:max_chars]
+    st.text_area("Preview", preview, height=420)
+    st.download_button("Download full document", data=text, file_name=info.path.name, mime="text/markdown")
+    st.markdown("---")
+    st.markdown("### Search across manuscripts")
+    q = st.text_input("Keyword / phrase", placeholder="e.g., Lillya, circus, queen, shadow, chapter, character name…")
+    context = st.slider("Context characters around match", 40, 400, 120, 20)
+    max_hits = st.slider("Max hits to show", 5, 50, 15, 5)
+    if q.strip():
+        q_norm = q.strip()
+        hits = []
+        for d in docs:
+            t = _read_text(str(d.path))
+            for m in re.finditer(re.escape(q_norm), t, flags=re.IGNORECASE):
+                start = max(0, m.start() - context)
+                end = min(len(t), m.end() + context)
+                snippet = t[start:end].replace("\n", " ")
+                hits.append((d.label, snippet))
+                if len(hits) >= max_hits:
+                    break
+            if len(hits) >= max_hits:
+                break
+        if not hits:
+            st.info("No matches found.")
+        else:
+            st.success(f"Showing {len(hits)} match(es).")
+            for i, (label, snippet) in enumerate(hits, 1):
+                st.markdown(f"**{i}.** `{label}`")
+                st.code(snippet, language=None)
 def show_generate_page():
     st.markdown('<h2 class="section-header">Generate Completion</h2>', unsafe_allow_html=True)
     with col1:
         provider = st.selectbox("LLM Provider", ["OpenAI", "Anthropic"])
         model_name = st.text_input("Model Name", value="gpt-4" if provider == "OpenAI" else "claude-3-opus-20240229")
+        api_key = st.text_input("API Key (optional)", type="password", help="Prefer setting Space secrets (OPENAI_API_KEY / ANTHROPIC_API_KEY).")
         max_tokens = st.slider("Max Tokens", 500, 4000, 2000)
+        use_audrey_first = st.checkbox("Prefer Audrey-first edited material (if available)", value=True)
+        shadow_tail_chars = st.slider("Shadow context (last characters)", 2_000, 40_000, 12_000, 1_000)
+        circus_head_chars = st.slider("Circus context (first characters)", 0, 15_000, 4_000, 500)
     with col2:
         continuation_point = st.text_area("Continuation Point",
                                          placeholder="Optional: Specify where to continue from. Leave blank to continue from end of manuscript.",
                                          height=100)
+        target_words = st.slider("Target words (guideline)", 300, 4000, 1400, 100)
+        st.info("💡 If no API key is configured, generation will fail—use the Manuscripts tab to prep Audrey-first material first.")
     if st.button("Generate Completion", type="primary"):
         with st.spinner("Generating completion..."):
             try:
                 import sys
                 # Build command
                 cmd = [sys.executable, "llm_completion.py",
                        "--model", provider.lower(),
                        "--model-name", model_name,
+                       "--max-tokens", str(max_tokens),
+                       "--shadow-tail-chars", str(shadow_tail_chars),
+                       "--circus-chars", str(circus_head_chars),
+                       "--target-words", str(target_words)]
+                if use_audrey_first:
+                    cmd.append("--use-audrey-first")
                 if api_key:
                     cmd.extend(["--api-key", api_key])
                 if result.returncode == 0:
                     st.success("✅ Completion generated successfully!")
+                    st.text_area("Output", result.stdout, height=260)
                 else:
+                    st.error("❌ Error generating completion")
+                    if result.stdout:
+                        st.text_area("stdout", result.stdout, height=200)
+                    if result.stderr:
+                        st.text_area("stderr", result.stderr, height=240)
             except Exception as e:
                 st.error(f"❌ Error: {e}")
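The search in `show_manuscripts_page()` is a literal, case-insensitive scan that stops after `max_hits` matches. Below is a Streamlit-free sketch of the same loop, assuming documents are supplied as `(label, text)` pairs instead of `DocInfo` objects read through the cached `_read_text`; `search_snippets` is a hypothetical name, not a function in the repository.

```python
import re
from typing import Iterable, List, Tuple


def search_snippets(
    docs: Iterable[Tuple[str, str]],
    query: str,
    context: int = 120,
    max_hits: int = 15,
) -> List[Tuple[str, str]]:
    """Case-insensitive literal search returning (label, snippet) pairs.

    Each snippet keeps `context` characters on either side of the match,
    newlines are flattened to spaces, and the scan stops at `max_hits`.
    """
    q = query.strip()
    if not q:
        return []
    # re.escape makes the query match literally, not as a regex.
    pattern = re.compile(re.escape(q), flags=re.IGNORECASE)
    hits: List[Tuple[str, str]] = []
    for label, text in docs:
        for m in pattern.finditer(text):
            start = max(0, m.start() - context)
            end = min(len(text), m.end() + context)
            hits.append((label, text[start:end].replace("\n", " ")))
            if len(hits) >= max_hits:
                return hits  # same effect as the two nested `break`s
    return hits


if __name__ == "__main__":
    docs = [
        ("a.md", "Lillya walked\ninto the circus."),
        ("b.md", "The shadow of LILLYA fell."),
    ]
    for label, snippet in search_snippets(docs, "lillya", context=5):
        print(label, repr(snippet))
```

Escaping the query means punctuation such as `.` or `?` in a search phrase is matched literally rather than treated as regular-expression syntax.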

llm_completion.py CHANGED

@@ -23,6 +23,12 @@ class LLMCompletion:
         """Load all manuscript files"""
         manuscripts = {}
         manuscripts_dir = Path("manuscripts")
         # Load Circus of the Queens
         circus_dir = manuscripts_dir / "Circus_of_the_Queens"
@@ -30,11 +36,12 @@ class LLMCompletion:
             with open(md_file, 'r', encoding='utf-8') as f:
                 manuscripts['circus_of_the_queens'] = f.read()
-        # Load edited version of Shadow of Lillya
-        edited_dir = manuscripts_dir / "Shadow_of_Lillya" / "edited_version"
-        for md_file in edited_dir.glob("*.md"):
-            with open(md_file, 'r', encoding='utf-8') as f:
-                manuscripts['shadow_edited'] = f.read()
         # Load unedited material
         unedited_dir = manuscripts_dir / "Shadow_of_Lillya" / "unedited_material"
@@ -55,21 +62,41 @@ class LLMCompletion:
         return manuscripts
-    def create_prompt(self, manuscripts: Dict[str, str], continuation_point: Optional[str] = None) -> str:
         """Create a prompt for LLM completion"""
         prompt = f"""You are completing the novel "The Shadow of Lillya" by Audrey Berger Welz. This is a sequel/prequel to her novel "Circus of the Queens."
 CONTEXT - CIRCUS OF THE QUEENS:
-{manuscripts.get('circus_of_the_queens', '')[:5000]}...
 CURRENT MANUSCRIPT - THE SHADOW OF LILLYA (Edited Version):
-{manuscripts.get('shadow_edited', '')}
 ADDITIONAL MATERIAL - UNEDITED VERSIONS:
-{manuscripts.get('shadow_unedited', '')[:3000]}...
 NOTES AND OUTLINES:
-{manuscripts.get('notes', '')}
 INSTRUCTIONS:
 1. Continue the story from where Audrey left off, maintaining her unique voice and writing style
@@ -81,7 +108,11 @@ INSTRUCTIONS:
 CONTINUATION POINT:
 {continuation_point if continuation_point else 'Continue from the end of the edited manuscript.'}
-Please continue the novel, writing approximately 1000-2000 words that seamlessly continue from where Audrey's manuscript ends."""
         return prompt
@@ -205,6 +236,18 @@ def main():
                        help='Maximum tokens to generate')
     parser.add_argument('--continuation-point', type=str,
                        help='Specific point in text to continue from')
     args = parser.parse_args()
@@ -216,7 +259,15 @@ def main():
     # Create prompt
     print("📝 Creating prompt...")
-    prompt = base_completion.create_prompt(manuscripts, args.continuation_point)
     print(f"  ✓ Prompt created ({len(prompt)} characters)")
     # Initialize LLM
@@ -255,4 +306,3 @@ def main():
 if __name__ == '__main__':
     exit(main())

         """Load all manuscript files"""
         manuscripts = {}
         manuscripts_dir = Path("manuscripts")
+        # Prefer Audrey-first edited material if it exists (generated by edit_audrey_material.py)
+        audrey_first_path = Path("manuscripts/Shadow_of_Lillya/audrey_edited/audrey_edited_clean.md")
+        if audrey_first_path.exists():
+            with open(audrey_first_path, "r", encoding="utf-8") as f:
+                manuscripts["shadow_edited"] = f.read()
         # Load Circus of the Queens
         circus_dir = manuscripts_dir / "Circus_of_the_Queens"
             with open(md_file, 'r', encoding='utf-8') as f:
                 manuscripts['circus_of_the_queens'] = f.read()
+        # Load edited version of Shadow of Lillya (fallback if Audrey-first not present)
+        if "shadow_edited" not in manuscripts:
+            edited_dir = manuscripts_dir / "Shadow_of_Lillya" / "edited_version"
+            # Join all files in sorted order so no file is silently overwritten by the next
+            parts = [p.read_text(encoding="utf-8") for p in sorted(edited_dir.glob("*.md"))]
+            if parts:
+                manuscripts['shadow_edited'] = "\n\n".join(parts)
         # Load unedited material
         unedited_dir = manuscripts_dir / "Shadow_of_Lillya" / "unedited_material"
         return manuscripts
+    def create_prompt(
+        self,
+        manuscripts: Dict[str, str],
+        continuation_point: Optional[str] = None,
+        *,
+        circus_chars: int = 4000,
+        shadow_tail_chars: int = 12000,
+        unedited_chars: int = 2000,
+        notes_chars: int = 4000,
+        target_words: int = 1400,
+    ) -> str:
         """Create a prompt for LLM completion"""
+        circus = manuscripts.get("circus_of_the_queens", "")
+        shadow = manuscripts.get("shadow_edited", "")
+        unedited = manuscripts.get("shadow_unedited", "")
+        notes = manuscripts.get("notes", "")
+        circus_excerpt = circus[: max(0, circus_chars)]
+        shadow_excerpt = shadow[-max(0, shadow_tail_chars) :] if shadow_tail_chars > 0 else ""
+        unedited_excerpt = unedited[: max(0, unedited_chars)]
+        notes_excerpt = notes[: max(0, notes_chars)]
         prompt = f"""You are completing the novel "The Shadow of Lillya" by Audrey Berger Welz. This is a sequel/prequel to her novel "Circus of the Queens."
 CONTEXT - CIRCUS OF THE QUEENS:
+{circus_excerpt}
 CURRENT MANUSCRIPT - THE SHADOW OF LILLYA (Edited Version):
+{shadow_excerpt}
 ADDITIONAL MATERIAL - UNEDITED VERSIONS:
+{unedited_excerpt}
 NOTES AND OUTLINES:
+{notes_excerpt}
 INSTRUCTIONS:
 1. Continue the story from where Audrey left off, maintaining her unique voice and writing style
 CONTINUATION POINT:
 {continuation_point if continuation_point else 'Continue from the end of the edited manuscript.'}
+OUTPUT:
+- Write approximately {target_words} words (flexible).
+- Keep continuity with the final paragraphs of the Shadow excerpt above.
+- Do not include analysis or meta commentary—just the next section of the novel text.
+"""
         return prompt
                        help='Maximum tokens to generate')
     parser.add_argument('--continuation-point', type=str,
                        help='Specific point in text to continue from')
+    parser.add_argument('--use-audrey-first', action='store_true',
+                        help='Prefer Audrey-first edited material if present. This is already the default; the flag only makes the choice explicit in logs/UI.')
+    parser.add_argument('--circus-chars', type=int, default=4000,
+                        help='How many characters of Circus to include (from the beginning)')
+    parser.add_argument('--shadow-tail-chars', type=int, default=12000,
+                        help='How many characters of Shadow to include (from the end)')
+    parser.add_argument('--unedited-chars', type=int, default=2000,
+                        help='How many characters of unedited Shadow material to include')
+    parser.add_argument('--notes-chars', type=int, default=4000,
+                        help='How many characters of notes/outlines to include')
+    parser.add_argument('--target-words', type=int, default=1400,
+                        help='Approximate word target for the continuation')
     args = parser.parse_args()
     # Create prompt
     print("📝 Creating prompt...")
+    prompt = base_completion.create_prompt(
+        manuscripts,
+        args.continuation_point,
+        circus_chars=args.circus_chars,
+        shadow_tail_chars=args.shadow_tail_chars,
+        unedited_chars=args.unedited_chars,
+        notes_chars=args.notes_chars,
+        target_words=args.target_words,
+    )
     print(f"  ✓ Prompt created ({len(prompt)} characters)")
     # Initialize LLM
 if __name__ == '__main__':
     exit(main())
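In `create_prompt`, the new context-size options become plain string slices: a head slice for Circus, the unedited material and the notes, and a tail slice for Shadow. The sketch below isolates that slicing in two hypothetical helpers (`head`, `tail`) and pairs it with a parser that mirrors only the flags added in this commit, with defaults copied from the diff. It also shows why the `if shadow_tail_chars > 0 else ""` guard is needed: in Python, `text[-0:]` returns the whole string.

```python
import argparse


def head(text: str, n: int) -> str:
    """First n characters of text; n <= 0 gives an empty string."""
    return text[: max(0, n)]


def tail(text: str, n: int) -> str:
    """Last n characters of text; n <= 0 gives an empty string.

    The explicit check matters: text[-0:] is the same as text[0:],
    i.e. the whole string, so a bare negative slice with n == 0 would
    include everything instead of nothing.
    """
    return text[-n:] if n > 0 else ""


def build_parser() -> argparse.ArgumentParser:
    # Mirrors only the options added in this commit; defaults copied from the diff.
    p = argparse.ArgumentParser(prog="llm_completion.py")
    p.add_argument("--use-audrey-first", action="store_true")
    p.add_argument("--circus-chars", type=int, default=4000)
    p.add_argument("--shadow-tail-chars", type=int, default=12000)
    p.add_argument("--unedited-chars", type=int, default=2000)
    p.add_argument("--notes-chars", type=int, default=4000)
    p.add_argument("--target-words", type=int, default=1400)
    return p


if __name__ == "__main__":
    s = "abcdefghij"
    print(s[-0:])        # abcdefghij  (the pitfall the guard avoids)
    print(tail(s, 3))    # hij
    print(head(s, 4))    # abcd
    print(tail(s, 50))   # abcdefghij  (n longer than the text is harmless)

    # Arguments of the kind the Streamlit app builds (values are examples).
    args = build_parser().parse_args(
        ["--shadow-tail-chars", "20000", "--circus-chars", "0",
         "--target-words", "1400", "--use-audrey-first"]
    )
    print(args.unedited_chars, args.notes_chars)  # 2000 4000
```

In the portion of the diff shown, the Streamlit app does not pass `--unedited-chars` or `--notes-chars`, so those two excerpts would fall back to their defaults (2,000 and 4,000 characters) when completions are launched from the Space.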

requirements.txt CHANGED

@@ -1,6 +1,5 @@
 streamlit>=1.28.0
 pandas>=2.0.0
-pathlib
 numpy>=1.24.0
 plotly>=5.15.0
 pymupdf>=1.23.0

 streamlit>=1.28.0
 pandas>=2.0.0
 numpy>=1.24.0
 plotly>=5.15.0
 pymupdf>=1.23.0