Improve Shadow Space manuscript workflow
Add manuscript browser/search and Audrey-first workflow controls, plus adjustable context settings for completion generation.
Made-with: Cursor
- README.md +28 -3
- app.py +171 -20
- llm_completion.py +63 -13
- requirements.txt +0 -1
README.md
CHANGED
|
@@ -15,6 +15,8 @@ pinned: false
|
|
| 15 |
|
| 16 |
This Hugging Face Space is dedicated to completing **The Shadow of Lillya**, a novel by the late **Audrey Berger Welz**. This work serves as both a sequel and prequel to her previous novel, **Circus of the Queens**.
|
| 17 |
|
| 18 |
## Project Mission
|
| 19 |
|
| 20 |
**Our primary goal is to use as much of Audrey Berger Welz's original material and original intent as possible.**
|
|
@@ -51,6 +53,27 @@ Our approach prioritizes Audrey's original material:
|
|
| 51 |
|
| 52 |
See `WORKFLOW_AUDREY_FIRST.md` for detailed workflow instructions.
|
| 53 |
|
| 54 |
## Contributing
|
| 55 |
|
| 56 |
This is a deeply personal project honoring Audrey's literary legacy. While this space serves as a memorial and completion effort, we welcome respectful engagement with the material.
|
|
@@ -58,9 +81,11 @@ This is a deeply personal project honoring Audrey's literary legacy. While this
|
|
| 58 |
## Key Files and Tools
|
| 59 |
|
| 60 |
### Material Organization
|
| 61 |
-
- `manuscripts/Shadow_of_Lillya/
|
| 62 |
-
- `manuscripts/Shadow_of_Lillya/
|
| 63 |
-
- `manuscripts/Shadow_of_Lillya/
|
| 64 |
|
| 65 |
### Workflow Tools
|
| 66 |
- `extract_audrey_material.py` - Extract only Audrey's original writing
|
|
|
|
| 15 |
|
| 16 |
This Hugging Face Space is dedicated to completing **The Shadow of Lillya**, a novel by the late **Audrey Berger Welz**. This work serves as both a sequel and prequel to her previous novel, **Circus of the Queens**.
|
| 17 |
|
| 18 |
+
This is an ongoing **research + craft** project: build tooling that helps preserve Audrey’s authentic voice, organize source material, and (optionally) use LLMs to propose clearly-attributed continuations.
|
| 19 |
+
|
| 20 |
## Project Mission
|
| 21 |
|
| 22 |
**Our primary goal is to use as much of Audrey Berger Welz's original material and original intent as possible.**
|
|
|
|
| 53 |
|
| 54 |
See `WORKFLOW_AUDREY_FIRST.md` for detailed workflow instructions.
|
| 55 |
|
| 56 |
+
## How the Space works
|
| 57 |
+
|
| 58 |
+
The Streamlit app (`app.py`) provides:
|
| 59 |
+
|
| 60 |
+
- **Manuscript browser**: view and download the markdown manuscript files under `manuscripts/`
|
| 61 |
+
- **Search**: keyword/phrase search across manuscripts with contextual snippets
|
| 62 |
+
- **Audrey-first tools**: buttons to run:
|
| 63 |
+
- `extract_audrey_material.py`
|
| 64 |
+
- `edit_audrey_material.py`
|
| 65 |
+
- `compile_final_manuscript.py`
|
| 66 |
+
- **LLM completion generator** (optional): runs `llm_completion.py` with adjustable context sizes
|
| 67 |
+
|
| 68 |
+
## LLM keys (optional)
|
| 69 |
+
|
| 70 |
+
If you want to generate completions inside the Space, configure Space secrets:
|
| 71 |
+
|
| 72 |
+
- `OPENAI_API_KEY` for OpenAI models
|
| 73 |
+
- `ANTHROPIC_API_KEY` for Anthropic models
|
| 74 |
+
|
| 75 |
+
You can also paste a key into the UI for a single run, but Space secrets are preferred.
|
| 76 |
+
|
| 77 |
## Contributing
|
| 78 |
|
| 79 |
This is a deeply personal project honoring Audrey's literary legacy. While this space serves as a memorial and completion effort, we welcome respectful engagement with the material.
|
|
|
|
| 81 |
## Key Files and Tools
|
| 82 |
|
| 83 |
### Material Organization
|
| 84 |
+
- `manuscripts/Shadow_of_Lillya/edited_version/` - Draft sent to editor (reference)
|
| 85 |
+
- `manuscripts/Shadow_of_Lillya/unedited_material/` - Other draft material (reference)
|
| 86 |
+
- `manuscripts/Shadow_of_Lillya/audrey_original/` - Extracted original material (generated)
|
| 87 |
+
- `manuscripts/Shadow_of_Lillya/audrey_edited/` - Edited for clarity (generated)
|
| 88 |
+
- `manuscripts/Shadow_of_Lillya/final_compilation/` - Final manuscript with attribution (generated)
|
| 89 |
|
| 90 |
### Workflow Tools
|
| 91 |
- `extract_audrey_material.py` - Extract only Audrey's original writing
|
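The key-handling guidance in the README's "LLM keys (optional)" section can be sketched as follows. This is a minimal illustration, not code from this commit: `resolve_api_key` and `ENV_VAR_BY_PROVIDER` are hypothetical names, and the precedence shown (a key pasted into the UI for a single run, otherwise the Space secret) is an assumption about how the two sources combine. Hugging Face exposes Space secrets to the running app as environment variables, which is why the lookup goes through `os.environ`.

```python
import os
from typing import Mapping, Optional

# Secret names come from the README; the mapping itself is illustrative.
ENV_VAR_BY_PROVIDER = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def resolve_api_key(
    provider: str,
    ui_key: str = "",
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the key to use for one run, or None if none is configured."""
    env = os.environ if env is None else env
    # Assumed precedence: a key pasted in the UI applies to this run only.
    if ui_key.strip():
        return ui_key.strip()
    var = ENV_VAR_BY_PROVIDER.get(provider.lower())
    if var is None:
        return None
    # Treat an empty secret the same as a missing one.
    return env.get(var) or None
```

Passing `env` explicitly keeps the helper testable without touching the real environment; in the Space it would be called with the provider selected in the UI and the optional pasted key.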
app.py
CHANGED
|
@@ -2,6 +2,9 @@ import streamlit as st
|
|
| 2 |
import pandas as pd
|
| 3 |
from pathlib import Path
|
| 4 |
import os
|
| 5 |
|
| 6 |
# Page configuration
|
| 7 |
st.set_page_config(
|
|
@@ -45,6 +48,54 @@ st.markdown("""
|
|
| 45 |
</style>
|
| 46 |
""", unsafe_allow_html=True)
|
| 47 |
|
| 48 |
def main():
|
| 49 |
# Header
|
| 50 |
st.markdown('<h1 class="main-header">The Shadow of Lillya</h1>', unsafe_allow_html=True)
|
|
@@ -135,20 +186,107 @@ def show_home_page():
|
|
| 135 |
|
| 136 |
def show_manuscripts_page():
|
| 137 |
st.markdown('<h2 class="section-header">Original Manuscripts</h2>', unsafe_allow_html=True)
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
""
|
| 152 |
|
| 153 |
def show_generate_page():
|
| 154 |
st.markdown('<h2 class="section-header">Generate Completion</h2>', unsafe_allow_html=True)
|
|
@@ -163,26 +301,35 @@ def show_generate_page():
|
|
| 163 |
with col1:
|
| 164 |
provider = st.selectbox("LLM Provider", ["OpenAI", "Anthropic"])
|
| 165 |
model_name = st.text_input("Model Name", value="gpt-4" if provider == "OpenAI" else "claude-3-opus-20240229")
|
| 166 |
-
api_key = st.text_input("API Key", type="password", help="
|
| 167 |
max_tokens = st.slider("Max Tokens", 500, 4000, 2000)
|
| 168 |
|
| 169 |
with col2:
|
| 170 |
continuation_point = st.text_area("Continuation Point",
|
| 171 |
placeholder="Optional: Specify where to continue from. Leave blank to continue from end of manuscript.",
|
| 172 |
height=100)
|
| 173 |
-
st.
|
| 174 |
|
| 175 |
if st.button("Generate Completion", type="primary"):
|
| 176 |
with st.spinner("Generating completion..."):
|
| 177 |
try:
|
| 178 |
-
import subprocess
|
| 179 |
import sys
|
| 180 |
|
| 181 |
# Build command
|
| 182 |
cmd = [sys.executable, "llm_completion.py",
|
| 183 |
"--model", provider.lower(),
|
| 184 |
"--model-name", model_name,
|
| 185 |
-
"--max-tokens", str(max_tokens)
|
| 186 |
|
| 187 |
if api_key:
|
| 188 |
cmd.extend(["--api-key", api_key])
|
|
@@ -200,9 +347,13 @@ def show_generate_page():
|
|
| 200 |
|
| 201 |
if result.returncode == 0:
|
| 202 |
st.success("✅ Completion generated successfully!")
|
| 203 |
-
st.
|
| 204 |
else:
|
| 205 |
-
st.error(
|
| 206 |
except Exception as e:
|
| 207 |
st.error(f"❌ Error: {e}")
|
| 208 |
|
|
|
|
| 2 |
import pandas as pd
|
| 3 |
from pathlib import Path
|
| 4 |
import os
|
| 5 |
+
import re
|
| 6 |
+
import subprocess
|
| 7 |
+
from dataclasses import dataclass
|
| 8 |
|
| 9 |
# Page configuration
|
| 10 |
st.set_page_config(
|
|
|
|
| 48 |
</style>
|
| 49 |
""", unsafe_allow_html=True)
|
| 50 |
|
| 51 |
+
MANUSCRIPTS_DIR = Path("manuscripts")
|
| 52 |
+
|
| 53 |
+
@dataclass(frozen=True)
|
| 54 |
+
class DocInfo:
|
| 55 |
+
label: str
|
| 56 |
+
path: Path
|
| 57 |
+
bytes: int
|
| 58 |
+
words: int
|
| 59 |
+
|
| 60 |
+
@st.cache_data(show_spinner=False)
|
| 61 |
+
def _read_text(path: str) -> str:
|
| 62 |
+
p = Path(path)
|
| 63 |
+
return p.read_text(encoding="utf-8", errors="ignore")
|
| 64 |
+
|
| 65 |
+
def _count_words(text: str) -> int:
|
| 66 |
+
return len(re.findall(r"\b\w+\b", text))
|
| 67 |
+
|
| 68 |
+
def _discover_markdown_docs() -> list[DocInfo]:
|
| 69 |
+
if not MANUSCRIPTS_DIR.exists():
|
| 70 |
+
return []
|
| 71 |
+
docs: list[DocInfo] = []
|
| 72 |
+
for p in sorted(MANUSCRIPTS_DIR.rglob("*.md")):
|
| 73 |
+
# Avoid showing huge autogenerated artifacts by default; user can still browse files directly.
|
| 74 |
+
if any(part.startswith(".") for part in p.parts):
|
| 75 |
+
continue
|
| 76 |
+
try:
|
| 77 |
+
text = _read_text(str(p))
|
| 78 |
+
size = p.stat().st_size
|
| 79 |
+
docs.append(
|
| 80 |
+
DocInfo(
|
| 81 |
+
label=str(p.relative_to(MANUSCRIPTS_DIR)),
|
| 82 |
+
path=p,
|
| 83 |
+
bytes=size,
|
| 84 |
+
words=_count_words(text),
|
| 85 |
+
)
|
| 86 |
+
)
|
| 87 |
+
except Exception:
|
| 88 |
+
continue
|
| 89 |
+
return docs
|
| 90 |
+
|
| 91 |
+
def _script_exists(name: str) -> bool:
|
| 92 |
+
return Path(name).exists()
|
| 93 |
+
|
| 94 |
+
def _run_python_script(script: str, args: list[str], env: dict[str, str]) -> tuple[int, str, str]:
|
| 95 |
+
cmd = ["python3", script, *args]
|
| 96 |
+
proc = subprocess.run(cmd, capture_output=True, text=True, env=env)
|
| 97 |
+
return proc.returncode, proc.stdout, proc.stderr
|
| 98 |
+
|
| 99 |
def main():
|
| 100 |
# Header
|
| 101 |
st.markdown('<h1 class="main-header">The Shadow of Lillya</h1>', unsafe_allow_html=True)
|
|
|
|
| 186 |
|
| 187 |
def show_manuscripts_page():
|
| 188 |
st.markdown('<h2 class="section-header">Original Manuscripts</h2>', unsafe_allow_html=True)
|
| 189 |
+
|
| 190 |
+
if not MANUSCRIPTS_DIR.exists():
|
| 191 |
+
st.warning("No `manuscripts/` directory found in the Space repository.")
|
| 192 |
+
return
|
| 193 |
+
|
| 194 |
+
docs = _discover_markdown_docs()
|
| 195 |
+
if not docs:
|
| 196 |
+
st.warning("No manuscript markdown files found under `manuscripts/`.")
|
| 197 |
+
return
|
| 198 |
+
|
| 199 |
+
st.success(f"Found {len(docs)} manuscript file(s).")
|
| 200 |
+
|
| 201 |
+
# Project workflow status
|
| 202 |
+
st.markdown("### Audrey-first workflow status")
|
| 203 |
+
col_a, col_b, col_c = st.columns(3)
|
| 204 |
+
with col_a:
|
| 205 |
+
st.write("**Extracted original material**")
|
| 206 |
+
st.code("manuscripts/Shadow_of_Lillya/audrey_original/audrey_original_compiled.md", language=None)
|
| 207 |
+
st.write("✅" if Path("manuscripts/Shadow_of_Lillya/audrey_original/audrey_original_compiled.md").exists() else "— not generated yet")
|
| 208 |
+
with col_b:
|
| 209 |
+
st.write("**Edited for clarity**")
|
| 210 |
+
st.code("manuscripts/Shadow_of_Lillya/audrey_edited/audrey_edited_clean.md", language=None)
|
| 211 |
+
st.write("✅" if Path("manuscripts/Shadow_of_Lillya/audrey_edited/audrey_edited_clean.md").exists() else "— not generated yet")
|
| 212 |
+
with col_c:
|
| 213 |
+
st.write("**Final compilation**")
|
| 214 |
+
st.code("manuscripts/Shadow_of_Lillya/final_compilation/shadow_of_lillya_final.md", language=None)
|
| 215 |
+
st.write("✅" if Path("manuscripts/Shadow_of_Lillya/final_compilation/shadow_of_lillya_final.md").exists() else "— not generated yet")
|
| 216 |
+
|
| 217 |
+
st.markdown("### Run workflow tools (optional)")
|
| 218 |
+
tool_env = os.environ.copy()
|
| 219 |
+
# These scripts are local and do not require API keys.
|
| 220 |
+
col1, col2, col3 = st.columns(3)
|
| 221 |
+
with col1:
|
| 222 |
+
if st.button("1) Extract Audrey original", use_container_width=True, disabled=not _script_exists("extract_audrey_material.py")):
|
| 223 |
+
with st.spinner("Running extract_audrey_material.py..."):
|
| 224 |
+
rc, out, err = _run_python_script("extract_audrey_material.py", [], tool_env)
|
| 225 |
+
(st.success if rc == 0 else st.error)(f"Exit code: {rc}")
|
| 226 |
+
if out:
|
| 227 |
+
st.text_area("stdout", out, height=220)
|
| 228 |
+
if err:
|
| 229 |
+
st.text_area("stderr", err, height=220)
|
| 230 |
+
with col2:
|
| 231 |
+
if st.button("2) Edit for clarity", use_container_width=True, disabled=not _script_exists("edit_audrey_material.py")):
|
| 232 |
+
with st.spinner("Running edit_audrey_material.py..."):
|
| 233 |
+
rc, out, err = _run_python_script("edit_audrey_material.py", [], tool_env)
|
| 234 |
+
(st.success if rc == 0 else st.error)(f"Exit code: {rc}")
|
| 235 |
+
if out:
|
| 236 |
+
st.text_area("stdout", out, height=220)
|
| 237 |
+
if err:
|
| 238 |
+
st.text_area("stderr", err, height=220)
|
| 239 |
+
with col3:
|
| 240 |
+
if st.button("3) Compile final manuscript", use_container_width=True, disabled=not _script_exists("compile_final_manuscript.py")):
|
| 241 |
+
with st.spinner("Running compile_final_manuscript.py..."):
|
| 242 |
+
rc, out, err = _run_python_script("compile_final_manuscript.py", [], tool_env)
|
| 243 |
+
(st.success if rc == 0 else st.error)(f"Exit code: {rc}")
|
| 244 |
+
if out:
|
| 245 |
+
st.text_area("stdout", out, height=220)
|
| 246 |
+
if err:
|
| 247 |
+
st.text_area("stderr", err, height=220)
|
| 248 |
+
|
| 249 |
+
st.markdown("---")
|
| 250 |
+
st.markdown("### Browse manuscripts")
|
| 251 |
+
doc_by_label = {d.label: d for d in docs}
|
| 252 |
+
selected = st.selectbox("Select a document", options=list(doc_by_label.keys()))
|
| 253 |
+
info = doc_by_label[selected]
|
| 254 |
+
st.caption(f"`{info.path}` • {info.words:,} words • {info.bytes/1024/1024:.2f} MB")
|
| 255 |
+
|
| 256 |
+
max_chars = st.slider("Preview length (characters)", min_value=2_000, max_value=80_000, value=10_000, step=2_000)
|
| 257 |
+
text = _read_text(str(info.path))
|
| 258 |
+
preview = text[:max_chars]
|
| 259 |
+
st.text_area("Preview", preview, height=420)
|
| 260 |
+
st.download_button("Download full document", data=text, file_name=info.path.name, mime="text/markdown")
|
| 261 |
+
|
| 262 |
+
st.markdown("---")
|
| 263 |
+
st.markdown("### Search across manuscripts")
|
| 264 |
+
q = st.text_input("Keyword / phrase", placeholder="e.g., Lillya, circus, queen, shadow, chapter, character name…")
|
| 265 |
+
context = st.slider("Context characters around match", 40, 400, 120, 20)
|
| 266 |
+
max_hits = st.slider("Max hits to show", 5, 50, 15, 5)
|
| 267 |
+
|
| 268 |
+
if q.strip():
|
| 269 |
+
q_norm = q.strip()
|
| 270 |
+
hits = []
|
| 271 |
+
for d in docs:
|
| 272 |
+
t = _read_text(str(d.path))
|
| 273 |
+
for m in re.finditer(re.escape(q_norm), t, flags=re.IGNORECASE):
|
| 274 |
+
start = max(0, m.start() - context)
|
| 275 |
+
end = min(len(t), m.end() + context)
|
| 276 |
+
snippet = t[start:end].replace("\n", " ")
|
| 277 |
+
hits.append((d.label, snippet))
|
| 278 |
+
if len(hits) >= max_hits:
|
| 279 |
+
break
|
| 280 |
+
if len(hits) >= max_hits:
|
| 281 |
+
break
|
| 282 |
+
|
| 283 |
+
if not hits:
|
| 284 |
+
st.info("No matches found.")
|
| 285 |
+
else:
|
| 286 |
+
st.success(f"Showing {len(hits)} match(es).")
|
| 287 |
+
for i, (label, snippet) in enumerate(hits, 1):
|
| 288 |
+
st.markdown(f"**{i}.** `{label}`")
|
| 289 |
+
st.code(snippet, language=None)
|
| 290 |
|
| 291 |
def show_generate_page():
|
| 292 |
st.markdown('<h2 class="section-header">Generate Completion</h2>', unsafe_allow_html=True)
|
|
|
|
| 301 |
with col1:
|
| 302 |
provider = st.selectbox("LLM Provider", ["OpenAI", "Anthropic"])
|
| 303 |
model_name = st.text_input("Model Name", value="gpt-4" if provider == "OpenAI" else "claude-3-opus-20240229")
|
| 304 |
+
api_key = st.text_input("API Key (optional)", type="password", help="Prefer setting Space secrets (OPENAI_API_KEY / ANTHROPIC_API_KEY).")
|
| 305 |
max_tokens = st.slider("Max Tokens", 500, 4000, 2000)
|
| 306 |
+
use_audrey_first = st.checkbox("Prefer Audrey-first edited material (if available)", value=True)
|
| 307 |
+
shadow_tail_chars = st.slider("Shadow context (last characters)", 2_000, 40_000, 12_000, 1_000)
|
| 308 |
+
circus_head_chars = st.slider("Circus context (first characters)", 0, 15_000, 4_000, 500)
|
| 309 |
|
| 310 |
with col2:
|
| 311 |
continuation_point = st.text_area("Continuation Point",
|
| 312 |
placeholder="Optional: Specify where to continue from. Leave blank to continue from end of manuscript.",
|
| 313 |
height=100)
|
| 314 |
+
target_words = st.slider("Target words (guideline)", 300, 4000, 1400, 100)
|
| 315 |
+
st.info("💡 If no API key is configured, generation will fail—use the Manuscripts tab to prep Audrey-first material first.")
|
| 316 |
|
| 317 |
if st.button("Generate Completion", type="primary"):
|
| 318 |
with st.spinner("Generating completion..."):
|
| 319 |
try:
|
|
|
|
| 320 |
import sys
|
| 321 |
|
| 322 |
# Build command
|
| 323 |
cmd = [sys.executable, "llm_completion.py",
|
| 324 |
"--model", provider.lower(),
|
| 325 |
"--model-name", model_name,
|
| 326 |
+
"--max-tokens", str(max_tokens),
|
| 327 |
+
"--shadow-tail-chars", str(shadow_tail_chars),
|
| 328 |
+
"--circus-chars", str(circus_head_chars),
|
| 329 |
+
"--target-words", str(target_words)]
|
| 330 |
+
|
| 331 |
+
if use_audrey_first:
|
| 332 |
+
cmd.append("--use-audrey-first")
|
| 333 |
|
| 334 |
if api_key:
|
| 335 |
cmd.extend(["--api-key", api_key])
|
|
|
|
| 347 |
|
| 348 |
if result.returncode == 0:
|
| 349 |
st.success("✅ Completion generated successfully!")
|
| 350 |
+
st.text_area("Output", result.stdout, height=260)
|
| 351 |
else:
|
| 352 |
+
st.error("❌ Error generating completion")
|
| 353 |
+
if result.stdout:
|
| 354 |
+
st.text_area("stdout", result.stdout, height=200)
|
| 355 |
+
if result.stderr:
|
| 356 |
+
st.text_area("stderr", result.stderr, height=240)
|
| 357 |
except Exception as e:
|
| 358 |
st.error(f"❌ Error: {e}")
|
| 359 |
|
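The search loop added to `show_manuscripts_page()` can be isolated into a small function for testing outside Streamlit. The sketch below mirrors the logic in the diff (literal, case-insensitive matching via `re.escape`, a fixed character window around each match, and a global hit cap); the function name `search_snippets` and the plain `dict` of label-to-text are assumptions made for illustration, since the app itself iterates over `DocInfo` objects and reads files through `_read_text`.

```python
import re
from typing import Dict, List, Tuple


def search_snippets(
    docs: Dict[str, str],
    query: str,
    context: int = 120,
    max_hits: int = 15,
) -> List[Tuple[str, str]]:
    """Case-insensitive literal search returning (label, snippet) pairs."""
    query = query.strip()
    hits: List[Tuple[str, str]] = []
    if not query:
        return hits
    # re.escape makes the query match literally, so "." or "(" are not regex syntax.
    pattern = re.compile(re.escape(query), flags=re.IGNORECASE)
    for label, text in docs.items():
        for m in pattern.finditer(text):
            start = max(0, m.start() - context)
            end = min(len(text), m.end() + context)
            # Flatten newlines so each snippet renders on one line.
            hits.append((label, text[start:end].replace("\n", " ")))
            if len(hits) >= max_hits:
                return hits
    return hits
```

Returning early once `max_hits` is reached replaces the two nested `break` checks in the diff; the behavior is the same, but the cap is enforced in one place.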
llm_completion.py
CHANGED
|
@@ -23,6 +23,12 @@ class LLMCompletion:
|
|
| 23 |
"""Load all manuscript files"""
|
| 24 |
manuscripts = {}
|
| 25 |
manuscripts_dir = Path("manuscripts")
|
| 26 |
|
| 27 |
# Load Circus of the Queens
|
| 28 |
circus_dir = manuscripts_dir / "Circus_of_the_Queens"
|
|
@@ -30,11 +36,12 @@ class LLMCompletion:
|
|
| 30 |
with open(md_file, 'r', encoding='utf-8') as f:
|
| 31 |
manuscripts['circus_of_the_queens'] = f.read()
|
| 32 |
|
| 33 |
-
# Load edited version of Shadow of Lillya
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
|
| 39 |
# Load unedited material
|
| 40 |
unedited_dir = manuscripts_dir / "Shadow_of_Lillya" / "unedited_material"
|
|
@@ -55,21 +62,41 @@ class LLMCompletion:
|
|
| 55 |
|
| 56 |
return manuscripts
|
| 57 |
|
| 58 |
-
def create_prompt(
|
| 59 |
"""Create a prompt for LLM completion"""
|
| 60 |
prompt = f"""You are completing the novel "The Shadow of Lillya" by Audrey Berger Welz. This is a sequel/prequel to her novel "Circus of the Queens."
|
| 61 |
|
| 62 |
CONTEXT - CIRCUS OF THE QUEENS:
|
| 63 |
-
{
|
| 64 |
|
| 65 |
CURRENT MANUSCRIPT - THE SHADOW OF LILLYA (Edited Version):
|
| 66 |
-
{
|
| 67 |
|
| 68 |
ADDITIONAL MATERIAL - UNEDITED VERSIONS:
|
| 69 |
-
{
|
| 70 |
|
| 71 |
NOTES AND OUTLINES:
|
| 72 |
-
{
|
| 73 |
|
| 74 |
INSTRUCTIONS:
|
| 75 |
1. Continue the story from where Audrey left off, maintaining her unique voice and writing style
|
|
@@ -81,7 +108,11 @@ INSTRUCTIONS:
|
|
| 81 |
CONTINUATION POINT:
|
| 82 |
{continuation_point if continuation_point else 'Continue from the end of the edited manuscript.'}
|
| 83 |
|
| 84 |
-
|
| 85 |
|
| 86 |
return prompt
|
| 87 |
|
|
@@ -205,6 +236,18 @@ def main():
|
|
| 205 |
help='Maximum tokens to generate')
|
| 206 |
parser.add_argument('--continuation-point', type=str,
|
| 207 |
help='Specific point in text to continue from')
|
| 208 |
|
| 209 |
args = parser.parse_args()
|
| 210 |
|
|
@@ -216,7 +259,15 @@ def main():
|
|
| 216 |
|
| 217 |
# Create prompt
|
| 218 |
print("📝 Creating prompt...")
|
| 219 |
-
prompt = base_completion.create_prompt(
|
| 220 |
print(f" ✓ Prompt created ({len(prompt)} characters)")
|
| 221 |
|
| 222 |
# Initialize LLM
|
|
@@ -255,4 +306,3 @@ def main():
|
|
| 255 |
|
| 256 |
if __name__ == '__main__':
|
| 257 |
exit(main())
|
| 258 |
-
|
|
|
|
| 23 |
"""Load all manuscript files"""
|
| 24 |
manuscripts = {}
|
| 25 |
manuscripts_dir = Path("manuscripts")
|
| 26 |
+
|
| 27 |
+
# Prefer Audrey-first edited material if it exists (generated by edit_audrey_material.py)
|
| 28 |
+
audrey_first_path = Path("manuscripts/Shadow_of_Lillya/audrey_edited/audrey_edited_clean.md")
|
| 29 |
+
if audrey_first_path.exists():
|
| 30 |
+
with open(audrey_first_path, "r", encoding="utf-8") as f:
|
| 31 |
+
manuscripts["shadow_edited"] = f.read()
|
| 32 |
|
| 33 |
# Load Circus of the Queens
|
| 34 |
circus_dir = manuscripts_dir / "Circus_of_the_Queens"
|
|
|
|
| 36 |
with open(md_file, 'r', encoding='utf-8') as f:
|
| 37 |
manuscripts['circus_of_the_queens'] = f.read()
|
| 38 |
|
| 39 |
+
# Load edited version of Shadow of Lillya (fallback if Audrey-first not present)
|
| 40 |
+
if "shadow_edited" not in manuscripts:
|
| 41 |
+
edited_dir = manuscripts_dir / "Shadow_of_Lillya" / "edited_version"
|
| 42 |
+
for md_file in edited_dir.glob("*.md"):
|
| 43 |
+
with open(md_file, 'r', encoding='utf-8') as f:
|
| 44 |
+
manuscripts['shadow_edited'] = f.read()
|
| 45 |
|
| 46 |
# Load unedited material
|
| 47 |
unedited_dir = manuscripts_dir / "Shadow_of_Lillya" / "unedited_material"
|
|
|
|
| 62 |
|
| 63 |
return manuscripts
|
| 64 |
|
| 65 |
+
def create_prompt(
|
| 66 |
+
self,
|
| 67 |
+
manuscripts: Dict[str, str],
|
| 68 |
+
continuation_point: Optional[str] = None,
|
| 69 |
+
*,
|
| 70 |
+
circus_chars: int = 4000,
|
| 71 |
+
shadow_tail_chars: int = 12000,
|
| 72 |
+
unedited_chars: int = 2000,
|
| 73 |
+
notes_chars: int = 4000,
|
| 74 |
+
target_words: int = 1400,
|
| 75 |
+
) -> str:
|
| 76 |
"""Create a prompt for LLM completion"""
|
| 77 |
+
circus = manuscripts.get("circus_of_the_queens", "")
|
| 78 |
+
shadow = manuscripts.get("shadow_edited", "")
|
| 79 |
+
unedited = manuscripts.get("shadow_unedited", "")
|
| 80 |
+
notes = manuscripts.get("notes", "")
|
| 81 |
+
|
| 82 |
+
circus_excerpt = circus[: max(0, circus_chars)]
|
| 83 |
+
shadow_excerpt = shadow[-max(0, shadow_tail_chars) :] if shadow_tail_chars > 0 else ""
|
| 84 |
+
unedited_excerpt = unedited[: max(0, unedited_chars)]
|
| 85 |
+
notes_excerpt = notes[: max(0, notes_chars)]
|
| 86 |
+
|
| 87 |
prompt = f"""You are completing the novel "The Shadow of Lillya" by Audrey Berger Welz. This is a sequel/prequel to her novel "Circus of the Queens."
|
| 88 |
|
| 89 |
CONTEXT - CIRCUS OF THE QUEENS:
|
| 90 |
+
{circus_excerpt}
|
| 91 |
|
| 92 |
CURRENT MANUSCRIPT - THE SHADOW OF LILLYA (Edited Version):
|
| 93 |
+
{shadow_excerpt}
|
| 94 |
|
| 95 |
ADDITIONAL MATERIAL - UNEDITED VERSIONS:
|
| 96 |
+
{unedited_excerpt}
|
| 97 |
|
| 98 |
NOTES AND OUTLINES:
|
| 99 |
+
{notes_excerpt}
|
| 100 |
|
| 101 |
INSTRUCTIONS:
|
| 102 |
1. Continue the story from where Audrey left off, maintaining her unique voice and writing style
|
|
|
|
| 108 |
CONTINUATION POINT:
|
| 109 |
{continuation_point if continuation_point else 'Continue from the end of the edited manuscript.'}
|
| 110 |
|
| 111 |
+
OUTPUT:
|
| 112 |
+
- Write approximately {target_words} words (flexible).
|
| 113 |
+
- Keep continuity with the final paragraphs of the Shadow excerpt above.
|
| 114 |
+
- Do not include analysis or meta commentary—just the next section of the novel text.
|
| 115 |
+
"""
|
| 116 |
|
| 117 |
return prompt
|
| 118 |
|
|
|
|
| 236 |
help='Maximum tokens to generate')
|
| 237 |
parser.add_argument('--continuation-point', type=str,
|
| 238 |
help='Specific point in text to continue from')
|
| 239 |
+
parser.add_argument('--use-audrey-first', action='store_true',
|
| 240 |
+
help='Prefer Audrey-first edited material if present (default behavior). Included for clarity in logs/UI.')
|
| 241 |
+
parser.add_argument('--circus-chars', type=int, default=4000,
|
| 242 |
+
help='How many characters of Circus to include (from the beginning)')
|
| 243 |
+
parser.add_argument('--shadow-tail-chars', type=int, default=12000,
|
| 244 |
+
help='How many characters of Shadow to include (from the end)')
|
| 245 |
+
parser.add_argument('--unedited-chars', type=int, default=2000,
|
| 246 |
+
help='How many characters of unedited Shadow material to include')
|
| 247 |
+
parser.add_argument('--notes-chars', type=int, default=4000,
|
| 248 |
+
help='How many characters of notes/outlines to include')
|
| 249 |
+
parser.add_argument('--target-words', type=int, default=1400,
|
| 250 |
+
help='Approximate word target for the continuation')
|
| 251 |
|
| 252 |
args = parser.parse_args()
|
| 253 |
|
|
|
|
| 259 |
|
| 260 |
# Create prompt
|
| 261 |
print("📝 Creating prompt...")
|
| 262 |
+
prompt = base_completion.create_prompt(
|
| 263 |
+
manuscripts,
|
| 264 |
+
args.continuation_point,
|
| 265 |
+
circus_chars=args.circus_chars,
|
| 266 |
+
shadow_tail_chars=args.shadow_tail_chars,
|
| 267 |
+
unedited_chars=args.unedited_chars,
|
| 268 |
+
notes_chars=args.notes_chars,
|
| 269 |
+
target_words=args.target_words,
|
| 270 |
+
)
|
| 271 |
print(f" ✓ Prompt created ({len(prompt)} characters)")
|
| 272 |
|
| 273 |
# Initialize LLM
|
|
|
|
| 306 |
|
| 307 |
if __name__ == '__main__':
|
| 308 |
exit(main())
|
|
|
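The excerpt slicing in the new `create_prompt()` takes the start of the Circus text and the end of the Shadow text. A minimal sketch of the two slices follows; the helper names `head` and `tail` are hypothetical and do not appear in the commit. It also shows why the diff guards the tail slice with `if shadow_tail_chars > 0`: in Python, `text[-0:]` is the same as `text[0:]`, so a zero budget would otherwise return the entire manuscript instead of nothing.

```python
def head(text: str, n: int) -> str:
    """First n characters; a non-positive n yields an empty string."""
    return text[: max(0, n)]


def tail(text: str, n: int) -> str:
    """Last n characters; a non-positive n yields an empty string."""
    # text[-0:] equals text[0:], i.e. the whole string, so zero needs its own branch.
    return text[-n:] if n > 0 else ""
```

With the defaults in the diff, the prompt would carry roughly `head(circus, 4000)` and `tail(shadow, 12000)`; a budget larger than the text simply returns the whole text.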
requirements.txt
CHANGED
|
@@ -1,6 +1,5 @@
|
|
| 1 |
streamlit>=1.28.0
|
| 2 |
pandas>=2.0.0
|
| 3 |
-
pathlib
|
| 4 |
numpy>=1.24.0
|
| 5 |
plotly>=5.15.0
|
| 6 |
pymupdf>=1.23.0
|
|
|
|
| 1 |
streamlit>=1.28.0
|
| 2 |
pandas>=2.0.0
|
|
|
|
| 3 |
numpy>=1.24.0
|
| 4 |
plotly>=5.15.0
|
| 5 |
pymupdf>=1.23.0
|