```
Please ensure your final output uses the exact key `TARGET_FIRM:` as shown above, alongside the firm name.
### Step 2: Snippet-by-snippet analysis
Step 2 is not the final answer; it is a working note for one snippet at a time.
- Your Step 2 prompt runs independently on each evidence unit, so the model only sees one snippet per call.
- The app passes:
  - the Step 1 output
  - `Snippet ID: S1..Sn`
  - the snippet text
- A good Step 2 prompt says what this snippet supports, what it does not support, and where the evidence is still uncertain.
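The per-snippet fan-out described above can be pictured roughly as follows. This is only a sketch under assumptions: `build_step2_inputs` and the dictionary keys are hypothetical names, and the app's real message format may differ.

```python
def build_step2_inputs(step1_output, snippets):
    # One independent input per snippet; IDs run S1..Sn in the order given.
    # Hypothetical structure: the real app may format these fields differently.
    return [
        {
            "step1_output": step1_output,
            "snippet_id": f"S{index}",
            "snippet_text": text,
        }
        for index, text in enumerate(snippets, start=1)
    ]


calls = build_step2_inputs(
    "TARGET_FIRM: Kirkland & Ellis LLP",
    ["Opening paragraph text", "Notices section text"],
)
```

Because each call sees only its own entry, a Step 2 prompt cannot rely on anything stated in another snippet.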
### Step 3: Reconciliation and final answer
Step 3 is the reconciliation step: it must produce one answer that could survive a hostile redline, including a direct answer to the original user question.
- Step 3 receives the Step 1 output plus all Step 2 outputs.
- It must return valid JSON with this exact schema:
```json
{
"buyer_counsel": "string with citations like \"Firm Name [^2]\" or \"unknown\"",
"seller_counsel": "string with citations like \"Firm Name [^4]\" or \"unknown\"",
"third_party_counsel": "string with citations like \"Firm Name [^1]\" or \"unknown\"",
"user_question": "string with citations like \"true [^2]\", \"false [^4]\", or \"unknown\""
}
```
Use `"unknown"` when a counsel field or the user-question answer cannot be supported by the evidence. When you provide a firm name, include snippet citations such as `[^2]` that point to the relevant Step 2 snippet IDs. The `user_question` field should answer the original query using only `true`, `false`, or `unknown`. If the answer is `true` or `false`, include supporting snippet citations and add no other text.
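
As a sanity check on your own Step 3 outputs, the schema rules above can be sketched as a small validator. This is a minimal sketch, not the app's scorer: `check_step3_output`, `cited_ids`, and the `snippet_count` parameter are hypothetical names, and it assumes every value other than `unknown` must carry at least one citation within `S1..Sn`.

```python
import json

REQUIRED_KEYS = ("buyer_counsel", "seller_counsel", "third_party_counsel", "user_question")


def cited_ids(value):
    # Collect the integers inside markers such as [^2]; malformed markers are skipped.
    ids = []
    for chunk in value.split("[^")[1:]:
        digits, closing, _ = chunk.partition("]")
        if closing and digits.isdigit():
            ids.append(int(digits))
    return ids


def check_step3_output(raw, snippet_count):
    # Return a list of problems; an empty list means these checks passed.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    if set(data) != set(REQUIRED_KEYS):
        problems.append("keys do not match the required schema")
    for key in REQUIRED_KEYS:
        value = data.get(key)
        if not isinstance(value, str):
            problems.append(f"{key}: value is not a string")
            continue
        if value == "unknown":
            continue
        ids = cited_ids(value)
        if not ids:
            problems.append(f"{key}: answer has no snippet citation")
        if any(i < 1 or i > snippet_count for i in ids):
            problems.append(f"{key}: citation outside S1..S{snippet_count}")
        answer = value.split("[^")[0].strip()
        if key == "user_question" and answer not in ("true", "false"):
            problems.append("user_question: must be true, false, or unknown")
    return problems
```

An empty list from `check_step3_output` only means the shape and citation ranges look right; it says nothing about whether the cited snippets actually support the answer.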
"""
)
gr.Markdown(
"""
## Workflow At A Glance
This visual shows how the query moves from `LLM1` to `LLM2` to `LLM3`, and how the final JSON is assembled from the APA snippets.
"""
)
gr.Image(
value="mermaid_diagram.png",
label="Pipeline overview",
show_label=True,
interactive=False,
)
with gr.Accordion("Example Workflow", open=False):
gr.Markdown(
"""
**User query**
```text
Is Kirkland & Ellis LLP acting as counsel anywhere in this Asset Purchase Agreement?
```
**Step 1 relevant output**
```text
TARGET_FIRM: Kirkland & Ellis LLP
```
**Example evidence units**
- `S1`: the opening paragraph names the parties and mentions Kirkland & Ellis LLP as transaction counsel to the buyer.
- `S2`: the notices section identifies buyer counsel as Kirkland & Ellis LLP and seller counsel as Wachtell, Lipton, Rosen & Katz.
- `S3`: a boilerplate clause contains no counsel information and should not drive the final answer.
- `S4`: a securityholders' representative provision states that Gibson, Dunn & Crutcher LLP advises the representative.
- `S5`: a later notice block again confirms seller counsel as Wachtell, Lipton, Rosen & Katz.
**Final JSON shape**
```json
{
"buyer_counsel": "Kirkland & Ellis LLP [^1] [^2]",
"seller_counsel": "Wachtell, Lipton, Rosen & Katz [^2] [^5]",
"third_party_counsel": "Gibson, Dunn & Crutcher LLP [^4]",
"user_question": "true [^1] [^2]"
}
```
"""
)
with gr.Accordion("Practice and Final Submission", open=False):
gr.Markdown(
"""
- You may use **one optional practice run** per email address to test your prompts against a hidden calibration set.
- The practice run uses 3 hidden calibration cases.
- Each case is run 3 times to check prompt consistency.
- For each run, LLM 1 can earn up to 1 point for correct routing and target-firm normalization, and LLM 3 can earn up to 1 point for a correct final JSON answer with supported citations.
- Step 2 is not scored directly, but it strongly affects the LLM 3 score because Step 3 relies on the snippet-level analysis.
- Practice returns aggregate feedback only: score percentage, an LLM 1 summary, and an LLM 3 summary.
- You may then revise your prompts or keep them as they are.
- You may submit **one final submission** per email address against a separate hidden holdout set.
- After the final submission, practice is no longer available.
- No structured decoding is applied, so your prompts alone must make Step 3 produce reliable JSON.
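
For a sense of scale, the practice run covers 3 cases × 3 runs with up to 2 points per run, so at most 18 points. The sketch below illustrates that arithmetic. How the app actually aggregates points into the reported percentage is not stated here; the formula and the names `max_practice_points` and `score_percentage` are assumptions for illustration.

```python
CASES = 3
RUNS_PER_CASE = 3
MAX_POINTS_PER_RUN = 2  # up to 1 point for LLM 1 plus up to 1 point for LLM 3


def max_practice_points():
    # Total points available across every run of every calibration case.
    return CASES * RUNS_PER_CASE * MAX_POINTS_PER_RUN


def score_percentage(run_scores):
    # run_scores holds one (llm1_points, llm3_points) pair per run.
    # Assumption: the percentage is points earned over points possible.
    earned = sum(llm1 + llm3 for llm1, llm3 in run_scores)
    possible = MAX_POINTS_PER_RUN * len(run_scores)
    return 100 * earned / possible
```

Under this assumed formula, a prompt set that routes correctly in every run but never produces a correct final JSON would land at 50 percent.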
"""
)
gr.Markdown(
"""
Enter your name and email exactly as listed in your CV. Both buttons below use the same three prompt boxes.
You have **one** chance to run the practice set and get feedback, and **one** chance to run the final set. After you click a button, wait for the results to load before clicking again or refreshing the page.
**Good Luck!**
"""
)
email_input = gr.Textbox(label="Email", placeholder="your.email@example.com")
name_input = gr.Textbox(label="First Name, Last Name", placeholder="John Smith")
system_prompt_input_1 = gr.Textbox(
label="System Prompt for Step 1",
placeholder="Enter your Step 1 prompt here...",
lines=6,
)
system_prompt_input_2 = gr.Textbox(
label="System Prompt for Step 2",
placeholder="Enter your Step 2 prompt here...",
lines=10,
)
system_prompt_input_3 = gr.Textbox(
label="System Prompt for Step 3",
placeholder="Enter your Step 3 prompt here...",
lines=6,
)
gr.Markdown(
"""
Please note:
- Each run may take a couple of minutes.
- After you click a button, wait for the result and do not click it again.
"""
)
with gr.Row():
    practice_button = gr.Button("Practice Run")
    final_button = gr.Button("Submit Final")
output_text = gr.Textbox(label="Results", lines=18)
feedback_md = gr.Markdown("", visible=False)
def practice_submit_and_update(email, name, s1, s2, s3):
    return handle_submission("practice", email, name, s1, s2, s3)

def final_submit_and_update(email, name, s1, s2, s3):
    return handle_submission("final", email, name, s1, s2, s3)
practice_button.click(
fn=practice_submit_and_update,
inputs=[
email_input,
name_input,
system_prompt_input_1,
system_prompt_input_2,
system_prompt_input_3,
],
outputs=[output_text, practice_button, final_button, feedback_md],
)
final_button.click(
fn=final_submit_and_update,
inputs=[
email_input,
name_input,
system_prompt_input_1,
system_prompt_input_2,
system_prompt_input_3,
],
outputs=[output_text, practice_button, final_button, feedback_md],
)
return demo
if __name__ == "__main__":
    interface = build_interface()
    interface.launch(server_name="0.0.0.0", server_port=7860, ssr_mode=False)