# AI Feedback Learning – Simplified Plan (v2 Enhanced)
> **Status**: Ready for Implementation
> **Last Updated**: 2026-01-14
> **Principle**: First run is clean. Learning happens through rejection feedback. Zero visible tags.
---
## Naming Conventions
> [!IMPORTANT]
> Consistent naming across all components.
| Context | Format | Example |
|---------|--------|--------|
| **JSON keys** | `snake_case` | `document_id`, `operation_type`, `durable_id` |
| **File names** | `snake_case` | `comment_ids_map.json`, `feedback.json` |
| **TypeScript variables/functions** | `snake_case` | `document_id`, `detect_rejections()` |
| **TypeScript interfaces** | `PascalCase` | `OperationIdEntry`, `FeedbackFile` |
| **C# class names** | `PascalCase` | `DocxEditor`, `OperationIdEntry` |
| **C# properties/methods** | `PascalCase` | `DocumentId`, `EnsureDocumentId()` |
| **CLI arguments** | `kebab-case` | `--comment-ids-map`, `--out-redline` |
---
## The Problem (One Sentence)
**The AI has no memory.** It re-suggests the same things the lawyer already rejected.
---
## The Solution (30-Second Summary)
```mermaid
flowchart LR
    A["🤖 AI Suggests (Pure JSON)"] --> B["⚡ C# Assigns Invisible IDs"]
    B --> C["👩‍⚖️ Lawyer Reviews"]
    C --> D["📤 Re-Upload"]
    D --> E["🔍 MCP Detects Rejections"]
    E --> F["📝 Record Rejection"]
    F --> G["🚫 AI Learns + Updates Preferences"]
```
## The "Invisible ID" Strategy
We achieve **zero visible noise** by using native Word features:
| ID Type | Purpose | Implementation (Invisible) |
|---------|---------|----------------------------|
| **RealityAIDocID** | Identify the file forever | **Custom Document Property**. Survives renaming. |
| **durableId** | Track individual comments | **Word Durable ID** (in `commentsIds.xml`). Generated by C#. |
| **Content Matching** | Track REWRITE/INSERT/DELETE | Compare original vs current text. |
**The LLM never sees IDs.** It outputs clean text. We attach the invisible tracking infrastructure during the build process.
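As a rough sketch of how the invisible document ID can be read back, the helper below extracts `RealityAIDocID` from the text of `docProps/custom.xml` once that part has been pulled out of the DOCX ZIP. The function name is hypothetical, not the shipped implementation:

```typescript
// Hypothetical helper: given the raw docProps/custom.xml content,
// return the RealityAIDocID value, or null when the property is
// absent (i.e. this is a first run).
function extractRealityAiDocId(customXml: string): string | null {
  const match = customXml.match(
    /<property[^>]*name="RealityAIDocID"[^>]*>[\s\S]*?<vt:lpwstr>([^<]*)<\/vt:lpwstr>/
  );
  return match ? match[1] : null;
}
```

A `null` result is exactly the signal the MCP router uses to pick the first-run prompt.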
---
## Status Model: Rejected-Only Tracking
> [!IMPORTANT]
> Simplified model: We only track **rejections**. No "pending" or "accepted" states.
| Status | When Set | What It Means |
|--------|----------|---------------|
| ~~`pending`~~ | ~~Initial~~ | **NOT USED** - eliminates unnecessary state |
| **`rejected`** | User deleted/rejected the operation | AI should never suggest this again |
**Rationale:**
- We only care about **what NOT to suggest**
- Accepted/kept items don't need tracking – they're already in the document
- Simpler data model = fewer bugs
---
## Tracking ALL Operations (Invisible Comments Strategy)
> [!IMPORTANT]
> We track **all 4 operation types** using **invisible tracking comments**. This provides 100% reliable rejection detection without fragile content matching.
### The Strategy: Every Operation Gets a Tracking Comment
Instead of content-based matching (fragile), we attach an **invisible tracking comment** to every operation:
| Operation Type | What We Store | How We Track | How We Detect Rejection |
|----------------|---------------|--------------|-------------------------|
| **COMMENT** | `durable_id` | Native comment | Missing from `commentsIds.xml` |
| **REWRITE** | `durable_id` | Invisible tracking comment | Tracking comment deleted |
| **INSERT** | `durable_id` | Invisible tracking comment | Tracking comment deleted |
| **DELETE** | `durable_id` | Hidden paragraph with tracking comment | Tracking comment deleted |
### Tracking Comment Properties
- **Author**: `__RealityAI_Tracker__` (special author name, hidden from normal view)
- **Text**: Empty or minimal marker
- **Purpose**: Generate `durable_id` for reliable tracking
> [!NOTE]
> **Why this works:**
> - User accepts operation → keeps document as-is → tracking comment remains → NOT rejected
> - User rejects operation → reverts change → deletes associated comment → tracking comment gone → REJECTED
> - Works identically for all 4 operation types (no special cases)
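Under this model, rejection detection reduces to a set difference over durable IDs. A minimal sketch (names are assumed here, not the final `outcome_detector.ts` API):

```typescript
interface TrackedOperation {
  id: string;
  operation_type: "COMMENT" | "REWRITE" | "INSERT" | "DELETE";
  durable_id: string;
}

// Any known operation whose durable_id no longer appears in the
// re-uploaded document's commentsIds.xml is treated as rejected.
function findRejected(
  known: TrackedOperation[],
  currentDurableIds: Set<string>
): TrackedOperation[] {
  return known.filter(op => !currentDurableIds.has(op.durable_id));
}
```

Because every operation type carries a `durable_id`, this single function covers COMMENT, REWRITE, INSERT, and DELETE with no per-type branching.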
### DELETE Operation Tracking
For DELETE operations, we **insert a hidden paragraph** at the deletion location with a tracking comment:
```xml
<!-- After deleting the target paragraph, insert: -->
<w:p>
  <w:pPr><w:vanish/></w:pPr>  <!-- Hidden paragraph -->
  <w:commentRangeStart w:id="N"/>
  <w:r><w:t></w:t></w:r>
  <w:commentRangeEnd w:id="N"/>
  <w:r><w:commentReference w:id="N"/></w:r>
</w:p>
```
**If user restores the deleted paragraph**, they will also delete this hidden tracking paragraph, triggering rejection detection.
---
## Data Structure (`feedback.json`)
```json
{
  "document_id": "8A4E12345F9C7821",
  "created_at": "2026-01-14T10:00:00Z",
  "last_updated": "2026-01-14T11:00:00Z",
  "known_entries": [
    {
      "id": "op_001",
      "operation_type": "COMMENT",
      "durable_id": "5F9C7821",
      "comment_text": "Consider removing this limitation of liability clause",
      "document_context": "The Contractor shall not be liable for any indirect...",
      "bookmark_id": "BMK_P_abc123",
      "created_at": "2026-01-14T10:00:00Z"
    },
    {
      "id": "op_002",
      "operation_type": "REWRITE",
      "durable_id": "6B2D9E41",
      "original_text": "Payment within 30 days",
      "new_text": "Payment within 45 days",
      "comment_text": "Extended payment terms for better cash flow",
      "bookmark_id": "BMK_P_def456",
      "created_at": "2026-01-14T10:00:00Z"
    },
    {
      "id": "op_003",
      "operation_type": "DELETE",
      "durable_id": "7C3F0A52",
      "original_text": "This clause shall be subject to annual review...",
      "comment_text": "Removed redundant review clause",
      "bookmark_id": "BMK_P_ghi789",
      "created_at": "2026-01-14T10:00:00Z"
    },
    {
      "id": "op_004",
      "operation_type": "INSERT",
      "durable_id": "8D4E1B63",
      "inserted_text": "Notwithstanding the foregoing...",
      "position": "after",
      "comment_text": "Added clarifying language",
      "bookmark_id": "BMK_P_jkl012",
      "created_at": "2026-01-14T10:00:00Z"
    }
  ],
  "rejections": [
    {
      "id": "op_002",
      "operation_type": "REWRITE",
      "durable_id": "6B2D9E41",
      "original_text": "Payment within 30 days",
      "new_text": "Payment within 45 days",
      "comment_text": "Extended payment terms for better cash flow",
      "document_context": "Payment within 30 days",
      "reason": "Tracking comment deleted (durable_id missing from commentsIds.xml)",
      "rejected_at": "2026-01-14T11:00:00Z"
    }
  ]
}
```
> [!NOTE]
> - `known_entries` **accumulates** across multiple runs (not replaced)
> - Each entry has a unique `id` field (e.g., `op_001`) for tracking
> - When rejected, the full entry is copied to `rejections` with `reason`
> - `document_context` enables LLM to write nuanced preferences
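The shape above can be captured in TypeScript; this is a sketch derived from the example JSON (field optionality follows which operation types use which fields, and is an assumption, not the final schema):

```typescript
type OperationType = "COMMENT" | "REWRITE" | "INSERT" | "DELETE";

// One tracked operation, as stored in known_entries.
interface KnownEntry {
  id: string;
  operation_type: OperationType;
  durable_id?: string;
  comment_text?: string;
  document_context?: string;
  bookmark_id: string;
  original_text?: string;        // REWRITE, DELETE
  new_text?: string;             // REWRITE
  inserted_text?: string;        // INSERT
  position?: "before" | "after"; // INSERT
  created_at: string;
}

// A rejection is the known entry plus why/when it was rejected.
interface RejectionEntry extends Omit<KnownEntry, "created_at"> {
  reason: string;
  rejected_at: string;
}

interface FeedbackFile {
  document_id: string;
  created_at: string;
  last_updated: string;
  known_entries: KnownEntry[];
  rejections: RejectionEntry[];
}
```

Typing the file this way lets `outcome_detector.ts` and the `apply_edits_to_docx` handler share one definition instead of re-declaring the shape.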
---
## Deterministic Prompt Selection (MCP Routes to Correct Prompt)
> [!IMPORTANT]
> The LLM does NOT decide which prompt to use. MCP code checks `RealityAIDocID` and loads the appropriate prompt file.
### The Flow
```
User: /analyse_contract document.docx
            │
            ▼
┌──────────────────────────────────────────┐
│ MCP: read_docx_for_analysis              │
│                                          │
│ 1. Convert DOCX to markdown              │
│ 2. Check: RealityAIDocID exists?         │ ← DETERMINISTIC (code)
│                                          │
│    IF NO  → mode: "FIRST_RUN"            │
│    IF YES → mode: "RERUN"                │
│             → Load rejection history     │
│             → Load preferences           │
│                                          │
│ 3. Load appropriate prompt file          │
│    FIRST_RUN → analyse_contract_first_run.md
│    RERUN     → analyse_contract_rerun.md │
│                                          │
│ 4. Return instructions + data            │
└──────────────────────────────────────────┘
            │
            ▼
┌──────────────────────────────────────────┐
│ LLM receives:                            │
│                                          │
│ 1. Instructions (from the correct        │
│    prompt file for this mode)            │
│ 2. Document markdown                     │
│ 3. Rejection history (if rerun)          │
│ 4. Preferences (if rerun)                │
│                                          │
│ LLM follows the instructions provided.   │
│ NO if/else logic in the prompt.          │
└──────────────────────────────────────────┘
```
### Three Prompt Files
| File | Purpose |
|------|---------|
| `.claude/commands/analyse_contract.md` | Minimal router – calls MCP, follows returned instructions |
| `.claude/commands/analyse_contract_first_run.md` | First-run logic only (no conditionals) |
| `.claude/commands/analyse_contract_rerun.md` | Rerun logic only (no conditionals) |
**Result:** LLM sees only ONE set of instructions. No conditional logic in prompts. MCP code decides deterministically.
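The routing decision itself is a couple of lines of MCP code. A sketch (the path constants come from the table above; the function name is hypothetical):

```typescript
// Deterministic prompt selection: code, not the LLM, picks the file.
// documentId is null when no RealityAIDocID property was found.
function selectPromptPath(documentId: string | null): string {
  return documentId === null
    ? ".claude/commands/analyse_contract_first_run.md"
    : ".claude/commands/analyse_contract_rerun.md";
}
```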
---
## Data Flow Diagram
```mermaid
flowchart TD
    subgraph FIRST["🟢 First Run Pipeline"]
        direction TB
        A1["User: /analyse_contract doc.docx"]
        A2["MCP: read_docx_for_analysis"]
        A3["MCP: Check RealityAIDocID → NOT FOUND"]
        A4["MCP: Load analyse_contract_first_run.md"]
        A5["MCP: Return instructions + markdown"]
        A6["LLM: Follow first-run instructions"]
        A7["LLM: Generate operations[]"]
        A8["MCP: apply_edits_to_docx"]
        A9["C#: ApplyOperations + Write comment_ids_map.json"]
        A10["MCP: Save feedback.json with known_entries"]
        A11["Output: transformed.docx"]
        A1 --> A2 --> A3 --> A4 --> A5 --> A6 --> A7 --> A8 --> A9 --> A10 --> A11
    end

    subgraph RERUN["🔄 Rerun Pipeline"]
        direction TB
        B1["User: Re-uploads reviewed.docx"]
        B2["MCP: read_docx_for_analysis"]
        B3["MCP: Check RealityAIDocID → FOUND"]
        B4["MCP: Detect rejections (compare known vs current)"]
        B5["MCP: Record new rejections to feedback.json"]
        B6["MCP: Load analyse_contract_rerun.md"]
        B7["MCP: Return instructions + markdown + rejections + preferences"]
        B8["LLM: Follow rerun instructions"]
        B9["LLM: Generate operations[] + preferences_update"]
        B10["MCP: apply_edits_to_docx"]
        B11["MCP: update_preferences"]
        B12["Output: transformed.docx"]
        B1 --> B2 --> B3 --> B4 --> B5 --> B6 --> B7 --> B8 --> B9 --> B10 --> B11 --> B12
    end

    FIRST --> RERUN
```
### Byte-Level Flow: MCP β†’ C# β†’ Filesystem
```
┌──────────────────────────────────────────────────────────────────────────┐
│ MCP: apply_edits_to_docx                                                 │
├──────────────────────────────────────────────────────────────────────────┤
│ 1. Write ops.json to /tmp/...transformed/ops.json                        │
│    Contents: { "meta": { "author": "Claude" }, "ops": [...] }            │
│                                                                          │
│ 2. Invoke: dotnet run --project apply-ops --                             │
│      --input bookmarked.docx                                             │
│      --sidecar sidecar.json                                              │
│      --ops ops.json                                                      │
│      --out-redline transformed.docx                                      │
│      --log run.log                                                       │
│      --comment-ids-map comment_ids_map.json                              │
│                                                                          │
│ 3. C# writes:                                                            │
│    ├── transformed.docx (with comments + RealityAIDocID property)        │
│    ├── run.log (JSONL of operation results)                              │
│    └── comment_ids_map.json (ALL operations + metadata)                  │
│                                                                          │
│ 4. MCP reads comment_ids_map.json                                        │
│    Contents: {                                                           │
│      "document_id": "8A4E12345F9C7821",                                  │
│      "operations": [                                                     │
│        {                                                                 │
│          "id": "op_001",                                                 │
│          "operation_type": "COMMENT",                                    │
│          "durable_id": "5F9C7821",                                       │
│          "comment_text": "...",                                          │
│          "bookmark_id": "BMK_P_..."                                      │
│        },                                                                │
│        {                                                                 │
│          "id": "op_002",                                                 │
│          "operation_type": "REWRITE",                                    │
│          "original_text": "...",                                         │
│          "new_text": "...",                                              │
│          "bookmark_id": "BMK_P_..."                                      │
│        }                                                                 │
│      ]                                                                   │
│    }                                                                     │
│                                                                          │
│ 5. MCP reads sidecar.json to get document_context for each bookmark_id   │
│                                                                          │
│ 6. MCP writes: /data/workspaces/<ws>/feedback/<doc_id>.feedback.json     │
│    (Accumulates known_entries, doesn't replace)                          │
└──────────────────────────────────────────────────────────────────────────┘
```
---
# DEEP DIVE: Implementation Details
---
## OpenXML Structure Reference
### Custom Document Property (RealityAIDocID)
**File location in DOCX ZIP:** `docProps/custom.xml`
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
            xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
            pid="2"
            name="RealityAIDocID">
    <vt:lpwstr>8A4E12345F9C7821</vt:lpwstr>
  </property>
</Properties>
```
### CommentsIds.xml (DurableId Storage)
**File location in DOCX ZIP:** `word/commentsIds.xml`
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w16cid:commentsIds
    xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="w16cid">
  <w16cid:commentId w16cid:paraId="3A7B9C12" w16cid:durableId="8A4E1234"/>
  <w16cid:commentId w16cid:paraId="5D8F2E34" w16cid:durableId="5F9C7821"/>
</w16cid:commentsIds>
```
> [!NOTE]
> The `durableId` is already generated in [DocxEditor.cs:518-524](file:///home/ubuntu/Tabular_Review-Private/reality-ai/content-insersion/apply-ops/DocxEditor.cs#L518-L524). We just need to capture and expose it.
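On the MCP side, collecting the set of surviving durable IDs from `word/commentsIds.xml` can be as simple as a regex scan. A sketch (the planned `extract_durable_ids` in `invisible_tag_utils.ts` would also need to unzip the DOCX first; that step is omitted here):

```typescript
// Scan commentsIds.xml text for all w16cid:durableId values.
function collectDurableIds(commentsIdsXml: string): Set<string> {
  const ids = new Set<string>();
  const re = /w16cid:durableId="([0-9A-Fa-f]+)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(commentsIdsXml)) !== null) {
    ids.add(m[1]);
  }
  return ids;
}
```

Feeding this set into the rejection detector yields the list of operations whose tracking comments the lawyer deleted.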
### Sidecar File Format
**Generated by:** `document-conversion/docx-md-bookmarks`
**Purpose:** Maps bookmark IDs to document locations and text content
```json
{
  "BMK_P_7b53b16af5864b3281a3e60c3f57f292": {
    "text": "Payment shall be made within 30 days of invoice date.",
    "type": "paragraph",
    "location": {
      "section": 0,
      "paragraph": 12
    }
  },
  "BMK_H_abc123": {
    "text": "Section 3: Payment Terms",
    "type": "heading",
    "level": 2
  }
}
```
**TypeScript interface:**
```typescript
interface SidecarEntry {
  text: string;
  type: "paragraph" | "heading";
  location?: {
    section: number;
    paragraph: number;
  };
  level?: number; // For headings
}

type Sidecar = Record<string, SidecarEntry>;
```
---
## File Structure
```
reality-ai/
├── reality-ai-mcp/src/
│   ├── index.ts                        ← MCP server (MODIFY)
│   └── feedback/                       ← NEW FOLDER
│       ├── invisible_tag_utils.ts      ← Read Custom Props / commentsIds.xml
│       ├── outcome_detector.ts         ← Detect rejections (uses durable_id for ALL types)
│       └── preferences_manager.ts      ← Read/write preferences.md
│
├── content-insersion/apply-ops/
│   ├── DocxEditor.cs                   ← MODIFY: Add ID mapping + RealityAIDocID + tracking comments
│   └── Program.cs                      ← MODIFY: Add --comment-ids-map CLI option
│
├── .claude/commands/
│   ├── analyse_contract.md             ← Minimal router (calls MCP, follows instructions)
│   ├── analyse_contract_first_run.md   ← First-run instructions (NEW)
│   └── analyse_contract_rerun.md       ← Rerun instructions (NEW)
│
├── /data/reality-ai/
│   └── preferences.md                  ← Global preferences (org-wide)
│
└── /data/workspaces/<id>/feedback/
    └── <document_id>.feedback.json     ← Per-document feedback
```
> [!NOTE]
> `content_matcher.ts` is **no longer needed** since all operations use `durable_id` tracking.
---
## C# Code Changes (Exact Locations)
### Step 0: Add Hash-Based ID Generator
**File:** [DocxEditor.cs](file:///home/ubuntu/Tabular_Review-Private/reality-ai/content-insersion/apply-ops/DocxEditor.cs)
**Location:** Add as private method
```csharp
/// <summary>
/// Generates a unique, collision-resistant operation ID using SHA256 hash.
/// Includes timestamp for global uniqueness across runs.
/// </summary>
private string GenerateOperationId(string operationType, string bookmarkId, string content)
{
    // Combine inputs that make this operation unique + timestamp
    var input = $"{operationType}|{bookmarkId}|{content}|{DateTime.UtcNow.Ticks}";
    using (var sha256 = System.Security.Cryptography.SHA256.Create())
    {
        var hashBytes = sha256.ComputeHash(System.Text.Encoding.UTF8.GetBytes(input));
        // Take first 8 bytes, convert to hex (16 characters)
        return BitConverter.ToString(hashBytes, 0, 8).Replace("-", "").ToLowerInvariant();
    }
}
```
**Result:** IDs like `a3f5e8c2b1d4f6e9` (unique, collision-resistant across all runs)
### Step 1: Add ID Mapping Fields
**File:** [DocxEditor.cs](file:///home/ubuntu/Tabular_Review-Private/reality-ai/content-insersion/apply-ops/DocxEditor.cs)
**Location:** After `_existingParaIds` field
```csharp
// Add new fields to track operation-to-durableId mapping
private readonly List<OperationIdEntry> _operationIdEntries = new();

// Tracking comment author (invisible in normal view)
private const string TRACKING_AUTHOR = "__RealityAI_Tracker__";

// Helper class to store operation details (ALL types now have durable_id)
public class OperationIdEntry
{
    public string Id { get; set; } = "";            // Hash-based unique ID
    public string OperationType { get; set; } = ""; // COMMENT, REWRITE, INSERT, DELETE
    public string DurableId { get; set; } = "";     // ALL types now have this
    public string CommentText { get; set; } = "";
    public string BookmarkId { get; set; } = "";
    public string OriginalText { get; set; } = "";  // For REWRITE, DELETE
    public string NewText { get; set; } = "";       // For REWRITE, INSERT
    public string Position { get; set; } = "";      // For INSERT (before/after)
}
```
### Step 2: Add Invisible Tracking Comment Method
**File:** [DocxEditor.cs](file:///home/ubuntu/Tabular_Review-Private/reality-ai/content-insersion/apply-ops/DocxEditor.cs)
**Purpose:** Create tracking comments that are invisible to users but generate durable_id for tracking
```csharp
/// <summary>
/// Adds an invisible tracking comment (uses special author, empty text).
/// Returns the durableId for tracking.
/// </summary>
private string AddTrackingComment(OpenXmlElement targetElement)
{
    // Generate new comment ID
    string commentId = GetNextCommentId();

    // Add comment with tracking author and empty text
    string durableId = AddCommentToPart(
        commentId: commentId,
        commentText: "",           // Empty - invisible
        author: TRACKING_AUTHOR);  // Special author name

    // Add comment reference to target element
    AddCommentReferenceToElement(targetElement, commentId);
    return durableId;
}
```
### Step 3: Modify AddCommentToPart to Accept Author
**File:** [DocxEditor.cs](file:///home/ubuntu/Tabular_Review-Private/reality-ai/content-insersion/apply-ops/DocxEditor.cs)
**Location:** `AddCommentToPart()` method
```diff
- private void AddCommentToPart(string commentId, string commentText)
+ private string AddCommentToPart(string commentId, string commentText, string author = null)
  {
+     // Use provided author or default
+     author ??= _authorName;
      // ... existing code ...
      string durableId = EnsureCommentsIdsEntry(paraId);
      EnsureCommentsExtensibleEntry(durableId, normalizedDate);
-     EnsureAuthorInPeoplePart(_authorName);
+     EnsureAuthorInPeoplePart(author); // Now uses parameter
+
+     return durableId;
  }
```
### Step 4: Capture Operation Details for ALL Types (with Invisible Tracking)
Modify all operation methods to add tracking comments and use hash-based IDs:
**For COMMENT operations:**
```csharp
public void CommentOnParagraph(Operation op)
{
    // ... existing logic ...
    string durableId = AddCommentToPart(commentId, op.Comment ?? ""); // Uses default author

    _operationIdEntries.Add(new OperationIdEntry
    {
        Id = GenerateOperationId("COMMENT", op.Id, op.Comment ?? ""),
        OperationType = "COMMENT",
        DurableId = durableId,
        CommentText = op.Comment ?? "",
        BookmarkId = op.Id
    });
}
```
**For REWRITE operations:**
```csharp
public void RewriteParagraph(Operation op)
{
    // Get original text before rewriting
    string originalText = GetParagraphText(op.Id);
    var paragraph = FindParagraph(op.Id);

    // ... existing rewrite logic (replace paragraph text) ...

    // NEW: Add invisible tracking comment
    string durableId = AddTrackingComment(paragraph);

    _operationIdEntries.Add(new OperationIdEntry
    {
        Id = GenerateOperationId("REWRITE", op.Id, op.NewText ?? ""),
        OperationType = "REWRITE",
        DurableId = durableId, // NOW has durable_id for tracking
        CommentText = op.Comment ?? "",
        BookmarkId = op.Id,
        OriginalText = originalText,
        NewText = op.NewText ?? ""
    });
}
```
**For DELETE operations (with hidden paragraph):**
```csharp
public void DeleteParagraph(Operation op)
{
    // Get original text before deleting
    string originalText = GetParagraphText(op.Id);
    var paragraph = FindParagraph(op.Id);
    var parent = paragraph.Parent;
    int insertIndex = parent.ChildElements.ToList().IndexOf(paragraph);

    // ... existing delete logic (remove paragraph) ...

    // NEW: Insert hidden tracking paragraph at deletion location
    var hiddenPara = CreateHiddenTrackingParagraph();
    parent.InsertAt(hiddenPara, insertIndex);

    // Add tracking comment to hidden paragraph
    string durableId = AddTrackingComment(hiddenPara);

    _operationIdEntries.Add(new OperationIdEntry
    {
        Id = GenerateOperationId("DELETE", op.Id, originalText),
        OperationType = "DELETE",
        DurableId = durableId, // NOW has durable_id for tracking
        CommentText = op.Comment ?? "",
        BookmarkId = op.Id,
        OriginalText = originalText
    });
}

/// <summary>
/// Creates a hidden paragraph for tracking deleted operations.
/// Uses w:vanish to make completely invisible.
/// </summary>
private Paragraph CreateHiddenTrackingParagraph()
{
    var para = new Paragraph();
    var pPr = new ParagraphProperties();
    pPr.Append(new DocumentFormat.OpenXml.Wordprocessing.Vanish()); // Hidden
    para.Append(pPr);
    para.Append(new Run(new Text(""))); // Empty content
    return para;
}
```
**For INSERT operations:**
```csharp
public void InsertParagraph(Operation op)
{
    // ... existing insert logic ...
    var insertedParagraph = CreateAndInsertParagraph(op); // Your existing method

    // NEW: Add invisible tracking comment to inserted paragraph
    string durableId = AddTrackingComment(insertedParagraph);

    _operationIdEntries.Add(new OperationIdEntry
    {
        Id = GenerateOperationId("INSERT", op.Id, op.Text ?? ""),
        OperationType = "INSERT",
        DurableId = durableId, // NOW has durable_id for tracking
        CommentText = op.Comment ?? "",
        BookmarkId = op.Id,
        NewText = op.Text ?? "",
        Position = op.Position ?? "after"
    });
}
```
### Step 5: Add GetOperationIdEntries Method
**File:** [DocxEditor.cs](file:///home/ubuntu/Tabular_Review-Private/reality-ai/content-insersion/apply-ops/DocxEditor.cs)
**Location:** After `ApplyOperations()`
```csharp
/// <summary>
/// Returns the list of operation entries with durableIds and context.
/// </summary>
public List<OperationIdEntry> GetOperationIdEntries()
{
    return new List<OperationIdEntry>(_operationIdEntries);
}
```
### Step 6: Add EnsureDocumentId Method
**File:** [DocxEditor.cs](file:///home/ubuntu/Tabular_Review-Private/reality-ai/content-insersion/apply-ops/DocxEditor.cs)
```csharp
/// <summary>
/// Ensures the document has a RealityAIDocID custom property.
/// </summary>
public string EnsureDocumentId()
{
    var customProps = _doc.CustomFilePropertiesPart ??
                      _doc.AddCustomFilePropertiesPart();

    if (customProps.Properties == null)
        customProps.Properties = new DocumentFormat.OpenXml.CustomProperties.Properties();

    // Check for existing property
    var existing = customProps.Properties
        .OfType<DocumentFormat.OpenXml.CustomProperties.CustomDocumentProperty>()
        .FirstOrDefault(p => p.Name?.Value == "RealityAIDocID");
    if (existing != null)
        return existing.InnerText;

    // Generate new 16-character hex ID (no dashes)
    var bytes = new byte[8];
    System.Security.Cryptography.RandomNumberGenerator.Fill(bytes);
    string docId = BitConverter.ToString(bytes).Replace("-", "");

    // Find next property ID
    int nextPid = customProps.Properties
        .OfType<DocumentFormat.OpenXml.CustomProperties.CustomDocumentProperty>()
        .Select(p => p.PropertyId?.Value ?? 0)
        .DefaultIfEmpty(1)
        .Max() + 1;

    var newProp = new DocumentFormat.OpenXml.CustomProperties.CustomDocumentProperty
    {
        FormatId = "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}",
        PropertyId = nextPid,
        Name = "RealityAIDocID"
    };
    newProp.Append(new DocumentFormat.OpenXml.VariantTypes.VTLPWSTR(docId));
    customProps.Properties.Append(newProp);
    return docId;
}
```
### Step 7: Add GetParagraphText Helper Method
**File:** [DocxEditor.cs](file:///home/ubuntu/Tabular_Review-Private/reality-ai/content-insersion/apply-ops/DocxEditor.cs)
```csharp
/// <summary>
/// Gets the text content of a paragraph by bookmark ID.
/// </summary>
private string GetParagraphText(string bookmarkId)
{
    if (_index.TryGetValue(bookmarkId, out var element))
    {
        return element.InnerText;
    }
    return "";
}
```
### Step 8: Modify Program.cs
**File:** [Program.cs](file:///home/ubuntu/Tabular_Review-Private/reality-ai/content-insersion/apply-ops/Program.cs)
**Add CLI option:**
```csharp
var commentIdsMapOption = new Option<FileInfo?>(
    aliases: new[] { "--comment-ids-map", "-m" },
    description: "Output JSON file mapping operations to Word durableIds with context.");
```
**Register option:**
```csharp
rootCommand.AddOption(commentIdsMapOption);
```
**Update handler signature and binding:**
```csharp
rootCommand.SetHandler(async (
    inputFile,
    sidecarFile,
    opsFile,
    outRedlineFile,
    logFile,
    commentIdsMapFile // ← NEW PARAMETER
) =>
{
    // ... existing code to apply operations ...

    // Ensure document has tracking ID
    string documentId = editor.EnsureDocumentId();
    Console.WriteLine($"RealityAIDocID: {documentId}");

    // Export operation entries with full context
    if (commentIdsMapFile != null)
    {
        var entries = editor.GetOperationIdEntries();
        var mapOutput = new
        {
            document_id = documentId,
            operations = entries.Select(e => new
            {
                id = e.Id,
                operation_type = e.OperationType,
                durable_id = e.DurableId,
                comment_text = e.CommentText,
                bookmark_id = e.BookmarkId,
                original_text = e.OriginalText,
                new_text = e.NewText,
                position = e.Position
            }).ToArray()
        };
        var json = System.Text.Json.JsonSerializer.Serialize(mapOutput,
            new System.Text.Json.JsonSerializerOptions { WriteIndented = true });
        File.WriteAllText(commentIdsMapFile.FullName, json);
        Console.WriteLine($"Wrote comment_ids_map.json to: {commentIdsMapFile.FullName}");
    }
},
inputOption,
sidecarOption,
opsOption,
outRedlineOption,
logOption,
commentIdsMapOption // ← ADD TO BINDING
);
```
---
## MCP TypeScript Changes
### Enhance read_docx_for_analysis
**File:** [index.ts](file:///home/ubuntu/Tabular_Review-Private/reality-ai/reality-ai-mcp/src/index.ts)
The `read_docx_for_analysis` tool is enhanced to:
1. Check for `RealityAIDocID` (determines first-run vs rerun)
2. If rerun, detect rejections and load history
3. Load appropriate prompt file from disk
4. Return instructions + data
```typescript
if (name === "read_docx_for_analysis") {
  const requestId = logToolStart("read_docx_for_analysis", args);
  try {
    const { filePath } = z.object({ filePath: z.string() }).parse(args);
    const workspaceId = getWorkspaceIdFromPath(filePath);

    // Step 1: Convert DOCX to markdown (existing logic)
    // ... existing conversion code ...
    const markdownContent = await fs.readFile(outMd, "utf-8");

    // Step 2: Check for RealityAIDocID
    const { getDocumentId } = await import("./feedback/invisible_tag_utils");
    const documentId = await getDocumentId(filePath);
    const isRerun = documentId !== null;

    // Step 3: If rerun, detect and record rejections
    let rejectedHistory = "(No previous rejections)";
    let preferences = "# Review Preferences\n\n(No preferences set yet)";

    if (isRerun && workspaceId) {
      // Detect rejections
      const { detectRejections, recordRejections } = await import("./feedback/outcome_detector");
      const rejectionData = await detectRejections(filePath, workspaceId);
      if (rejectionData.newRejections.length > 0) {
        await recordRejections(workspaceId, documentId, rejectionData.newRejections);
      }

      // Load formatted rejection history
      const { formatRejectionsForLLM } = await import("./feedback/outcome_detector");
      rejectedHistory = await formatRejectionsForLLM(workspaceId, documentId);

      // Load preferences
      const { readPreferences } = await import("./feedback/preferences_manager");
      preferences = await readPreferences();
    }

    // Step 4: Load appropriate prompt file
    const promptPath = isRerun
      ? ".claude/commands/analyse_contract_rerun.md"
      : ".claude/commands/analyse_contract_first_run.md";

    let instructions = "";
    try {
      instructions = await fs.readFile(promptPath, "utf-8");
    } catch {
      instructions = "Error: Could not load prompt file.";
    }

    // Step 5: Return structured response
    const responseMetadata = {
      mode: isRerun ? "RERUN" : "FIRST_RUN",
      document_id: documentId,
      original_file_path: filePath,
      artifacts: {
        markdown_path: outMd,
        sidecar_path: outSidecar,
        bookmarked_docx_path: outBookmarked
      }
    };

    return {
      content: [
        { type: "text", text: JSON.stringify(responseMetadata, null, 2) },
        { type: "text", text: `# Instructions\n\n${instructions}` },
        { type: "text", text: `# Document Content\n\n${markdownContent}` },
        { type: "text", text: `# Previous Rejections\n\n${rejectedHistory}` },
        { type: "text", text: `# Organizational Preferences\n\n${preferences}` }
      ]
    };
  } catch (error: any) {
    logToolError(requestId, name, error, Date.now() - startTime);
    throw error;
  }
}
```
### New MCP Tool: update_preferences
**Add to tool list:**
```typescript
{
  name: "update_preferences",
  description: "Updates organizational preferences based on rejection feedback.",
  inputSchema: {
    type: "object",
    properties: {
      action: {
        type: "string",
        enum: ["append", "replace"],
        description: "Whether to append new guidance or replace entire file"
      },
      lines: {
        type: "array",
        items: { type: "string" },
        description: "Markdown lines to append or use as replacement"
      }
    },
    required: ["action", "lines"]
  }
}
```
**Handler implementation:**
```typescript
if (name === "update_preferences") {
  const requestId = logToolStart("update_preferences", args);
  try {
    const { action, lines } = z.object({
      action: z.enum(["append", "replace"]),
      lines: z.array(z.string())
    }).parse(args);

    const { updatePreferences } = await import("./feedback/preferences_manager");
    await updatePreferences(action, lines);

    mcpLog({
      event: "PREFERENCES_UPDATED",
      tool_name: name,
      metadata: { action, linesCount: lines.length }
    }, requestId);

    logToolSuccess(requestId, name, { action, linesCount: lines.length }, Date.now() - startTime);

    return {
      content: [{
        type: "text",
        text: JSON.stringify({ status: "success", action, linesCount: lines.length })
      }]
    };
  } catch (error: any) {
    logToolError(requestId, name, error, Date.now() - startTime);
    throw error;
  }
}
```
### Enhance apply_edits_to_docx (Critical Integration)
> [!IMPORTANT]
> This section wires the C# `--comment-ids-map` output back into the MCP layer to populate `feedback.json`. Without this, the feedback loop is broken.
**File:** [index.ts](file:///home/ubuntu/Tabular_Review-Private/reality-ai/reality-ai-mcp/src/index.ts)
**Add to `apply_edits_to_docx` handler** (after existing dotnet execution):
```typescript
if (name === "apply_edits_to_docx") {
// ... existing validation and setup code ...
const opsFilePath = path.join(outputDir, "ops.json");
const logPath = path.join(outputDir, "run.log");
const commentIdsMapPath = path.join(outputDir, "comment_ids_map.json"); // ← NEW
// ... write ops.json ...
// UPDATED: Add --comment-ids-map argument to dotnet command
const dotnetCmd = [
"run",
"--project", APPLY_OPS_PROJECT_PATH,
"--",
"--input", bookmarked_docx_path,
"--sidecar", sidecar_path,
"--ops", opsFilePath,
"--out-redline", finalDocxPath,
"--log", logPath,
"--comment-ids-map", commentIdsMapPath // ← NEW: Export operation-to-durableId mapping
];
// ... existing dotnet execution ...
// NEW: After successful dotnet execution, populate feedback.json
try {
const commentIdsMapContent = await fs.readFile(commentIdsMapPath, "utf-8");
const commentIdsMap = JSON.parse(commentIdsMapContent);
const workspaceId = getWorkspaceIdFromPath(original_file_path);
if (workspaceId && commentIdsMap.document_id) {
const feedbackDir = path.join(`/data/workspaces/${workspaceId}/feedback`);
await ensureDirectoryExists(feedbackDir);
const feedbackFile = path.join(feedbackDir, `${commentIdsMap.document_id}.feedback.json`);
// Load existing or create new feedback
let feedback = {
document_id: commentIdsMap.document_id,
created_at: new Date().toISOString(),
last_updated: new Date().toISOString(),
known_entries: [],
rejections: []
};
try {
const existing = await fs.readFile(feedbackFile, "utf-8");
feedback = JSON.parse(existing);
} catch { /* New file */ }
// Load sidecar for document_context
const sidecar_content = await fs.readFile(sidecar_path, "utf-8");
const sidecar = JSON.parse(sidecar_content);
// DEDUPLICATION: Get existing IDs to avoid duplicates
const existing_ids = new Set(feedback.known_entries.map(e => e.id));
// Append new operations (with deduplication)
for (const op of commentIdsMap.operations) {
// Skip if already exists
if (existing_ids.has(op.id)) {
continue;
}
const document_context = sidecar[op.bookmark_id]?.text || "";
feedback.known_entries.push({
id: op.id,
operation_type: op.operation_type,
durable_id: op.durable_id, // ALL operations now have this
comment_text: op.comment_text,
document_context: document_context,
bookmark_id: op.bookmark_id,
original_text: op.original_text,
new_text: op.new_text,
position: op.position,
created_at: new Date().toISOString()
});
}
feedback.last_updated = new Date().toISOString();
await fs.writeFile(feedbackFile, JSON.stringify(feedback, null, 2));
mcpLog({
event: "FEEDBACK_POPULATED",
tool_name: name,
metadata: {
document_id: commentIdsMap.document_id,
entries_added: commentIdsMap.operations.length
}
}, requestId);
}
} catch (feedbackError: any) {
// Log but don't fail the operation
mcpLog({
event: "FEEDBACK_POPULATION_FAILED",
tool_name: name,
error: feedbackError?.message || String(feedbackError)
}, requestId);
}
// ... existing response return ...
}
```
---
### New File: outcome_detector.ts (Simplified - All Types Use durable_id)
**File:** `reality-ai-mcp/src/feedback/outcome_detector.ts` (NEW)
> [!IMPORTANT]
> Since ALL operations now carry tracking comments with a `durable_id`, detection is **uniform** across every operation type. No content matching needed.
```typescript
import * as fs from "fs/promises";
import * as path from "path";
import { get_document_id, extract_durable_ids } from "./invisible_tag_utils";
interface KnownEntry {
id: string;
operation_type: string;
durable_id: string; // ALL types now have this (required)
comment_text: string;
document_context?: string;
bookmark_id: string;
original_text?: string;
new_text?: string;
position?: string;
created_at: string;
}
interface Rejection extends Omit<KnownEntry, 'created_at'> {
reason: string;
rejected_at: string;
}
interface FeedbackFile {
document_id: string;
created_at: string;
last_updated: string;
known_entries: KnownEntry[];
rejections: Rejection[];
}
/**
* Detects rejected operations by checking if tracking comments were deleted.
* Works uniformly for ALL operation types (COMMENT, REWRITE, INSERT, DELETE).
*/
export async function detect_rejections(
docx_path: string,
workspace_id: string
): Promise<{ document_id: string | null; new_rejections: Array<KnownEntry & { reason: string }> }> {
const document_id = await get_document_id(docx_path);
if (!document_id) return { document_id: null, new_rejections: [] };
const feedback_dir = path.join(`/data/workspaces/${workspace_id}/feedback`);
const feedback_file = path.join(feedback_dir, `${document_id}.feedback.json`);
let tracking: FeedbackFile;
try {
const content = await fs.readFile(feedback_file, "utf-8");
tracking = JSON.parse(content);
} catch {
return { document_id, new_rejections: [] };
}
// Get all durable_ids currently in the document
const current_durable_ids = new Set(await extract_durable_ids(docx_path));
const already_rejected_ids = new Set(tracking.rejections.map(r => r.id));
const new_rejections: Array<KnownEntry & { reason: string }> = [];
for (const entry of tracking.known_entries) {
// Skip if already rejected
if (already_rejected_ids.has(entry.id)) continue;
// UNIFIED DETECTION: Check if tracking durable_id is missing
// Works for ALL operation types (no switch statement needed)
if (!current_durable_ids.has(entry.durable_id)) {
const reason = get_rejection_reason(entry.operation_type);
new_rejections.push({ ...entry, reason });
}
}
return { document_id, new_rejections };
}
/**
* Returns human-readable rejection reason based on operation type.
*/
function get_rejection_reason(operation_type: string): string {
switch (operation_type) {
case "COMMENT":
return "Comment was deleted by user";
case "REWRITE":
return "Rewrite was rejected - user reverted the change";
case "DELETE":
return "Deletion was rejected - user restored the paragraph";
case "INSERT":
return "Insertion was rejected - user removed the inserted text";
default:
return "Operation was rejected by user";
}
}
export async function record_rejections(
workspace_id: string,
document_id: string,
rejected_entries: Array<KnownEntry & { reason: string }>
): Promise<void> {
const feedback_dir = path.join(`/data/workspaces/${workspace_id}/feedback`);
const feedback_file = path.join(feedback_dir, `${document_id}.feedback.json`);
const content = await fs.readFile(feedback_file, "utf-8");
const tracking: FeedbackFile = JSON.parse(content);
const now = new Date().toISOString();
for (const entry of rejected_entries) {
tracking.rejections.push({
id: entry.id,
operation_type: entry.operation_type,
durable_id: entry.durable_id,
comment_text: entry.comment_text,
document_context: entry.document_context,
bookmark_id: entry.bookmark_id,
original_text: entry.original_text,
new_text: entry.new_text,
reason: entry.reason,
rejected_at: now
});
}
tracking.last_updated = now;
await fs.writeFile(feedback_file, JSON.stringify(tracking, null, 2));
}
export async function format_rejections_for_llm(
workspace_id: string,
document_id: string
): Promise<string> {
const feedback_dir = path.join(`/data/workspaces/${workspace_id}/feedback`);
const feedback_file = path.join(feedback_dir, `${document_id}.feedback.json`);
let tracking: FeedbackFile;
try {
const content = await fs.readFile(feedback_file, "utf-8");
tracking = JSON.parse(content);
} catch {
return "(No previous rejections)";
}
if (tracking.rejections.length === 0) {
return "(No previous rejections)";
}
return tracking.rejections.map((r, i) => {
let details = "";
if (r.operation_type === "COMMENT") {
details = `**AI Comment (Rejected):**
> ${r.comment_text}
**Document Text:**
> ${r.document_context || "(context not available)"}`;
} else if (r.operation_type === "REWRITE") {
details = `**AI Suggested Rewrite (Rejected):**
Original: "${r.original_text}"
Suggested: "${r.new_text}"
Comment: ${r.comment_text}`;
} else if (r.operation_type === "DELETE") {
details = `**AI Suggested Deletion (Rejected):**
Paragraph: "${r.original_text}"
Comment: ${r.comment_text}`;
} else if (r.operation_type === "INSERT") {
details = `**AI Suggested Insertion (Rejected):**
Text: "${r.new_text}"
Comment: ${r.comment_text}`;
}
return `### Rejected ${r.operation_type} ${i + 1}
${details}`;
}).join("\n\n---\n\n");
}
```
### New File: invisible_tag_utils.ts
**File:** `reality-ai-mcp/src/feedback/invisible_tag_utils.ts` (NEW)
```typescript
import AdmZip from "adm-zip";
import { XMLParser } from "fast-xml-parser";
export async function get_document_id(docx_path: string): Promise<string | null> {
const zip = new AdmZip(docx_path);
const custom_props_entry = zip.getEntry("docProps/custom.xml");
if (!custom_props_entry) return null;
const xml_content = custom_props_entry.getData().toString("utf-8");
const parser = new XMLParser({ ignoreAttributes: false, attributeNamePrefix: "@_" });
const parsed = parser.parse(xml_content);
const properties = parsed.Properties?.property;
if (!properties) return null;
const prop_array = Array.isArray(properties) ? properties : [properties];
const reality_prop = prop_array.find((p: any) => p["@_name"] === "RealityAIDocID");
return reality_prop?.["vt:lpwstr"] || null;
}
export async function extract_durable_ids(docx_path: string): Promise<string[]> {
const zip = new AdmZip(docx_path);
const comments_ids_entry = zip.getEntry("word/commentsIds.xml");
if (!comments_ids_entry) return [];
const xml_content = comments_ids_entry.getData().toString("utf-8");
const parser = new XMLParser({ ignoreAttributes: false, attributeNamePrefix: "@_" });
const parsed = parser.parse(xml_content);
const comments_ids = parsed["w16cid:commentsIds"];
if (!comments_ids) return [];
const comment_id_elements = comments_ids["w16cid:commentId"];
if (!comment_id_elements) return [];
const elements = Array.isArray(comment_id_elements) ? comment_id_elements : [comment_id_elements];
return elements
.map((el: any) => el["@_w16cid:durableId"])
.filter((id: any): id is string => typeof id === "string");
}
```
**Required npm packages:**
```bash
npm install fast-xml-parser adm-zip
npm install --save-dev @types/adm-zip
```
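To make the `commentsIds.xml` structure concrete, here is a deliberately minimal, regex-based sketch of what `extract_durable_ids` recovers from that part. This is illustration only; the real utility above uses `fast-xml-parser`, which handles namespacing and attribute ordering robustly, and the sample XML string is an assumption about the typical shape of the part.

```typescript
// Illustrative only: pull w16cid:durableId attribute values out of a
// commentsIds.xml string with a regex. The real utility uses fast-xml-parser.
function extract_durable_ids_from_xml(xml: string): string[] {
  const ids: string[] = [];
  const re = /w16cid:durableId="([^"]+)"/g;
  let match: RegExpExecArray | null;
  while ((match = re.exec(xml)) !== null) {
    ids.push(match[1]);
  }
  return ids;
}

// Hypothetical sample of the part's shape (namespace URI is the standard w16cid one)
const sample =
  '<w16cid:commentsIds xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid">' +
  '<w16cid:commentId w16cid:paraId="0A1B2C3D" w16cid:durableId="1F2E3D4C"/>' +
  '</w16cid:commentsIds>';
// extract_durable_ids_from_xml(sample) → ["1F2E3D4C"]
```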
### New File: preferences_manager.ts
**File:** `reality-ai-mcp/src/feedback/preferences_manager.ts` (NEW)
```typescript
import * as fs from "fs/promises";
const PREFERENCES_PATH = "/data/reality-ai/preferences.md";
export async function read_preferences(): Promise<string> {
try {
return await fs.readFile(PREFERENCES_PATH, "utf-8");
} catch {
return "# Review Preferences\n\n(No preferences set yet)";
}
}
export async function update_preferences(
action: "append" | "replace",
lines: string[]
): Promise<void> {
if (lines.length === 0) return;
// Ensure directory exists
const dir = "/data/reality-ai";
try {
await fs.mkdir(dir, { recursive: true });
} catch { /* ignore */ }
if (action === "replace") {
const content = lines.join("\n");
await fs.writeFile(PREFERENCES_PATH, content);
} else if (action === "append") {
const existing = await read_preferences();
const new_content = existing + "\n\n" + lines.join("\n");
await fs.writeFile(PREFERENCES_PATH, new_content);
}
}
```
---
## Preferences File
**Location:** `/data/reality-ai/preferences.md`
**When read:** By MCP during `read_docx_for_analysis` for reruns
**When written:** By MCP when LLM calls `update_preferences`
**Example content (after learning):**
```markdown
# Review Preferences
## Payment Terms
Standard 30-60 day payment terms are generally acceptable for commercial contracts.
Only flag payment terms shorter than 30 days or longer than 90 days.
## Limitation of Liability
Limitation of liability clauses capped at 100% of contract value are standard
practice. Only flag caps below 50% or uncapped liability.
## Standard Clauses
Standard force majeure language does not require commentary unless it contains
unusual exclusions, extended durations beyond 180 days, or automatic termination
triggers.
```
---
## Error Handling Strategy
| Scenario | Detection | Behavior |
|----------|-----------|----------|
| **Missing RealityAIDocID** | `get_document_id()` returns null | Treat as first run, load first-run prompt |
| **Corrupt commentsIds.xml** | XML parse throws | Log warning, assume all comments deleted |
| **Tracking file missing** | File read throws | Create new empty tracking file |
| **Sidecar missing** | File read throws | Continue with empty `document_context` |
| **Prompt file missing** | File read throws | Return error message |
| **C# tool failure** | dotnet throws | Propagate error to user via MCP |
| **Content matching fails** | Text comparison fails | Skip that entry, log warning (legacy path; the simplified flow relies on `durable_id` only) |
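Most rows in the table share one "fail open" pattern: a missing or corrupt input degrades to an empty default instead of aborting the run. A minimal sketch of that pattern, with an illustrative function name that is not part of the plan's API:

```typescript
// Hedged sketch of the "fail open" pattern from the error-handling table.
interface TrackingFile {
  document_id: string;
  known_entries: unknown[];
  rejections: unknown[];
}

function parse_tracking_or_default(raw: string | null, document_id: string): TrackingFile {
  const empty: TrackingFile = { document_id, known_entries: [], rejections: [] };
  if (raw === null) return empty;            // tracking file missing → new empty file
  try {
    return JSON.parse(raw) as TrackingFile;  // happy path
  } catch {
    return empty;                            // corrupt JSON → treat as no prior feedback
  }
}
```

The same shape applies to the sidecar (fall back to an empty `document_context`) and the preferences file (fall back to a placeholder header).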
---
## How Rejections Are Sent to LLM
When sending `rejected_history` to the LLM, we format it as **clean text blocks** β€” no IDs, no timestamps, no internal data.
**Example `rejected_history` sent to LLM:**
```markdown
### Rejected COMMENT 1
**AI Comment (Rejected):**
> Consider removing this limitation of liability clause as it may expose the Client to significant unrecoverable losses.
**Document Text:**
> The Contractor shall not be liable for any indirect, consequential, or punitive damages arising out of or in connection with this Agreement.
---
### Rejected REWRITE 2
**AI Suggested Rewrite (Rejected):**
Original: "Payment within 30 days"
Suggested: "Payment within 45 days"
Comment: Extended payment terms for better cash flow
---
### Rejected DELETE 3
**AI Suggested Deletion (Rejected):**
Paragraph: "This clause shall be subject to annual review..."
Comment: Removed redundant review clause
```
> [!NOTE]
> The LLM sees the operation type, the original context, and the AI's suggestion. No `durable_id`, no `bookmark_id`, no timestamps. This keeps the prompt clean and focused.
---
## Prompt: `analyse_contract.md` (Minimal Router)
**File:** `.claude/commands/analyse_contract.md`
```markdown
---
description: Analyse a legal contract DOCX using the Reality AI MCP server and apply redlines.
argument-hint: "<path-to-docx>"
allowed-tools:
- "mcp__reality-ai__read_docx_for_analysis"
- "mcp__reality-ai__apply_edits_to_docx"
- "mcp__reality-ai__update_preferences"
---
Call `read_docx_for_analysis` with the file path.
The tool will return:
1. A `mode` field indicating FIRST_RUN or RERUN
2. Instructions to follow (loaded from the appropriate prompt file)
3. Document content
4. Rejection history (if rerun)
5. Preferences (if rerun)
Follow the instructions provided in the response.
```
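The router prompt stays minimal because the FIRST_RUN/RERUN decision is made deterministically inside `read_docx_for_analysis` (Step 9), not by the LLM. A hedged sketch of that routing logic; the prompt file names follow the plan, while the function name and parameters are illustrative:

```typescript
// Hedged sketch of the deterministic prompt routing in read_docx_for_analysis.
type AnalysisMode = "FIRST_RUN" | "RERUN";

function select_prompt(
  document_id: string | null,    // from get_document_id()
  has_feedback_file: boolean     // does {document_id}.feedback.json exist?
): { mode: AnalysisMode; prompt_file: string } {
  // No RealityAIDocID, or no feedback yet → the document has never been analysed
  if (document_id === null || !has_feedback_file) {
    return { mode: "FIRST_RUN", prompt_file: "analyse_contract_first_run.md" };
  }
  return { mode: "RERUN", prompt_file: "analyse_contract_rerun.md" };
}
```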
---
## Prompt: `analyse_contract_first_run.md`
**File:** `.claude/commands/analyse_contract_first_run.md`
```markdown
You are the **Reality AI Legal Analyst** performing a first-time analysis.
## Your Task
1. Read the document content provided
2. Analyse the contract for legal and commercial risks
3. Generate operations[] for each issue found
4. Call `apply_edits_to_docx` with your operations
## Analysis Focus
Perform a legal review as a senior counsel, focusing on:
- **Liability and risk**: Indemnity clauses, caps, exclusions
- **Payment and termination**: Payment terms, suspension rights, fees
- **Scope and standards**: Code exclusions, vague references
- **Tax and pass-through costs**: VAT, withholding tax
## Operation Types
| Type | Required Fields |
|------|-----------------|
| `COMMENT` | `Id`, `Type`, `Comment`, `Scope` |
| `REWRITE_PARAGRAPH` | `Id`, `Type`, `NewText`, `Comment` |
| `INSERT` | `Id`, `Type`, `Text`, `Position`, `Comment` |
| `DELETE_PARAGRAPH` | `Id`, `Type`, `Comment` |
## Output
Generate your operations and call `apply_edits_to_docx`.
After success, report:
```
βœ… Done! Your redlined contract is ready.
**Path:** [final_docx_path]
**Summary:** Added X comments, Y rewrites, Z other operations.
```
```
---
## Prompt: `analyse_contract_rerun.md`
**File:** `.claude/commands/analyse_contract_rerun.md`
```markdown
You are the **Reality AI Legal Analyst** performing a **rerun analysis** with feedback.
This document has been analysed before. You have access to:
1. **Previously Rejected Operations** β€” Suggestions the user rejected
2. **Preferences** β€” Organizational guidance learned from past feedback
## Instructions
1. **Review the rejection history** β€” Understand what was rejected and why
2. **Follow the preferences** β€” These represent learned organizational wisdom
3. **DO NOT suggest similar items** β€” Avoid patterns matching the rejections
4. **Analyse remaining risks** β€” Focus on issues not covered by rejections
5. **Update preferences** β€” Learn from the rejections by updating preferences
## Preferences Update
**Every rejection should inform your preferences.** Use the rich context (document text + rejected suggestion) to write nuanced professional guidance.
When writing preferences:
- Write as **professional legal guidance**, not a rejection log
- Include **nuance and context**
- Be **specific about thresholds** (e.g., "below 50%" not "low caps")
| Action | When to Use |
|--------|-------------|
| `append` | Adding guidance for a new topic |
| `replace` | Improving/refining existing guidance (copy existing sections you want to keep) |
**Avoid duplicates:** Check the current preferences before appending. If a section already exists, use `replace` to update it instead.
## Output
1. Generate your operations and call `apply_edits_to_docx`
2. Call `update_preferences` with your learned guidance
Example preferences update:
```json
{
"action": "append",
"lines": [
"## Payment Terms",
"Standard 30-60 day payment terms are acceptable for commercial contracts.",
"Only flag terms shorter than 30 days or longer than 90 days."
]
}
```
After success, report:
```
βœ… Done! Your redlined contract is ready (with feedback applied).
**Path:** [final_docx_path]
**Summary:** Added X operations. Avoided Y previously rejected patterns.
**Preferences:** Updated with new guidance.
```
```
---
## Implementation Checklist
| Step | Component | File | Description |
|------|-----------|------|-------------|
| 0 | C# | DocxEditor.cs | Add `GenerateOperationId()` hash-based ID generator |
| 1 | C# | DocxEditor.cs | Add `OperationIdEntry` class (ALL types have `durable_id`) |
| 2 | C# | DocxEditor.cs | Add `AddTrackingComment()` for invisible tracking |
| 3 | C# | DocxEditor.cs | Modify `AddCommentToPart()` to accept author parameter |
| 4 | C# | DocxEditor.cs | Update ALL operation methods to add tracking comments |
| 5 | C# | DocxEditor.cs | Add `CreateHiddenTrackingParagraph()` for DELETE tracking |
| 6 | C# | DocxEditor.cs | Add `GetOperationIdEntries()` method |
| 7 | C# | DocxEditor.cs | Add `EnsureDocumentId()` method |
| 8 | C# | Program.cs | Add `--comment-ids-map` CLI option + handler binding |
| 9 | MCP | index.ts | Enhance `read_docx_for_analysis` with prompt routing |
| 10 | MCP | index.ts | Enhance `apply_edits_to_docx` with `--comment-ids-map` + deduplication |
| 11 | MCP | index.ts | Add `update_preferences` tool |
| 12 | MCP | new file | Create `feedback/invisible_tag_utils.ts` |
| 13 | MCP | new file | Create `feedback/outcome_detector.ts` (simplified durable_id check) |
| 14 | MCP | new file | Create `feedback/preferences_manager.ts` |
| 15 | Prompts | new file | Create `analyse_contract.md` (router) |
| 16 | Prompts | new file | Create `analyse_contract_first_run.md` |
| 17 | Prompts | new file | Create `analyse_contract_rerun.md` |
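Step 0's `GenerateOperationId()` lives in C#, but its intent can be sketched in TypeScript for reference. Hashing exactly these three fields, and truncating to 16 hex characters, are assumptions for illustration, not the committed C# design:

```typescript
import { createHash } from "crypto";

// Hedged TypeScript sketch of Step 0's hash-based operation ID generator.
// The real implementation is in DocxEditor.cs; field choice is an assumption.
function generate_operation_id(
  operation_type: string,
  bookmark_id: string,
  comment_text: string
): string {
  const payload = `${operation_type}|${bookmark_id}|${comment_text}`;
  // SHA-256 keyed on the operation's content makes IDs stable across reruns
  // for identical suggestions, and collision-resistant across runs.
  return createHash("sha256").update(payload).digest("hex").slice(0, 16);
}
```

The key property is determinism: the same suggestion hashes to the same ID on every run, which is what makes the deduplication in `apply_edits_to_docx` possible.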
---
## Testing Strategy
### Unit Tests
**C# Tests:**
- [ ] `GenerateOperationId()` produces unique hashes
- [ ] `EnsureDocumentId()` creates custom property correctly
- [ ] `EnsureDocumentId()` returns existing ID if already present
- [ ] `GetOperationIdEntries()` captures all operation types with durable_id
- [ ] `AddTrackingComment()` creates invisible comment with special author
- [ ] `CreateHiddenTrackingParagraph()` creates paragraph with `<w:vanish/>`
**TypeScript Tests:**
- [ ] `get_document_id()` extracts RealityAIDocID from custom.xml
- [ ] `extract_durable_ids()` parses commentsIds.xml correctly
- [ ] `detect_rejections()` identifies missing durable_ids for ALL operation types
- [ ] `format_rejections_for_llm()` produces correct markdown
- [ ] `update_preferences()` appends and replaces correctly
- [ ] Deduplication logic skips entries with existing IDs
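The deduplication check embedded in `apply_edits_to_docx` is easiest to unit-test if lifted into a pure function, so no filesystem is involved. A sketch of that refactoring; the function name `merge_known_entries` is illustrative:

```typescript
// Hedged sketch: deduplicating merge of known_entries, pulled out as a pure
// function for testability. Entries already recorded on a prior run are skipped.
interface EntryLike {
  id: string;
}

function merge_known_entries<T extends EntryLike>(existing: T[], incoming: T[]): T[] {
  const seen = new Set(existing.map(e => e.id));
  const merged = [...existing];
  for (const entry of incoming) {
    if (seen.has(entry.id)) continue; // duplicate from an earlier run
    seen.add(entry.id);
    merged.push(entry);
  }
  return merged;
}
```

Scenario 3 below ("Multi-Run Accumulation + Deduplication") is then just `merge_known_entries(run1, run1)` returning `run1` unchanged.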
### Integration Tests
**Scenario 1: First Run**
- Upload new contract (no prior analysis)
- Verify RealityAIDocID created in DOCX
- Verify feedback.json created with known_entries (all have durable_id)
- Verify mode = FIRST_RUN in response
**Scenario 2: Rejection Detection (All Types)**
- Upload previously analyzed document
- User has deleted tracking comments for various operations
- Verify rejections detected via durable_id for ALL types (COMMENT, REWRITE, INSERT, DELETE)
- Verify rejection history formatted correctly
**Scenario 3: Multi-Run Accumulation + Deduplication**
- Run 1: AI suggests 3 operations
- Run 2: Re-run WITHOUT changes
- Verify feedback.json does NOT have duplicates (deduplication working)
**Scenario 4: Preferences Learning**
- Rejection occurs
- LLM calls `update_preferences`
- Verify preferences.md updated correctly
- Verify next run uses updated preferences
### Manual Testing
- [ ] Validate durable_ids persist across Word versions (Desktop, Online)
- [ ] Confirm RealityAIDocID survives file rename and email
- [ ] Test invisible tracking comments don't show in normal view
- [ ] Test with 100+ page document (performance)
- [ ] Verify prompt routing works correctly
### Performance Benchmarks
- [ ] 10-page document: Process in < 5s
- [ ] 50-page document: Process in < 15s
- [ ] 100-page document: Process in < 30s
---
## Success Criteria
- [ ] **Zero visible tags** in the Word document (invisible tracking comments)
- [ ] Hash-based operation IDs prevent collisions across runs
- [ ] RealityAIDocID persists even if file is renamed
- [ ] LLM sees zero IDs (pure text in/out)
- [ ] Word Online compatible (durable_id is standard)
- [ ] **ALL operation types tracked via durable_id** (COMMENT, REWRITE, INSERT, DELETE)
- [ ] Deduplication prevents duplicate entries in feedback.json
- [ ] `known_entries` accumulates across multiple runs
- [ ] Rejections include full document_context for LLM learning
- [ ] Every rejection immediately informs preferences
- [ ] Preferences are written as professional guidance
- [ ] Rejected operations are not re-suggested on rerun
- [ ] **Deterministic prompt selection** (MCP routes, not LLM)
- [ ] **snake_case naming** across all JSON keys and TypeScript
---
> **Next Step**: Start with the C# changes (Steps 0-8) as they are the foundation.