masharma commited on
Commit
73da86c
Β·
verified Β·
1 Parent(s): 3f3323d

Upload 11 files

Browse files
README_HF.md ADDED
@@ -0,0 +1,24 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: ROAR Item Generator
3
+ emoji: 🦁
4
+ colorFrom: blue
5
+ colorTo: purple
6
+ sdk: gradio
7
+ sdk_version: 5.11.0
8
+ app_file: app_gradio.py
9
+ pinned: false
10
+ license: mit
11
+ ---
12
+
13
+ # ROAR Assessment Item Generator
14
+
15
+ Generate reading comprehension items with AI-powered difficulty estimation.
16
+
17
+ ## Features
18
+ - AI-powered item generation using Claude
19
+ - Automatic difficulty estimation using ModernBERT
20
+ - Save and export items to CSV
21
+ - Interactive chat interface
22
+
23
+ ## Model
24
+ Uses a custom-trained difficulty estimation model (ModernBERT + Ridge Regression)
app_gradio.py ADDED
@@ -0,0 +1,307 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
import gradio as gr
import os
from anthropic import Anthropic
from difficulty_estimator import DifficultyEstimator
from dotenv import load_dotenv

# Load environment variables (ANTHROPIC_API_KEY, optional MODEL_PATH) from a
# local .env file, if present.
load_dotenv()

# Initialize Anthropic client used by chat_with_ai below.
client = Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY'))

# Initialize difficulty estimator; if the artifacts under MODEL_PATH are
# missing, the estimator stays in a "not loaded" state and difficulty
# estimation is simply skipped.
MODEL_PATH = os.getenv('MODEL_PATH', './models')
difficulty_estimator = DifficultyEstimator(MODEL_PATH)

# Load ROAR prompt; it is appended to the system message for every request.
with open('prompts/roar_prompt.md', 'r') as f:
    ROAR_PROMPT = f.read()

SYSTEM_MESSAGE = """You are an expert educational assessment designer specializing in creating reading comprehension items.
Generate high-quality assessment items following the exact format provided."""

# Module-level mutable state shared by all UI callbacks:
#   history      - full chat transcript resent to Claude on every turn
#   current_item - most recently parsed item dict, or None
#   collection   - items the user explicitly saved
# NOTE(review): this state is process-global, so concurrent users of the
# same app instance share one conversation and collection — confirm that is
# acceptable for the deployment.
conversation_state = {"history": [], "current_item": None, "collection": []}
28
def parse_item_from_response(text):
    """Parse a generated assessment item out of Claude's free-text response.

    Expects the labelled format described in prompts/roar_prompt.md
    (Passage: / Question: / Target Answer: / Distractor N: / METADATA:).

    Labels are matched case-insensitively and may carry a parenthetical
    qualifier before the colon — the prompt's OUTPUT FORMAT actually emits
    "Distractor 1 (Partial Coherence):" and "METADATA:", which the previous
    exact, case-sensitive `str.find` lookups never matched.

    Args:
        text: raw assistant message (may contain markdown bold markers).

    Returns:
        dict with whichever item/metadata fields were found; empty dict if
        the response contains no labelled item.
    """
    import re

    # Remove markdown bold formatting so labels like **Passage:** match.
    text = text.replace('**', '')

    def locate(label):
        """Find `label` + optional "(...)" qualifier + colon, any case.

        Returns (label_start, content_start) or (None, None).
        [^\\S\\n] matches horizontal whitespace only, so a label match can
        never run across lines.
        """
        pattern = re.compile(
            re.escape(label) + r'[^\S\n]*(?:\([^)\n]*\))?[^\S\n]*:',
            re.IGNORECASE,
        )
        m = pattern.search(text)
        return (m.start(), m.end()) if m else (None, None)

    # Labels in the order they appear in a response; each field's content
    # runs until the start of the next label that was actually found.
    order = [
        ('passage', 'Passage'),
        ('question', 'Question'),
        ('target_answer', 'Target Answer'),
        ('distractor_1', 'Distractor 1'),
        ('distractor_2', 'Distractor 2'),
        ('distractor_3', 'Distractor 3'),
        (None, 'Metadata'),  # terminator only; parsed separately below
    ]
    marks = [(field, *locate(label)) for field, label in order]

    item = {}
    for i, (field, label_start, content_start) in enumerate(marks):
        if field is None or label_start is None:
            continue

        # Content ends at the next label that exists after this one.
        end = len(text)
        for _, next_start, _ in marks[i + 1:]:
            if next_start is not None and next_start > label_start:
                end = next_start
                break

        content = text[content_start:end].strip()

        # Distractors often carry extra formatting; keep just the answer text.
        if field.startswith('distractor'):
            # Drop parenthetical notes appended after the answer.
            if '(' in content:
                content = content[:content.find('(')].strip()
            # Keep only the first line.
            if '\n' in content:
                content = content.split('\n')[0].strip()
            # Remove markdown horizontal rules.
            content = content.replace('---', '').strip()

        item[field] = content

    # Parse the metadata block (prompt emits "METADATA:", so match any case).
    meta_start, _ = locate('Metadata')
    metadata_section = text[meta_start:] if meta_start is not None else ''

    metadata_fields = {
        'event_chain_relation': 'Event-Chain Relation:',
        'knowledge_base_inference': 'Knowledge-Base Inference:',
        'qar_level': 'QAR Level:',
        'coherence_level': 'Coherence Level:',
        'explanatory_stance': 'Explanatory Stance:'
    }

    for field, label in metadata_fields.items():
        if label in metadata_section:
            start = metadata_section.find(label) + len(label)
            end = metadata_section.find('\n', start)
            if end == -1:
                end = len(metadata_section)
            value = metadata_section[start:end].strip()
            # Strip parenthetical qualifiers, e.g. "Local (adjacent sentences)".
            if '(' in value:
                value = value[:value.find('(')].strip()
            item[field] = value

    return item
111
def chat_with_ai(user_message, history):
    """Handle one chat turn with Claude and try to parse an item from it.

    Args:
        user_message: text the user typed (empty input is a no-op).
        history: gradio Chatbot history as a list of (user, assistant) tuples.

    Returns:
        (updated history, Markdown for the item panel or a placeholder,
        parsed item dict or None) — wired to [chatbot, item_display,
        item_state] in the UI.

    Side effects: appends both turns to conversation_state["history"] and,
    when an item is parsed, updates conversation_state["current_item"].
    """
    if not user_message:
        return history, None, None

    # Add user message to the persistent transcript.
    conversation_state["history"].append({
        'role': 'user',
        'content': user_message
    })

    # Get response from Claude; the full transcript is resent every turn.
    messages = [{'role': msg['role'], 'content': msg['content']}
                for msg in conversation_state["history"]]

    with client.messages.stream(
        model='claude-sonnet-4-20250514',
        max_tokens=4000,
        temperature=1,
        system=SYSTEM_MESSAGE + "\n\n" + ROAR_PROMPT,
        messages=messages
    ) as stream:
        # NOTE(review): the stream is fully drained before returning, so the
        # UI shows nothing until the whole response has arrived.
        assistant_message = ""
        for text in stream.text_stream:
            assistant_message += text

    conversation_state["history"].append({
        'role': 'assistant',
        'content': assistant_message
    })

    # Parse item from response; parsing failures are logged, not fatal.
    item = None
    difficulty = None
    try:
        item = parse_item_from_response(assistant_message)
        if item and (item.get('passage') or item.get('question')):
            conversation_state["current_item"] = item
            # Only score when the estimator's artifacts actually loaded.
            if difficulty_estimator.is_loaded():
                difficulty = difficulty_estimator.estimate_difficulty(item)
    except Exception as e:
        print(f"Error parsing item: {e}")

    # Update chat history for display.
    history.append((user_message, assistant_message))

    # Format item display (placeholder when nothing was parsed).
    item_display = format_item_display(item, difficulty) if item else "No item generated yet"

    return history, item_display, item
163
def format_item_display(item, difficulty=None):
    """Format a parsed item (and optional difficulty estimate) as Markdown.

    Args:
        item: dict produced by parse_item_from_response, or None/empty.
        difficulty: optional dict from DifficultyEstimator.estimate_difficulty
            with keys 'score' (0-1 float), 'irt_difficulty', 'interpretation'.

    Returns:
        Markdown string for the "Current Item" panel.
    """
    if not item:
        return "No item to display"

    display = "# Current Item\n\n"

    # Difficulty summary, when the estimator produced one.
    if difficulty:
        score = difficulty['score']
        irt_score = difficulty.get('irt_difficulty', 'N/A')
        label = difficulty.get('interpretation', 'Medium')
        # BUG FIX: the conditional must wrap the f-string, not sit inside the
        # format spec — "{x:.3f if ...}" is an invalid format specifier and
        # raised ValueError whenever a difficulty was displayed.
        if isinstance(irt_score, float):
            irt_text = f"{irt_score:.3f}"
        else:
            irt_text = str(irt_score)
        display += f"**Estimated Difficulty:** {label}\n"
        display += f"- Normalized: {score*100:.1f}%\n"
        display += f"- IRT Score: {irt_text}\n\n"

    # Item fields ('N/A' for anything the parser did not find).
    display += f"**Passage:**\n{item.get('passage', 'N/A')}\n\n"
    display += f"**Question:**\n{item.get('question', 'N/A')}\n\n"
    display += f"**Target Answer:**\n{item.get('target_answer', 'N/A')}\n\n"
    display += f"**Distractor 1:**\n{item.get('distractor_1', 'N/A')}\n\n"
    display += f"**Distractor 2:**\n{item.get('distractor_2', 'N/A')}\n\n"
    display += f"**Distractor 3:**\n{item.get('distractor_3', 'N/A')}\n\n"

    # Metadata block.
    display += "---\n**Metadata:**\n"
    display += f"- Event-Chain Relation: {item.get('event_chain_relation', 'N/A')}\n"
    display += f"- Knowledge-Base Inference: {item.get('knowledge_base_inference', 'N/A')}\n"
    display += f"- QAR Level: {item.get('qar_level', 'N/A')}\n"
    display += f"- Coherence Level: {item.get('coherence_level', 'N/A')}\n"
    display += f"- Explanatory Stance: {item.get('explanatory_stance', 'N/A')}\n"

    return display
198
def save_to_collection(item_data):
    """Copy the most recently generated item into the saved collection.

    `item_data` (the gr.State value) is accepted for event-wiring
    compatibility, but the item is read from conversation_state.

    Returns:
        (status message, refreshed collection Markdown) pair.
    """
    current = conversation_state["current_item"]
    if not current:
        return "No item to save", format_collection_display()

    collection = conversation_state["collection"]
    # Shallow copy so later turns cannot mutate the saved row.
    saved = current.copy()
    saved['item_id'] = len(collection) + 1

    # Attach a difficulty estimate when the model artifacts are available.
    if difficulty_estimator.is_loaded():
        estimate = difficulty_estimator.estimate_difficulty(saved)
        if estimate:
            saved['difficulty_score'] = estimate['score']
            saved['difficulty_irt'] = estimate.get('irt_difficulty')
            saved['difficulty_label'] = estimate.get('interpretation')

    collection.append(saved)

    status = f"βœ… Item saved! ({len(conversation_state['collection'])} items total)"
    return status, format_collection_display()
220
def format_collection_display():
    """Render the saved collection as a Markdown summary (one entry per item)."""
    collection = conversation_state["collection"]
    if not collection:
        return "No items in collection yet"

    parts = [f"# Collection ({len(collection)} items)\n\n"]
    for entry in collection:
        parts.append(f"## Item #{entry['item_id']}\n")
        # Difficulty line only for items that were scored when saved.
        if 'difficulty_label' in entry:
            parts.append(f"**Difficulty:** {entry['difficulty_label']} ")
            parts.append(f"({entry.get('difficulty_score', 0)*100:.1f}%)\n")
        # Truncate long questions to keep the summary compact.
        parts.append(f"**Question:** {entry.get('question', 'N/A')[:100]}...\n\n")

    return "".join(parts)
237
def export_collection():
    """Write the saved collection to a timestamped CSV in the working directory.

    Returns:
        The CSV filename for the gr.File download component, or None when
        there is nothing to export.
    """
    if not conversation_state["collection"]:
        return None

    import pandas as pd
    from datetime import datetime

    frame = pd.DataFrame(conversation_state["collection"])

    # Timestamped name so repeated exports never overwrite each other.
    stamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    out_name = f'roar_items_{stamp}.csv'
    frame.to_csv(out_name, index=False)

    return out_name
256
def clear_chat():
    """Reset the conversation and current item (the saved collection is kept).

    Returns the cleared Chatbot history and the item-panel placeholder text.
    """
    conversation_state.update(history=[], current_item=None)
    return [], "No item generated yet"
263
# Create Gradio interface: a two-column chat + current-item layout with a
# collapsible collection panel underneath.
with gr.Blocks(title="ROAR Item Generator", theme=gr.themes.Soft()) as demo:
    gr.Markdown("# 🦁 ROAR Assessment Item Generator")
    gr.Markdown("Generate reading comprehension items with AI guidance and difficulty estimation")

    with gr.Row():
        # Left column: chat with the generator.
        with gr.Column(scale=2):
            chatbot = gr.Chatbot(label="Chat", height=500)
            msg = gr.Textbox(
                label="Your message",
                placeholder="Try: Generate a reading comprehension item about ocean animals",
                lines=2
            )
            with gr.Row():
                send_btn = gr.Button("Send", variant="primary")
                clear_btn = gr.Button("Clear Chat")

        # Right column: most recently parsed item plus save control.
        with gr.Column(scale=1):
            item_display = gr.Markdown("No item generated yet", label="Current Item")
            save_btn = gr.Button("πŸ’Ύ Save to Collection", variant="secondary")
            save_status = gr.Textbox(label="Status", lines=1, interactive=False)

    gr.Markdown("---")

    with gr.Accordion("πŸ“š Collection", open=False):
        collection_display = gr.Markdown("No items in collection yet")
        export_btn = gr.Button("πŸ“₯ Export Collection as CSV")
        export_file = gr.File(label="Download CSV")

    # Hidden state to pass the parsed item dict between callbacks.
    item_state = gr.State(None)

    # Event handlers: Enter in the textbox and the Send button both run the
    # chat, then clear the input afterwards.
    msg.submit(chat_with_ai, [msg, chatbot], [chatbot, item_display, item_state]).then(
        lambda: "", None, msg
    )
    send_btn.click(chat_with_ai, [msg, chatbot], [chatbot, item_display, item_state]).then(
        lambda: "", None, msg
    )
    clear_btn.click(clear_chat, None, [chatbot, item_display])
    save_btn.click(save_to_collection, item_state, [save_status, collection_display])
    export_btn.click(export_collection, None, export_file)

if __name__ == "__main__":
    demo.launch()
difficulty_estimator.py ADDED
@@ -0,0 +1,189 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import joblib
3
+ import numpy as np
4
+ import pandas as pd
5
+ import torch
6
+ from transformers import AutoTokenizer, AutoModel
7
+
8
+
9
class DifficultyEstimator:
    """
    Estimates item difficulty using ModernBERT + PCA + Ridge model.
    Matches the training pipeline from [item_difficulty]_difficulty_estimator_model.py
    """

    # Artifact attribute name -> filename inside model_dir (load order matters
    # only for log readability; each artifact is independent).
    _ARTIFACTS = {
        'ridge': 'ridge_model.pkl',
        'pca': 'pca.pkl',
        'scaler_emb': 'scaler_emb.pkl',
        'scaler_features': 'scaler_features.pkl',
        'grade_columns': 'grade_columns.pkl',
    }

    def __init__(self, model_dir=None):
        """Load pipeline artifacts and ModernBERT from model_dir.

        If model_dir is None/missing or any load fails, the estimator stays
        in a "not loaded" state and estimate_difficulty() returns None.
        """
        self.ridge = None
        self.pca = None
        self.scaler_emb = None
        self.scaler_features = None
        self.grade_columns = None
        self.tokenizer = None
        self.bert_model = None
        self.device = None

        if not (model_dir and os.path.exists(model_dir)):
            return

        try:
            print("Loading difficulty model components...")

            # Load all sklearn/joblib artifacts.
            for attr, filename in self._ARTIFACTS.items():
                setattr(self, attr, joblib.load(f'{model_dir}/{filename}'))

            print("Loading ModernBERT...")
            self.tokenizer = AutoTokenizer.from_pretrained('answerdotai/ModernBERT-base')
            self.bert_model = AutoModel.from_pretrained('answerdotai/ModernBERT-base')
            self.bert_model.eval()
            self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
            self.bert_model.to(self.device)

            print(f"βœ… Difficulty model loaded successfully (using {self.device})")

        except Exception as e:
            print(f"⚠️ Could not load model: {e}")
            import traceback
            traceback.print_exc()

    def is_loaded(self):
        """Return True once every artifact plus tokenizer/model is available."""
        components = (
            self.ridge,
            self.pca,
            self.scaler_emb,
            self.scaler_features,
            self.grade_columns,
            self.tokenizer,
            self.bert_model,
        )
        return all(part is not None for part in components)

    def build_text(self, item):
        """
        Build input text matching training format (Figure 2 in paper):

            Question: {question}
            Correct: {target_answer}
            Wrong 1: {distractor_1}
            Wrong 2: {distractor_2}
            Wrong 3: {distractor_3}   (left empty: ROAR items carry only 2)
            Passage: {passage}
        """
        rows = [
            f"Question: {item.get('question', '')}",
            f"Correct: {item.get('target_answer', '')}",
            f"Wrong 1: {item.get('distractor_1', '')}",
            f"Wrong 2: {item.get('distractor_2', '')}",
            "Wrong 3: ",  # empty third distractor since ROAR only has 2
            f"Passage: {item.get('passage', '')}",
        ]
        return "\n".join(rows)

    def get_embedding(self, text):
        """
        Extract a ModernBERT embedding by averaging hidden states over the
        real (non-padding) tokens, matching the training code.
        """
        encoded = self.tokenizer(
            text,
            return_tensors='pt',
            truncation=True,
            max_length=512,
            padding=True
        )
        encoded = {name: tensor.to(self.device) for name, tensor in encoded.items()}

        with torch.no_grad():
            output = self.bert_model(**encoded)

        hidden_states = output.last_hidden_state   # (1, seq_len, hidden_dim)
        attention = encoded['attention_mask']      # (1, seq_len)

        # Index of the last non-padding token.
        last = attention[0].nonzero(as_tuple=True)[0][-1].item()

        # Mean over all real tokens -> numpy vector.
        return hidden_states[0, :last + 1, :].mean(dim=0).cpu().numpy()

    def get_grade_ohe(self, grade):
        """
        One-hot encode the grade against the training grade columns.
        When no grade is supplied, defaults to Grade4 (ROAR items don't
        carry grade info).
        """
        ohe = pd.DataFrame(0, index=[0], columns=self.grade_columns)

        if grade:
            # NOTE(review): an unrecognized grade leaves the vector all-zero
            # rather than falling back to Grade4 — preserved from original.
            column = f'grade_{grade}'
            if column in self.grade_columns:
                ohe[column] = 1
        elif 'grade_Grade4' in self.grade_columns:
            ohe['grade_Grade4'] = 1

        return ohe.values

    def estimate_difficulty(self, item):
        """
        Estimate difficulty of an item.

        Returns a dict with 'score' (0-1 display scale), 'irt_difficulty'
        (raw IRT estimate: negative = easier, positive = harder, roughly
        -3..+3) and 'interpretation', or None when the model is not loaded
        or inference fails.
        """
        if not self.is_loaded():
            return None

        try:
            # 1-2. Text -> ModernBERT embedding.
            embedding = self.get_embedding(self.build_text(item))

            # 3. Standardize, then project with the fitted PCA.
            reduced = self.pca.transform(
                self.scaler_emb.transform(embedding.reshape(1, -1))
            )

            # 4-5. Append the grade one-hot (Grade4 default for ROAR items).
            ohe = self.get_grade_ohe(item.get('grade', 'Grade4'))
            combined = np.hstack([reduced, ohe])

            # 6. Final scaling + ridge regression -> raw IRT difficulty.
            irt_score = self.ridge.predict(
                self.scaler_features.transform(combined)
            )[0]

            # Map the roughly -3..+3 IRT scale onto 0..1 for display
            # (0 = very easy, 1 = very hard).
            normalized = np.clip((irt_score + 3) / 6, 0, 1)

            return {
                'score': float(normalized),           # 0-1 for display
                'irt_difficulty': float(irt_score),   # raw IRT score
                'interpretation': self.get_interpretation(normalized)
            }

        except Exception as e:
            print(f"Error estimating difficulty: {e}")
            import traceback
            traceback.print_exc()
            return None

    def get_interpretation(self, score):
        """Map a 0-1 normalized score to Easy (<0.4) / Medium (<0.7) / Hard."""
        for upper, label in ((0.4, "Easy"), (0.7, "Medium")):
            if score < upper:
                return label
        return "Hard"
models/.DS_Store ADDED
Binary file (6.15 kB). View file
 
models/grade_columns.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d6eb4ad5acd0917adbdecc1713e78dd8886f991e2eaf4e88ac68644458df24b0
3
+ size 106
models/pca.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7fa1e465d70f40b09987f5e0cbf6267648c6b62436c3399b9622b76faaf00195
3
+ size 158271
models/ridge_model.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c10ac18867fd5143f7912c31a5b87267bdd7a23f3dcfe5a94e9fc3e90d27525f
3
+ size 1015
models/scaler_emb.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a70f072019e087abe093450f270bb6ee745219ce8f0d3729502388383c97652b
3
+ size 19047
models/scaler_features.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:efdac3fd91d73c99048ac49772996d742959ea1fd5df1642f9734b10a4209a99
3
+ size 1959
prompts/roar_prompt.md ADDED
@@ -0,0 +1,152 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # ROAR Reading Comprehension Item Generation Prompt
2
+
3
+ This prompt template can be used for generating ROAR-Inference assessment items.
4
+ To use it, add it to the system message in app.py when needed.
5
+
6
+ ---
7
+
8
+ You are an expert educational content designer creating reading comprehension items for the ROAR-Inference assessment. Generate ONE complete item per request following all rules below.
9
+
10
+ ---
11
+
12
+ ## ITEM STRUCTURE
13
+
14
+ Create items with:
15
+ - **Passage:** 3-5 sentences, age-appropriate (grades 2-5)
16
+ - **Question:** Targets one inference type
17
+ - **Target Answer:** Full coherence (Level 2)
18
+ - **Distractor 1:** Partial coherence (Level 1) - uses passage info incorrectly
19
+ - **Distractor 2:** Minimal coherence (Level 0) - outside text, world knowledge only
20
+
21
+ ---
22
+
23
+ ## CORE FRAMEWORKS (Choose one from each)
24
+
25
+ ### 1. EVENT-CHAIN RELATION
26
+ - **Logical:** Why/how questions (causes, motivations, enabling conditions)
27
+ - **Informational:** Who/what/when/where questions (referential/spatiotemporal tracking)
28
+ - **Evaluative:** Themes, lessons, significance (global interpretation only)
29
+
30
+ ### 2. KNOWLEDGE-BASE INFERENCE
31
+ - **Superordinate goal:** Purpose, intent, future goals (teleological)
32
+ - **Causal-antecedent:** Prior causes, mechanisms (mechanistic)
33
+ - **State:** Emotions, traits, beliefs explaining behavior (mechanistic)
34
+ - **Referential:** Pronoun resolution, textual connections
35
+ - **Thematic:** Moral/lesson (evaluative)
36
+
37
+ ### 3. QAR LEVEL
38
+ **Text-Explicit:**
39
+ - Answer verbatim/near-verbatim in passage
40
+ - Grammatical link between question and answer
41
+ - Use exact passage wording
42
+
43
+ **Text-Implicit:**
44
+ - Combine adjacent passage details
45
+ - NO grammatical link
46
+ - Local coherence only
47
+ - Must use passage vocabulary (no synonyms/elevated terms)
48
+
49
+ **Script-Implicit:**
50
+ - Requires world knowledge + passage
51
+ - NO grammatical link
52
+ - Global coherence
53
+ - May use terms not in passage
54
+
55
+ ### 4. COHERENCE LEVEL
56
+ - **Local:** Adjacent sentences, working memory span
57
+ - **Global:** Distant text parts + world knowledge integration
58
+
59
+ **Mapping:** Text-Explicit/Implicit → Local | Script-Implicit → Global
60
+
61
+ ---
62
+
63
+ ## CRITICAL CONSTRAINTS
64
+
65
+ ### Vocabulary Matching (Text-Explicit/Implicit ONLY)
66
+ ✅ **MUST** use exact passage wording
67
+ ❌ **NEVER** replace with synonyms or higher-level terms
68
+
69
+ **Violations:**
70
+ - "thin air" → "high elevation" ❌
71
+ - "butterfly emerge" → "metamorphosis" ❌
72
+ - "land was scarce" → "limited land" ❌
73
+
74
+ ### Target Answer Rules
75
+ **DO NOT ADD:**
76
+ - Teleological additions not in text ("safely", "to be safe")
77
+ - Emotions not stated ("scared", "fearful")
78
+ - Purposes not indicated
79
+ - Higher-level vocabulary (for Text-Explicit/Implicit)
80
+
81
+ **Coherence Quality (Breadth + Simplicity):**
82
+ - **Breadth:** Target should connect/explain multiple story elements, not just one detail
83
+ - **Simplicity:** Target should require minimal additional assumptions beyond the passage
84
+ - Best answers integrate multiple pieces of evidence while remaining straightforward
85
+
86
+ ---
87
+
88
+ ## DISTRACTOR CONSTRUCTION
89
+
90
+ **Psychometric Ordering Requirement:**
91
+ Distractors must follow attractiveness hierarchy:
92
+ - **D1 (Partial Coherence):** Should attract mid-ability students who engage with text but miss full inference
93
+ - **D2 (Minimal Coherence):** Should attract low-ability students who rely on world knowledge without text integration
94
+ - D1 must be MORE plausible than D2 to create proper difficulty ordering
95
+
96
+ ### Distractor 1 (Partial Coherence)
97
+ **Pattern:** Text-based misconnection
98
+ - References details FROM passage
99
+ - Connects them incorrectly to question
100
+ - Shows partial text engagement
101
+ - Lacks full explanatory integration
102
+ - **Attractiveness:** Plausible enough to tempt students who read the passage but don't make full inference
103
+
104
+ ### Distractor 2 (Minimal Coherence)
105
+ **Pattern:** Over-reliance on world knowledge
106
+ - Based on question/general knowledge only
107
+ - Ignores passage content
108
+ - Plausible generally, not for this story
109
+ - Represents reading question without passage
110
+ - **Attractiveness:** Less plausible than D1; attracts students who don't engage with passage
111
+
112
+ ---
113
+
114
+ ## OUTPUT FORMAT
115
+
116
+ ```
117
+ Passage: [3-5 sentences]
118
+
119
+ Question: [Your question]
120
+
121
+ Target Answer: [Full coherence]
122
+
123
+ Distractor 1 (Partial Coherence): [Text-based misconnection]
124
+
125
+ Distractor 2 (Minimal Coherence): [World knowledge only]
126
+
127
+ ---
128
+ METADATA:
129
+ Event-Chain Relation: [Logical/Informational/Evaluative]
130
+ Knowledge-Base Inference: [Superordinate Goal/Causal-Antecedent/State/Referential/Thematic]
131
+ QAR Level: [Text-Explicit/Text-Implicit/Script-Implicit]
132
+ Coherence Level: [Local/Global]
133
+ Explanatory Stance: [Teleological/Mechanistic/N/A]
134
+ ---
135
+ ```
136
+
137
+ ---
138
+
139
+ ## KEY PRINCIPLES
140
+
141
+ 1. **Vocabulary matching mandatory** for Text-Explicit/Implicit (no synonyms/elevated terms)
142
+ 2. **Never add to story** (no unstated safety/emotions/purposes)
143
+ 3. **Clear distractor hierarchy** (D1=partial text, D2=world knowledge only)
144
+ 4. **Attractiveness ordering** (Target > D1 > D2 in plausibility for different ability levels)
145
+ 5. **Coherence quality** (Target shows breadth across story elements + simplicity in assumptions)
146
+ 6. **No redundancy** (distractors must be qualitatively different)
147
+ 7. **Plausible distractors** (wrong due to coherence, not impossibility)
148
+ 8. **QAR consistency** (question-answer-passage relationship must match chosen level)
149
+
150
+ ---
151
+
152
+ Generate items that provide diagnostic information about students' inferential reasoning and coherence evaluation processes.
requirements_hf.txt ADDED
@@ -0,0 +1,9 @@
 
 
 
 
 
 
 
 
 
 
1
+ anthropic>=0.40.0
2
+ gradio>=5.0.0
3
+ pandas>=2.2.0
4
+ python-dotenv>=1.0.0
5
+ joblib>=1.3.2
6
+ scikit-learn>=1.4.0
7
+ numpy>=1.26.0
8
+ torch>=2.0.0
9
+ transformers>=4.30.0