Spaces: fffffwl/swe-cefr-sp

fffffwl committed on Jan 18

Commit

0b8530c

0 Parent(s):

Initial HF Space for Swedish CEFR web app

Files changed (14)

.dockerignore +5 -0
.gitattributes +1 -0
Dockerfile +16 -0
README.md +16 -0
runs/metric-proto-k3/metric_proto.pt +3 -0
web_app/README.md +244 -0
web_app/STARTUP.md +73 -0
web_app/app.py +174 -0
web_app/debug_model.py +57 -0
web_app/model.py +297 -0
web_app/requirements.txt +5 -0
web_app/static/css/style.css +625 -0
web_app/static/js/app.js +273 -0
web_app/templates/index.html +144 -0

.dockerignore ADDED

	@@ -0,0 +1,5 @@

+__pycache__/
+*.pyc
+*.log
+*.tmp
+.venv/

.gitattributes ADDED

	@@ -0,0 +1 @@


+*.pt filter=lfs diff=lfs merge=lfs -text

Dockerfile ADDED

	@@ -0,0 +1,16 @@

+FROM python:3.10-slim
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+ENV PORT=7860
+WORKDIR /app
+COPY web_app/requirements.txt /app/web_app/requirements.txt
+RUN pip install --no-cache-dir -r /app/web_app/requirements.txt gunicorn
+COPY . /app
+WORKDIR /app/web_app
+EXPOSE 7860
+CMD ["gunicorn", "-w", "1", "-k", "gthread", "--threads", "4", "-b", "0.0.0.0:7860", "app:app"]
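The Dockerfile exports `PORT=7860` and binds gunicorn to that port explicitly, while the `__main__` block of `web_app/app.py` hard-codes port 5000. A minimal sketch of reading the port from the environment so that local runs and the container can share one setting; the helper name `resolve_port` is hypothetical and not part of this commit.

```python
import os


def resolve_port(default: int = 5000) -> int:
    """Return the port from the PORT environment variable, or `default`."""
    raw = os.environ.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

Under this assumption, `app.run(port=resolve_port())` would keep 5000 for local development and pick up 7860 wherever `PORT` is set.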

README.md ADDED

	@@ -0,0 +1,16 @@

+---
+title: Swedish CEFR Sentence Grader
+colorFrom: yellow
+colorTo: blue
+sdk: docker
+app_port: 7860
+pinned: false
+---
+# Swedish CEFR Sentence Grader
+Flask web app for sentence-level CEFR assessment in Swedish using a Metric Proto K3 model.
+- Base model: KB/bert-base-swedish-cased
+- Levels: A1-C2
+- Input: Swedish text, auto sentence splitting

runs/metric-proto-k3/metric_proto.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:30137ef1bc2c6def1b17e3e018edc704eda2ea4411840cfc45867d43a727b4fe
+size 498903733
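The three lines above are a Git LFS pointer, not the weights themselves; the real checkpoint is fetched by LFS at checkout. A small sketch of reading such a pointer, for example to confirm the expected size before loading; `parse_lfs_pointer` is an illustrative helper, not something this repository provides.

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:30137ef1bc2c6def1b17e3e018edc704eda2ea4411840cfc45867d43a727b4fe
size 498903733
"""
info = parse_lfs_pointer(pointer)
size_mb = int(info["size"]) / 1_000_000  # size is given in bytes
print(f"{info['oid']} ({size_mb:.1f} MB)")
```

If `metric_proto.pt` on disk is only a few hundred bytes, the pointer was checked out without the LFS object, and `torch.load` will fail on it.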

web_app/README.md ADDED

	@@ -0,0 +1,244 @@

+# CEFR Sentence-Level Assessment Web Application
+A Flask-based web interface for assessing Swedish text at the sentence level using a trained CEFR (Common European Framework of Reference for Languages) classification model.
+## Features
+- **Web Interface**: Clean, modern UI for easy text input and analysis
+- **Sentence Segmentation**: Automatically splits text into sentences
+- **CEFR Level Assessment**: Assigns proficiency levels (A1-C2) to each sentence
+- **Real-time Results**: Visual highlighting of CEFR levels in the text
+- **Statistics Dashboard**: Shows distribution of levels and confidence scores
+- **Detailed Table**: View all sentences with their levels and confidence
+## Model Information
+- **Architecture**: Metric Proto K3 (Prototype-based Classification)
+- **Base Model**: KB/bert-base-swedish-cased
+- **Prototypes**: 3 prototypes per CEFR level (K=3)
+- **Temperature**: 10.0
+- **Performance**: 84.1% macro F1, 87.3% accuracy, 94.5% QWK
+- **Device**: CUDA (if available) or CPU
+## Project Structure
+```
+web_app/
+├── app.py                      # Flask application
+├── model.py                    # Model loading and inference
+├── requirements.txt            # Python dependencies
+├── templates/
+│   └── index.html             # Main HTML template
+└── static/
+    ├── css/
+    │   └── style.css          # Styling
+    └── js/
+        └── app.js             # Frontend JavaScript
+```
+## Installation
+### Prerequisites
+- Python 3.8+
+- CUDA-compatible GPU (optional but recommended)
+- Linux/macOS
+### Setup
+1. **Ensure virtual environment is set up** (from project root):
+```bash
+cd /home/fwl/src/textmining
+# Virtual environment should already exist at .venv/
+```
+2. **Activate virtual environment**:
+```bash
+source .venv/bin/activate
+```
+3. **Install Flask** (if not already installed):
+```bash
+pip install flask flask-cors
+```
+4. **Navigate to web app directory**:
+```bash
+cd web_app
+```
+5. **Verify model weights exist**:
+```bash
+ls ../runs/metric-proto-k3/metric_proto.pt
+```
+## Running the Application
+### Development Server
+1. **Start the Flask application**:
+```bash
+# Make sure virtual environment is activated
+source /home/fwl/src/textmining/.venv/bin/activate
+# Run the app
+cd /home/fwl/src/textmining/web_app
+python -m flask run --host=0.0.0.0 --port=5000
+```
+2. **Access the web interface**:
+Open your browser and go to: http://localhost:5000
+### Production Deployment (Gunicorn)
+For production use, install Gunicorn and run:
+```bash
+pip install gunicorn
+gunicorn --bind 0.0.0.0:5000 app:app --workers 4
+```
+## Usage
+### Web Interface
+1. **Enter Swedish text** in the large text area
+2. **Click "Analyze Text"** button
+3. **View results**:
+   - Statistics overview (total sentences, average confidence, dominant level)
+   - CEFR level distribution bar chart
+   - Annotated text with color-coded levels
+   - Detailed table of all sentences
+### API Endpoints
+#### Assess Text
+```http
+POST /assess
+Content-Type: application/json
+{
+  "text": "Jag heter Anna."
+}
+```
+Response (the actual payload also carries a `cefr_styles` map of level colors and names, omitted here):
+```json
+{
+  "results": [
+    {
+      "sentence": "Jag heter Anna.",
+      "level": "A1",
+      "confidence": 0.85
+    }
+  ],
+  "stats": {
+    "total_sentences": 1,
+    "avg_confidence": 0.85,
+    "level_distribution": {"A1": 1},
+    "most_common_level": {"level": "A1", "count": 1, "percentage": 100}
+  }
+}
+```
+#### Batch Predict API
+```http
+POST /api/predict
+Content-Type: application/json
+{
+  "sentences": ["Sentence 1", "Sentence 2", ...]
+}
+```
+Response:
+```json
+{
+  "predictions": [
+    {
+      "sentence": "Sentence 1",
+      "level": "B1",
+      "confidence": 0.72
+    }
+  ],
+  "count": 1
+}
+```
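`api_predict()` in `app.py` rejects requests with more than 100 sentences, so a client with a longer list has to split it. A minimal sketch of that chunking step; `chunk_sentences` and `build_payloads` are hypothetical client-side helpers, not part of the app.

```python
from typing import Iterator, List

MAX_BATCH = 100  # limit enforced by api_predict() in app.py


def chunk_sentences(sentences: List[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` sentences."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(sentences), size):
        yield sentences[start:start + size]


def build_payloads(sentences: List[str]) -> List[dict]:
    """Build one JSON-serializable request body per chunk."""
    return [{"sentences": chunk} for chunk in chunk_sentences(sentences)]


payloads = build_payloads([f"Mening {i}." for i in range(250)])
print([len(p["sentences"]) for p in payloads])
```

Each payload can then be posted to `/api/predict` separately and the `predictions` lists concatenated in order.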
+## CEFR Level Reference
+| Level | Name | Description | Color |
+|-------|------|-------------|-------|
+| A1 | Beginner | Basic phrases and simple sentences | 🔴 Red |
+| A2 | Elementary | Simple direct exchanges of information | 🟠 Orange |
+| B1 | Intermediate | Simple connected text on familiar topics | 🟡 Yellow |
+| B2 | Upper Intermediate | Complex text, technical discussions | 🟢 Green |
+| C1 | Advanced | Flexible, effective, nuanced expression | 🔵 Blue |
+| C2 | Proficient | Precise, sophisticated, complex content | 🟣 Purple |
+## Troubleshooting
+### Model Loading Issues
+If the model fails to load:
+1. Check that model weights exist: `runs/metric-proto-k3/metric_proto.pt`
+2. Verify virtual environment is activated
+3. Check CUDA availability: `python -c "import torch; print(torch.cuda.is_available())"`
+### Out of Memory Errors
+If you encounter OOM errors:
+1. Reduce batch size in `model.py` (modify `predict_batch`)
+2. Use CPU instead of GPU: Set `device='cpu'` in CEFRModel initialization
+3. Process text in smaller chunks
+### Prediction Time
+First prediction may take longer due to model loading. Subsequent predictions are faster.
+## Model Details
+### Architecture
+The model uses a prototype-based approach:
+- Encodes sentences using Swedish BERT
+- Computes cosine similarity to learned prototypes
+- Each CEFR level has 3 prototypes (K=3)
+- Temperature scaling (T=10.0) sharpens predictions
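The scoring steps listed above can be sketched in plain Python, following what `forward()` and `predict_batch()` in `model.py` do: cosine similarity to each prototype, a mean over the K prototypes of a class, multiplication by the temperature, then a softmax. The 2-D vectors and the function names are made up for illustration; the real model works on 768-dimensional BERT embeddings with torch tensors.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def class_probabilities(embedding, prototypes, temperature: float = 10.0) -> List[float]:
    """prototypes[c][k] is the k-th prototype vector of class c."""
    logits = []
    for class_protos in prototypes:
        sims = [cosine(embedding, p) for p in class_protos]
        logits.append(temperature * sum(sims) / len(sims))  # mean over K, then scale
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: two classes, K=2 prototypes each (invented numbers)
protos = [
    [[1.0, 0.0], [0.9, 0.1]],  # class 0
    [[0.0, 1.0], [0.1, 0.9]],  # class 1
]
probs = class_probabilities([1.0, 0.05], protos)
flat = class_probabilities([1.0, 0.05], protos, temperature=1.0)
```

Because cosine similarities lie in [-1, 1], the unscaled logits differ by at most 2; multiplying by 10 widens the gaps, which is why the reported confidences tend to sit close to 1.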
+### Training Data
+The model was trained on Swedish CEFR-labeled sentences from:
+- SUC 3.0 corpus
+- COCTAILL corpus
+- Filtered for quality and length constraints
+### Performance Metrics
+- **Accuracy**: 87.3%
+- **Macro F1**: 84.1%
+- **Quadratic Weighted Kappa**: 94.5%
+## Development
+### Adding Features
+To add new features:
+1. Modify `app.py` for backend logic
+2. Update `templates/index.html` for UI
+3. Add styles to `static/css/style.css`
+4. Implement frontend logic in `static/js/app.js`
+### Frontend Structure
+- Vanilla JavaScript (no frameworks required)
+- Responsive design with CSS Grid and Flexbox
+- Modern UI with animations and transitions
+## License
+Same as parent project.
+## Citation
+If you use this web application in your research, please cite the original CEFR-SP paper and this implementation.

web_app/STARTUP.md ADDED

	@@ -0,0 +1,73 @@

+# CEFR Auto-Grader Web App - Quick Start Guide
+## Application Status
+✅ **RUNNING** - Fully functional
+## Quick Access
+- **Web Interface**: http://localhost:5000
+- **LAN Access**: http://192.168.1.11:5000
+## Starting the Application
+If the app is not running, start it from the project root:
+```bash
+cd /home/fwl/src/textmining
+source .venv/bin/activate
+python web_app/app.py
+```
+Or run in background:
+```bash
+nohup python web_app/app.py > web_app/flask.log 2>&1 &
+```
+## Model Information
+- **Architecture**: Metric Proto K3
+- **Base Model**: KB/bert-base-swedish-cased
+- **Device**: CUDA (GPU)
+- **Performance**: 84.1% macro F1, 87.3% accuracy
+## Testing Examples
+| Sentence | Predicted Level | Confidence |
+|----------|----------------|------------|
+| "Hej." | A1 | 98.9% |
+| "Jag heter Anna." | A1 | 98.9% |
+| "Jag studerar svenska." | A1 | 99.1% |
+| "Den komplexa algoritmen..." | B2 | 99.0% |
+| "Det metodologiska ramverket..." | C1 | 99.1% |
+## Features
+- 📝 Large text input area
+- 🔍 Automatic sentence segmentation
+- 🎨 Color-coded CEFR levels (A1-C2)
+- 📊 Statistics dashboard
+- 📈 Level distribution visualization
+- 📋 Detailed results table
+- ⚡ Real-time processing
+## Files
+- `app.py` - Flask application
+- `model.py` - Model loading & inference
+- `templates/index.html` - Web interface
+- `static/css/style.css` - Styling
+- `static/js/app.js` - Frontend logic
+## Troubleshooting
+If predictions are all the same level:
+1. Check model loaded: `grep "Loading model" web_app/flask.log`
+2. Verify model path: `ls runs/metric-proto-k3/metric_proto.pt`
+3. Restart from project root: `cd /home/fwl/src/textmining`
+## API Endpoint
+```bash
+curl -X POST http://localhost:5000/assess \
+  -H "Content-Type: application/json" \
+  -d '{"text": "Jag heter Anna."}'
+```
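The same call as the `curl` command above, sketched with only the Python standard library. The base URL assumes the development server on port 5000; `build_assess_request` and `assess` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: local development server


def build_assess_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /assess request without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/assess",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def assess(text: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_assess_request(text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_assess_request("Jag heter Anna.")
```

With the server running, `assess("Jag heter Anna.")["results"]` would hold one entry per detected sentence. Error responses (HTTP 400 or 500) surface as `urllib.error.HTTPError`.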

web_app/app.py ADDED

	@@ -0,0 +1,174 @@

+"""
+CEFR Sentence Level Assessment Web Application
+Flask-based web interface for assessing Swedish text at sentence level
+"""
+import os
+from pathlib import Path
+from flask import Flask, render_template, request, jsonify
+from model import CEFRModel, assess_text
+# Initialize Flask app
+app = Flask(__name__)
+app.config['SECRET_KEY'] = 'cefr-assessment-app'
+# Initialize model
+print("Loading CEFR assessment model...")
+model_path = os.environ.get('MODEL_PATH', 'runs/metric-proto-k3/metric_proto.pt')
+model = CEFRModel(model_path=model_path)
+print(f"Model loaded successfully! Using device: {model.device}")
+# CEFR level styles for HTML display
+CEFR_STYLES = {
+    'A1': {'color': '#E74C3C', 'name': 'A1 - Beginner'},
+    'A2': {'color': '#E67E22', 'name': 'A2 - Elementary'},
+    'B1': {'color': '#F39C12', 'name': 'B1 - Intermediate'},
+    'B2': {'color': '#27AE60', 'name': 'B2 - Upper Intermediate'},
+    'C1': {'color': '#3498DB', 'name': 'C1 - Advanced'},
+    'C2': {'color': '#9B59B6', 'name': 'C2 - Proficient'},
+}
+@app.route('/')
+def index():
+    """Home page with text input form"""
+    return render_template('index.html')
+@app.route('/assess', methods=['POST'])
+def assess():
+    """Assess text and return results"""
+    try:
+        # Get text from the JSON request body (fall back to {} if it is missing or invalid)
+        data = request.get_json(silent=True) or {}
+        text = data.get('text', '').strip()
+        if not text:
+            return jsonify({'error': 'Please enter some text to assess'}), 400
+        # Limit text length
+        if len(text) > 50000:  # ~50KB limit
+            return jsonify({'error': 'Text is too long. Please limit to 50,000 characters.'}), 400
+        # Assess text
+        results = assess_text(text, model)
+        if not results:
+            return jsonify({'error': 'No valid sentences found in the text'}), 400
+        # Prepare response
+        response = {
+            'results': results,
+            'cefr_styles': CEFR_STYLES,
+            'stats': compute_stats(results)
+        }
+        return jsonify(response)
+    except Exception as e:
+        print(f"Error in assessment: {str(e)}")
+        return jsonify({'error': f'An error occurred during assessment: {str(e)}'}), 500
+@app.route('/api/predict', methods=['POST'])
+def api_predict():
+    """API endpoint for batch predictions"""
+    try:
+        # Fall back to {} if the JSON body is missing or invalid
+        data = request.get_json(silent=True) or {}
+        sentences = data.get('sentences', [])
+        if not sentences:
+            return jsonify({'error': 'No sentences provided'}), 400
+        if not isinstance(sentences, list):
+            return jsonify({'error': 'Sentences must be a list'}), 400
+        # Limit batch size
+        if len(sentences) > 100:
+            return jsonify({'error': 'Batch size limited to 100 sentences'}), 400
+        # Predict
+        predictions = model.predict_batch(sentences)
+        # Format response
+        results = []
+        for sent, (level, confidence) in zip(sentences, predictions):
+            results.append({
+                'sentence': sent,
+                'level': level,
+                'confidence': confidence
+            })
+        return jsonify({
+            'predictions': results,
+            'count': len(results)
+        })
+    except Exception as e:
+        print(f"Error in API prediction: {str(e)}")
+        return jsonify({'error': str(e)}), 500
+def compute_stats(results: list) -> dict:
+    """Compute statistics about the assessment results"""
+    if not results:
+        return {}
+    # Count levels
+    level_counts = {}
+    for item in results:
+        level = item['level']
+        level_counts[level] = level_counts.get(level, 0) + 1
+    # Average confidence
+    avg_confidence = sum(item['confidence'] for item in results) / len(results)
+    # Most common level
+    if level_counts:
+        most_common = max(level_counts, key=level_counts.get)
+        most_common_count = level_counts[most_common]
+        most_common_pct = (most_common_count / len(results)) * 100
+    else:
+        most_common = None
+        most_common_count = 0
+        most_common_pct = 0
+    return {
+        'total_sentences': len(results),
+        'level_distribution': level_counts,
+        'avg_confidence': avg_confidence,
+        'most_common_level': {
+            'level': most_common,
+            'count': most_common_count,
+            'percentage': round(most_common_pct, 1)
+        }
+    }
+@app.context_processor
+def utility_processor():
+    """Utility functions for Jinja templates"""
+    return dict(
+        round=round,
+        len=len
+    )
+if __name__ == '__main__':
+    # Create uploads directory
+    os.makedirs('uploads', exist_ok=True)
+    print("Starting CEFR Assessment Web App...")
+    print(f"\nModel path: {model_path}")
+    print(f"Model device: {model.device}")
+    print("\nStarting Flask server...")
+    # Run app
+    app.run(
+        debug=os.environ.get('FLASK_DEBUG') == '1',  # keep the debugger off unless explicitly enabled
+        host='0.0.0.0',
+        port=5000,
+        threaded=True
+    )
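For comparison, the aggregation done by `compute_stats()` can be written more compactly with `collections.Counter`. This is an alternative sketch under the same input shape (a list of dicts with `level` and `confidence`), not the code in this commit; `summarize` is a hypothetical name and the sample values are invented.

```python
from collections import Counter
from typing import Dict, List


def summarize(results: List[Dict]) -> Dict:
    """Summarize level counts and mean confidence for assessed sentences."""
    if not results:
        return {}
    counts = Counter(item["level"] for item in results)
    level, count = counts.most_common(1)[0]
    return {
        "total_sentences": len(results),
        "level_distribution": dict(counts),
        "avg_confidence": sum(r["confidence"] for r in results) / len(results),
        "most_common_level": {
            "level": level,
            "count": count,
            "percentage": round(100 * count / len(results), 1),
        },
    }


sample = [
    {"sentence": "a", "level": "A1", "confidence": 0.9},
    {"sentence": "b", "level": "A1", "confidence": 0.7},
    {"sentence": "c", "level": "B1", "confidence": 0.8},
]
stats = summarize(sample)
```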

web_app/debug_model.py ADDED

	@@ -0,0 +1,57 @@

+#!/usr/bin/env python
+import sys
+sys.path.append('/home/fwl/src/textmining')
+from web_app.model import CEFRModel
+import torch
+print("Loading model...")
+model = CEFRModel(model_path='runs/metric-proto-k3/metric_proto.pt')
+# Test simple sentence
+sentences = ["Jag är bra."]
+print(f"\nTesting: {sentences}")
+# Tokenize
+encoded = model.tokenize(sentences)
+input_ids = encoded["input_ids"].to(model.device)
+attention_mask = encoded["attention_mask"].to(model.device)
+print(f"Input shape: {input_ids.shape}")
+print(f"Device: {model.device}")
+# Predict
+with torch.no_grad():
+    logits = model.model(input_ids, attention_mask)["logits"]
+    print(f"Logits shape: {logits.shape}")
+    print(f"Logits: {logits}")
+    probs = torch.softmax(logits, dim=1)
+    print(f"Probs shape: {probs.shape}")
+    print(f"Probs: {probs}")
+    predictions = torch.argmax(logits, dim=1)
+    print(f"Predictions: {predictions}")
+    # Test different ways to extract confidence
+    cpu_probs = probs.cpu()
+    for i, pred in enumerate(predictions.cpu().numpy()):
+        print(f"\nSentence {i}: '{sentences[i]}'")
+        print(f"  Predicted class: {pred}")
+        print(f"  Predicted level: {model.id_to_label[pred]}")
+        print(f"  Method 1 - probs[i][pred]: {probs[i][pred].item()}")
+        print(f"  Method 2 - cpu_probs[i][pred]: {cpu_probs[i][pred].item()}")
+        print(f"  Method 3 - float(cpu_probs[i][pred].item()): {float(cpu_probs[i][pred].item())}")
+# Test using predict_batch
+print("\n" + "="*60)
+print("Using predict_batch method:")
+results = model.predict_batch(sentences)
+for sent, (level, conf) in zip(sentences, results):
+    print(f"  {level} ({conf*100:.1f}%): {sent}")
+# Test using predict_sentence
+print("\n" + "="*60)
+print("Using predict_sentence method:")
+level, conf = model.predict_sentence(sentences[0])
+print(f"  {level} ({conf*100:.1f}%): {sentences[0]}")

web_app/model.py ADDED

	@@ -0,0 +1,297 @@

+"""
+CEFR Sentence Level Assessment Model
+Loads and runs inference with the metric proto k3 model
+"""
+import re
+from pathlib import Path
+from typing import List, Tuple, Dict
+import torch
+from transformers import AutoTokenizer, AutoModel
+class PrototypeClassifier(torch.nn.Module):
+    """Metric-based prototype classifier for CEFR level assessment"""
+    def __init__(
+        self,
+        encoder,
+        num_labels: int,
+        hidden_size: int,
+        prototypes_per_class: int,
+        temperature: float = 10.0,
+        layer_index: int = -2,
+    ):
+        super().__init__()
+        self.encoder = encoder
+        self.num_labels = num_labels
+        self.prototypes_per_class = prototypes_per_class
+        self.temperature = temperature
+        self.layer_index = layer_index
+        self.prototypes = torch.nn.Parameter(
+            torch.empty(num_labels, prototypes_per_class, hidden_size)
+        )
+    def set_prototypes(self, proto_tensor: torch.Tensor) -> None:
+        """Set prototype weights"""
+        with torch.no_grad():
+            self.prototypes.copy_(proto_tensor)
+    def encode(self, input_ids, attention_mask, token_type_ids=None) -> torch.Tensor:
+        """Encode input sentences to normalized embeddings"""
+        outputs = self.encoder(
+            input_ids=input_ids,
+            attention_mask=attention_mask,
+            token_type_ids=token_type_ids,
+            output_hidden_states=True,
+        )
+        hidden = outputs.hidden_states[self.layer_index]
+        # mean pooling
+        mask = attention_mask.unsqueeze(-1).float()
+        summed = torch.sum(hidden * mask, dim=1)
+        counts = torch.clamp(mask.sum(dim=1), min=1e-9)
+        pooled = summed / counts
+        pooled = torch.nn.functional.normalize(pooled, p=2, dim=1)
+        return pooled
+    def forward(self, input_ids, attention_mask, token_type_ids=None):
+        """Forward pass returning logits"""
+        x = self.encode(input_ids, attention_mask, token_type_ids)
+        # cosine similarity with prototypes, average over K for each class
+        protos = torch.nn.functional.normalize(self.prototypes, p=2, dim=-1)
+        # [B, H] x [C,K,H] -> [B,C,K]
+        sim = torch.einsum("bh,ckh->bck", x, protos)
+        sim_mean = sim.mean(dim=2)  # average over K
+        logits = sim_mean * self.temperature
+        return {"logits": logits}
+    def predict(self, input_ids, attention_mask, token_type_ids=None) -> torch.Tensor:
+        """Predict CEFR levels"""
+        outputs = self.forward(input_ids, attention_mask, token_type_ids)
+        return torch.argmax(outputs["logits"], dim=1)
+class CEFRModel:
+    """Wrapper class for CEFR assessment model"""
+    def __init__(self, model_path: str = None, device: str = None):
+        """
+        Initialize the CEFR assessment model
+        Args:
+            model_path: Path to the trained model checkpoint
+            device: Device to run inference on ('cuda' or 'cpu')
+        """
+        if device is None:
+            self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+        else:
+            self.device = torch.device(device)
+        # CEFR level mapping
+        self.id_to_label = {0: "A1", 1: "A2", 2: "B1", 3: "B2", 4: "C1", 5: "C2"}
+        self.label_to_id = {v: k for k, v in self.id_to_label.items()}
+        # Model parameters
+        self.model_name = "KB/bert-base-swedish-cased"
+        self.hidden_size = 768
+        self.num_labels = 6
+        self.prototypes_per_class = 3
+        self.temperature = 10.0
+        # Load tokenizer
+        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
+        # Load model
+        encoder = AutoModel.from_pretrained(self.model_name)
+        self.model = PrototypeClassifier(
+            encoder=encoder,
+            num_labels=self.num_labels,
+            hidden_size=self.hidden_size,
+            prototypes_per_class=self.prototypes_per_class,
+            temperature=self.temperature,
+        )
+        # Load trained weights
+        if model_path is None:
+            # Try to find the model automatically
+            default_paths = [
+                "runs/metric-proto-k3/metric_proto.pt",
+                "runs/metric-proto/metric_proto.pt",
+                "runs/bert-baseline/bert_baseline.pt",
+                "../runs/metric-proto-k3/metric_proto.pt",  # Relative to web_app/
+            ]
+            for path in default_paths:
+                if Path(path).exists():
+                    model_path = path
+                    print(f"Auto-detected model: {model_path}")
+                    break
+        if model_path:
+            # Try different relative paths
+            possible_paths = [
+                Path(model_path),
+                Path(__file__).parent / model_path,
+                Path(__file__).parent.parent / model_path,
+            ]
+            checkpoint = None
+            for path in possible_paths:
+                if path.exists():
+                    print(f"Loading model from {path}")
+                    checkpoint = torch.load(path, map_location=self.device, weights_only=False)
+                    break
+            if checkpoint is None:
+                print(f"Warning: Model file not found at {model_path}")
+                print("Model will be initialized with random weights!")
+        else:
+            print("Warning: No model path specified. Model will be initialized with random weights!")
+            checkpoint = None
+        if checkpoint is not None:
+            # Load model state dict
+            if "state_dict" in checkpoint:
+                state_dict = checkpoint["state_dict"]
+                # Strip a leading 'model.' prefix from parameter names, if present
+                new_state_dict = {}
+                for key, value in state_dict.items():
+                    if key.startswith("model."):
+                        new_key = key[6:]  # Remove 'model.' prefix
+                    else:
+                        new_key = key
+                    new_state_dict[new_key] = value
+                self.model.load_state_dict(new_state_dict, strict=False)
+            else:
+                self.model.load_state_dict(checkpoint)
+            # Load prototypes if available
+            if "prototypes" in checkpoint:
+                self.model.set_prototypes(checkpoint["prototypes"].to(self.device))
+        self.model.to(self.device)
+        self.model.eval()
+    def tokenize(self, texts: List[str], max_length: int = 128) -> Dict[str, torch.Tensor]:
+        """Tokenize input texts"""
+        encoded = self.tokenizer(
+            texts,
+            truncation=True,
+            padding=True,
+            max_length=max_length,
+            return_tensors="pt",
+        )
+        return encoded
+    def predict_batch(self, sentences: List[str]) -> List[Tuple[str, float]]:
+        """
+        Predict CEFR levels for a batch of sentences
+        Args:
+            sentences: List of sentences to assess
+        Returns:
+            List of (level, confidence) tuples
+        """
+        if not sentences:
+            return []
+        # Tokenize
+        encoded = self.tokenize(sentences)
+        input_ids = encoded["input_ids"].to(self.device)
+        attention_mask = encoded["attention_mask"].to(self.device)
+        # Predict
+        with torch.no_grad():
+            logits = self.model(input_ids, attention_mask)["logits"]
+            probs = torch.softmax(logits, dim=1)
+            predictions = torch.argmax(logits, dim=1)
+        # Format results
+        results = []
+        cpu_probs = probs.cpu()
+        for i, pred in enumerate(predictions.cpu().numpy()):
+            level = self.id_to_label[pred]
+            confidence = float(cpu_probs[i][pred].item())
+            # Handle NaN values
+            if torch.isnan(cpu_probs[i][pred]):
+                confidence = 1.0 / self.num_labels
+            results.append((level, confidence))
+        return results
+    def predict_sentence(self, sentence: str) -> Tuple[str, float]:
+        """Predict CEFR level for a single sentence"""
+        results = self.predict_batch([sentence])
+        return results[0]
+def split_into_sentences(text: str) -> List[str]:
+    """
+    Split text into sentences
+    Args:
+        text: Input text (Swedish)
+    Returns:
+        List of sentences
+    """
+    # Simple sentence splitting on Swedish sentence-ending punctuation: . ! ?
+    # Split wherever a punctuation mark is followed by whitespace; the capturing
+    # group keeps each mark as its own list item, so the result always has odd length
+    sentences = re.split(r'([.!?])\s+', text)
+    # Re-attach each punctuation mark to the sentence that precedes it
+    combined = []
+    for i in range(0, len(sentences) - 1, 2):
+        combined.append(sentences[i] + sentences[i + 1])
+    # The final item is the text after the last split; it keeps any closing
+    # punctuation that was not followed by whitespace
+    if sentences[-1].strip():
+        combined.append(sentences[-1])
+    # Clean up sentences
+    cleaned = []
+    for sent in combined:
+        sent = sent.strip()
+        if sent:
+            cleaned.append(sent)
+    return cleaned
+def assess_text(text: str, model: CEFRModel) -> List[Dict[str, object]]:
+    """
+    Assess a text and return sentence-level CEFR annotations
+    Args:
+        text: Input text (Swedish)
+        model: CEFR assessment model
+    Returns:
+        List of dictionaries with sentence and level information
+    """
+    # Split text into sentences
+    sentences = split_into_sentences(text)
+    if not sentences:
+        return []
+    # Predict CEFR levels
+    predictions = model.predict_batch(sentences)
+    # Format results
+    results = []
+    for sent, (level, confidence) in zip(sentences, predictions):
+        results.append({
+            "sentence": sent,
+            "level": level,
+            "confidence": confidence,
+        })
+    return results
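A condensed, standalone version of the splitting logic in `split_into_sentences()` makes its behavior easy to check. The regex is the one from `model.py`; the function name `split_sentences` and the sample inputs are illustrative.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split on . ! ? followed by whitespace, keeping the punctuation."""
    parts = re.split(r"([.!?])\s+", text)
    # parts alternates sentence text and punctuation, ending with a trailing segment
    sentences = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
    sentences.append(parts[-1])
    return [s.strip() for s in sentences if s.strip()]


print(split_sentences("Hej! Jag heter Anna. Hur mår du?"))
print(split_sentences("Jag bor i t.ex. Stockholm."))
```

The second call shows a limitation of the rule: the abbreviation "t.ex." ends in a period followed by a space, so the sentence is cut in two. Swedish abbreviations such as "bl.a." or "t.ex." would need a dedicated tokenizer or an exception list.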

web_app/requirements.txt ADDED

	@@ -0,0 +1,5 @@

+torch>=1.13.0
+transformers>=4.0.0
+flask>=2.0.0
+flask-cors>=3.0.0
+numpy>=1.21.0

web_app/static/css/style.css ADDED

	@@ -0,0 +1,625 @@

+/* CSS Reset and Base Styles */
+* {
+    margin: 0;
+    padding: 0;
+    box-sizing: border-box;
+}
+:root {
+    /* Color Palette - Modern Deep Blues */
+    --primary-color: #1A3A6C;
+    --primary-dark: #0D2147;
+    --primary-light: #2C5282;
+    --accent-color: #2B89E0;
+    --success-color: #27AE60;
+    --warning-color: #F39C12;
+    --error-color: #E74C3C;
+    /* CEFR Level Colors */
+    --a1-color: #E74C3C;
+    --a2-color: #E67E22;
+    --b1-color: #F39C12;
+    --b2-color: #27AE60;
+    --c1-color: #3498DB;
+    --c2-color: #9B59B6;
+    /* Neutral Colors */
+    --bg-color: #F8FAFC;
+    --card-bg: #FFFFFF;
+    --text-primary: #1E293B;
+    --text-secondary: #64748B;
+    --border-color: #E2E8F0;
+    --shadow: 0 1px 3px 0 rgba(0, 0, 0, 0.1);
+    --shadow-lg: 0 10px 15px -3px rgba(0, 0, 0, 0.1);
+    /* Typography */
+    --font-family: 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
+    --font-size-sm: 0.875rem;
+    --font-size-base: 1rem;
+    --font-size-lg: 1.125rem;
+    --font-size-xl: 1.25rem;
+    --font-size-2xl: 1.5rem;
+    /* Spacing */
+    --spacing-xs: 0.5rem;
+    --spacing-sm: 0.75rem;
+    --spacing-md: 1rem;
+    --spacing-lg: 1.5rem;
+    --spacing-xl: 2rem;
+    --spacing-2xl: 3rem;
+    /* Border Radius */
+    --radius-sm: 4px;
+    --radius-md: 8px;
+    --radius-lg: 12px;
+}
+/* Base Styles */
+body {
+    font-family: var(--font-family);
+    background-color: var(--bg-color);
+    color: var(--text-primary);
+    line-height: 1.6;
+}
+.container {
+    max-width: 1200px;
+    margin: 0 auto;
+    padding: var(--spacing-md);
+}
+/* Header */
+.header {
+    padding: var(--spacing-lg) 0;
+    margin-bottom: var(--spacing-lg);
+    display: flex;
+    justify-content: space-between;
+    align-items: center;
+    border-bottom: 1px solid var(--border-color);
+}
+.logo h1 {
+    font-size: var(--font-size-xl);
+    font-weight: 600;
+    color: var(--text-primary);
+}
+.github-link a {
+    color: var(--primary-color);
+    text-decoration: none;
+    font-weight: 500;
+    font-size: var(--font-size-base);
+}
+.github-link a:hover {
+    text-decoration: underline;
+}
+/* Cards */
+.card {
+    background: var(--card-bg);
+    border-radius: var(--radius-lg);
+    padding: var(--spacing-xl);
+    box-shadow: var(--shadow);
+    margin-bottom: var(--spacing-xl);
+    border: 1px solid var(--border-color);
+}
+/* Section Headers */
+h2, h3 {
+    color: var(--text-primary);
+    margin-bottom: var(--spacing-md);
+}
+h2 {
+    font-size: var(--font-size-xl);
+    font-weight: 600;
+}
+h3 {
+    font-size: var(--font-size-lg);
+    font-weight: 600;
+}
+.section-description {
+    color: var(--text-secondary);
+    margin-bottom: var(--spacing-lg);
+    font-size: var(--font-size-base);
+}
+/* Forms */
+.form-group {
+    margin-bottom: var(--spacing-lg);
+}
+.form-label {
+    display: block;
+    margin-bottom: var(--spacing-sm);
+    font-weight: 500;
+    color: var(--text-primary);
+    font-size: var(--font-size-base);
+}
+.text-input {
+    width: 100%;
+    padding: var(--spacing-md);
+    border: 2px solid var(--border-color);
+    border-radius: var(--radius-md);
+    font-size: var(--font-size-base);
+    font-family: inherit;
+    resize: vertical;
+    min-height: 200px;
+    transition: border-color 0.2s ease;
+}
+.text-input:focus {
+    outline: none;
+    border-color: var(--accent-color);
+    box-shadow: 0 0 0 3px rgba(43, 137, 224, 0.1);
+}
+.input-hint {
+    margin-top: var(--spacing-sm);
+    font-size: var(--font-size-sm);
+    color: var(--text-secondary);
+}
+/* Buttons */
+.button-group {
+    display: flex;
+    gap: var(--spacing-md);
+    flex-wrap: wrap;
+}
+.btn {
+    padding: var(--spacing-sm) var(--spacing-lg);
+    border: none;
+    border-radius: var(--radius-md);
+    font-size: var(--font-size-base);
+    font-weight: 500;
+    cursor: pointer;
+    display: inline-flex;
+    align-items: center;
+    gap: var(--spacing-sm);
+    transition: all 0.2s ease;
+    position: relative;
+}
+.btn-primary {
+    background: linear-gradient(135deg, var(--accent-color) 0%, var(--primary-light) 100%);
+    color: white;
+}
+.btn-primary:hover {
+    transform: translateY(-1px);
+    box-shadow: 0 4px 12px rgba(43, 137, 224, 0.3);
+}
+.btn-primary:active {
+    transform: translateY(0);
+}
+.btn-primary:disabled {
+    opacity: 0.6;
+    cursor: not-allowed;
+    transform: none;
+}
+.btn-secondary {
+    background: var(--card-bg);
+    color: var(--text-primary);
+    border: 1px solid var(--border-color);
+}
+.btn-secondary:hover {
+    background: #F1F5F9;
+    border-color: #CBD5E1;
+}
+.btn-small {
+    padding: var(--spacing-xs) var(--spacing-sm);
+    font-size: var(--font-size-sm);
+}
+.btn-outline {
+    background: transparent;
+    border: 1px solid var(--primary-light);
+    color: var(--primary-light);
+}
+.btn-outline:hover {
+    background: rgba(44, 82, 130, 0.05);
+}
+/* Button Loader */
+.btn-loader {
+    display: none;
+    width: 16px;
+    height: 16px;
+    border: 2px solid #ffffff;
+    border-radius: 50%;
+    border-top-color: transparent;
+    animation: spin 0.8s linear infinite;
+}
+.btn-loader.active {
+    display: block;
+}
+@keyframes spin {
+    to { transform: rotate(360deg); }
+}
+/* Compact Stats */
+.compact-stats {
+    display: flex;
+    gap: var(--spacing-lg);
+    margin: var(--spacing-md) 0;
+    padding: 0 var(--spacing-lg);
+    font-size: var(--font-size-sm);
+    color: var(--text-secondary);
+}
+.stat-item {
+    display: flex;
+    align-items: baseline;
+    gap: var(--spacing-xs);
+}
+.stat-value {
+    font-weight: 600;
+    color: var(--text-primary);
+}
+.stat-name {
+    font-weight: 400;
+}
+/* Distribution Bars */
+.distribution-container h3 {
+    margin-bottom: var(--spacing-md);
+    font-size: var(--font-size-lg);
+    font-weight: 600;
+}
+.distribution-bars {
+    display: flex;
+    flex-direction: column;
+    gap: var(--spacing-sm);
+}
+.distribution-bar {
+    display: flex;
+    align-items: center;
+    gap: var(--spacing-md);
+    font-size: var(--font-size-sm);
+}
+.distribution-label {
+    width: 40px;
+    font-weight: 500;
+}
+.distribution-track {
+    flex: 1;
+    height: 20px;
+    background: var(--border-color);
+    border-radius: var(--radius-sm);
+    overflow: hidden;
+    position: relative;
+}
+.distribution-fill {
+    height: 100%;
+    border-radius: var(--radius-sm);
+    display: flex;
+    align-items: center;
+    justify-content: flex-end;
+    padding-right: var(--spacing-sm);
+    color: white;
+    font-size: var(--font-size-sm);
+    font-weight: 500;
+    transition: width 0.5s ease;
+}
+.distribution-count {
+    width: 30px;
+    text-align: right;
+    color: var(--text-secondary);
+}
+/* Annotated Text */
+.container-header {
+    display: flex;
+    justify-content: space-between;
+    align-items: center;
+    margin-bottom: var(--spacing-md);
+}
+.main-result {
+    padding: var(--spacing-xl);
+}
+.annotated-text {
+    line-height: 2.5; /* Generous line height for the underlines */
+    font-size: var(--font-size-lg);
+    font-family: var(--font-family);
+    white-space: pre-wrap;
+    overflow-wrap: break-word;
+}
+.annotation {
+    display: inline;
+    padding-bottom: 2px;
+    border-bottom-width: 3px;
+    border-bottom-style: solid;
+    background-color: transparent; /* Override any potential background utilities */
+    box-decoration-break: clone;
+    -webkit-box-decoration-break: clone;
+    transition: background-color 0.2s ease, border-color 0.2s ease;
+    cursor: help;
+}
+.annotation:hover {
+    background-color: rgba(0, 0, 0, 0.03); /* Very subtle hover effect */
+}
+/* Specific border colors for annotations - Overriding the general background utility classes */
+.annotated-text .annotation.level-a1 { border-bottom-color: var(--a1-color); background-color: transparent; }
+.annotated-text .annotation.level-a2 { border-bottom-color: var(--a2-color); background-color: transparent; }
+.annotated-text .annotation.level-b1 { border-bottom-color: var(--b1-color); background-color: transparent; }
+.annotated-text .annotation.level-b2 { border-bottom-color: var(--b2-color); background-color: transparent; }
+.annotated-text .annotation.level-c1 { border-bottom-color: var(--c1-color); background-color: transparent; }
+.annotated-text .annotation.level-c2 { border-bottom-color: var(--c2-color); background-color: transparent; }
+/* Fallback outside .annotated-text: these compound selectors outrank the single-class .level-* utilities */
+.annotation.level-a1, .annotation.level-a2, .annotation.level-b1,
+.annotation.level-b2, .annotation.level-c1, .annotation.level-c2 {
+    background-color: transparent;
+}
+.annotation.level-a1:hover { background-color: rgba(231, 76, 60, 0.1); }
+.annotation.level-a2:hover { background-color: rgba(230, 126, 34, 0.1); }
+.annotation.level-b1:hover { background-color: rgba(243, 156, 18, 0.1); }
+.annotation.level-b2:hover { background-color: rgba(39, 174, 96, 0.1); }
+.annotation.level-c1:hover { background-color: rgba(52, 152, 219, 0.1); }
+.annotation.level-c2:hover { background-color: rgba(155, 89, 182, 0.1); }
+.annotation-hidden {
+    border-bottom-color: transparent !important;
+}
+.cefr-badge {
+    /* Deprecated but kept to prevent errors if stale JS runs */
+    display: none;
+}
+/* Sentence Table */
+.table-wrapper {
+    overflow-x: auto;
+    border-radius: var(--radius-md);
+    border: 1px solid var(--border-color);
+}
+.sentence-table {
+    width: 100%;
+    border-collapse: collapse;
+    font-size: var(--font-size-base);
+}
+.sentence-table th {
+    background: #F1F5F9;
+    padding: var(--spacing-md);
+    text-align: left;
+    font-weight: 600;
+    color: var(--text-primary);
+    border-bottom: 2px solid var(--border-color);
+}
+.sentence-table td {
+    padding: var(--spacing-md);
+    border-bottom: 1px solid var(--border-color);
+}
+.sentence-table tbody tr:last-child td {
+    border-bottom: none;
+}
+.sentence-table tbody tr:nth-child(even) {
+    background: #F8FAFC;
+}
+.sentence-table tbody tr:hover {
+    background: #E2E8F0;
+}
+.sentence-text {
+    max-width: 600px;
+    overflow-wrap: break-word;
+}
+.level-cell {
+    display: flex;
+    align-items: center;
+    gap: var(--spacing-sm);
+}
+.level-indicator {
+    width: 12px;
+    height: 12px;
+    border-radius: 50%;
+    flex-shrink: 0;
+}
+.confidence-bar {
+    display: inline-block;
+    width: 60px;
+    height: 6px;
+    background: var(--border-color);
+    border-radius: 3px;
+    position: relative;
+    margin-left: var(--spacing-sm);
+}
+.confidence-fill {
+    position: absolute;
+    left: 0;
+    top: 0;
+    height: 100%;
+    background: var(--accent-color);
+    border-radius: 3px;
+}
+/* Modal */
+.modal {
+    position: fixed;
+    top: 0;
+    left: 0;
+    right: 0;
+    bottom: 0;
+    background: rgba(0, 0, 0, 0.5);
+    z-index: 1000;
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    padding: var(--spacing-md);
+}
+.modal-content {
+    background: white;
+    border-radius: var(--radius-lg);
+    max-width: 500px;
+    width: 100%;
+    box-shadow: var(--shadow-lg);
+    overflow: hidden;
+}
+.modal-header {
+    padding: var(--spacing-lg);
+    background: var(--primary-color);
+    color: white;
+    display: flex;
+    justify-content: space-between;
+    align-items: center;
+}
+.modal-header h3 {
+    margin: 0;
+}
+.modal-close {
+    background: none;
+    border: none;
+    color: white;
+    font-size: 1.5rem;
+    cursor: pointer;
+    width: 32px;
+    height: 32px;
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    border-radius: 50%;
+    transition: background 0.2s ease;
+}
+.modal-close:hover {
+    background: rgba(255, 255, 255, 0.1);
+}
+.modal-body {
+    padding: var(--spacing-lg);
+    color: var(--text-primary);
+}
+.modal-footer {
+    padding: var(--spacing-lg);
+    border-top: 1px solid var(--border-color);
+    display: flex;
+    justify-content: flex-end;
+}
+/* Footer */
+.footer {
+    text-align: center;
+    padding: var(--spacing-lg);
+    color: var(--text-secondary);
+    font-size: var(--font-size-sm);
+    margin-top: var(--spacing-xl);
+}
+/* Responsive Design */
+@media (max-width: 768px) {
+    .container {
+        padding: var(--spacing-sm);
+    }
+    .header {
+        flex-direction: column;
+        gap: var(--spacing-md);
+        text-align: center;
+        padding: var(--spacing-lg);
+    }
+    .stats-grid {
+        grid-template-columns: 1fr;
+    }
+    .button-group {
+        flex-direction: column;
+    }
+    .btn {
+        width: 100%;
+        justify-content: center;
+    }
+    .container-header {
+        flex-direction: column;
+        gap: var(--spacing-md);
+        align-items: flex-start;
+    }
+    .sentence-table {
+        font-size: var(--font-size-sm);
+    }
+    .sentence-table th,
+    .sentence-table td {
+        padding: var(--spacing-sm);
+    }
+}
+/* Animations */
+@keyframes fadeIn {
+    from {
+        opacity: 0;
+        transform: translateY(20px);
+    }
+    to {
+        opacity: 1;
+        transform: translateY(0);
+    }
+}
+.card {
+    animation: fadeIn 0.4s ease forwards;
+}
+/* CEFR Level Colors */
+.level-a1 { background-color: var(--a1-color); }
+.level-a2 { background-color: var(--a2-color); }
+.level-b1 { background-color: var(--b1-color); }
+.level-b2 { background-color: var(--b2-color); }
+.level-c1 { background-color: var(--c1-color); }
+.level-c2 { background-color: var(--c2-color); }
+/* Utility Classes */
+.text-center { text-align: center; }
+.text-left { text-align: left; }
+.text-right { text-align: right; }
+.mt-sm { margin-top: var(--spacing-sm); }
+.mt-md { margin-top: var(--spacing-md); }
+.mt-lg { margin-top: var(--spacing-lg); }
+.mb-sm { margin-bottom: var(--spacing-sm); }
+.mb-md { margin-bottom: var(--spacing-md); }
+.mb-lg { margin-bottom: var(--spacing-lg); }

web_app/static/js/app.js ADDED

	@@ -0,0 +1,273 @@

+// CEFR Assessment Web App JavaScript
+class CEFRApp {
+    constructor() {
+        this.elements = {
+            form: document.getElementById('assessment-form'),
+            textInput: document.getElementById('text-input'),
+            charCount: document.getElementById('char-count'),
+            assessBtn: document.getElementById('assess-btn'),
+            btnLoader: document.getElementById('btn-loader'),
+            btnText: document.querySelector('#assess-btn .btn-text'),
+            clearBtn: document.getElementById('clear-btn'),
+            resultsSection: document.getElementById('results-section'),
+            totalSentences: document.getElementById('total-sentences'),
+            avgConfidence: document.getElementById('avg-confidence'),
+            dominantLevel: document.getElementById('dominant-level'),
+            distributionBars: document.getElementById('distribution-bars'),
+            annotatedText: document.getElementById('annotated-text'),
+            sentenceTbody: document.getElementById('sentence-tbody'),
+            toggleHighlight: document.getElementById('toggle-highlight'),
+            errorModal: document.getElementById('error-modal'),
+            errorMessage: document.getElementById('error-message'),
+        };
+        this.cefrStyles = {
+            'A1': { color: '#E74C3C', name: 'A1 - Beginner' },
+            'A2': { color: '#E67E22', name: 'A2 - Elementary' },
+            'B1': { color: '#F39C12', name: 'B1 - Intermediate' },
+            'B2': { color: '#27AE60', name: 'B2 - Upper Intermediate' },
+            'C1': { color: '#3498DB', name: 'C1 - Advanced' },
+            'C2': { color: '#9B59B6', name: 'C2 - Proficient' },
+        };
+        this.showHighlights = true;
+        this.init();
+    }
+    init() {
+        // Event Listeners
+        this.elements.form.addEventListener('submit', (e) => this.handleSubmit(e));
+        this.elements.clearBtn.addEventListener('click', () => this.clearText());
+        this.elements.textInput.addEventListener('input', () => this.updateCharCount());
+        this.elements.toggleHighlight.addEventListener('click', () => this.toggleHighlighting());
+        // Modal close events
+        document.querySelectorAll('.modal-close').forEach(btn => {
+            btn.addEventListener('click', () => this.hideError());
+        });
+        this.elements.errorModal.addEventListener('click', (e) => {
+            if (e.target === this.elements.errorModal) {
+                this.hideError();
+            }
+        });
+        // Initial char count
+        this.updateCharCount();
+    }
+    updateCharCount() {
+        const count = this.elements.textInput.value.length;
+        const maxLength = 50000;
+        this.elements.charCount.textContent = `${count.toLocaleString()} / ${maxLength.toLocaleString()} characters`;
+        if (count > maxLength * 0.9) {
+            this.elements.charCount.style.color = '#E74C3C';
+        } else if (count > maxLength * 0.8) {
+            this.elements.charCount.style.color = '#F39C12';
+        } else {
+            this.elements.charCount.style.color = '#64748B';
+        }
+    }
+    async handleSubmit(e) {
+        e.preventDefault();
+        const text = this.elements.textInput.value.trim();
+        if (!text) {
+            this.showError('Please enter some text to analyze.');
+            return;
+        }
+        this.setLoading(true);
+        this.hideResults();
+        try {
+            const response = await fetch('/assess', {
+                method: 'POST',
+                headers: {
+                    'Content-Type': 'application/json',
+                },
+                body: JSON.stringify({ text }),
+            });
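+            // Error pages (e.g. from a proxy) may not be JSON, so tolerate a failed parse.
+            const data = await response.json().catch(() => null);
+            if (!response.ok || !data) {
+                throw new Error(data?.error || `Request failed (HTTP ${response.status})`);
+            }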
+            this.displayResults(data);
+            this.showResults();
+            // Scroll to results
+            setTimeout(() => {
+                this.elements.resultsSection.scrollIntoView({ behavior: 'smooth' });
+            }, 100);
+        } catch (error) {
+            console.error('Error:', error);
+            this.showError(error.message);
+        } finally {
+            this.setLoading(false);
+        }
+    }
+    setLoading(loading) {
+        if (loading) {
+            this.elements.assessBtn.disabled = true;
+            this.elements.btnLoader.classList.add('active');
+            this.elements.btnText.textContent = 'Analyzing...';
+        } else {
+            this.elements.assessBtn.disabled = false;
+            this.elements.btnLoader.classList.remove('active');
+            this.elements.btnText.textContent = 'Analyze Text';
+        }
+    }
+    displayResults(data) {
+        // Update stats
+        this.elements.totalSentences.textContent = data.stats.total_sentences;
+        this.elements.avgConfidence.textContent =
+            Math.round(data.stats.avg_confidence * 100) + '%';
+        this.elements.dominantLevel.textContent = data.stats.most_common_level.level;
+        this.elements.dominantLevel.style.color =
+            this.cefrStyles[data.stats.most_common_level.level]?.color || '#000';
+        // Update distribution
+        this.displayDistribution(data.stats.level_distribution, data.stats.total_sentences);
+        // Update annotated text
+        this.displayAnnotatedText(data.results);
+        // Update table
+        this.displayTable(data.results);
+    }
+    displayDistribution(distribution, total) {
+        const levels = ['A1', 'A2', 'B1', 'B2', 'C1', 'C2'];
+        this.elements.distributionBars.innerHTML = '';
+        levels.forEach(level => {
+            const count = distribution[level] || 0;
+            const percentage = total > 0 ? (count / total) * 100 : 0;
+            const style = this.cefrStyles[level] || { color: '#000' };
+            const bar = document.createElement('div');
+            bar.className = 'distribution-bar';
+            bar.innerHTML = `
+                <div class="distribution-label" style="color: ${style.color}">
+                    ${level}
+                </div>
+                <div class="distribution-track">
+                    <div class="distribution-fill level-${level.toLowerCase()}"
+                         style="width: ${percentage}%;">
+                        ${percentage > 10 ? Math.round(percentage) + '%' : ''}
+                    </div>
+                </div>
+                <div class="distribution-count">${count}</div>
+            `;
+            this.elements.distributionBars.appendChild(bar);
+        });
+    }
+    displayAnnotatedText(results) {
+        this.elements.annotatedText.innerHTML = '';
+        results.forEach((item, index) => {
+            // Fall back to the raw level label so an unknown level cannot throw here.
+            const style = this.cefrStyles[item.level] || { color: '#000', name: item.level };
+            const annotation = document.createElement('span');
+            annotation.className = `annotation level-${item.level.toLowerCase()}`;
+            // style.name already includes the level code, e.g. "A1 - Beginner".
+            annotation.title = style.name;
+            annotation.textContent = item.sentence;
+            this.elements.annotatedText.appendChild(annotation);
+            // Add single space between sentences instead of newline
+            if (index < results.length - 1) {
+                this.elements.annotatedText.appendChild(document.createTextNode(' '));
+            }
+        });
+    }
+    displayTable(results) {
+        this.elements.sentenceTbody.innerHTML = '';
+        results.forEach((item, index) => {
+            const style = this.cefrStyles[item.level] || { color: '#000' };
+            const confidence = Math.round(item.confidence * 100);
+            const confidenceWidth = confidence;
+            const row = document.createElement('tr');
+            row.innerHTML = `
+                <td class="sentence-text"></td>
+                <td>
+                    <div class="level-cell">
+                        <div class="level-indicator level-${item.level.toLowerCase()}"
+                             style="background-color: ${style.color}">
+                        </div>
+                        <span>${item.level}</span>
+                    </div>
+                </td>
+                <td>
+                    ${confidence}%
+                    <div class="confidence-bar">
+                        <div class="confidence-fill" style="width: ${confidenceWidth}%"></div>
+                    </div>
+                </td>
+            `;
+            // Insert the sentence as text so user-supplied markup is never parsed as HTML.
+            row.querySelector('.sentence-text').textContent = item.sentence;
+            this.elements.sentenceTbody.appendChild(row);
+        });
+    }
+    toggleHighlighting() {
+        this.showHighlights = !this.showHighlights;
+        if (this.showHighlights) {
+            this.elements.toggleHighlight.textContent = 'Hide Markers';
+            document.querySelectorAll('.annotation').forEach(annotation => {
+                annotation.classList.remove('annotation-hidden');
+            });
+        } else {
+            this.elements.toggleHighlight.textContent = 'Show Markers';
+            document.querySelectorAll('.annotation').forEach(annotation => {
+                annotation.classList.add('annotation-hidden');
+            });
+        }
+    }
+    clearText() {
+        this.elements.textInput.value = '';
+        this.updateCharCount();
+        this.hideResults();
+    }
+    showResults() {
+        this.elements.resultsSection.style.display = 'block';
+    }
+    hideResults() {
+        this.elements.resultsSection.style.display = 'none';
+    }
+    showError(message) {
+        this.elements.errorMessage.textContent = message;
+        this.elements.errorModal.style.display = 'flex';
+    }
+    hideError() {
+        this.elements.errorModal.style.display = 'none';
+        this.elements.errorMessage.textContent = '';
+    }
+}
+// Initialize app when DOM is loaded
+document.addEventListener('DOMContentLoaded', () => {
+    new CEFRApp();
+});
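
Note on the innerHTML templates in app.js: displayDistribution() and displayTable() assemble markup as template literals assigned to innerHTML. Where that pattern is kept, values that originate from user input (such as the sentence text) could be escaped before interpolation. The following is a minimal sketch under that assumption; escapeHtml and sentenceCell are hypothetical names and are not part of this commit.

```javascript
// Hypothetical helper (not part of this commit): escape the five characters
// that are significant in HTML text and in quoted attribute values.
const HTML_ESCAPES = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
};

function escapeHtml(value) {
    // String() lets numbers and other primitives pass through unchanged.
    return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical usage mirroring the sentence cell built in displayTable().
function sentenceCell(sentence) {
    return `<td class="sentence-text">${escapeHtml(sentence)}</td>`;
}
```

Setting textContent on a created element, as displayAnnotatedText() already does, avoids the need for escaping altogether; a helper like this is mainly useful where a whole row is generated from a single template string.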

web_app/templates/index.html ADDED

	@@ -0,0 +1,144 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>CEFR Sentence Level Assessment</title>
+    <link rel="stylesheet" href="{{ url_for('static', filename='css/style.css') }}">
+    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap" rel="stylesheet">
+</head>
+<body>
+    <div class="container">
+        <!-- Header -->
+        <header class="header">
+            <div class="logo">
+                <h1>Swedish sentence-level CEFR analyzer</h1>
+            </div>
+            <div class="github-link">
+                <a href="https://github.com/fanwenlin/swe-cefr-sp" target="_blank" rel="noopener noreferrer">GitHub</a>
+            </div>
+        </header>
+        <!-- Main Content -->
+        <main class="main-content">
+            <!-- Input Section -->
+            <section class="input-section card">
+                <h2>Analyze Swedish Text</h2>
+                <p class="section-description">
+                    Enter Swedish text below to assess the CEFR level of each sentence.
+                    The model will analyze sentence complexity and assign proficiency levels from A1 to C2.
+                </p>
+                <form id="assessment-form">
+                    <div class="form-group">
+                        <label for="text-input" class="form-label">Input Text (Swedish)</label>
+                        <textarea
+                            id="text-input"
+                            name="text"
+                            class="text-input"
+                            placeholder="Skriv din text här... (Write your text here...)
+Example:
+Jag heter Anna. Jag kommer från Sverige. Jag studerar datavetenskap på universitetet."
+                            rows="12"
+                            maxlength="50000"
+                        ></textarea>
+                        <div class="input-hint">
+                            <span id="char-count">0 / 50,000 characters</span>
+                        </div>
+                    </div>
+                    <div class="button-group">
+                        <button type="submit" id="assess-btn" class="btn btn-primary">
+                            <span class="btn-text">Analyze Text</span>
+                            <span class="btn-loader" id="btn-loader"></span>
+                        </button>
+                        <button type="button" id="clear-btn" class="btn btn-secondary">Clear</button>
+                    </div>
+                </form>
+            </section>
+            <!-- Results Section -->
+            <section class="results-section" id="results-section" style="display: none;">
+                <!-- Annotated Text - Main visual focus -->
+                <div class="annotated-text-container card main-result">
+                    <div class="container-header">
+                        <h3>Analyzed Text</h3>
+                        <button id="toggle-highlight" class="btn btn-small btn-outline">Hide Markers</button>
+                    </div>
+                    <div class="annotated-text" id="annotated-text">
+                        <!-- Results will be populated by JavaScript -->
+                    </div>
+                </div>
+                <!-- Compact Stats -->
+                <div class="compact-stats">
+                    <div class="stat-item">
+                        <span class="stat-value" id="total-sentences">0</span>
+                        <span class="stat-name">sentences</span>
+                    </div>
+                    <div class="stat-item">
+                        <span class="stat-value" id="avg-confidence">0%</span>
+                        <span class="stat-name">avg confidence</span>
+                    </div>
+                    <div class="stat-item">
+                        <span class="stat-value" id="dominant-level">-</span>
+                        <span class="stat-name">dominant level</span>
+                    </div>
+                </div>
+                <!-- Level Distribution -->
+                <div class="distribution-container card">
+                    <h3>Level Distribution</h3>
+                    <div class="distribution-bars" id="distribution-bars">
+                        <!-- Bars will be populated by JavaScript -->
+                    </div>
+                </div>
+                <!-- Sentence Table -->
+                <div class="sentence-table-container card">
+                    <h3>Detailed Results</h3>
+                    <div class="table-wrapper">
+                        <table class="sentence-table" id="sentence-table">
+                            <thead>
+                                <tr>
+                                    <th>Sentence</th>
+                                    <th>Level</th>
+                                    <th>Confidence</th>
+                                </tr>
+                            </thead>
+                            <tbody id="sentence-tbody">
+                                <!-- Results will be populated by JavaScript -->
+                            </tbody>
+                        </table>
+                    </div>
+                </div>
+            </section>
+        </main>
+        <!-- Footer -->
+        <footer class="footer">
+            <p>Powered by Metric Proto K3 • Swedish BERT-base Model</p>
+            <p>CEFR Levels: A1 (Beginner) → C2 (Proficient)</p>
+        </footer>
+    </div>
+    <!-- Error Modal -->
+    <div id="error-modal" class="modal" style="display: none;">
+        <div class="modal-content">
+            <div class="modal-header">
+                <h3>Error</h3>
+                <button type="button" class="modal-close" aria-label="Close">&times;</button>
+            </div>
+            <div class="modal-body" id="error-message">
+                <!-- Error message will be populated -->
+            </div>
+            <div class="modal-footer">
+                <button class="btn btn-primary modal-close">OK</button>
+            </div>
+        </div>
+    </div>
+    <script src="{{ url_for('static', filename='js/app.js') }}"></script>
+</body>
+</html>