Shreyas Meher committed
Commit 2be141f · Parent(s): c0557bb

Add LoRA/QLoRA fine-tuning and active learning
- LoRA/QLoRA support in Fine-tune tab (via PEFT) for parameter-efficient training with lower VRAM usage; also available in model comparison
- New Active Learning tab with iterative uncertainty-based labeling: entropy, margin, and least-confidence query strategies, round-by-round metrics chart, and example dataset
- Add peft and bitsandbytes to requirements.txt
- Update README with new features and quick-start guides
- Fix .gitignore to exclude conflibertr/ directory
- .gitignore +2 -0
- README.md +28 -2
- app.py +677 -6
- examples/active_learning/pool.txt +61 -0
- examples/active_learning/pool_with_labels.tsv +61 -0
- examples/active_learning/seed.tsv +20 -0
- requirements.txt +2 -0
.gitignore CHANGED

@@ -6,6 +6,8 @@ env/
 __pycache__/
 *.pyc
 conflibertR/
+conflibertr/
+al_model/
 screenshots/
 finetuned_model/
 ft_output/
README.md CHANGED

@@ -41,7 +41,7 @@ Provide a context passage and a question. The model extracts the most relevant a
 
 ### Fine-tuning
 
-Train your own binary or multiclass classifier directly in the browser. Upload data (or load a built-in example), pick a base model, configure training, and go. After training, results and a "Try Your Model" panel appear side by side. You can also save the model and run batch predictions.
+Train your own binary or multiclass classifier directly in the browser. Upload data (or load a built-in example), pick a base model, configure training, and go. Supports **LoRA** and **QLoRA** for parameter-efficient training with lower VRAM usage. After training, results and a "Try Your Model" panel appear side by side. You can also save the model and run batch predictions.
 
 ### Model Comparison
 
@@ -50,7 +50,9 @@ Compare multiple base model architectures on the same dataset. The comparison pr
 <!-- Take a screenshot of the Fine-tune tab and save as screenshots/finetune.png -->
 
+### Active Learning
+
+Iteratively build a strong classifier with fewer labels. Start with a small labeled seed set and a pool of unlabeled text. The model identifies the most uncertain samples for you to label, retrains, and repeats. Supports entropy, margin, and least-confidence query strategies.
 
 ## Supported Models
 
@@ -144,7 +146,8 @@ Opens at `http://localhost:7860` and generates a public shareable link. The firs
 | Binary Classification | Conflict vs. non-conflict, supports custom models |
 | Multilabel Classification | Multi-event-type scoring |
 | Question Answering | Extract answers from a context passage |
-| Fine-tune | Train classifiers, compare models, ROC curves |
+| Fine-tune | Train classifiers with optional LoRA/QLoRA, compare models, ROC curves |
+| Active Learning | Iterative uncertainty-based labeling and retraining |
 
 ### Fine-tuning Quick Start
 
@@ -154,6 +157,13 @@ Opens at `http://localhost:7860` and generates a public shareable link. The firs
 4. Review metrics and try your model on new text
 5. Save the model and load it in the **Binary Classification** tab
 
+### LoRA / QLoRA Fine-tuning
+
+1. Go to the **Fine-tune** tab
+2. Open **Advanced Settings** and check **Use LoRA** (optionally enable **QLoRA** for 4-bit quantization on CUDA GPUs)
+3. Adjust LoRA rank and alpha as needed (defaults of r=8, alpha=16 work well)
+4. Train as usual — LoRA weights are merged back automatically so the saved model works like any other
+
 ### Model Comparison Quick Start
 
 1. Upload data (or load an example) in the **Fine-tune** tab
@@ -162,6 +172,16 @@ Opens at `http://localhost:7860` and generates a public shareable link. The firs
 4. Click **"Compare Models"**
 5. View the metrics table, bar chart, and ROC-AUC curves
 
+### Active Learning Quick Start
+
+1. Go to the **Active Learning** tab
+2. Click **"Load Example: Binary Active Learning"** (or upload your own seed + pool)
+3. Configure the query strategy and samples per round
+4. Click **"Initialize Active Learning"**
+5. Label the uncertain samples shown in the table (fill in 0 or 1)
+6. Click **"Submit Labels & Next Round"** to retrain and get the next batch
+7. Repeat until satisfied, then save the model
+
 ### Data Format
 
 Tab-separated values (TSV), no header row. Each line: `text<TAB>label`
 
@@ -209,10 +229,16 @@ conflibert-gui/
         train.tsv                # 0=Diplomacy, 1=Armed Conflict,
         dev.tsv                  # 2=Protest, 3=Humanitarian
         test.tsv
+    active_learning/             # Example active learning dataset
+        seed.tsv                 # 20 labeled seed samples
+        pool.txt                 # 61 unlabeled pool texts
+        pool_with_labels.tsv     # Ground truth for pool (cheat sheet)
 ```
 
 ## Training Features
 
+- **LoRA / QLoRA** parameter-efficient fine-tuning (via [PEFT](https://github.com/huggingface/peft))
+- **Active learning** with entropy, margin, and least-confidence query strategies
 - Early stopping with configurable patience
 - Learning rate schedulers: linear, cosine, constant, constant with warmup
 - Mixed precision training (FP16) on CUDA GPUs
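The `text<TAB>label` convention above is plain TSV with no header, so the standard library's csv module can write and read it directly. A minimal sketch with two invented example rows (real ConfliBERT data would come from the example files in the repo):

```python
import csv
import io

rows = [
    ("Rebels attacked a convoy near the border.", 1),
    ("The ministers met to discuss trade policy.", 0),
]

# Write: one text<TAB>label pair per line, no header row.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerows(rows)

# Read it back the same way, casting labels to int.
parsed = [(text, int(label)) for text, label in
          csv.reader(io.StringIO(buf.getvalue()), delimiter="\t")]
assert parsed == rows
```

Note that csv handles quoting automatically if a text happens to contain a tab, which a hand-rolled `split("\t")` would not.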
app.py CHANGED

@@ -45,6 +45,19 @@ from sklearn.preprocessing import label_binarize
 from torch.utils.data import Dataset as TorchDataset
 import gc
 
 # ============================================================================
 # CONFIGURATION

@@ -613,6 +626,7 @@ def run_finetuning(
     train_file, dev_file, test_file, task_type, model_display_name,
     epochs, batch_size, lr, weight_decay, warmup_ratio, max_seq_len,
     grad_accum, fp16, patience, scheduler,
     progress=gr.Progress(track_tqdm=True),
 ):
     """Main finetuning function. Returns logs, metrics, model state, and visibility updates."""

@@ -644,9 +658,42 @@ def run_finetuning(
     # Load model and tokenizer
     model_id = FINETUNE_MODELS[model_display_name]
     tokenizer = AutoTokenizer.from_pretrained(model_id)
-
-
-
 
     # Create datasets
     train_ds = TextClassificationDataset(

@@ -709,6 +756,10 @@ def run_finetuning(
     test_results = trainer.evaluate(test_ds, metric_key_prefix='test')
 
     # Build log text
     header = (
         f"=== Configuration ===\n"
         f"Model: {model_display_name}\n"

@@ -716,6 +767,7 @@ def run_finetuning(
         f"Task: {task_type} Classification ({num_labels} classes)\n"
         f"Data: {len(train_texts)} train / {len(dev_texts)} dev / {len(test_texts)} test\n"
         f"Epochs: {epochs} Batch: {batch_size} LR: {lr} Scheduler: {scheduler}\n"
         f"\n=== Training Log ===\n"
     )
     runtime = train_result.metrics.get('train_runtime', 0)

@@ -733,8 +785,11 @@ def run_finetuning(
         metrics_data.append([name, f"{float(v):.4f}"])
     metrics_df = pd.DataFrame(metrics_data, columns=['Metric', 'Score'])
 
-    #
-    trained_model = trainer.model
     trained_model.eval()
 
     return (

@@ -864,9 +919,412 @@ def load_example_multiclass():
     )
 
 
 def run_comparison(
     train_file, dev_file, test_file, task_type, selected_models,
-    epochs, batch_size, lr,
     progress=gr.Progress(track_tqdm=True),
 ):
     """Train multiple models on the same data and compare performance + ROC curves."""

@@ -913,6 +1371,18 @@ def run_comparison(
         model = AutoModelForSequenceClassification.from_pretrained(
             model_id, num_labels=num_labels,
         )
         train_ds = TextClassificationDataset(train_texts, train_labels, tokenizer, 512)
         dev_ds = TextClassificationDataset(dev_texts, dev_labels, tokenizer, 512)
         test_ds = TextClassificationDataset(test_texts, test_labels, tokenizer, 512)

@@ -949,6 +1419,10 @@ def run_comparison(
 
         train_result = trainer.train()
 
         # Get predictions for ROC curves
         pred_output = trainer.predict(test_ds)
         logits = pred_output.predictions

@@ -1194,6 +1668,10 @@ with gr.Blocks(theme=theme, css=custom_css, title="ConfliBERT") as demo:
         "8. Save the model and load it later in the "
         "Classification tab\n\n"
         "**Advanced features:**\n"
         "- Early stopping with configurable patience\n"
         "- Learning rate schedulers (linear, cosine, constant)\n"
         "- Mixed precision training (FP16 on CUDA GPUs)\n"

@@ -1429,6 +1907,22 @@ with gr.Blocks(theme=theme, css=custom_css, title="ConfliBERT") as demo:
         ["linear", "cosine", "constant", "constant_with_warmup"],
         label="LR Scheduler", value="linear",
     )
 
     # -- Train --
     ft_train_btn = gr.Button(

@@ -1502,6 +1996,10 @@ with gr.Blocks(theme=theme, css=custom_css, title="ConfliBERT") as demo:
     cmp_epochs = gr.Number(label="Epochs", value=3, minimum=1, precision=0)
     cmp_batch = gr.Number(label="Batch Size", value=8, minimum=1, precision=0)
     cmp_lr = gr.Number(label="Learning Rate", value=2e-5, minimum=1e-7)
     cmp_btn = gr.Button("Compare Models", variant="primary")
     cmp_log = gr.Textbox(
         label="Comparison Log", lines=8,

@@ -1514,6 +2012,135 @@ with gr.Blocks(theme=theme, css=custom_css, title="ConfliBERT") as demo:
     cmp_plot = gr.Plot(label="Metrics Comparison")
     cmp_roc = gr.Plot(label="ROC Curves")
 
     # ---- FOOTER ----
     gr.Markdown(
         "<div style='text-align: center; padding: 1rem 0; margin-top: 0.5rem; "

@@ -1600,6 +2227,7 @@ with gr.Blocks(theme=theme, css=custom_css, title="ConfliBERT") as demo:
             ft_epochs, ft_batch, ft_lr,
             ft_weight_decay, ft_warmup, ft_max_len,
             ft_grad_accum, ft_fp16, ft_patience, ft_scheduler,
         ],
         outputs=[
             ft_log, ft_metrics,

@@ -1630,12 +2258,55 @@ with gr.Blocks(theme=theme, css=custom_css, title="ConfliBERT") as demo:
         outputs=[ft_batch_out],
     )
 
     # Model comparison
     cmp_btn.click(
         fn=run_comparison,
         inputs=[
             ft_train_file, ft_dev_file, ft_test_file,
             ft_task, cmp_models, cmp_epochs, cmp_batch, cmp_lr,
         ],
         outputs=[cmp_log, cmp_table, cmp_plot, cmp_roc, cmp_results_col],
         concurrency_limit=1,
@@ -45,6 +45,19 @@ from sklearn.preprocessing import label_binarize
 from torch.utils.data import Dataset as TorchDataset
 import gc
 
+# LoRA / QLoRA support (optional)
+try:
+    from peft import LoraConfig, get_peft_model, TaskType
+    PEFT_AVAILABLE = True
+except ImportError:
+    PEFT_AVAILABLE = False
+
+try:
+    from transformers import BitsAndBytesConfig
+    BNB_AVAILABLE = True
+except ImportError:
+    BNB_AVAILABLE = False
+
 
 # ============================================================================
 # CONFIGURATION
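The try/except guards above are the usual pattern for optional dependencies: import once at module load, record a flag, and let later code branch on the flag instead of crashing at startup. A generic sketch of the same idea (the module names here are placeholders for illustration, not real dependencies of the app):

```python
import importlib

def optional_import(name):
    """Return (module, available) without raising when the package is missing."""
    try:
        return importlib.import_module(name), True
    except ImportError:
        return None, False

# Mirrors how PEFT_AVAILABLE / BNB_AVAILABLE are set above.
json_mod, JSON_AVAILABLE = optional_import("json")            # stdlib: present
_, FAKE_AVAILABLE = optional_import("no_such_package_xyz")    # assumed absent
```
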
@@ -613,6 +626,7 @@ def run_finetuning(
     train_file, dev_file, test_file, task_type, model_display_name,
     epochs, batch_size, lr, weight_decay, warmup_ratio, max_seq_len,
     grad_accum, fp16, patience, scheduler,
+    use_lora, lora_rank, lora_alpha, use_qlora,
     progress=gr.Progress(track_tqdm=True),
 ):
     """Main finetuning function. Returns logs, metrics, model state, and visibility updates."""
@@ -644,9 +658,42 @@ def run_finetuning(
     # Load model and tokenizer
     model_id = FINETUNE_MODELS[model_display_name]
     tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+    lora_active = False
+    if use_qlora:
+        if not (PEFT_AVAILABLE and BNB_AVAILABLE and torch.cuda.is_available()):
+            raise ValueError(
+                "QLoRA requires a CUDA GPU and the peft + bitsandbytes packages."
+            )
+        bnb_config = BitsAndBytesConfig(
+            load_in_4bit=True,
+            bnb_4bit_quant_type="nf4",
+            bnb_4bit_compute_dtype=torch.float16,
+            bnb_4bit_use_double_quant=True,
+        )
+        model = AutoModelForSequenceClassification.from_pretrained(
+            model_id, num_labels=num_labels, quantization_config=bnb_config,
+        )
+    else:
+        model = AutoModelForSequenceClassification.from_pretrained(
+            model_id, num_labels=num_labels,
+        )
+
+    if use_lora or use_qlora:
+        if not PEFT_AVAILABLE:
+            raise ValueError(
+                "LoRA requires the 'peft' package. Install: pip install peft"
+            )
+        lora_config = LoraConfig(
+            task_type=TaskType.SEQ_CLS,
+            r=int(lora_rank),
+            lora_alpha=int(lora_alpha),
+            lora_dropout=0.1,
+            bias="none",
+        )
+        model.enable_input_require_grads()
+        model = get_peft_model(model, lora_config)
+        lora_active = True
 
     # Create datasets
     train_ds = TextClassificationDataset(
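For scale: LoRA freezes the base weight W and learns a rank-r update ΔW = B·A, so for each adapted d_out×d_in matrix it trains r·(d_in + d_out) parameters instead of d_in·d_out. A back-of-envelope sketch with the r=8 default above and a 768×768 projection (a typical BERT-base attention matrix, assumed here purely for illustration):

```python
def lora_trainable_params(d_in, d_out, r):
    """Parameters in a rank-r adapter: A is (r, d_in), B is (d_out, r)."""
    return r * d_in + r * d_out

full = 768 * 768                               # 589824 weights in the frozen matrix
adapter = lora_trainable_params(768, 768, 8)   # 12288 trainable weights
ratio = adapter / full                         # roughly 2% per adapted layer
```

This is why LoRA fits in far less VRAM for optimizer state and gradients, as the commit message claims.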
@@ -709,6 +756,10 @@ def run_finetuning(
     test_results = trainer.evaluate(test_ds, metric_key_prefix='test')
 
     # Build log text
+    lora_info = ""
+    if lora_active:
+        method = "QLoRA (4-bit)" if use_qlora else "LoRA"
+        lora_info = f"PEFT: {method} r={int(lora_rank)} alpha={int(lora_alpha)}\n"
     header = (
         f"=== Configuration ===\n"
         f"Model: {model_display_name}\n"

@@ -716,6 +767,7 @@ def run_finetuning(
         f"Task: {task_type} Classification ({num_labels} classes)\n"
         f"Data: {len(train_texts)} train / {len(dev_texts)} dev / {len(test_texts)} test\n"
         f"Epochs: {epochs} Batch: {batch_size} LR: {lr} Scheduler: {scheduler}\n"
+        f"{lora_info}"
         f"\n=== Training Log ===\n"
     )
     runtime = train_result.metrics.get('train_runtime', 0)
@@ -733,8 +785,11 @@ def run_finetuning(
         metrics_data.append([name, f"{float(v):.4f}"])
     metrics_df = pd.DataFrame(metrics_data, columns=['Metric', 'Score'])
 
+    # Merge LoRA weights back into base model for clean save/inference
+    trained_model = trainer.model
+    if lora_active and hasattr(trained_model, 'merge_and_unload'):
+        trained_model = trained_model.merge_and_unload()
+    trained_model = trained_model.cpu()
     trained_model.eval()
 
     return (
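Merging folds the adapter into the base weights, W' = W + (α/r)·B·A, so the saved model needs no PEFT at inference time yet computes exactly what the adapted model did. A small numpy sketch of why the merged matrix reproduces the adapted forward pass (random matrices, standing in for real model weights):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 6, 2, 4
W = rng.normal(size=(d, d))   # frozen base weight
A = rng.normal(size=(r, d))   # LoRA down-projection
B = rng.normal(size=(d, r))   # LoRA up-projection
x = rng.normal(size=d)

scale = alpha / r
adapted = W @ x + scale * (B @ (A @ x))   # base path plus adapter path
merged = (W + scale * (B @ A)) @ x        # single merged matrix

assert np.allclose(adapted, merged)
```
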
@@ -864,9 +919,412 @@ def load_example_multiclass():
     )
 
 
+# ============================================================================
+# ACTIVE LEARNING
+# ============================================================================
+
+def parse_pool_file(file_path):
+    """Parse an unlabeled text pool. Accepts CSV with 'text' column, or one text per line."""
+    path = get_path(file_path)
+    # Try CSV/TSV with 'text' column first
+    try:
+        df = pd.read_csv(path)
+        if 'text' in df.columns:
+            texts = [str(t) for t in df['text'].dropna().tolist()]
+            if texts:
+                return texts
+    except Exception:
+        pass
+    # Fallback: one text per line
+    texts = []
+    with open(path, 'r', encoding='utf-8') as f:
+        for line in f:
+            line = line.strip()
+            if line:
+                texts.append(line)
+    if not texts:
+        raise ValueError("No texts found in pool file.")
+    return texts
+
+
+def compute_uncertainty(model, tokenizer, texts, strategy='entropy',
+                        max_seq_len=512, batch_size=32):
+    """Compute uncertainty scores for unlabeled texts. Higher = more uncertain."""
+    model.eval()
+    dev = next(model.parameters()).device
+    scores = []
+
+    for i in range(0, len(texts), batch_size):
+        batch_texts = texts[i:i + batch_size]
+        inputs = tokenizer(
+            batch_texts, return_tensors='pt', truncation=True,
+            padding=True, max_length=max_seq_len,
+        )
+        inputs = {k: v.to(dev) for k, v in inputs.items()}
+        with torch.no_grad():
+            logits = model(**inputs).logits
+        probs = torch.softmax(logits, dim=1).cpu().numpy()
+
+        if strategy == 'entropy':
+            s = -np.sum(probs * np.log(probs + 1e-10), axis=1)
+        elif strategy == 'margin':
+            sorted_p = np.sort(probs, axis=1)
+            s = -(sorted_p[:, -1] - sorted_p[:, -2])
+        else:  # least_confidence
+            s = -np.max(probs, axis=1)
+        scores.extend(s.tolist())
+
+    return scores
+
|
| 981 |
+
"""Build a Plotly chart of active-learning metrics across rounds."""
|
| 982 |
+
import plotly.graph_objects as go
|
| 983 |
+
|
| 984 |
+
if not metrics_history:
|
| 985 |
+
return None
|
| 986 |
+
|
| 987 |
+
rounds = [m['round'] for m in metrics_history]
|
| 988 |
+
train_sizes = [m.get('train_size', 0) for m in metrics_history]
|
| 989 |
+
|
| 990 |
+
metric_keys = (['f1', 'accuracy', 'precision', 'recall']
|
| 991 |
+
if task_type == 'Binary'
|
| 992 |
+
else ['f1_macro', 'accuracy'])
|
| 993 |
+
|
| 994 |
+
fig = go.Figure()
|
| 995 |
+
colors = ['#ff6b35', '#3b82f6', '#10b981', '#8b5cf6']
|
| 996 |
+
|
| 997 |
+
for i, key in enumerate(metric_keys):
|
| 998 |
+
values = [m.get(key) for m in metrics_history]
|
| 999 |
+
if any(v is not None for v in values):
|
| 1000 |
+
fig.add_trace(go.Scatter(
|
| 1001 |
+
x=rounds, y=values, mode='lines+markers',
|
| 1002 |
+
name=key.replace('_', ' ').title(),
|
| 1003 |
+
line=dict(color=colors[i % len(colors)], width=2),
|
| 1004 |
+
))
|
| 1005 |
+
|
| 1006 |
+
fig.add_trace(go.Bar(
|
| 1007 |
+
x=rounds, y=train_sizes, name='Train Size',
|
| 1008 |
+
marker_color='rgba(200,200,200,0.4)', yaxis='y2',
|
| 1009 |
+
))
|
| 1010 |
+
|
| 1011 |
+
fig.update_layout(
|
| 1012 |
+
xaxis_title='Round', yaxis_title='Score', yaxis_range=[0, 1.05],
|
| 1013 |
+
yaxis2=dict(title='Train Size', overlaying='y', side='right'),
|
| 1014 |
+
template='plotly_white',
|
| 1015 |
+
legend=dict(orientation='h', yanchor='bottom', y=1.02, xanchor='right', x=1),
|
| 1016 |
+
height=350, margin=dict(t=40, b=40),
|
| 1017 |
+
)
|
| 1018 |
+
return fig
|
| 1019 |
+
|
| 1020 |
+
|
| 1021 |
+
def _train_al_model(texts, labels, num_labels, dev_texts, dev_labels,
|
| 1022 |
+
task_type, model_id, epochs, batch_size, lr, max_seq_len,
|
| 1023 |
+
use_lora, lora_rank, lora_alpha):
|
| 1024 |
+
"""Train a model for one active-learning round. Returns (model, tokenizer, eval_metrics)."""
|
| 1025 |
+
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
| 1026 |
+
model = AutoModelForSequenceClassification.from_pretrained(
|
| 1027 |
+
model_id, num_labels=num_labels,
|
| 1028 |
+
)
|
| 1029 |
+
|
| 1030 |
+
if use_lora and PEFT_AVAILABLE:
|
| 1031 |
+
lora_cfg = LoraConfig(
|
| 1032 |
+
task_type=TaskType.SEQ_CLS,
|
| 1033 |
+
r=int(lora_rank), lora_alpha=int(lora_alpha),
|
| 1034 |
+
lora_dropout=0.1, bias="none",
|
| 1035 |
+
)
|
| 1036 |
+
model.enable_input_require_grads()
|
| 1037 |
+
model = get_peft_model(model, lora_cfg)
|
| 1038 |
+
|
| 1039 |
+
train_ds = TextClassificationDataset(texts, labels, tokenizer, max_seq_len)
|
| 1040 |
+
dev_ds = None
|
| 1041 |
+
if dev_texts is not None:
|
| 1042 |
+
dev_ds = TextClassificationDataset(dev_texts, dev_labels, tokenizer, max_seq_len)
|
| 1043 |
+
|
| 1044 |
+
output_dir = tempfile.mkdtemp(prefix='conflibert_al_')
|
| 1045 |
+
training_args = TrainingArguments(
|
| 1046 |
+
output_dir=output_dir,
|
| 1047 |
+
num_train_epochs=epochs,
|
| 1048 |
+
per_device_train_batch_size=batch_size,
|
| 1049 |
+
per_device_eval_batch_size=batch_size * 2,
|
| 1050 |
+
learning_rate=lr,
|
| 1051 |
+
weight_decay=0.01,
|
| 1052 |
+
warmup_ratio=0.1,
|
| 1053 |
+
eval_strategy='epoch' if dev_ds else 'no',
|
| 1054 |
+
save_strategy='no',
|
| 1055 |
+
logging_steps=10,
|
| 1056 |
+
report_to='none',
|
| 1057 |
+
seed=42,
|
| 1058 |
+
)
|
| 1059 |
+
|
| 1060 |
+
trainer = Trainer(
|
| 1061 |
+
model=model,
|
| 1062 |
+
args=training_args,
|
| 1063 |
+
train_dataset=train_ds,
|
| 1064 |
+
eval_dataset=dev_ds,
|
| 1065 |
+
compute_metrics=make_compute_metrics(task_type) if dev_ds else None,
|
| 1066 |
+
)
|
| 1067 |
+
trainer.train()
|
| 1068 |
+
|
| 1069 |
+
eval_metrics = {}
|
| 1070 |
+
if dev_ds:
|
| 1071 |
+
results = trainer.evaluate()
|
| 1072 |
+
for k, v in results.items():
|
| 1073 |
+
if isinstance(v, (int, float, np.floating)):
|
| 1074 |
+
eval_metrics[k.replace('eval_', '')] = round(float(v), 4)
|
| 1075 |
+
|
| 1076 |
+
trained_model = trainer.model
|
| 1077 |
+
if use_lora and PEFT_AVAILABLE and hasattr(trained_model, 'merge_and_unload'):
|
| 1078 |
+
trained_model = trained_model.merge_and_unload()
|
| 1079 |
+
|
| 1080 |
+
return trained_model, tokenizer, eval_metrics
|
| 1081 |
+
|
| 1082 |
+
|
| 1083 |
+
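The metrics cleanup at the end of `_train_al_model` (stripping the `eval_` prefix the Trainer adds and rounding to four places) is simple enough to exercise on its own. A sketch with made-up values; the numpy-float check from the original is omitted so this stays stdlib-only:

```python
def clean_eval_metrics(results):
    """Strip Trainer's 'eval_' prefix and round numeric values to 4 places."""
    out = {}
    for k, v in results.items():
        if isinstance(v, (int, float)):
            out[k.replace('eval_', '')] = round(float(v), 4)
    return out

metrics = clean_eval_metrics({
    'eval_f1': 0.8125,        # kept as 'f1'
    'eval_loss': 0.25,        # kept as 'loss'
    'epoch': 3.0,             # no prefix, kept unchanged
    'eval_runtime_str': 'n/a' # non-numeric values are dropped
})
```
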
+def al_initialize(
+    seed_file, pool_file, dev_file, task_type, model_display_name,
+    query_strategy, query_size, epochs, batch_size, lr, max_seq_len,
+    use_lora, lora_rank, lora_alpha,
+    progress=gr.Progress(track_tqdm=True),
+):
+    """Initialize active learning: train on seed data, query first uncertain batch."""
+    try:
+        if seed_file is None or pool_file is None:
+            raise ValueError("Upload both a labeled seed file and an unlabeled pool file.")
+
+        seed_texts, seed_labels, num_labels = parse_data_file(seed_file)
+        pool_texts = parse_pool_file(pool_file)
+
+        dev_texts, dev_labels = None, None
+        if dev_file is not None:
+            dev_texts, dev_labels, _ = parse_data_file(dev_file)
+
+        if task_type == "Binary":
+            num_labels = 2
+
+        query_size = int(query_size)
+        model_id = FINETUNE_MODELS[model_display_name]
+
+        trained_model, tokenizer, eval_metrics = _train_al_model(
+            seed_texts, seed_labels, num_labels, dev_texts, dev_labels,
+            task_type, model_id, int(epochs), int(batch_size), lr,
+            int(max_seq_len), use_lora, lora_rank, lora_alpha,
+        )
+
+        # Build round-0 metrics
+        round_metrics = {'round': 0, 'train_size': len(seed_texts)}
+        round_metrics.update(eval_metrics)
+
+        # Query uncertain samples from pool
+        scores = compute_uncertainty(
+            trained_model, tokenizer, pool_texts, query_strategy, int(max_seq_len),
+        )
+        top_indices = np.argsort(scores)[-query_size:][::-1].tolist()
+        query_texts_batch = [pool_texts[i] for i in top_indices]
+
+        annotation_df = pd.DataFrame({
+            'Text': query_texts_batch,
+            'Label': [''] * len(query_texts_batch),
+        })
+
+        al_state = {
+            'labeled_texts': list(seed_texts),
+            'labeled_labels': list(seed_labels),
+            'pool_texts': pool_texts,
+            'pool_available': [i for i in range(len(pool_texts)) if i not in set(top_indices)],
+            'current_query_indices': top_indices,
+            'dev_texts': dev_texts,
+            'dev_labels': dev_labels,
+            'num_labels': num_labels,
+            'round': 1,
+            'metrics_history': [round_metrics],
+            'model_id': model_id,
+            'model_display_name': model_display_name,
+            'task_type': task_type,
+            'query_strategy': query_strategy,
+            'query_size': query_size,
+            'epochs': int(epochs),
+            'batch_size': int(batch_size),
+            'lr': lr,
+            'max_seq_len': int(max_seq_len),
+            'use_lora': use_lora,
+            'lora_rank': int(lora_rank) if use_lora else 8,
+            'lora_alpha': int(lora_alpha) if use_lora else 16,
+        }
+
+        trained_model = trained_model.cpu()
+        trained_model.eval()
+
+        log_text = (
+            f"=== Active Learning Initialized ===\n"
+            f"Seed: {len(seed_texts)} labeled | Pool: {len(pool_texts)} unlabeled\n"
+            f"Model: {model_display_name}\n"
+            f"Strategy: {query_strategy} | Samples/round: {query_size}\n\n"
+            f"--- Round 0 (seed) ---\n"
+            f"Train size: {len(seed_texts)}\n"
+        )
+        for k, v in eval_metrics.items():
+            log_text += f"  {k}: {v}\n"
+        log_text += (
+            f"\n--- Round 1: {len(query_texts_batch)} samples queried ---\n"
+            f"Label the samples below, then click 'Submit Labels & Next Round'.\n"
+        )
+
+        chart = _build_al_metrics_chart([round_metrics], task_type)
+
+        return (
+            al_state, trained_model, tokenizer,
+            annotation_df, log_text, chart,
+            gr.Column(visible=True),
+        )
+
+    except Exception as e:
+        return (
+            {}, None, None,
+            pd.DataFrame(columns=['Text', 'Label']),
+            f"Initialization failed:\n{str(e)}",
+            None,
+            gr.Column(visible=False),
+        )
+
def al_submit_and_continue(
|
| 1191 |
+
annotation_df, al_state, al_model, al_tokenizer, prev_log,
|
| 1192 |
+
progress=gr.Progress(track_tqdm=True),
|
| 1193 |
+
):
|
| 1194 |
+
"""Accept user labels, retrain, query next uncertain batch."""
|
| 1195 |
+
try:
|
| 1196 |
+
if not al_state or al_model is None:
|
| 1197 |
+
raise ValueError("No active session. Initialize first.")
|
| 1198 |
+
|
| 1199 |
+
new_texts = annotation_df['Text'].tolist()
|
| 1200 |
+
new_labels = []
|
| 1201 |
+
for i, raw in enumerate(annotation_df['Label'].tolist()):
|
| 1202 |
+
s = str(raw).strip()
|
| 1203 |
+
if s in ('', 'nan'):
|
| 1204 |
+
raise ValueError(f"Row {i + 1} has no label. Label all samples first.")
|
| 1205 |
+
new_labels.append(int(s))
|
| 1206 |
+
|
| 1207 |
+
num_labels = al_state['num_labels']
|
| 1208 |
+
for l in new_labels:
|
| 1209 |
+
if l < 0 or l >= num_labels:
|
| 1210 |
+
raise ValueError(f"Label {l} out of range [0, {num_labels - 1}].")
|
| 1211 |
+
|
| 1212 |
+
# Add newly labeled samples
|
| 1213 |
+
al_state['labeled_texts'].extend(new_texts)
|
| 1214 |
+
al_state['labeled_labels'].extend(new_labels)
|
| 1215 |
+
|
| 1216 |
+
queried_set = set(al_state['current_query_indices'])
|
| 1217 |
+
al_state['pool_available'] = [
|
| 1218 |
+
i for i in al_state['pool_available'] if i not in queried_set
|
| 1219 |
+
]
|
| 1220 |
+
|
| 1221 |
+
current_round = al_state['round']
|
| 1222 |
+
|
| 1223 |
+
# Retrain on all labeled data
|
| 1224 |
+
trained_model, tokenizer, eval_metrics = _train_al_model(
|
| 1225 |
+
al_state['labeled_texts'], al_state['labeled_labels'],
|
| 1226 |
+
num_labels, al_state['dev_texts'], al_state['dev_labels'],
|
| 1227 |
+
al_state['task_type'], al_state['model_id'],
|
| 1228 |
+
al_state['epochs'], al_state['batch_size'], al_state['lr'],
|
| 1229 |
+
al_state['max_seq_len'], al_state['use_lora'],
|
| 1230 |
+
al_state['lora_rank'], al_state['lora_alpha'],
|
| 1231 |
+
)
|
| 1232 |
+
|
| 1233 |
+
round_metrics = {
|
| 1234 |
+
'round': current_round,
|
| 1235 |
+
'train_size': len(al_state['labeled_texts']),
|
| 1236 |
+
}
|
| 1237 |
+
round_metrics.update(eval_metrics)
|
| 1238 |
+
al_state['metrics_history'].append(round_metrics)
|
| 1239 |
+
|
| 1240 |
+
# Query next batch from remaining pool
|
| 1241 |
+
remaining_pool = al_state['pool_available']
|
| 1242 |
+
remaining_texts = [al_state['pool_texts'][i] for i in remaining_pool]
|
| 1243 |
+
|
| 1244 |
+
log_add = (
|
| 1245 |
+
f"\n--- Round {current_round} complete ---\n"
|
| 1246 |
+
f"Added {len(new_labels)} labels | "
|
| 1247 |
+
f"Total train: {len(al_state['labeled_texts'])}\n"
|
| 1248 |
+
)
|
| 1249 |
+
for k, v in eval_metrics.items():
|
| 1250 |
+
log_add += f" {k}: {v}\n"
|
| 1251 |
+
|
| 1252 |
+
if remaining_texts:
|
| 1253 |
+
scores = compute_uncertainty(
|
| 1254 |
+
trained_model, tokenizer, remaining_texts,
|
| 1255 |
+
al_state['query_strategy'], al_state['max_seq_len'],
|
| 1256 |
+
)
|
| 1257 |
+
q = min(al_state['query_size'], len(remaining_texts))
|
| 1258 |
+
top_local = np.argsort(scores)[-q:][::-1].tolist()
|
| 1259 |
+
top_pool_indices = [remaining_pool[i] for i in top_local]
|
| 1260 |
+
query_texts = [al_state['pool_texts'][i] for i in top_pool_indices]
|
| 1261 |
+
|
| 1262 |
+
al_state['current_query_indices'] = top_pool_indices
|
| 1263 |
+
al_state['round'] = current_round + 1
|
| 1264 |
+
|
| 1265 |
+
annotation_out = pd.DataFrame({
|
| 1266 |
+
'Text': query_texts,
|
| 1267 |
+
'Label': [''] * len(query_texts),
|
| 1268 |
+
})
|
| 1269 |
+
pool_left = len(remaining_pool) - len(top_pool_indices)
|
| 1270 |
+
log_add += (
|
| 1271 |
+
f"Pool remaining: {pool_left}\n"
|
| 1272 |
+
f"\n--- Round {current_round + 1}: {len(query_texts)} samples queried ---\n"
|
| 1273 |
+
)
|
| 1274 |
+
else:
|
| 1275 |
+
annotation_out = pd.DataFrame(columns=['Text', 'Label'])
|
| 1276 |
+
al_state['current_query_indices'] = []
|
| 1277 |
+
al_state['round'] = current_round + 1
|
| 1278 |
+
log_add += "\nPool exhausted. Active learning complete!\n"
|
| 1279 |
+
|
| 1280 |
+
trained_model = trained_model.cpu()
|
| 1281 |
+
trained_model.eval()
|
| 1282 |
+
|
| 1283 |
+
chart = _build_al_metrics_chart(al_state['metrics_history'], al_state['task_type'])
|
| 1284 |
+
log_text = prev_log + log_add
|
| 1285 |
+
|
| 1286 |
+
return (
|
| 1287 |
+
al_state, trained_model, tokenizer,
|
| 1288 |
+
annotation_out, log_text, chart,
|
| 1289 |
+
)
|
| 1290 |
+
|
| 1291 |
+
except Exception as e:
|
| 1292 |
+
return (
|
| 1293 |
+
al_state, al_model, al_tokenizer,
|
| 1294 |
+
pd.DataFrame(columns=['Text', 'Label']),
|
| 1295 |
+
prev_log + f"\nError: {str(e)}\n",
|
| 1296 |
+
None,
|
| 1297 |
+
)
|
| 1298 |
+
|
| 1299 |
+
|
| 1300 |
+
def al_save_model(save_path, al_model, al_tokenizer):
|
| 1301 |
+
"""Save the active-learning model to disk."""
|
| 1302 |
+
if al_model is None:
|
| 1303 |
+
return "No model to save. Run at least one round first."
|
| 1304 |
+
if not save_path:
|
| 1305 |
+
return "Please specify a save directory."
|
| 1306 |
+
try:
|
| 1307 |
+
os.makedirs(save_path, exist_ok=True)
|
| 1308 |
+
al_model.save_pretrained(save_path)
|
| 1309 |
+
al_tokenizer.save_pretrained(save_path)
|
| 1310 |
+
return f"Model saved to: {save_path}"
|
| 1311 |
+
except Exception as e:
|
| 1312 |
+
return f"Error saving model: {str(e)}"
|
| 1313 |
+
|
| 1314 |
+
|
| 1315 |
+
def load_example_active_learning():
|
| 1316 |
+
"""Load the active learning example dataset."""
|
| 1317 |
+
return (
|
| 1318 |
+
os.path.join(EXAMPLES_DIR, "active_learning", "seed.tsv"),
|
| 1319 |
+
os.path.join(EXAMPLES_DIR, "active_learning", "pool.txt"),
|
| 1320 |
+
os.path.join(EXAMPLES_DIR, "binary", "dev.tsv"),
|
| 1321 |
+
"Binary",
|
| 1322 |
+
)
|
| 1323 |
+
|
| 1324 |
+
|
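For reference, the three query strategies exposed in the UI (`entropy`, `margin`, `least_confidence`) can be sketched as standalone NumPy scoring functions. This is an illustrative version only, not the app's actual `compute_uncertainty` implementation (which runs the model to obtain logits); all names here are hypothetical:

```python
import numpy as np

def uncertainty_scores(logits, strategy="entropy"):
    """Score each row of logits; higher means more uncertain."""
    # Softmax over the class dimension (numerically stabilized)
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    if strategy == "entropy":
        # Predictive entropy: high when probability mass is spread out
        return -(probs * np.log(probs + 1e-12)).sum(axis=1)
    if strategy == "margin":
        # Negative gap between the top-2 classes: high when nearly tied
        part = np.sort(probs, axis=1)
        return -(part[:, -1] - part[:, -2])
    if strategy == "least_confidence":
        # One minus the top class probability
        return 1.0 - probs.max(axis=1)
    raise ValueError(f"unknown strategy: {strategy}")

logits = np.array([[2.0, -2.0],   # confident prediction -> low uncertainty
                   [0.1, 0.0]])   # near-tied classes -> high uncertainty
scores = uncertainty_scores(logits, "entropy")
```

All three strategies agree on the ranking in the two-class case above; they diverge mainly with three or more classes, which is why the tab offers a choice.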
 def run_comparison(
     train_file, dev_file, test_file, task_type, selected_models,
+    epochs, batch_size, lr, cmp_use_lora, cmp_lora_rank, cmp_lora_alpha,
     progress=gr.Progress(track_tqdm=True),
 ):
     """Train multiple models on the same data and compare performance + ROC curves."""
         model = AutoModelForSequenceClassification.from_pretrained(
             model_id, num_labels=num_labels,
         )
+
+        cmp_lora_active = False
+        if cmp_use_lora and PEFT_AVAILABLE:
+            lora_cfg = LoraConfig(
+                task_type=TaskType.SEQ_CLS,
+                r=int(cmp_lora_rank), lora_alpha=int(cmp_lora_alpha),
+                lora_dropout=0.1, bias="none",
+            )
+            model.enable_input_require_grads()
+            model = get_peft_model(model, lora_cfg)
+            cmp_lora_active = True
+
         train_ds = TextClassificationDataset(train_texts, train_labels, tokenizer, 512)
         dev_ds = TextClassificationDataset(dev_texts, dev_labels, tokenizer, 512)
         test_ds = TextClassificationDataset(test_texts, test_labels, tokenizer, 512)
 
         train_result = trainer.train()
 
+        # Merge LoRA weights before prediction
+        if cmp_lora_active and hasattr(trainer.model, 'merge_and_unload'):
+            trainer.model = trainer.model.merge_and_unload()
+
         # Get predictions for ROC curves
         pred_output = trainer.predict(test_ds)
         logits = pred_output.predictions
         "8. Save the model and load it later in the "
         "Classification tab\n\n"
         "**Advanced features:**\n"
+        "- **LoRA / QLoRA** for parameter-efficient training "
+        "(lower VRAM, faster)\n"
+        "- **Active Learning** tab for iterative labeling "
+        "with uncertainty sampling\n"
         "- Early stopping with configurable patience\n"
         "- Learning rate schedulers (linear, cosine, constant)\n"
         "- Mixed precision training (FP16 on CUDA GPUs)\n"
             ["linear", "cosine", "constant", "constant_with_warmup"],
             label="LR Scheduler", value="linear",
         )
+        gr.Markdown("**Parameter-Efficient Fine-Tuning (PEFT)**")
+        with gr.Row():
+            ft_use_lora = gr.Checkbox(
+                label="Use LoRA", value=False,
+            )
+            ft_lora_rank = gr.Number(
+                label="LoRA Rank (r)", value=8,
+                minimum=1, maximum=256, precision=0,
+            )
+            ft_lora_alpha = gr.Number(
+                label="LoRA Alpha", value=16,
+                minimum=1, maximum=512, precision=0,
+            )
+            ft_use_qlora = gr.Checkbox(
+                label="QLoRA (4-bit, CUDA only)", value=False,
+            )
 
         # -- Train --
         ft_train_btn = gr.Button(
             cmp_epochs = gr.Number(label="Epochs", value=3, minimum=1, precision=0)
             cmp_batch = gr.Number(label="Batch Size", value=8, minimum=1, precision=0)
             cmp_lr = gr.Number(label="Learning Rate", value=2e-5, minimum=1e-7)
+            with gr.Row():
+                cmp_use_lora = gr.Checkbox(label="Use LoRA", value=False)
+                cmp_lora_rank = gr.Number(label="LoRA Rank", value=8, minimum=1, maximum=256, precision=0)
+                cmp_lora_alpha = gr.Number(label="LoRA Alpha", value=16, minimum=1, maximum=512, precision=0)
             cmp_btn = gr.Button("Compare Models", variant="primary")
             cmp_log = gr.Textbox(
                 label="Comparison Log", lines=8,
         cmp_plot = gr.Plot(label="Metrics Comparison")
         cmp_roc = gr.Plot(label="ROC Curves")
 
+        # ================================================================
+        # ACTIVE LEARNING TAB
+        # ================================================================
+        with gr.Tab("Active Learning"):
+            gr.Markdown(info_callout(
+                "**Active learning** iteratively selects the most uncertain "
+                "samples from an unlabeled pool for you to label, then retrains. "
+                "This lets you build a strong classifier with far fewer labels."
+            ))
+
+            # -- Data --
+            gr.Markdown("### Data")
+            gr.Markdown(
+                "**Seed file** — small labeled set (TSV, `text[TAB]label`). \n"
+                "**Pool file** — unlabeled texts (one per line, or CSV with `text` column). \n"
+                "**Dev file** *(optional)* — held-out labeled set to track metrics."
+            )
+            al_ex_btn = gr.Button(
+                "Load Example: Binary Active Learning",
+                variant="secondary", size="sm",
+            )
+            with gr.Row():
+                al_seed_file = gr.File(
+                    label="Labeled Seed (TSV)",
+                    file_types=[".tsv", ".csv", ".txt"],
+                )
+                al_pool_file = gr.File(
+                    label="Unlabeled Pool",
+                    file_types=[".tsv", ".csv", ".txt"],
+                )
+                al_dev_file = gr.File(
+                    label="Dev / Validation (optional)",
+                    file_types=[".tsv", ".csv", ".txt"],
+                )
+
+            # -- Configuration --
+            gr.Markdown("### Configuration")
+            with gr.Row():
+                al_task = gr.Radio(
+                    ["Binary", "Multiclass"],
+                    label="Task Type", value="Binary",
+                )
+                al_model_dd = gr.Dropdown(
+                    choices=list(FINETUNE_MODELS.keys()),
+                    label="Base Model",
+                    value=list(FINETUNE_MODELS.keys())[0],
+                )
+            with gr.Row():
+                al_strategy = gr.Dropdown(
+                    ["entropy", "margin", "least_confidence"],
+                    label="Query Strategy", value="entropy",
+                )
+                al_query_size = gr.Number(
+                    label="Samples per Round", value=20,
+                    minimum=1, maximum=500, precision=0,
+                )
+            with gr.Row():
+                al_epochs = gr.Number(
+                    label="Epochs per Round", value=3,
+                    minimum=1, maximum=50, precision=0,
+                )
+                al_batch_size = gr.Number(
+                    label="Batch Size", value=8,
+                    minimum=1, maximum=128, precision=0,
+                )
+                al_lr = gr.Number(
+                    label="Learning Rate", value=2e-5,
+                    minimum=1e-7, maximum=1e-2,
+                )
+            with gr.Accordion("Advanced", open=False):
+                with gr.Row():
+                    al_max_len = gr.Number(
+                        label="Max Sequence Length", value=512,
+                        minimum=32, maximum=8192, precision=0,
+                    )
+                    al_use_lora = gr.Checkbox(label="Use LoRA", value=False)
+                    al_lora_rank = gr.Number(
+                        label="LoRA Rank", value=8,
+                        minimum=1, maximum=256, precision=0,
+                    )
+                    al_lora_alpha = gr.Number(
+                        label="LoRA Alpha", value=16,
+                        minimum=1, maximum=512, precision=0,
+                    )
+
+            al_init_btn = gr.Button(
+                "Initialize Active Learning", variant="primary", size="lg",
+            )
+
+            # -- State --
+            al_state = gr.State({})
+            al_model_state = gr.State(None)
+            al_tokenizer_state = gr.State(None)
+
+            with gr.Accordion("Log", open=False):
+                al_log = gr.Textbox(
+                    lines=12, interactive=False, elem_classes="log-output",
+                    show_label=False,
+                )
+
+            # -- Annotation panel (hidden until init) --
+            with gr.Column(visible=False) as al_annotation_col:
+                gr.Markdown("### Label These Samples")
+                gr.Markdown(
+                    "Fill in the **Label** column with integer class labels "
+                    "(e.g. 0 or 1 for binary). Then click **Submit**."
+                )
+                al_annotation_df = gr.Dataframe(
+                    headers=["Text", "Label"],
+                    interactive=True,
+                    wrap=True,
+                    row_count=(1, "dynamic"),
+                )
+                with gr.Row():
+                    al_submit_btn = gr.Button(
+                        "Submit Labels & Next Round",
+                        variant="primary",
+                    )
+
+            al_chart = gr.Plot(label="Metrics Across Rounds")
+
+            gr.Markdown("### Save Model")
+            with gr.Row():
+                al_save_path = gr.Textbox(
+                    label="Save Directory", value="./al_model",
+                )
+                al_save_btn = gr.Button("Save", variant="secondary")
+                al_save_status = gr.Markdown("")
+
     # ---- FOOTER ----
     gr.Markdown(
         "<div style='text-align: center; padding: 1rem 0; margin-top: 0.5rem; "
             ft_epochs, ft_batch, ft_lr,
             ft_weight_decay, ft_warmup, ft_max_len,
             ft_grad_accum, ft_fp16, ft_patience, ft_scheduler,
+            ft_use_lora, ft_lora_rank, ft_lora_alpha, ft_use_qlora,
         ],
         outputs=[
             ft_log, ft_metrics,
         outputs=[ft_batch_out],
     )
 
+    # Active Learning: example loader
+    al_ex_btn.click(
+        fn=load_example_active_learning,
+        outputs=[al_seed_file, al_pool_file, al_dev_file, al_task],
+    )
+
+    # Active Learning
+    al_init_btn.click(
+        fn=al_initialize,
+        inputs=[
+            al_seed_file, al_pool_file, al_dev_file,
+            al_task, al_model_dd, al_strategy, al_query_size,
+            al_epochs, al_batch_size, al_lr, al_max_len,
+            al_use_lora, al_lora_rank, al_lora_alpha,
+        ],
+        outputs=[
+            al_state, al_model_state, al_tokenizer_state,
+            al_annotation_df, al_log, al_chart,
+            al_annotation_col,
+        ],
+        concurrency_limit=1,
+    )
+
+    al_submit_btn.click(
+        fn=al_submit_and_continue,
+        inputs=[
+            al_annotation_df, al_state, al_model_state, al_tokenizer_state,
+            al_log,
+        ],
+        outputs=[
+            al_state, al_model_state, al_tokenizer_state,
+            al_annotation_df, al_log, al_chart,
+        ],
+        concurrency_limit=1,
+    )
+
+    al_save_btn.click(
+        fn=al_save_model,
+        inputs=[al_save_path, al_model_state, al_tokenizer_state],
+        outputs=[al_save_status],
+    )
+
     # Model comparison
     cmp_btn.click(
         fn=run_comparison,
         inputs=[
             ft_train_file, ft_dev_file, ft_test_file,
             ft_task, cmp_models, cmp_epochs, cmp_batch, cmp_lr,
+            cmp_use_lora, cmp_lora_rank, cmp_lora_alpha,
         ],
         outputs=[cmp_log, cmp_table, cmp_plot, cmp_roc, cmp_results_col],
         concurrency_limit=1,
examples/active_learning/pool.txt ADDED
@@ -0,0 +1,61 @@
+A car bomb exploded near a military checkpoint killing at least twelve soldiers
+The oceanographic institute published research on coral reef restoration
+Annual tourism numbers reached an all-time high at the coastal resorts
+Gunmen opened fire on a convoy of government officials killing two bodyguards
+Cross-border shelling between the two nations continued for the third consecutive day
+Public transit ridership increased following improvements to the subway system
+Armed bandits attacked a refugee camp displacing thousands of people
+The guerrilla fighters ambushed a supply convoy on the main highway
+The bakery chain announced plans to expand into twelve new locations
+The city hosted a successful international food and wine festival
+The university announced a new scholarship program for students in engineering
+A new study found that regular exercise significantly reduces heart disease risk
+The national football team secured a convincing victory in the qualifying match
+The rebel forces captured a strategic town after weeks of intense battles
+The solar energy project is expected to power thousands of homes by year end
+Insurgents attacked a police station in the capital overnight leaving several officers wounded
+Military helicopters were deployed to support ground troops fighting in the eastern region
+The opposition forces breached the defensive perimeter around the government compound
+A mortar attack on the military base resulted in significant casualties
+The technology company unveiled its latest smartphone with improved camera capabilities
+Government aircraft bombed suspected rebel strongholds in the mountainous region
+Security forces conducted raids targeting suspected members of the armed opposition
+A suicide bomber detonated explosives at a crowded marketplace injuring dozens of civilians
+Local farmers reported an excellent harvest this season due to favorable weather
+The agricultural ministry launched a program to support organic farming
+An airstrike destroyed a weapons depot used by the insurgent group
+The orchestra performed a sold-out concert of works by contemporary composers
+The opposing forces exchanged heavy gunfire throughout the night
+The gaming company released a new title that quickly became a bestseller
+The swimming team broke the national record at the regional championships
+The government declared a state of emergency following widespread political violence
+The city council approved plans for a new public park in the downtown area
+Government forces launched an offensive against rebel positions in the northern province early this morning
+Researchers published findings on a promising treatment for a rare disorder
+A popular streaming service announced an original series based on the classic novel
+A drone strike targeted a meeting of senior militant commanders
+The winter ski season opened early due to heavy snowfall in the mountains
+The museum opened a new exhibition showcasing contemporary sculpture and painting
+The separatist movement launched coordinated attacks on government installations
+An improvised explosive device was found near the parliament building
+The film festival announced its lineup featuring works from emerging directors
+Temperatures are expected to reach record highs this weekend according to forecasters
+Scientists discovered a new species of deep-sea fish in the Pacific Ocean
+Two soldiers were killed when their vehicle struck a landmine on a rural road
+The cycling tour attracted international competitors to the coastal route
+The pharmaceutical company received approval for a new vaccine formulation
+A grenade attack on a busy intersection killed four people and wounded many more
+Ethnic tensions erupted into open violence as rival communities clashed in the market
+The hospital inaugurated a state-of-the-art wing dedicated to pediatric care
+The cookbook featuring traditional regional recipes became an unexpected bestseller
+Heavy fighting broke out between rival armed factions in the disputed border region
+The automotive company revealed plans to launch three new electric vehicle models
+Armed men attacked a village killing several residents and burning homes
+The annual science fair showcased innovative projects by high school students
+A roadside bomb targeted a military patrol wounding three soldiers
+An explosion at a government building was attributed to opposition fighters
+Armed opposition forces shelled the outskirts of the capital city
+Fighting between government troops and rebels displaced thousands of families
+Coalition forces conducted a night raid capturing several high-value targets
+The militant group claimed responsibility for the ambush on a military convoy
+The armed group kidnapped aid workers operating in the conflict zone
examples/active_learning/pool_with_labels.tsv ADDED
@@ -0,0 +1,61 @@
+A car bomb exploded near a military checkpoint killing at least twelve soldiers	1
+The oceanographic institute published research on coral reef restoration	0
+Annual tourism numbers reached an all-time high at the coastal resorts	0
+Gunmen opened fire on a convoy of government officials killing two bodyguards	1
+Cross-border shelling between the two nations continued for the third consecutive day	1
+Public transit ridership increased following improvements to the subway system	0
+Armed bandits attacked a refugee camp displacing thousands of people	1
+The guerrilla fighters ambushed a supply convoy on the main highway	1
+The bakery chain announced plans to expand into twelve new locations	0
+The city hosted a successful international food and wine festival	0
+The university announced a new scholarship program for students in engineering	0
+A new study found that regular exercise significantly reduces heart disease risk	0
+The national football team secured a convincing victory in the qualifying match	0
+The rebel forces captured a strategic town after weeks of intense battles	1
+The solar energy project is expected to power thousands of homes by year end	0
+Insurgents attacked a police station in the capital overnight leaving several officers wounded	1
+Military helicopters were deployed to support ground troops fighting in the eastern region	1
+The opposition forces breached the defensive perimeter around the government compound	1
+A mortar attack on the military base resulted in significant casualties	1
+The technology company unveiled its latest smartphone with improved camera capabilities	0
+Government aircraft bombed suspected rebel strongholds in the mountainous region	1
+Security forces conducted raids targeting suspected members of the armed opposition	1
+A suicide bomber detonated explosives at a crowded marketplace injuring dozens of civilians	1
+Local farmers reported an excellent harvest this season due to favorable weather	0
+The agricultural ministry launched a program to support organic farming	0
+An airstrike destroyed a weapons depot used by the insurgent group	1
+The orchestra performed a sold-out concert of works by contemporary composers	0
+The opposing forces exchanged heavy gunfire throughout the night	1
+The gaming company released a new title that quickly became a bestseller	0
+The swimming team broke the national record at the regional championships	0
+The government declared a state of emergency following widespread political violence	1
+The city council approved plans for a new public park in the downtown area	0
+Government forces launched an offensive against rebel positions in the northern province early this morning	1
+Researchers published findings on a promising treatment for a rare disorder	0
+A popular streaming service announced an original series based on the classic novel	0
+A drone strike targeted a meeting of senior militant commanders	1
+The winter ski season opened early due to heavy snowfall in the mountains	0
+The museum opened a new exhibition showcasing contemporary sculpture and painting	0
+The separatist movement launched coordinated attacks on government installations	1
+An improvised explosive device was found near the parliament building	1
+The film festival announced its lineup featuring works from emerging directors	0
+Temperatures are expected to reach record highs this weekend according to forecasters	0
+Scientists discovered a new species of deep-sea fish in the Pacific Ocean	0
+Two soldiers were killed when their vehicle struck a landmine on a rural road	1
+The cycling tour attracted international competitors to the coastal route	0
+The pharmaceutical company received approval for a new vaccine formulation	0
+A grenade attack on a busy intersection killed four people and wounded many more	1
+Ethnic tensions erupted into open violence as rival communities clashed in the market	1
+The hospital inaugurated a state-of-the-art wing dedicated to pediatric care	0
+The cookbook featuring traditional regional recipes became an unexpected bestseller	0
+Heavy fighting broke out between rival armed factions in the disputed border region	1
+The automotive company revealed plans to launch three new electric vehicle models	0
+Armed men attacked a village killing several residents and burning homes	1
+The annual science fair showcased innovative projects by high school students	0
+A roadside bomb targeted a military patrol wounding three soldiers	1
+An explosion at a government building was attributed to opposition fighters	1
+Armed opposition forces shelled the outskirts of the capital city	1
+Fighting between government troops and rebels displaced thousands of families	1
+Coalition forces conducted a night raid capturing several high-value targets	1
+The militant group claimed responsibility for the ambush on a military convoy	1
+The armed group kidnapped aid workers operating in the conflict zone	1
examples/active_learning/seed.tsv ADDED
@@ -0,0 +1,20 @@
+Astronomers observed a rare celestial event visible from the southern hemisphere	0
+Sniper fire killed two civilians in the besieged neighborhood	1
+A major software update was released improving performance and adding new features	0
+The airline announced new direct flights connecting the capital with European cities	0
+Security operations intensified after a series of bombings in the commercial district	1
+Protesters clashed violently with police during demonstrations against the military regime	1
+A popular author released the highly anticipated sequel to her bestselling novel	0
+Artillery shells struck residential areas as the conflict between the two sides intensified	1
+Archaeologists uncovered ancient pottery at a dig site near the monument	0
+The military junta deployed tanks to suppress the growing resistance movement	1
+The electric vehicle charging network expanded to cover all major highways	0
+The marathon attracted over twenty thousand runners from across the country	0
+The ongoing civil war has resulted in thousands of casualties and widespread destruction	1
+The dairy industry adopted new standards for sustainable milk production	0
+Paramilitary groups carried out targeted assassinations of political opponents	1
+A local nonprofit organized a community cleanup event at the riverside park	0
+The construction of the new high-speed rail line is ahead of schedule	0
+Stock markets rallied on news of stronger than expected economic growth	0
+The tech startup raised significant funding in its latest investment round	0
+A militia group took control of a key oil facility in the contested region	1
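The seed and pool files above follow the formats described in the Active Learning tab: a headerless TSV of `text[TAB]label` pairs for the seed, and one unlabeled text per line for the pool. A minimal loader sketch with pandas (illustrative only; the app's own parsing may differ, and both function names are hypothetical):

```python
import io
import pandas as pd

def load_seed_tsv(path_or_buf):
    # Headerless two-column TSV: text <TAB> integer label
    df = pd.read_csv(path_or_buf, sep="\t", header=None, names=["text", "label"])
    return df["text"].tolist(), df["label"].astype(int).tolist()

def load_pool_txt(path_or_buf):
    # One unlabeled text per line; skip blank lines
    data = path_or_buf.read() if hasattr(path_or_buf, "read") else open(path_or_buf).read()
    return [ln.strip() for ln in data.splitlines() if ln.strip()]

seed = io.StringIO(
    "Sniper fire killed two civilians\t1\n"
    "The marathon attracted runners\t0\n"
)
texts, labels = load_seed_tsv(seed)
```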
requirements.txt CHANGED
@@ -10,3 +10,5 @@ accelerate
 scikit-learn
 pandas
 plotly
+peft>=0.6
+bitsandbytes
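The `peft` and `bitsandbytes` entries added above back the LoRA/QLoRA options. The core LoRA idea — training a low-rank update scaled by `alpha / r` on top of a frozen weight, instead of the full weight matrix — can be sketched in NumPy (a toy illustration of the math, not how PEFT is wired into the app):

```python
import numpy as np

d, k, r, alpha = 768, 768, 8, 16   # hidden dims, LoRA rank, LoRA alpha

rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))          # frozen pretrained weight (not trained)
A = rng.normal(size=(r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # initialized to zero so the update starts at 0

def lora_forward(x):
    # Base path plus the scaled low-rank update (alpha / r) * B @ A
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

full_params = W.size            # parameters a full fine-tune would touch
lora_params = A.size + B.size   # parameters LoRA actually trains
```

With `r=8` on a 768x768 matrix, LoRA trains roughly 2% of the parameters a full fine-tune would, which is where the lower-VRAM claim comes from.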