McGill NLP Group

university

https://mcgill-nlp.github.io/

McGill_NLP

McGill-NLP

AI & ML interests

computational linguistics, natural language processing

Recent Activity

xhluca updated a Space 3 days ago

McGill-NLP/retalk-relay

xhluca published a Space 3 days ago

McGill-NLP/retalk-relay

Jinny1208 updated a dataset 3 days ago

McGill-NLP/speech-translation-and-summarization

Papers

Forecasting Downstream Performance of LLMs With Proxy Metrics

Structured Distillation of Web Agent Capabilities Enables Generalization

McGill-NLP's collections 21

Tiered Language Models

McGill-NLP/TLM-180M

0.2B • Updated 8 days ago • 41
McGill-NLP/TLM-650M

0.6B • Updated 8 days ago • 14

A3: Agent-as-Annotators

Models and data from "Structured Distillation of Web Agent Capabilities Enables Generalization" (arXiv:2604.07776)

Structured Distillation of Web Agent Capabilities Enables Generalization

Paper • 2604.07776 • Published Apr 9 • 23
McGill-NLP/A3-Qwen3.5-9B

Image-Text-to-Text • 9B • Updated Apr 16 • 379 • 6
McGill-NLP/A3-Qwen3.5-4B

Image-Text-to-Text • 5B • Updated Apr 16 • 99 • 2
McGill-NLP/A3-Qwen3.5-2B

Image-Text-to-Text • 3B • Updated Apr 16 • 32 • 2

LatentLens Contextual Embeddings

Pre-computed contextual text embeddings for interpreting LLM/VLM hidden states. Use with: pip install latentlens

McGill-NLP/contextual_embeddings-llama3.1-8b

Updated Feb 19
McGill-NLP/contextual_embeddings-gemma2-9b

Updated Feb 19
McGill-NLP/contextual_embeddings-qwen2.5-7b

Updated Feb 19
McGill-NLP/latentlens-qwen2vl-embeddings

Updated Feb 7

The Markovian Thinker

Reformulating the RL of reasoning LLMs through the Markovian Thinking paradigm.

McGill-NLP/delethink-24k-1.5b

2B • Updated Oct 9, 2025 • 40 • 5
McGill-NLP/longcot-24k-1.5b

2B • Updated Oct 9, 2025 • 7 • 2
McGill-NLP/longcot-8k-1.5b

2B • Updated Oct 9, 2025 • 6 • 1
McGill-NLP/delethink-96k-base-1.5b

2B • Updated Oct 3, 2025 • 5 • 1

SSA-COMET

McGill-NLP/ssa-comet-qe

Translation • Updated May 23, 2025 • 1
McGill-NLP/ssa-comet-mtl

Translation • Updated Mar 11 • 3
McGill-NLP/ssa-comet-stl

Translation • Updated May 23, 2025 • 1
McGill-NLP/ssa-comet-qe-final

Translation • Updated Apr 27

AgentRewardBench

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Paper • 2504.08942 • Published Apr 11, 2025 • 29
McGill-NLP/agent-reward-bench

Viewer • Updated Apr 21, 2025 • 1.41k • 6.28k • 4
Agent Reward Bench Demo

Space • Agents • Running • 5
Explore agent trajectories and judgments in web benchmarks
Agent Reward Bench Leaderboard

Space • Agents • Sleeping • 3
Leaderboard for AgentRewardBench

SafeArena

McGill-NLP/safearena

Updated Apr 23, 2025 • 382 • 5
SafeArena: Evaluating the Safety of Autonomous Web Agents

Paper • 2503.04957 • Published Mar 6, 2025 • 21
Safearena Leaderboard

Space • Agents • Running • 3

LLM2Vec

McGill-NLP/LLM2Vec-Meta-Llama-32-3B-Instruct-mntp-supervised

Updated Nov 15, 2025
McGill-NLP/LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-supervised

Sentence Similarity • Updated Oct 8, 2024 • 352 • 5
McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised

Sentence Similarity • Updated Apr 30, 2024 • 118k • 52
McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp-supervised

Sentence Similarity • Updated Apr 11, 2024 • 409 • 13

AURORA

Repository: https://github.com/McGill-NLP/AURORA

McGill-NLP/AURORA

Viewer • Updated Jul 25, 2024 • 169k • 102 • 7
McGill-NLP/AURORA

Image-to-Image • Updated Dec 21, 2024 • 5 • 4
McGill-NLP/aurora-bench

Viewer • Updated Jul 9, 2024 • 400 • 7 • 2
AURORA

Space • Agents • Runtime error • 5

Statcan Dialogue Dataset & Models

mcgill-nlp.github.io/statcan-dialogue-dataset

The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents

Paper • 2304.01412 • Published Apr 3, 2023 • 2
McGill-NLP/statcan-dialogue-dataset

Preview • Updated May 24, 2024 • 4 • 7
McGill-NLP/dpr-statcan-conversation_encoder-title

Feature Extraction • 0.1B • Updated Jul 17, 2023 • 9
McGill-NLP/tapas-statcan-large-conversation_encoder-cell_tokens

Feature Extraction • Updated Apr 5, 2023 • 6

MLQuestions

Back-Training excels Self-Training at Unsupervised Domain Adaptation of Question Generation and Passage Retrieval

Paper • 2104.08801 • Published Apr 18, 2021 • 1
McGill-NLP/mlquestions

Updated Nov 11, 2021 • 193 • 3
McGill-NLP/bart-qg-mlquestions-backtraining

Updated Apr 8, 2022 • 10
McGill-NLP/bart-qg-mlquestions-selftraining

Updated Apr 12, 2022 • 8

AfriqueLLM

Best open African LLM

AfriqueLLM: How Data Mixing and Model Architecture Impact Continued Pre-training for African Languages

Paper • 2601.06395 • Published Jan 10 • 5
McGill-NLP/AfriqueQwen-14B

Text Generation • 15B • Updated 6 days ago • 2.28k • 4
McGill-NLP/AfriqueQwen-8B

Text Generation • 8B • Updated 6 days ago • 1.56k • 2
McGill-NLP/AfriqueQwen3.5-4B-50Langs

Text Generation • 5B • Updated 6 days ago • 206 • 5

LLM2Vec-Gen

Generative Embeddings from Large Language Models

McGill-NLP/llm2vec-gen-tulu

Viewer • Updated Mar 3 • 10.5M • 1.25k • 1
McGill-NLP/llm2vec-gen-tulu-w-hard-negative

Viewer • Updated Mar 2 • 3.22M • 521
McGill-NLP/llm2vec-gen-echo-rewritten-w-hard-negative

Viewer • Updated Mar 8 • 7.17M • 77
McGill-NLP/LLM2Vec-Gen-Llama32-1B

Sentence Similarity • Updated Apr 4 • 23 • 2

CRAG-MM-Diagnostics

McGill-NLP/crag-mm-diagnostics

Viewer • Updated Feb 5 • 1.15k • 88
McGill-NLP/crag-mm-qwen3_vl_embedding_2b_image-index

Updated Feb 16 • 7
McGill-NLP/crag-mm-vlm2vecv2.0_image-index

Updated Feb 16 • 7

INJONGO

INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages

McGill-NLP/AfroXLMR-large-76L-Injongo-intent

Text Classification • 0.6B • Updated May 25, 2025 • 5
McGill-NLP/AfroXLMR-large-76L-Injongo-slot

Token Classification • 0.6B • Updated May 25, 2025 • 26
McGill-NLP/gemma-2-9b-it-Injongo-intent

Text Generation • 9B • Updated May 26, 2025 • 5
McGill-NLP/gemma-2-9b-it-Injongo-slot

Text Generation • 9B • Updated May 26, 2025 • 5

Unequal unlearning

Datasets used for the OLMo experiments in the "Not All Data are Unlearned Equally" paper https://arxiv.org/abs/2504.05058

McGill-NLP/country_capital_qa

Viewer • Updated Apr 16, 2025 • 1.39k • 14
McGill-NLP/book_author_qa

Viewer • Updated Apr 16, 2025 • 1.48k • 6
McGill-NLP/zsre_qa

Viewer • Updated Apr 16, 2025 • 1.52k • 28

Malicious-IR

McGill-NLP/AdvBench-IR

Viewer • Updated Mar 12, 2025 • 520 • 17 • 4
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval

Paper • 2503.08644 • Published Mar 11, 2025 • 16
McGill-NLP/AdvBench-IR-Small-Wiki-100

Viewer • Updated May 29, 2025 • 50.9k • 9

CHASE

Generate challenging synthetic data to evaluate LLMs

McGill-NLP/CHASE-QA

Viewer • Updated Feb 21, 2025 • 671 • 40
McGill-NLP/CHASE-Code

Viewer • Updated Feb 21, 2025 • 500 • 30
McGill-NLP/CHASE-Math

Viewer • Updated Feb 21, 2025 • 500 • 34
How to Get Your LLM to Generate Challenging Problems for Evaluation

Paper • 2502.14678 • Published Feb 20, 2025 • 18

WebLINX

https://mcgill-nlp.github.io/weblinx

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Paper • 2402.05930 • Published Feb 8, 2024 • 39
McGill-NLP/WebLINX

Viewer • Updated Dec 7, 2024 • 79.8k • 2.59k • 65
McGill-NLP/WebLINX-full

Updated Sep 21, 2025 • 16.9k • 7
McGill-NLP/weblinx-browsergym

Updated Dec 7, 2024 • 25.6k • 4

WebLINX Models

https://mcgill-nlp.github.io/weblinx

McGill-NLP/Llama-3-8B-Web

Text Generation • 8B • Updated Apr 26, 2024 • 317 • 215
McGill-NLP/MiniLM-L6-dmr

Sentence Similarity • Updated Feb 9, 2024 • 10 • 5
McGill-NLP/bge-small-dmr

Sentence Similarity • Updated Feb 9, 2024 • 4 • 1
McGill-NLP/gte-base-dmr

Sentence Similarity • Updated Feb 9, 2024 • 15 • 2

FaithDial

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

Paper • 2204.10757 • Published Apr 22, 2022 • 2
McGill-NLP/FaithDial

Viewer • Updated Feb 5, 2023 • 32.3k • 205 • 18
McGill-NLP/roberta-large-faithcritic

Text Classification • Updated Jul 31, 2022 • 86 • 1

Team members 51
