Tiep Claude Opus 4.6 committed on
Commit ef06968 · 1 Parent(s): 1e215f7

Add references folder and research skills


- Add 12 research papers (PhoBERT, ViSoBERT, vELECTRA, SMTCE, RoBERTa, XLM-R, UIT-VSMEC, UIT-VSFC, etc.) with PDF, LaTeX source, and Markdown
- Add research synthesis: literature review, benchmark comparison, SOTA summary, bibliography
- Add references website (index.html) with Vellum-style leaderboard, paper database, benchmarks, citation network, and research methodology guide
- Add paper management tools (paper_db.py, fetch_papers.py, paper_db.json)
- Add Claude Code skills for paper-fetch, paper-research, paper-review, paper-write
- Configure git-lfs for binary files (PDF, PNG, etc.)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. .claude/skills/paper-fetch/SKILL.md +1010 -0
  2. .claude/skills/paper-research/SKILL.md +502 -0
  3. .claude/skills/paper-review/SKILL.md +275 -0
  4. .claude/skills/paper-write/SKILL.md +402 -0
  5. .gitattributes +6 -0
  6. references/2007.rivf.hoang/paper.md +39 -0
  7. references/2018.kse.nguyen/paper.md +32 -0
  8. references/2019.arxiv.conneau/paper.md +35 -0
  9. references/2019.arxiv.conneau/paper.pdf +3 -0
  10. references/2019.arxiv.conneau/paper.tex +45 -0
  11. references/2019.arxiv.conneau/source/XLMR Paper/acl2020.bib +739 -0
  12. references/2019.arxiv.conneau/source/XLMR Paper/acl2020.sty +560 -0
  13. references/2019.arxiv.conneau/source/XLMR Paper/acl_natbib.bst +1975 -0
  14. references/2019.arxiv.conneau/source/XLMR Paper/appendix.tex +45 -0
  15. references/2019.arxiv.conneau/source/XLMR Paper/content/batchsize.pdf +3 -0
  16. references/2019.arxiv.conneau/source/XLMR Paper/content/capacity.pdf +3 -0
  17. references/2019.arxiv.conneau/source/XLMR Paper/content/datasize.pdf +3 -0
  18. references/2019.arxiv.conneau/source/XLMR Paper/content/dilution.pdf +3 -0
  19. references/2019.arxiv.conneau/source/XLMR Paper/content/langsampling.pdf +3 -0
  20. references/2019.arxiv.conneau/source/XLMR Paper/content/tables.tex +398 -0
  21. references/2019.arxiv.conneau/source/XLMR Paper/content/vocabsize.pdf +3 -0
  22. references/2019.arxiv.conneau/source/XLMR Paper/content/wikicc.pdf +3 -0
  23. references/2019.arxiv.conneau/source/XLMR Paper/texput.log +21 -0
  24. references/2019.arxiv.conneau/source/XLMR Paper/xlmr.bbl +285 -0
  25. references/2019.arxiv.conneau/source/XLMR Paper/xlmr.synctex +3 -0
  26. references/2019.arxiv.conneau/source/XLMR Paper/xlmr.tex +307 -0
  27. references/2019.arxiv.conneau/source/acl2020.bib +739 -0
  28. references/2019.arxiv.conneau/source/acl2020.sty +560 -0
  29. references/2019.arxiv.conneau/source/acl_natbib.bst +1975 -0
  30. references/2019.arxiv.conneau/source/appendix.tex +45 -0
  31. references/2019.arxiv.conneau/source/content/batchsize.pdf +3 -0
  32. references/2019.arxiv.conneau/source/content/capacity.pdf +3 -0
  33. references/2019.arxiv.conneau/source/content/datasize.pdf +3 -0
  34. references/2019.arxiv.conneau/source/content/dilution.pdf +3 -0
  35. references/2019.arxiv.conneau/source/content/langsampling.pdf +3 -0
  36. references/2019.arxiv.conneau/source/content/tables.tex +398 -0
  37. references/2019.arxiv.conneau/source/content/vocabsize.pdf +3 -0
  38. references/2019.arxiv.conneau/source/content/wikicc.pdf +3 -0
  39. references/2019.arxiv.conneau/source/texput.log +21 -0
  40. references/2019.arxiv.conneau/source/xlmr.bbl +285 -0
  41. references/2019.arxiv.conneau/source/xlmr.synctex +3 -0
  42. references/2019.arxiv.conneau/source/xlmr.tex +307 -0
  43. references/2019.arxiv.ho/paper.md +220 -0
  44. references/2019.arxiv.ho/paper.pdf +3 -0
  45. references/2019.arxiv.ho/paper.tex +239 -0
  46. references/2019.arxiv.ho/source/bibliography.bib +289 -0
  47. references/2019.arxiv.ho/source/images/DataProcessing.pdf +3 -0
  48. references/2019.arxiv.ho/source/images/cnnmodel.pdf +3 -0
  49. references/2019.arxiv.ho/source/images/con_matrix.png +3 -0
  50. references/2019.arxiv.ho/source/images/confusion_matrix.pdf +3 -0
.claude/skills/paper-fetch/SKILL.md ADDED
@@ -0,0 +1,1010 @@
---
name: paper-fetch
description: Fetch research papers from arXiv, ACL Anthology, or Semantic Scholar. Extracts to both Markdown (for LLM/RAG) and LaTeX (for compilation) formats.
argument-hint: "<paper-id-or-url>"
---

# Paper Fetcher

Fetch research papers and store them in the `references/` folder with extracted text content.

## Target

**Arguments**: $ARGUMENTS

The argument can be:
- **arXiv ID**: `2301.10140`, `arxiv:2301.10140`, or full URL
- **ACL Anthology ID**: `P19-1017`, `2023.acl-long.1`, or full URL
- **Semantic Scholar ID**: `s2:649def34...` or search query
- **DOI**: `10.18653/v1/P19-1017`
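A small dispatcher that routes the argument to the right fetcher can be sketched from the ID patterns above (the regexes here are illustrative assumptions, not a spec):

```python
import re

def classify_ref(arg: str) -> str:
    """Guess which source a paper reference belongs to.

    Pattern-matching sketch based on the ID formats listed above.
    """
    arg = arg.strip()
    if re.match(r"^10\.\d{4,9}/", arg):
        return "doi"
    if "arxiv.org" in arg or re.match(r"^(arxiv:)?\d{4}\.\d{4,5}(v\d+)?$", arg, re.I):
        return "arxiv"
    if ("aclanthology.org" in arg
            or re.match(r"^[A-Z]\d{2}-\d+$", arg)        # old-style: P19-1017
            or re.match(r"^\d{4}\.[a-z-]+\.\d+$", arg)):  # new-style: 2023.acl-long.1
        return "acl"
    if arg.startswith("s2:"):
        return "s2"
    return "search-query"

print(classify_ref("2301.10140"))            # arxiv
print(classify_ref("P19-1017"))              # acl
print(classify_ref("10.18653/v1/P19-1017"))  # doi
```

Order matters: the DOI check runs first because a DOI's numeric prefix could otherwise be confused with an arXiv ID.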

## Output Structure

Each paper gets its own folder named `year.<conference>.main-author`:

```
references/
  2020.emnlp.nguyen/    # PhoBERT paper (has arXiv source)
    paper.tex           # Original LaTeX source from arXiv
    paper.md            # Generated from LaTeX (with YAML front matter)
    paper.pdf           # PDF for reference
    source/             # Full arXiv source files
  2014.eacl.nguyen/     # RDRPOSTagger (no arXiv)
    paper.tex           # Generated from PDF
    paper.md            # Extracted from PDF (with YAML front matter)
    paper.pdf
```

### paper.md Format

Metadata stored in YAML front matter:

```markdown
---
title: "PhoBERT: Pre-trained language models for Vietnamese"
authors:
  - "Dat Quoc Nguyen"
  - "Anh Tuan Nguyen"
year: 2020
venue: "EMNLP Findings 2020"
url: "https://aclanthology.org/2020.findings-emnlp.92/"
arxiv: "2003.00744"
---

# Introduction
...
```
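A minimal way to read this front matter back, using only the standard library (a sketch that handles just the flat keys and simple lists used here; a YAML parser such as `pyyaml` is the robust choice):

```python
import re

def read_front_matter(md_text: str) -> dict:
    """Parse the YAML front matter block at the top of a paper.md file.

    Handles only the flat key/value and simple list fields shown above;
    use a real YAML parser for anything more complex.
    """
    match = re.match(r"^---\n(.*?)\n---\n", md_text, re.DOTALL)
    if not match:
        return {}
    meta, key = {}, None
    for line in match.group(1).splitlines():
        if line.lstrip().startswith("- ") and key:
            # List item belonging to the most recent bare key (e.g. authors:)
            meta.setdefault(key, []).append(line.lstrip()[2:].strip('"'))
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip().strip('"')
            if value:
                meta[key] = value
    return meta

sample = '---\ntitle: "PhoBERT"\nauthors:\n  - "Dat Quoc Nguyen"\nyear: 2020\n---\n\n# Introduction\n'
print(read_front_matter(sample)["title"])  # PhoBERT
```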

### Priority Order (arXiv papers)

1. **Download LaTeX source** from `arxiv.org/e-print/{id}` (tar.gz)
2. **Generate paper.md** from LaTeX (higher quality than PDF extraction)
3. **Download PDF** for reference

### Fallback (non-arXiv papers)

1. Download PDF
2. Extract paper.md from PDF (pymupdf4llm)
3. Generate paper.tex from Markdown

### Folder Naming Convention

Format: `{year}.{venue}.{first_author_lastname}`

| Paper ID | Folder Name |
|----------|-------------|
| `2020.findings-emnlp.92` | `2020.emnlp.nguyen` |
| `N18-5012` | `2018.naacl.vu` |
| `E14-2005` | `2014.eacl.nguyen` |
| `2301.10140` | `2023.arxiv.smith` |
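The last-name component strips Vietnamese diacritics down to ASCII, mirroring the `normalize_name` helper in the fetch script below:

```python
import unicodedata

def lastname_slug(full_name: str) -> str:
    """Lowercase ASCII slug of the last word of an author name.

    NFD-decompose, then drop combining marks (category Mn), so
    accented letters like 'ư' or 'ễ' reduce to their base letters.
    """
    parts = full_name.strip().split()
    lastname = parts[-1] if parts else full_name
    decomposed = unicodedata.normalize("NFD", lastname)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn").lower()

print(lastname_slug("Dat Quoc Nguyen"))  # nguyen
print(lastname_slug("Lê Hồng Phương"))   # phuong
```

Note that letters that are distinct code points rather than base-plus-mark (e.g. 'đ'/'Đ') survive NFD and would need an explicit mapping.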

### Format Comparison

| File | Source | Best For |
|------|--------|----------|
| `paper.tex` | arXiv e-print (original) or generated | Recompilation, precise formulas |
| `paper.md` | Converted from LaTeX or PDF | LLM/RAG, quick reading, GitHub |
| `source/` | Full arXiv source (if available) | Build, figures, bibliography |

---

## PDF to Markdown Extraction Methods

### Comparison of Methods

| Method | Table Quality | Speed | Dependencies | Best For |
|--------|---------------|-------|--------------|----------|
| **pymupdf4llm** | ★★★★☆ | Fast | pymupdf4llm | General papers, good tables |
| **pdfplumber** | ★★★★★ | Medium | pdfplumber | Complex tables, accuracy |
| **Marker** | ★★★★★ | Slow | marker-pdf, torch | Best quality, LaTeX formulas |
| **MinerU** | ★★★★★ | Slow | magic-pdf, torch | Academic papers, LaTeX output |
| **Nougat** | ★★★★★ | Slow | nougat-ocr, torch | arXiv papers, full LaTeX |
| PyMuPDF (basic) | ★★☆☆☆ | Fast | pymupdf | Simple text only |
| pdftotext | ★★☆☆☆ | Fast | poppler-utils | Basic extraction |

---
### Method 1: pymupdf4llm (Recommended)

Best balance of quality and speed. Produces GitHub-compatible markdown with proper table formatting.

```bash
uv run --with pymupdf4llm python -c "
import pymupdf4llm
import pathlib

pdf_path = 'references/{paper_id}/paper.pdf'
md_path = 'references/{paper_id}/paper.md'

# Extract with table support
md_text = pymupdf4llm.to_markdown(pdf_path)

pathlib.Path(md_path).write_text(md_text, encoding='utf-8')
print(f'Extracted to: {md_path}')
"
```

**Features:**
- Automatic table detection and markdown formatting
- Preserves document structure (headers, lists)
- Fast processing
- GitHub-compatible markdown output

**Advanced options:**
```python
import pymupdf4llm

# With page chunks and table info
md_text = pymupdf4llm.to_markdown(
    "paper.pdf",
    page_chunks=True,      # Get per-page chunks
    write_images=True,     # Extract images
    image_path="images/",  # Image output folder
)
```

---

### Method 2: pdfplumber (Best for Complex Tables)

Most accurate for papers with complex table structures.

```bash
uv run --with pdfplumber --with pandas python -c "
import pdfplumber
import pandas as pd

pdf_path = 'references/{paper_id}/paper.pdf'
md_path = 'references/{paper_id}/paper.md'

output = []
with pdfplumber.open(pdf_path) as pdf:
    for i, page in enumerate(pdf.pages):
        output.append(f'<!-- Page {i+1} -->\n')

        # Extract text
        text = page.extract_text() or ''

        # Extract tables
        tables = page.extract_tables()

        if tables:
            for j, table in enumerate(tables):
                # Convert to markdown table
                if table and len(table) > 0:
                    df = pd.DataFrame(table[1:], columns=table[0])
                    md_table = df.to_markdown(index=False)
                    text += f'\n\n**Table {j+1}:**\n{md_table}\n'

        output.append(text)
        output.append('\n\n---\n\n')

with open(md_path, 'w', encoding='utf-8') as f:
    f.write(''.join(output))

print(f'Extracted with tables to: {md_path}')
"
```

**Features:**
- Excellent table detection
- Handles complex multi-row/column tables
- Detailed control over extraction
- Can extract individual table cells

---

### Method 3: Marker (Best Quality - Deep Learning)

Uses deep learning for highest quality extraction. Best for papers with LaTeX, complex layouts.

```bash
# Install marker
uv pip install marker-pdf

# Convert PDF to markdown
marker_single "references/{paper_id}/paper.pdf" "references/{paper_id}/" --output_format markdown
```

Or via Python:

```bash
uv run --with marker-pdf python -c "
from marker.converters.pdf import PdfConverter
from marker.models import create_model_dict
from marker.output import text_from_rendered

# Load models (first run downloads ~2GB)
models = create_model_dict()
converter = PdfConverter(artifact_dict=models)

# Convert
rendered = converter('references/{paper_id}/paper.pdf')
text, _, images = text_from_rendered(rendered)

with open('references/{paper_id}/paper.md', 'w') as f:
    f.write(text)
"
```

**Features:**
- AI-based layout detection
- Excellent table recognition
- LaTeX formula extraction
- Best for academic papers
- Handles multi-column layouts

**Note:** Requires GPU for best performance, downloads ~2GB models on first run.

---

### Method 4: Nougat (PDF to LaTeX - for arXiv papers)

Best for converting academic papers to full LaTeX source.

```bash
# Install nougat
uv pip install nougat-ocr

# Convert PDF to LaTeX/Markdown with math
nougat "references/{paper_id}/paper.pdf" -o "references/{paper_id}/" -m 0.1.0-base
```

Or via Python (a sketch; the CLI above is the supported interface, and the Python API may differ between nougat-ocr versions):

```bash
uv run --with nougat-ocr python -c "
from nougat import NougatModel

model = NougatModel.from_pretrained('facebook/nougat-base')
model.eval()

# Process PDF (check the nougat source for the exact inference helpers)
latex_output = model.predict('references/{paper_id}/paper.pdf')

with open('references/{paper_id}/paper.tex', 'w') as f:
    f.write(latex_output)
"
```

**Features:**
- Full LaTeX output with equations
- Trained on arXiv papers
- Best for math-heavy documents
- Outputs compilable LaTeX

---

### Method 5: MinerU (Best for Academic Papers)

Comprehensive tool for high-quality extraction with LaTeX formula support.

```bash
# Install MinerU (quote the extra so the shell doesn't expand the brackets)
pip install "magic-pdf[full]"

# Convert PDF
magic-pdf -p "references/{paper_id}/paper.pdf" -o "references/{paper_id}/"
```

**Features:**
- High formula recognition rate
- LaTeX-friendly output
- Table structure preservation
- Multi-format output (MD, JSON, LaTeX)

---

### Method 6: Hybrid Approach (Recommended for Academic Papers)

Combine methods for best results:

```python
# /// script
# requires-python = ">=3.9"
# dependencies = ["pymupdf4llm>=0.0.10", "pdfplumber>=0.10.0", "pandas>=2.0.0"]
# ///
"""
Hybrid PDF extraction: pymupdf4llm for text, pdfplumber for tables.
"""
import pymupdf4llm
import pdfplumber
import pandas as pd
import sys

def extract_tables_pdfplumber(pdf_path: str) -> dict:
    """Extract tables using pdfplumber (more accurate)."""
    tables_by_page = {}
    with pdfplumber.open(pdf_path) as pdf:
        for i, page in enumerate(pdf.pages):
            tables = page.extract_tables()
            if tables:
                page_tables = []
                for table in tables:
                    if table and len(table) > 1:
                        try:
                            # Clean table data
                            cleaned = [[str(cell).strip() if cell else '' for cell in row] for row in table]
                            df = pd.DataFrame(cleaned[1:], columns=cleaned[0])
                            md_table = df.to_markdown(index=False)
                            page_tables.append(md_table)
                        except Exception as e:
                            print(f"Table error on page {i+1}: {e}")
                if page_tables:
                    tables_by_page[i] = page_tables
    return tables_by_page

def extract_hybrid(pdf_path: str, output_path: str):
    """Hybrid extraction: pymupdf4llm + pdfplumber tables."""

    # Get base markdown from pymupdf4llm
    md_text = pymupdf4llm.to_markdown(pdf_path)

    # Get accurate tables from pdfplumber
    tables = extract_tables_pdfplumber(pdf_path)

    # If pdfplumber found tables, we can append or replace
    if tables:
        md_text += "\n\n---\n\n## Extracted Tables (pdfplumber)\n\n"
        for page_num, page_tables in sorted(tables.items()):
            md_text += f"### Page {page_num + 1}\n\n"
            for i, table in enumerate(page_tables):
                md_text += f"**Table {i+1}:**\n\n{table}\n\n"

    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(md_text)

    print(f"Extracted to: {output_path}")
    print(f"Found {sum(len(t) for t in tables.values())} tables")

if __name__ == "__main__":
    pdf_path = sys.argv[1]
    output_path = sys.argv[2] if len(sys.argv) > 2 else pdf_path.replace('.pdf', '.md')
    extract_hybrid(pdf_path, output_path)
```

---

## Complete Fetch Script (Both Formats)

```python
# /// script
# requires-python = ">=3.9"
# dependencies = ["arxiv>=2.0.0", "requests>=2.28.0", "pymupdf4llm>=0.0.10"]
# ///
"""
Fetch paper and extract to both Markdown and LaTeX formats.
Folder naming: year.<venue>.main-author
"""
import arxiv
import pymupdf4llm
import requests
import json
import sys
import os
import re
import unicodedata

def normalize_name(name: str) -> str:
    """Normalize author name to lowercase ASCII."""
    # Get last name (last word)
    parts = name.strip().split()
    lastname = parts[-1] if parts else name

    # Remove accents and convert to lowercase
    normalized = unicodedata.normalize('NFD', lastname)
    ascii_name = ''.join(c for c in normalized if unicodedata.category(c) != 'Mn')
    return ascii_name.lower()

def get_folder_name(year: int, venue: str, first_author: str) -> str:
    """Generate folder name: year.venue.author"""
    # Normalize venue name
    venue = venue.lower()
    venue = re.sub(r'findings-', '', venue)  # findings-emnlp -> emnlp
    venue = re.sub(r'-demos?', '', venue)    # naacl-demos -> naacl
    venue = re.sub(r'-main', '', venue)      # acl-main -> acl
    venue = re.sub(r'-long', '', venue)      # acl-long -> acl
    venue = re.sub(r'-short', '', venue)
    venue = re.sub(r'[^a-z0-9]', '', venue)  # Remove special chars

    # Normalize author
    author = normalize_name(first_author)

    return f"{year}.{venue}.{author}"

def convert_md_to_latex(md_text: str) -> str:
    """Convert Markdown to basic LaTeX document."""
    # LaTeX document header
    latex = r"""\documentclass[11pt]{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath,amssymb}
\usepackage{booktabs}
\usepackage{hyperref}
\usepackage{graphicx}

\begin{document}

"""
    # Convert Markdown to LaTeX
    content = md_text

    # Headers
    content = re.sub(r'^# (.+)$', r'\\section*{\1}', content, flags=re.MULTILINE)
    content = re.sub(r'^## (.+)$', r'\\subsection*{\1}', content, flags=re.MULTILINE)
    content = re.sub(r'^### (.+)$', r'\\subsubsection*{\1}', content, flags=re.MULTILINE)

    # Bold and italic
    content = re.sub(r'\*\*(.+?)\*\*', r'\\textbf{\1}', content)
    content = re.sub(r'\*(.+?)\*', r'\\textit{\1}', content)
    content = re.sub(r'_(.+?)_', r'\\textit{\1}', content)

    # Bullet points
    content = re.sub(r'^- (.+)$', r'\\item \1', content, flags=re.MULTILINE)

    # Code blocks
    content = re.sub(r'```\w*\n(.*?)\n```', r'\\begin{verbatim}\n\1\n\\end{verbatim}', content, flags=re.DOTALL)

    # Inline code
    content = re.sub(r'`([^`]+)`', r'\\texttt{\1}', content)

    # Convert markdown tables to LaTeX (basic)
    def convert_table(match):
        lines = match.group(0).strip().split('\n')
        if len(lines) < 2:
            return match.group(0)

        # Get header and determine columns
        header = lines[0]
        cols = header.count('|') - 1
        if cols <= 0:
            return match.group(0)

        latex_table = "\\begin{table}[h]\n\\centering\n"
        latex_table += "\\begin{tabular}{" + "l" * cols + "}\n\\toprule\n"

        for i, line in enumerate(lines):
            if '---' in line:
                continue
            cells = [c.strip() for c in line.split('|')[1:-1]]
            latex_table += " & ".join(cells) + " \\\\\n"
            if i == 0:
                latex_table += "\\midrule\n"

        latex_table += "\\bottomrule\n\\end{tabular}\n\\end{table}\n"
        return latex_table

    content = re.sub(r'\|.+\|[\s\S]*?\|.+\|', convert_table, content)

    latex += content
    latex += "\n\\end{document}\n"

    return latex

def convert_latex_to_md(tex_content: str) -> str:
    """Convert LaTeX to Markdown for LLM/RAG use."""
    md = tex_content

    # Remove LaTeX preamble (everything before \begin{document})
    doc_match = re.search(r'\\begin\{document\}', md)
    if doc_match:
        md = md[doc_match.end():]
    md = re.sub(r'\\end\{document\}.*', '', md, flags=re.DOTALL)

    # Remove comments
    md = re.sub(r'%.*$', '', md, flags=re.MULTILINE)

    # Convert sections
    md = re.sub(r'\\section\*?\{([^}]+)\}', r'# \1', md)
    md = re.sub(r'\\subsection\*?\{([^}]+)\}', r'## \1', md)
    md = re.sub(r'\\subsubsection\*?\{([^}]+)\}', r'### \1', md)
    md = re.sub(r'\\paragraph\*?\{([^}]+)\}', r'#### \1', md)

    # Convert formatting
    md = re.sub(r'\\textbf\{([^}]+)\}', r'**\1**', md)
    md = re.sub(r'\\textit\{([^}]+)\}', r'*\1*', md)
    md = re.sub(r'\\emph\{([^}]+)\}', r'*\1*', md)
    md = re.sub(r'\\texttt\{([^}]+)\}', r'`\1`', md)
    md = re.sub(r'\\underline\{([^}]+)\}', r'\1', md)

    # Convert citations and references (\cite, \citep, \citet)
    md = re.sub(r'\\cite[tp]?\{([^}]+)\}', r'[\1]', md)
    md = re.sub(r'\\ref\{([^}]+)\}', r'[\1]', md)
    md = re.sub(r'\\label\{[^}]+\}', '', md)

    # Convert URLs
    md = re.sub(r'\\url\{([^}]+)\}', r'\1', md)
    md = re.sub(r'\\href\{([^}]+)\}\{([^}]+)\}', r'[\2](\1)', md)

    # Convert lists
    md = re.sub(r'\\begin\{itemize\}', '', md)
    md = re.sub(r'\\end\{itemize\}', '', md)
    md = re.sub(r'\\begin\{enumerate\}', '', md)
    md = re.sub(r'\\end\{enumerate\}', '', md)
    md = re.sub(r'\\item\s*', '- ', md)

    # Convert math (keep as-is for LaTeX rendering)
    md = re.sub(r'\$\$([^$]+)\$\$', r'\n$$\1$$\n', md)
    md = re.sub(r'\\begin\{equation\*?\}(.*?)\\end\{equation\*?\}', r'\n$$\1$$\n', md, flags=re.DOTALL)
    md = re.sub(r'\\begin\{align\*?\}(.*?)\\end\{align\*?\}', r'\n$$\1$$\n', md, flags=re.DOTALL)

    # Convert tables (basic - keep structure)
    def convert_table(match):
        table_content = match.group(1)
        rows = re.split(r'\\\\', table_content)
        md_rows = []
        for i, row in enumerate(rows):
            cells = [c.strip() for c in re.split(r'&', row) if c.strip()]
            if cells:
                md_rows.append('| ' + ' | '.join(cells) + ' |')
                if i == 0:
                    md_rows.append('|' + '---|' * len(cells))
        return '\n'.join(md_rows)

    md = re.sub(r'\\begin\{tabular\}\{[^}]*\}(.*?)\\end\{tabular\}', convert_table, md, flags=re.DOTALL)

    # Remove common LaTeX commands
    md = re.sub(r'\\(small|large|Large|footnotesize|normalsize|tiny|huge)\b', '', md)
    md = re.sub(r'\\(hline|toprule|midrule|bottomrule|cline\{[^}]*\})', '', md)
    md = re.sub(r'\\(vspace|hspace|vskip|hskip)\{[^}]*\}', '', md)
    md = re.sub(r'\\(centering|raggedright|raggedleft)\b', '', md)
    md = re.sub(r'\\(newline|linebreak|pagebreak|newpage)\b', '\n', md)
    md = re.sub(r'\\\\', '\n', md)
    md = re.sub(r'\\[a-zA-Z]+\{[^}]*\}', '', md)  # Remove remaining commands with args
    md = re.sub(r'\\[a-zA-Z]+\b', '', md)  # Remove remaining commands without args

    # Clean up
    md = re.sub(r'\n{3,}', '\n\n', md)
    md = re.sub(r'^\s+', '', md, flags=re.MULTILINE)

    return md.strip()

def download_arxiv_source(arxiv_id: str, folder: str) -> str:
    """Download LaTeX source from arXiv e-print. Returns tex content or None."""
    import tarfile
    import gzip
    from io import BytesIO

    source_url = f"https://arxiv.org/e-print/{arxiv_id}"
    try:
        response = requests.get(source_url, allow_redirects=True)
        response.raise_for_status()

        content = response.content
        tex_content = None

        # Try to extract as tar.gz
        try:
            with tarfile.open(fileobj=BytesIO(content), mode='r:gz') as tar:
                tex_files = [m.name for m in tar.getmembers() if m.name.endswith('.tex')]

                # Extract all files for reference
                os.makedirs(f"{folder}/source", exist_ok=True)
                tar.extractall(path=f"{folder}/source")
                print(f"Extracted {len(tar.getmembers())} source files to {folder}/source/")

                # Find main tex file
                main_tex = None
                for name in tex_files:
                    if 'main' in name.lower():
                        main_tex = name
                        break
                if not main_tex and tex_files:
                    main_tex = tex_files[0]

                if main_tex:
                    with open(f"{folder}/source/{main_tex}", 'r', encoding='utf-8', errors='ignore') as f:
                        tex_content = f.read()
                    print(f"Main LaTeX: {main_tex}")

            if tex_content:
                return tex_content
        except tarfile.TarError:
            pass

        # Try as plain gzipped tex file
        try:
            tex_content = gzip.decompress(content).decode('utf-8', errors='ignore')
            if '\\documentclass' in tex_content or '\\begin{document}' in tex_content:
                print("Extracted LaTeX source (gzip)")
                return tex_content
        except Exception:
            pass

        # Try as plain tex file
        try:
            tex_content = content.decode('utf-8', errors='ignore')
            if '\\documentclass' in tex_content or '\\begin{document}' in tex_content:
                print("Extracted LaTeX source (plain)")
                return tex_content
        except Exception:
            pass

        print("Could not extract LaTeX source from arXiv")
        return None

    except Exception as e:
        print(f"Failed to download arXiv source: {e}")
        return None

def build_front_matter(title: str, authors: list, year: int, venue: str, url: str, arxiv_id: str = None) -> str:
    """Build YAML front matter for paper.md"""
    authors_yaml = '\n'.join(f'  - "{a}"' for a in authors)
    fm = f'''---
title: "{title}"
authors:
{authors_yaml}
year: {year}
venue: "{venue}"
url: "{url}"'''
    if arxiv_id:
        fm += f'\narxiv: "{arxiv_id}"'
    fm += '\n---\n\n'
    return fm

def fetch_arxiv(arxiv_id: str):
    """Fetch paper from arXiv. Priority: LaTeX source -> generate MD from it."""
    arxiv_id = re.sub(r'^(arxiv:|https?://arxiv\.org/(abs|pdf)/)', '', arxiv_id)
    # str.rstrip strips a character set, not a suffix -- remove '.pdf' explicitly
    arxiv_id = re.sub(r'\.pdf$', '', arxiv_id.rstrip('/'))

    # Get paper metadata first
    client = arxiv.Client()
    paper = next(client.results(arxiv.Search(id_list=[arxiv_id])))

    # Generate folder name: year.arxiv.author
    year = paper.published.year
    first_author = paper.authors[0].name if paper.authors else "unknown"
    folder_name = get_folder_name(year, "arxiv", first_author)
    folder = f"references/{folder_name}"
    os.makedirs(folder, exist_ok=True)
    print(f"Folder: {folder}")

    # Build front matter
    front_matter = build_front_matter(
        title=paper.title,
        authors=[a.name for a in paper.authors],
        year=year,
        venue="arXiv",
        url=paper.entry_id,
        arxiv_id=arxiv_id
    )

    # 1. Download LaTeX source first (priority)
    tex_content = download_arxiv_source(arxiv_id, folder)

    if tex_content:
        # Save paper.tex
        with open(f"{folder}/paper.tex", 'w', encoding='utf-8') as f:
            f.write(tex_content)
        print("Saved: paper.tex (original arXiv source)")

        # Generate paper.md from LaTeX with front matter
        md_text = convert_latex_to_md(tex_content)
        with open(f"{folder}/paper.md", 'w', encoding='utf-8') as f:
            f.write(front_matter + md_text)
        print("Generated: paper.md (from LaTeX)")
        has_source = True
    else:
        has_source = False

    # 2. Download PDF (always, for reference)
    pdf_path = f"{folder}/paper.pdf"
    paper.download_pdf(filename=pdf_path)
    print("Downloaded: paper.pdf")

    # 3. If no LaTeX source, extract from PDF
    if not has_source:
        md_text = pymupdf4llm.to_markdown(pdf_path)
        with open(f"{folder}/paper.md", 'w', encoding='utf-8') as f:
            f.write(front_matter + md_text)
        print("Extracted: paper.md (from PDF)")

        tex_content = convert_md_to_latex(md_text)
        with open(f"{folder}/paper.tex", 'w', encoding='utf-8') as f:
            f.write(tex_content)
        print("Generated: paper.tex (from PDF)")

    return folder

def parse_acl_id(paper_id: str) -> tuple:
    """Parse ACL paper ID to extract year and venue."""
    # New format: 2020.findings-emnlp.92, 2021.naacl-demos.1
    new_match = re.match(r'^(\d{4})\.([a-z\-]+)\.(\d+)$', paper_id)
    if new_match:
        year = int(new_match.group(1))
        venue = new_match.group(2)
        return year, venue

    # Old format: E14-2005, N18-5012, P19-1017
    old_match = re.match(r'^([A-Z])(\d{2})-\d+$', paper_id)
    if old_match:
        prefix = old_match.group(1)
        year_short = int(old_match.group(2))
        year = 2000 + year_short if year_short < 50 else 1900 + year_short

        venue_map = {
            'P': 'acl', 'N': 'naacl', 'E': 'eacl', 'D': 'emnlp',
            'C': 'coling', 'W': 'workshop', 'S': 'semeval', 'Q': 'tacl'
        }
        venue = venue_map.get(prefix, 'acl')
        return year, venue

    return None, paper_id

def fetch_acl(paper_id: str):
    """Fetch paper from ACL Anthology."""
    paper_id = re.sub(r'^https?://aclanthology\.org/', '', paper_id)
    # Strip a trailing slash, then a '.pdf' suffix (rstrip would eat ID characters)
    paper_id = re.sub(r'\.pdf$', '', paper_id.rstrip('/'))

    # Get metadata from ACL Anthology BibTeX
    bib_url = f"https://aclanthology.org/{paper_id}.bib"
    title = ""
    authors = []
    booktitle = ""
    try:
        bib_response = requests.get(bib_url)
        bib_text = bib_response.text

        # Extract title
        title_match = re.search(r'title\s*=\s*["{]([^"}]+)', bib_text)
        title = title_match.group(1) if title_match else ""

        # Extract all authors
        author_match = re.search(r'author\s*=\s*["{]([^"}]+)', bib_text)
        if author_match:
            authors_str = author_match.group(1)
            authors = [a.strip() for a in authors_str.split(' and ')]

        # Extract booktitle/venue
        booktitle_match = re.search(r'booktitle\s*=\s*["{]([^"}]+)', bib_text)
        booktitle = booktitle_match.group(1) if booktitle_match else ""

        # Extract year
        year_match = re.search(r'year\s*=\s*["{]?(\d{4})', bib_text)
        year = int(year_match.group(1)) if year_match else None
    except Exception:
        authors = ["unknown"]
        year = None

    first_author = authors[0] if authors else "unknown"

    # Parse venue from paper_id
    parsed_year, venue = parse_acl_id(paper_id)
    if year is None:
        year = parsed_year or 2020

    # Generate folder name
    folder_name = get_folder_name(year, venue, first_author)
    folder = f"references/{folder_name}"
    os.makedirs(folder, exist_ok=True)
    print(f"Folder: {folder}")

    # Build front matter
    front_matter = build_front_matter(
        title=title,
        authors=authors,
        year=year,
        venue=booktitle or venue.upper(),
        url=f"https://aclanthology.org/{paper_id}"
    )

    pdf_url = f"https://aclanthology.org/{paper_id}.pdf"
    pdf_path = f"{folder}/paper.pdf"

    response = requests.get(pdf_url, allow_redirects=True)
    response.raise_for_status()
    with open(pdf_path, 'wb') as f:
        f.write(response.content)
    print(f"Downloaded PDF: {pdf_path}")

    # Extract markdown from PDF
    md_text = pymupdf4llm.to_markdown(pdf_path)
    with open(f"{folder}/paper.md", 'w', encoding='utf-8') as f:
        f.write(front_matter + md_text)
    print("Extracted: paper.md")

    # Generate LaTeX from markdown
    tex_content = convert_md_to_latex(md_text)
    with open(f"{folder}/paper.tex", 'w', encoding='utf-8') as f:
        f.write(tex_content)
    print("Generated: paper.tex")

    return folder

def fetch_semantic_scholar(query: str):
    """Fetch paper via Semantic Scholar."""
    if re.match(r'^[0-9a-f]{40}$', query.replace('s2:', '')):
        paper_id = query.replace('s2:', '')
    else:
        url = "https://api.semanticscholar.org/graph/v1/paper/search"
        params = {"query": query, "limit": 1, "fields": "paperId,title"}
825
+ response = requests.get(url, params=params)
826
+ data = response.json()
827
+ if not data.get('data'):
828
+ print(f"No papers found for: {query}")
829
+ return None
830
+ paper_id = data['data'][0]['paperId']
831
+ print(f"Found: {data['data'][0]['title']}")
832
+
833
+ url = f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}"
834
+ params = {"fields": "title,authors,abstract,year,openAccessPdf,externalIds,url,venue"}
835
+ response = requests.get(url, params=params)
836
+ response.raise_for_status()
837
+ data = response.json()
838
+
839
+ # Get year, venue, first author for folder name
840
+ year = data.get('year') or 2020
841
+ venue = data.get('venue') or 'unknown'
842
+ # Normalize venue - handle common patterns
843
+ if not venue or venue == 'unknown':
844
+ if 'ArXiv' in data.get('externalIds', {}):
845
+ venue = 'arxiv'
846
+ elif 'ACL' in data.get('externalIds', {}):
847
+ venue = 'acl'
848
+ else:
849
+ venue = 'paper'
850
+
851
+ authors = data.get('authors', [])
852
+ author_names = [a['name'] for a in authors]
853
+ first_author = author_names[0] if author_names else 'unknown'
854
+
855
+ # Generate folder name: year.venue.author
856
+ folder_name = get_folder_name(year, venue, first_author)
857
+ folder = f"references/{folder_name}"
858
+ os.makedirs(folder, exist_ok=True)
859
+ print(f"Folder: {folder}")
860
+
861
+ # Build front matter
862
+ front_matter = build_front_matter(
863
+ title=data.get('title', ''),
864
+ authors=author_names,
865
+ year=year,
866
+ venue=venue,
867
+ url=data.get('url', '')
868
+ )
869
+
870
+ pdf_info = data.get('openAccessPdf')
871
+ if pdf_info and pdf_info.get('url'):
872
+ pdf_url = pdf_info['url']
873
+ pdf_path = f"{folder}/paper.pdf"
874
+
875
+ pdf_response = requests.get(pdf_url, allow_redirects=True)
876
+ pdf_response.raise_for_status()
877
+ with open(pdf_path, 'wb') as f:
878
+ f.write(pdf_response.content)
879
+ print(f"Downloaded PDF: {pdf_path}")
880
+
881
+ # Extract markdown from PDF with front matter
882
+ md_text = pymupdf4llm.to_markdown(pdf_path)
883
+ with open(f"{folder}/paper.md", 'w', encoding='utf-8') as f:
884
+ f.write(front_matter + md_text)
885
+ print(f"Extracted: paper.md")
886
+
887
+ # Generate LaTeX
888
+ tex_content = convert_md_to_latex(md_text)
889
+ with open(f"{folder}/paper.tex", 'w', encoding='utf-8') as f:
890
+ f.write(tex_content)
891
+ print(f"Generated: paper.tex")
892
+ else:
893
+ print("No open access PDF available")
894
+ if 'ArXiv' in data.get('externalIds', {}):
895
+ return fetch_arxiv(data['externalIds']['ArXiv'])
896
+
897
+ return folder
898
+
899
+ if __name__ == "__main__":
900
+ if len(sys.argv) < 2:
901
+ print("Usage: uv run fetch_paper.py <paper_id_or_url_or_query>")
902
+ sys.exit(1)
903
+
904
+ query = ' '.join(sys.argv[1:])
905
+
906
+ if re.match(r'^\d{4}\.\d{4,5}', query) or 'arxiv.org' in query or query.startswith('arxiv:'):
907
+ fetch_arxiv(query)
908
+ elif re.match(r'^[A-Z]\d{2}-\d{4}$', query) or re.match(r'^\d{4}\.[a-z]+-', query) or 'aclanthology.org' in query:
909
+ fetch_acl(query)
910
+ else:
911
+ fetch_semantic_scholar(query)
912
+ ```
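The `__main__` dispatch above routes a query purely on its surface pattern: arXiv IDs look like `2301.10140`, old ACL Anthology IDs like `P19-1017`, new ones like `2023.acl-long.1`, and anything else falls through to a Semantic Scholar search. A standalone sketch of that routing logic (the `classify` helper is illustrative, not part of the script):

```python
import re

def classify(query: str) -> str:
    """Mirror the dispatch in __main__: arXiv ID, ACL Anthology ID, or free-text search."""
    if re.match(r'^\d{4}\.\d{4,5}', query) or 'arxiv.org' in query or query.startswith('arxiv:'):
        return 'arxiv'
    if (re.match(r'^[A-Z]\d{2}-\d{4}$', query) or re.match(r'^\d{4}\.[a-z]+-', query)
            or 'aclanthology.org' in query):
        return 'acl'
    return 'semantic_scholar'

print(classify("1810.04805"))       # arXiv-style numeric ID
print(classify("P19-1017"))         # old ACL Anthology ID
print(classify("2023.acl-long.1"))  # new ACL Anthology ID
```

Note the ordering matters: `2023.acl-long.1` fails the arXiv pattern (the segment after the dot is not digits) before matching the ACL one, so arXiv must be tested first.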
+
+ ---
+
+ ## Quick Commands
+
+ **Fetch and extract to both formats (MD + LaTeX):**
+ ```bash
+ mkdir -p references/E14-2005 && \
+ curl -L "https://aclanthology.org/E14-2005.pdf" -o references/E14-2005/paper.pdf && \
+ uv run --with pymupdf4llm python << 'EOF'
+ import pymupdf4llm
+
+ folder = 'references/E14-2005'
+ pdf_path = f'{folder}/paper.pdf'
+
+ # Extract to Markdown
+ md = pymupdf4llm.to_markdown(pdf_path)
+ open(f'{folder}/paper.md', 'w', encoding='utf-8').write(md)
+ print(f'Created: {folder}/paper.md')
+
+ # Convert to basic LaTeX
+ tex = f"""\\documentclass{{article}}
+ \\usepackage[utf8]{{inputenc}}
+ \\usepackage{{amsmath,booktabs}}
+ \\begin{{document}}
+ {md}
+ \\end{{document}}
+ """
+ open(f'{folder}/paper.tex', 'w', encoding='utf-8').write(tex)
+ print(f'Created: {folder}/paper.tex')
+ EOF
+ ```
+
+ **Using pdfplumber for complex tables:**
+ ```bash
+ uv run --with pdfplumber --with pandas python -c "
+ import pdfplumber
+ import pandas as pd
+
+ with pdfplumber.open('references/E14-2005/paper.pdf') as pdf:
+     for page in pdf.pages:
+         for table in page.extract_tables():
+             if table:
+                 print(pd.DataFrame(table[1:], columns=table[0]).to_markdown())
+ "
+ ```
+
+ ---
+
+ ## Troubleshooting Table Extraction
+
+ ### Problem: Tables not detected
+
+ **Solution 1:** Try a different extraction strategy with pymupdf4llm:
+ ```python
+ md = pymupdf4llm.to_markdown("paper.pdf", table_strategy="lines")  # or "text"
+ ```
+
+ **Solution 2:** Use pdfplumber with custom settings:
+ ```python
+ import pdfplumber
+ with pdfplumber.open("paper.pdf") as pdf:
+     page = pdf.pages[0]
+     tables = page.extract_tables(table_settings={
+         "vertical_strategy": "text",
+         "horizontal_strategy": "text",
+     })
+ ```
+
+ ### Problem: Table columns misaligned
+
+ **Solution:** Use pdfplumber's debug mode to visualize:
+ ```python
+ import pdfplumber
+ with pdfplumber.open("paper.pdf") as pdf:
+     page = pdf.pages[3]  # Page with table
+     im = page.to_image()
+     im.debug_tablefinder()
+     im.save("debug_table.png")
+ ```
+
+ ### Problem: Multi-column papers
+
+ **Solution:** Use Marker for best results:
+ ```bash
+ marker_single paper.pdf output/ --output_format markdown
+ ```
+
+ ---
+
+ ## Sources
+
+ - [pymupdf4llm](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/) - Best for markdown with tables
+ - [pdfplumber](https://github.com/jsvine/pdfplumber) - Most accurate table extraction
+ - [Marker](https://github.com/datalab-to/marker) - Deep learning PDF to markdown
+ - [Camelot](https://camelot-py.readthedocs.io/) - Specialized table extraction
+ - [PDF Parsing Comparison Study](https://arxiv.org/html/2410.09871v1) - Academic comparison
.claude/skills/paper-research/SKILL.md ADDED
@@ -0,0 +1,502 @@
+ ---
+ name: paper-research
+ description: Research a topic systematically following ACL/NeurIPS/ICML best practices. Finds papers, builds citation networks, and synthesizes findings.
+ argument-hint: "<research-topic>"
+ dependencies:
+   - paper-fetch
+ ---
+
+ # Systematic Research Skill
+
+ Research a topic following best practices from ACL, NeurIPS, ICML, and systematic literature review (SLR) methodology.
+
+ **Integrates with**: `/paper-fetch` - automatically fetch and store important papers for full-text analysis.
+
+ ## Target
+
+ **Research Topic**: $ARGUMENTS
+
+ If no topic is specified, analyze the current project to identify relevant research topics.
+
+ ---
+
+ ## Research Methodology
+
+ Follow a systematic approach based on PRISMA guidelines and AI conference best practices.
+
+ ### Phase 1: Define Research Scope
+
+ #### 1.1 Formulate Research Questions
+
+ Use the PICO/PICo framework adapted for CS/NLP:
+
+ | Component | Description | Example |
+ |-----------|-------------|---------|
+ | **P**opulation | Task/Domain | Vietnamese POS tagging |
+ | **I**ntervention | Method/Approach | CRF, Transformers, BERT |
+ | **C**omparison | Baselines | Rule-based, HMM, BiLSTM |
+ | **O**utcome | Metrics | Accuracy, F1, inference speed |
+
+ **Template research questions:**
+ - RQ1: What is the current state-of-the-art for [task]?
+ - RQ2: What methods have been applied to [task]?
+ - RQ3: What are the main challenges and open problems?
+ - RQ4: What datasets and benchmarks exist?
+
+ #### 1.2 Define Search Terms
+
+ Create a comprehensive keyword list:
+
+ ```
+ Primary terms: [main task] (e.g., "POS tagging", "part-of-speech")
+ Method terms: [approaches] (e.g., "CRF", "neural", "transformer")
+ Domain terms: [language/domain] (e.g., "Vietnamese", "low-resource")
+ Synonyms: [alternatives] (e.g., "word tagging", "morphological analysis")
+ ```
+
+ Build search queries using Boolean operators:
+ ```
+ ("POS tagging" OR "part-of-speech") AND ("Vietnamese" OR "low-resource") AND ("CRF" OR "neural" OR "BERT")
+ ```
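A query string like the one above can be assembled mechanically from the keyword groups instead of typed by hand. A minimal sketch (the `build_query` helper is illustrative, not part of any tool in this repo):

```python
def build_query(*groups):
    """AND together OR-groups of keywords, quoting each term."""
    return ' AND '.join(
        '(' + ' OR '.join(f'"{term}"' for term in group) + ')'
        for group in groups
    )

q = build_query(
    ["POS tagging", "part-of-speech"],   # primary terms
    ["Vietnamese", "low-resource"],      # domain terms
    ["CRF", "neural", "BERT"],           # method terms
)
print(q)
```

Each positional argument becomes one parenthesized OR-group, so adding a synonym only means appending to a list.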
+
+ ---
+
+ ### Phase 2: Search for Papers
+
+ #### 2.1 Search Sources
+
+ Search these sources in order of priority:
+
+ | Source | Best For | URL |
+ |--------|----------|-----|
+ | **ACL Anthology** | NLP/CL papers | https://aclanthology.org |
+ | **Semantic Scholar** | AI/ML papers, citations | https://semanticscholar.org |
+ | **arXiv** | Preprints, latest work | https://arxiv.org |
+ | **Google Scholar** | Broad coverage | https://scholar.google.com |
+ | **DBLP** | CS bibliography | https://dblp.org |
+ | **Papers With Code** | SOTA benchmarks | https://paperswithcode.com |
+
+ #### 2.2 Search Commands
+
+ **ACL Anthology:**
+ ```bash
+ # Search via web
+ curl "https://aclanthology.org/search/?q=vietnamese+pos+tagging"
+ ```
+
+ **Semantic Scholar API:**
+ ```bash
+ # Search papers
+ curl "https://api.semanticscholar.org/graph/v1/paper/search?query=vietnamese+POS+tagging&limit=20&fields=title,year,citationCount,authors,abstract,openAccessPdf"
+
+ # Get paper details with citations
+ curl "https://api.semanticscholar.org/graph/v1/paper/{paper_id}?fields=title,abstract,citations,references"
+ ```
+
+ **arXiv API:**
+ ```bash
+ # Search arXiv
+ curl "http://export.arxiv.org/api/query?search_query=all:vietnamese+pos+tagging&max_results=20"
+ ```
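The arXiv endpoint returns an Atom XML feed rather than JSON; entry titles can be pulled out with the standard library alone. A sketch that parses a response you have already downloaded (the `titles_from_atom` helper is illustrative):

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the arXiv API

def titles_from_atom(xml_text: str) -> list:
    """Extract <entry><title> values from an arXiv API Atom response."""
    root = ET.fromstring(xml_text)
    return [e.findtext(f"{ATOM}title", "").strip()
            for e in root.findall(f"{ATOM}entry")]
```

The same pattern extends to authors and abstracts by reading the other namespaced child elements of each `entry`.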
+
+ **Papers With Code:**
+ ```bash
+ # Check SOTA
+ curl "https://paperswithcode.com/api/v1/search/?q=vietnamese+pos+tagging"
+ ```
+
+ #### 2.3 Citation Network Exploration
+
+ Use these strategies to find related work:
+
+ 1. **Backward search**: Check references of key papers
+ 2. **Forward search**: Find papers that cite key papers
+ 3. **Author search**: Find other papers by same authors
+ 4. **Similar papers**: Use Semantic Scholar's recommendations
+
+ ```bash
+ # Get citations (papers that cite this paper)
+ curl "https://api.semanticscholar.org/graph/v1/paper/{paper_id}/citations?fields=title,year,citationCount&limit=50"
+
+ # Get references (papers this paper cites)
+ curl "https://api.semanticscholar.org/graph/v1/paper/{paper_id}/references?fields=title,year,citationCount&limit=50"
+ ```
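The two endpoints above wrap each neighbouring paper differently: `references` items nest the paper under `citedPaper`, `citations` items under `citingPaper` (per the Graph API docs). A small sketch that merges both one-hop directions from already-fetched JSON payloads (the `merge_neighbours` helper is illustrative):

```python
def merge_neighbours(references_json: dict, citations_json: dict) -> dict:
    """Map paperId -> title across backward (references) and forward (citations) hops."""
    seen = {}
    for payload, key in ((references_json, "citedPaper"),
                         (citations_json, "citingPaper")):
        for item in payload.get("data", []):
            paper = item.get(key) or {}
            pid = paper.get("paperId")
            if pid:
                seen[pid] = paper.get("title", "")
    return seen
```

Deduplicating on `paperId` matters here: a seminal paper often appears in both directions of the hop.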
+
+ #### 2.4 Discovery Tools
+
+ Use these tools for visual exploration:
+
+ | Tool | Purpose | URL |
+ |------|---------|-----|
+ | **Connected Papers** | Visual citation graph | https://connectedpapers.com |
+ | **Research Rabbit** | Paper recommendations | https://researchrabbit.ai |
+ | **Litmaps** | Citation mapping | https://litmaps.com |
+ | **Elicit** | AI paper search | https://elicit.com |
+ | **Inciteful** | Citation network | https://inciteful.xyz |
+
+ ---
+
+ ### Phase 3: Screen and Select Papers
+
+ #### 3.1 Inclusion/Exclusion Criteria
+
+ Define clear criteria:
+
+ **Include:**
+ - Published in peer-reviewed venue (ACL, EMNLP, NAACL, COLING, etc.)
+ - Relevant to research questions
+ - Published within timeframe (e.g., last 5-10 years)
+ - English language
+
+ **Exclude:**
+ - Non-peer-reviewed (unless highly cited preprint)
+ - Tangentially related
+ - Superseded by newer work
+ - Duplicate/extended versions (keep most comprehensive)
+
+ #### 3.2 Screening Process
+
+ 1. **Title/Abstract screening**: Quick relevance check
+ 2. **Full-text screening**: Detailed relevance assessment
+ 3. **Quality assessment**: Methodological rigor
+
+ Track using PRISMA flow:
+ ```
+ Records identified: N
+ Duplicates removed: N
+ Records screened: N
+ Records excluded: N
+ Full-text assessed: N
+ Studies included: N
+ ```
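Only three of these counts are independent decisions; the rest follow by subtraction, which also gives you a consistency check for free. A sketch (the `prisma_flow` helper is illustrative):

```python
def prisma_flow(identified: int, duplicates: int,
                excluded_on_screen: int, excluded_full_text: int) -> dict:
    """Derive the remaining PRISMA counts from the exclusion decisions."""
    screened = identified - duplicates
    full_text = screened - excluded_on_screen
    included = full_text - excluded_full_text
    assert 0 <= included <= full_text <= screened <= identified, "inconsistent counts"
    return {"identified": identified, "screened": screened,
            "full_text_assessed": full_text, "included": included}
```

If the assertion fires while you fill in the flow, one of the recorded exclusion counts is wrong.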
+
+ ---
+
+ ### Phase 3.5: Fetch Selected Papers
+
+ After screening, use the **paper-fetch** skill to download important papers for full-text analysis.
+
+ #### 3.5.1 Identify Papers to Fetch
+
+ Prioritize fetching:
+ 1. **Seminal papers**: Highly cited foundational work
+ 2. **SOTA papers**: Current best-performing methods
+ 3. **Directly relevant**: Papers closest to your research
+ 4. **Methodology papers**: Detailed method descriptions needed
+
+ #### 3.5.2 Fetch Papers Using paper-fetch Skill
+
+ For each selected paper, invoke `/paper-fetch` with the paper ID:
+
+ ```bash
+ # arXiv papers
+ /paper-fetch 2301.10140
+ /paper-fetch arxiv:1810.04805   # BERT paper
+
+ # ACL Anthology papers
+ /paper-fetch P19-1017
+ /paper-fetch 2023.acl-long.1
+
+ # Semantic Scholar (by title search)
+ /paper-fetch "BERT: Pre-training of Deep Bidirectional Transformers"
+ ```
+
+ #### 3.5.3 Batch Fetching
+
+ For multiple papers, fetch in sequence:
+
+ ```bash
+ # Create list of paper IDs to fetch
+ PAPERS=(
+     "1810.04805"   # BERT
+     "2003.00744"   # PhoBERT
+     "P19-1017"     # Example ACL paper
+ )
+
+ # Fetch each paper
+ for paper_id in "${PAPERS[@]}"; do
+     /paper-fetch $paper_id
+ done
+ ```
+
+ #### 3.5.4 Output Structure
+
+ Each fetched paper creates a folder named `year.venue.author`:
+
+ ```
+ references/
+     2018.arxiv.devlin/       # BERT
+         paper.pdf            # Original PDF
+         paper.md             # Extracted text with front matter (for full-text search/analysis)
+         paper.tex            # Generated LaTeX
+     2020.arxiv.nguyen/       # PhoBERT
+         paper.pdf
+         paper.md
+         paper.tex
+     research_{topic}/        # Research synthesis (Phase 6)
+         README.md
+         papers.md
+         ...
+ ```
+
+ #### 3.5.5 Use Fetched Papers
+
+ After fetching, you can:
+ - **Read full text**: Open `paper.md` for detailed analysis
+ - **Search across papers**: Grep through all `paper.md` files
+ - **Extract quotes**: Copy relevant sections with page references
+ - **Verify claims**: Check the original source for accuracy
+
+ ```bash
+ # Search across all fetched papers
+ grep -r "CRF" references/*/paper.md
+
+ # Find specific methodology details
+ grep -r "feature template" references/*/paper.md
+ ```
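The same search can be done as a small Python helper when you want structured, per-paper results rather than raw grep output (the `search_papers` helper is illustrative and assumes the `references/*/paper.md` layout above):

```python
from pathlib import Path

def search_papers(term: str, root: str = "references") -> dict:
    """Case-insensitive line search over every <root>/*/paper.md; folder name -> matching lines."""
    hits = {}
    for md in sorted(Path(root).glob("*/paper.md")):
        matches = [line.strip()
                   for line in md.read_text(encoding="utf-8").splitlines()
                   if term.lower() in line.lower()]
        if matches:
            hits[md.parent.name] = matches
    return hits
```

Keying the result on the folder name means each hit already tells you which paper (year, venue, first author) it came from.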
+
+ ---
+
+ ### Phase 4: Extract and Organize Information
+
+ #### 4.1 Create Paper Database
+
+ For each paper, extract:
+
+ ```markdown
+ ## Paper: [Title]
+
+ - **Authors**: [Names]
+ - **Venue**: [Conference/Journal] [Year]
+ - **URL**: [Link]
+ - **Citations**: [Count]
+
+ ### Summary
+ [2-3 sentence summary]
+
+ ### Key Contributions
+ 1. [Contribution 1]
+ 2. [Contribution 2]
+
+ ### Methodology
+ - **Approach**: [Method name/type]
+ - **Dataset**: [Dataset used]
+ - **Metrics**: [Evaluation metrics]
+
+ ### Results
+ | Dataset | Metric | Score |
+ |---------|--------|-------|
+ | [Name] | [Acc] | [XX%] |
+
+ ### Strengths
+ - [Strength 1]
+
+ ### Limitations
+ - [Limitation 1]
+
+ ### Relevance to Our Work
+ [How this paper relates to current project]
+ ```
+
+ #### 4.2 Comparison Table
+
+ Create a summary table:
+
+ ```markdown
+ | Paper | Year | Method | Dataset | Accuracy | F1 | Key Innovation |
+ |-------|------|--------|---------|----------|------|----------------|
+ | [1] | 2023 | BERT | UDD | 97.2% | 96.8 | Fine-tuning |
+ | [2] | 2022 | CRF | VLSP | 95.5% | 94.1 | Feature eng. |
+ ```
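Once the per-paper records exist, rows like these can be rendered from the database instead of typed by hand. A minimal sketch (the record field names are illustrative, not a fixed schema):

```python
def comparison_row(p: dict) -> str:
    """Render one paper record as a Markdown comparison-table row."""
    return "| {ref} | {year} | {method} | {dataset} | {acc} | {f1} | {note} |".format(**p)

row = comparison_row({"ref": "[1]", "year": 2023, "method": "BERT", "dataset": "UDD",
                      "acc": "97.2%", "f1": "96.8", "note": "Fine-tuning"})
print(row)
```

Generating rows this way keeps the table and the paper database (Phase 4.1) from drifting apart.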
+
+ ---
+
+ ### Phase 5: Synthesize Findings
+
+ #### 5.1 Thematic Analysis
+
+ Organize findings by themes (not chronologically):
+
+ ```markdown
+ ## Related Work Synthesis
+
+ ### Traditional Approaches
+ - Rule-based methods: [Summary]
+ - Statistical methods (HMM, CRF): [Summary]
+
+ ### Neural Approaches
+ - RNN/LSTM-based: [Summary]
+ - Transformer-based: [Summary]
+
+ ### Vietnamese-Specific Work
+ - [Summary of Vietnamese NLP research]
+
+ ### Datasets and Benchmarks
+ - [Available resources]
+
+ ### Open Challenges
+ - [Remaining problems]
+ ```
+
+ #### 5.2 Gap Analysis
+
+ Identify what's missing:
+
+ ```markdown
+ ## Research Gaps
+
+ 1. **Methodological gaps**: [What methods haven't been tried?]
+ 2. **Data gaps**: [What data is missing?]
+ 3. **Evaluation gaps**: [What isn't being measured?]
+ 4. **Domain gaps**: [What domains lack coverage?]
+ ```
+
+ #### 5.3 SOTA Summary
+
+ ```markdown
+ ## State-of-the-Art
+
+ ### Current Best Results
+ | Task | Dataset | SOTA Model | Score | Paper |
+ |------|---------|------------|-------|-------|
+ | POS | UDD | PhoBERT | 97.2% | [Ref] |
+
+ ### Trends
+ - [Trend 1: e.g., "Shift from CRF to Transformers"]
+ - [Trend 2: e.g., "Increasing use of pre-trained models"]
+ ```
+
+ ---
+
+ ### Phase 6: Document Research
+
+ #### 6.1 Output Structure
+
+ Save research to `references/` with fetched papers and synthesis:
+
+ ```
+ references/
+     # Fetched papers (via /paper-fetch), named year.venue.author
+     2018.arxiv.devlin/       # BERT paper
+         paper.pdf
+         paper.md             # Full text for analysis
+         paper.tex            # Generated LaTeX
+     2020.arxiv.nguyen/       # PhoBERT paper
+         paper.pdf
+         paper.md
+         paper.tex
+
+     # Research synthesis (this skill)
+     research_vietnamese_pos/
+         README.md            # Research summary & findings
+         papers.md            # Paper database with notes
+         comparison.md        # Comparison tables
+         bibliography.bib     # BibTeX references
+         sota.md              # State-of-the-art summary
+ ```
+
+ #### 6.2 Research Report Template
+
+ ```markdown
+ # Literature Review: [Topic]
+
+ **Date**: [YYYY-MM-DD]
+ **Research Questions**: [RQs]
+
+ ## Executive Summary
+ [1 paragraph overview]
+
+ ## Methodology
+ - **Search sources**: [List]
+ - **Search terms**: [Keywords]
+ - **Timeframe**: [Date range]
+ - **Inclusion criteria**: [Criteria]
+
+ ## PRISMA Flow
+ - Records identified: N
+ - Studies included: N
+
+ ## Findings
+
+ ### RQ1: [Question]
+ [Answer with citations]
+
+ ### RQ2: [Question]
+ [Answer with citations]
+
+ ## State-of-the-Art
+ [Current best methods/results]
+
+ ## Research Gaps
+ [Identified opportunities]
+
+ ## Recommendations
+ [Suggested directions]
+
+ ## References
+ [Bibliography]
+ ```
+
+ ---
+
+ ## Best Practices (ACL/NeurIPS/ICML)
+
+ ### DO:
+ - **Explain differences**: "Related work should not just list prior work, but explain how the proposed work differs" (NeurIPS guidelines)
+ - **Be comprehensive**: Cover all major approaches and methods
+ - **Be fair**: Acknowledge contributions of prior work
+ - **Be current**: Include recent work (but contemporaneous papers within 2 months are excused)
+ - **Include proper citations**: Use DOIs or ACL Anthology links (ACL requirement)
+
+ ### DON'T:
+ - Just list papers without synthesis
+ - Ignore non-English language work
+ - Miss seminal papers in the field
+ - Cherry-pick only papers that support your position
+ - Dismiss work as "obvious in retrospect"
+
+ ### Quality Checks:
+ - [ ] All major approaches covered
+ - [ ] Recent work (last 2-3 years) included
+ - [ ] Seminal papers cited
+ - [ ] Fair characterization of prior work
+ - [ ] Clear connection to your work
+ - [ ] Proper citation format
+
+ ---
+
+ ## Quick Reference: API Endpoints
+
+ ```bash
+ # Semantic Scholar - Search
+ curl "https://api.semanticscholar.org/graph/v1/paper/search?query=QUERY&limit=20&fields=title,year,authors,citationCount,abstract"
+
+ # Semantic Scholar - Paper details
+ curl "https://api.semanticscholar.org/graph/v1/paper/PAPER_ID?fields=title,abstract,citations,references,tldr"
+
+ # Semantic Scholar - Author papers
+ curl "https://api.semanticscholar.org/graph/v1/author/AUTHOR_ID/papers?fields=title,year,venue"
+
+ # arXiv - Search
+ curl "http://export.arxiv.org/api/query?search_query=QUERY&max_results=20"
+
+ # DBLP - Search
+ curl "https://dblp.org/search/publ/api?q=QUERY&format=json"
+ ```
+
+ ---
+
+ ## References
+
+ Based on guidelines from:
+ - [ACL Rolling Review Author Guidelines](https://aclrollingreview.org/authors)
+ - [NeurIPS Reviewer Guidelines](https://neurips.cc/Conferences/2025/ReviewerGuidelines)
+ - [ICML Paper Guidelines](https://icml.cc/Conferences/2024/PaperGuidelines)
+ - [How-to conduct a systematic literature review (CS)](https://www.sciencedirect.com/science/article/pii/S2215016122002746)
+ - [PRISMA Statement](https://www.prisma-statement.org/)
+ - [Semantic Scholar API](https://api.semanticscholar.org/api-docs/)
+ - [ACL Anthology](https://aclanthology.org)
.claude/skills/paper-review/SKILL.md ADDED
@@ -0,0 +1,275 @@
+ ---
2
+ name: paper-review
3
+ description: Review research papers following ACL/EMNLP conference standards. Provides structured feedback with soundness, excitement, and overall assessment scores.
4
+ argument-hint: "[file-path]"
5
+ ---
6
+
7
+ # Academic Paper Review (ACL/EMNLP Standards)
8
+
9
+ Review papers following ACL Rolling Review (ARR) guidelines and best practices from top NLP conferences.
10
+
11
+ ## Target File
12
+
13
+ Review the file: $ARGUMENTS
14
+
15
+ If no file specified, review `TECHNICAL_REPORT.md` in the project root.
16
+
17
+ ## Review Process
18
+
19
+ ### Step 1: Reading Strategy (Two-Pass Method)
20
+
21
+ 1. **First Pass (Skim)**: Read abstract, introduction, and conclusion first to understand research questions, scope, and claimed contributions
22
+ 2. **Second Pass (Deep Dive)**: Evaluate technical soundness, methodology, evidence quality, and reproducibility
23
+
24
+ ### Step 2: Research Relevant Papers
25
+
26
+ Before completing the review, research the current state of the field to properly contextualize the contribution.
27
+
28
+ #### 2.1 Identify Key Topics
29
+ Extract from the paper:
30
+ - Main task/problem (e.g., "Vietnamese POS tagging", "named entity recognition")
31
+ - Methods used (e.g., "CRF", "transformer", "BERT-based")
32
+ - Dataset/benchmark names
33
+ - Baseline systems mentioned
34
+
35
+ #### 2.2 Search for Related Work
36
+ Use web search to find:
37
+
38
+ 1. **State-of-the-art results** on the same task/dataset:
39
+ - Search: "[task name] [dataset name] benchmark results [current year]"
40
+ - Search: "[task name] state of the art [current year]"
41
+
42
+ 2. **Competing approaches**:
43
+ - Search: "[task name] [alternative method] comparison"
44
+ - Search: "best [task name] models [current year]"
45
+
46
+ 3. **Prior work by same authors** (for context on research trajectory):
47
+ - Search author names + institution
48
+
49
+ 4. **Survey papers** for comprehensive background:
50
+ - Search: "[task name] survey" OR "[task name] review paper"
51
+
52
+ 5. **Datasets and benchmarks**:
53
+ - Search: "[dataset name] leaderboard" OR "[dataset name] benchmark"
54
+
55
+ #### 2.3 Verify Claims
56
+ Cross-check the paper's claims against found literature:
57
+ - Are baseline comparisons fair and up-to-date?
58
+ - Are cited SOTA numbers accurate?
59
+ - Is related work coverage comprehensive?
60
+ - Are there significant missing references?
61
+
62
+ #### 2.4 Document Findings
63
+ Record relevant papers found during research:
64
+
65
+ ```markdown
66
+ ## Related Work Research
67
+
68
+ ### Papers Found
69
+ | Paper | Year | Method | Results | Relevance |
70
+ |-------|------|--------|---------|-----------|
71
+ | [Title] | [Year] | [Method] | [Key metric] | [Why relevant] |
72
+
73
+ ### Missing from Related Work
74
+ - [Paper that should have been cited]
75
+
76
+ ### SOTA Verification
77
+ - Claimed SOTA: [what paper claims]
78
+ - Actual SOTA: [what you found]
79
+ - Gap: [difference if any]
80
+ ```
81
+
82
+ ### Step 3: Write Review
83
+
84
+ With both the paper content and research context, write the formal review following the ARR structure below.
85
+
86
+ ## ARR Review Form Structure
87
+
88
+ Provide your review in the following structure:
89
+
90
+ ```markdown
91
+ ## Paper Summary
92
+
93
+ [2-3 sentences describing what the paper is about, helping editors understand the topic]
94
+
95
+ ## Summary of Strengths
96
+
97
+ Major reasons to publish this paper at a selective *ACL venue:
98
+
99
+ 1. [Strength 1 - be specific, reference sections/tables]
100
+ 2. [Strength 2]
101
+ 3. [Strength 3]
102
+
103
+ ## Summary of Weaknesses
104
+
105
+ Numbered concerns that prevent prioritizing this work:
106
+
107
+ 1. [Weakness 1 - specific and actionable]
108
+ 2. [Weakness 2]
109
+ 3. [Weakness 3]
110
+
111
+ ## Scores
112
+
113
+ ### Soundness: [1-5]
114
+ - 5: Excellent - No major issues, claims well-supported
115
+ - 4: Good - Minor issues that don't affect main claims
116
+ - 3: Acceptable - Some issues but core contributions valid
117
+ - 2: Poor - Significant issues undermine key claims
118
+ - 1: Major Issues - Not sufficiently thorough for publication
119
+
120
+ ### Excitement: [1-5]
121
+ - 5: Highly Exciting - Would recommend to others, transformational
122
+ - 4: Exciting - Important contribution to the field
123
+ - 3: Moderately Exciting - Incremental but solid work
124
+ - 2: Somewhat Boring - Limited novelty or impact
125
+ - 1: Not Exciting - Routine work with minimal contribution
126
+
127
+ ### Overall Assessment: [1-5]
128
+ - 5: Award consideration (top 2.5%)
129
+ - 4: Strong accept - main conference
130
+ - 3: Borderline - Findings track appropriate
131
+ - 2: Resubmit next cycle - substantial revisions needed
132
+ - 1: Do not resubmit
133
+
134
+ ### Reproducibility: [1-5]
135
+ - 5: Could reproduce results exactly
136
+ - 4: Could mostly reproduce, minor variation expected
137
+ - 3: Partial reproduction possible
138
+ - 2: Significant details missing
139
+ - 1: Cannot reproduce
140
+
141
+ ### Confidence: [1-5]
142
+ - 5: Expert - positive my evaluation is correct
143
+ - 4: High - familiar with related work
144
+ - 3: Medium - read related papers but not expert
145
+ - 2: Low - educated guess
146
+ - 1: Not my area
147
+
148
+ ## Detailed Comments
149
+
150
+ ### Technical Soundness
151
+ [Evaluate methodology, experimental design, statistical validity]
152
+
153
+ ### Novelty and Contribution
154
+ [Assess originality - don't dismiss work just because method is "simple" or results seem "obvious in retrospect"]
155
+
156
+ ### Clarity and Presentation
157
+ [Focus on substance, not style - note if non-native English but don't penalize]
158
+
159
+ ### Reproducibility Assessment
160
+ [Check for: dataset details, hyperparameters, code availability, training configuration]
161
+
162
+ ### Limitations and Ethics
163
+ [Evaluate if authors adequately discuss limitations and potential negative impacts]
164
+
165
+ ## Related Work Research
166
+
167
+ ### Papers Found
168
+ | Paper | Year | Method | Results | Relevance |
169
+ |-------|------|--------|---------|-----------|
170
+ | [Title] | [Year] | [Method] | [Key metric] | [Why relevant] |
171
+
172
+ ### Missing Citations
173
+ [Important papers not cited that should be referenced]
174
+
175
+ ### SOTA Verification
176
+ - **Claimed**: [what the paper claims as baseline/SOTA]
177
+ - **Actual**: [current SOTA from your research]
178
+ - **Assessment**: [whether claims are accurate]
179
+
180
+ ## Questions for Authors
181
+
182
+ [Specific questions that could be addressed in author response - do NOT ask for new experiments]
183
+
184
+ ## Minor Issues
185
+
186
+ [Typos, formatting, missing references - not grounds for rejection]
187
+
188
+ ## Suggestions for Improvement
189
+
190
+ [Constructive, actionable recommendations to strengthen the work]
191
+ ```
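The 1-5 scales above are machine-checkable. As an illustrative sketch (the function name `validate_review` and the dict keys are assumptions for illustration, not part of this skill), a reviewing tool could flag out-of-range scores and the score-content misalignment described under issue I2:

```python
# Illustrative sketch: check that a filled-in review uses the 1-5 scales
# above, and flag score-content misalignment (ARR issue I2), where a low
# soundness score is given without any stated technical flaws.

SCALE_FIELDS = ("soundness", "excitement", "reproducibility", "confidence")

def validate_review(review: dict) -> list[str]:
    """Return a list of problems found in a review dict."""
    problems = []
    for field in SCALE_FIELDS:
        score = review.get(field)
        if not isinstance(score, int) or not 1 <= score <= 5:
            problems.append(f"{field} must be an integer in 1-5, got {score!r}")
    # Low soundness scores MUST cite specific technical flaws.
    if review.get("soundness", 5) <= 2 and not review.get("technical_flaws"):
        problems.append("soundness <= 2 but no technical flaws are listed (issue I2)")
    return problems

report = validate_review({"soundness": 2, "excitement": 4,
                          "reproducibility": 3, "confidence": 3})
print(report)
```

The same idea extends to the other checklists: any "MUST" in the guidelines can become an automated lint before the review is submitted.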
192
+
193
+ ## Review Principles (ACL/EMNLP Best Practices)
194
+
195
+ ### DO:
196
+ - **Be specific**: Reference particular sections, equations, tables, or line numbers
197
+ - **Be constructive**: Suggest how to improve, not just what's wrong
198
+ - **Be kind**: Write the review you would like to receive
199
+ - **Justify scores**: Low soundness scores MUST cite specific technical flaws
200
+ - **Consider diverse contributions**: Novel methodology, insightful analysis, new resources, theoretical advances
201
+ - **Evaluate claimed contributions**: A paper only needs sufficient evidence for its stated claims
202
+
203
+ ### DO NOT:
204
+ - Reject because results aren't SOTA (ask "state of which art?")
205
+ - Dismiss work as "obvious in retrospect" without prior empirical validation
206
+ - Demand experiments beyond the paper's stated scope
207
+ - Criticize for not using deep learning (method diversity is valuable)
208
+ - Reject resource papers (datasets are as important as models)
209
+ - Reject work on non-English languages
210
+ - Penalize simple methods (many of the most-cited papers use simple methods)
211
+ - Use sarcasm or dismissive language
212
+ - Generate AI-written review content (violates ACL policy)
213
+
214
+ ### Common Review Problems to Avoid (ARR Issue Codes):
215
+ - **I1**: Lack of specificity - vague criticisms without examples
216
+ - **I2**: Score-content misalignment - low scores without stated flaws
217
+ - **I3**: Unprofessional tone - harsh or dismissive language
218
+ - **I4**: Demanding out-of-scope work
219
+ - **I5**: Ignoring author responses without explanation
220
+
221
+ ## Evaluation Checklist
222
+
223
+ ### Methodology
224
+ - [ ] Research questions clearly stated
225
+ - [ ] Methods appropriate for research questions
226
+ - [ ] Baselines appropriate and fairly compared
227
+ - [ ] Statistical significance properly addressed
228
+ - [ ] Limitations of approach acknowledged
229
+
230
+ ### Experiments
231
+ - [ ] Datasets properly described (source, size, splits, preprocessing)
232
+ - [ ] Evaluation metrics appropriate for the task
233
+ - [ ] Training details sufficient for reproduction
234
+ - [ ] Ablation studies or analysis provided
235
+ - [ ] Results support the claims made
236
+
237
+ ### Presentation
238
+ - [ ] Abstract accurately summarizes contributions
239
+ - [ ] Introduction motivates the problem
240
+ - [ ] Related work comprehensive and fair
241
+ - [ ] Figures/tables readable and informative
242
+ - [ ] Conclusion matches actual contributions
243
+
244
+ ### Related Work Verification (from Step 2 Research)
245
+ - [ ] Key prior work on same task is cited
246
+ - [ ] Baseline comparisons use current methods
247
+ - [ ] SOTA claims are accurate and up-to-date
248
+ - [ ] No significant missing references
249
+ - [ ] Fair characterization of competing approaches
250
+
251
+ ### Responsible NLP
252
+ - [ ] Limitations section present and substantive
253
+ - [ ] Potential negative impacts discussed
254
+ - [ ] Data collection ethics addressed (if applicable)
255
+ - [ ] Bias considerations mentioned (if applicable)
256
+
257
+ ## Score Calibration Guidelines
258
+
259
+ **Soundness vs Excitement**: These are orthogonal. A paper can be:
260
+ - High soundness, low excitement: Solid but incremental
261
+ - Low soundness, high excitement: Interesting idea but flawed execution
262
+ - Both should be reflected independently
263
+
264
+ **Overall Assessment**: Consider:
265
+ - Does the paper advance the field?
266
+ - Would the NLP community benefit from this work?
267
+ - Are the claimed contributions adequately supported?
268
+
269
+ ## References
270
+
271
+ Based on guidelines from:
272
+ - [ACL Rolling Review Reviewer Guidelines](https://aclrollingreview.org/reviewerguidelines)
273
+ - [ARR Review Form](https://aclrollingreview.org/reviewform)
274
+ - [EMNLP 2020: How to Write Good Reviews](https://2020.emnlp.org/blog/2020-05-17-write-good-reviews/)
275
+ - [ACL 2023 Review Process](https://2023.aclweb.org/blog/review-basics/)
.claude/skills/paper-write/SKILL.md ADDED
@@ -0,0 +1,402 @@
1
+ ---
2
+ name: paper-write
3
+ description: Write or improve technical reports following ACL/EMNLP conference standards. Generates publication-ready sections with proper structure and formatting.
4
+ argument-hint: "[section] or [output-file]"
5
+ ---
6
+
7
+ # Technical Paper Writing (ACL/EMNLP Standards)
8
+
9
+ Write or improve technical papers following ACL Rolling Review guidelines and best practices from top NLP conferences.
10
+
11
+ ## Target
12
+
13
+ **Arguments**: $ARGUMENTS
14
+
15
+ - If argument is a section name (abstract, introduction, methodology, experiments, related-work, conclusion, limitations), generate that section
16
+ - If argument is a file path, write the complete paper to that file
17
+ - If no argument, analyze the project and generate a complete TECHNICAL_REPORT.md
18
+
19
+ ## Writing Process
20
+
21
+ ### Step 1: Project Analysis
22
+
23
+ Before writing, analyze the codebase to understand:
24
+
25
+ 1. **Core Contribution**: What is the main technical contribution?
26
+ 2. **Methodology**: What algorithms/models/approaches are used?
27
+ 3. **Data**: What datasets are used for training/evaluation?
28
+ 4. **Results**: What are the key metrics and findings?
29
+ 5. **Implementation**: What are the technical details (hyperparameters, architecture)?
30
+
31
+ Search for:
32
+ - README.md, CLAUDE.md for project overview
33
+ - Training scripts for methodology details
34
+ - Evaluation scripts for metrics
35
+ - Config files for hyperparameters
36
+ - Model files for architecture
37
+
38
+ ### Step 2: Research Context
39
+
40
+ Use web search to contextualize the contribution:
41
+
42
+ 1. **State-of-the-art**: Search "[task] state of the art [year]"
43
+ 2. **Benchmarks**: Search "[dataset] benchmark leaderboard"
44
+ 3. **Related methods**: Search "[method] [task] comparison"
45
+ 4. **Prior work**: Search key paper titles for citations
46
+
47
+ ### Step 3: Write Paper
48
+
49
+ Follow the ACL paper structure below.
50
+
51
+ ---
52
+
53
+ ## ACL Paper Structure
54
+
55
+ ### 1. Title
56
+
57
+ - Concise and informative (max 12 words)
58
+ - Include: task, method, language/domain if specific
59
+ - Avoid: "Novel", "New", "Improved" without substance
60
+
61
+ **Good examples**:
62
+ - "Vietnamese POS Tagging with Conditional Random Fields"
63
+ - "BERT-based Named Entity Recognition for Legal Documents"
64
+
65
+ ### 2. Abstract (max 200 words)
66
+
67
+ Structure in 4-5 sentences:
68
+
69
+ ```
70
+ [Problem/Motivation] [Task] remains challenging due to [specific challenges].
71
+ [Approach] We present [method name], a [brief description of approach].
72
+ [Key Innovation] Our method [key differentiator from prior work].
73
+ [Results] Experiments on [dataset] show [main result with number].
74
+ [Impact/Availability] [Code/model availability statement].
75
+ ```
76
+
77
+ **Tips**:
78
+ - Be specific with numbers: "achieves 95.89% accuracy" not "achieves high accuracy"
79
+ - Avoid vague claims: "outperforms baselines" → "outperforms VnCoreNLP by 2.1%"
80
+ - Include reproducibility info if possible
81
+
82
+ ### 3. Introduction (1-1.5 pages)
83
+
84
+ **Paragraph 1: Problem & Motivation**
85
+ - What is the task? Why is it important?
86
+ - What are the real-world applications?
87
+
88
+ **Paragraph 2: Challenges**
89
+ - What makes this problem difficult?
90
+ - What are the specific challenges for this language/domain?
91
+
92
+ **Paragraph 3: Existing Approaches & Limitations**
93
+ - What methods have been tried?
94
+ - What are their limitations?
95
+
96
+ **Paragraph 4: Our Approach**
97
+ - What is your method?
98
+ - How does it address the limitations?
99
+
100
+ **Paragraph 5: Contributions** (bulleted list, max 3)
101
+ ```markdown
102
+ Our main contributions are:
103
+ - We propose [method] for [task], achieving [result]
104
+ - We release [dataset/model/code] for [purpose]
105
+ - We provide [analysis/insights] showing [finding]
106
+ ```
107
+
108
+ **Paragraph 6: Paper Organization** (optional)
109
+ - Brief roadmap of remaining sections
110
+
111
+ ### 4. Related Work (0.5-1 page)
112
+
113
+ Organize by themes, not chronologically:
114
+
115
+ ```markdown
116
+ ## Related Work
117
+
118
+ ### [Theme 1: e.g., "Traditional Approaches"]
119
+ [Discussion of rule-based, statistical methods...]
120
+
121
+ ### [Theme 2: e.g., "Neural Methods"]
122
+ [Discussion of deep learning approaches...]
123
+
124
+ ### [Theme 3: e.g., "Vietnamese NLP"]
125
+ [Discussion of language-specific work...]
126
+ ```
127
+
128
+ **Tips**:
129
+ - Cite 15-30 papers for a full paper
130
+ - Be fair to prior work - acknowledge their contributions
131
+ - Clearly state how your work differs
132
+ - Use ACL Anthology for citations when available
133
+
134
+ ### 5. Methodology (1.5-2 pages)
135
+
136
+ **5.1 Problem Formulation**
137
+ - Formal definition with mathematical notation
138
+ - Input/output specification
139
+
140
+ **5.2 Model Architecture**
141
+ - High-level overview (with figure if helpful)
142
+ - Detailed description of each component
143
+
144
+ **5.3 Feature Engineering** (if applicable)
145
+ - List all features with clear notation
146
+ - Justify feature choices
147
+
148
+ **5.4 Training**
149
+ - Loss function
150
+ - Optimization algorithm
151
+ - Hyperparameters (in table format)
152
+
153
+ ```markdown
154
+ | Parameter | Value | Description |
155
+ |-----------|-------|-------------|
156
+ | Learning rate | 0.001 | Adam optimizer |
157
+ | Batch size | 32 | Training batch |
158
+ | Epochs | 100 | Maximum iterations |
159
+ ```
160
+
161
+ ### 6. Experimental Setup (0.5-1 page)
162
+
163
+ **6.1 Datasets**
164
+
165
+ ```markdown
166
+ | Dataset | Train | Dev | Test | Domain |
167
+ |---------|-------|-----|------|--------|
168
+ | [Name] | [N] | [N] | [N] | [Domain] |
169
+ ```
170
+
171
+ Include:
172
+ - Source and citation
173
+ - Preprocessing steps
174
+ - Train/dev/test split rationale
175
+
176
+ **6.2 Baselines**
177
+ - List all baseline systems with citations
178
+ - Brief description of each
179
+
180
+ **6.3 Evaluation Metrics**
181
+ - Define each metric mathematically
182
+ - Justify metric choices for the task
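For classification tasks, "define each metric mathematically" can be as simple as stating the standard definitions once, where $TP$, $FP$, and $FN$ are true positives, false positives, and false negatives:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}
```

For multi-class tasks, also state whether macro- or micro-averaging is used, since the two can differ substantially on imbalanced data.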
183
+
184
+ **6.4 Implementation Details**
185
+ - Framework/library versions
186
+ - Hardware used
187
+ - Random seeds for reproducibility
188
+
189
+ ### 7. Results (1-1.5 pages)
190
+
191
+ **7.1 Main Results**
192
+
193
+ Present main comparison table:
194
+
195
+ ```markdown
196
+ | Model | Accuracy | Precision | Recall | F1 |
197
+ |-------|----------|-----------|--------|-----|
198
+ | Baseline 1 | X.XX | X.XX | X.XX | X.XX |
199
+ | Baseline 2 | X.XX | X.XX | X.XX | X.XX |
200
+ | **Ours** | **X.XX** | **X.XX** | **X.XX** | **X.XX** |
201
+ ```
202
+
203
+ **7.2 Analysis**
204
+ - Why does your method work?
205
+ - Per-class/category breakdown
206
+ - Statistical significance (p-values if applicable)
207
+
208
+ **7.3 Ablation Study**
209
+ - What happens when you remove components?
210
+ - Which features/components contribute most?
211
+
212
+ **7.4 Error Analysis**
213
+ - Common error patterns
214
+ - Failure cases with examples
215
+ - Linguistic analysis if applicable
216
+
217
+ ### 8. Discussion (optional, 0.5 page)
218
+
219
+ - Broader implications of findings
220
+ - Comparison with concurrent work
221
+ - Unexpected observations
222
+
223
+ ### 9. Conclusion (0.5 page)
224
+
225
+ **Paragraph 1: Summary**
226
+ - Restate the problem and your approach
227
+ - Highlight main results
228
+
229
+ **Paragraph 2: Limitations** (can be separate section)
230
+ - Honest assessment of limitations
231
+ - What doesn't your method handle well?
232
+
233
+ **Paragraph 3: Future Work**
234
+ - 2-3 concrete directions for future research
235
+
236
+ ### 10. Limitations Section (REQUIRED)
237
+
238
+ ACL requires a dedicated "Limitations" section. Include:
239
+
240
+ - Data limitations (domain, size, annotation quality)
241
+ - Method limitations (assumptions, failure cases)
242
+ - Evaluation limitations (metrics, benchmarks)
243
+ - Scope limitations (languages, tasks)
244
+
245
+ ```markdown
246
+ ## Limitations
247
+
248
+ This work has several limitations:
249
+
250
+ 1. **Data**: Our model is trained on [domain] data and may not generalize to [other domains].
251
+
252
+ 2. **Method**: [Specific limitation of the approach].
253
+
254
+ 3. **Evaluation**: We evaluate only on [dataset]; performance on other benchmarks is unknown.
255
+
256
+ 4. **Scope**: Our work focuses on [language/task]; extension to [other scenarios] requires further investigation.
257
+ ```
258
+
259
+ ### 11. Ethics Statement (if applicable)
260
+
261
+ Address:
262
+ - Data collection ethics
263
+ - Potential misuse
264
+ - Bias considerations
265
+ - Environmental impact (for large models)
266
+
267
+ ---
268
+
269
+ ## Formatting Guidelines
270
+
271
+ ### Page Limits
272
+ - **Long paper**: 8 pages content + unlimited references
273
+ - **Short paper**: 4 pages content + unlimited references
274
+ - Limitations, ethics, acknowledgments don't count
275
+
276
+ ### Required Elements
277
+ - [ ] Title (15pt bold)
278
+ - [ ] Abstract (max 200 words)
279
+ - [ ] Sections numbered with Arabic numerals
280
+ - [ ] Limitations section (after conclusion, before references)
281
+ - [ ] References (unnumbered)
282
+
283
+ ### Figures & Tables
284
+ - Number sequentially (Figure 1, Table 1)
285
+ - Captions below figures, above tables (10pt)
286
+ - Reference all figures/tables in text
287
+ - Ensure grayscale readability
288
+
289
+ ### Citations
290
+ - Use ACL Anthology when available
291
+ - Format: (Author, Year) or Author (Year)
292
+ - Include DOIs when available
293
+
294
+ ---
295
+
296
+ ## Writing Tips
297
+
298
+ ### DO:
299
+ - **Be specific**: Use numbers, not vague claims
300
+ - **Be honest**: Acknowledge limitations
301
+ - **Be concise**: Every sentence should add value
302
+ - **Be clear**: Define terms, explain notation
303
+ - **Be fair**: Give credit to prior work
304
+
305
+ ### DON'T:
306
+ - Oversell contributions
307
+ - Hide negative results
308
+ - Use excessive jargon
309
+ - Make claims without evidence
310
+ - Ignore reviewer guidelines
311
+
312
+ ### Common Mistakes to Avoid:
313
+ 1. Abstract that doesn't match paper content
314
+ 2. Introduction that's too long/detailed
315
+ 3. Related work that's just a list of papers
316
+ 4. Methodology without enough detail to reproduce
317
+ 5. Results without error analysis
318
+ 6. Missing or superficial limitations section
319
+
320
+ ---
321
+
322
+ ## Section Templates
323
+
324
+ ### Abstract Template
325
+ ```
326
+ [Task] is [importance/challenge]. Existing methods [limitation].
327
+ We present [method], which [key innovation].
328
+ Our approach [brief description].
329
+ Experiments on [dataset] demonstrate [main result].
330
+ [Additional contribution: code/data release].
331
+ ```
332
+
333
+ ### Introduction Contribution Template
334
+ ```markdown
335
+ Our main contributions are as follows:
336
+ - We propose **[Method Name]**, a [brief description] for [task] that [key advantage].
337
+ - We conduct extensive experiments on [datasets], achieving [specific result] and outperforming [baselines] by [margin].
338
+ - We release our [code/model/data] at [URL] to facilitate future research.
339
+ ```
340
+
341
+ ### Conclusion Template
342
+ ```
343
+ We presented [method] for [task]. Our approach [key innovation].
344
+ Experiments on [dataset] show [main findings].
345
+ Our analysis reveals [key insight].
346
+
347
+ Limitations include [honest limitations].
348
+
349
+ Future work includes [2-3 specific directions].
350
+ ```
351
+
352
+ ---
353
+
354
+ ## Checklist Before Submission
355
+
356
+ ### Content
357
+ - [ ] Abstract summarizes all key points
358
+ - [ ] Introduction clearly states contributions
359
+ - [ ] Related work is comprehensive and fair
360
+ - [ ] Methodology has enough detail to reproduce
361
+ - [ ] Experiments include baselines and ablations
362
+ - [ ] Results include error analysis
363
+ - [ ] Limitations section is substantive
364
+ - [ ] Conclusion matches actual contributions
365
+
366
+ ### Formatting
367
+ - [ ] Within page limits
368
+ - [ ] All figures/tables referenced in text
369
+ - [ ] All citations properly formatted
370
+ - [ ] No orphaned section headers
371
+ - [ ] Consistent notation throughout
372
+
373
+ ### Reproducibility
374
+ - [ ] Hyperparameters specified
375
+ - [ ] Dataset details provided
376
+ - [ ] Random seeds mentioned
377
+ - [ ] Code/data availability stated
378
+
379
+ ---
380
+
381
+ ## Output Format
382
+
383
+ Generate the paper in Markdown format with:
384
+
385
+ 1. Clear section headers (## for main sections, ### for subsections)
386
+ 2. Tables using Markdown table syntax
387
+ 3. Math using LaTeX notation ($...$ for inline, $$...$$ for display)
388
+ 4. Code blocks for algorithms/features
389
+ 5. Proper citation placeholders: (Author, Year)
390
+
391
+ Save to the specified output file or TECHNICAL_REPORT.md by default.
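As a sketch of these output rules (the section list and default file name mirror the guidance above; the function name `build_skeleton` is illustrative only), a generator might emit:

```python
# Illustrative sketch: assemble a Markdown skeleton following the output
# rules above (## headers for main sections, TODO placeholders per section).

SECTIONS = ["Abstract", "Introduction", "Related Work", "Methodology",
            "Experimental Setup", "Results", "Conclusion", "Limitations"]

def build_skeleton(title: str) -> str:
    """Return a Markdown skeleton with one ## header per section."""
    lines = [f"# {title}", ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)

skeleton = build_skeleton("My Technical Report")
print(skeleton.splitlines()[2])  # → "## Abstract"
```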
392
+
393
+ ---
394
+
395
+ ## References
396
+
397
+ Based on guidelines from:
398
+ - [ACL Paper Formatting Guidelines](https://acl-org.github.io/ACLPUB/formatting.html)
399
+ - [ACL Rolling Review Author Guidelines](http://aclrollingreview.org/authors)
400
+ - [Tips for Writing NLP Papers](https://medium.com/@vered1986/tips-for-writing-nlp-papers-9c729a2f9e1f)
401
+ - [Stanford Tips for Writing Technical Papers](https://cs.stanford.edu/people/widom/paper-writing.html)
402
+ - [EMNLP 2024 Call for Papers](https://2024.emnlp.org/calls/main_conference_papers/)
.gitattributes ADDED
@@ -0,0 +1,6 @@
1
+ *.pdf filter=lfs diff=lfs merge=lfs -text
2
+ *.png filter=lfs diff=lfs merge=lfs -text
3
+ *.jpg filter=lfs diff=lfs merge=lfs -text
4
+ *.jpeg filter=lfs diff=lfs merge=lfs -text
5
+ *.gif filter=lfs diff=lfs merge=lfs -text
6
+ *.synctex filter=lfs diff=lfs merge=lfs -text
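These rules route binary files to git-lfs by glob pattern. A rough sketch of the matching logic (Python's `fnmatch` approximates, but does not fully implement, gitattributes pattern semantics such as directory anchoring and `**`):

```python
# Rough sketch: decide whether a path matches one of the LFS patterns
# above. Patterns without a slash match against the basename only.
from fnmatch import fnmatch
import posixpath

LFS_PATTERNS = ["*.pdf", "*.png", "*.jpg", "*.jpeg", "*.gif", "*.synctex"]

def tracked_by_lfs(path: str) -> bool:
    name = posixpath.basename(path)
    return any(fnmatch(name, pat) for pat in LFS_PATTERNS)

print(tracked_by_lfs("references/2019.arxiv.conneau/paper.pdf"))  # True
print(tracked_by_lfs("references/2019.arxiv.conneau/paper.md"))   # False
```

In the real repository these decisions are made by git itself; `git lfs track "*.pdf"` is what appends such lines to .gitattributes.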
references/2007.rivf.hoang/paper.md ADDED
@@ -0,0 +1,39 @@
1
+ ---
2
+ title: "A Comparative Study on Vietnamese Text Classification Methods"
3
+ authors:
4
+ - "Cong Duy Vu Hoang"
5
+ - "Dien Dinh"
6
+ - "Le Nguyen Nguyen"
7
+ - "Quoc Hung Ngo"
8
+ year: 2007
9
+ venue: "IEEE RIVF 2007"
10
+ url: "https://ieeexplore.ieee.org/document/4223084/"
11
+ ---
12
+
13
+ # A Comparative Study on Vietnamese Text Classification Methods
14
+
15
+ ## Abstract
16
+
17
+ This paper presents two different approaches for Vietnamese text classification: Bag of Words (BOW) and statistical N-gram language modeling. On a Vietnamese news corpus, these approaches achieve over 95% accuracy on average, with an average classification time of 79 minutes for about 14,000 documents.
18
+
19
+ ## Key Contributions
20
+
21
+ 1. Introduced the VNTC (Vietnamese News Text Classification) corpus
22
+ 2. Compared BOW and N-gram language model approaches for Vietnamese text classification
23
+ 3. Demonstrated SVM effectiveness for Vietnamese text
24
+
25
+ ## Results
26
+
27
+ | Method | Accuracy |
28
+ |--------|----------|
29
+ | N-gram LM | 97.1% |
30
+ | SVM Multi | 93.4% |
31
+ | BOW + SVM | ~92% |
32
+
33
+ ## Dataset
34
+
35
+ - VNTC: Vietnamese News Text Classification Corpus
36
+ - 10 topics: Politics, Lifestyle, Science, Business, Law, Health, World, Sports, Culture, Technology
37
+ - Available: https://github.com/duyvuleo/VNTC
38
+
39
+ *Full text available at IEEE Xplore*
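Each reference file carries YAML frontmatter like the block above (title, authors, year, venue, url). As an illustrative sketch (not taken from the repo's paper_db.py), simple scalar fields can be read without a YAML dependency; list fields such as `authors` would need a real YAML parser:

```python
# Illustrative sketch: pull simple scalar fields out of the '--- ... ---'
# frontmatter block used by the reference paper.md files. List fields
# (e.g. 'authors') are not handled here.

def read_frontmatter(text: str) -> dict:
    fields = {}
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return fields
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter block
        if ":" in line and not line.startswith((" ", "-")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return fields

sample = '---\ntitle: "A Comparative Study"\nyear: 2007\nvenue: "IEEE RIVF 2007"\n---\n# Body'
print(read_frontmatter(sample)["venue"])  # → IEEE RIVF 2007
```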
references/2018.kse.nguyen/paper.md ADDED
@@ -0,0 +1,32 @@
1
+ ---
2
+ title: "UIT-VSFC: Vietnamese Students' Feedback Corpus for Sentiment Analysis"
3
+ authors:
4
+ - "Kiet Van Nguyen"
5
+ - "Vu Duc Nguyen"
6
+ - "Phu Xuan-Vinh Nguyen"
7
+ - "Tham Thi-Hong Truong"
8
+ - "Ngan Luu-Thuy Nguyen"
9
+ year: 2018
10
+ venue: "KSE 2018"
11
+ url: "https://ieeexplore.ieee.org/document/8573337/"
12
+ ---
13
+
14
+ # UIT-VSFC: Vietnamese Students' Feedback Corpus for Sentiment Analysis
15
+
16
+ ## Abstract
17
+
18
+ Vietnamese Students' Feedback Corpus (UIT-VSFC) is a resource consisting of over 16,000 sentences which are human-annotated with two different tasks: sentiment-based and topic-based classifications.
19
+
20
+ The corpus has an inter-annotator agreement of 91.20% for the sentiment-based task and 71.07% for the topic-based task.
21
+
22
+ Baseline models built with the Maximum Entropy classifier achieved a sentiment F1-score of approximately 88% and a topic F1-score of over 84%.
23
+
24
+ ## Dataset Statistics
25
+
26
+ - Total sentences: 16,175
27
+ - Sentiment labels: Positive, Negative, Neutral
28
+ - Topic labels: Multiple categories
29
+ - Domain: Vietnamese university student feedback
30
+ - Available at: https://huggingface.co/datasets/uitnlp/vietnamese_students_feedback
31
+
32
+ *Full text available at IEEE Xplore*
references/2019.arxiv.conneau/paper.md ADDED
@@ -0,0 +1,35 @@
1
+ ---
2
+ title: "Unsupervised Cross-lingual Representation Learning at Scale"
3
+ authors:
4
+ - "Alexis Conneau"
5
+ - "Kartikay Khandelwal"
6
+ - "Naman Goyal"
7
+ - "Vishrav Chaudhary"
8
+ - "Guillaume Wenzek"
9
+ - "Francisco Guzmán"
10
+ - "Edouard Grave"
11
+ - "Myle Ott"
12
+ - "Luke Zettlemoyer"
13
+ - "Veselin Stoyanov"
14
+ year: 2019
15
+ venue: "arXiv"
16
+ url: "https://arxiv.org/abs/1911.02116"
17
+ arxiv: "1911.02116"
18
+ ---
19
+
20
24
+ # Supplementary materials
25
+ # Languages and statistics for CC-100 used by XLM-R
26
+ In this section we present the list of languages in the CC-100 corpus we created for training XLM-R. We also report statistics such as the number of tokens and the size of each monolingual corpus.
27
+ \label{sec:appendix_A}
28
+ \insertDataStatistics
29
+
30
+ \newpage
31
+ # Model Architectures and Sizes
32
+ As we showed in section 5, capacity is an important parameter for learning strong cross-lingual representations. In the table below, we list multiple monolingual and multilingual models used by the research community and summarize their architectures and total number of parameters.
33
+ \label{sec:appendix_B}
34
+
35
+ \insertParameters
references/2019.arxiv.conneau/paper.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bf2fbb1aa1805ab6f892a4a421ffd4d7575df37343980b9a3729855577d2d8a1
3
+ size 398981
references/2019.arxiv.conneau/paper.tex ADDED
@@ -0,0 +1,45 @@
1
+ \documentclass[11pt,a4paper]{article}
2
+ \usepackage[hyperref]{acl2020}
3
+ \usepackage{times}
4
+ \usepackage{latexsym}
5
+ \renewcommand{\UrlFont}{\ttfamily\small}
6
+
7
+ % This is not strictly necessary, and may be commented out,
8
+ % but it will improve the layout of the manuscript,
9
+ % and will typically save some space.
10
+ \usepackage{microtype}
11
+ \usepackage{graphicx}
12
+ \usepackage{subfigure}
13
+ \usepackage{booktabs} % for professional tables
14
+ \usepackage{url}
15
+ \usepackage{times}
16
+ \usepackage{latexsym}
17
+ \usepackage{array}
18
+ \usepackage{adjustbox}
19
+ \usepackage{multirow}
20
+ % \usepackage{subcaption}
21
+ \usepackage{hyperref}
22
+ \usepackage{longtable}
23
+ \usepackage{bibentry}
+ \usepackage{xspace} % required by the \xlmr and \mbert macros below
24
+ \newcommand{\xlmr}{\textit{XLM-R}\xspace}
25
+ \newcommand{\mbert}{mBERT\xspace}
26
+ \input{content/tables}
27
+
28
+ \begin{document}
29
+ \nobibliography{acl2020}
30
+ \bibliographystyle{acl_natbib}
31
+ \appendix
32
+ \onecolumn
33
+ \section*{Supplementary materials}
34
+ \section{Languages and statistics for CC-100 used by \xlmr}
35
+ In this section we present the list of languages in the CC-100 corpus we created for training \xlmr. We also report statistics such as the number of tokens and the size of each monolingual corpus.
36
+ \label{sec:appendix_A}
37
+ \insertDataStatistics
38
+
39
+ \newpage
40
+ \section{Model Architectures and Sizes}
41
+ As we showed in section 5, capacity is an important parameter for learning strong cross-lingual representations. In the table below, we list multiple monolingual and multilingual models used by the research community and summarize their architectures and total number of parameters.
42
+ \label{sec:appendix_B}
43
+
44
+ \insertParameters
45
+ \end{document}
references/2019.arxiv.conneau/source/XLMR Paper/acl2020.bib ADDED
@@ -0,0 +1,739 @@
1
+ @inproceedings{koehn2007moses,
2
+ title={Moses: Open source toolkit for statistical machine translation},
3
+ author={Koehn, Philipp and Hoang, Hieu and Birch, Alexandra and Callison-Burch, Chris and Federico, Marcello and Bertoldi, Nicola and Cowan, Brooke and Shen, Wade and Moran, Christine and Zens, Richard and others},
4
+ booktitle={Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions},
5
+ pages={177--180},
6
+ year={2007},
7
+ organization={Association for Computational Linguistics}
8
+ }
9
+
10
+ @article{xie2019unsupervised,
11
+ title={Unsupervised data augmentation for consistency training},
12
+ author={Xie, Qizhe and Dai, Zihang and Hovy, Eduard and Luong, Minh-Thang and Le, Quoc V},
13
+ journal={arXiv preprint arXiv:1904.12848},
14
+ year={2019}
15
+ }
16
+
17
+ @article{baevski2018adaptive,
18
+ title={Adaptive input representations for neural language modeling},
19
+ author={Baevski, Alexei and Auli, Michael},
20
+ journal={arXiv preprint arXiv:1809.10853},
21
+ year={2018}
22
+ }
23
+
24
+ @article{wu2019emerging,
25
+ title={Emerging Cross-lingual Structure in Pretrained Language Models},
26
+ author={Wu, Shijie and Conneau, Alexis and Li, Haoran and Zettlemoyer, Luke and Stoyanov, Veselin},
27
+ journal={ACL},
28
+ year={2019}
29
+ }
30
+
31
+ @inproceedings{grave2017efficient,
32
+ title={Efficient softmax approximation for GPUs},
33
+ author={Grave, Edouard and Joulin, Armand and Ciss{\'e}, Moustapha and J{\'e}gou, Herv{\'e} and others},
34
+ booktitle={Proceedings of the 34th International Conference on Machine Learning-Volume 70},
35
+ pages={1302--1310},
36
+ year={2017},
37
+ organization={JMLR. org}
38
+ }
39
+
40
+ @article{sang2002introduction,
41
+ title={Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition},
42
+ author={Tjong Kim Sang, Erik F.},
43
+ journal={CoNLL},
44
+ year={2002}
45
+ }
46
+
47
+ @article{singh2019xlda,
48
+ title={XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering},
49
+ author={Singh, Jasdeep and McCann, Bryan and Keskar, Nitish Shirish and Xiong, Caiming and Socher, Richard},
50
+ journal={arXiv preprint arXiv:1905.11471},
51
+ year={2019}
52
+ }
53
+
54
+ @inproceedings{tjong2003introduction,
55
+ title={Introduction to the CoNLL-2003 shared task: language-independent named entity recognition},
56
+ author={Tjong Kim Sang, Erik F and De Meulder, Fien},
57
+ booktitle={CoNLL},
58
+ pages={142--147},
59
+ year={2003},
60
+ organization={Association for Computational Linguistics}
61
+ }
62
+
63
+ @misc{ud-v2.3,
64
+ title = {Universal Dependencies 2.3},
65
+ author = {Nivre, Joakim et al.},
66
+ url = {http://hdl.handle.net/11234/1-2895},
67
+ note = {{LINDAT}/{CLARIN} digital library at the Institute of Formal and Applied Linguistics ({{\'U}FAL}), Faculty of Mathematics and Physics, Charles University},
+ copyright = {Licence Universal Dependencies v2.3},
+ year = {2018} }
+
+
+ @article{huang2019unicoder,
+ title={Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks},
+ author={Huang, Haoyang and Liang, Yaobo and Duan, Nan and Gong, Ming and Shou, Linjun and Jiang, Daxin and Zhou, Ming},
+ journal={ACL},
+ year={2019}
+ }
+
+ @article{kingma2014adam,
+ title={Adam: A method for stochastic optimization},
+ author={Kingma, Diederik P and Ba, Jimmy},
+ journal={arXiv preprint arXiv:1412.6980},
+ year={2014}
+ }
+
+
+ @article{bojanowski2017enriching,
+ title={Enriching word vectors with subword information},
+ author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
+ journal={TACL},
+ volume={5},
+ pages={135--146},
+ year={2017},
+ publisher={MIT Press}
+ }
+
+ @article{werbos1990backpropagation,
+ title={Backpropagation through time: what it does and how to do it},
+ author={Werbos, Paul J},
+ journal={Proceedings of the IEEE},
+ volume={78},
+ number={10},
+ pages={1550--1560},
+ year={1990},
+ publisher={IEEE}
+ }
+
+ @article{hochreiter1997long,
+ title={Long short-term memory},
+ author={Hochreiter, Sepp and Schmidhuber, J{\"u}rgen},
+ journal={Neural computation},
+ volume={9},
+ number={8},
+ pages={1735--1780},
+ year={1997},
+ publisher={MIT Press}
+ }
+
+ @article{al2018character,
+ title={Character-level language modeling with deeper self-attention},
+ author={Al-Rfou, Rami and Choe, Dokook and Constant, Noah and Guo, Mandy and Jones, Llion},
+ journal={arXiv preprint arXiv:1808.04444},
+ year={2018}
+ }
+
+ @misc{dai2019transformerxl,
+ title={Transformer-{XL}: Language Modeling with Longer-Term Dependency},
+ author={Zihang Dai and Zhilin Yang and Yiming Yang and William W. Cohen and Jaime Carbonell and Quoc V. Le and Ruslan Salakhutdinov},
+ year={2019},
+ url={https://openreview.net/forum?id=HJePno0cYm},
+ }
+
+ @article{jozefowicz2016exploring,
+ title={Exploring the limits of language modeling},
+ author={Jozefowicz, Rafal and Vinyals, Oriol and Schuster, Mike and Shazeer, Noam and Wu, Yonghui},
+ journal={arXiv preprint arXiv:1602.02410},
+ year={2016}
+ }
+
+ @inproceedings{mikolov2010recurrent,
+ title={Recurrent neural network based language model},
+ author={Mikolov, Tom{\'a}{\v{s}} and Karafi{\'a}t, Martin and Burget, Luk{\'a}{\v{s}} and {\v{C}}ernock{\`y}, Jan and Khudanpur, Sanjeev},
+ booktitle={Eleventh Annual Conference of the International Speech Communication Association},
+ year={2010}
+ }
+
+ @article{gehring2017convolutional,
+ title={Convolutional sequence to sequence learning},
+ author={Gehring, Jonas and Auli, Michael and Grangier, David and Yarats, Denis and Dauphin, Yann N},
+ journal={arXiv preprint arXiv:1705.03122},
+ year={2017}
+ }
+
+ @article{sennrich2016edinburgh,
+ title={Edinburgh neural machine translation systems for {WMT} 16},
+ author={Sennrich, Rico and Haddow, Barry and Birch, Alexandra},
+ journal={arXiv preprint arXiv:1606.02891},
+ year={2016}
+ }
+
+ @inproceedings{howard2018universal,
+ title={Universal language model fine-tuning for text classification},
+ author={Howard, Jeremy and Ruder, Sebastian},
+ booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+ volume={1},
+ pages={328--339},
+ year={2018}
+ }
+
+ @inproceedings{unsupNMTartetxe,
+ title = {Unsupervised neural machine translation},
+ author = {Mikel Artetxe and Gorka Labaka and Eneko Agirre and Kyunghyun Cho},
+ booktitle = {International Conference on Learning Representations (ICLR)},
+ year = {2018}
+ }
+
+ @inproceedings{artetxe2017learning,
+ title={Learning bilingual word embeddings with (almost) no bilingual data},
+ author={Artetxe, Mikel and Labaka, Gorka and Agirre, Eneko},
+ booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+ volume={1},
+ pages={451--462},
+ year={2017}
+ }
+
+ @inproceedings{socher2013recursive,
+ title={Recursive deep models for semantic compositionality over a sentiment treebank},
+ author={Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D and Ng, Andrew and Potts, Christopher},
+ booktitle={EMNLP},
+ pages={1631--1642},
+ year={2013}
+ }
+
+ @inproceedings{bowman2015large,
+ title={A large annotated corpus for learning natural language inference},
+ author={Bowman, Samuel R. and Angeli, Gabor and Potts, Christopher and Manning, Christopher D.},
+ booktitle={EMNLP},
+ year={2015}
+ }
+
+ @inproceedings{multinli:2017,
+ title = {A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference},
+ author = {Adina Williams and Nikita Nangia and Samuel R. Bowman},
+ booktitle = {NAACL},
+ year = {2017}
+ }
+
+ @article{paszke2017automatic,
+ title={Automatic differentiation in {PyTorch}},
+ author={Paszke, Adam and Gross, Sam and Chintala, Soumith and Chanan, Gregory and Yang, Edward and DeVito, Zachary and Lin, Zeming and Desmaison, Alban and Antiga, Luca and Lerer, Adam},
+ journal={NIPS 2017 Autodiff Workshop},
+ year={2017}
+ }
+
+ @inproceedings{conneau2018craminto,
+ title={What you can cram into a single vector: Probing sentence embeddings for linguistic properties},
+ author={Conneau, Alexis and Kruszewski, German and Lample, Guillaume and Barrault, Lo{\"\i}c and Baroni, Marco},
+ booktitle = {ACL},
+ year={2018}
+ }
+
+ @inproceedings{Conneau:2018:iclr_muse,
+ title={Word Translation without Parallel Data},
+ author={Alexis Conneau and Guillaume Lample and {Marc'Aurelio} Ranzato and Ludovic Denoyer and Herv{\'e} J{\'e}gou},
+ booktitle = {ICLR},
+ year={2018}
+ }
+
+ @article{johnson2017google,
+ title={Google's multilingual neural machine translation system: Enabling zero-shot translation},
+ author={Johnson, Melvin and Schuster, Mike and Le, Quoc V and Krikun, Maxim and Wu, Yonghui and Chen, Zhifeng and Thorat, Nikhil and Vi{\'e}gas, Fernanda and Wattenberg, Martin and Corrado, Greg and others},
+ journal={TACL},
+ volume={5},
+ pages={339--351},
+ year={2017},
+ publisher={MIT Press}
+ }
+
+ @article{radford2019language,
+ title={Language models are unsupervised multitask learners},
+ author={Radford, Alec and Wu, Jeffrey and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
+ journal={OpenAI Blog},
+ volume={1},
+ number={8},
+ year={2019}
+ }
+
+ @inproceedings{unsupNMTlample,
+ title = {Unsupervised machine translation using monolingual corpora only},
+ author = {Lample, Guillaume and Conneau, Alexis and Denoyer, Ludovic and Ranzato, Marc'Aurelio},
+ booktitle = {ICLR},
+ year = {2018}
+ }
+
+ @inproceedings{lample2018phrase,
+ title={Phrase-Based \& Neural Unsupervised Machine Translation},
+ author={Lample, Guillaume and Ott, Myle and Conneau, Alexis and Denoyer, Ludovic and Ranzato, Marc'Aurelio},
+ booktitle={EMNLP},
+ year={2018}
+ }
+
+ @article{hendrycks2016bridging,
+ title={Bridging nonlinearities and stochastic regularizers with Gaussian error linear units},
+ author={Hendrycks, Dan and Gimpel, Kevin},
+ journal={arXiv preprint arXiv:1606.08415},
+ year={2016}
+ }
+
+ @inproceedings{chang2008optimizing,
+ title={Optimizing Chinese word segmentation for machine translation performance},
+ author={Chang, Pi-Chuan and Galley, Michel and Manning, Christopher D},
+ booktitle={Proceedings of the third workshop on statistical machine translation},
+ pages={224--232},
+ year={2008}
+ }
+
+ @inproceedings{rajpurkar-etal-2016-squad,
+ title = "{SQ}u{AD}: 100,000+ Questions for Machine Comprehension of Text",
+ author = "Rajpurkar, Pranav and
+ Zhang, Jian and
+ Lopyrev, Konstantin and
+ Liang, Percy",
+ booktitle = "EMNLP",
+ month = nov,
+ year = "2016",
+ address = "Austin, Texas",
+ publisher = "Association for Computational Linguistics",
+ url = "https://www.aclweb.org/anthology/D16-1264",
+ doi = "10.18653/v1/D16-1264",
+ pages = "2383--2392",
+ }
+
+ @article{lewis2019mlqa,
+ title={MLQA: Evaluating Cross-lingual Extractive Question Answering},
+ author={Lewis, Patrick and O{\u{g}}uz, Barlas and Rinott, Ruty and Riedel, Sebastian and Schwenk, Holger},
+ journal={arXiv preprint arXiv:1910.07475},
+ year={2019}
+ }
+
+ @inproceedings{sennrich2015neural,
+ title={Neural machine translation of rare words with subword units},
+ author={Sennrich, Rico and Haddow, Barry and Birch, Alexandra},
+ booktitle={Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics},
+ pages = {1715--1725},
+ year={2016}
+ }
+
+ @article{eriguchi2018zero,
+ title={Zero-shot cross-lingual classification using multilingual neural machine translation},
+ author={Eriguchi, Akiko and Johnson, Melvin and Firat, Orhan and Kazawa, Hideto and Macherey, Wolfgang},
+ journal={arXiv preprint arXiv:1809.04686},
+ year={2018}
+ }
+
+ @article{smith2017offline,
+ title={Offline bilingual word vectors, orthogonal transformations and the inverted softmax},
+ author={Smith, Samuel L and Turban, David HP and Hamblin, Steven and Hammerla, Nils Y},
+ journal={International Conference on Learning Representations},
+ year={2017}
+ }
+
+ @article{artetxe2016learning,
+ title={Learning principled bilingual mappings of word embeddings while preserving monolingual invariance},
+ author={Artetxe, Mikel and Labaka, Gorka and Agirre, Eneko},
+ journal={Proceedings of EMNLP},
+ year={2016}
+ }
+
+ @article{ammar2016massively,
+ title={Massively multilingual word embeddings},
+ author={Ammar, Waleed and Mulcaire, George and Tsvetkov, Yulia and Lample, Guillaume and Dyer, Chris and Smith, Noah A},
+ journal={arXiv preprint arXiv:1602.01925},
+ year={2016}
+ }
+
+ @article{marcobaroni2015hubness,
+ title={Hubness and pollution: Delving into cross-space mapping for zero-shot learning},
+ author={Lazaridou, Angeliki and Dinu, Georgiana and Baroni, Marco},
+ journal={Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics},
+ year={2015}
+ }
+
+ @article{xing2015normalized,
+ title={Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation},
+ author={Xing, Chao and Wang, Dong and Liu, Chao and Lin, Yiye},
+ journal={Proceedings of NAACL},
+ year={2015}
+ }
+
+ @article{faruqui2014improving,
+ title={Improving Vector Space Word Representations Using Multilingual Correlation},
+ author={Faruqui, Manaal and Dyer, Chris},
+ journal={Proceedings of EACL},
+ year={2014}
+ }
+
+ @article{taylor1953cloze,
+ title={``Cloze procedure'': A new tool for measuring readability},
+ author={Taylor, Wilson L},
+ journal={Journalism Bulletin},
+ volume={30},
+ number={4},
+ pages={415--433},
+ year={1953},
+ publisher={SAGE Publications Sage CA: Los Angeles, CA}
+ }
+
+ @inproceedings{mikolov2013distributed,
+ title={Distributed representations of words and phrases and their compositionality},
+ author={Mikolov, Tomas and Sutskever, Ilya and Chen, Kai and Corrado, Greg S and Dean, Jeff},
+ booktitle={NIPS},
+ pages={3111--3119},
+ year={2013}
+ }
+
+ @article{mikolov2013exploiting,
+ title={Exploiting similarities among languages for machine translation},
+ author={Mikolov, Tomas and Le, Quoc V and Sutskever, Ilya},
+ journal={arXiv preprint arXiv:1309.4168},
+ year={2013}
+ }
+
+ @article{artetxe2018massively,
+ title={Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond},
+ author={Artetxe, Mikel and Schwenk, Holger},
+ journal={arXiv preprint arXiv:1812.10464},
+ year={2018}
+ }
+
+ @article{williams2017broad,
+ title={A broad-coverage challenge corpus for sentence understanding through inference},
+ author={Williams, Adina and Nangia, Nikita and Bowman, Samuel R},
+ journal={Proceedings of the 2nd Workshop on Evaluating Vector-Space Representations for NLP},
+ year={2017}
+ }
+
+ @inproceedings{conneau2018xnli,
+ author = "Conneau, Alexis
+ and Rinott, Ruty
+ and Lample, Guillaume
+ and Williams, Adina
+ and Bowman, Samuel R.
+ and Schwenk, Holger
+ and Stoyanov, Veselin",
+ title = "XNLI: Evaluating Cross-lingual Sentence Representations",
+ booktitle = "EMNLP",
+ year = "2018",
+ publisher = "Association for Computational Linguistics",
+ location = "Brussels, Belgium",
+ }
+
+ @article{wada2018unsupervised,
+ title={Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Models},
+ author={Wada, Takashi and Iwata, Tomoharu},
+ journal={arXiv preprint arXiv:1809.02306},
+ year={2018}
+ }
+
+ @article{xu2013cross,
+ title={Cross-lingual language modeling for low-resource speech recognition},
+ author={Xu, Ping and Fung, Pascale},
+ journal={IEEE Transactions on Audio, Speech, and Language Processing},
+ volume={21},
+ number={6},
+ pages={1134--1144},
+ year={2013},
+ publisher={IEEE}
+ }
+
+ @article{hermann2014multilingual,
+ title={Multilingual models for compositional distributed semantics},
+ author={Hermann, Karl Moritz and Blunsom, Phil},
+ journal={arXiv preprint arXiv:1404.4641},
+ year={2014}
+ }
+
+ @inproceedings{transformer17,
+ title = {Attention is all you need},
+ author = {Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
+ booktitle={Advances in Neural Information Processing Systems},
+ pages={6000--6010},
+ year = {2017}
+ }
+
+ @article{liu2019multi,
+ title={Multi-task deep neural networks for natural language understanding},
+ author={Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng},
+ journal={arXiv preprint arXiv:1901.11504},
+ year={2019}
+ }
+
+ @article{wang2018glue,
+ title={GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
+ author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R},
+ journal={arXiv preprint arXiv:1804.07461},
+ year={2018}
+ }
+
+ @article{radford2018improving,
+ title={Improving language understanding by generative pre-training},
+ author={Radford, Alec and Narasimhan, Karthik and Salimans, Tim and Sutskever, Ilya},
+ url={https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf},
+ year={2018}
+ }
+
+ @article{conneau2018senteval,
+ title={SentEval: An Evaluation Toolkit for Universal Sentence Representations},
+ author={Conneau, Alexis and Kiela, Douwe},
+ journal={LREC},
+ year={2018}
+ }
+
+ @article{devlin2018bert,
+ title={{BERT}: Pre-training of deep bidirectional transformers for language understanding},
+ author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
+ journal={NAACL},
+ year={2018}
+ }
+
+ @article{peters2018deep,
+ title={Deep contextualized word representations},
+ author={Peters, Matthew E and Neumann, Mark and Iyyer, Mohit and Gardner, Matt and Clark, Christopher and Lee, Kenton and Zettlemoyer, Luke},
+ journal={NAACL},
+ year={2018}
+ }
+
+ @article{ramachandran2016unsupervised,
+ title={Unsupervised pretraining for sequence to sequence learning},
+ author={Ramachandran, Prajit and Liu, Peter J and Le, Quoc V},
+ journal={arXiv preprint arXiv:1611.02683},
+ year={2016}
+ }
+
+ @inproceedings{kunchukuttan2018iit,
+ title={The IIT Bombay English-Hindi Parallel Corpus},
+ author={Kunchukuttan, Anoop and Mehta, Pratik and Bhattacharyya, Pushpak},
+ booktitle={LREC},
+ year={2018}
+ }
+
+ @article{wu2019beto,
+ title={Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT},
+ author={Wu, Shijie and Dredze, Mark},
+ journal={EMNLP},
+ year={2019}
+ }
+
+ @inproceedings{lample-etal-2016-neural,
+ title = "Neural Architectures for Named Entity Recognition",
+ author = "Lample, Guillaume and
+ Ballesteros, Miguel and
+ Subramanian, Sandeep and
+ Kawakami, Kazuya and
+ Dyer, Chris",
+ booktitle = "NAACL",
+ month = jun,
+ year = "2016",
+ address = "San Diego, California",
+ publisher = "Association for Computational Linguistics",
+ url = "https://www.aclweb.org/anthology/N16-1030",
+ doi = "10.18653/v1/N16-1030",
+ pages = "260--270",
+ }
+
+ @inproceedings{akbik2018coling,
+ title={Contextual String Embeddings for Sequence Labeling},
+ author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
+ booktitle = {COLING},
+ pages = {1638--1649},
+ year = {2018}
+ }
+
+ @inproceedings{tjong-kim-sang-de-meulder-2003-introduction,
+ title = "Introduction to the {C}o{NLL}-2003 Shared Task: Language-Independent Named Entity Recognition",
+ author = "Tjong Kim Sang, Erik F. and
+ De Meulder, Fien",
+ booktitle = "Proceedings of the Seventh Conference on Natural Language Learning at {HLT}-{NAACL} 2003",
+ year = "2003",
+ url = "https://www.aclweb.org/anthology/W03-0419",
+ pages = "142--147",
+ }
+
+ @inproceedings{tjong-kim-sang-2002-introduction,
+ title = "Introduction to the {C}o{NLL}-2002 Shared Task: Language-Independent Named Entity Recognition",
+ author = "Tjong Kim Sang, Erik F.",
+ booktitle = "{COLING}-02: The 6th Conference on Natural Language Learning 2002 ({C}o{NLL}-2002)",
+ year = "2002",
+ url = "https://www.aclweb.org/anthology/W02-2024",
+ }
+
+ @inproceedings{TIEDEMANN12.463,
+ author = {J{\"o}rg Tiedemann},
+ title = {Parallel Data, Tools and Interfaces in OPUS},
+ booktitle = {LREC},
+ year = {2012},
+ month = {may},
+ date = {23-25},
+ address = {Istanbul, Turkey},
+ editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Ugur Dogan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis},
+ publisher = {European Language Resources Association (ELRA)},
+ isbn = {978-2-9517408-7-7},
+ language = {english}
+ }
+
+ @inproceedings{ziemski2016united,
+ title={The United Nations Parallel Corpus v1.0},
+ author={Ziemski, Michal and Junczys-Dowmunt, Marcin and Pouliquen, Bruno},
+ booktitle={LREC},
+ year={2016}
+ }
+
+ @article{roberta2019,
+ author = {Yinhan Liu and
+ Myle Ott and
+ Naman Goyal and
+ Jingfei Du and
+ Mandar Joshi and
+ Danqi Chen and
+ Omer Levy and
+ Mike Lewis and
+ Luke Zettlemoyer and
+ Veselin Stoyanov},
+ title = {RoBERTa: {A} Robustly Optimized {BERT} Pretraining Approach},
+ journal = {arXiv preprint arXiv:1907.11692},
+ year = {2019}
+ }
+
+
+ @article{tan2019multilingual,
+ title={Multilingual neural machine translation with knowledge distillation},
+ author={Tan, Xu and Ren, Yi and He, Di and Qin, Tao and Zhao, Zhou and Liu, Tie-Yan},
+ journal={ICLR},
+ year={2019}
+ }
+
+ @article{siddhant2019evaluating,
+ title={Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation},
+ author={Siddhant, Aditya and Johnson, Melvin and Tsai, Henry and Arivazhagan, Naveen and Riesa, Jason and Bapna, Ankur and Firat, Orhan and Raman, Karthik},
+ journal={AAAI},
+ year={2019}
+ }
+
+ @inproceedings{camacho2017semeval,
+ title={{SemEval}-2017 task 2: Multilingual and cross-lingual semantic word similarity},
+ author={Camacho-Collados, Jose and Pilehvar, Mohammad Taher and Collier, Nigel and Navigli, Roberto},
+ booktitle={Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)},
+ pages={15--26},
+ year={2017}
+ }
+
+ @inproceedings{Pires2019HowMI,
+ title={How Multilingual is Multilingual BERT?},
+ author={Telmo Pires and Eva Schlinger and Dan Garrette},
+ booktitle={ACL},
+ year={2019}
+ }
+
+ @article{lample2019cross,
+ title={Cross-lingual language model pretraining},
+ author={Lample, Guillaume and Conneau, Alexis},
+ journal={NeurIPS},
+ year={2019}
+ }
+
+ @article{schuster2019cross,
+ title={Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing},
+ author={Schuster, Tal and Ram, Ori and Barzilay, Regina and Globerson, Amir},
+ journal={NAACL},
+ year={2019}
+ }
+
+
+ @inproceedings{koehn2007moses,
+ title={Moses: Open source toolkit for statistical machine translation},
+ author={Koehn, Philipp and Hoang, Hieu and Birch, Alexandra and Callison-Burch, Chris and Federico, Marcello and Bertoldi, Nicola and Cowan, Brooke and Shen, Wade and Moran, Christine and Zens, Richard and others},
+ booktitle={Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions},
+ pages={177--180},
+ year={2007},
+ organization={Association for Computational Linguistics}
+ }
+
+ @article{wenzek2019ccnet,
+ title={CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data},
+ author={Wenzek, Guillaume and Lachaux, Marie-Anne and Conneau, Alexis and Chaudhary, Vishrav and Guzman, Francisco and Joulin, Armand and Grave, Edouard},
+ journal={arXiv preprint arXiv:1911.00359},
+ year={2019}
+ }
+
+ @inproceedings{zhou2016cross,
+ title={Cross-lingual sentiment classification with bilingual document representation learning},
+ author={Zhou, Xinjie and Wan, Xiaojun and Xiao, Jianguo},
+ booktitle={Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+ pages={1403--1412},
+ year={2016}
+ }
+
+ @article{goyal2017accurate,
+ title={Accurate, large minibatch {SGD}: Training {ImageNet} in 1 hour},
+ author={Goyal, Priya and Doll{\'a}r, Piotr and Girshick, Ross and Noordhuis, Pieter and Wesolowski, Lukasz and Kyrola, Aapo and Tulloch, Andrew and Jia, Yangqing and He, Kaiming},
+ journal={arXiv preprint arXiv:1706.02677},
+ year={2017}
+ }
+
+ @article{arivazhagan2019massively,
+ title={Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges},
+ author={Arivazhagan, Naveen and Bapna, Ankur and Firat, Orhan and Lepikhin, Dmitry and Johnson, Melvin and Krikun, Maxim and Chen, Mia Xu and Cao, Yuan and Foster, George and Cherry, Colin and others},
+ journal={arXiv preprint arXiv:1907.05019},
+ year={2019}
+ }
+
+ @inproceedings{pan2017cross,
+ title={Cross-lingual name tagging and linking for 282 languages},
+ author={Pan, Xiaoman and Zhang, Boliang and May, Jonathan and Nothman, Joel and Knight, Kevin and Ji, Heng},
+ booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+ volume={1},
+ pages={1946--1958},
+ year={2017}
+ }
+
+ @article{raffel2019exploring,
+ title={Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
+ author={Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
+ year={2019},
+ journal={arXiv preprint arXiv:1910.10683},
+ }
+
+ @inproceedings{pennington2014glove,
+ author = {Jeffrey Pennington and Richard Socher and Christopher D. Manning},
+ booktitle = {EMNLP},
+ title = {GloVe: Global Vectors for Word Representation},
+ year = {2014},
+ pages = {1532--1543},
+ url = {http://www.aclweb.org/anthology/D14-1162},
+ }
+
+ @article{kudo2018sentencepiece,
+ title={{SentencePiece}: A simple and language independent subword tokenizer and detokenizer for neural text processing},
+ author={Kudo, Taku and Richardson, John},
+ journal={EMNLP},
+ year={2018}
+ }
+
+ @article{rajpurkar2018know,
+ title={Know What You Don't Know: Unanswerable Questions for SQuAD},
+ author={Rajpurkar, Pranav and Jia, Robin and Liang, Percy},
+ journal={ACL},
+ year={2018}
+ }
+
+ @article{joulin2017bag,
+ title={Bag of Tricks for Efficient Text Classification},
+ author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
+ journal={EACL 2017},
+ pages={427--431},
+ year={2017}
+ }
+
+ @inproceedings{kudo2018subword,
+ title={Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates},
+ author={Kudo, Taku},
+ booktitle={ACL},
+ pages={66--75},
+ year={2018}
+ }
+
+ @inproceedings{grave2018learning,
+ title={Learning Word Vectors for 157 Languages},
+ author={Grave, Edouard and Bojanowski, Piotr and Gupta, Prakhar and Joulin, Armand and Mikolov, Tomas},
+ booktitle={LREC},
+ year={2018}
+ }
references/2019.arxiv.conneau/source/XLMR Paper/acl2020.sty ADDED
@@ -0,0 +1,560 @@
1
+ % This is the LaTex style file for ACL 2020, based off of ACL 2019.
2
+
3
+ % Addressing bibtex issues mentioned in https://github.com/acl-org/acl-pub/issues/2
4
+ % Other major modifications include
5
+ % changing the color of the line numbers to a light gray; changing font size of abstract to be 10pt; changing caption font size to be 10pt.
6
+ % -- M Mitchell and Stephanie Lukin
7
+
8
+ % 2017: modified to support DOI links in bibliography. Now uses
9
+ % natbib package rather than defining citation commands in this file.
10
+ % Use with acl_natbib.bst bib style. -- Dan Gildea
11
+
12
+ % This is the LaTeX style for ACL 2016. It contains Margaret Mitchell's
13
+ % line number adaptations (ported by Hai Zhao and Yannick Versley).
14
+
15
+ % It is nearly identical to the style files for ACL 2015,
16
+ % ACL 2014, EACL 2006, ACL2005, ACL 2002, ACL 2001, ACL 2000,
17
+ % EACL 95 and EACL 99.
18
+ %
19
+ % Changes made include: adapt layout to A4 and centimeters, widen abstract
20
+
21
+ % This is the LaTeX style file for ACL 2000. It is nearly identical to the
22
+ % style files for EACL 95 and EACL 99. Minor changes include editing the
23
+ % instructions to reflect use of \documentclass rather than \documentstyle
24
+ % and removing the white space before the title on the first page
25
+ % -- John Chen, June 29, 2000
26
+
27
+ % This is the LaTeX style file for EACL-95. It is identical to the
28
+ % style file for ANLP '94 except that the margins are adjusted for A4
29
+ % paper. -- abney 13 Dec 94
30
+
31
+ % The ANLP '94 style file is a slightly modified
32
+ % version of the style used for AAAI and IJCAI, using some changes
33
+ % prepared by Fernando Pereira and others and some minor changes
34
+ % by Paul Jacobs.
35
+
36
+ % Papers prepared using the aclsub.sty file and acl.bst bibtex style
37
+ % should be easily converted to final format using this style.
38
+ % (1) Submission information (\wordcount, \subject, and \makeidpage)
39
+ % should be removed.
40
+ % (2) \summary should be removed. The summary material should come
41
+ % after \maketitle and should be in the ``abstract'' environment
42
+ % (between \begin{abstract} and \end{abstract}).
43
+ % (3) Check all citations. This style should handle citations correctly
44
+ % and also allows multiple citations separated by semicolons.
45
+ % (4) Check figures and examples. Because the final format is double-
46
+ % column, some adjustments may have to be made to fit text in the column
47
+ % or to choose full-width (\figure*} figures.
48
+
49
+ % Place this in a file called aclap.sty in the TeX search path.
50
+ % (Placing it in the same directory as the paper should also work.)
51
+
52
+ % Prepared by Peter F. Patel-Schneider, liberally using the ideas of
53
+ % other style hackers, including Barbara Beeton.
54
+ % This style is NOT guaranteed to work. It is provided in the hope
55
+ % that it will make the preparation of papers easier.
56
+ %
57
+ % There are undoubtably bugs in this style. If you make bug fixes,
58
+ % improvements, etc. please let me know. My e-mail address is:
59
+ % pfps@research.att.com
60
+
61
+ % Papers are to be prepared using the ``acl_natbib'' bibliography style,
62
+ % as follows:
63
+ % \documentclass[11pt]{article}
64
+ % \usepackage{acl2000}
65
+ % \title{Title}
66
+ % \author{Author 1 \and Author 2 \\ Address line \\ Address line \And
67
+ % Author 3 \\ Address line \\ Address line}
68
+ % \begin{document}
69
+ % ...
70
+ % \bibliography{bibliography-file}
71
+ % \bibliographystyle{acl_natbib}
72
+ % \end{document}
73
+
74
+ % Author information can be set in various styles:
75
+ % For several authors from the same institution:
76
+ % \author{Author 1 \and ... \and Author n \\
77
+ % Address line \\ ... \\ Address line}
78
+ % if the names do not fit well on one line use
79
+ % Author 1 \\ {\bf Author 2} \\ ... \\ {\bf Author n} \\
80
+ % For authors from different institutions:
81
+ % \author{Author 1 \\ Address line \\ ... \\ Address line
82
+ % \And ... \And
83
+ % Author n \\ Address line \\ ... \\ Address line}
84
+ % To start a seperate ``row'' of authors use \AND, as in
85
+ % \author{Author 1 \\ Address line \\ ... \\ Address line
86
+ % \AND
87
+ % Author 2 \\ Address line \\ ... \\ Address line \And
88
+ % Author 3 \\ Address line \\ ... \\ Address line}
89
+
+ % If the title and author information does not fit in the area allocated,
+ % place \setlength\titlebox{<new height>} right after
+ % \usepackage{acl2015}
+ % where <new height> can be something larger than 5cm
+
+ % include hyperref, unless user specifies nohyperref option like this:
+ % \usepackage[nohyperref]{naaclhlt2018}
+ \newif\ifacl@hyperref
+ \DeclareOption{hyperref}{\acl@hyperreftrue}
+ \DeclareOption{nohyperref}{\acl@hyperreffalse}
+ \ExecuteOptions{hyperref} % default is to use hyperref
+ \ProcessOptions\relax
+ \ifacl@hyperref
+ \RequirePackage{hyperref}
+ \usepackage{xcolor} % make links dark blue
+ \definecolor{darkblue}{rgb}{0, 0, 0.5}
+ \hypersetup{colorlinks=true,citecolor=darkblue, linkcolor=darkblue, urlcolor=darkblue}
+ \else
+ % This definition is used if the hyperref package is not loaded.
+ % It provides a backup, no-op definition of \href.
+ % This is necessary because the \href command is used in the acl_natbib.bst file.
+ \def\href#1#2{{#2}}
+ % We still need to load xcolor in this case because the lighter line numbers require it. (SC/KG/WL)
+ \usepackage{xcolor}
+ \fi
+
+ \typeout{Conference Style for ACL 2019}
+
+ % NOTE: Some laser printers have a serious problem printing TeX output.
+ % These printing devices, commonly known as ``write-white'' laser
+ % printers, tend to make characters too light. To get around this
+ % problem, a darker set of fonts must be created for these devices.
+ %
+
+ \newcommand{\Thanks}[1]{\thanks{\ #1}}
+
+ % A4 modified by Eneko; again modified by Alexander for 5cm titlebox
+ \setlength{\paperwidth}{21cm} % A4
+ \setlength{\paperheight}{29.7cm}% A4
+ \setlength\topmargin{-0.5cm}
+ \setlength\oddsidemargin{0cm}
+ \setlength\textheight{24.7cm}
+ \setlength\textwidth{16.0cm}
+ \setlength\columnsep{0.6cm}
+ \newlength\titlebox
+ \setlength\titlebox{5cm}
+ \setlength\headheight{5pt}
+ \setlength\headsep{0pt}
+ \thispagestyle{empty}
+ \pagestyle{empty}
+
+
+ \flushbottom \twocolumn \sloppy
+
+ % We're never going to need a table of contents, so just flush it to
+ % save space --- suggested by drstrip@sandia-2
+ \def\addcontentsline#1#2#3{}
+
+ \newif\ifaclfinal
+ \aclfinalfalse
+ \def\aclfinalcopy{\global\aclfinaltrue}
+
+ %% ----- Set up hooks to repeat content on every page of the output doc,
+ %% necessary for the line numbers in the submitted version. --MM
+ %%
+ %% Copied from CVPR 2015's cvpr_eso.sty, which appears to be largely copied from everyshi.sty.
+ %%
+ %% Original cvpr_eso.sty available at: http://www.pamitc.org/cvpr15/author_guidelines.php
+ %% Original everyshi.sty available at: https://www.ctan.org/pkg/everyshi
+ %%
+ %% Copyright (C) 2001 Martin Schr\"oder:
+ %%
+ %% Martin Schr"oder
+ %% Cr"usemannallee 3
+ %% D-28213 Bremen
+ %% Martin.Schroeder@ACM.org
+ %%
+ %% This program may be redistributed and/or modified under the terms
+ %% of the LaTeX Project Public License, either version 1.0 of this
+ %% license, or (at your option) any later version.
+ %% The latest version of this license is in
+ %% CTAN:macros/latex/base/lppl.txt.
+ %%
+ %% Happy users are requested to send [Martin] a postcard. :-)
+ %%
+ \newcommand{\@EveryShipoutACL@Hook}{}
+ \newcommand{\@EveryShipoutACL@AtNextHook}{}
+ \newcommand*{\EveryShipoutACL}[1]
+ {\g@addto@macro\@EveryShipoutACL@Hook{#1}}
+ \newcommand*{\AtNextShipoutACL@}[1]
+ {\g@addto@macro\@EveryShipoutACL@AtNextHook{#1}}
+ \newcommand{\@EveryShipoutACL@Shipout}{%
+ \afterassignment\@EveryShipoutACL@Test
+ \global\setbox\@cclv= %
+ }
+ \newcommand{\@EveryShipoutACL@Test}{%
+ \ifvoid\@cclv\relax
+ \aftergroup\@EveryShipoutACL@Output
+ \else
+ \@EveryShipoutACL@Output
+ \fi%
+ }
+ \newcommand{\@EveryShipoutACL@Output}{%
+ \@EveryShipoutACL@Hook%
+ \@EveryShipoutACL@AtNextHook%
+ \gdef\@EveryShipoutACL@AtNextHook{}%
+ \@EveryShipoutACL@Org@Shipout\box\@cclv%
+ }
+ \newcommand{\@EveryShipoutACL@Org@Shipout}{}
+ \newcommand*{\@EveryShipoutACL@Init}{%
+ \message{ABD: EveryShipout initializing macros}%
+ \let\@EveryShipoutACL@Org@Shipout\shipout
+ \let\shipout\@EveryShipoutACL@Shipout
+ }
+ \AtBeginDocument{\@EveryShipoutACL@Init}
+
+ %% ----- Set up for placing additional items into the submitted version --MM
+ %%
+ %% Based on eso-pic.sty
+ %%
+ %% Original available at: https://www.ctan.org/tex-archive/macros/latex/contrib/eso-pic
+ %% Copyright (C) 1998-2002 by Rolf Niepraschk <niepraschk@ptb.de>
+ %%
+ %% Which may be distributed and/or modified under the conditions of
+ %% the LaTeX Project Public License, either version 1.2 of this license
+ %% or (at your option) any later version. The latest version of this
+ %% license is in:
+ %%
+ %% http://www.latex-project.org/lppl.txt
+ %%
+ %% and version 1.2 or later is part of all distributions of LaTeX version
+ %% 1999/12/01 or later.
+ %%
+ %% In contrast to the original, we do not include the definitions for/using:
+ %% gridpicture, div[2], isMEMOIR[1], gridSetup[6][], subgridstyle{dotted}, labelfactor{}, gap{}, gridunitname{}, gridunit{}, gridlines{\thinlines}, subgridlines{\thinlines}, the {keyval} package, evenside margin, nor any definitions with 'color'.
+ %%
+ %% These are beyond what is needed for the NAACL/ACL style.
+ %%
+ \newcommand\LenToUnit[1]{#1\@gobble}
+ \newcommand\AtPageUpperLeft[1]{%
+ \begingroup
+ \@tempdima=0pt\relax\@tempdimb=\ESO@yoffsetI\relax
+ \put(\LenToUnit{\@tempdima},\LenToUnit{\@tempdimb}){#1}%
+ \endgroup
+ }
+ \newcommand\AtPageLowerLeft[1]{\AtPageUpperLeft{%
+ \put(0,\LenToUnit{-\paperheight}){#1}}}
+ \newcommand\AtPageCenter[1]{\AtPageUpperLeft{%
+ \put(\LenToUnit{.5\paperwidth},\LenToUnit{-.5\paperheight}){#1}}}
+ \newcommand\AtPageLowerCenter[1]{\AtPageUpperLeft{%
+ \put(\LenToUnit{.5\paperwidth},\LenToUnit{-\paperheight}){#1}}}%
+ \newcommand\AtPageLowishCenter[1]{\AtPageUpperLeft{%
+ \put(\LenToUnit{.5\paperwidth},\LenToUnit{-.96\paperheight}){#1}}}
+ \newcommand\AtTextUpperLeft[1]{%
+ \begingroup
+ \setlength\@tempdima{1in}%
+ \advance\@tempdima\oddsidemargin%
+ \@tempdimb=\ESO@yoffsetI\relax\advance\@tempdimb-1in\relax%
+ \advance\@tempdimb-\topmargin%
+ \advance\@tempdimb-\headheight\advance\@tempdimb-\headsep%
+ \put(\LenToUnit{\@tempdima},\LenToUnit{\@tempdimb}){#1}%
+ \endgroup
+ }
+ \newcommand\AtTextLowerLeft[1]{\AtTextUpperLeft{%
+ \put(0,\LenToUnit{-\textheight}){#1}}}
+ \newcommand\AtTextCenter[1]{\AtTextUpperLeft{%
+ \put(\LenToUnit{.5\textwidth},\LenToUnit{-.5\textheight}){#1}}}
+ \newcommand{\ESO@HookI}{} \newcommand{\ESO@HookII}{}
+ \newcommand{\ESO@HookIII}{}
+ \newcommand{\AddToShipoutPicture}{%
+ \@ifstar{\g@addto@macro\ESO@HookII}{\g@addto@macro\ESO@HookI}}
+ \newcommand{\ClearShipoutPicture}{\global\let\ESO@HookI\@empty}
+ \newcommand{\@ShipoutPicture}{%
+ \bgroup
+ \@tempswafalse%
+ \ifx\ESO@HookI\@empty\else\@tempswatrue\fi%
+ \ifx\ESO@HookII\@empty\else\@tempswatrue\fi%
+ \ifx\ESO@HookIII\@empty\else\@tempswatrue\fi%
+ \if@tempswa%
+ \@tempdima=1in\@tempdimb=-\@tempdima%
+ \advance\@tempdimb\ESO@yoffsetI%
+ \unitlength=1pt%
+ \global\setbox\@cclv\vbox{%
+ \vbox{\let\protect\relax
+ \pictur@(0,0)(\strip@pt\@tempdima,\strip@pt\@tempdimb)%
+ \ESO@HookIII\ESO@HookI\ESO@HookII%
+ \global\let\ESO@HookII\@empty%
+ \endpicture}%
+ \nointerlineskip%
+ \box\@cclv}%
+ \fi
+ \egroup
+ }
+ \EveryShipoutACL{\@ShipoutPicture}
+ \newif\ifESO@dvips\ESO@dvipsfalse
+ \newif\ifESO@grid\ESO@gridfalse
+ \newif\ifESO@texcoord\ESO@texcoordfalse
+ \newcommand*\ESO@griddelta{}\newcommand*\ESO@griddeltaY{}
+ \newcommand*\ESO@gridDelta{}\newcommand*\ESO@gridDeltaY{}
+ \newcommand*\ESO@yoffsetI{}\newcommand*\ESO@yoffsetII{}
+ \ifESO@texcoord
+ \def\ESO@yoffsetI{0pt}\def\ESO@yoffsetII{-\paperheight}
+ \edef\ESO@griddeltaY{-\ESO@griddelta}\edef\ESO@gridDeltaY{-\ESO@gridDelta}
+ \else
+ \def\ESO@yoffsetI{\paperheight}\def\ESO@yoffsetII{0pt}
+ \edef\ESO@griddeltaY{\ESO@griddelta}\edef\ESO@gridDeltaY{\ESO@gridDelta}
+ \fi
+
+
+ %% ----- Submitted version markup: Page numbers, ruler, and confidentiality. Using ideas/code from cvpr.sty 2015. --MM
+
+ \font\aclhv = phvb at 8pt
+
+ %% Define vruler %%
+
+ %\makeatletter
+ \newbox\aclrulerbox
+ \newcount\aclrulercount
+ \newdimen\aclruleroffset
+ \newdimen\cv@lineheight
+ \newdimen\cv@boxheight
+ \newbox\cv@tmpbox
+ \newcount\cv@refno
+ \newcount\cv@tot
+ % NUMBER with left flushed zeros \fillzeros[<WIDTH>]<NUMBER>
+ \newcount\cv@tmpc@ \newcount\cv@tmpc
+ \def\fillzeros[#1]#2{\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi
+ \cv@tmpc=1 %
+ \loop\ifnum\cv@tmpc@<10 \else \divide\cv@tmpc@ by 10 \advance\cv@tmpc by 1 \fi
+ \ifnum\cv@tmpc@=10\relax\cv@tmpc@=11\relax\fi \ifnum\cv@tmpc@>10 \repeat
+ \ifnum#2<0\advance\cv@tmpc1\relax-\fi
+ \loop\ifnum\cv@tmpc<#1\relax0\advance\cv@tmpc1\relax\fi \ifnum\cv@tmpc<#1 \repeat
+ \cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi \relax\the\cv@tmpc@}%
+ % \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
+ \def\makevruler[#1][#2][#3][#4][#5]{\begingroup\offinterlineskip
+ \textheight=#5\vbadness=10000\vfuzz=120ex\overfullrule=0pt%
+ \global\setbox\aclrulerbox=\vbox to \textheight{%
+ {\parskip=0pt\hfuzz=150em\cv@boxheight=\textheight
+ \color{gray}
+ \cv@lineheight=#1\global\aclrulercount=#2%
+ \cv@tot\cv@boxheight\divide\cv@tot\cv@lineheight\advance\cv@tot2%
+ \cv@refno1\vskip-\cv@lineheight\vskip1ex%
+ \loop\setbox\cv@tmpbox=\hbox to0cm{{\aclhv\hfil\fillzeros[#4]\aclrulercount}}%
+ \ht\cv@tmpbox\cv@lineheight\dp\cv@tmpbox0pt\box\cv@tmpbox\break
+ \advance\cv@refno1\global\advance\aclrulercount#3\relax
+ \ifnum\cv@refno<\cv@tot\repeat}}\endgroup}%
+ %\makeatother
+
+
+ \def\aclpaperid{***}
+ \def\confidential{\textcolor{black}{ACL 2020 Submission~\aclpaperid. Confidential Review Copy. DO NOT DISTRIBUTE.}}
+
+ %% Page numbering, Vruler and Confidentiality %%
+ % \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
+
+ % SC/KG/WL - changed line numbering to gainsboro
+ \definecolor{gainsboro}{rgb}{0.8, 0.8, 0.8}
+ %\def\aclruler#1{\makevruler[14.17pt][#1][1][3][\textheight]\usebox{\aclrulerbox}} %% old line
+ \def\aclruler#1{\textcolor{gainsboro}{\makevruler[14.17pt][#1][1][3][\textheight]\usebox{\aclrulerbox}}}
+
+ \def\leftoffset{-2.1cm} %original: -45pt
+ \def\rightoffset{17.5cm} %original: 500pt
+ \ifaclfinal\else\pagenumbering{arabic}
+ \AddToShipoutPicture{%
+ \ifaclfinal\else
+ \AtPageLowishCenter{\textcolor{black}{\thepage}}
+ \aclruleroffset=\textheight
+ \advance\aclruleroffset4pt
+ \AtTextUpperLeft{%
+ \put(\LenToUnit{\leftoffset},\LenToUnit{-\aclruleroffset}){%left ruler
+ \aclruler{\aclrulercount}}
+ \put(\LenToUnit{\rightoffset},\LenToUnit{-\aclruleroffset}){%right ruler
+ \aclruler{\aclrulercount}}
+ }
+ \AtTextUpperLeft{%confidential
+ \put(0,\LenToUnit{1cm}){\parbox{\textwidth}{\centering\aclhv\confidential}}
+ }
+ \fi
+ }
+
+ %%%% ----- End settings for placing additional items into the submitted version --MM ----- %%%%
+
+ %%%% ----- Begin settings for both submitted and camera-ready version ----- %%%%
+
+ %% Title and Authors %%
+
+ \newcommand\outauthor{
+ \begin{tabular}[t]{c}
+ \ifaclfinal
+ \bf\@author
+ \else
+ % Avoiding common accidental de-anonymization issue. --MM
+ \bf Anonymous ACL submission
+ \fi
+ \end{tabular}}
+
+ % Changing the expanded titlebox for submissions to 2.5 in (rather than 6.5cm)
+ % and moving it to the style sheet, rather than within the example tex file. --MM
+ \ifaclfinal
+ \else
+ \addtolength\titlebox{.25in}
+ \fi
+ % Mostly taken from deproc.
+ \def\maketitle{\par
+ \begingroup
+ \def\thefootnote{\fnsymbol{footnote}}
+ \def\@makefnmark{\hbox to 0pt{$^{\@thefnmark}$\hss}}
+ \twocolumn[\@maketitle] \@thanks
+ \endgroup
+ \setcounter{footnote}{0}
+ \let\maketitle\relax \let\@maketitle\relax
+ \gdef\@thanks{}\gdef\@author{}\gdef\@title{}\let\thanks\relax}
+ \def\@maketitle{\vbox to \titlebox{\hsize\textwidth
+ \linewidth\hsize \vskip 0.125in minus 0.125in \centering
+ {\Large\bf \@title \par} \vskip 0.2in plus 1fil minus 0.1in
+ {\def\and{\unskip\enspace{\rm and}\enspace}%
+ \def\And{\end{tabular}\hss \egroup \hskip 1in plus 2fil
+ \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf}%
+ \def\AND{\end{tabular}\hss\egroup \hfil\hfil\egroup
+ \vskip 0.25in plus 1fil minus 0.125in
+ \hbox to \linewidth\bgroup\large \hfil\hfil
+ \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf}
+ \hbox to \linewidth\bgroup\large \hfil\hfil
+ \hbox to 0pt\bgroup\hss
+ \outauthor
+ \hss\egroup
+ \hfil\hfil\egroup}
+ \vskip 0.3in plus 2fil minus 0.1in
+ }}
+
+ % margins and font size for abstract
+ \renewenvironment{abstract}%
+ {\centerline{\large\bf Abstract}%
+ \begin{list}{}%
+ {\setlength{\rightmargin}{0.6cm}%
+ \setlength{\leftmargin}{0.6cm}}%
+ \item[]\ignorespaces%
+ \@setsize\normalsize{12pt}\xpt\@xpt
+ }%
+ {\unskip\end{list}}
+
+ %\renewenvironment{abstract}{\centerline{\large\bf
+ % Abstract}\vspace{0.5ex}\begin{quote}}{\par\end{quote}\vskip 1ex}
+
+ % Resizing figure and table captions - SL
+ \newcommand{\figcapfont}{\rm}
+ \newcommand{\tabcapfont}{\rm}
+ \renewcommand{\fnum@figure}{\figcapfont Figure \thefigure}
+ \renewcommand{\fnum@table}{\tabcapfont Table \thetable}
+ \renewcommand{\figcapfont}{\@setsize\normalsize{12pt}\xpt\@xpt}
+ \renewcommand{\tabcapfont}{\@setsize\normalsize{12pt}\xpt\@xpt}
+ % Support for interacting with the caption, subfigure, and subcaption packages - SL
+ \usepackage{caption}
+ \DeclareCaptionFont{10pt}{\fontsize{10pt}{12pt}\selectfont}
+ \captionsetup{font=10pt}
+
+ \RequirePackage{natbib}
+ % for citation commands in the .tex, authors can use:
+ % \citep, \citet, and \citeyearpar for compatibility with natbib, or
+ % \cite, \newcite, and \shortcite for compatibility with older ACL .sty files
+ \renewcommand\cite{\citep} % to get "(Author Year)" with natbib
+ \newcommand\shortcite{\citeyearpar}% to get "(Year)" with natbib
+ \newcommand\newcite{\citet} % to get "Author (Year)" with natbib
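With the remappings above, the natbib commands and the older ACL aliases can be mixed freely in the paper source; a small illustrative fragment (the key `jones1990` is a placeholder):

```latex
% Illustrative only; `jones1990` is a placeholder citation key.
\citet{jones1990} proposed the method.        % textual: Author (Year)
\newcite{jones1990} proposed the method.      % same result via the older ACL alias
The method is widely used \citep{jones1990}.  % parenthetical: (Author, Year)
The method is widely used \cite{jones1990}.   % same, since \cite is remapped to \citep
```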
+
+ % DK/IV: Workaround for annoying hyperref pagewrap bug
+ % \RequirePackage{etoolbox}
+ % \patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{\errmessage{\noexpand patch failed}}
+
+ % bibliography
+
+ \def\@up#1{\raise.2ex\hbox{#1}}
+
+ % Don't put a label in the bibliography at all. Just use the unlabeled format
+ % instead.
+ \def\thebibliography#1{\vskip\parskip%
+ \vskip\baselineskip%
+ \def\baselinestretch{1}%
+ \ifx\@currsize\normalsize\@normalsize\else\@currsize\fi%
+ \vskip-\parskip%
+ \vskip-\baselineskip%
+ \section*{References\@mkboth
+ {References}{References}}\list
+ {}{\setlength{\labelwidth}{0pt}\setlength{\leftmargin}{\parindent}
+ \setlength{\itemindent}{-\parindent}}
+ \def\newblock{\hskip .11em plus .33em minus -.07em}
+ \sloppy\clubpenalty4000\widowpenalty4000
+ \sfcode`\.=1000\relax}
+ \let\endthebibliography=\endlist
+
+
+ % Allow for a bibliography of sources of attested examples
+ \def\thesourcebibliography#1{\vskip\parskip%
+ \vskip\baselineskip%
+ \def\baselinestretch{1}%
+ \ifx\@currsize\normalsize\@normalsize\else\@currsize\fi%
+ \vskip-\parskip%
+ \vskip-\baselineskip%
+ \section*{Sources of Attested Examples\@mkboth
+ {Sources of Attested Examples}{Sources of Attested Examples}}\list
+ {}{\setlength{\labelwidth}{0pt}\setlength{\leftmargin}{\parindent}
+ \setlength{\itemindent}{-\parindent}}
+ \def\newblock{\hskip .11em plus .33em minus -.07em}
+ \sloppy\clubpenalty4000\widowpenalty4000
+ \sfcode`\.=1000\relax}
+ \let\endthesourcebibliography=\endlist
+
+ % sections with less space
+ \def\section{\@startsection {section}{1}{\z@}{-2.0ex plus
+ -0.5ex minus -.2ex}{1.5ex plus 0.3ex minus .2ex}{\large\bf\raggedright}}
+ \def\subsection{\@startsection{subsection}{2}{\z@}{-1.8ex plus
+ -0.5ex minus -.2ex}{0.8ex plus .2ex}{\normalsize\bf\raggedright}}
+ %% changed by KO to - values to get the initial parindent right
+ \def\subsubsection{\@startsection{subsubsection}{3}{\z@}{-1.5ex plus
+ -0.5ex minus -.2ex}{0.5ex plus .2ex}{\normalsize\bf\raggedright}}
+ \def\paragraph{\@startsection{paragraph}{4}{\z@}{1.5ex plus
+ 0.5ex minus .2ex}{-1em}{\normalsize\bf}}
+ \def\subparagraph{\@startsection{subparagraph}{5}{\parindent}{1.5ex plus
+ 0.5ex minus .2ex}{-1em}{\normalsize\bf}}
+
+ % Footnotes
+ \footnotesep 6.65pt %
+ \skip\footins 9pt plus 4pt minus 2pt
+ \def\footnoterule{\kern-3pt \hrule width 5pc \kern 2.6pt }
+ \setcounter{footnote}{0}
+
+ % Lists and paragraphs
+ \parindent 1em
+ \topsep 4pt plus 1pt minus 2pt
+ \partopsep 1pt plus 0.5pt minus 0.5pt
+ \itemsep 2pt plus 1pt minus 0.5pt
+ \parsep 2pt plus 1pt minus 0.5pt
+
+ \leftmargin 2em \leftmargini\leftmargin \leftmarginii 2em
+ \leftmarginiii 1.5em \leftmarginiv 1.0em \leftmarginv .5em \leftmarginvi .5em
+ \labelwidth\leftmargini\advance\labelwidth-\labelsep \labelsep 5pt
+
+ \def\@listi{\leftmargin\leftmargini}
+ \def\@listii{\leftmargin\leftmarginii
+ \labelwidth\leftmarginii\advance\labelwidth-\labelsep
+ \topsep 2pt plus 1pt minus 0.5pt
+ \parsep 1pt plus 0.5pt minus 0.5pt
+ \itemsep \parsep}
+ \def\@listiii{\leftmargin\leftmarginiii
+ \labelwidth\leftmarginiii\advance\labelwidth-\labelsep
+ \topsep 1pt plus 0.5pt minus 0.5pt
+ \parsep \z@ \partopsep 0.5pt plus 0pt minus 0.5pt
+ \itemsep \topsep}
+ \def\@listiv{\leftmargin\leftmarginiv
+ \labelwidth\leftmarginiv\advance\labelwidth-\labelsep}
+ \def\@listv{\leftmargin\leftmarginv
+ \labelwidth\leftmarginv\advance\labelwidth-\labelsep}
+ \def\@listvi{\leftmargin\leftmarginvi
+ \labelwidth\leftmarginvi\advance\labelwidth-\labelsep}
+
+ \abovedisplayskip 7pt plus2pt minus5pt%
+ \belowdisplayskip \abovedisplayskip
+ \abovedisplayshortskip 0pt plus3pt%
+ \belowdisplayshortskip 4pt plus3pt minus3pt%
+
+ % Less leading in most fonts (due to the narrow columns)
+ % The choices were between 1-pt and 1.5-pt leading
+ \def\@normalsize{\@setsize\normalsize{11pt}\xpt\@xpt}
+ \def\small{\@setsize\small{10pt}\ixpt\@ixpt}
+ \def\footnotesize{\@setsize\footnotesize{10pt}\ixpt\@ixpt}
+ \def\scriptsize{\@setsize\scriptsize{8pt}\viipt\@viipt}
+ \def\tiny{\@setsize\tiny{7pt}\vipt\@vipt}
+ \def\large{\@setsize\large{14pt}\xiipt\@xiipt}
+ \def\Large{\@setsize\Large{16pt}\xivpt\@xivpt}
+ \def\LARGE{\@setsize\LARGE{20pt}\xviipt\@xviipt}
+ \def\huge{\@setsize\huge{23pt}\xxpt\@xxpt}
+ \def\Huge{\@setsize\Huge{28pt}\xxvpt\@xxvpt}
references/2019.arxiv.conneau/source/XLMR Paper/acl_natbib.bst ADDED
@@ -0,0 +1,1975 @@
+ %%% acl_natbib.bst
+ %%% Modification of BibTeX style file acl_natbib_nourl.bst
+ %%% ... by urlbst, version 0.7 (marked with "% urlbst")
+ %%% See <http://purl.org/nxg/dist/urlbst>
+ %%% Added webpage entry type, and url and lastchecked fields.
+ %%% Added eprint support.
+ %%% Added DOI support.
+ %%% Added PUBMED support.
+ %%% Added hyperref support.
+ %%% Original headers follow...
+
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+ %
+ % BibTeX style file acl_natbib_nourl.bst
+ %
+ % intended as input to urlbst script
+ % $ ./urlbst --hyperref --inlinelinks acl_natbib_nourl.bst > acl_natbib.bst
+ %
+ % adapted from compling.bst
+ % in order to mimic the style files for ACL conferences prior to 2017
+ % by making the following three changes:
+ % - for @incollection, page numbers now follow volume title.
+ % - for @inproceedings, address now follows conference name.
+ % (address is intended as location of conference,
+ % not address of publisher.)
+ % - for papers with three authors, use et al. in citation
+ % Dan Gildea 2017/06/08
+ % - fixed a bug with format.chapter - error given if chapter is empty
+ % with inbook.
+ % Shay Cohen 2018/02/16
+
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+ %
+ % BibTeX style file compling.bst
+ %
+ % Intended for the journal Computational Linguistics (ACL/MIT Press)
+ % Created by Ron Artstein on 2005/08/22
+ % For use with <natbib.sty> for author-year citations.
+ %
+ % I created this file in order to allow submissions to the journal
+ % Computational Linguistics using the <natbib> package for author-year
+ % citations, which offers a lot more flexibility than <fullname>, CL's
+ % official citation package. This file adheres strictly to the official
+ % style guide available from the MIT Press:
+ %
+ % http://mitpress.mit.edu/journals/coli/compling_style.pdf
+ %
+ % This includes all the various quirks of the style guide, for example:
+ % - a chapter from a monograph (@inbook) has no page numbers.
+ % - an article from an edited volume (@incollection) has page numbers
+ % after the publisher and address.
+ % - an article from a proceedings volume (@inproceedings) has page
+ % numbers before the publisher and address.
+ %
+ % Where the style guide was inconsistent or not specific enough I
+ % looked at actual published articles and exercised my own judgment.
+ % I noticed two inconsistencies in the style guide:
+ %
+ % - The style guide gives one example of an article from an edited
+ % volume with the editor's name spelled out in full, and another
+ % with the editors' names abbreviated. I chose to accept the first
+ % one as correct, since the style guide generally shuns abbreviations,
+ % and editors' names are also spelled out in some recently published
+ % articles.
+ %
+ % - The style guide gives one example of a reference where the word
+ % "and" between two authors is preceded by a comma. This is most
+ % likely a typo, since in all other cases with just two authors or
+ % editors there is no comma before the word "and".
+ %
+ % One case where the style guide is not being specific is the placement
+ % of the edition number, for which no example is given. I chose to put
+ % it immediately after the title, which I (subjectively) find natural,
+ % and is also the place of the edition in a few recently published
+ % articles.
+ %
+ % This file correctly reproduces all of the examples in the official
+ % style guide, except for the two inconsistencies noted above. I even
+ % managed to get it to correctly format the proceedings example which
+ % has an organization, a publisher, and two addresses (the conference
+ % location and the publisher's address), though I cheated a bit by
+ % putting the conference location and month as part of the title field;
+ % I feel that in this case the conference location and month can be
+ % considered as part of the title, and that adding a location field
+ % is not justified. Note also that a location field is not standard,
+ % so entries made with this field would not port nicely to other styles.
+ % However, if authors feel that there's a need for a location field
+ % then tell me and I'll see what I can do.
+ %
+ % The file also produces to my satisfaction all the bibliographical
+ % entries in my recent (joint) submission to CL (this was the original
+ % motivation for creating the file). I also tested it by running it
+ % on a larger set of entries and eyeballing the results. There may of
+ % course still be errors, especially with combinations of fields that
+ % are not that common, or with cross-references (which I seldom use).
+ % If you find such errors please write to me.
+ %
+ % I hope people find this file useful. Please email me with comments
+ % and suggestions.
+ %
+ % Ron Artstein
+ % artstein [at] essex.ac.uk
+ % August 22, 2005.
+ %
+ % Some technical notes.
+ %
+ % This file is based on a file generated with the package <custom-bib>
+ % by Patrick W. Daly (see selected options below), which was then
+ % manually customized to conform with certain CL requirements which
+ % cannot be met by <custom-bib>. Departures from the generated file
+ % include:
+ %
+ % Function inbook: moved publisher and address to the end; moved
+ % edition after title; replaced function format.chapter.pages by
+ % new function format.chapter to output chapter without pages.
+ %
+ % Function inproceedings: moved publisher and address to the end;
+ % replaced function format.in.ed.booktitle by new function
+ % format.in.booktitle to output the proceedings title without
+ % the editor.
+ %
+ % Functions book, incollection, manual: moved edition after title.
+ %
+ % Function mastersthesis: formatted title as for articles (unlike
+ % phdthesis which is formatted as book) and added month.
+ %
+ % Function proceedings: added new.sentence between organization and
+ % publisher when both are present.
+ %
+ % Function format.lab.names: modified so that it gives all the
+ % authors' surnames for in-text citations for one, two and three
+ % authors and only uses "et al." for works with four authors or more
+ % (thanks to Ken Shan for convincing me to go through the trouble of
+ % modifying this function rather than using unreliable hacks).
+ %
+ % Changes:
+ %
+ % 2006-10-27: Changed function reverse.pass so that the extra label is
+ % enclosed in parentheses when the year field ends in an uppercase or
+ % lowercase letter (change modeled after Uli Sauerland's modification
+ % of nals.bst). RA.
+ %
+ %
+ % The preamble of the generated file begins below:
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+ %%
+ %% This is file `compling.bst',
+ %% generated with the docstrip utility.
+ %%
+ %% The original source files were:
+ %%
+ %% merlin.mbs (with options: `ay,nat,vonx,nm-revv1,jnrlst,keyxyr,blkyear,dt-beg,yr-per,note-yr,num-xser,pre-pub,xedn,nfss')
+ %% ----------------------------------------
+ %% *** Intended for the journal Computational Linguistics ***
+ %%
+ %% Copyright 1994-2002 Patrick W Daly
+ % ===============================================================
+ % IMPORTANT NOTICE:
+ % This bibliographic style (bst) file has been generated from one or
+ % more master bibliographic style (mbs) files, listed above.
+ %
+ % This generated file can be redistributed and/or modified under the terms
+ % of the LaTeX Project Public License Distributed from CTAN
+ % archives in directory macros/latex/base/lppl.txt; either
+ % version 1 of the License, or any later version.
+ % ===============================================================
+ % Name and version information of the main mbs file:
+ % \ProvidesFile{merlin.mbs}[2002/10/21 4.05 (PWD, AO, DPC)]
+ % For use with BibTeX version 0.99a or later
+ %-------------------------------------------------------------------
+ % This bibliography style file is intended for texts in ENGLISH
+ % This is an author-year citation style bibliography. As such, it is
+ % non-standard LaTeX, and requires a special package file to function properly.
+ % Such a package is natbib.sty by Patrick W. Daly
+ % The form of the \bibitem entries is
+ % \bibitem[Jones et al.(1990)]{key}...
+ % \bibitem[Jones et al.(1990)Jones, Baker, and Smith]{key}...
+ % The essential feature is that the label (the part in brackets) consists
+ % of the author names, as they should appear in the citation, with the year
+ % in parentheses following. There must be no space before the opening
+ % parenthesis!
+ % With natbib v5.3, a full list of authors may also follow the year.
+ % In natbib.sty, it is possible to define the type of enclosures that is
+ % really wanted (brackets or parentheses), but in either case, there must
+ % be parentheses in the label.
+ % The \cite command functions as follows:
+ % \citet{key} ==>> Jones et al. (1990)
+ % \citet*{key} ==>> Jones, Baker, and Smith (1990)
+ % \citep{key} ==>> (Jones et al., 1990)
+ % \citep*{key} ==>> (Jones, Baker, and Smith, 1990)
+ % \citep[chap. 2]{key} ==>> (Jones et al., 1990, chap. 2)
+ % \citep[e.g.][]{key} ==>> (e.g. Jones et al., 1990)
+ % \citep[e.g.][p. 32]{key} ==>> (e.g. Jones et al., p. 32)
+ % \citeauthor{key} ==>> Jones et al.
+ % \citeauthor*{key} ==>> Jones, Baker, and Smith
+ % \citeyear{key} ==>> 1990
+ %---------------------------------------------------------------------
+
+ ENTRY
+ { address
+ author
+ booktitle
+ chapter
+ edition
+ editor
+ howpublished
+ institution
+ journal
+ key
+ month
+ note
+ number
+ organization
+ pages
+ publisher
+ school
+ series
+ title
+ type
+ volume
+ year
+ eprint % urlbst
+ doi % urlbst
+ pubmed % urlbst
+ url % urlbst
+ lastchecked % urlbst
+ }
+ {}
+ { label extra.label sort.label short.list }
+ INTEGERS { output.state before.all mid.sentence after.sentence after.block }
231
+ % urlbst...
232
+ % urlbst constants and state variables
233
+ STRINGS { urlintro
234
+ eprinturl eprintprefix doiprefix doiurl pubmedprefix pubmedurl
235
+ citedstring onlinestring linktextstring
236
+ openinlinelink closeinlinelink }
237
+ INTEGERS { hrefform inlinelinks makeinlinelink
238
+ addeprints adddoiresolver addpubmedresolver }
239
+ FUNCTION {init.urlbst.variables}
240
+ {
241
+ % The following constants may be adjusted by hand, if desired
242
+
243
+ % The first set allow you to enable or disable certain functionality.
244
+ #1 'addeprints := % 0=no eprints; 1=include eprints
245
+ #1 'adddoiresolver := % 0=no DOI resolver; 1=include it
246
+ #1 'addpubmedresolver := % 0=no PUBMED resolver; 1=include it
247
+ #2 'hrefform := % 0=no crossrefs; 1=hypertex xrefs; 2=hyperref refs
248
+ #1 'inlinelinks := % 0=URLs explicit; 1=URLs attached to titles
249
+
250
+ % String constants, which you _might_ want to tweak.
251
+ "URL: " 'urlintro := % prefix before URL; typically "Available from:" or "URL":
252
+ "online" 'onlinestring := % indication that resource is online; typically "online"
253
+ "cited " 'citedstring := % indicator of citation date; typically "cited "
254
+ "[link]" 'linktextstring := % dummy link text; typically "[link]"
255
+ "http://arxiv.org/abs/" 'eprinturl := % prefix to make URL from eprint ref
256
+ "arXiv:" 'eprintprefix := % text prefix printed before eprint ref; typically "arXiv:"
257
+ "https://doi.org/" 'doiurl := % prefix to make URL from DOI
258
+ "doi:" 'doiprefix := % text prefix printed before DOI ref; typically "doi:"
259
+ "http://www.ncbi.nlm.nih.gov/pubmed/" 'pubmedurl := % prefix to make URL from PUBMED
260
+ "PMID:" 'pubmedprefix := % text prefix printed before PUBMED ref; typically "PMID:"
261
+
262
+ % The following are internal state variables, not configuration constants,
263
+ % so they shouldn't be fiddled with.
264
+ #0 'makeinlinelink := % state variable managed by possibly.setup.inlinelink
265
+ "" 'openinlinelink := % ditto
266
+ "" 'closeinlinelink := % ditto
267
+ }
268
+ INTEGERS {
269
+ bracket.state
270
+ outside.brackets
271
+ open.brackets
272
+ within.brackets
273
+ close.brackets
274
+ }
275
+ % ...urlbst to here
276
+ FUNCTION {init.state.consts}
277
+ { #0 'outside.brackets := % urlbst...
278
+ #1 'open.brackets :=
279
+ #2 'within.brackets :=
280
+ #3 'close.brackets := % ...urlbst to here
281
+
282
+ #0 'before.all :=
283
+ #1 'mid.sentence :=
284
+ #2 'after.sentence :=
285
+ #3 'after.block :=
286
+ }
287
+ STRINGS { s t}
288
+ % urlbst
289
+ FUNCTION {output.nonnull.original}
290
+ { 's :=
291
+ output.state mid.sentence =
292
+ { ", " * write$ }
293
+ { output.state after.block =
294
+ { add.period$ write$
295
+ newline$
296
+ "\newblock " write$
297
+ }
298
+ { output.state before.all =
299
+ 'write$
300
+ { add.period$ " " * write$ }
301
+ if$
302
+ }
303
+ if$
304
+ mid.sentence 'output.state :=
305
+ }
306
+ if$
307
+ s
308
+ }
309
+
+ % urlbst...
+ % The following three functions are for handling inlinelink. They wrap
+ % a block of text which is potentially output with write$ by multiple
+ % other functions, so we don't know the content a priori.
+ % They communicate between each other using the variables makeinlinelink
+ % (which is true if a link should be made), and closeinlinelink (which holds
+ % the string which should close any current link. They can be called
+ % at any time, but start.inlinelink will be a no-op unless something has
+ % previously set makeinlinelink true, and the two ...end.inlinelink functions
+ % will only do their stuff if start.inlinelink has previously set
+ % closeinlinelink to be non-empty.
+ % (thanks to 'ijvm' for suggested code here)
+ FUNCTION {uand}
+ { 'skip$ { pop$ #0 } if$ } % 'and' (which isn't defined at this point in the file)
+ FUNCTION {possibly.setup.inlinelink}
+ { makeinlinelink hrefform #0 > uand
+ { doi empty$ adddoiresolver uand
+ { pubmed empty$ addpubmedresolver uand
+ { eprint empty$ addeprints uand
+ { url empty$
+ { "" }
+ { url }
+ if$ }
+ { eprinturl eprint * }
+ if$ }
+ { pubmedurl pubmed * }
+ if$ }
+ { doiurl doi * }
+ if$
+ % an appropriately-formatted URL is now on the stack
+ hrefform #1 = % hypertex
+ { "\special {html:<a href=" quote$ * swap$ * quote$ * "> }{" * 'openinlinelink :=
+ "\special {html:</a>}" 'closeinlinelink := }
+ { "\href {" swap$ * "} {" * 'openinlinelink := % hrefform=#2 -- hyperref
+ % the space between "} {" matters: a URL of just the right length can cause "\% newline em"
+ "}" 'closeinlinelink := }
+ if$
+ #0 'makeinlinelink :=
+ }
+ 'skip$
+ if$ % makeinlinelink
+ }
+ FUNCTION {add.inlinelink}
+ { openinlinelink empty$
+ 'skip$
+ { openinlinelink swap$ * closeinlinelink *
+ "" 'openinlinelink :=
+ }
+ if$
+ }
+ FUNCTION {output.nonnull}
+ { % Save the thing we've been asked to output
+ 's :=
+ % If the bracket-state is close.brackets, then add a close-bracket to
+ % what is currently at the top of the stack, and set bracket.state
+ % to outside.brackets
+ bracket.state close.brackets =
+ { "]" *
+ outside.brackets 'bracket.state :=
+ }
+ 'skip$
+ if$
+ bracket.state outside.brackets =
+ { % We're outside all brackets -- this is the normal situation.
+ % Write out what's currently at the top of the stack, using the
+ % original output.nonnull function.
+ s
+ add.inlinelink
+ output.nonnull.original % invoke the original output.nonnull
+ }
+ { % Still in brackets. Add open-bracket or (continuation) comma, add the
+ % new text (in s) to the top of the stack, and move to the close-brackets
+ % state, ready for next time (unless inbrackets resets it). If we come
+ % into this branch, then output.state is carefully undisturbed.
+ bracket.state open.brackets =
+ { " [" * }
+ { ", " * } % bracket.state will be within.brackets
+ if$
+ s *
+ close.brackets 'bracket.state :=
+ }
+ if$
+ }
+
+ % Call this function just before adding something which should be presented in
+ % brackets. bracket.state is handled specially within output.nonnull.
+ FUNCTION {inbrackets}
+ { bracket.state close.brackets =
+ { within.brackets 'bracket.state := } % reset the state: not open nor closed
+ { open.brackets 'bracket.state := }
+ if$
+ }
+
+ FUNCTION {format.lastchecked}
+ { lastchecked empty$
+ { "" }
+ { inbrackets citedstring lastchecked * }
+ if$
+ }
+ % ...urlbst to here
+ FUNCTION {output}
+ { duplicate$ empty$
+ 'pop$
+ 'output.nonnull
+ if$
+ }
+ FUNCTION {output.check}
+ { 't :=
+ duplicate$ empty$
+ { pop$ "empty " t * " in " * cite$ * warning$ }
+ 'output.nonnull
+ if$
+ }
+ FUNCTION {fin.entry.original} % urlbst (renamed from fin.entry, so it can be wrapped below)
+ { add.period$
+ write$
+ newline$
+ }
+
+ FUNCTION {new.block}
+ { output.state before.all =
+ 'skip$
+ { after.block 'output.state := }
+ if$
+ }
+ FUNCTION {new.sentence}
+ { output.state after.block =
+ 'skip$
+ { output.state before.all =
+ 'skip$
+ { after.sentence 'output.state := }
+ if$
+ }
+ if$
+ }
+ FUNCTION {add.blank}
+ { " " * before.all 'output.state :=
+ }
+
+ FUNCTION {date.block}
+ {
+ new.block
+ }
+
+ FUNCTION {not}
+ { { #0 }
+ { #1 }
+ if$
+ }
+ FUNCTION {and}
+ { 'skip$
+ { pop$ #0 }
+ if$
+ }
+ FUNCTION {or}
+ { { pop$ #1 }
+ 'skip$
+ if$
+ }
+ FUNCTION {new.block.checkb}
+ { empty$
+ swap$ empty$
+ and
+ 'skip$
+ 'new.block
+ if$
+ }
+ FUNCTION {field.or.null}
+ { duplicate$ empty$
+ { pop$ "" }
+ 'skip$
+ if$
+ }
+ FUNCTION {emphasize}
+ { duplicate$ empty$
+ { pop$ "" }
+ { "\emph{" swap$ * "}" * }
+ if$
+ }
+ FUNCTION {tie.or.space.prefix}
+ { duplicate$ text.length$ #3 <
+ { "~" }
+ { " " }
+ if$
+ swap$
+ }
+
+ FUNCTION {capitalize}
+ { "u" change.case$ "t" change.case$ }
+
+ FUNCTION {space.word}
+ { " " swap$ * " " * }
+ % Here are the language-specific definitions for explicit words.
+ % Each function has a name bbl.xxx where xxx is the English word.
+ % The language selected here is ENGLISH
+ FUNCTION {bbl.and}
+ { "and"}
+
+ FUNCTION {bbl.etal}
+ { "et~al." }
+
+ FUNCTION {bbl.editors}
+ { "editors" }
+
+ FUNCTION {bbl.editor}
+ { "editor" }
+
+ FUNCTION {bbl.edby}
+ { "edited by" }
+
+ FUNCTION {bbl.edition}
+ { "edition" }
+
+ FUNCTION {bbl.volume}
+ { "volume" }
+
+ FUNCTION {bbl.of}
+ { "of" }
+
+ FUNCTION {bbl.number}
+ { "number" }
+
+ FUNCTION {bbl.nr}
+ { "no." }
+
+ FUNCTION {bbl.in}
+ { "in" }
+
+ FUNCTION {bbl.pages}
+ { "pages" }
+
+ FUNCTION {bbl.page}
+ { "page" }
+
+ FUNCTION {bbl.chapter}
+ { "chapter" }
+
+ FUNCTION {bbl.techrep}
+ { "Technical Report" }
+
+ FUNCTION {bbl.mthesis}
+ { "Master's thesis" }
+
+ FUNCTION {bbl.phdthesis}
+ { "Ph.D. thesis" }
+
+ MACRO {jan} {"January"}
+
+ MACRO {feb} {"February"}
+
+ MACRO {mar} {"March"}
+
+ MACRO {apr} {"April"}
+
+ MACRO {may} {"May"}
+
+ MACRO {jun} {"June"}
+
+ MACRO {jul} {"July"}
+
+ MACRO {aug} {"August"}
+
+ MACRO {sep} {"September"}
+
+ MACRO {oct} {"October"}
+
+ MACRO {nov} {"November"}
+
+ MACRO {dec} {"December"}
+
+ MACRO {acmcs} {"ACM Computing Surveys"}
+
+ MACRO {acta} {"Acta Informatica"}
+
+ MACRO {cacm} {"Communications of the ACM"}
+
+ MACRO {ibmjrd} {"IBM Journal of Research and Development"}
+
+ MACRO {ibmsj} {"IBM Systems Journal"}
+
+ MACRO {ieeese} {"IEEE Transactions on Software Engineering"}
+
+ MACRO {ieeetc} {"IEEE Transactions on Computers"}
+
+ MACRO {ieeetcad}
+ {"IEEE Transactions on Computer-Aided Design of Integrated Circuits"}
+
+ MACRO {ipl} {"Information Processing Letters"}
+
+ MACRO {jacm} {"Journal of the ACM"}
+
+ MACRO {jcss} {"Journal of Computer and System Sciences"}
+
+ MACRO {scp} {"Science of Computer Programming"}
+
+ MACRO {sicomp} {"SIAM Journal on Computing"}
+
+ MACRO {tocs} {"ACM Transactions on Computer Systems"}
+
+ MACRO {tods} {"ACM Transactions on Database Systems"}
+
+ MACRO {tog} {"ACM Transactions on Graphics"}
+
+ MACRO {toms} {"ACM Transactions on Mathematical Software"}
+
+ MACRO {toois} {"ACM Transactions on Office Information Systems"}
+
+ MACRO {toplas} {"ACM Transactions on Programming Languages and Systems"}
+
+ MACRO {tcs} {"Theoretical Computer Science"}
+ FUNCTION {bibinfo.check}
+ { swap$
+ duplicate$ missing$
+ {
+ pop$ pop$
+ ""
+ }
+ { duplicate$ empty$
+ {
+ swap$ pop$
+ }
+ { swap$
+ pop$
+ }
+ if$
+ }
+ if$
+ }
+ FUNCTION {bibinfo.warn}
+ { swap$
+ duplicate$ missing$
+ {
+ swap$ "missing " swap$ * " in " * cite$ * warning$ pop$
+ ""
+ }
+ { duplicate$ empty$
+ {
+ swap$ "empty " swap$ * " in " * cite$ * warning$
+ }
+ { swap$
+ pop$
+ }
+ if$
+ }
+ if$
+ }
+ STRINGS { bibinfo}
+ INTEGERS { nameptr namesleft numnames }
+
+ FUNCTION {format.names}
+ { 'bibinfo :=
+ duplicate$ empty$ 'skip$ {
+ 's :=
+ "" 't :=
+ #1 'nameptr :=
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ duplicate$ #1 >
+ { "{ff~}{vv~}{ll}{, jj}" }
+ { "{ff~}{vv~}{ll}{, jj}" } % first name first for first author
+ % { "{vv~}{ll}{, ff}{, jj}" } % last name first for first author
+ if$
+ format.name$
+ bibinfo bibinfo.check
+ 't :=
+ nameptr #1 >
+ {
+ namesleft #1 >
+ { ", " * t * }
+ {
+ numnames #2 >
+ { "," * }
+ 'skip$
+ if$
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
+ { 't := }
+ { pop$ }
+ if$
+ t "others" =
+ {
+ " " * bbl.etal *
+ }
+ {
+ bbl.and
+ space.word * t *
+ }
+ if$
+ }
+ if$
+ }
+ 't
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ } if$
+ }
+ FUNCTION {format.names.ed}
+ {
+ 'bibinfo :=
+ duplicate$ empty$ 'skip$ {
+ 's :=
+ "" 't :=
+ #1 'nameptr :=
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ "{ff~}{vv~}{ll}{, jj}"
+ format.name$
+ bibinfo bibinfo.check
+ 't :=
+ nameptr #1 >
+ {
+ namesleft #1 >
+ { ", " * t * }
+ {
+ numnames #2 >
+ { "," * }
+ 'skip$
+ if$
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
+ { 't := }
+ { pop$ }
+ if$
+ t "others" =
+ {
+
+ " " * bbl.etal *
+ }
+ {
+ bbl.and
+ space.word * t *
+ }
+ if$
+ }
+ if$
+ }
+ 't
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ } if$
+ }
+ FUNCTION {format.key}
+ { empty$
+ { key field.or.null }
+ { "" }
+ if$
+ }
+
+ FUNCTION {format.authors}
+ { author "author" format.names
+ }
+ FUNCTION {get.bbl.editor}
+ { editor num.names$ #1 > 'bbl.editors 'bbl.editor if$ }
+
+ FUNCTION {format.editors}
+ { editor "editor" format.names duplicate$ empty$ 'skip$
+ {
+ "," *
+ " " *
+ get.bbl.editor
+ *
+ }
+ if$
+ }
+ FUNCTION {format.note}
+ {
+ note empty$
+ { "" }
+ { note #1 #1 substring$
+ duplicate$ "{" =
+ 'skip$
+ { output.state mid.sentence =
+ { "l" }
+ { "u" }
+ if$
+ change.case$
+ }
+ if$
+ note #2 global.max$ substring$ * "note" bibinfo.check
+ }
+ if$
+ }
+
+ FUNCTION {format.title}
+ { title
+ duplicate$ empty$ 'skip$
+ { "t" change.case$ }
+ if$
+ "title" bibinfo.check
+ }
+ FUNCTION {format.full.names}
+ {'s :=
+ "" 't :=
+ #1 'nameptr :=
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ "{vv~}{ll}" format.name$
+ 't :=
+ nameptr #1 >
+ {
+ namesleft #1 >
+ { ", " * t * }
+ {
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
+ { 't := }
+ { pop$ }
+ if$
+ t "others" =
+ {
+ " " * bbl.etal *
+ }
+ {
+ numnames #2 >
+ { "," * }
+ 'skip$
+ if$
+ bbl.and
+ space.word * t *
+ }
+ if$
+ }
+ if$
+ }
+ 't
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ }
+
+ FUNCTION {author.editor.key.full}
+ { author empty$
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.full.names }
+ if$
+ }
+ { author format.full.names }
+ if$
+ }
+
+ FUNCTION {author.key.full}
+ { author empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { author format.full.names }
+ if$
+ }
+
+ FUNCTION {editor.key.full}
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.full.names }
+ if$
+ }
+
+ FUNCTION {make.full.names}
+ { type$ "book" =
+ type$ "inbook" =
+ or
+ 'author.editor.key.full
+ { type$ "proceedings" =
+ 'editor.key.full
+ 'author.key.full
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {output.bibitem.original} % urlbst (renamed from output.bibitem, so it can be wrapped below)
+ { newline$
+ "\bibitem[{" write$
+ label write$
+ ")" make.full.names duplicate$ short.list =
+ { pop$ }
+ { * }
+ if$
+ "}]{" * write$
+ cite$ write$
+ "}" write$
+ newline$
+ ""
+ before.all 'output.state :=
+ }
+
+ FUNCTION {n.dashify}
+ {
+ 't :=
+ ""
+ { t empty$ not }
+ { t #1 #1 substring$ "-" =
+ { t #1 #2 substring$ "--" = not
+ { "--" *
+ t #2 global.max$ substring$ 't :=
+ }
+ { { t #1 #1 substring$ "-" = }
+ { "-" *
+ t #2 global.max$ substring$ 't :=
+ }
+ while$
+ }
+ if$
+ }
+ { t #1 #1 substring$ *
+ t #2 global.max$ substring$ 't :=
+ }
+ if$
+ }
+ while$
+ }
+
+ FUNCTION {word.in}
+ { bbl.in capitalize
+ " " * }
+
+ FUNCTION {format.date}
+ { year "year" bibinfo.check duplicate$ empty$
+ {
+ }
+ 'skip$
+ if$
+ extra.label *
+ before.all 'output.state :=
+ after.sentence 'output.state :=
+ }
+ FUNCTION {format.btitle}
+ { title "title" bibinfo.check
+ duplicate$ empty$ 'skip$
+ {
+ emphasize
+ }
+ if$
+ }
+ FUNCTION {either.or.check}
+ { empty$
+ 'pop$
+ { "can't use both " swap$ * " fields in " * cite$ * warning$ }
+ if$
+ }
+ FUNCTION {format.bvolume}
+ { volume empty$
+ { "" }
+ { bbl.volume volume tie.or.space.prefix
+ "volume" bibinfo.check * *
+ series "series" bibinfo.check
+ duplicate$ empty$ 'pop$
+ { swap$ bbl.of space.word * swap$
+ emphasize * }
+ if$
+ "volume and number" number either.or.check
+ }
+ if$
+ }
+ FUNCTION {format.number.series}
+ { volume empty$
+ { number empty$
+ { series field.or.null }
+ { series empty$
+ { number "number" bibinfo.check }
+ { output.state mid.sentence =
+ { bbl.number }
+ { bbl.number capitalize }
+ if$
+ number tie.or.space.prefix "number" bibinfo.check * *
+ bbl.in space.word *
+ series "series" bibinfo.check *
+ }
+ if$
+ }
+ if$
+ }
+ { "" }
+ if$
+ }
+
+ FUNCTION {format.edition}
+ { edition duplicate$ empty$ 'skip$
+ {
+ output.state mid.sentence =
+ { "l" }
+ { "t" }
+ if$ change.case$
+ "edition" bibinfo.check
+ " " * bbl.edition *
+ }
+ if$
+ }
+ INTEGERS { multiresult }
+ FUNCTION {multi.page.check}
+ { 't :=
+ #0 'multiresult :=
+ { multiresult not
+ t empty$ not
+ and
+ }
+ { t #1 #1 substring$
+ duplicate$ "-" =
+ swap$ duplicate$ "," =
+ swap$ "+" =
+ or or
+ { #1 'multiresult := }
+ { t #2 global.max$ substring$ 't := }
+ if$
+ }
+ while$
+ multiresult
+ }
+ FUNCTION {format.pages}
+ { pages duplicate$ empty$ 'skip$
+ { duplicate$ multi.page.check
+ {
+ bbl.pages swap$
+ n.dashify
+ }
+ {
+ bbl.page swap$
+ }
+ if$
+ tie.or.space.prefix
+ "pages" bibinfo.check
+ * *
+ }
+ if$
+ }
+ FUNCTION {format.journal.pages}
+ { pages duplicate$ empty$ 'pop$
+ { swap$ duplicate$ empty$
+ { pop$ pop$ format.pages }
+ {
+ ":" *
+ swap$
+ n.dashify
+ "pages" bibinfo.check
+ *
+ }
+ if$
+ }
+ if$
+ }
+ FUNCTION {format.vol.num.pages}
+ { volume field.or.null
+ duplicate$ empty$ 'skip$
+ {
+ "volume" bibinfo.check
+ }
+ if$
+ number "number" bibinfo.check duplicate$ empty$ 'skip$
+ {
+ swap$ duplicate$ empty$
+ { "there's a number but no volume in " cite$ * warning$ }
+ 'skip$
+ if$
+ swap$
+ "(" swap$ * ")" *
+ }
+ if$ *
+ format.journal.pages
+ }
+
+ FUNCTION {format.chapter}
+ { chapter empty$
+ 'format.pages
+ { type empty$
+ { bbl.chapter }
+ { type "l" change.case$
+ "type" bibinfo.check
+ }
+ if$
+ chapter tie.or.space.prefix
+ "chapter" bibinfo.check
+ * *
+ }
+ if$
+ }
+
+ FUNCTION {format.chapter.pages}
+ { chapter empty$
+ 'format.pages
+ { type empty$
+ { bbl.chapter }
+ { type "l" change.case$
+ "type" bibinfo.check
+ }
+ if$
+ chapter tie.or.space.prefix
+ "chapter" bibinfo.check
+ * *
+ pages empty$
+ 'skip$
+ { ", " * format.pages * }
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {format.booktitle}
+ {
+ booktitle "booktitle" bibinfo.check
+ emphasize
+ }
+ FUNCTION {format.in.booktitle}
+ { format.booktitle duplicate$ empty$ 'skip$
+ {
+ word.in swap$ *
+ }
+ if$
+ }
+ FUNCTION {format.in.ed.booktitle}
+ { format.booktitle duplicate$ empty$ 'skip$
+ {
+ editor "editor" format.names.ed duplicate$ empty$ 'pop$
+ {
+ "," *
+ " " *
+ get.bbl.editor
+ ", " *
+ * swap$
+ * }
+ if$
+ word.in swap$ *
+ }
+ if$
+ }
+ FUNCTION {format.thesis.type}
+ { type duplicate$ empty$
+ 'pop$
+ { swap$ pop$
+ "t" change.case$ "type" bibinfo.check
+ }
+ if$
+ }
+ FUNCTION {format.tr.number}
+ { number "number" bibinfo.check
+ type duplicate$ empty$
+ { pop$ bbl.techrep }
+ 'skip$
+ if$
+ "type" bibinfo.check
+ swap$ duplicate$ empty$
+ { pop$ "t" change.case$ }
+ { tie.or.space.prefix * * }
+ if$
+ }
+ FUNCTION {format.article.crossref}
+ {
+ word.in
+ " \cite{" * crossref * "}" *
+ }
+ FUNCTION {format.book.crossref}
+ { volume duplicate$ empty$
+ { "empty volume in " cite$ * "'s crossref of " * crossref * warning$
+ pop$ word.in
+ }
+ { bbl.volume
+ capitalize
+ swap$ tie.or.space.prefix "volume" bibinfo.check * * bbl.of space.word *
+ }
+ if$
+ " \cite{" * crossref * "}" *
+ }
+ FUNCTION {format.incoll.inproc.crossref}
+ {
+ word.in
+ " \cite{" * crossref * "}" *
+ }
+ FUNCTION {format.org.or.pub}
+ { 't :=
+ ""
+ address empty$ t empty$ and
+ 'skip$
+ {
+ t empty$
+ { address "address" bibinfo.check *
+ }
+ { t *
+ address empty$
+ 'skip$
+ { ", " * address "address" bibinfo.check * }
+ if$
+ }
+ if$
+ }
+ if$
+ }
+ FUNCTION {format.publisher.address}
+ { publisher "publisher" bibinfo.warn format.org.or.pub
+ }
+
+ FUNCTION {format.organization.address}
+ { organization "organization" bibinfo.check format.org.or.pub
+ }
+
+ % urlbst...
+ % Functions for making hypertext links.
+ % In all cases, the stack has (link-text href-url)
+ %
+ % make 'null' specials
+ FUNCTION {make.href.null}
+ {
+ pop$
+ }
+ % make hypertex specials
+ FUNCTION {make.href.hypertex}
+ {
+ "\special {html:<a href=" quote$ *
+ swap$ * quote$ * "> }" * swap$ *
+ "\special {html:</a>}" *
+ }
+ % make hyperref specials
+ FUNCTION {make.href.hyperref}
+ {
+ "\href {" swap$ * "} {\path{" * swap$ * "}}" *
+ }
+ FUNCTION {make.href}
+ { hrefform #2 =
+ 'make.href.hyperref % hrefform = 2
+ { hrefform #1 =
+ 'make.href.hypertex % hrefform = 1
+ 'make.href.null % hrefform = 0 (or anything else)
+ if$
+ }
+ if$
+ }
+
+ % If inlinelinks is true, then format.url should be a no-op, since it's
+ % (a) redundant, and (b) could end up as a link-within-a-link.
+ FUNCTION {format.url}
+ { inlinelinks #1 = url empty$ or
+ { "" }
+ { hrefform #1 =
+ { % special case -- add HyperTeX specials
+ urlintro "\url{" url * "}" * url make.href.hypertex * }
+ { urlintro "\url{" * url * "}" * }
+ if$
+ }
+ if$
+ }
+
1270
+ FUNCTION {format.eprint}
1271
+ { eprint empty$
1272
+ { "" }
1273
+ { eprintprefix eprint * eprinturl eprint * make.href }
1274
+ if$
1275
+ }
1276
+
1277
+ FUNCTION {format.doi}
1278
+ { doi empty$
1279
+ { "" }
1280
+ { doiprefix doi * doiurl doi * make.href }
1281
+ if$
1282
+ }
1283
+
1284
+ FUNCTION {format.pubmed}
1285
+ { pubmed empty$
1286
+ { "" }
1287
+ { pubmedprefix pubmed * pubmedurl pubmed * make.href }
1288
+ if$
1289
+ }
1290
+
1291
+ % Output a URL. We can't use the more normal idiom (something like
1292
+ % `format.url output'), because the `inbrackets' within
1293
+ % format.lastchecked applies to everything between calls to `output',
1294
+ % so that `format.url format.lastchecked * output' ends up with both
1295
+ % the URL and the lastchecked in brackets.
1296
+ FUNCTION {output.url}
1297
+ { url empty$
1298
+ 'skip$
1299
+ { new.block
1300
+ format.url output
1301
+ format.lastchecked output
1302
+ }
1303
+ if$
1304
+ }
1305
+
1306
+ FUNCTION {output.web.refs}
1307
+ {
1308
+ new.block
1309
+ inlinelinks
1310
+ 'skip$ % links were inline -- don't repeat them
1311
+ {
1312
+ output.url
1313
+ addeprints eprint empty$ not and
1314
+ { format.eprint output.nonnull }
1315
+ 'skip$
1316
+ if$
1317
+ adddoiresolver doi empty$ not and
1318
+ { format.doi output.nonnull }
1319
+ 'skip$
1320
+ if$
1321
+ addpubmedresolver pubmed empty$ not and
1322
+ { format.pubmed output.nonnull }
1323
+ 'skip$
1324
+ if$
1325
+ }
1326
+ if$
1327
+ }
1328
+
1329
+ % Wrapper for output.bibitem.original.
1330
+ % If the URL field is not empty, set makeinlinelink to be true,
1331
+ % so that an inline link will be started at the next opportunity
1332
+ FUNCTION {output.bibitem}
1333
+ { outside.brackets 'bracket.state :=
1334
+ output.bibitem.original
1335
+ inlinelinks url empty$ not doi empty$ not or pubmed empty$ not or eprint empty$ not or and
1336
+ { #1 'makeinlinelink := }
1337
+ { #0 'makeinlinelink := }
1338
+ if$
1339
+ }
1340
+
1341
+ % Wrapper for fin.entry.original
1342
+ FUNCTION {fin.entry}
1343
+ { output.web.refs % urlbst
1344
+ makeinlinelink % ooops, it appears we didn't have a title for inlinelink
1345
+ { possibly.setup.inlinelink % add some artificial link text here, as a fallback
1346
+ linktextstring output.nonnull }
1347
+ 'skip$
1348
+ if$
1349
+ bracket.state close.brackets = % urlbst
1350
+ { "]" * }
1351
+ 'skip$
1352
+ if$
1353
+ fin.entry.original
1354
+ }
1355
+
1356
+ % Webpage entry type.
1357
+ % Title and url fields required;
1358
+ % author, note, year, month, and lastchecked fields optional
1359
+ % See references
1360
+ % ISO 690-2 http://www.nlc-bnc.ca/iso/tc46sc9/standard/690-2e.htm
1361
+ % http://www.classroom.net/classroom/CitingNetResources.html
1362
+ % http://neal.ctstateu.edu/history/cite.html
1363
+ % http://www.cas.usf.edu/english/walker/mla.html
1364
+ % for citation formats for web pages.
1365
+ FUNCTION {webpage}
1366
+ { output.bibitem
1367
+ author empty$
1368
+ { editor empty$
1369
+ 'skip$ % author and editor both optional
1370
+ { format.editors output.nonnull }
1371
+ if$
1372
+ }
1373
+ { editor empty$
1374
+ { format.authors output.nonnull }
1375
+ { "can't use both author and editor fields in " cite$ * warning$ }
1376
+ if$
1377
+ }
1378
+ if$
1379
+ new.block
1380
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$
1381
+ format.title "title" output.check
1382
+ inbrackets onlinestring output
1383
+ new.block
1384
+ year empty$
1385
+ 'skip$
1386
+ { format.date "year" output.check }
1387
+ if$
1388
+ % We don't need to output the URL details ('lastchecked' and 'url'),
1389
+ % because fin.entry does that for us, using output.web.refs. The only
1390
+ % reason we would want to put them here is if we were to decide that
1391
+ % they should go in front of the rather miscellaneous information in 'note'.
1392
+ new.block
1393
+ note output
1394
+ fin.entry
1395
+ }
1396
+ % ...urlbst to here
1397
+
1398
+
1399
+ FUNCTION {article}
1400
+ { output.bibitem
1401
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title "title" output.check
+ new.block
+ crossref missing$
+ {
+ journal
+ "journal" bibinfo.check
+ emphasize
+ "journal" output.check
+ possibly.setup.inlinelink format.vol.num.pages output% urlbst
+ }
+ { format.article.crossref output.nonnull
+ format.pages output
+ }
+ if$
+ new.block
+ format.note output
+ fin.entry
+ }
+ FUNCTION {book}
+ { output.bibitem
+ author empty$
+ { format.editors "author and editor" output.check
+ editor format.key output
+ }
+ { format.authors output.nonnull
+ crossref missing$
+ { "author and editor" editor either.or.check }
+ 'skip$
+ if$
+ }
+ if$
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.btitle "title" output.check
+ format.edition output
+ crossref missing$
+ { format.bvolume output
+ new.block
+ format.number.series output
+ new.sentence
+ format.publisher.address output
+ }
+ {
+ new.block
+ format.book.crossref output.nonnull
+ }
+ if$
+ new.block
+ format.note output
+ fin.entry
+ }
+ FUNCTION {booklet}
+ { output.bibitem
+ format.authors output
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title "title" output.check
+ new.block
+ howpublished "howpublished" bibinfo.check output
+ address "address" bibinfo.check output
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {inbook}
+ { output.bibitem
+ author empty$
+ { format.editors "author and editor" output.check
+ editor format.key output
+ }
+ { format.authors output.nonnull
+ crossref missing$
+ { "author and editor" editor either.or.check }
+ 'skip$
+ if$
+ }
+ if$
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.btitle "title" output.check
+ format.edition output
+ crossref missing$
+ {
+ format.bvolume output
+ format.number.series output
+ format.chapter "chapter" output.check
+ new.sentence
+ format.publisher.address output
+ new.block
+ }
+ {
+ format.chapter "chapter" output.check
+ new.block
+ format.book.crossref output.nonnull
+ }
+ if$
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {incollection}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title "title" output.check
+ new.block
+ crossref missing$
+ { format.in.ed.booktitle "booktitle" output.check
+ format.edition output
+ format.bvolume output
+ format.number.series output
+ format.chapter.pages output
+ new.sentence
+ format.publisher.address output
+ }
+ { format.incoll.inproc.crossref output.nonnull
+ format.chapter.pages output
+ }
+ if$
+ new.block
+ format.note output
+ fin.entry
+ }
+ FUNCTION {inproceedings}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title "title" output.check
+ new.block
+ crossref missing$
+ { format.in.booktitle "booktitle" output.check
+ format.bvolume output
+ format.number.series output
+ format.pages output
+ address "address" bibinfo.check output
+ new.sentence
+ organization "organization" bibinfo.check output
+ publisher "publisher" bibinfo.check output
+ }
+ { format.incoll.inproc.crossref output.nonnull
+ format.pages output
+ }
+ if$
+ new.block
+ format.note output
+ fin.entry
+ }
+ FUNCTION {conference} { inproceedings }
+ FUNCTION {manual}
+ { output.bibitem
+ format.authors output
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.btitle "title" output.check
+ format.edition output
+ organization address new.block.checkb
+ organization "organization" bibinfo.check output
+ address "address" bibinfo.check output
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {mastersthesis}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title
+ "title" output.check
+ new.block
+ bbl.mthesis format.thesis.type output.nonnull
+ school "school" bibinfo.warn output
+ address "address" bibinfo.check output
+ month "month" bibinfo.check output
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {misc}
+ { output.bibitem
+ format.authors output
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title output
+ new.block
+ howpublished "howpublished" bibinfo.check output
+ new.block
+ format.note output
+ fin.entry
+ }
+ FUNCTION {phdthesis}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.btitle
+ "title" output.check
+ new.block
+ bbl.phdthesis format.thesis.type output.nonnull
+ school "school" bibinfo.warn output
+ address "address" bibinfo.check output
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {proceedings}
+ { output.bibitem
+ format.editors output
+ editor format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.btitle "title" output.check
+ format.bvolume output
+ format.number.series output
+ new.sentence
+ publisher empty$
+ { format.organization.address output }
+ { organization "organization" bibinfo.check output
+ new.sentence
+ format.publisher.address output
+ }
+ if$
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {techreport}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title
+ "title" output.check
+ new.block
+ format.tr.number output.nonnull
+ institution "institution" bibinfo.warn output
+ address "address" bibinfo.check output
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {unpublished}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title "title" output.check
+ new.block
+ format.note "note" output.check
+ fin.entry
+ }
+
+ FUNCTION {default.type} { misc }
+ READ
+ FUNCTION {sortify}
+ { purify$
+ "l" change.case$
+ }
+ INTEGERS { len }
+ FUNCTION {chop.word}
+ { 's :=
+ 'len :=
+ s #1 len substring$ =
+ { s len #1 + global.max$ substring$ }
+ 's
+ if$
+ }
+ FUNCTION {format.lab.names}
+ { 's :=
+ "" 't :=
+ s #1 "{vv~}{ll}" format.name$
+ s num.names$ duplicate$
+ #2 >
+ { pop$
+ " " * bbl.etal *
+ }
+ { #2 <
+ 'skip$
+ { s #2 "{ff }{vv }{ll}{ jj}" format.name$ "others" =
+ {
+ " " * bbl.etal *
+ }
+ { bbl.and space.word * s #2 "{vv~}{ll}" format.name$
+ * }
+ if$
+ }
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {author.key.label}
+ { author empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { author format.lab.names }
+ if$
+ }
+
+ FUNCTION {author.editor.key.label}
+ { author empty$
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.lab.names }
+ if$
+ }
+ { author format.lab.names }
+ if$
+ }
+
+ FUNCTION {editor.key.label}
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.lab.names }
+ if$
+ }
+
+ FUNCTION {calc.short.authors}
+ { type$ "book" =
+ type$ "inbook" =
+ or
+ 'author.editor.key.label
+ { type$ "proceedings" =
+ 'editor.key.label
+ 'author.key.label
+ if$
+ }
+ if$
+ 'short.list :=
+ }
+
+ FUNCTION {calc.label}
+ { calc.short.authors
+ short.list
+ "("
+ *
+ year duplicate$ empty$
+ short.list key field.or.null = or
+ { pop$ "" }
+ 'skip$
+ if$
+ *
+ 'label :=
+ }
+
+ FUNCTION {sort.format.names}
+ { 's :=
+ #1 'nameptr :=
+ ""
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ "{ll{ }}{ ff{ }}{ jj{ }}"
+ format.name$ 't :=
+ nameptr #1 >
+ {
+ " " *
+ namesleft #1 = t "others" = and
+ { "zzzzz" * }
+ { t sortify * }
+ if$
+ }
+ { t sortify * }
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ }
+
+ FUNCTION {sort.format.title}
+ { 't :=
+ "A " #2
+ "An " #3
+ "The " #4 t chop.word
+ chop.word
+ chop.word
+ sortify
+ #1 global.max$ substring$
+ }
+ FUNCTION {author.sort}
+ { author empty$
+ { key empty$
+ { "to sort, need author or key in " cite$ * warning$
+ ""
+ }
+ { key sortify }
+ if$
+ }
+ { author sort.format.names }
+ if$
+ }
+ FUNCTION {author.editor.sort}
+ { author empty$
+ { editor empty$
+ { key empty$
+ { "to sort, need author, editor, or key in " cite$ * warning$
+ ""
+ }
+ { key sortify }
+ if$
+ }
+ { editor sort.format.names }
+ if$
+ }
+ { author sort.format.names }
+ if$
+ }
+ FUNCTION {editor.sort}
+ { editor empty$
+ { key empty$
+ { "to sort, need editor or key in " cite$ * warning$
+ ""
+ }
+ { key sortify }
+ if$
+ }
+ { editor sort.format.names }
+ if$
+ }
+ FUNCTION {presort}
+ { calc.label
+ label sortify
+ " "
+ *
+ type$ "book" =
+ type$ "inbook" =
+ or
+ 'author.editor.sort
+ { type$ "proceedings" =
+ 'editor.sort
+ 'author.sort
+ if$
+ }
+ if$
+ #1 entry.max$ substring$
+ 'sort.label :=
+ sort.label
+ *
+ " "
+ *
+ title field.or.null
+ sort.format.title
+ *
+ #1 entry.max$ substring$
+ 'sort.key$ :=
+ }
+
+ ITERATE {presort}
+ SORT
+ STRINGS { last.label next.extra }
+ INTEGERS { last.extra.num number.label }
+ FUNCTION {initialize.extra.label.stuff}
+ { #0 int.to.chr$ 'last.label :=
+ "" 'next.extra :=
+ #0 'last.extra.num :=
+ #0 'number.label :=
+ }
+ FUNCTION {forward.pass}
+ { last.label label =
+ { last.extra.num #1 + 'last.extra.num :=
+ last.extra.num int.to.chr$ 'extra.label :=
+ }
+ { "a" chr.to.int$ 'last.extra.num :=
+ "" 'extra.label :=
+ label 'last.label :=
+ }
+ if$
+ number.label #1 + 'number.label :=
+ }
+ FUNCTION {reverse.pass}
+ { next.extra "b" =
+ { "a" 'extra.label := }
+ 'skip$
+ if$
+ extra.label 'next.extra :=
+ extra.label
+ duplicate$ empty$
+ 'skip$
+ { year field.or.null #-1 #1 substring$ chr.to.int$ #65 <
+ { "{\natexlab{" swap$ * "}}" * }
+ { "{(\natexlab{" swap$ * "})}" * }
+ if$ }
+ if$
+ 'extra.label :=
+ label extra.label * 'label :=
+ }
+ EXECUTE {initialize.extra.label.stuff}
+ ITERATE {forward.pass}
+ REVERSE {reverse.pass}
+ FUNCTION {bib.sort.order}
+ { sort.label
+ " "
+ *
+ year field.or.null sortify
+ *
+ " "
+ *
+ title field.or.null
+ sort.format.title
+ *
+ #1 entry.max$ substring$
+ 'sort.key$ :=
+ }
+ ITERATE {bib.sort.order}
+ SORT
+ FUNCTION {begin.bib}
+ { preamble$ empty$
+ 'skip$
+ { preamble$ write$ newline$ }
+ if$
+ "\begin{thebibliography}{" number.label int.to.str$ * "}" *
+ write$ newline$
+ "\expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi"
+ write$ newline$
+ }
+ EXECUTE {begin.bib}
+ EXECUTE {init.urlbst.variables} % urlbst
+ EXECUTE {init.state.consts}
+ ITERATE {call.type$}
+ FUNCTION {end.bib}
+ { newline$
+ "\end{thebibliography}" write$ newline$
+ }
+ EXECUTE {end.bib}
+ %% End of customized bst file
+ %%
+ %% End of file `compling.bst'.
references/2019.arxiv.conneau/source/XLMR Paper/appendix.tex ADDED
@@ -0,0 +1,45 @@
+ \documentclass[11pt,a4paper]{article}
+ \usepackage[hyperref]{acl2020}
+ \usepackage{times}
+ \usepackage{latexsym}
+ \renewcommand{\UrlFont}{\ttfamily\small}
+
+ % This is not strictly necessary, and may be commented out,
+ % but it will improve the layout of the manuscript,
+ % and will typically save some space.
+ \usepackage{microtype}
+ \usepackage{graphicx}
+ \usepackage{subfigure}
+ \usepackage{booktabs} % for professional tables
+ \usepackage{url}
+ \usepackage{times}
+ \usepackage{latexsym}
+ \usepackage{array}
+ \usepackage{adjustbox}
+ \usepackage{multirow}
+ % \usepackage{subcaption}
+ \usepackage{hyperref}
+ \usepackage{longtable}
+ \usepackage{bibentry}
+ \newcommand{\xlmr}{\textit{XLM-R}\xspace}
+ \newcommand{\mbert}{mBERT\xspace}
+ \input{content/tables}
+
+ \begin{document}
+ \nobibliography{acl2020}
+ \bibliographystyle{acl_natbib}
+ \appendix
+ \onecolumn
+ \section*{Supplementary materials}
+ \section{Languages and statistics for CC-100 used by \xlmr}
+ In this section we present the list of languages in the CC-100 corpus we created for training \xlmr. We also report statistics such as the number of tokens and the size of each monolingual corpus.
+ \label{sec:appendix_A}
+ \insertDataStatistics
+
+ \newpage
+ \section{Model Architectures and Sizes}
+ As we showed in section 5, capacity is an important parameter for learning strong cross-lingual representations. In the table below, we list multiple monolingual and multilingual models used by the research community and summarize their architectures and total number of parameters.
+ \label{sec:appendix_B}
+
+ \insertParameters
+ \end{document}
references/2019.arxiv.conneau/source/XLMR Paper/content/batchsize.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e0c4e1c156379efeba93f0c1a6717bb12ab0b2aa0bdd361a7fda362ff01442e
+ size 14673
references/2019.arxiv.conneau/source/XLMR Paper/content/capacity.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:00087aeb1a14190e7800a77cecacb04e8ce1432c029e0276b4d8b02b7ff66edb
+ size 16459
references/2019.arxiv.conneau/source/XLMR Paper/content/datasize.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5d07fdd658101ef6caf7e2808faa6045ab175315b6435e25ff14ecedac584118
+ size 26052
references/2019.arxiv.conneau/source/XLMR Paper/content/dilution.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:80d1555811c23e2c521fbb007d84dfddb85e7020cc9333058368d3a1d63e240a
+ size 16376
references/2019.arxiv.conneau/source/XLMR Paper/content/langsampling.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c2f2f95649a23b0a46f8553f4e0e29000aff1971385b9addf6f478acc5a516a3
+ size 15612
references/2019.arxiv.conneau/source/XLMR Paper/content/tables.tex ADDED
@@ -0,0 +1,398 @@
+
+
+
+ \newcommand{\insertXNLItable}{
+ \begin{table*}[h!]
+ \begin{center}
+ % \scriptsize
+ \resizebox{1\linewidth}{!}{
+ \begin{tabular}[b]{l ccc ccccccccccccccc c}
+ \toprule
+ {\bf Model} & {\bf D }& {\bf \#M} & {\bf \#lg} & {\bf en} & {\bf fr} & {\bf es} & {\bf de} & {\bf el} & {\bf bg} & {\bf ru} & {\bf tr} & {\bf ar} & {\bf vi} & {\bf th} & {\bf zh} & {\bf hi} & {\bf sw} & {\bf ur} & {\bf Avg}\\
+ \midrule
+ %\cmidrule(r){1-1}
+ %\cmidrule(lr){2-4}
+ %\cmidrule(lr){5-19}
+ %\cmidrule(l){20-20}
+
+ \multicolumn{19}{l}{\it Fine-tune multilingual model on English training set (Cross-lingual Transfer)} \\
+ %\midrule
+ \midrule
+ \citet{lample2019cross} & Wiki+MT & N & 15 & 85.0 & 78.7 & 78.9 & 77.8 & 76.6 & 77.4 & 75.3 & 72.5 & 73.1 & 76.1 & 73.2 & 76.5 & 69.6 & 68.4 & 67.3 & 75.1 \\
+ \citet{huang2019unicoder} & Wiki+MT & N & 15 & 85.1 & 79.0 & 79.4 & 77.8 & 77.2 & 77.2 & 76.3 & 72.8 & 73.5 & 76.4 & 73.6 & 76.2 & 69.4 & 69.7 & 66.7 & 75.4 \\
+ %\midrule
+ \citet{devlin2018bert} & Wiki & N & 102 & 82.1 & 73.8 & 74.3 & 71.1 & 66.4 & 68.9 & 69.0 & 61.6 & 64.9 & 69.5 & 55.8 & 69.3 & 60.0 & 50.4 & 58.0 & 66.3 \\
+ \citet{lample2019cross} & Wiki & N & 100 & 83.7 & 76.2 & 76.6 & 73.7 & 72.4 & 73.0 & 72.1 & 68.1 & 68.4 & 72.0 & 68.2 & 71.5 & 64.5 & 58.0 & 62.4 & 71.3 \\
+ \citet{lample2019cross} & Wiki & 1 & 100 & 83.2 & 76.7 & 77.7 & 74.0 & 72.7 & 74.1 & 72.7 & 68.7 & 68.6 & 72.9 & 68.9 & 72.5 & 65.6 & 58.2 & 62.4 & 70.7 \\
+ \bf XLM-R\textsubscript{Base} & CC & 1 & 100 & 85.8 & 79.7 & 80.7 & 78.7 & 77.5 & 79.6 & 78.1 & 74.2 & 73.8 & 76.5 & 74.6 & 76.7 & 72.4 & 66.5 & 68.3 & 76.2 \\
+ \bf XLM-R & CC & 1 & 100 & \bf 89.1 & \bf 84.1 & \bf 85.1 & \bf 83.9 & \bf 82.9 & \bf 84.0 & \bf 81.2 & \bf 79.6 & \bf 79.8 & \bf 80.8 & \bf 78.1 & \bf 80.2 & \bf 76.9 & \bf 73.9 & \bf 73.8 & \bf 80.9 \\
+ \midrule
+ \multicolumn{19}{l}{\it Translate everything to English and use English-only model (TRANSLATE-TEST)} \\
+ \midrule
+ BERT-en & Wiki & 1 & 1 & 88.8 & 81.4 & 82.3 & 80.1 & 80.3 & 80.9 & 76.2 & 76.0 & 75.4 & 72.0 & 71.9 & 75.6 & 70.0 & 65.8 & 65.8 & 76.2 \\
+ RoBERTa & Wiki+CC & 1 & 1 & \underline{\bf 91.3} & 82.9 & 84.3 & 81.2 & 81.7 & 83.1 & 78.3 & 76.8 & 76.6 & 74.2 & 74.1 & 77.5 & 70.9 & 66.7 & 66.8 & 77.8 \\
+ % XLM-en & Wiki & 1 & 1 & 00.0 & 00.0 & 00.0 & 00.0 & 00.0 & 00.0 & 00.0 \\
+ \midrule
+ \multicolumn{19}{l}{\it Fine-tune multilingual model on each training set (TRANSLATE-TRAIN)} \\
+ \midrule
+ \citet{lample2019cross} & Wiki & N & 100 & 82.9 & 77.6 & 77.9 & 77.9 & 77.1 & 75.7 & 75.5 & 72.6 & 71.2 & 75.8 & 73.1 & 76.2 & 70.4 & 66.5 & 62.4 & 74.2 \\
+ \midrule
+ \multicolumn{19}{l}{\it Fine-tune multilingual model on all training sets (TRANSLATE-TRAIN-ALL)} \\
+ \midrule
+ \citet{lample2019cross}$^{\dagger}$ & Wiki+MT & 1 & 15 & 85.0 & 80.8 & 81.3 & 80.3 & 79.1 & 80.9 & 78.3 & 75.6 & 77.6 & 78.5 & 76.0 & 79.5 & 72.9 & 72.8 & 68.5 & 77.8 \\
+ \citet{huang2019unicoder} & Wiki+MT & 1 & 15 & 85.6 & 81.1 & 82.3 & 80.9 & 79.5 & 81.4 & 79.7 & 76.8 & 78.2 & 77.9 & 77.1 & 80.5 & 73.4 & 73.8 & 69.6 & 78.5 \\
+ %\midrule
+ \citet{lample2019cross} & Wiki & 1 & 100 & 84.5 & 80.1 & 81.3 & 79.3 & 78.6 & 79.4 & 77.5 & 75.2 & 75.6 & 78.3 & 75.7 & 78.3 & 72.1 & 69.2 & 67.7 & 76.9 \\
+ \bf XLM-R\textsubscript{Base} & CC & 1 & 100 & 85.4 & 81.4 & 82.2 & 80.3 & 80.4 & 81.3 & 79.7 & 78.6 & 77.3 & 79.7 & 77.9 & 80.2 & 76.1 & 73.1 & 73.0 & 79.1 \\
+ \bf XLM-R & CC & 1 & 100 & \bf 89.1 & \underline{\bf 85.1} & \underline{\bf 86.6} & \underline{\bf 85.7} & \underline{\bf 85.3} & \underline{\bf 85.9} & \underline{\bf 83.5} & \underline{\bf 83.2} & \underline{\bf 83.1} & \underline{\bf 83.7} & \underline{\bf 81.5} & \underline{\bf 83.7} & \underline{\bf 81.6} & \underline{\bf 78.0} & \underline{\bf 78.1} & \underline{\bf 83.6} \\
+ \bottomrule
+ \end{tabular}
+ }
+ \caption{\textbf{Results on cross-lingual classification.} We report the accuracy on each of the 15 XNLI languages and the average accuracy. We specify the dataset D used for pretraining, the number of models \#M the approach requires and the number of languages \#lg the model handles. Our \xlmr results are averaged over five different seeds. We show that using the translate-train-all approach which leverages training sets from multiple languages, \xlmr obtains a new state of the art on XNLI of $83.6$\% average accuracy. Results with $^{\dagger}$ are from \citet{huang2019unicoder}. %It also outperforms previous methods on cross-lingual transfer.
+ \label{tab:xnli}}
+ \end{center}
+ % \vspace{-0.4cm}
+ \end{table*}
+ }
+
+ % Evolution of performance w.r.t number of languages
+ \newcommand{\insertLanguagesize}{
+ \begin{table*}[h!]
+ \begin{minipage}{0.49\textwidth}
+ \includegraphics[scale=0.4]{content/wiki_vs_cc.pdf}
+ \end{minipage}
+ \hfill
+ \begin{minipage}{0.4\textwidth}
+ \captionof{figure}{\textbf{Distribution of the amount of data (in MB) per language for Wikipedia and CommonCrawl.} The Wikipedia data used in open-source mBERT and XLM is not sufficient for the model to develop an understanding of low-resource languages. The CommonCrawl data we collect alleviates that issue and creates the conditions for a single model to understand text coming from multiple languages. \label{fig:lgs}}
+ \end{minipage}
+ % \vspace{-0.5cm}
+ \end{table*}
+ }
+
+ % Evolution of performance w.r.t number of languages
+ \newcommand{\insertXLMmorelanguages}{
+ \begin{table*}[h!]
+ \begin{minipage}{0.49\textwidth}
+ \includegraphics[scale=0.4]{content/evolution_languages}
+ \end{minipage}
+ \hfill
+ \begin{minipage}{0.4\textwidth}
+ \captionof{figure}{\textbf{Evolution of XLM performance on SeqLab, XNLI and GLUE as the number of languages increases.} While there are subtleties as to what languages lose more accuracy than others as we add more languages, we observe a steady decrease of the overall monolingual and cross-lingual performance. \label{fig:lgsunused}}
+ \end{minipage}
+ % \vspace{-0.5cm}
+ \end{table*}
+ }
+
+ \newcommand{\insertMLQA}{
+ \begin{table*}[h!]
+ \begin{center}
+ % \scriptsize
+ \resizebox{1\linewidth}{!}{
+ \begin{tabular}[h]{l cc ccccccc c}
+ \toprule
+ {\bf Model} & {\bf train} & {\bf \#lgs} & {\bf en} & {\bf es} & {\bf de} & {\bf ar} & {\bf hi} & {\bf vi} & {\bf zh} & {\bf Avg} \\
+ \midrule
+ BERT-Large$^{\dagger}$ & en & 1 & 80.2 / 67.4 & - & - & - & - & - & - & - \\
+ mBERT$^{\dagger}$ & en & 102 & 77.7 / 65.2 & 64.3 / 46.6 & 57.9 / 44.3 & 45.7 / 29.8 & 43.8 / 29.7 & 57.1 / 38.6 & 57.5 / 37.3 & 57.7 / 41.6 \\
+ XLM-15$^{\dagger}$ & en & 15 & 74.9 / 62.4 & 68.0 / 49.8 & 62.2 / 47.6 & 54.8 / 36.3 & 48.8 / 27.3 & 61.4 / 41.8 & 61.1 / 39.6 & 61.6 / 43.5 \\
+ XLM-R\textsubscript{Base} & en & 100 & 77.1 / 64.6 & 67.4 / 49.6 & 60.9 / 46.7 & 54.9 / 36.6 & 59.4 / 42.9 & 64.5 / 44.7 & 61.8 / 39.3 & 63.7 / 46.3 \\
+ \bf XLM-R & en & 100 & \bf 80.6 / 67.8 & \bf 74.1 / 56.0 & \bf 68.5 / 53.6 & \bf 63.1 / 43.5 & \bf 69.2 / 51.6 & \bf 71.3 / 50.9 & \bf 68.0 / 45.4 & \bf 70.7 / 52.7 \\
+ \bottomrule
+ \end{tabular}
+ }
+ \caption{\textbf{Results on MLQA question answering.} We report the F1 and EM (exact match) scores for zero-shot classification where models are fine-tuned on the English SQuAD dataset and evaluated on the 7 languages of MLQA. Results with $\dagger$ are taken from the original MLQA paper \citet{lewis2019mlqa}.
+ \label{tab:mlqa}}
+ \end{center}
+ \end{table*}
+ }
+
+ \newcommand{\insertNER}{
+ \begin{table}[t]
+ \begin{center}
+ % \scriptsize
+ \resizebox{1\linewidth}{!}{
+ \begin{tabular}[b]{l cc cccc c}
+ \toprule
+ {\bf Model} & {\bf train} & {\bf \#M} & {\bf en} & {\bf nl} & {\bf es} & {\bf de} & {\bf Avg}\\
+ \midrule
+ \citet{lample-etal-2016-neural} & each & N & 90.74 & 81.74 & 85.75 & 78.76 & 84.25 \\
+ \citet{akbik2018coling} & each & N & \bf 93.18 & 90.44 & - & \bf 88.27 & - \\
+ \midrule
+ \multirow{2}{*}{mBERT$^{\dagger}$} & each & N & 91.97 & 90.94 & 87.38 & 82.82 & 88.28\\
+ & en & 1 & 91.97 & 77.57 & 74.96 & 69.56 & 78.52\\
+ \midrule
+ \multirow{3}{*}{XLM-R\textsubscript{Base}} & each & N & 92.25 & 90.39 & 87.99 & 84.60 & 88.81\\
+ & en & 1 & 92.25 & 78.08 & 76.53 & 69.60 & 79.11\\
+ & all & 1 & 91.08 & 89.09 & 87.28 & 83.17 & 87.66 \\
+ \midrule
+ \multirow{3}{*}{\bf XLM-R} & each & N & 92.92 & \bf 92.53 & \bf 89.72 & 85.81 & 90.24\\
+ & en & 1 & 92.92 & 80.80 & 78.64 & 71.40 & 80.94\\
+ & all & 1 & 92.00 & 91.60 & 89.52 & 84.60 & 89.43 \\
+ \bottomrule
+ \end{tabular}
+ }
+ \caption{\textbf{Results on named entity recognition} on CoNLL-2002 and CoNLL-2003 (F1 score). Results with $\dagger$ are from \citet{wu2019beto}. Note that mBERT and \xlmr do not use a linear-chain CRF, as opposed to \citet{akbik2018coling} and \citet{lample-etal-2016-neural}.
+ \label{tab:ner}}
+ \end{center}
+ \vspace{-0.6cm}
+ \end{table}
+ }
+
+
+ \newcommand{\insertAblationone}{
+ \begin{table*}[h!]
+ \begin{minipage}[t]{0.3\linewidth}
+ \begin{center}
+ %\includegraphics[width=\linewidth]{content/xlmroberta_transfer_dilution.pdf}
+ \includegraphics{content/dilution}
+ \captionof{figure}{The transfer-interference trade-off: Low-resource languages benefit from scaling to more languages, until dilution (interference) kicks in and degrades overall performance.}
+ \label{fig:transfer_dilution}
+ \vspace{-0.2cm}
+ \end{center}
+ \end{minipage}
+ \hfill
+ \begin{minipage}[t]{0.3\linewidth}
+ \begin{center}
+ %\includegraphics[width=\linewidth]{content/xlmroberta_evolution.pdf}
+ \includegraphics{content/wikicc}
+ \captionof{figure}{Wikipedia versus CommonCrawl: An XLM-7 obtains significantly better performance when trained on CC, in particular on low-resource languages.}
+ \label{fig:curse}
+ \end{center}
+ \end{minipage}
+ \hfill
+ \begin{minipage}[t]{0.3\linewidth}
+ \begin{center}
+ % \includegraphics[width=\linewidth]{content/xlmroberta_evolution.pdf}
+ \includegraphics{content/capacity}
+ \captionof{figure}{Adding more capacity to the model alleviates the curse of multilinguality, but remains an issue for models of moderate size.}
+ \label{fig:capacity}
+ \end{center}
+ \end{minipage}
+ \vspace{-0.2cm}
+ \end{table*}
+ }
+
+
+ \newcommand{\insertAblationtwo}{
+ \begin{table*}[h!]
+ \begin{minipage}[t]{0.3\linewidth}
+ \begin{center}
+ %\includegraphics[width=\columnwidth]{content/xlmroberta_alpha_tradeoff.pdf}
+ \includegraphics{content/langsampling}
+ \captionof{figure}{On the high-resource versus low-resource trade-off: impact of batch language sampling for XLM-100.
+ \label{fig:alpha}}
+ \end{center}
+ \end{minipage}
+ \hfill
+ \begin{minipage}[t]{0.3\linewidth}
+ \begin{center}
+ %\includegraphics[width=\columnwidth]{content/xlmroberta_vocab.pdf}
+ \includegraphics{content/vocabsize.pdf}
+ \captionof{figure}{On the impact of vocabulary size at fixed capacity and with increasing capacity for XLM-100.
+ \label{fig:vocab}}
+ \end{center}
+ \end{minipage}
+ \hfill
+ \begin{minipage}[t]{0.3\linewidth}
+ \begin{center}
+ %\includegraphics[width=\columnwidth]{content/xlmroberta_batch_and_tok.pdf}
+ \includegraphics{content/batchsize.pdf}
+ \captionof{figure}{On the impact of large-scale training, and preprocessing simplification from BPE with tokenization to SPM on raw text data.
+ \label{fig:batch}}
+ \end{center}
+ \end{minipage}
+ \vspace{-0.2cm}
+ \end{table*}
+ }
+
+
+ % Multilingual vs monolingual
+ \newcommand{\insertMultiMono}{
+ \begin{table}[h!]
+ \begin{center}
+ % \scriptsize
+ \resizebox{1\linewidth}{!}{
+ \begin{tabular}[b]{l cc ccccccc c}
+ \toprule
+ {\bf Model} & {\bf D } & {\bf \#vocab} & {\bf en} & {\bf fr} & {\bf de} & {\bf ru} & {\bf zh} & {\bf sw} & {\bf ur} & {\bf Avg}\\
+ \midrule
+ \multicolumn{11}{l}{\it Monolingual baselines}\\
+ \midrule
+ \multirow{2}{*}{BERT} & Wiki & 40k & 84.5 & 78.6 & 80.0 & 75.5 & 77.7 & 60.1 & 57.3 & 73.4 \\
+ & CC & 40k & 86.7 & 81.2 & 81.2 & 78.2 & 79.5 & 70.8 & 65.1 & 77.5 \\
+ \midrule
+ \multicolumn{11}{l}{\it Multilingual models (cross-lingual transfer)}\\
+ \midrule
+ \multirow{2}{*}{XLM-7} & Wiki & 150k & 82.3 & 76.8 & 74.7 & 72.5 & 73.1 & 60.8 & 62.3 & 71.8 \\
+ & CC & 150k & 85.7 & 78.6 & 79.5 & 76.4 & 74.8 & 71.2 & 66.9 & 76.2 \\
+ \midrule
+ \multicolumn{11}{l}{\it Multilingual models (translate-train-all)}\\
+ \midrule
+ \multirow{2}{*}{XLM-7} & Wiki & 150k & 84.6 & 80.1 & 80.2 & 75.7 & 78 & 68.7 & 66.7 & 76.3 \\
+ & CC & 150k & \bf 87.2 & \bf 82.5 & \bf 82.9 & \bf 79.7 & \bf 80.4 & \bf 75.7 & \bf 71.5 & \bf 80.0 \\
+ % \midrule
+ % XLM (sw,ar) & CC & 60k & N & 2-3 & - & - & - & - & - & 00.0 & - & 00.0 \\
+ % XLM (ur,hi,ar) & CC & 60k & N & 2-3 & - & - & - & - & - & - & 00.0 & 00.0 \\
+ \bottomrule
+ \end{tabular}
+ }
+ \caption{\textbf{Multilingual versus monolingual models (BERT-BASE).} We compare the performance of monolingual models (BERT) versus multilingual models (XLM) on seven languages, using a BERT-BASE architecture. We choose a vocabulary size of 40k and 150k for monolingual and multilingual models.
+ \label{tab:multimono}}
+ \end{center}
+ \vspace{-0.4cm}
+ \end{table}
+ }
+
+ % GLUE benchmark results
+ \newcommand{\insertGlue}{
+ \begin{table}[h!]
+ \begin{center}
+ % \scriptsize
+ \resizebox{1\linewidth}{!}{
+ \begin{tabular}[b]{l|c|cccccc|c}
+ \toprule
+ {\bf Model} & {\bf \#lgs} & {\bf MNLI-m/mm} & {\bf QNLI} & {\bf QQP} & {\bf SST} & {\bf MRPC} & {\bf STS-B} & {\bf Avg}\\
+ \midrule
+ BERT\textsubscript{Large}$^{\dagger}$ & 1 & 86.6/- & 92.3 & 91.3 & 93.2 & 88.0 & 90.0 & 90.2 \\
+ XLNet\textsubscript{Large}$^{\dagger}$ & 1 & 89.8/- & 93.9 & 91.8 & 95.6 & 89.2 & 91.8 & 92.0 \\
+ RoBERTa$^{\dagger}$ & 1 & 90.2/90.2 & 94.7 & 92.2 & 96.4 & 90.9 & 92.4 & 92.8 \\
+ XLM-R & 100 & 88.9/89.0 & 93.8 & 92.3 & 95.0 & 89.5 & 91.2 & 91.8 \\
+ \bottomrule
+ \end{tabular}
+ }
+ \caption{\textbf{GLUE dev results.} Results with $^{\dagger}$ are from \citet{roberta2019}. We compare the performance of \xlmr to BERT\textsubscript{Large}, XLNet and RoBERTa on the English GLUE benchmark.
+ \label{tab:glue}}
+ \end{center}
+ \vspace{-0.4cm}
+ \end{table}
+ }
+
+
+ % Wiki vs CommonCrawl statistics
+ \newcommand{\insertWikivsCC}{
+ \begin{table*}[h]
+ \begin{center}
+ %\includegraphics[width=\linewidth]{content/wiki_vs_cc.pdf}
+ \includegraphics{content/datasize.pdf}
+ \captionof{figure}{Amount of data in GiB (log-scale) for the 88 languages that appear in both the Wiki-100 corpus used for mBERT and XLM-100, and the CC-100 used for XLM-R. CC-100 increases the amount of data by several orders of magnitude, in particular for low-resource languages.
+ \label{fig:wikivscc}}
+ \end{center}
+ % \vspace{-0.4cm}
+ \end{table*}
+ }
+
+ % Corpus statistics for CC-100
+ \newcommand{\insertDataStatistics}{
+ %\resizebox{1\linewidth}{!}{
+ \begin{table}[h!]
+ \begin{center}
+ \small
+ \begin{tabular}[b]{clrrclrr}
+ \toprule
+ \textbf{ISO code} & \textbf{Language} & \textbf{Tokens} (M) & \textbf{Size} (GiB) & \textbf{ISO code} & \textbf{Language} & \textbf{Tokens} (M) & \textbf{Size} (GiB)\\
+ \cmidrule(r){1-4}\cmidrule(l){5-8}
+ {\bf af }& Afrikaans & 242 & 1.3 &{\bf lo }& Lao & 17 & 0.6 \\
+ {\bf am }& Amharic & 68 & 0.8 &{\bf lt }& Lithuanian & 1835 & 13.7 \\
+ {\bf ar }& Arabic & 2869 & 28.0 &{\bf lv }& Latvian & 1198 & 8.8 \\
+ {\bf as }& Assamese & 5 & 0.1 &{\bf mg }& Malagasy & 25 & 0.2 \\
+ {\bf az }& Azerbaijani & 783 & 6.5 &{\bf mk }& Macedonian & 449 & 4.8 \\
+ {\bf be }& Belarusian & 362 & 4.3 &{\bf ml }& Malayalam & 313 & 7.6 \\
+ {\bf bg }& Bulgarian & 5487 & 57.5 &{\bf mn }& Mongolian & 248 & 3.0 \\
+ {\bf bn }& Bengali & 525 & 8.4 &{\bf mr }& Marathi & 175 & 2.8 \\
+ {\bf - }& Bengali Romanized & 77 & 0.5 &{\bf ms }& Malay & 1318 & 8.5 \\
+ {\bf br }& Breton & 16 & 0.1 &{\bf my }& Burmese & 15 & 0.4 \\
+ {\bf bs }& Bosnian & 14 & 0.1 &{\bf my }& Burmese & 56 & 1.6 \\
+ {\bf ca }& Catalan & 1752 & 10.1 &{\bf ne }& Nepali & 237 & 3.8 \\
+ {\bf cs }& Czech & 2498 & 16.3 &{\bf nl }& Dutch & 5025 & 29.3 \\
+ {\bf cy }& Welsh & 141 & 0.8 &{\bf no }& Norwegian & 8494 & 49.0 \\
+ {\bf da }& Danish & 7823 & 45.6 &{\bf om }& Oromo & 8 & 0.1 \\
+ {\bf de }& German & 10297 & 66.6 &{\bf or }& Oriya & 36 & 0.6 \\
+ {\bf el }& Greek & 4285 & 46.9 &{\bf pa }& Punjabi & 68 & 0.8 \\
+ {\bf en }& English & 55608 & 300.8 &{\bf pl }& Polish & 6490 & 44.6 \\
+ {\bf eo }& Esperanto & 157 & 0.9 &{\bf ps }& Pashto & 96 & 0.7 \\
+ {\bf es }& Spanish & 9374 & 53.3 &{\bf pt }& Portuguese & 8405 & 49.1 \\
+ {\bf et }& Estonian & 843 & 6.1 &{\bf ro }& Romanian & 10354 & 61.4 \\
315
+ {\bf eu }& Basque & 270 & 2.0 &{\bf ru }& Russian & 23408 & 278.0 \\
316
+ {\bf fa }& Persian & 13259 & 111.6 &{\bf sa }& Sanskrit & 17 & 0.3 \\
317
+ {\bf fi }& Finnish & 6730 & 54.3 &{\bf sd }& Sindhi & 50 & 0.4 \\
318
+ {\bf fr }& French & 9780 & 56.8 &{\bf si }& Sinhala & 243 & 3.6 \\
319
+ {\bf fy }& Western Frisian & 29 & 0.2 &{\bf sk }& Slovak & 3525 & 23.2 \\
320
+ {\bf ga }& Irish & 86 & 0.5 &{\bf sl }& Slovenian & 1669 & 10.3 \\
321
+ {\bf gd }& Scottish Gaelic & 21 & 0.1 &{\bf so }& Somali & 62 & 0.4 \\
322
+ {\bf gl }& Galician & 495 & 2.9 &{\bf sq }& Albanian & 918 & 5.4 \\
323
+ {\bf gu }& Gujarati & 140 & 1.9 &{\bf sr }& Serbian & 843 & 9.1 \\
324
+ {\bf ha }& Hausa & 56 & 0.3 &{\bf su }& Sundanese & 10 & 0.1 \\
325
+ {\bf he }& Hebrew & 3399 & 31.6 &{\bf sv }& Swedish & 77.8 & 12.1 \\
326
+ {\bf hi }& Hindi & 1715 & 20.2 &{\bf sw }& Swahili & 275 & 1.6 \\
327
+ {\bf - }& Hindi Romanized & 88 & 0.5 &{\bf ta }& Tamil & 595 & 12.2 \\
328
+ {\bf hr }& Croatian & 3297 & 20.5 &{\bf - }& Tamil Romanized & 36 & 0.3 \\
329
+ {\bf hu }& Hungarian & 7807 & 58.4 &{\bf te }& Telugu & 249 & 4.7 \\
330
+ {\bf hy }& Armenian & 421 & 5.5 &{\bf - }& Telugu Romanized & 39 & 0.3 \\
331
+ {\bf id }& Indonesian & 22704 & 148.3 &{\bf th }& Thai & 1834 & 71.7 \\
332
+ {\bf is }& Icelandic & 505 & 3.2 &{\bf tl }& Filipino & 556 & 3.1 \\
333
+ {\bf it }& Italian & 4983 & 30.2 &{\bf tr }& Turkish & 2736 & 20.9 \\
334
+ {\bf ja }& Japanese & 530 & 69.3 &{\bf ug }& Uyghur & 27 & 0.4 \\
335
+ {\bf jv }& Javanese & 24 & 0.2 &{\bf uk }& Ukrainian & 6.5 & 84.6 \\
336
+ {\bf ka }& Georgian & 469 & 9.1 &{\bf ur }& Urdu & 730 & 5.7 \\
337
+ {\bf kk }& Kazakh & 476 & 6.4 &{\bf - }& Urdu Romanized & 85 & 0.5 \\
338
+ {\bf km }& Khmer & 36 & 1.5 &{\bf uz }& Uzbek & 91 & 0.7 \\
339
+ {\bf kn }& Kannada & 169 & 3.3 &{\bf vi }& Vietnamese & 24757 & 137.3 \\
340
+ {\bf ko }& Korean & 5644 & 54.2 &{\bf xh }& Xhosa & 13 & 0.1 \\
341
+ {\bf ku }& Kurdish (Kurmanji) & 66 & 0.4 &{\bf yi }& Yiddish & 34 & 0.3 \\
342
+ {\bf ky }& Kyrgyz & 94 & 1.2 &{\bf zh }& Chinese (Simplified) & 259 & 46.9 \\
343
+ {\bf la }& Latin & 390 & 2.5 &{\bf zh }& Chinese (Traditional) & 176 & 16.6 \\
344
+
345
+ \bottomrule
346
+ \end{tabular}
347
+ \caption{\textbf{Languages and statistics of the CC-100 corpus.} We report the list of 100 languages and include the number of tokens (in millions) and the size of the data (in GiB) for each language. Note that we also include romanized variants of some non-Latin languages such as Bengali, Hindi, Tamil, Telugu and Urdu.\label{tab:datastats}}
348
+ \end{center}
349
+ \end{table}
350
+ %}
351
+ }
352
+
353
+
354
+ % Comparison of parameters for different models
355
+ \newcommand{\insertParameters}{
356
+ \begin{table*}[h!]
357
+ \begin{center}
358
+ % \scriptsize
359
+ %\resizebox{1\linewidth}{!}{
360
+ \begin{tabular}[b]{lrcrrrrrc}
361
+ \toprule
362
+ \textbf{Model} & \textbf{\#lgs} & \textbf{tokenization} & \textbf{L} & \textbf{$H_{m}$} & \textbf{$H_{ff}$} & \textbf{A} & \textbf{V} & \textbf{\#params}\\
363
+ \cmidrule(r){1-1}
364
+ \cmidrule(lr){2-3}
365
+ \cmidrule(lr){4-8}
366
+ \cmidrule(l){9-9}
367
+ % TODO: rank by number of parameters
368
+ BERT\textsubscript{Base} & 1 & WordPiece & 12 & 768 & 3072 & 12 & 30k & 110M \\
369
+ BERT\textsubscript{Large} & 1 & WordPiece & 24 & 1024 & 4096 & 16 & 30k & 335M \\
370
+ mBERT & 104 & WordPiece & 12 & 768 & 3072 & 12 & 110k & 172M \\
371
+ RoBERTa\textsubscript{Base} & 1 & bBPE & 12 & 768 & 3072 & 8 & 50k & 125M \\
372
+ RoBERTa & 1 & bBPE & 24 & 1024 & 4096 & 16 & 50k & 355M \\
373
+ XLM-15 & 15 & BPE & 12 & 1024 & 4096 & 8 & 95k & 250M \\
374
+ XLM-17 & 17 & BPE & 16 & 1280 & 5120 & 16 & 200k & 570M \\
375
+ XLM-100 & 100 & BPE & 16 & 1280 & 5120 & 16 & 200k & 570M \\
376
+ Unicoder & 15 & BPE & 12 & 1024 & 4096 & 8 & 95k & 250M \\
377
+ \xlmr\textsubscript{Base} & 100 & SPM & 12 & 768 & 3072 & 12 & 250k & 270M \\
378
+ \xlmr & 100 & SPM & 24 & 1024 & 4096 & 16 & 250k & 550M \\
379
+ GPT2 & 1 & bBPE & 48 & 1600 & 6400 & 32 & 50k & 1.5B \\
380
+ wide-mmNMT & 103 & SPM & 12 & 2048 & 16384 & 32 & 64k & 3B \\
381
+ deep-mmNMT & 103 & SPM & 24 & 1024 & 16384 & 32 & 64k & 3B \\
382
+ T5-3B & 1 & WordPiece & 24 & 1024 & 16384 & 32 & 32k & 3B \\
383
+ T5-11B & 1 & WordPiece & 24 & 1024 & 65536 & 32 & 32k & 11B \\
384
+ % XLNet\textsubscript{Large}$^{\dagger}$ & 1 & 89.8/- & 93.9 & 91.8 & 95.6 & 89.2 & 91.8 & 92.0 \\
385
+ % RoBERTa$^{\dagger}$ & 1 & 90.2/90.2 & 94.7 & 92.2 & 96.4 & 90.9 & 92.4 & 92.8 \\
386
+ % XLM-R & 100 & 88.4/88.5 & 93.1 & 92.2 & 95.1 & 89.7 & 90.4 & 91.5 \\
387
+ \bottomrule
388
+ \end{tabular}
389
+ %}
390
+ \caption{\textbf{Details on model sizes.}
391
+ We show the tokenization used by each Transformer model, the number of layers L, the hidden size of the model $H_{m}$, the dimension of the feed-forward layer $H_{ff}$, the number of attention heads A, the size of the vocabulary V and the total number of parameters \#params.
392
+ For Transformer encoders, the number of parameters can be approximated by $4LH_m^2 + 2LH_m H_{ff} + VH_m$.
393
+ GPT2 numbers are from \citet{radford2019language}, mm-NMT models are from the work of \citet{arivazhagan2019massively} on massively multilingual neural machine translation (mmNMT), and T5 numbers are from \citet{raffel2019exploring}. While \xlmr is among the largest models, partly due to its large embedding layer, it has a similar number of parameters to XLM-100 and remains significantly smaller than recently introduced Transformer models for multilingual MT and transfer learning. While this table gives more insight into the capacity of each model, note that it does not highlight other critical differences between the models.
394
+ \label{tab:parameters}}
395
+ \end{center}
396
+ \vspace{-0.4cm}
397
+ \end{table*}
398
+ }
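The caption above approximates a Transformer encoder's parameter count as $4LH_m^2 + 2LH_mH_{ff} + VH_m$. A quick sanity check of that formula against two rows of the table; the formula ignores biases, layer norms and position embeddings, so the estimates differ from the reported totals by a few percent:

```python
def approx_encoder_params(L, H_m, H_ff, V):
    """Approximate Transformer-encoder parameter count:
    4*L*H_m^2 (attention projections) + 2*L*H_m*H_ff (feed-forward)
    + V*H_m (token embeddings)."""
    return 4 * L * H_m**2 + 2 * L * H_m * H_ff + V * H_m

# BERT-Base: L=12, H_m=768, H_ff=3072, V=30k -> reported ~110M
print(approx_encoder_params(12, 768, 3072, 30_000))    # 107974656 (~108M)
# XLM-R (Large): L=24, H_m=1024, H_ff=4096, V=250k -> reported ~550M
print(approx_encoder_params(24, 1024, 4096, 250_000))  # 557989888 (~558M)
```

The XLM-R estimate also makes the caption's point about the embedding layer concrete: the $VH_m$ term alone contributes 256M of the roughly 558M parameters.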
references/2019.arxiv.conneau/source/XLMR Paper/content/vocabsize.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e45090856dc149265ada0062c8c2456c3057902dfaaade60aa80905785563506
3
+ size 15677
references/2019.arxiv.conneau/source/XLMR Paper/content/wikicc.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f0d7e959db8240f283922c3ca7c6de6f5ad3750681f27f4fcf35d161506a7a21
3
+ size 16304
references/2019.arxiv.conneau/source/XLMR Paper/texput.log ADDED
@@ -0,0 +1,21 @@
1
+ This is pdfTeX, Version 3.14159265-2.6-1.40.20 (TeX Live 2019) (preloaded format=pdflatex 2019.5.8) 7 APR 2020 17:41
2
+ entering extended mode
3
+ restricted \write18 enabled.
4
+ %&-line parsing enabled.
5
+ **acl2020.tex
6
+
7
+ ! Emergency stop.
8
+ <*> acl2020.tex
9
+
10
+ *** (job aborted, file error in nonstop mode)
11
+
12
+
13
+ Here is how much of TeX's memory you used:
14
+ 3 strings out of 492616
15
+ 102 string characters out of 6129482
16
+ 57117 words of memory out of 5000000
17
+ 4025 multiletter control sequences out of 15000+600000
18
+ 3640 words of font info for 14 fonts, out of 8000000 for 9000
19
+ 1141 hyphenation exceptions out of 8191
20
+ 0i,0n,0p,1b,6s stack positions out of 5000i,500n,10000p,200000b,80000s
21
+ ! ==> Fatal error occurred, no output PDF file produced!
references/2019.arxiv.conneau/source/XLMR Paper/xlmr.bbl ADDED
@@ -0,0 +1,285 @@
1
+ \begin{thebibliography}{40}
2
+ \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
3
+
4
+ \bibitem[{Akbik et~al.(2018)Akbik, Blythe, and Vollgraf}]{akbik2018coling}
5
+ Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018.
6
+ \newblock Contextual string embeddings for sequence labeling.
7
+ \newblock In \emph{COLING}, pages 1638--1649.
8
+
9
+ \bibitem[{Arivazhagan et~al.(2019)Arivazhagan, Bapna, Firat, Lepikhin, Johnson,
10
+ Krikun, Chen, Cao, Foster, Cherry et~al.}]{arivazhagan2019massively}
11
+ Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson,
12
+ Maxim Krikun, Mia~Xu Chen, Yuan Cao, George Foster, Colin Cherry, et~al.
13
+ 2019.
14
+ \newblock Massively multilingual neural machine translation in the wild:
15
+ Findings and challenges.
16
+ \newblock \emph{arXiv preprint arXiv:1907.05019}.
17
+
18
+ \bibitem[{Bowman et~al.(2015)Bowman, Angeli, Potts, and
19
+ Manning}]{bowman2015large}
20
+ Samuel~R. Bowman, Gabor Angeli, Christopher Potts, and Christopher~D. Manning.
21
+ 2015.
22
+ \newblock A large annotated corpus for learning natural language inference.
23
+ \newblock In \emph{EMNLP}.
24
+
25
+ \bibitem[{Conneau et~al.(2018)Conneau, Rinott, Lample, Williams, Bowman,
26
+ Schwenk, and Stoyanov}]{conneau2018xnli}
27
+ Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel~R.
28
+ Bowman, Holger Schwenk, and Veselin Stoyanov. 2018.
29
+ \newblock XNLI: Evaluating cross-lingual sentence representations.
30
+ \newblock In \emph{EMNLP}. Association for Computational Linguistics.
31
+
32
+ \bibitem[{Devlin et~al.(2018)Devlin, Chang, Lee, and
33
+ Toutanova}]{devlin2018bert}
34
+ Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018.
35
+ \newblock BERT: Pre-training of deep bidirectional transformers for language
36
+ understanding.
37
+ \newblock \emph{NAACL}.
38
+
39
+ \bibitem[{Grave et~al.(2018)Grave, Bojanowski, Gupta, Joulin, and
40
+ Mikolov}]{grave2018learning}
41
+ Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and Tomas
42
+ Mikolov. 2018.
43
+ \newblock Learning word vectors for 157 languages.
44
+ \newblock In \emph{LREC}.
45
+
46
+ \bibitem[{Huang et~al.(2019)Huang, Liang, Duan, Gong, Shou, Jiang, and
47
+ Zhou}]{huang2019unicoder}
48
+ Haoyang Huang, Yaobo Liang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, and
49
+ Ming Zhou. 2019.
50
+ \newblock Unicoder: A universal language encoder by pre-training with multiple
51
+ cross-lingual tasks.
52
+ \newblock \emph{ACL}.
53
+
54
+ \bibitem[{Johnson et~al.(2017)Johnson, Schuster, Le, Krikun, Wu, Chen, Thorat,
55
+ Vi{\'e}gas, Wattenberg, Corrado et~al.}]{johnson2017google}
56
+ Melvin Johnson, Mike Schuster, Quoc~V Le, Maxim Krikun, Yonghui Wu, Zhifeng
57
+ Chen, Nikhil Thorat, Fernanda Vi{\'e}gas, Martin Wattenberg, Greg Corrado,
58
+ et~al. 2017.
59
+ \newblock Google’s multilingual neural machine translation system: Enabling
60
+ zero-shot translation.
61
+ \newblock \emph{TACL}, 5:339--351.
62
+
63
+ \bibitem[{Joulin et~al.(2017)Joulin, Grave, Bojanowski, and Mikolov}]{joulin2017bag}
64
+ Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017.
65
+ \newblock Bag of tricks for efficient text classification.
66
+ \newblock \emph{EACL 2017}, page 427.
67
+
68
+ \bibitem[{Jozefowicz et~al.(2016)Jozefowicz, Vinyals, Schuster, Shazeer, and
69
+ Wu}]{jozefowicz2016exploring}
70
+ Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu.
71
+ 2016.
72
+ \newblock Exploring the limits of language modeling.
73
+ \newblock \emph{arXiv preprint arXiv:1602.02410}.
74
+
75
+ \bibitem[{Kudo(2018)}]{kudo2018subword}
76
+ Taku Kudo. 2018.
77
+ \newblock Subword regularization: Improving neural network translation models
78
+ with multiple subword candidates.
79
+ \newblock In \emph{ACL}, pages 66--75.
80
+
81
+ \bibitem[{Kudo and Richardson(2018)}]{kudo2018sentencepiece}
82
+ Taku Kudo and John Richardson. 2018.
83
+ \newblock SentencePiece: A simple and language-independent subword tokenizer
84
+ and detokenizer for neural text processing.
85
+ \newblock \emph{EMNLP}.
86
+
87
+ \bibitem[{Lample et~al.(2016)Lample, Ballesteros, Subramanian, Kawakami, and
88
+ Dyer}]{lample-etal-2016-neural}
89
+ Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and
90
+ Chris Dyer. 2016.
91
+ \newblock \href {https://doi.org/10.18653/v1/N16-1030} {Neural architectures
92
+ for named entity recognition}.
93
+ \newblock In \emph{NAACL}, pages 260--270, San Diego, California. Association
94
+ for Computational Linguistics.
95
+
96
+ \bibitem[{Lample and Conneau(2019)}]{lample2019cross}
97
+ Guillaume Lample and Alexis Conneau. 2019.
98
+ \newblock Cross-lingual language model pretraining.
99
+ \newblock \emph{NeurIPS}.
100
+
101
+ \bibitem[{Lewis et~al.(2019)Lewis, O\u{g}uz, Rinott, Riedel, and
102
+ Schwenk}]{lewis2019mlqa}
103
+ Patrick Lewis, Barlas O\u{g}uz, Ruty Rinott, Sebastian Riedel, and Holger
104
+ Schwenk. 2019.
105
+ \newblock MLQA: Evaluating cross-lingual extractive question answering.
106
+ \newblock \emph{arXiv preprint arXiv:1910.07475}.
107
+
108
+ \bibitem[{Liu et~al.(2019)Liu, Ott, Goyal, Du, Joshi, Chen, Levy, Lewis,
109
+ Zettlemoyer, and Stoyanov}]{roberta2019}
110
+ Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
111
+ Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019.
112
+ \newblock Roberta: {A} robustly optimized {BERT} pretraining approach.
113
+ \newblock \emph{arXiv preprint arXiv:1907.11692}.
114
+
115
+ \bibitem[{Mikolov et~al.(2013{\natexlab{a}})Mikolov, Le, and
116
+ Sutskever}]{mikolov2013exploiting}
117
+ Tomas Mikolov, Quoc~V Le, and Ilya Sutskever. 2013{\natexlab{a}}.
118
+ \newblock Exploiting similarities among languages for machine translation.
119
+ \newblock \emph{arXiv preprint arXiv:1309.4168}.
120
+
121
+ \bibitem[{Mikolov et~al.(2013{\natexlab{b}})Mikolov, Sutskever, Chen, Corrado,
122
+ and Dean}]{mikolov2013distributed}
123
+ Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg~S Corrado, and Jeff Dean.
124
+ 2013{\natexlab{b}}.
125
+ \newblock Distributed representations of words and phrases and their
126
+ compositionality.
127
+ \newblock In \emph{NIPS}, pages 3111--3119.
128
+
129
+ \bibitem[{Pennington et~al.(2014)Pennington, Socher, and
130
+ Manning}]{pennington2014glove}
131
+ Jeffrey Pennington, Richard Socher, and Christopher~D. Manning. 2014.
132
+ \newblock \href {http://www.aclweb.org/anthology/D14-1162} {GloVe: Global
133
+ vectors for word representation}.
134
+ \newblock In \emph{EMNLP}, pages 1532--1543.
135
+
136
+ \bibitem[{Peters et~al.(2018)Peters, Neumann, Iyyer, Gardner, Clark, Lee, and
137
+ Zettlemoyer}]{peters2018deep}
138
+ Matthew~E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark,
139
+ Kenton Lee, and Luke Zettlemoyer. 2018.
140
+ \newblock Deep contextualized word representations.
141
+ \newblock \emph{NAACL}.
142
+
143
+ \bibitem[{Pires et~al.(2019)Pires, Schlinger, and Garrette}]{Pires2019HowMI}
144
+ Telmo Pires, Eva Schlinger, and Dan Garrette. 2019.
145
+ \newblock How multilingual is multilingual BERT?
146
+ \newblock In \emph{ACL}.
147
+
148
+ \bibitem[{Radford et~al.(2018)Radford, Narasimhan, Salimans, and
149
+ Sutskever}]{radford2018improving}
150
+ Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018.
151
+ \newblock \href
152
+ {https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf}
153
+ {Improving language understanding by generative pre-training}.
154
+ \newblock \emph{URL
155
+ https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language\_understanding\_paper.pdf}.
156
+
157
+ \bibitem[{Radford et~al.(2019)Radford, Wu, Child, Luan, Amodei, and
158
+ Sutskever}]{radford2019language}
159
+ Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya
160
+ Sutskever. 2019.
161
+ \newblock Language models are unsupervised multitask learners.
162
+ \newblock \emph{OpenAI Blog}, 1(8).
163
+
164
+ \bibitem[{Raffel et~al.(2019)Raffel, Shazeer, Roberts, Lee, Narang, Matena,
165
+ Zhou, Li, and Liu}]{raffel2019exploring}
166
+ Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael
167
+ Matena, Yanqi Zhou, Wei Li, and Peter~J. Liu. 2019.
168
+ \newblock Exploring the limits of transfer learning with a unified text-to-text
169
+ transformer.
170
+ \newblock \emph{arXiv preprint arXiv:1910.10683}.
171
+
172
+ \bibitem[{Rajpurkar et~al.(2018)Rajpurkar, Jia, and Liang}]{rajpurkar2018know}
173
+ Pranav Rajpurkar, Robin Jia, and Percy Liang. 2018.
174
+ \newblock Know what you don't know: Unanswerable questions for SQuAD.
175
+ \newblock \emph{ACL}.
176
+
177
+ \bibitem[{Rajpurkar et~al.(2016)Rajpurkar, Zhang, Lopyrev, and
178
+ Liang}]{rajpurkar-etal-2016-squad}
179
+ Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016.
180
+ \newblock \href {https://doi.org/10.18653/v1/D16-1264} {{SQ}u{AD}: 100,000+
181
+ questions for machine comprehension of text}.
182
+ \newblock In \emph{EMNLP}, pages 2383--2392, Austin, Texas. Association for
183
+ Computational Linguistics.
184
+
185
+ \bibitem[{Sang(2002)}]{sang2002introduction}
186
+ Erik~F Sang. 2002.
187
+ \newblock Introduction to the CoNLL-2002 shared task: Language-independent
188
+ named entity recognition.
189
+ \newblock \emph{CoNLL}.
190
+
191
+ \bibitem[{Schuster et~al.(2019)Schuster, Ram, Barzilay, and
192
+ Globerson}]{schuster2019cross}
193
+ Tal Schuster, Ori Ram, Regina Barzilay, and Amir Globerson. 2019.
194
+ \newblock Cross-lingual alignment of contextual word embeddings, with
195
+ applications to zero-shot dependency parsing.
196
+ \newblock \emph{NAACL}.
197
+
198
+ \bibitem[{Siddhant et~al.(2019)Siddhant, Johnson, Tsai, Arivazhagan, Riesa,
199
+ Bapna, Firat, and Raman}]{siddhant2019evaluating}
200
+ Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa,
201
+ Ankur Bapna, Orhan Firat, and Karthik Raman. 2019.
202
+ \newblock Evaluating the cross-lingual effectiveness of massively multilingual
203
+ neural machine translation.
204
+ \newblock \emph{AAAI}.
205
+
206
+ \bibitem[{Singh et~al.(2019)Singh, McCann, Keskar, Xiong, and
207
+ Socher}]{singh2019xlda}
208
+ Jasdeep Singh, Bryan McCann, Nitish~Shirish Keskar, Caiming Xiong, and Richard
209
+ Socher. 2019.
210
+ \newblock XLDA: Cross-lingual data augmentation for natural language inference
211
+ and question answering.
212
+ \newblock \emph{arXiv preprint arXiv:1905.11471}.
213
+
214
+ \bibitem[{Socher et~al.(2013)Socher, Perelygin, Wu, Chuang, Manning, Ng, and
215
+ Potts}]{socher2013recursive}
216
+ Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher~D Manning,
217
+ Andrew Ng, and Christopher Potts. 2013.
218
+ \newblock Recursive deep models for semantic compositionality over a sentiment
219
+ treebank.
220
+ \newblock In \emph{EMNLP}, pages 1631--1642.
221
+
222
+ \bibitem[{Tan et~al.(2019)Tan, Ren, He, Qin, Zhao, and
223
+ Liu}]{tan2019multilingual}
224
+ Xu~Tan, Yi~Ren, Di~He, Tao Qin, Zhou Zhao, and Tie-Yan Liu. 2019.
225
+ \newblock Multilingual neural machine translation with knowledge distillation.
226
+ \newblock \emph{ICLR}.
227
+
228
+ \bibitem[{Tjong Kim~Sang and De~Meulder(2003)}]{tjong2003introduction}
229
+ Erik~F Tjong Kim~Sang and Fien De~Meulder. 2003.
230
+ \newblock Introduction to the CoNLL-2003 shared task: Language-independent
231
+ named entity recognition.
232
+ \newblock In \emph{CoNLL}, pages 142--147. Association for Computational
233
+ Linguistics.
234
+
235
+ \bibitem[{Vaswani et~al.(2017)Vaswani, Shazeer, Parmar, Uszkoreit, Jones,
236
+ Gomez, Kaiser, and Polosukhin}]{transformer17}
237
+ Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
238
+ Aidan~N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017.
239
+ \newblock Attention is all you need.
240
+ \newblock In \emph{Advances in Neural Information Processing Systems}, pages
241
+ 6000--6010.
242
+
243
+ \bibitem[{Wang et~al.(2018)Wang, Singh, Michael, Hill, Levy, and
244
+ Bowman}]{wang2018glue}
245
+ Alex Wang, Amapreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel~R
246
+ Bowman. 2018.
247
+ \newblock GLUE: A multi-task benchmark and analysis platform for natural
248
+ language understanding.
249
+ \newblock \emph{arXiv preprint arXiv:1804.07461}.
250
+
251
+ \bibitem[{Wenzek et~al.(2019)Wenzek, Lachaux, Conneau, Chaudhary, Guzman,
252
+ Joulin, and Grave}]{wenzek2019ccnet}
253
+ Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary,
254
+ Francisco Guzman, Armand Joulin, and Edouard Grave. 2019.
255
+ \newblock CCNet: Extracting high-quality monolingual datasets from web crawl
256
+ data.
257
+ \newblock \emph{arXiv preprint arXiv:1911.00359}.
258
+
259
+ \bibitem[{Williams et~al.(2017)Williams, Nangia, and
260
+ Bowman}]{williams2017broad}
261
+ Adina Williams, Nikita Nangia, and Samuel~R Bowman. 2017.
262
+ \newblock A broad-coverage challenge corpus for sentence understanding through
263
+ inference.
264
+ \newblock \emph{Proceedings of the 2nd Workshop on Evaluating Vector-Space
265
+ Representations for NLP}.
266
+
267
+ \bibitem[{Wu et~al.(2019)Wu, Conneau, Li, Zettlemoyer, and
268
+ Stoyanov}]{wu2019emerging}
269
+ Shijie Wu, Alexis Conneau, Haoran Li, Luke Zettlemoyer, and Veselin Stoyanov.
270
+ 2019.
271
+ \newblock Emerging cross-lingual structure in pretrained language models.
272
+ \newblock \emph{ACL}.
273
+
274
+ \bibitem[{Wu and Dredze(2019)}]{wu2019beto}
275
+ Shijie Wu and Mark Dredze. 2019.
276
+ \newblock Beto, bentz, becas: The surprising cross-lingual effectiveness of
277
+ bert.
278
+ \newblock \emph{EMNLP}.
279
+
280
+ \bibitem[{Xie et~al.(2019)Xie, Dai, Hovy, Luong, and Le}]{xie2019unsupervised}
281
+ Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, and Quoc~V Le. 2019.
282
+ \newblock Unsupervised data augmentation for consistency training.
283
+ \newblock \emph{arXiv preprint arXiv:1904.12848}.
284
+
285
+ \end{thebibliography}
references/2019.arxiv.conneau/source/XLMR Paper/xlmr.synctex ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:420af1ab9f337834c49b93240fd9062be0a9f1bd9135878e6c96a6d128aa6856
3
+ size 865236
references/2019.arxiv.conneau/source/XLMR Paper/xlmr.tex ADDED
@@ -0,0 +1,307 @@
1
+
2
+ %
3
+ % File acl2020.tex
4
+ %
5
+ %% Based on the style files for ACL 2020, which were
6
+ %% Based on the style files for ACL 2018, NAACL 2018/19, which were
7
+ %% Based on the style files for ACL-2015, with some improvements
8
+ %% taken from the NAACL-2016 style
9
+ %% Based on the style files for ACL-2014, which were, in turn,
10
+ %% based on ACL-2013, ACL-2012, ACL-2011, ACL-2010, ACL-IJCNLP-2009,
11
+ %% EACL-2009, IJCNLP-2008...
12
+ %% Based on the style files for EACL 2006 by
13
+ %%e.agirre@ehu.es or Sergi.Balari@uab.es
14
+ %% and that of ACL 08 by Joakim Nivre and Noah Smith
15
+
16
+ \documentclass[11pt,a4paper]{article}
17
+ \usepackage[hyperref]{acl2020}
18
+ \usepackage{times}
19
+ \usepackage{latexsym}
20
+ \renewcommand{\UrlFont}{\ttfamily\small}
21
+
22
+ % This is not strictly necessary, and may be commented out,
23
+ % but it will improve the layout of the manuscript,
24
+ % and will typically save some space.
25
+ \usepackage{microtype}
26
+ \usepackage{graphicx}
27
+ \usepackage{subfigure}
28
+ \usepackage{booktabs} % for professional tables
29
+ \usepackage{url}
30
+ \usepackage{times}
31
+ \usepackage{latexsym}
32
+ \usepackage{array}
33
+ \usepackage{adjustbox}
34
+ \usepackage{multirow}
35
+ % \usepackage{subcaption}
36
+ \usepackage{hyperref}
37
+ \usepackage{longtable}
38
+
39
+ \input{content/tables}
40
+
41
+
42
+ \aclfinalcopy % Uncomment this line for the final submission
43
+ \def\aclpaperid{479} % Enter the acl Paper ID here
44
+
45
+ %\setlength\titlebox{5cm}
46
+ % You can expand the titlebox if you need extra space
47
+ % to show all the authors. Please do not make the titlebox
48
+ % smaller than 5cm (the original size); we will check this
49
+ % in the camera-ready version and ask you to change it back.
50
+
51
+ \newcommand\BibTeX{B\textsc{ib}\TeX}
52
+ \usepackage{xspace}
53
+ \newcommand{\xlmr}{\textit{XLM-R}\xspace}
54
+ \newcommand{\mbert}{mBERT\xspace}
55
+ \newcommand{\XX}{\textcolor{red}{XX}\xspace}
56
+
57
+ \newcommand{\note}[3]{{\color{#2}[#1: #3]}}
58
+ \newcommand{\ves}[1]{\note{ves}{red}{#1}}
59
+ \newcommand{\luke}[1]{\note{luke}{green}{#1}}
60
+ \newcommand{\myle}[1]{\note{myle}{cyan}{#1}}
61
+ \newcommand{\paco}[1]{\note{paco}{blue}{#1}}
62
+ \newcommand{\eg}[1]{\note{edouard}{orange}{#1}}
63
+ \newcommand{\kk}[1]{\note{kartikay}{pink}{#1}}
64
+
65
+ \renewcommand{\UrlFont}{\scriptsize}
66
+ \title{Unsupervised Cross-lingual Representation Learning at Scale}
67
+
68
+ \author{Alexis Conneau\thanks{\ \ Equal contribution.} \space\space\space
69
+ Kartikay Khandelwal\footnotemark[1] \space\space\space \AND
70
+ \bf Naman Goyal \space\space\space
71
+ Vishrav Chaudhary \space\space\space
72
+ Guillaume Wenzek \space\space\space
73
+ Francisco Guzm\'an \space\space\space \AND
74
+ \bf Edouard Grave \space\space\space
75
+ Myle Ott \space\space\space
76
+ Luke Zettlemoyer \space\space\space
77
+ Veselin Stoyanov \space\space\space \\ \\ \\
78
+ \bf Facebook AI
79
+ }
80
+
81
+ \date{}
82
+
83
+ \begin{document}
84
+ \maketitle
85
+ \begin{abstract}
86
+ This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed \xlmr, significantly outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +14.6\% average accuracy on XNLI, +13\% average F1 score on MLQA, and +2.4\% F1 score on NER. \xlmr performs particularly well on low-resource languages, improving XNLI accuracy by 15.7\% for Swahili and 11.4\% for Urdu over previous XLM models. We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high- and low-resource languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling without sacrificing per-language performance; \xlmr is very competitive with strong monolingual models on the GLUE and XNLI benchmarks. We will make our code, data and models publicly available.{\let\thefootnote\relax\footnotetext{\scriptsize Correspondence to {\tt \{aconneau,kartikayk\}@fb.com}}}\footnote{\url{https://github.com/facebookresearch/(fairseq-py,pytext,xlm)}}
87
+ \end{abstract}
88
+
89
+
90
+ \section{Introduction}
91
+
92
+ The goal of this paper is to improve cross-lingual language understanding (XLU) by carefully studying the effects of training unsupervised cross-lingual representations at a very large scale.
93
+ We present \xlmr, a Transformer-based multilingual masked language model pretrained on text in 100 languages, which obtains state-of-the-art performance on cross-lingual classification, sequence labeling and question answering.
94
+
95
+ Multilingual masked language models (MLMs) like \mbert~\cite{devlin2018bert} and XLM \cite{lample2019cross} have pushed the state of the art on cross-lingual understanding tasks by jointly pretraining large Transformer models~\cite{transformer17} on many languages. These models allow for effective cross-lingual transfer, as seen in a number of benchmarks including cross-lingual natural language inference~\cite{bowman2015large,williams2017broad,conneau2018xnli}, question answering~\cite{rajpurkar-etal-2016-squad,lewis2019mlqa}, and named entity recognition~\cite{Pires2019HowMI,wu2019beto}.
96
+ However, all of these studies pre-train on Wikipedia, which provides a relatively limited scale especially for lower resource languages.
97
+
98
+
99
+ In this paper, we first present a comprehensive analysis of the trade-offs and limitations of multilingual language models at scale, inspired by recent monolingual scaling efforts~\cite{roberta2019}.
100
+ We measure the trade-off between high-resource and low-resource languages and the impact of language sampling and vocabulary size.
101
+ %By training models with an increasing number of languages,
102
+ The experiments expose a trade-off as we scale the number of languages for a fixed model capacity: adding more languages improves cross-lingual performance on low-resource languages up to a point, after which the overall performance on monolingual and cross-lingual benchmarks degrades. We refer to this trade-off as the \emph{curse of multilinguality}, and show that it can be alleviated by simply increasing model capacity.
103
+ We argue, however, that this remains an important limitation for future XLU systems which may aim to improve performance with more modest computational budgets.
104
+
105
+ Our best model, XLM-RoBERTa (\xlmr), outperforms \mbert on cross-lingual classification by up to 23\% accuracy on low-resource languages.
106
+ %like Swahili and Urdu.
107
+ It outperforms the previous state of the art by 5.1\% average accuracy on XNLI, 2.42\% average F1-score on Named Entity Recognition, and 9.1\% average F1-score on cross-lingual Question Answering. We also evaluate monolingual fine-tuning on the GLUE and XNLI benchmarks, where \xlmr obtains results competitive with state-of-the-art monolingual models, including RoBERTa \cite{roberta2019}.
108
+ These results demonstrate, for the first time, that it is possible to have a single large model for all languages, without sacrificing per-language performance.
109
+ We will make our code, models and data publicly available, with the hope that this will help research in multilingual NLP and low-resource language understanding.
110
+
111
+ \section{Related Work}
112
+ From pretrained word embeddings~\citep{mikolov2013distributed, pennington2014glove} to pretrained contextualized representations~\citep{peters2018deep,schuster2019cross} and transformer based language models~\citep{radford2018improving,devlin2018bert}, unsupervised representation learning has significantly improved the state of the art in natural language understanding. Parallel work on cross-lingual understanding~\citep{mikolov2013exploiting,schuster2019cross,lample2019cross} extends these systems to more languages and to the cross-lingual setting in which a model is learned in one language and applied in other languages.
113
+
114
+ Most recently, \citet{devlin2018bert} and \citet{lample2019cross} introduced \mbert and XLM, masked language models trained on multiple languages without any cross-lingual supervision.
115
+ \citet{lample2019cross} propose translation language modeling (TLM) as a way to leverage parallel data and obtain a new state of the art on the cross-lingual natural language inference (XNLI) benchmark~\cite{conneau2018xnli}.
116
+ They further show strong improvements on unsupervised machine translation and pretraining for sequence generation. \citet{wu2019emerging} show that monolingual BERT representations are similar across languages, explaining in part the natural emergence of multilinguality in bottleneck architectures. Separately, \citet{Pires2019HowMI} demonstrated the effectiveness of multilingual models like \mbert on sequence labeling tasks. \citet{huang2019unicoder} showed gains over XLM using cross-lingual multi-task learning, and \citet{singh2019xlda} demonstrated the effectiveness of cross-lingual data augmentation for cross-lingual NLI. However, all of this work was at a relatively modest scale, in terms of the amount of training data, compared to our approach.
117
+
118
+ \insertWikivsCC
119
+
120
+ The benefits of scaling language model pretraining by increasing the size of the model as well as the training data have been extensively studied in the literature. For the monolingual case, \citet{jozefowicz2016exploring} show how large-scale LSTM models can obtain much stronger performance on language modeling benchmarks when trained on billions of tokens.
121
122
+ GPT~\cite{radford2018improving} also highlights the importance of scaling the amount of data, and RoBERTa~\cite{roberta2019} shows that training BERT longer on more data leads to a significant boost in performance. Inspired by RoBERTa, we show that mBERT and XLM are undertuned, and that simple improvements in the learning procedure of unsupervised MLM lead to much better performance. We train on cleaned CommonCrawl data~\cite{wenzek2019ccnet}, which increases the amount of data for low-resource languages by two orders of magnitude on average. Similar data has also been shown to be effective for learning high-quality word embeddings in multiple languages~\cite{grave2018learning}.
123
+
124
+
125
+ Several efforts have trained massively multilingual machine translation models from large parallel corpora. They uncover the trade-off between high-resource and low-resource languages and the problem of capacity dilution~\citep{johnson2017google,tan2019multilingual}. The work most similar to ours is \citet{arivazhagan2019massively}, which trains a single model in 103 languages on over 25 billion parallel sentences.
126
+ \citet{siddhant2019evaluating} further analyze the representations obtained by the encoder of a massively multilingual machine translation system and show that it obtains similar results to mBERT on cross-lingual NLI.
127
128
+ Our work, in contrast, focuses on the unsupervised learning of cross-lingual representations and their transfer to discriminative tasks.
129
+
130
+
131
+ \section{Model and Data}
132
+ \label{sec:model+data}
133
+
134
+ In this section, we present the training objective, languages, and data we use. We follow the XLM approach~\cite{lample2019cross} as closely as possible, only introducing changes that improve performance at scale.
135
+
136
+ \paragraph{Masked Language Models.}
137
+ We use a Transformer model~\cite{transformer17} trained with the multilingual MLM objective~\cite{devlin2018bert,lample2019cross} using only monolingual data. We sample streams of text from each language and train the model to predict the masked tokens in the input.
138
+ We apply subword tokenization directly on raw text data using SentencePiece~\cite{kudo2018sentencepiece} with a unigram language model~\cite{kudo2018subword}. We sample batches from different languages using the same sampling distribution as \citet{lample2019cross}, but with $\alpha=0.3$. Unlike \citet{lample2019cross}, we do not use language embeddings, which allows our model to better deal with code-switching. We use a large vocabulary size of 250K with a full softmax and train two different models: \xlmr\textsubscript{Base} (L = 12, H = 768, A = 12, 270M params) and \xlmr (L = 24, H = 1024, A = 16, 550M params). For all of our ablation studies, we use a BERT\textsubscript{Base} architecture with a vocabulary of 150K tokens. Appendix~\ref{sec:appendix_B} gives more details about the architectures of the different models referenced in this paper.
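The exponentially smoothed sampling distribution described above can be sketched as follows. This is a minimal illustration of $q_i \propto p_i^{\alpha}$, where $p_i$ is language $i$'s share of the corpus in sentences; the corpus sizes are made-up toy values, not the actual CC-100 statistics.

```python
# Sketch of the exponentially smoothed language-sampling distribution:
# q_i is proportional to p_i**alpha, where p_i is language i's share of
# the corpus in sentences. Corpus sizes below are toy values.
def sampling_distribution(n_sentences, alpha=0.3):
    total = sum(n_sentences.values())
    p = {lang: n / total for lang, n in n_sentences.items()}
    z = sum(pi ** alpha for pi in p.values())
    return {lang: (pi ** alpha) / z for lang, pi in p.items()}

# English is 1000x larger than Swahili in this toy corpus.
corpus = {"en": 1_000_000, "fr": 500_000, "sw": 1_000}
q = sampling_distribution(corpus, alpha=0.3)
# Smoothing upweights the low-resource language far beyond its raw share
# (~0.07% of sentences), while high-resource languages are downweighted.
print({lang: round(qi, 3) for lang, qi in q.items()})
```

Lowering $\alpha$ flattens the distribution further, which is why smaller values favor low-resource languages in the trade-off studied in Section 5.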
139
+
140
+ \paragraph{Scaling to a hundred languages.}
141
+ \xlmr is trained on 100 languages;
142
+ we provide a full list of languages and associated statistics in Appendix~\ref{sec:appendix_A}. Figure~\ref{fig:wikivscc} specifies the ISO codes of the 88 languages shared between \xlmr and XLM-100, the model from \citet{lample2019cross} trained on Wikipedia text in 100 languages.
143
+
144
+ Compared to previous work, we replace some languages with more commonly used ones such as romanized Hindi and traditional Chinese. In our ablation studies, we always include the 7 languages for which we have classification and sequence labeling evaluation benchmarks: English, French, German, Russian, Chinese, Swahili and Urdu. We chose this set as it covers a suitable range of language families and includes low-resource languages such as Swahili and Urdu.
145
+ We also consider larger sets of 15, 30, 60 and all 100 languages. When reporting results on high-resource and low-resource languages, we refer to the average of English and French results, and the average of Swahili and Urdu results, respectively.
146
+
147
+ \paragraph{Scaling the Amount of Training Data.}
148
+ Following \citet{wenzek2019ccnet}\footnote{\url{https://github.com/facebookresearch/cc_net}}, we build a clean CommonCrawl corpus in 100 languages. We use an internal language identification model in combination with the one from fastText~\cite{joulin2017bag}. We train a language model in each language and use it to filter documents as described in \citet{wenzek2019ccnet}. We consider one CommonCrawl dump for English and twelve dumps for all other languages, which significantly increases dataset sizes, especially for low-resource languages like Burmese and Swahili.
149
+
150
+ Figure~\ref{fig:wikivscc} shows the difference in size between the Wikipedia corpora used by mBERT and XLM-100 and the CommonCrawl corpus we use. As we show in Section~\ref{sec:multimono}, monolingual Wikipedia corpora are too small to enable unsupervised representation learning. Based on our experiments, we found that a few hundred MiB of text data is usually the minimum needed to learn a BERT model.
151
+
152
+ \section{Evaluation}
153
+ We consider four evaluation benchmarks.
154
+ For cross-lingual understanding, we use cross-lingual natural language inference, named entity recognition, and question answering. We use the GLUE benchmark to evaluate the English performance of \xlmr and compare it to other state-of-the-art models.
155
+
156
+ \paragraph{Cross-lingual Natural Language Inference (XNLI).}
157
+ The XNLI dataset comes with ground-truth dev and test sets in 15 languages, and a ground-truth English training set. The training set has been machine-translated to the remaining 14 languages, providing synthetic training data for these languages as well. We evaluate our model on cross-lingual transfer from English to other languages. We also consider three machine translation baselines: (i) \textit{translate-test}: dev and test sets are machine-translated to English and a single English model is used; (ii) \textit{translate-train} (per-language): the English training set is machine-translated to each language and we fine-tune a multilingual model on each training set; (iii) \textit{translate-train-all} (multi-language): we fine-tune a multilingual model on the concatenation of all training sets from translate-train. For the translations, we use the official data provided by the XNLI project.
158
159
+
160
+ \paragraph{Named Entity Recognition.}
161
+ % WikiAnn http://nlp.cs.rpi.edu/wikiann/
162
+ For NER, we consider the CoNLL-2002~\cite{sang2002introduction} and CoNLL-2003~\cite{tjong2003introduction} datasets in English, Dutch, Spanish and German. We fine-tune multilingual models either (1) on the English set to evaluate cross-lingual transfer, (2) on each set to evaluate per-language performance, or (3) on all sets to evaluate multilingual learning. We report the F1 score, and compare to baselines from \citet{lample-etal-2016-neural} and \citet{akbik2018coling}.
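For reference, CoNLL NER is scored with entity-level F1: a predicted entity counts as correct only when both its span and its type match a gold entity exactly. A minimal sketch of that metric over (start, end, type) tuples follows; the spans are toy examples, not data from the benchmark.

```python
# Entity-level F1 as used for CoNLL NER: exact span-and-type match,
# micro-averaged over entities. Toy example spans below.
def entity_f1(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                       # exact span + type matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [(0, 2, "PER"), (5, 6, "LOC")]
pred = [(0, 2, "PER"), (5, 6, "ORG")]           # right span, wrong type
assert entity_f1(gold, pred) == 0.5
```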
163
+
164
+ \paragraph{Cross-lingual Question Answering.}
165
+ We use the MLQA benchmark from \citet{lewis2019mlqa}, which extends the English SQuAD benchmark to Spanish, German, Arabic, Hindi, Vietnamese and Chinese. We report the F1 score as well as the exact match (EM) score for cross-lingual transfer from English.
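MLQA inherits SQuAD-style scoring: exact match (EM) on the answer string and token-overlap F1. The sketch below is a simplified version of those metrics; the official evaluation script additionally strips punctuation and articles before comparing.

```python
# Simplified SQuAD/MLQA answer metrics: exact match and token-overlap F1.
# The official script also normalizes punctuation and articles.
from collections import Counter

def exact_match(pred, gold):
    return float(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    p, g = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(p) & Counter(g)).values())  # shared tokens, with multiplicity
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

assert exact_match("Paris", " paris ") == 1.0
assert abs(token_f1("the cat", "the black cat") - 0.8) < 1e-9
```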
166
+
167
+ \paragraph{GLUE Benchmark.}
168
+ Finally, we evaluate the English performance of our model on the GLUE benchmark~\cite{wang2018glue}, which gathers multiple classification tasks such as MNLI~\cite{williams2017broad}, SST-2~\cite{socher2013recursive}, and QNLI~\cite{rajpurkar2018know}. We use BERT\textsubscript{Large} and RoBERTa as baselines.
169
+
170
+ \section{Analysis and Results}
171
+ \label{sec:analysis}
172
+
173
+ In this section, we perform a comprehensive analysis of multilingual masked language models. We conduct most of the analysis on XNLI, which we found to be representative of our findings on other tasks. We then present the results of \xlmr on cross-lingual understanding and GLUE. Finally, we compare multilingual and monolingual models, and present results on low-resource languages.
174
+
175
+ \subsection{Improving and Understanding Multilingual Masked Language Models}
176
+ % prior analysis necessary to build \xlmr
177
+ \insertAblationone
178
+ \insertAblationtwo
179
+
180
+ Much of the work done on understanding the cross-lingual effectiveness of \mbert or XLM~\cite{Pires2019HowMI,wu2019beto,lewis2019mlqa} has focused on analyzing the performance of fixed pretrained models on downstream tasks. In this section, we present a comprehensive study of different factors that are important to \textit{pretraining} large scale multilingual models. We highlight the trade-offs and limitations of these models as we scale to one hundred languages.
181
+
182
+ \paragraph{Transfer-dilution Trade-off and Curse of Multilinguality.}
183
+ Model capacity (i.e., the number of parameters in the model) is constrained due to practical considerations such as memory and speed during training and inference. For a fixed-size model, the per-language capacity decreases as we increase the number of languages. While low-resource language performance can be improved by adding similar higher-resource languages during pretraining, the overall downstream performance suffers from this capacity dilution~\cite{arivazhagan2019massively}. Positive transfer and capacity dilution have to be traded off against each other.
184
+
185
+ We illustrate this trade-off in Figure~\ref{fig:transfer_dilution}, which shows XNLI performance vs the number of languages the model is pretrained on. Initially, as we go from 7 to 15 languages, the model is able to take advantage of positive transfer which improves performance, especially on low resource languages. Beyond this point the {\em curse of multilinguality}
186
+ kicks in and degrades performance across all languages. Specifically, the overall XNLI accuracy decreases from 71.8\% to 67.7\% as we go from XLM-7 to XLM-100. The same trend can be observed for models trained on the larger CommonCrawl Corpus.
187
+
188
+ The issue is even more prominent when the capacity of the model is small. To show this, we pretrain models on Wikipedia Data in 7, 30 and 100 languages. As we add more languages, we make the Transformer wider by increasing the hidden size from 768 to 960 to 1152. In Figure~\ref{fig:capacity}, we show that the added capacity allows XLM-30 to be on par with XLM-7, thus overcoming the curse of multilinguality. The added capacity for XLM-100, however, is not enough
189
+ and it still lags behind due to higher vocabulary dilution (recall from Section~\ref{sec:model+data} that we used a fixed vocabulary size of 150K for all models).
190
+
191
+ \paragraph{High-resource vs Low-resource Trade-off.}
192
+ The allocation of the model capacity across languages is controlled by several parameters: the training set size, the size of the shared subword vocabulary, and the rate at which we sample training examples from each language. We study the effect of sampling on the performance of high-resource (English and French) and low-resource (Swahili and Urdu) languages for an XLM-100 model trained on Wikipedia (we observe a similar trend for the construction of the subword vocabulary). Specifically, we investigate the impact of varying the $\alpha$ parameter, which controls the exponential smoothing of the language sampling rate. Similar to \citet{lample2019cross}, we use a sampling rate proportional to the number of sentences in each corpus. Models trained with higher values of $\alpha$ see batches of high-resource languages more often.
193
+ Figure~\ref{fig:alpha} shows that the higher the value of $\alpha$, the better the performance on high-resource languages, and vice-versa. When considering overall performance, we found $0.3$ to be an optimal value for $\alpha$, and use this for \xlmr.
194
+
195
+ \paragraph{Importance of Capacity and Vocabulary.}
196
+ In previous sections and in Figure~\ref{fig:capacity}, we showed the importance of scaling the model size as we increase the number of languages. Similar to the overall model size, we argue that scaling the size of the shared vocabulary (the vocabulary capacity) can improve the performance of multilingual models on downstream tasks. To illustrate this effect, we train XLM-100 models on Wikipedia data with different vocabulary sizes. We keep the overall number of parameters constant by adjusting the width of the Transformer. Figure~\ref{fig:vocab} shows that even with a fixed capacity, we observe a 2.8\% increase in XNLI average accuracy as we increase the vocabulary size from 32K to 256K. This suggests that multilingual models can benefit from allocating a higher proportion of the total number of parameters to the embedding layer, even though this reduces the size of the Transformer.
197
+ %With bigger models, we believe that using a vocabulary of up to 2 million tokens with an adaptive softmax~\cite{grave2017efficient,baevski2018adaptive} should improve performance even further, but we leave this exploration to future work.
198
+ For simplicity, and given the softmax computational constraints, we use a vocabulary of 250K tokens for \xlmr.
199
+
200
+ We further illustrate the importance of this parameter by training three models with the same Transformer architecture (BERT\textsubscript{Base}) but with different vocabulary sizes: 128K, 256K and 512K. We observe more than 3\% gains in overall accuracy on XNLI by simply increasing the vocabulary size from 128K to 512K.
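The vocabulary-capacity trade-off above is easy to quantify: with a fixed parameter budget, a larger shared vocabulary moves parameters from the Transformer body into the embedding matrix. The sketch below is back-of-the-envelope arithmetic; the 250K-vocabulary, hidden-size-1024, 550M-parameter figures come from the model description in Section 3, while the BERT-Base width of 768 is used purely for illustration.

```python
# Back-of-the-envelope view of "vocabulary capacity": at a fixed total budget,
# a bigger vocabulary means a bigger embedding matrix and a narrower body.
def embedding_params(vocab_size, hidden):
    # Tied input/output embeddings: one vocab_size x hidden matrix.
    return vocab_size * hidden

hidden = 768  # BERT-Base width, for illustration
for vocab in (32_000, 128_000, 256_000):
    print(f"vocab={vocab:>7,}: {embedding_params(vocab, hidden) / 1e6:6.1f}M embedding params")

# XLM-R: 250K vocabulary at hidden size 1024, out of ~550M total parameters.
share = embedding_params(250_000, 1024) / 550e6
print(f"embedding share of XLM-R parameters: {share:.0%}")
```

At 250K tokens the embeddings alone account for roughly 256M parameters, nearly half the 550M total, which is why the softmax cost caps the vocabulary size in practice.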
201
+
202
+ \paragraph{Larger-scale Datasets and Training.}
203
+ As shown in Figure~\ref{fig:wikivscc}, the CommonCrawl Corpus that we collected has significantly more monolingual data than the previously used Wikipedia corpora. Figure~\ref{fig:curse} shows that for the same BERT\textsubscript{Base} architecture, all models trained on CommonCrawl obtain significantly better performance.
204
+
205
+ Apart from scaling the training data, \citet{roberta2019} also showed the benefits of training MLMs longer. In our experiments, we observed similar effects of large-scale training, such as increasing batch size (see Figure~\ref{fig:batch}) and training time, on model performance. Specifically, we found that using validation perplexity as a stopping criterion for pretraining caused the multilingual MLM in \citet{lample2019cross} to be under-tuned. In our experience, performance on downstream tasks continues to improve even after validation perplexity has plateaued. Combining this observation with our implementation of the unsupervised XLM-MLM objective, we were able to improve the performance of \citet{lample2019cross} from 71.3\% to more than 75\% average accuracy on XNLI, which was on par with their supervised translation language modeling (TLM) objective. Based on these results, and given our focus on unsupervised learning, we decided not to use the supervised TLM objective for training our models.
206
+
207
+
208
+ \paragraph{Simplifying Multilingual Tokenization with SentencePiece.}
209
+ The different language-specific tokenization tools
210
+ used by mBERT and XLM-100 make these models more difficult to use on raw text. Instead, we train a SentencePiece model (SPM) and apply it directly on raw text data for all languages. We did not observe any loss in performance for models trained with SPM compared to models trained with language-specific preprocessing and byte-pair encoding (see Figure~\ref{fig:batch}), and hence use SPM for \xlmr.
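The unigram model behind SentencePiece scores a segmentation by the product of its piece probabilities and picks the best one by dynamic programming. The toy sketch below illustrates that search only; the real library learns the vocabulary and probabilities from data, and the hand-picked pieces and probabilities here are purely hypothetical ("▁" marks a word boundary, as in SentencePiece).

```python
import math

# Toy Viterbi segmentation in the spirit of SentencePiece's unigram LM.
# The vocabulary and probabilities are hand-picked for illustration only.
def segment(text, piece_probs):
    n = len(text)
    best = [-math.inf] * (n + 1)   # best log-prob of a segmentation of text[:i]
    back = [0] * (n + 1)           # split point achieving best[i]
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - 10), i):   # cap piece length at 10 chars
            piece = text[j:i]
            if piece in piece_probs and best[j] + math.log(piece_probs[piece]) > best[i]:
                best[i] = best[j] + math.log(piece_probs[piece])
                back[i] = j
    if best[n] == -math.inf:
        return None                # no segmentation with this vocabulary
    pieces, i = [], n
    while i > 0:
        pieces.append(text[back[i]:i])
        i = back[i]
    return pieces[::-1]

vocab = {"\u2581un": 0.05, "token": 0.02, "iz": 0.01, "ized": 0.008, "ed": 0.04}
print(segment("\u2581untokenized", vocab))  # highest-probability split
```

Because the model operates on raw character streams, the same procedure applies unchanged across all 100 languages, which is the simplification the paragraph above refers to.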
211
+
212
+ \subsection{Cross-lingual Understanding Results}
213
+ Based on these results, we adapt the setting of \citet{lample2019cross} and use a large Transformer model with 24 layers and 1024 hidden states, with a 250K vocabulary. We use the multilingual MLM loss and train our \xlmr model for 1.5 million updates on five hundred 32GB Nvidia V100 GPUs with a batch size of 8192. We leverage the SPM-preprocessed text data from CommonCrawl in 100 languages and sample languages with $\alpha=0.3$. In this section, we show that it outperforms all previous techniques on cross-lingual benchmarks while achieving performance on par with RoBERTa on the GLUE benchmark.
214
+
215
+
216
+ \insertXNLItable
217
+
218
+ \paragraph{XNLI.}
219
+ Table~\ref{tab:xnli} shows XNLI results and adds some additional details: (i) the number of models the approach induces (\#M), (ii) the data on which the model was trained (D), and (iii) the number of languages the model was pretrained on (\#lg). As we show in our results, these parameters significantly impact performance. Column \#M specifies whether model selection was done separately on the dev set of each language ($N$ models) or on the joint dev set of all the languages (single model). We observe a 0.6\% decrease in overall accuracy when we go from $N$ models to a single model (from 71.3 to 70.7). We encourage the community to adopt this setting. For cross-lingual transfer, while this approach is not fully zero-shot transfer, we argue that in real applications a small amount of supervised data is often available for validation in each language.
220
+
221
+ \xlmr sets a new state of the art on XNLI.
222
+ On cross-lingual transfer, \xlmr obtains 80.9\% accuracy, outperforming the XLM-100 and \mbert open-source models by 10.2\% and 14.6\% average accuracy. On the low-resource Swahili and Urdu languages, \xlmr outperforms XLM-100 by 15.7\% and 11.4\%, and \mbert by 23.5\% and 15.8\%. While \xlmr handles 100 languages, we also show that it outperforms the previous state of the art, Unicoder~\citep{huang2019unicoder} and XLM (MLM+TLM), which handle only 15 languages, by 5.5\% and 5.8\% average accuracy respectively. Using the multilingual training of translate-train-all, \xlmr further improves performance and reaches 83.6\% accuracy, a new overall state of the art for XNLI, outperforming Unicoder by 5.1\%. Multilingual training is similar to practical applications where training sets are available in various languages for the same task. In the case of XNLI, datasets have been translated, and translate-train-all can be seen as some form of cross-lingual data augmentation~\cite{singh2019xlda}, similar to back-translation~\cite{xie2019unsupervised}.
223
+
224
+ \insertNER
225
+ \paragraph{Named Entity Recognition.}
226
+ In Table~\ref{tab:ner}, we report results of \xlmr and \mbert on CoNLL-2002 and CoNLL-2003. We consider the LSTM + CRF approach from \citet{lample-etal-2016-neural} and the Flair model from \citet{akbik2018coling} as baselines. We evaluate the performance of the model on each of the target languages in three different settings: (i) train on English data only (en); (ii) train on data in the target language (each); (iii) train on data in all languages (all). Results of \mbert are reported from \citet{wu2019beto}. Note that we do not use a linear-chain CRF on top of \xlmr and \mbert representations, which gives an advantage to \citet{akbik2018coling}. Without the CRF, our \xlmr model still performs on par with the state of the art, outperforming \citet{akbik2018coling} on Dutch by $2.09$ points. On this task, \xlmr also outperforms \mbert by 2.42 F1 on average for cross-lingual transfer, and by 1.86 F1 when trained on each language. Training on all languages leads to an average F1 score of 89.43\%, outperforming the cross-lingual transfer approach by 8.49\%.
227
+
228
+ \paragraph{Question Answering.}
229
+ We also obtain new state of the art results on the MLQA cross-lingual question answering benchmark, introduced by \citet{lewis2019mlqa}. We follow their procedure by training on the English training data and evaluating on the 7 languages of the dataset.
230
+ We report results in Table~\ref{tab:mlqa}.
231
+ \xlmr obtains F1 and accuracy scores of 70.7\% and 52.7\% while the previous state of the art was 61.6\% and 43.5\%. \xlmr also outperforms \mbert by 13.0\% F1-score and 11.1\% accuracy. It even outperforms BERT-Large on English, confirming its strong monolingual performance.
232
+
233
+ \insertMLQA
234
+
235
+ \subsection{Multilingual versus Monolingual}
236
+ \label{sec:multimono}
237
+ In this section, we present results of multilingual XLM models against monolingual BERT models.
238
+
239
+ \paragraph{GLUE: \xlmr versus RoBERTa.}
240
+ Our goal is to obtain a multilingual model with strong performance on both cross-lingual understanding tasks and natural language understanding tasks for each language. To that end, we evaluate \xlmr on the GLUE benchmark. We show in Table~\ref{tab:glue} that \xlmr obtains better average dev performance than BERT\textsubscript{Large} by 1.6\% and reaches performance on par with XLNet\textsubscript{Large}. The RoBERTa model outperforms \xlmr by only 1.0\% on average. We believe future work can reduce this gap even further by alleviating the curse of multilinguality and vocabulary dilution. These results demonstrate the possibility of learning one model for many languages while maintaining strong performance on per-language downstream tasks.
241
+
242
+ \insertGlue
243
+
244
+ \paragraph{XNLI: XLM versus BERT.}
245
+ A recurrent criticism against multilingual models is that they obtain worse performance than their monolingual counterparts. In addition to the comparison of \xlmr and RoBERTa, we provide the first comprehensive study to assess this claim on the XNLI benchmark. We extend our comparison between multilingual XLM models and monolingual BERT models on 7 languages and compare performance in Table~\ref{tab:multimono}. We train 14 monolingual BERT models on Wikipedia and CommonCrawl (capped at 60 GiB),
246
+ %\footnote{For simplicity, we use a reduced version of our corpus by capping the size of each monolingual dataset to 60 GiB.}
247
+ and two XLM-7 models. We increase the vocabulary size of the multilingual model for a better comparison.
248
+ % To our surprise - and backed by further study on internal benchmarks -
249
+ We found that \textit{multilingual models can outperform their monolingual BERT counterparts}. Specifically, in Table~\ref{tab:multimono}, we show that for cross-lingual transfer, monolingual baselines outperform XLM-7 for both Wikipedia and CC by 1.6\% and 1.3\% average accuracy. However, by making use of multilingual training (translate-train-all) and leveraging training sets coming from multiple languages, XLM-7 can outperform the BERT models: our XLM-7 trained on CC obtains 80.0\% average accuracy on the 7 languages, while the average performance of BERT models trained on CC is 77.5\%. This is a surprising result that shows that the capacity of multilingual models to leverage training data coming from multiple languages for a particular task can overcome the capacity dilution problem to obtain better overall performance.
250
+
251
+
252
+ \insertMultiMono
253
+
254
+ \subsection{Representation Learning for Low-resource Languages}
255
+ We observed in Table~\ref{tab:multimono} that pretraining on Wikipedia for Swahili and Urdu performed similarly to a randomly initialized model, most likely due to the small size of the data for these languages. On the other hand, pretraining on CC improved performance by up to 10 points. This confirms our assumption that mBERT and XLM-100 rely heavily on cross-lingual transfer but do not model the low-resource languages as well as \xlmr. Specifically, in the translate-train-all setting, we observe that the biggest gains for XLM models trained on CC, compared to their Wikipedia counterparts, are on low-resource languages: 7\% and 4.8\% improvements on Swahili and Urdu, respectively.
256
+
257
+ \section{Conclusion}
258
+ In this work, we introduced \xlmr, our new state-of-the-art multilingual masked language model trained on 2.5 TB of newly created, clean CommonCrawl data in 100 languages. We showed that it provides strong gains over previous multilingual models like \mbert and XLM on classification, sequence labeling and question answering. We exposed the limitations of multilingual MLMs, in particular by uncovering the high-resource versus low-resource trade-off, the curse of multilinguality and the importance of key hyperparameters. We also showed the surprising effectiveness of multilingual models over monolingual models, and strong improvements on low-resource languages.
259
+ % \section*{Acknowledgements}
260
+
261
+
262
+ \bibliography{acl2020}
263
+ \bibliographystyle{acl_natbib}
264
+
265
+
266
+ \newpage
272
+ \clearpage
273
+ \appendix
274
+ \onecolumn
275
+ \section*{Appendix}
276
+ \section{Languages and statistics for CC-100 used by \xlmr}
277
+ In this section, we present the list of languages in the CC-100 corpus we created for training \xlmr. We also report statistics such as the number of tokens and the size of each monolingual corpus.
278
+ \label{sec:appendix_A}
279
+ \insertDataStatistics
280
+
281
+ % \newpage
282
+ \section{Model Architectures and Sizes}
283
+ As we showed in Section~\ref{sec:analysis}, capacity is an important parameter for learning strong cross-lingual representations. In the table below, we list multiple monolingual and multilingual models used by the research community and summarize their architectures and total number of parameters.
284
+ \label{sec:appendix_B}
285
+
286
+ \insertParameters
287
+
288
305
+
306
+
307
+ \end{document}
references/2019.arxiv.conneau/source/acl2020.bib ADDED
@@ -0,0 +1,739 @@
+ @inproceedings{koehn2007moses,
+ title={Moses: Open source toolkit for statistical machine translation},
+ author={Koehn, Philipp and Hoang, Hieu and Birch, Alexandra and Callison-Burch, Chris and Federico, Marcello and Bertoldi, Nicola and Cowan, Brooke and Shen, Wade and Moran, Christine and Zens, Richard and others},
+ booktitle={Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions},
+ pages={177--180},
+ year={2007},
+ organization={Association for Computational Linguistics}
+ }
+
+ @article{xie2019unsupervised,
+ title={Unsupervised data augmentation for consistency training},
+ author={Xie, Qizhe and Dai, Zihang and Hovy, Eduard and Luong, Minh-Thang and Le, Quoc V},
+ journal={arXiv preprint arXiv:1904.12848},
+ year={2019}
+ }
+
+ @article{baevski2018adaptive,
+ title={Adaptive input representations for neural language modeling},
+ author={Baevski, Alexei and Auli, Michael},
+ journal={arXiv preprint arXiv:1809.10853},
+ year={2018}
+ }
+
+ @article{wu2019emerging,
+ title={Emerging Cross-lingual Structure in Pretrained Language Models},
+ author={Wu, Shijie and Conneau, Alexis and Li, Haoran and Zettlemoyer, Luke and Stoyanov, Veselin},
+ journal={ACL},
+ year={2019}
+ }
+
+ @inproceedings{grave2017efficient,
+ title={Efficient softmax approximation for GPUs},
+ author={Grave, Edouard and Joulin, Armand and Ciss{\'e}, Moustapha and J{\'e}gou, Herv{\'e} and others},
+ booktitle={Proceedings of the 34th International Conference on Machine Learning-Volume 70},
+ pages={1302--1310},
+ year={2017},
+ organization={JMLR.org}
+ }
+
+ @article{sang2002introduction,
+ title={Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition},
+ author={Tjong Kim Sang, Erik F.},
+ journal={CoNLL},
+ year={2002}
+ }
+
+ @article{singh2019xlda,
+ title={XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering},
+ author={Singh, Jasdeep and McCann, Bryan and Keskar, Nitish Shirish and Xiong, Caiming and Socher, Richard},
+ journal={arXiv preprint arXiv:1905.11471},
+ year={2019}
+ }
+
+ @inproceedings{tjong2003introduction,
+ title={Introduction to the CoNLL-2003 shared task: language-independent named entity recognition},
+ author={Tjong Kim Sang, Erik F and De Meulder, Fien},
+ booktitle={CoNLL},
+ pages={142--147},
+ year={2003},
+ organization={Association for Computational Linguistics}
+ }
+
+ @misc{ud-v2.3,
+ title = {Universal Dependencies 2.3},
+ author = {Nivre, Joakim and others},
+ url = {http://hdl.handle.net/11234/1-2895},
+ note = {{LINDAT}/{CLARIN} digital library at the Institute of Formal and Applied Linguistics ({{\'U}FAL}), Faculty of Mathematics and Physics, Charles University},
+ copyright = {Licence Universal Dependencies v2.3},
+ year = {2018}
+ }
+
+ @article{huang2019unicoder,
+ title={Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks},
+ author={Huang, Haoyang and Liang, Yaobo and Duan, Nan and Gong, Ming and Shou, Linjun and Jiang, Daxin and Zhou, Ming},
+ journal={ACL},
+ year={2019}
+ }
+
+ @article{kingma2014adam,
+ title={Adam: A method for stochastic optimization},
+ author={Kingma, Diederik P and Ba, Jimmy},
+ journal={arXiv preprint arXiv:1412.6980},
+ year={2014}
+ }
+
+ @article{bojanowski2017enriching,
+ title={Enriching word vectors with subword information},
+ author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
+ journal={TACL},
+ volume={5},
+ pages={135--146},
+ year={2017},
+ publisher={MIT Press}
+ }
+
+ @article{werbos1990backpropagation,
+ title={Backpropagation through time: what it does and how to do it},
+ author={Werbos, Paul J},
+ journal={Proceedings of the IEEE},
+ volume={78},
+ number={10},
+ pages={1550--1560},
+ year={1990},
+ publisher={IEEE}
+ }
+
+ @article{hochreiter1997long,
+ title={Long short-term memory},
+ author={Hochreiter, Sepp and Schmidhuber, J{\"u}rgen},
+ journal={Neural computation},
+ volume={9},
+ number={8},
+ pages={1735--1780},
+ year={1997},
+ publisher={MIT Press}
+ }
+
+ @article{al2018character,
+ title={Character-level language modeling with deeper self-attention},
+ author={Al-Rfou, Rami and Choe, Dokook and Constant, Noah and Guo, Mandy and Jones, Llion},
+ journal={arXiv preprint arXiv:1808.04444},
+ year={2018}
+ }
+
+ @misc{dai2019transformerxl,
+ title={Transformer-{XL}: Language Modeling with Longer-Term Dependency},
+ author={Zihang Dai and Zhilin Yang and Yiming Yang and William W. Cohen and Jaime Carbonell and Quoc V. Le and Ruslan Salakhutdinov},
+ year={2019},
+ url={https://openreview.net/forum?id=HJePno0cYm}
+ }
+
+ @article{jozefowicz2016exploring,
+ title={Exploring the limits of language modeling},
+ author={Jozefowicz, Rafal and Vinyals, Oriol and Schuster, Mike and Shazeer, Noam and Wu, Yonghui},
+ journal={arXiv preprint arXiv:1602.02410},
+ year={2016}
+ }
+
+ @inproceedings{mikolov2010recurrent,
+ title={Recurrent neural network based language model},
+ author={Mikolov, Tom{\'a}{\v{s}} and Karafi{\'a}t, Martin and Burget, Luk{\'a}{\v{s}} and {\v{C}}ernock{\`y}, Jan and Khudanpur, Sanjeev},
+ booktitle={Eleventh Annual Conference of the International Speech Communication Association},
+ year={2010}
+ }
+
+ @article{gehring2017convolutional,
+ title={Convolutional sequence to sequence learning},
+ author={Gehring, Jonas and Auli, Michael and Grangier, David and Yarats, Denis and Dauphin, Yann N},
+ journal={arXiv preprint arXiv:1705.03122},
+ year={2017}
+ }
+
+ @article{sennrich2016edinburgh,
+ title={Edinburgh neural machine translation systems for WMT 16},
+ author={Sennrich, Rico and Haddow, Barry and Birch, Alexandra},
+ journal={arXiv preprint arXiv:1606.02891},
+ year={2016}
+ }
+
+ @inproceedings{howard2018universal,
+ title={Universal language model fine-tuning for text classification},
+ author={Howard, Jeremy and Ruder, Sebastian},
+ booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+ volume={1},
+ pages={328--339},
+ year={2018}
+ }
+
+ @inproceedings{unsupNMTartetxe,
+ title = {Unsupervised neural machine translation},
+ author = {Mikel Artetxe and Gorka Labaka and Eneko Agirre and Kyunghyun Cho},
+ booktitle = {International Conference on Learning Representations (ICLR)},
+ year = {2018}
+ }
+
+ @inproceedings{artetxe2017learning,
+ title={Learning bilingual word embeddings with (almost) no bilingual data},
+ author={Artetxe, Mikel and Labaka, Gorka and Agirre, Eneko},
+ booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+ volume={1},
+ pages={451--462},
+ year={2017}
+ }
+
+ @inproceedings{socher2013recursive,
+ title={Recursive deep models for semantic compositionality over a sentiment treebank},
+ author={Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D and Ng, Andrew and Potts, Christopher},
+ booktitle={EMNLP},
+ pages={1631--1642},
+ year={2013}
+ }
+
+ @inproceedings{bowman2015large,
+ title={A large annotated corpus for learning natural language inference},
+ author={Bowman, Samuel R. and Angeli, Gabor and Potts, Christopher and Manning, Christopher D.},
+ booktitle={EMNLP},
+ year={2015}
+ }
+
+ @inproceedings{multinli:2017,
+ title = {A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference},
+ author = {Adina Williams and Nikita Nangia and Samuel R. Bowman},
+ booktitle = {NAACL},
+ year = {2017}
+ }
+
+ @article{paszke2017automatic,
+ title={Automatic differentiation in {PyTorch}},
+ author={Paszke, Adam and Gross, Sam and Chintala, Soumith and Chanan, Gregory and Yang, Edward and DeVito, Zachary and Lin, Zeming and Desmaison, Alban and Antiga, Luca and Lerer, Adam},
+ journal={NIPS 2017 Autodiff Workshop},
+ year={2017}
+ }
+
+ @inproceedings{conneau2018craminto,
+ title={What you can cram into a single vector: Probing sentence embeddings for linguistic properties},
+ author={Conneau, Alexis and Kruszewski, German and Lample, Guillaume and Barrault, Lo{\"\i}c and Baroni, Marco},
+ booktitle = {ACL},
+ year={2018}
+ }
+
+ @inproceedings{Conneau:2018:iclr_muse,
+ title={Word Translation without Parallel Data},
+ author={Alexis Conneau and Guillaume Lample and {Marc'Aurelio} Ranzato and Ludovic Denoyer and Herv{\'e} J{\'e}gou},
+ booktitle = {ICLR},
+ year={2018}
+ }
+
+ @article{johnson2017google,
+ title={Google's multilingual neural machine translation system: Enabling zero-shot translation},
+ author={Johnson, Melvin and Schuster, Mike and Le, Quoc V and Krikun, Maxim and Wu, Yonghui and Chen, Zhifeng and Thorat, Nikhil and Vi{\'e}gas, Fernanda and Wattenberg, Martin and Corrado, Greg and others},
+ journal={TACL},
+ volume={5},
+ pages={339--351},
+ year={2017},
+ publisher={MIT Press}
+ }
+
+ @article{radford2019language,
+ title={Language models are unsupervised multitask learners},
+ author={Radford, Alec and Wu, Jeffrey and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
+ journal={OpenAI Blog},
+ volume={1},
+ number={8},
+ year={2019}
+ }
+
+ @inproceedings{unsupNMTlample,
+ title = {Unsupervised machine translation using monolingual corpora only},
+ author = {Lample, Guillaume and Conneau, Alexis and Denoyer, Ludovic and Ranzato, Marc'Aurelio},
+ booktitle = {ICLR},
+ year = {2018}
+ }
+
+ @inproceedings{lample2018phrase,
+ title={Phrase-Based \& Neural Unsupervised Machine Translation},
+ author={Lample, Guillaume and Ott, Myle and Conneau, Alexis and Denoyer, Ludovic and Ranzato, Marc'Aurelio},
+ booktitle={EMNLP},
+ year={2018}
+ }
+
+ @article{hendrycks2016bridging,
+ title={Bridging nonlinearities and stochastic regularizers with Gaussian error linear units},
+ author={Hendrycks, Dan and Gimpel, Kevin},
+ journal={arXiv preprint arXiv:1606.08415},
+ year={2016}
+ }
+
+ @inproceedings{chang2008optimizing,
+ title={Optimizing Chinese word segmentation for machine translation performance},
+ author={Chang, Pi-Chuan and Galley, Michel and Manning, Christopher D},
+ booktitle={Proceedings of the third workshop on statistical machine translation},
+ pages={224--232},
+ year={2008}
+ }
+
+ @inproceedings{rajpurkar-etal-2016-squad,
+ title = "{SQ}u{AD}: 100,000+ Questions for Machine Comprehension of Text",
+ author = "Rajpurkar, Pranav and
+ Zhang, Jian and
+ Lopyrev, Konstantin and
+ Liang, Percy",
+ booktitle = "EMNLP",
+ month = nov,
+ year = "2016",
+ address = "Austin, Texas",
+ publisher = "Association for Computational Linguistics",
+ url = "https://www.aclweb.org/anthology/D16-1264",
+ doi = "10.18653/v1/D16-1264",
+ pages = "2383--2392",
+ }
+
+ @article{lewis2019mlqa,
+ title={MLQA: Evaluating Cross-lingual Extractive Question Answering},
+ author={Lewis, Patrick and O{\u{g}}uz, Barlas and Rinott, Ruty and Riedel, Sebastian and Schwenk, Holger},
+ journal={arXiv preprint arXiv:1910.07475},
+ year={2019}
+ }
+
+ @inproceedings{sennrich2015neural,
+ title={Neural machine translation of rare words with subword units},
+ author={Sennrich, Rico and Haddow, Barry and Birch, Alexandra},
+ booktitle={Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics},
+ pages={1715--1725},
+ year={2016}
+ }
+
+ @article{eriguchi2018zero,
+ title={Zero-shot cross-lingual classification using multilingual neural machine translation},
+ author={Eriguchi, Akiko and Johnson, Melvin and Firat, Orhan and Kazawa, Hideto and Macherey, Wolfgang},
+ journal={arXiv preprint arXiv:1809.04686},
+ year={2018}
+ }
+
+ @article{smith2017offline,
+ title={Offline bilingual word vectors, orthogonal transformations and the inverted softmax},
+ author={Smith, Samuel L and Turban, David HP and Hamblin, Steven and Hammerla, Nils Y},
+ journal={International Conference on Learning Representations},
+ year={2017}
+ }
+
+ @article{artetxe2016learning,
+ title={Learning principled bilingual mappings of word embeddings while preserving monolingual invariance},
+ author={Artetxe, Mikel and Labaka, Gorka and Agirre, Eneko},
+ journal={Proceedings of EMNLP},
+ year={2016}
+ }
+
+ @article{ammar2016massively,
+ title={Massively multilingual word embeddings},
+ author={Ammar, Waleed and Mulcaire, George and Tsvetkov, Yulia and Lample, Guillaume and Dyer, Chris and Smith, Noah A},
+ journal={arXiv preprint arXiv:1602.01925},
+ year={2016}
+ }
+
+ @article{marcobaroni2015hubness,
+ title={Hubness and pollution: Delving into cross-space mapping for zero-shot learning},
+ author={Lazaridou, Angeliki and Dinu, Georgiana and Baroni, Marco},
+ journal={Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics},
+ year={2015}
+ }
+
+ @article{xing2015normalized,
+ title={Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation},
+ author={Xing, Chao and Wang, Dong and Liu, Chao and Lin, Yiye},
+ journal={Proceedings of NAACL},
+ year={2015}
+ }
+
+ @article{faruqui2014improving,
+ title={Improving Vector Space Word Representations Using Multilingual Correlation},
+ author={Faruqui, Manaal and Dyer, Chris},
+ journal={Proceedings of EACL},
+ year={2014}
+ }
+
+ @article{taylor1953cloze,
+ title={``Cloze procedure'': A new tool for measuring readability},
+ author={Taylor, Wilson L},
+ journal={Journalism Bulletin},
+ volume={30},
+ number={4},
+ pages={415--433},
+ year={1953},
+ publisher={SAGE Publications Sage CA: Los Angeles, CA}
+ }
+
+ @inproceedings{mikolov2013distributed,
+ title={Distributed representations of words and phrases and their compositionality},
+ author={Mikolov, Tomas and Sutskever, Ilya and Chen, Kai and Corrado, Greg S and Dean, Jeff},
+ booktitle={NIPS},
+ pages={3111--3119},
+ year={2013}
+ }
+
+ @article{mikolov2013exploiting,
+ title={Exploiting similarities among languages for machine translation},
+ author={Mikolov, Tomas and Le, Quoc V and Sutskever, Ilya},
+ journal={arXiv preprint arXiv:1309.4168},
+ year={2013}
+ }
+
+ @article{artetxe2018massively,
+ title={Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond},
+ author={Artetxe, Mikel and Schwenk, Holger},
+ journal={arXiv preprint arXiv:1812.10464},
+ year={2018}
+ }
+
+ @article{williams2017broad,
+ title={A broad-coverage challenge corpus for sentence understanding through inference},
+ author={Williams, Adina and Nangia, Nikita and Bowman, Samuel R},
+ journal={Proceedings of the 2nd Workshop on Evaluating Vector-Space Representations for NLP},
+ year={2017}
+ }
+
+ @InProceedings{conneau2018xnli,
+ author = "Conneau, Alexis
+ and Rinott, Ruty
+ and Lample, Guillaume
+ and Williams, Adina
+ and Bowman, Samuel R.
+ and Schwenk, Holger
+ and Stoyanov, Veselin",
+ title = "XNLI: Evaluating Cross-lingual Sentence Representations",
+ booktitle = "EMNLP",
+ year = "2018",
+ publisher = "Association for Computational Linguistics",
+ location = "Brussels, Belgium",
+ }
+
+ @article{wada2018unsupervised,
+ title={Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Models},
+ author={Wada, Takashi and Iwata, Tomoharu},
+ journal={arXiv preprint arXiv:1809.02306},
+ year={2018}
+ }
+
+ @article{xu2013cross,
+ title={Cross-lingual language modeling for low-resource speech recognition},
+ author={Xu, Ping and Fung, Pascale},
+ journal={IEEE Transactions on Audio, Speech, and Language Processing},
+ volume={21},
+ number={6},
+ pages={1134--1144},
+ year={2013},
+ publisher={IEEE}
+ }
+
+ @article{hermann2014multilingual,
+ title={Multilingual models for compositional distributed semantics},
+ author={Hermann, Karl Moritz and Blunsom, Phil},
+ journal={arXiv preprint arXiv:1404.4641},
+ year={2014}
+ }
+
+ @inproceedings{transformer17,
+ title = {Attention is all you need},
+ author = {Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
+ booktitle={Advances in Neural Information Processing Systems},
+ pages={6000--6010},
+ year = {2017}
+ }
+
+ @article{liu2019multi,
+ title={Multi-task deep neural networks for natural language understanding},
+ author={Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng},
+ journal={arXiv preprint arXiv:1901.11504},
+ year={2019}
+ }
+
+ @article{wang2018glue,
+ title={GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
+ author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R},
+ journal={arXiv preprint arXiv:1804.07461},
+ year={2018}
+ }
+
+ @article{radford2018improving,
+ title={Improving language understanding by generative pre-training},
+ author={Radford, Alec and Narasimhan, Karthik and Salimans, Tim and Sutskever, Ilya},
+ url={https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf},
+ year={2018}
+ }
+
+ @article{conneau2018senteval,
+ title={SentEval: An Evaluation Toolkit for Universal Sentence Representations},
+ author={Conneau, Alexis and Kiela, Douwe},
+ journal={LREC},
+ year={2018}
+ }
+
+ @article{devlin2018bert,
+ title={{BERT}: Pre-training of deep bidirectional transformers for language understanding},
+ author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
+ journal={NAACL},
+ year={2018}
+ }
+
+ @article{peters2018deep,
+ title={Deep contextualized word representations},
+ author={Peters, Matthew E and Neumann, Mark and Iyyer, Mohit and Gardner, Matt and Clark, Christopher and Lee, Kenton and Zettlemoyer, Luke},
+ journal={NAACL},
+ year={2018}
+ }
+
+ @article{ramachandran2016unsupervised,
+ title={Unsupervised pretraining for sequence to sequence learning},
+ author={Ramachandran, Prajit and Liu, Peter J and Le, Quoc V},
+ journal={arXiv preprint arXiv:1611.02683},
+ year={2016}
+ }
+
+ @inproceedings{kunchukuttan2018iit,
+ title={The IIT Bombay English-Hindi Parallel Corpus},
+ author={Kunchukuttan, Anoop and Mehta, Pratik and Bhattacharyya, Pushpak},
+ booktitle={LREC},
+ year={2018}
+ }
+
+ @article{wu2019beto,
+ title={Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT},
+ author={Wu, Shijie and Dredze, Mark},
+ journal={EMNLP},
+ year={2019}
+ }
+
+ @inproceedings{lample-etal-2016-neural,
+ title = "Neural Architectures for Named Entity Recognition",
+ author = "Lample, Guillaume and
+ Ballesteros, Miguel and
+ Subramanian, Sandeep and
+ Kawakami, Kazuya and
+ Dyer, Chris",
+ booktitle = "NAACL",
+ month = jun,
+ year = "2016",
+ address = "San Diego, California",
+ publisher = "Association for Computational Linguistics",
+ url = "https://www.aclweb.org/anthology/N16-1030",
+ doi = "10.18653/v1/N16-1030",
+ pages = "260--270",
+ }
+
+ @inproceedings{akbik2018coling,
+ title={Contextual String Embeddings for Sequence Labeling},
+ author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
+ booktitle = {COLING},
+ pages = {1638--1649},
+ year = {2018}
+ }
+
+ @inproceedings{tjong-kim-sang-de-meulder-2003-introduction,
+ title = "Introduction to the {C}o{NLL}-2003 Shared Task: Language-Independent Named Entity Recognition",
+ author = "Tjong Kim Sang, Erik F. and
+ De Meulder, Fien",
+ booktitle = "Proceedings of the Seventh Conference on Natural Language Learning at {HLT}-{NAACL} 2003",
+ year = "2003",
+ url = "https://www.aclweb.org/anthology/W03-0419",
+ pages = "142--147",
+ }
+
+ @inproceedings{tjong-kim-sang-2002-introduction,
+ title = "Introduction to the {C}o{NLL}-2002 Shared Task: Language-Independent Named Entity Recognition",
+ author = "Tjong Kim Sang, Erik F.",
+ booktitle = "{COLING}-02: The 6th Conference on Natural Language Learning 2002 ({C}o{NLL}-2002)",
+ year = "2002",
+ url = "https://www.aclweb.org/anthology/W02-2024",
+ }
+
+ @InProceedings{TIEDEMANN12.463,
+ author = {Jörg Tiedemann},
+ title = {Parallel Data, Tools and Interfaces in OPUS},
+ booktitle = {LREC},
+ year = {2012},
+ month = {may},
+ date = {23-25},
+ address = {Istanbul, Turkey},
+ editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Ugur Dogan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis},
+ publisher = {European Language Resources Association (ELRA)},
+ isbn = {978-2-9517408-7-7},
+ language = {english}
+ }
+
+ @inproceedings{ziemski2016united,
+ title={The United Nations Parallel Corpus v1.0},
+ author={Ziemski, Michal and Junczys-Dowmunt, Marcin and Pouliquen, Bruno},
+ booktitle={LREC},
+ year={2016}
+ }
+
+ @article{roberta2019,
+ author = {Yinhan Liu and
+ Myle Ott and
+ Naman Goyal and
+ Jingfei Du and
+ Mandar Joshi and
+ Danqi Chen and
+ Omer Levy and
+ Mike Lewis and
+ Luke Zettlemoyer and
+ Veselin Stoyanov},
+ title = {RoBERTa: {A} Robustly Optimized {BERT} Pretraining Approach},
+ journal = {arXiv preprint arXiv:1907.11692},
+ year = {2019}
+ }
+
+ @article{tan2019multilingual,
+ title={Multilingual neural machine translation with knowledge distillation},
+ author={Tan, Xu and Ren, Yi and He, Di and Qin, Tao and Zhao, Zhou and Liu, Tie-Yan},
+ journal={ICLR},
+ year={2019}
+ }
+
+ @article{siddhant2019evaluating,
+ title={Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation},
+ author={Siddhant, Aditya and Johnson, Melvin and Tsai, Henry and Arivazhagan, Naveen and Riesa, Jason and Bapna, Ankur and Firat, Orhan and Raman, Karthik},
+ journal={AAAI},
+ year={2019}
+ }
+
+ @inproceedings{camacho2017semeval,
+ title={SemEval-2017 task 2: Multilingual and cross-lingual semantic word similarity},
+ author={Camacho-Collados, Jose and Pilehvar, Mohammad Taher and Collier, Nigel and Navigli, Roberto},
+ booktitle={Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)},
+ pages={15--26},
+ year={2017}
+ }
+
+ @inproceedings{Pires2019HowMI,
+ title={How Multilingual is Multilingual BERT?},
+ author={Telmo Pires and Eva Schlinger and Dan Garrette},
+ booktitle={ACL},
+ year={2019}
+ }
+
+ @article{lample2019cross,
+ title={Cross-lingual language model pretraining},
+ author={Lample, Guillaume and Conneau, Alexis},
+ journal={NeurIPS},
+ year={2019}
+ }
+
+ @article{schuster2019cross,
+ title={Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing},
+ author={Schuster, Tal and Ram, Ori and Barzilay, Regina and Globerson, Amir},
+ journal={NAACL},
+ year={2019}
+ }
+
+ @article{wenzek2019ccnet,
+ title={CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data},
+ author={Wenzek, Guillaume and Lachaux, Marie-Anne and Conneau, Alexis and Chaudhary, Vishrav and Guzman, Francisco and Joulin, Armand and Grave, Edouard},
+ journal={arXiv preprint arXiv:1911.00359},
+ year={2019}
+ }
+
+ @inproceedings{zhou2016cross,
+ title={Cross-lingual sentiment classification with bilingual document representation learning},
+ author={Zhou, Xinjie and Wan, Xiaojun and Xiao, Jianguo},
+ booktitle={Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+ pages={1403--1412},
+ year={2016}
+ }
+
+ @article{goyal2017accurate,
+ title={Accurate, large minibatch {SGD}: Training {ImageNet} in 1 hour},
+ author={Goyal, Priya and Doll{\'a}r, Piotr and Girshick, Ross and Noordhuis, Pieter and Wesolowski, Lukasz and Kyrola, Aapo and Tulloch, Andrew and Jia, Yangqing and He, Kaiming},
+ journal={arXiv preprint arXiv:1706.02677},
+ year={2017}
+ }
+
+ @article{arivazhagan2019massively,
+ title={Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges},
+ author={Arivazhagan, Naveen and Bapna, Ankur and Firat, Orhan and Lepikhin, Dmitry and Johnson, Melvin and Krikun, Maxim and Chen, Mia Xu and Cao, Yuan and Foster, George and Cherry, Colin and others},
+ journal={arXiv preprint arXiv:1907.05019},
+ year={2019}
+ }
+
+ @inproceedings{pan2017cross,
+ title={Cross-lingual name tagging and linking for 282 languages},
+ author={Pan, Xiaoman and Zhang, Boliang and May, Jonathan and Nothman, Joel and Knight, Kevin and Ji, Heng},
+ booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+ volume={1},
+ pages={1946--1958},
+ year={2017}
+ }
+
+ @article{raffel2019exploring,
+ title={Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
+ author={Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
+ journal={arXiv preprint arXiv:1910.10683},
+ year={2019}
+ }
+
+ @inproceedings{pennington2014glove,
+ author = {Jeffrey Pennington and Richard Socher and Christopher D. Manning},
+ booktitle = {EMNLP},
+ title = {GloVe: Global Vectors for Word Representation},
+ year = {2014},
+ pages = {1532--1543},
+ url = {http://www.aclweb.org/anthology/D14-1162},
+ }
+
+ @article{kudo2018sentencepiece,
+ title={{SentencePiece}: A simple and language independent subword tokenizer and detokenizer for neural text processing},
+ author={Kudo, Taku and Richardson, John},
+ journal={EMNLP},
+ year={2018}
+ }
+
+ @article{rajpurkar2018know,
+ title={Know What You Don't Know: Unanswerable Questions for SQuAD},
+ author={Rajpurkar, Pranav and Jia, Robin and Liang, Percy},
+ journal={ACL},
+ year={2018}
+ }
+
+ @article{joulin2017bag,
+ title={Bag of Tricks for Efficient Text Classification},
+ author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
+ journal={EACL 2017},
+ pages={427},
+ year={2017}
+ }
+
+ @inproceedings{kudo2018subword,
+ title={Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates},
+ author={Kudo, Taku},
+ booktitle={ACL},
+ pages={66--75},
+ year={2018}
+ }
+
+ @inproceedings{grave2018learning,
+ title={Learning Word Vectors for 157 Languages},
+ author={Grave, Edouard and Bojanowski, Piotr and Gupta, Prakhar and Joulin, Armand and Mikolov, Tomas},
+ booktitle={LREC},
+ year={2018}
+ }
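
The entries above pair with the acl2020.sty and acl_natbib.bst files added alongside this .bib in the commit. As an illustrative sketch only (the document skeleton and file names follow this commit's layout and the comments inside acl2020.sty; the citation keys are taken from the entries above), a paper source would consume the bibliography like this:

```latex
% Minimal, hypothetical usage skeleton -- not a file in this commit.
\documentclass[11pt]{article}
\usepackage{acl2020}   % style file added in this commit; it uses natbib citation commands
\title{Example}
\author{First Author \and Second Author}
\begin{document}
\maketitle
Cross-lingual transfer is commonly evaluated on XNLI \citep{conneau2018xnli},
building on cross-lingual pretraining \citep{lample2019cross}.
\bibliographystyle{acl_natbib}
\bibliography{acl2020}  % the .bib file above
\end{document}
```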
references/2019.arxiv.conneau/source/acl2020.sty ADDED
@@ -0,0 +1,560 @@
+ % This is the LaTeX style file for ACL 2020, based off of ACL 2019.
+
+ % Addressing bibtex issues mentioned in https://github.com/acl-org/acl-pub/issues/2
+ % Other major modifications include
+ % changing the color of the line numbers to a light gray; changing font size of abstract to be 10pt; changing caption font size to be 10pt.
+ % -- M Mitchell and Stephanie Lukin
+
+ % 2017: modified to support DOI links in bibliography. Now uses
+ % natbib package rather than defining citation commands in this file.
+ % Use with acl_natbib.bst bib style. -- Dan Gildea
+
+ % This is the LaTeX style for ACL 2016. It contains Margaret Mitchell's
+ % line number adaptations (ported by Hai Zhao and Yannick Versley).
+
+ % It is nearly identical to the style files for ACL 2015,
+ % ACL 2014, EACL 2006, ACL 2005, ACL 2002, ACL 2001, ACL 2000,
+ % EACL 95 and EACL 99.
+ %
+ % Changes made include: adapt layout to A4 and centimeters, widen abstract
+
+ % This is the LaTeX style file for ACL 2000. It is nearly identical to the
+ % style files for EACL 95 and EACL 99. Minor changes include editing the
+ % instructions to reflect use of \documentclass rather than \documentstyle
+ % and removing the white space before the title on the first page
+ % -- John Chen, June 29, 2000
+
+ % This is the LaTeX style file for EACL-95. It is identical to the
+ % style file for ANLP '94 except that the margins are adjusted for A4
+ % paper. -- abney 13 Dec 94
+
+ % The ANLP '94 style file is a slightly modified
+ % version of the style used for AAAI and IJCAI, using some changes
+ % prepared by Fernando Pereira and others and some minor changes
+ % by Paul Jacobs.
+
+ % Papers prepared using the aclsub.sty file and acl.bst bibtex style
+ % should be easily converted to final format using this style.
+ % (1) Submission information (\wordcount, \subject, and \makeidpage)
+ % should be removed.
+ % (2) \summary should be removed. The summary material should come
+ % after \maketitle and should be in the ``abstract'' environment
+ % (between \begin{abstract} and \end{abstract}).
+ % (3) Check all citations. This style should handle citations correctly
+ % and also allows multiple citations separated by semicolons.
+ % (4) Check figures and examples. Because the final format is double-
+ % column, some adjustments may have to be made to fit text in the column
+ % or to choose full-width (\figure*) figures.
+
+ % Place this in a file called aclap.sty in the TeX search path.
+ % (Placing it in the same directory as the paper should also work.)
+
+ % Prepared by Peter F. Patel-Schneider, liberally using the ideas of
+ % other style hackers, including Barbara Beeton.
+ % This style is NOT guaranteed to work. It is provided in the hope
+ % that it will make the preparation of papers easier.
+ %
+ % There are undoubtedly bugs in this style. If you make bug fixes,
+ % improvements, etc. please let me know. My e-mail address is:
+ % pfps@research.att.com
+
+ % Papers are to be prepared using the ``acl_natbib'' bibliography style,
+ % as follows:
+ % \documentclass[11pt]{article}
+ % \usepackage{acl2000}
+ % \title{Title}
+ % \author{Author 1 \and Author 2 \\ Address line \\ Address line \And
+ % Author 3 \\ Address line \\ Address line}
+ % \begin{document}
+ % ...
+ % \bibliography{bibliography-file}
+ % \bibliographystyle{acl_natbib}
+ % \end{document}
+
+ % Author information can be set in various styles:
+ % For several authors from the same institution:
+ % \author{Author 1 \and ... \and Author n \\
+ % Address line \\ ... \\ Address line}
+ % if the names do not fit well on one line use
+ % Author 1 \\ {\bf Author 2} \\ ... \\ {\bf Author n} \\
+ % For authors from different institutions:
+ % \author{Author 1 \\ Address line \\ ... \\ Address line
+ % \And ... \And
+ % Author n \\ Address line \\ ... \\ Address line}
+ % To start a separate ``row'' of authors use \AND, as in
+ % \author{Author 1 \\ Address line \\ ... \\ Address line
+ % \AND
+ % Author 2 \\ Address line \\ ... \\ Address line \And
+ % Author 3 \\ Address line \\ ... \\ Address line}
+
+ % If the title and author information does not fit in the area allocated,
+ % place \setlength\titlebox{<new height>} right after
+ % \usepackage{acl2015}
+ % where <new height> can be something larger than 5cm
+
+ % include hyperref, unless user specifies nohyperref option like this:
+ % \usepackage[nohyperref]{naaclhlt2018}
+ \newif\ifacl@hyperref
+ \DeclareOption{hyperref}{\acl@hyperreftrue}
+ \DeclareOption{nohyperref}{\acl@hyperreffalse}
+ \ExecuteOptions{hyperref} % default is to use hyperref
+ \ProcessOptions\relax
+ \ifacl@hyperref
+ \RequirePackage{hyperref}
+ \usepackage{xcolor} % make links dark blue
+ \definecolor{darkblue}{rgb}{0, 0, 0.5}
+ \hypersetup{colorlinks=true,citecolor=darkblue, linkcolor=darkblue, urlcolor=darkblue}
+ \else
+ % This definition is used if the hyperref package is not loaded.
+ % It provides a backup, no-op definition of \href.
+ % This is necessary because the \href command is used in the acl_natbib.bst file.
+ \def\href#1#2{{#2}}
+ % We still need to load xcolor in this case because the lighter line numbers require it. (SC/KG/WL)
+ \usepackage{xcolor}
+ \fi
+
+ \typeout{Conference Style for ACL 2019}
+
+ % NOTE: Some laser printers have a serious problem printing TeX output.
+ % These printing devices, commonly known as ``write-white'' laser
+ % printers, tend to make characters too light. To get around this
+ % problem, a darker set of fonts must be created for these devices.
+ %
+
+ \newcommand{\Thanks}[1]{\thanks{\ #1}}
+
+ % A4 modified by Eneko; again modified by Alexander for 5cm titlebox
+ \setlength{\paperwidth}{21cm} % A4
+ \setlength{\paperheight}{29.7cm}% A4
+ \setlength\topmargin{-0.5cm}
+ \setlength\oddsidemargin{0cm}
+ \setlength\textheight{24.7cm}
+ \setlength\textwidth{16.0cm}
+ \setlength\columnsep{0.6cm}
+ \newlength\titlebox
+ \setlength\titlebox{5cm}
+ \setlength\headheight{5pt}
+ \setlength\headsep{0pt}
+ \thispagestyle{empty}
+ \pagestyle{empty}
+
+
+ \flushbottom \twocolumn \sloppy
+
+ % We're never going to need a table of contents, so just flush it to
+ % save space --- suggested by drstrip@sandia-2
+ \def\addcontentsline#1#2#3{}
+
+ \newif\ifaclfinal
+ \aclfinalfalse
+ \def\aclfinalcopy{\global\aclfinaltrue}
+
+ %% ----- Set up hooks to repeat content on every page of the output doc,
+ %% necessary for the line numbers in the submitted version. --MM
+ %%
+ %% Copied from CVPR 2015's cvpr_eso.sty, which appears to be largely copied from everyshi.sty.
+ %%
+ %% Original cvpr_eso.sty available at: http://www.pamitc.org/cvpr15/author_guidelines.php
+ %% Original everyshi.sty available at: https://www.ctan.org/pkg/everyshi
+ %%
+ %% Copyright (C) 2001 Martin Schr\"oder:
+ %%
+ %% Martin Schr"oder
+ %% Cr"usemannallee 3
+ %% D-28213 Bremen
+ %% Martin.Schroeder@ACM.org
+ %%
+ %% This program may be redistributed and/or modified under the terms
+ %% of the LaTeX Project Public License, either version 1.0 of this
+ %% license, or (at your option) any later version.
+ %% The latest version of this license is in
+ %% CTAN:macros/latex/base/lppl.txt.
+ %%
+ %% Happy users are requested to send [Martin] a postcard. :-)
+ %%
+ \newcommand{\@EveryShipoutACL@Hook}{}
+ \newcommand{\@EveryShipoutACL@AtNextHook}{}
+ \newcommand*{\EveryShipoutACL}[1]
+ {\g@addto@macro\@EveryShipoutACL@Hook{#1}}
+ \newcommand*{\AtNextShipoutACL@}[1]
+ {\g@addto@macro\@EveryShipoutACL@AtNextHook{#1}}
+ \newcommand{\@EveryShipoutACL@Shipout}{%
+ \afterassignment\@EveryShipoutACL@Test
+ \global\setbox\@cclv= %
+ }
+ \newcommand{\@EveryShipoutACL@Test}{%
+ \ifvoid\@cclv\relax
+ \aftergroup\@EveryShipoutACL@Output
+ \else
+ \@EveryShipoutACL@Output
+ \fi%
+ }
+ \newcommand{\@EveryShipoutACL@Output}{%
+ \@EveryShipoutACL@Hook%
+ \@EveryShipoutACL@AtNextHook%
+ \gdef\@EveryShipoutACL@AtNextHook{}%
+ \@EveryShipoutACL@Org@Shipout\box\@cclv%
+ }
+ \newcommand{\@EveryShipoutACL@Org@Shipout}{}
+ \newcommand*{\@EveryShipoutACL@Init}{%
+ \message{ABD: EveryShipout initializing macros}%
+ \let\@EveryShipoutACL@Org@Shipout\shipout
+ \let\shipout\@EveryShipoutACL@Shipout
+ }
+ \AtBeginDocument{\@EveryShipoutACL@Init}
+
+ %% ----- Set up for placing additional items into the submitted version --MM
+ %%
+ %% Based on eso-pic.sty
+ %%
+ %% Original available at: https://www.ctan.org/tex-archive/macros/latex/contrib/eso-pic
+ %% Copyright (C) 1998-2002 by Rolf Niepraschk <niepraschk@ptb.de>
+ %%
+ %% Which may be distributed and/or modified under the conditions of
+ %% the LaTeX Project Public License, either version 1.2 of this license
+ %% or (at your option) any later version. The latest version of this
+ %% license is in:
+ %%
+ %% http://www.latex-project.org/lppl.txt
+ %%
+ %% and version 1.2 or later is part of all distributions of LaTeX version
+ %% 1999/12/01 or later.
+ %%
+ %% In contrast to the original, we do not include the definitions for/using:
+ %% gridpicture, div[2], isMEMOIR[1], gridSetup[6][], subgridstyle{dotted}, labelfactor{}, gap{}, gridunitname{}, gridunit{}, gridlines{\thinlines}, subgridlines{\thinlines}, the {keyval} package, evenside margin, nor any definitions with 'color'.
+ %%
+ %% These are beyond what is needed for the NAACL/ACL style.
+ %%
+ \newcommand\LenToUnit[1]{#1\@gobble}
+ \newcommand\AtPageUpperLeft[1]{%
+ \begingroup
+ \@tempdima=0pt\relax\@tempdimb=\ESO@yoffsetI\relax
+ \put(\LenToUnit{\@tempdima},\LenToUnit{\@tempdimb}){#1}%
+ \endgroup
+ }
+ \newcommand\AtPageLowerLeft[1]{\AtPageUpperLeft{%
+ \put(0,\LenToUnit{-\paperheight}){#1}}}
+ \newcommand\AtPageCenter[1]{\AtPageUpperLeft{%
+ \put(\LenToUnit{.5\paperwidth},\LenToUnit{-.5\paperheight}){#1}}}
+ \newcommand\AtPageLowerCenter[1]{\AtPageUpperLeft{%
+ \put(\LenToUnit{.5\paperwidth},\LenToUnit{-\paperheight}){#1}}}%
+ \newcommand\AtPageLowishCenter[1]{\AtPageUpperLeft{%
+ \put(\LenToUnit{.5\paperwidth},\LenToUnit{-.96\paperheight}){#1}}}
+ \newcommand\AtTextUpperLeft[1]{%
+ \begingroup
+ \setlength\@tempdima{1in}%
+ \advance\@tempdima\oddsidemargin%
+ \@tempdimb=\ESO@yoffsetI\relax\advance\@tempdimb-1in\relax%
+ \advance\@tempdimb-\topmargin%
+ \advance\@tempdimb-\headheight\advance\@tempdimb-\headsep%
+ \put(\LenToUnit{\@tempdima},\LenToUnit{\@tempdimb}){#1}%
+ \endgroup
+ }
+ \newcommand\AtTextLowerLeft[1]{\AtTextUpperLeft{%
+ \put(0,\LenToUnit{-\textheight}){#1}}}
+ \newcommand\AtTextCenter[1]{\AtTextUpperLeft{%
+ \put(\LenToUnit{.5\textwidth},\LenToUnit{-.5\textheight}){#1}}}
+ \newcommand{\ESO@HookI}{} \newcommand{\ESO@HookII}{}
+ \newcommand{\ESO@HookIII}{}
+ \newcommand{\AddToShipoutPicture}{%
+ \@ifstar{\g@addto@macro\ESO@HookII}{\g@addto@macro\ESO@HookI}}
+ \newcommand{\ClearShipoutPicture}{\global\let\ESO@HookI\@empty}
+ \newcommand{\@ShipoutPicture}{%
+ \bgroup
+ \@tempswafalse%
+ \ifx\ESO@HookI\@empty\else\@tempswatrue\fi%
+ \ifx\ESO@HookII\@empty\else\@tempswatrue\fi%
+ \ifx\ESO@HookIII\@empty\else\@tempswatrue\fi%
+ \if@tempswa%
+ \@tempdima=1in\@tempdimb=-\@tempdima%
+ \advance\@tempdimb\ESO@yoffsetI%
+ \unitlength=1pt%
+ \global\setbox\@cclv\vbox{%
+ \vbox{\let\protect\relax
+ \pictur@(0,0)(\strip@pt\@tempdima,\strip@pt\@tempdimb)%
+ \ESO@HookIII\ESO@HookI\ESO@HookII%
+ \global\let\ESO@HookII\@empty%
+ \endpicture}%
+ \nointerlineskip%
+ \box\@cclv}%
+ \fi
+ \egroup
+ }
+ \EveryShipoutACL{\@ShipoutPicture}
+ \newif\ifESO@dvips\ESO@dvipsfalse
+ \newif\ifESO@grid\ESO@gridfalse
+ \newif\ifESO@texcoord\ESO@texcoordfalse
+ \newcommand*\ESO@griddelta{}\newcommand*\ESO@griddeltaY{}
+ \newcommand*\ESO@gridDelta{}\newcommand*\ESO@gridDeltaY{}
+ \newcommand*\ESO@yoffsetI{}\newcommand*\ESO@yoffsetII{}
+ \ifESO@texcoord
+ \def\ESO@yoffsetI{0pt}\def\ESO@yoffsetII{-\paperheight}
+ \edef\ESO@griddeltaY{-\ESO@griddelta}\edef\ESO@gridDeltaY{-\ESO@gridDelta}
+ \else
+ \def\ESO@yoffsetI{\paperheight}\def\ESO@yoffsetII{0pt}
+ \edef\ESO@griddeltaY{\ESO@griddelta}\edef\ESO@gridDeltaY{\ESO@gridDelta}
+ \fi
+
+
+ %% ----- Submitted version markup: Page numbers, ruler, and confidentiality. Using ideas/code from cvpr.sty 2015. --MM
+
+ \font\aclhv = phvb at 8pt
+
+ %% Define vruler %%
+
+ %\makeatletter
+ \newbox\aclrulerbox
+ \newcount\aclrulercount
+ \newdimen\aclruleroffset
+ \newdimen\cv@lineheight
+ \newdimen\cv@boxheight
+ \newbox\cv@tmpbox
+ \newcount\cv@refno
+ \newcount\cv@tot
+ % NUMBER with left flushed zeros \fillzeros[<WIDTH>]<NUMBER>
+ \newcount\cv@tmpc@ \newcount\cv@tmpc
+ \def\fillzeros[#1]#2{\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi
+ \cv@tmpc=1 %
+ \loop\ifnum\cv@tmpc@<10 \else \divide\cv@tmpc@ by 10 \advance\cv@tmpc by 1 \fi
+ \ifnum\cv@tmpc@=10\relax\cv@tmpc@=11\relax\fi \ifnum\cv@tmpc@>10 \repeat
+ \ifnum#2<0\advance\cv@tmpc1\relax-\fi
+ \loop\ifnum\cv@tmpc<#1\relax0\advance\cv@tmpc1\relax\fi \ifnum\cv@tmpc<#1 \repeat
+ \cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi \relax\the\cv@tmpc@}%
+ % \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
+ \def\makevruler[#1][#2][#3][#4][#5]{\begingroup\offinterlineskip
+ \textheight=#5\vbadness=10000\vfuzz=120ex\overfullrule=0pt%
+ \global\setbox\aclrulerbox=\vbox to \textheight{%
+ {\parskip=0pt\hfuzz=150em\cv@boxheight=\textheight
+ \color{gray}
+ \cv@lineheight=#1\global\aclrulercount=#2%
+ \cv@tot\cv@boxheight\divide\cv@tot\cv@lineheight\advance\cv@tot2%
+ \cv@refno1\vskip-\cv@lineheight\vskip1ex%
+ \loop\setbox\cv@tmpbox=\hbox to0cm{{\aclhv\hfil\fillzeros[#4]\aclrulercount}}%
+ \ht\cv@tmpbox\cv@lineheight\dp\cv@tmpbox0pt\box\cv@tmpbox\break
+ \advance\cv@refno1\global\advance\aclrulercount#3\relax
+ \ifnum\cv@refno<\cv@tot\repeat}}\endgroup}%
+ %\makeatother
+
+
+ \def\aclpaperid{***}
+ \def\confidential{\textcolor{black}{ACL 2020 Submission~\aclpaperid. Confidential Review Copy. DO NOT DISTRIBUTE.}}
+
+ %% Page numbering, Vruler and Confidentiality %%
+ % \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
+
+ % SC/KG/WL - changed line numbering to gainsboro
+ \definecolor{gainsboro}{rgb}{0.8, 0.8, 0.8}
+ %\def\aclruler#1{\makevruler[14.17pt][#1][1][3][\textheight]\usebox{\aclrulerbox}} %% old line
+ \def\aclruler#1{\textcolor{gainsboro}{\makevruler[14.17pt][#1][1][3][\textheight]\usebox{\aclrulerbox}}}
+
+ \def\leftoffset{-2.1cm} %original: -45pt
+ \def\rightoffset{17.5cm} %original: 500pt
+ \ifaclfinal\else\pagenumbering{arabic}
+ \AddToShipoutPicture{%
+ \ifaclfinal\else
+ \AtPageLowishCenter{\textcolor{black}{\thepage}}
+ \aclruleroffset=\textheight
+ \advance\aclruleroffset4pt
+ \AtTextUpperLeft{%
+ \put(\LenToUnit{\leftoffset},\LenToUnit{-\aclruleroffset}){%left ruler
+ \aclruler{\aclrulercount}}
+ \put(\LenToUnit{\rightoffset},\LenToUnit{-\aclruleroffset}){%right ruler
+ \aclruler{\aclrulercount}}
+ }
+ \AtTextUpperLeft{%confidential
+ \put(0,\LenToUnit{1cm}){\parbox{\textwidth}{\centering\aclhv\confidential}}
+ }
+ \fi
+ }
+
+ %%%% ----- End settings for placing additional items into the submitted version --MM ----- %%%%
+
+ %%%% ----- Begin settings for both submitted and camera-ready version ----- %%%%
+
+ %% Title and Authors %%
+
+ \newcommand\outauthor{
+ \begin{tabular}[t]{c}
+ \ifaclfinal
+ \bf\@author
+ \else
+ % Avoiding common accidental de-anonymization issue. --MM
+ \bf Anonymous ACL submission
+ \fi
+ \end{tabular}}
+
+ % Changing the expanded titlebox for submissions to 2.5 in (rather than 6.5cm)
+ % and moving it to the style sheet, rather than within the example tex file. --MM
+ \ifaclfinal
+ \else
+ \addtolength\titlebox{.25in}
+ \fi
+ % Mostly taken from deproc.
+ \def\maketitle{\par
+ \begingroup
+ \def\thefootnote{\fnsymbol{footnote}}
+ \def\@makefnmark{\hbox to 0pt{$^{\@thefnmark}$\hss}}
+ \twocolumn[\@maketitle] \@thanks
+ \endgroup
+ \setcounter{footnote}{0}
+ \let\maketitle\relax \let\@maketitle\relax
+ \gdef\@thanks{}\gdef\@author{}\gdef\@title{}\let\thanks\relax}
+ \def\@maketitle{\vbox to \titlebox{\hsize\textwidth
+ \linewidth\hsize \vskip 0.125in minus 0.125in \centering
+ {\Large\bf \@title \par} \vskip 0.2in plus 1fil minus 0.1in
+ {\def\and{\unskip\enspace{\rm and}\enspace}%
+ \def\And{\end{tabular}\hss \egroup \hskip 1in plus 2fil
+ \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf}%
+ \def\AND{\end{tabular}\hss\egroup \hfil\hfil\egroup
+ \vskip 0.25in plus 1fil minus 0.125in
+ \hbox to \linewidth\bgroup\large \hfil\hfil
+ \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf}
+ \hbox to \linewidth\bgroup\large \hfil\hfil
+ \hbox to 0pt\bgroup\hss
+ \outauthor
+ \hss\egroup
+ \hfil\hfil\egroup}
+ \vskip 0.3in plus 2fil minus 0.1in
+ }}
+
+ % margins and font size for abstract
+ \renewenvironment{abstract}%
+ {\centerline{\large\bf Abstract}%
+ \begin{list}{}%
+ {\setlength{\rightmargin}{0.6cm}%
+ \setlength{\leftmargin}{0.6cm}}%
+ \item[]\ignorespaces%
+ \@setsize\normalsize{12pt}\xpt\@xpt
+ }%
+ {\unskip\end{list}}
+
+ %\renewenvironment{abstract}{\centerline{\large\bf
+ % Abstract}\vspace{0.5ex}\begin{quote}}{\par\end{quote}\vskip 1ex}
+
+ % Resizing figure and table captions - SL
+ \newcommand{\figcapfont}{\rm}
+ \newcommand{\tabcapfont}{\rm}
+ \renewcommand{\fnum@figure}{\figcapfont Figure \thefigure}
+ \renewcommand{\fnum@table}{\tabcapfont Table \thetable}
+ \renewcommand{\figcapfont}{\@setsize\normalsize{12pt}\xpt\@xpt}
+ \renewcommand{\tabcapfont}{\@setsize\normalsize{12pt}\xpt\@xpt}
+ % Support for interacting with the caption, subfigure, and subcaption packages - SL
+ \usepackage{caption}
+ \DeclareCaptionFont{10pt}{\fontsize{10pt}{12pt}\selectfont}
+ \captionsetup{font=10pt}
+
+ \RequirePackage{natbib}
+ % for citation commands in the .tex, authors can use:
+ % \citep, \citet, and \citeyearpar for compatibility with natbib, or
+ % \cite, \newcite, and \shortcite for compatibility with older ACL .sty files
+ \renewcommand\cite{\citep} % to get "(Author Year)" with natbib
+ \newcommand\shortcite{\citeyearpar}% to get "(Year)" with natbib
+ \newcommand\newcite{\citet} % to get "Author (Year)" with natbib
+
+ % DK/IV: Workaround for annoying hyperref pagewrap bug
+ % \RequirePackage{etoolbox}
+ % \patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{\errmessage{\noexpand patch failed}}
+
+ % bibliography
+
+ \def\@up#1{\raise.2ex\hbox{#1}}
+
+ % Don't put a label in the bibliography at all. Just use the unlabeled format
+ % instead.
+ \def\thebibliography#1{\vskip\parskip%
+ \vskip\baselineskip%
+ \def\baselinestretch{1}%
+ \ifx\@currsize\normalsize\@normalsize\else\@currsize\fi%
+ \vskip-\parskip%
+ \vskip-\baselineskip%
+ \section*{References\@mkboth
+ {References}{References}}\list
+ {}{\setlength{\labelwidth}{0pt}\setlength{\leftmargin}{\parindent}
+ \setlength{\itemindent}{-\parindent}}
+ \def\newblock{\hskip .11em plus .33em minus -.07em}
+ \sloppy\clubpenalty4000\widowpenalty4000
+ \sfcode`\.=1000\relax}
+ \let\endthebibliography=\endlist
+
+
+ % Allow for a bibliography of sources of attested examples
+ \def\thesourcebibliography#1{\vskip\parskip%
+ \vskip\baselineskip%
+ \def\baselinestretch{1}%
+ \ifx\@currsize\normalsize\@normalsize\else\@currsize\fi%
+ \vskip-\parskip%
+ \vskip-\baselineskip%
+ \section*{Sources of Attested Examples\@mkboth
+ {Sources of Attested Examples}{Sources of Attested Examples}}\list
+ {}{\setlength{\labelwidth}{0pt}\setlength{\leftmargin}{\parindent}
+ \setlength{\itemindent}{-\parindent}}
+ \def\newblock{\hskip .11em plus .33em minus -.07em}
+ \sloppy\clubpenalty4000\widowpenalty4000
+ \sfcode`\.=1000\relax}
+ \let\endthesourcebibliography=\endlist
+
+ % sections with less space
+ \def\section{\@startsection {section}{1}{\z@}{-2.0ex plus
+ -0.5ex minus -.2ex}{1.5ex plus 0.3ex minus .2ex}{\large\bf\raggedright}}
+ \def\subsection{\@startsection{subsection}{2}{\z@}{-1.8ex plus
+ -0.5ex minus -.2ex}{0.8ex plus .2ex}{\normalsize\bf\raggedright}}
+ %% changed by KO to - values to get the initial parindent right
+ \def\subsubsection{\@startsection{subsubsection}{3}{\z@}{-1.5ex plus
+ -0.5ex minus -.2ex}{0.5ex plus .2ex}{\normalsize\bf\raggedright}}
+ \def\paragraph{\@startsection{paragraph}{4}{\z@}{1.5ex plus
+ 0.5ex minus .2ex}{-1em}{\normalsize\bf}}
+ \def\subparagraph{\@startsection{subparagraph}{5}{\parindent}{1.5ex plus
+ 0.5ex minus .2ex}{-1em}{\normalsize\bf}}
+
+ % Footnotes
+ \footnotesep 6.65pt %
+ \skip\footins 9pt plus 4pt minus 2pt
+ \def\footnoterule{\kern-3pt \hrule width 5pc \kern 2.6pt }
+ \setcounter{footnote}{0}
+
+ % Lists and paragraphs
+ \parindent 1em
+ \topsep 4pt plus 1pt minus 2pt
+ \partopsep 1pt plus 0.5pt minus 0.5pt
+ \itemsep 2pt plus 1pt minus 0.5pt
+ \parsep 2pt plus 1pt minus 0.5pt
+
+ \leftmargin 2em \leftmargini\leftmargin \leftmarginii 2em
+ \leftmarginiii 1.5em \leftmarginiv 1.0em \leftmarginv .5em \leftmarginvi .5em
+ \labelwidth\leftmargini\advance\labelwidth-\labelsep \labelsep 5pt
+
+ \def\@listi{\leftmargin\leftmargini}
+ \def\@listii{\leftmargin\leftmarginii
+ \labelwidth\leftmarginii\advance\labelwidth-\labelsep
+ \topsep 2pt plus 1pt minus 0.5pt
+ \parsep 1pt plus 0.5pt minus 0.5pt
+ \itemsep \parsep}
+ \def\@listiii{\leftmargin\leftmarginiii
+ \labelwidth\leftmarginiii\advance\labelwidth-\labelsep
+ \topsep 1pt plus 0.5pt minus 0.5pt
+ \parsep \z@ \partopsep 0.5pt plus 0pt minus 0.5pt
+ \itemsep \topsep}
+ \def\@listiv{\leftmargin\leftmarginiv
+ \labelwidth\leftmarginiv\advance\labelwidth-\labelsep}
+ \def\@listv{\leftmargin\leftmarginv
+ \labelwidth\leftmarginv\advance\labelwidth-\labelsep}
+ \def\@listvi{\leftmargin\leftmarginvi
+ \labelwidth\leftmarginvi\advance\labelwidth-\labelsep}
+
+ \abovedisplayskip 7pt plus2pt minus5pt%
+ \belowdisplayskip \abovedisplayskip
+ \abovedisplayshortskip 0pt plus3pt%
+ \belowdisplayshortskip 4pt plus3pt minus3pt%
+
+ % Less leading in most fonts (due to the narrow columns)
+ % The choices were between 1-pt and 1.5-pt leading
+ \def\@normalsize{\@setsize\normalsize{11pt}\xpt\@xpt}
+ \def\small{\@setsize\small{10pt}\ixpt\@ixpt}
+ \def\footnotesize{\@setsize\footnotesize{10pt}\ixpt\@ixpt}
+ \def\scriptsize{\@setsize\scriptsize{8pt}\viipt\@viipt}
+ \def\tiny{\@setsize\tiny{7pt}\vipt\@vipt}
+ \def\large{\@setsize\large{14pt}\xiipt\@xiipt}
+ \def\Large{\@setsize\Large{16pt}\xivpt\@xivpt}
+ \def\LARGE{\@setsize\LARGE{20pt}\xviipt\@xviipt}
+ \def\huge{\@setsize\huge{23pt}\xxpt\@xxpt}
+ \def\Huge{\@setsize\Huge{28pt}\xxvpt\@xxvpt}
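The usage template in the style file's comments (written for the older acl2000.sty) can be sketched as a complete driver file. This is a minimal, untested sketch: it assumes the package is loaded as `acl2020`, matching this commit's filename, and uses placeholder title, authors, and a hypothetical citation key.

```latex
% Minimal driver sketch for the style file above (assumptions:
% the file is saved as acl2020.sty, so it is loaded as acl2020;
% title, authors, and the citation key are placeholders).
\documentclass[11pt]{article}
\usepackage{acl2020}   % or \usepackage[nohyperref]{acl2020}
%\aclfinalcopy         % uncomment for the de-anonymized camera-ready
\title{Title}
\author{Author 1 \and Author 2 \\ Address line \\ Address line \And
        Author 3 \\ Address line \\ Address line}
\begin{document}
\maketitle
\begin{abstract}
Abstract text.
\end{abstract}
Body text with an author-year citation \citep{someref2020}.
\bibliography{bibliography-file}
\bibliographystyle{acl_natbib}
\end{document}
```

Without `\aclfinalcopy`, the style's shipout hooks add the gray line-number rulers, page numbers, and the confidentiality banner on every page, and `\outauthor` replaces the author block with "Anonymous ACL submission".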
references/2019.arxiv.conneau/source/acl_natbib.bst ADDED
@@ -0,0 +1,1975 @@
+ %%% acl_natbib.bst
+ %%% Modification of BibTeX style file acl_natbib_nourl.bst
+ %%% ... by urlbst, version 0.7 (marked with "% urlbst")
+ %%% See <http://purl.org/nxg/dist/urlbst>
+ %%% Added webpage entry type, and url and lastchecked fields.
+ %%% Added eprint support.
+ %%% Added DOI support.
+ %%% Added PUBMED support.
+ %%% Added hyperref support.
+ %%% Original headers follow...
+
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+ %
+ % BibTeX style file acl_natbib_nourl.bst
+ %
+ % intended as input to urlbst script
+ % $ ./urlbst --hyperref --inlinelinks acl_natbib_nourl.bst > acl_natbib.bst
+ %
+ % adapted from compling.bst
+ % in order to mimic the style files for ACL conferences prior to 2017
+ % by making the following three changes:
+ % - for @incollection, page numbers now follow volume title.
+ % - for @inproceedings, address now follows conference name.
+ % (address is intended as location of conference,
+ % not address of publisher.)
+ % - for papers with three authors, use et al. in citation
+ % Dan Gildea 2017/06/08
+ % - fixed a bug with format.chapter - error given if chapter is empty
+ % with inbook.
+ % Shay Cohen 2018/02/16
+
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+ %
+ % BibTeX style file compling.bst
+ %
+ % Intended for the journal Computational Linguistics (ACL/MIT Press)
+ % Created by Ron Artstein on 2005/08/22
+ % For use with <natbib.sty> for author-year citations.
+ %
+ % I created this file in order to allow submissions to the journal
+ % Computational Linguistics using the <natbib> package for author-year
+ % citations, which offers a lot more flexibility than <fullname>, CL's
+ % official citation package. This file adheres strictly to the official
+ % style guide available from the MIT Press:
+ %
+ % http://mitpress.mit.edu/journals/coli/compling_style.pdf
+ %
+ % This includes all the various quirks of the style guide, for example:
+ % - a chapter from a monograph (@inbook) has no page numbers.
+ % - an article from an edited volume (@incollection) has page numbers
+ % after the publisher and address.
+ % - an article from a proceedings volume (@inproceedings) has page
+ % numbers before the publisher and address.
+ %
+ % Where the style guide was inconsistent or not specific enough I
+ % looked at actual published articles and exercised my own judgment.
+ % I noticed two inconsistencies in the style guide:
+ %
+ % - The style guide gives one example of an article from an edited
+ % volume with the editor's name spelled out in full, and another
+ % with the editors' names abbreviated. I chose to accept the first
+ % one as correct, since the style guide generally shuns abbreviations,
+ % and editors' names are also spelled out in some recently published
+ % articles.
+ %
+ % - The style guide gives one example of a reference where the word
+ % "and" between two authors is preceded by a comma. This is most
+ % likely a typo, since in all other cases with just two authors or
+ % editors there is no comma before the word "and".
+ %
+ % One case where the style guide is not being specific is the placement
+ % of the edition number, for which no example is given. I chose to put
+ % it immediately after the title, which I (subjectively) find natural,
+ % and is also the place of the edition in a few recently published
+ % articles.
+ %
+ % This file correctly reproduces all of the examples in the official
+ % style guide, except for the two inconsistencies noted above. I even
+ % managed to get it to correctly format the proceedings example which
+ % has an organization, a publisher, and two addresses (the conference
+ % location and the publisher's address), though I cheated a bit by
+ % putting the conference location and month as part of the title field;
+ % I feel that in this case the conference location and month can be
+ % considered as part of the title, and that adding a location field
+ % is not justified. Note also that a location field is not standard,
+ % so entries made with this field would not port nicely to other styles.
+ % However, if authors feel that there's a need for a location field
+ % then tell me and I'll see what I can do.
+ %
+ % The file also produces to my satisfaction all the bibliographical
+ % entries in my recent (joint) submission to CL (this was the original
+ % motivation for creating the file). I also tested it by running it
+ % on a larger set of entries and eyeballing the results. There may of
+ % course still be errors, especially with combinations of fields that
+ % are not that common, or with cross-references (which I seldom use).
+ % If you find such errors please write to me.
+ %
+ % I hope people find this file useful. Please email me with comments
+ % and suggestions.
+ %
+ % Ron Artstein
+ % artstein [at] essex.ac.uk
+ % August 22, 2005.
+ %
+ % Some technical notes.
+ %
+ % This file is based on a file generated with the package <custom-bib>
+ % by Patrick W. Daly (see selected options below), which was then
+ % manually customized to conform with certain CL requirements which
+ % cannot be met by <custom-bib>. Departures from the generated file
+ % include:
+ %
+ % Function inbook: moved publisher and address to the end; moved
+ % edition after title; replaced function format.chapter.pages by
+ % new function format.chapter to output chapter without pages.
+ %
+ % Function inproceedings: moved publisher and address to the end;
+ % replaced function format.in.ed.booktitle by new function
+ % format.in.booktitle to output the proceedings title without
+ % the editor.
+ %
+ % Functions book, incollection, manual: moved edition after title.
+ %
+ % Function mastersthesis: formatted title as for articles (unlike
+ % phdthesis which is formatted as book) and added month.
+ %
+ % Function proceedings: added new.sentence between organization and
+ % publisher when both are present.
+ %
+ % Function format.lab.names: modified so that it gives all the
+ % authors' surnames for in-text citations for one, two and three
+ % authors and only uses "et al." for works with four authors or more
+ % (thanks to Ken Shan for convincing me to go through the trouble of
+ % modifying this function rather than using unreliable hacks).
+ %
+ % Changes:
+ %
+ % 2006-10-27: Changed function reverse.pass so that the extra label is
+ % enclosed in parentheses when the year field ends in an uppercase or
+ % lowercase letter (change modeled after Uli Sauerland's modification
+ % of nals.bst). RA.
+ %
+ %
+ % The preamble of the generated file begins below:
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
146
+ %%
147
+ %% This is file `compling.bst',
148
+ %% generated with the docstrip utility.
149
+ %%
150
+ %% The original source files were:
151
+ %%
152
+ %% merlin.mbs (with options: `ay,nat,vonx,nm-revv1,jnrlst,keyxyr,blkyear,dt-beg,yr-per,note-yr,num-xser,pre-pub,xedn,nfss')
153
+ %% ----------------------------------------
154
+ %% *** Intended for the journal Computational Linguistics ***
155
+ %%
156
+ %% Copyright 1994-2002 Patrick W Daly
157
+ % ===============================================================
158
+ % IMPORTANT NOTICE:
159
+ % This bibliographic style (bst) file has been generated from one or
160
+ % more master bibliographic style (mbs) files, listed above.
161
+ %
162
+ % This generated file can be redistributed and/or modified under the terms
163
+ % of the LaTeX Project Public License Distributed from CTAN
164
+ % archives in directory macros/latex/base/lppl.txt; either
165
+ % version 1 of the License, or any later version.
166
+ % ===============================================================
167
+ % Name and version information of the main mbs file:
168
+ % \ProvidesFile{merlin.mbs}[2002/10/21 4.05 (PWD, AO, DPC)]
169
+ % For use with BibTeX version 0.99a or later
170
+ %-------------------------------------------------------------------
171
+ % This bibliography style file is intended for texts in ENGLISH
172
+ % This is an author-year citation style bibliography. As such, it is
173
+ % non-standard LaTeX, and requires a special package file to function properly.
174
+ % Such a package is natbib.sty by Patrick W. Daly
175
+ % The form of the \bibitem entries is
176
+ % \bibitem[Jones et al.(1990)]{key}...
177
+ % \bibitem[Jones et al.(1990)Jones, Baker, and Smith]{key}...
178
+ % The essential feature is that the label (the part in brackets) consists
179
+ % of the author names, as they should appear in the citation, with the year
180
+ % in parentheses following. There must be no space before the opening
181
+ % parenthesis!
182
+ % With natbib v5.3, a full list of authors may also follow the year.
183
+ % In natbib.sty, it is possible to define the type of enclosures that is
184
+ % really wanted (brackets or parentheses), but in either case, there must
185
+ % be parentheses in the label.
186
+ % The \cite command functions as follows:
187
+ % \citet{key} ==>> Jones et al. (1990)
188
+ % \citet*{key} ==>> Jones, Baker, and Smith (1990)
189
+ % \citep{key} ==>> (Jones et al., 1990)
190
+ % \citep*{key} ==>> (Jones, Baker, and Smith, 1990)
191
+ % \citep[chap. 2]{key} ==>> (Jones et al., 1990, chap. 2)
192
+ % \citep[e.g.][]{key} ==>> (e.g. Jones et al., 1990)
193
+ % \citep[e.g.][p. 32]{key} ==>> (e.g. Jones et al., p. 32)
194
+ % \citeauthor{key} ==>> Jones et al.
195
+ % \citeauthor*{key} ==>> Jones, Baker, and Smith
196
+ % \citeyear{key} ==>> 1990
197
+ %---------------------------------------------------------------------
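The natbib behaviour listed above can be exercised with a minimal document (a sketch; the bibliography file `myrefs.bib` and the citation key `jones90` are placeholder names):

```latex
\documentclass{article}
\usepackage{natbib}  % required: this style emits natbib-format \bibitem labels
\begin{document}
\citet{jones90} argue that \dots      % Jones et al. (1990)
\dots as shown earlier \citep{jones90}. % (Jones et al., 1990)
\bibliographystyle{compling}
\bibliography{myrefs}
\end{document}
```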
198
+
+ ENTRY
+ { address
+ author
+ booktitle
+ chapter
+ edition
+ editor
+ howpublished
+ institution
+ journal
+ key
+ month
+ note
+ number
+ organization
+ pages
+ publisher
+ school
+ series
+ title
+ type
+ volume
+ year
+ eprint % urlbst
+ doi % urlbst
+ pubmed % urlbst
+ url % urlbst
+ lastchecked % urlbst
+ }
+ {}
+ { label extra.label sort.label short.list }
+ INTEGERS { output.state before.all mid.sentence after.sentence after.block }
+ % urlbst...
+ % urlbst constants and state variables
+ STRINGS { urlintro
+ eprinturl eprintprefix doiprefix doiurl pubmedprefix pubmedurl
+ citedstring onlinestring linktextstring
+ openinlinelink closeinlinelink }
+ INTEGERS { hrefform inlinelinks makeinlinelink
+ addeprints adddoiresolver addpubmedresolver }
+ FUNCTION {init.urlbst.variables}
+ {
+ % The following constants may be adjusted by hand, if desired
+
+ % The first set allow you to enable or disable certain functionality.
+ #1 'addeprints := % 0=no eprints; 1=include eprints
+ #1 'adddoiresolver := % 0=no DOI resolver; 1=include it
+ #1 'addpubmedresolver := % 0=no PUBMED resolver; 1=include it
+ #2 'hrefform := % 0=no crossrefs; 1=hypertex xrefs; 2=hyperref refs
+ #1 'inlinelinks := % 0=URLs explicit; 1=URLs attached to titles
+
+ % String constants, which you _might_ want to tweak.
+ "URL: " 'urlintro := % prefix before URL; typically "Available from:" or "URL":
+ "online" 'onlinestring := % indication that resource is online; typically "online"
+ "cited " 'citedstring := % indicator of citation date; typically "cited "
+ "[link]" 'linktextstring := % dummy link text; typically "[link]"
+ "http://arxiv.org/abs/" 'eprinturl := % prefix to make URL from eprint ref
+ "arXiv:" 'eprintprefix := % text prefix printed before eprint ref; typically "arXiv:"
+ "https://doi.org/" 'doiurl := % prefix to make URL from DOI
+ "doi:" 'doiprefix := % text prefix printed before DOI ref; typically "doi:"
+ "http://www.ncbi.nlm.nih.gov/pubmed/" 'pubmedurl := % prefix to make URL from PUBMED
+ "PMID:" 'pubmedprefix := % text prefix printed before PUBMED ref; typically "PMID:"
+
+ % The following are internal state variables, not configuration constants,
+ % so they shouldn't be fiddled with.
+ #0 'makeinlinelink := % state variable managed by possibly.setup.inlinelink
+ "" 'openinlinelink := % ditto
+ "" 'closeinlinelink := % ditto
+ }
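With the defaults set above, `hrefform` is `#2`, so the style typesets links with `\href`; a document using this configuration therefore needs the package that provides it (a sketch, assuming the defaults are kept):

```latex
% hrefform = 2 above selects \href links, which the hyperref package provides;
% hrefform = 1 emits raw HyperTeX \special commands instead, and
% hrefform = 0 disables link generation entirely.
\usepackage{hyperref}
```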
268
+ INTEGERS {
+ bracket.state
+ outside.brackets
+ open.brackets
+ within.brackets
+ close.brackets
+ }
+ % ...urlbst to here
+ FUNCTION {init.state.consts}
+ { #0 'outside.brackets := % urlbst...
+ #1 'open.brackets :=
+ #2 'within.brackets :=
+ #3 'close.brackets := % ...urlbst to here
+
+ #0 'before.all :=
+ #1 'mid.sentence :=
+ #2 'after.sentence :=
+ #3 'after.block :=
+ }
+ STRINGS { s t}
+ % urlbst
+ FUNCTION {output.nonnull.original}
+ { 's :=
+ output.state mid.sentence =
+ { ", " * write$ }
+ { output.state after.block =
+ { add.period$ write$
+ newline$
+ "\newblock " write$
+ }
+ { output.state before.all =
+ 'write$
+ { add.period$ " " * write$ }
+ if$
+ }
+ if$
+ mid.sentence 'output.state :=
+ }
+ if$
+ s
+ }
+
+ % urlbst...
+ % The following three functions are for handling inlinelink. They wrap
+ % a block of text which is potentially output with write$ by multiple
+ % other functions, so we don't know the content a priori.
+ % They communicate between each other using the variables makeinlinelink
+ % (which is true if a link should be made), and closeinlinelink (which holds
+ % the string which should close any current link). They can be called
+ % at any time, but start.inlinelink will be a no-op unless something has
+ % previously set makeinlinelink true, and the two ...end.inlinelink functions
+ % will only do their stuff if start.inlinelink has previously set
+ % closeinlinelink to be non-empty.
+ % (thanks to 'ijvm' for suggested code here)
+ FUNCTION {uand}
+ { 'skip$ { pop$ #0 } if$ } % 'and' (which isn't defined at this point in the file)
+ FUNCTION {possibly.setup.inlinelink}
+ { makeinlinelink hrefform #0 > uand
+ { doi empty$ adddoiresolver uand
+ { pubmed empty$ addpubmedresolver uand
+ { eprint empty$ addeprints uand
+ { url empty$
+ { "" }
+ { url }
+ if$ }
+ { eprinturl eprint * }
+ if$ }
+ { pubmedurl pubmed * }
+ if$ }
+ { doiurl doi * }
+ if$
+ % an appropriately-formatted URL is now on the stack
+ hrefform #1 = % hypertex
+ { "\special {html:<a href=" quote$ * swap$ * quote$ * "> }{" * 'openinlinelink :=
+ "\special {html:</a>}" 'closeinlinelink := }
+ { "\href {" swap$ * "} {" * 'openinlinelink := % hrefform=#2 -- hyperref
+ % the space between "} {" matters: a URL of just the right length can cause "\% newline em"
+ "}" 'closeinlinelink := }
+ if$
+ #0 'makeinlinelink :=
+ }
+ 'skip$
+ if$ % makeinlinelink
+ }
+ FUNCTION {add.inlinelink}
+ { openinlinelink empty$
+ 'skip$
+ { openinlinelink swap$ * closeinlinelink *
+ "" 'openinlinelink :=
+ }
+ if$
+ }
+ FUNCTION {output.nonnull}
+ { % Save the thing we've been asked to output
+ 's :=
+ % If the bracket-state is close.brackets, then add a close-bracket to
+ % what is currently at the top of the stack, and set bracket.state
+ % to outside.brackets
+ bracket.state close.brackets =
+ { "]" *
+ outside.brackets 'bracket.state :=
+ }
+ 'skip$
+ if$
+ bracket.state outside.brackets =
+ { % We're outside all brackets -- this is the normal situation.
+ % Write out what's currently at the top of the stack, using the
+ % original output.nonnull function.
+ s
+ add.inlinelink
+ output.nonnull.original % invoke the original output.nonnull
+ }
+ { % Still in brackets. Add open-bracket or (continuation) comma, add the
+ % new text (in s) to the top of the stack, and move to the close-brackets
+ % state, ready for next time (unless inbrackets resets it). If we come
+ % into this branch, then output.state is carefully undisturbed.
+ bracket.state open.brackets =
+ { " [" * }
+ { ", " * } % bracket.state will be within.brackets
+ if$
+ s *
+ close.brackets 'bracket.state :=
+ }
+ if$
+ }
+
+ % Call this function just before adding something which should be presented in
+ % brackets. bracket.state is handled specially within output.nonnull.
+ FUNCTION {inbrackets}
+ { bracket.state close.brackets =
+ { within.brackets 'bracket.state := } % reset the state: not open nor closed
+ { open.brackets 'bracket.state := }
+ if$
+ }
+
+ FUNCTION {format.lastchecked}
+ { lastchecked empty$
+ { "" }
+ { inbrackets citedstring lastchecked * }
+ if$
+ }
+ % ...urlbst to here
+ FUNCTION {output}
+ { duplicate$ empty$
+ 'pop$
+ 'output.nonnull
+ if$
+ }
+ FUNCTION {output.check}
+ { 't :=
+ duplicate$ empty$
+ { pop$ "empty " t * " in " * cite$ * warning$ }
+ 'output.nonnull
+ if$
+ }
+ FUNCTION {fin.entry.original} % urlbst (renamed from fin.entry, so it can be wrapped below)
+ { add.period$
+ write$
+ newline$
+ }
+
+ FUNCTION {new.block}
+ { output.state before.all =
+ 'skip$
+ { after.block 'output.state := }
+ if$
+ }
+ FUNCTION {new.sentence}
+ { output.state after.block =
+ 'skip$
+ { output.state before.all =
+ 'skip$
+ { after.sentence 'output.state := }
+ if$
+ }
+ if$
+ }
+ FUNCTION {add.blank}
+ { " " * before.all 'output.state :=
+ }
+
+ FUNCTION {date.block}
+ {
+ new.block
+ }
+
+ FUNCTION {not}
+ { { #0 }
+ { #1 }
+ if$
+ }
+ FUNCTION {and}
+ { 'skip$
+ { pop$ #0 }
+ if$
+ }
+ FUNCTION {or}
+ { { pop$ #1 }
+ 'skip$
+ if$
+ }
+ FUNCTION {new.block.checkb}
+ { empty$
+ swap$ empty$
+ and
+ 'skip$
+ 'new.block
+ if$
+ }
+ FUNCTION {field.or.null}
+ { duplicate$ empty$
+ { pop$ "" }
+ 'skip$
+ if$
+ }
+ FUNCTION {emphasize}
+ { duplicate$ empty$
+ { pop$ "" }
+ { "\emph{" swap$ * "}" * }
+ if$
+ }
+ FUNCTION {tie.or.space.prefix}
+ { duplicate$ text.length$ #3 <
+ { "~" }
+ { " " }
+ if$
+ swap$
+ }
+
+ FUNCTION {capitalize}
+ { "u" change.case$ "t" change.case$ }
+
+ FUNCTION {space.word}
+ { " " swap$ * " " * }
+ % Here are the language-specific definitions for explicit words.
+ % Each function has a name bbl.xxx where xxx is the English word.
+ % The language selected here is ENGLISH
+ FUNCTION {bbl.and}
+ { "and"}
+
+ FUNCTION {bbl.etal}
+ { "et~al." }
+
+ FUNCTION {bbl.editors}
+ { "editors" }
+
+ FUNCTION {bbl.editor}
+ { "editor" }
+
+ FUNCTION {bbl.edby}
+ { "edited by" }
+
+ FUNCTION {bbl.edition}
+ { "edition" }
+
+ FUNCTION {bbl.volume}
+ { "volume" }
+
+ FUNCTION {bbl.of}
+ { "of" }
+
+ FUNCTION {bbl.number}
+ { "number" }
+
+ FUNCTION {bbl.nr}
+ { "no." }
+
+ FUNCTION {bbl.in}
+ { "in" }
+
+ FUNCTION {bbl.pages}
+ { "pages" }
+
+ FUNCTION {bbl.page}
+ { "page" }
+
+ FUNCTION {bbl.chapter}
+ { "chapter" }
+
+ FUNCTION {bbl.techrep}
+ { "Technical Report" }
+
+ FUNCTION {bbl.mthesis}
+ { "Master's thesis" }
+
+ FUNCTION {bbl.phdthesis}
+ { "Ph.D. thesis" }
+
+ MACRO {jan} {"January"}
+
+ MACRO {feb} {"February"}
+
+ MACRO {mar} {"March"}
+
+ MACRO {apr} {"April"}
+
+ MACRO {may} {"May"}
+
+ MACRO {jun} {"June"}
+
+ MACRO {jul} {"July"}
+
+ MACRO {aug} {"August"}
+
+ MACRO {sep} {"September"}
+
+ MACRO {oct} {"October"}
+
+ MACRO {nov} {"November"}
+
+ MACRO {dec} {"December"}
+
+ MACRO {acmcs} {"ACM Computing Surveys"}
+
+ MACRO {acta} {"Acta Informatica"}
+
+ MACRO {cacm} {"Communications of the ACM"}
+
+ MACRO {ibmjrd} {"IBM Journal of Research and Development"}
+
+ MACRO {ibmsj} {"IBM Systems Journal"}
+
+ MACRO {ieeese} {"IEEE Transactions on Software Engineering"}
+
+ MACRO {ieeetc} {"IEEE Transactions on Computers"}
+
+ MACRO {ieeetcad}
+ {"IEEE Transactions on Computer-Aided Design of Integrated Circuits"}
+
+ MACRO {ipl} {"Information Processing Letters"}
+
+ MACRO {jacm} {"Journal of the ACM"}
+
+ MACRO {jcss} {"Journal of Computer and System Sciences"}
+
+ MACRO {scp} {"Science of Computer Programming"}
+
+ MACRO {sicomp} {"SIAM Journal on Computing"}
+
+ MACRO {tocs} {"ACM Transactions on Computer Systems"}
+
+ MACRO {tods} {"ACM Transactions on Database Systems"}
+
+ MACRO {tog} {"ACM Transactions on Graphics"}
+
+ MACRO {toms} {"ACM Transactions on Mathematical Software"}
+
+ MACRO {toois} {"ACM Transactions on Office Information Systems"}
+
+ MACRO {toplas} {"ACM Transactions on Programming Languages and Systems"}
+
+ MACRO {tcs} {"Theoretical Computer Science"}
+ FUNCTION {bibinfo.check}
+ { swap$
+ duplicate$ missing$
+ {
+ pop$ pop$
+ ""
+ }
+ { duplicate$ empty$
+ {
+ swap$ pop$
+ }
+ { swap$
+ pop$
+ }
+ if$
+ }
+ if$
+ }
+ FUNCTION {bibinfo.warn}
+ { swap$
+ duplicate$ missing$
+ {
+ swap$ "missing " swap$ * " in " * cite$ * warning$ pop$
+ ""
+ }
+ { duplicate$ empty$
+ {
+ swap$ "empty " swap$ * " in " * cite$ * warning$
+ }
+ { swap$
+ pop$
+ }
+ if$
+ }
+ if$
+ }
+ STRINGS { bibinfo}
+ INTEGERS { nameptr namesleft numnames }
+
+ FUNCTION {format.names}
+ { 'bibinfo :=
+ duplicate$ empty$ 'skip$ {
+ 's :=
+ "" 't :=
+ #1 'nameptr :=
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ duplicate$ #1 >
+ { "{ff~}{vv~}{ll}{, jj}" }
+ { "{ff~}{vv~}{ll}{, jj}" } % first name first for first author
+ % { "{vv~}{ll}{, ff}{, jj}" } % last name first for first author
+ if$
+ format.name$
+ bibinfo bibinfo.check
+ 't :=
+ nameptr #1 >
+ {
+ namesleft #1 >
+ { ", " * t * }
+ {
+ numnames #2 >
+ { "," * }
+ 'skip$
+ if$
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
+ { 't := }
+ { pop$ }
+ if$
+ t "others" =
+ {
+ " " * bbl.etal *
+ }
+ {
+ bbl.and
+ space.word * t *
+ }
+ if$
+ }
+ if$
+ }
+ 't
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ } if$
+ }
+ FUNCTION {format.names.ed}
+ {
+ 'bibinfo :=
+ duplicate$ empty$ 'skip$ {
+ 's :=
+ "" 't :=
+ #1 'nameptr :=
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ "{ff~}{vv~}{ll}{, jj}"
+ format.name$
+ bibinfo bibinfo.check
+ 't :=
+ nameptr #1 >
+ {
+ namesleft #1 >
+ { ", " * t * }
+ {
+ numnames #2 >
+ { "," * }
+ 'skip$
+ if$
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
+ { 't := }
+ { pop$ }
+ if$
+ t "others" =
+ {
+
+ " " * bbl.etal *
+ }
+ {
+ bbl.and
+ space.word * t *
+ }
+ if$
+ }
+ if$
+ }
+ 't
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ } if$
+ }
+ FUNCTION {format.key}
+ { empty$
+ { key field.or.null }
+ { "" }
+ if$
+ }
+
+ FUNCTION {format.authors}
+ { author "author" format.names
+ }
+ FUNCTION {get.bbl.editor}
+ { editor num.names$ #1 > 'bbl.editors 'bbl.editor if$ }
+
+ FUNCTION {format.editors}
+ { editor "editor" format.names duplicate$ empty$ 'skip$
+ {
+ "," *
+ " " *
+ get.bbl.editor
+ *
+ }
+ if$
+ }
+ FUNCTION {format.note}
+ {
+ note empty$
+ { "" }
+ { note #1 #1 substring$
+ duplicate$ "{" =
+ 'skip$
+ { output.state mid.sentence =
+ { "l" }
+ { "u" }
+ if$
+ change.case$
+ }
+ if$
+ note #2 global.max$ substring$ * "note" bibinfo.check
+ }
+ if$
+ }
+
+ FUNCTION {format.title}
+ { title
+ duplicate$ empty$ 'skip$
+ { "t" change.case$ }
+ if$
+ "title" bibinfo.check
+ }
+ FUNCTION {format.full.names}
+ {'s :=
+ "" 't :=
+ #1 'nameptr :=
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ "{vv~}{ll}" format.name$
+ 't :=
+ nameptr #1 >
+ {
+ namesleft #1 >
+ { ", " * t * }
+ {
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
+ { 't := }
+ { pop$ }
+ if$
+ t "others" =
+ {
+ " " * bbl.etal *
+ }
+ {
+ numnames #2 >
+ { "," * }
+ 'skip$
+ if$
+ bbl.and
+ space.word * t *
+ }
+ if$
+ }
+ if$
+ }
+ 't
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ }
+
+ FUNCTION {author.editor.key.full}
+ { author empty$
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.full.names }
+ if$
+ }
+ { author format.full.names }
+ if$
+ }
+
+ FUNCTION {author.key.full}
+ { author empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { author format.full.names }
+ if$
+ }
+
+ FUNCTION {editor.key.full}
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.full.names }
+ if$
+ }
+
+ FUNCTION {make.full.names}
+ { type$ "book" =
+ type$ "inbook" =
+ or
+ 'author.editor.key.full
+ { type$ "proceedings" =
+ 'editor.key.full
+ 'author.key.full
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {output.bibitem.original} % urlbst (renamed from output.bibitem, so it can be wrapped below)
+ { newline$
+ "\bibitem[{" write$
+ label write$
+ ")" make.full.names duplicate$ short.list =
+ { pop$ }
+ { * }
+ if$
+ "}]{" * write$
+ cite$ write$
+ "}" write$
+ newline$
+ ""
+ before.all 'output.state :=
+ }
+
+ FUNCTION {n.dashify}
+ {
+ 't :=
+ ""
+ { t empty$ not }
+ { t #1 #1 substring$ "-" =
+ { t #1 #2 substring$ "--" = not
+ { "--" *
+ t #2 global.max$ substring$ 't :=
+ }
+ { { t #1 #1 substring$ "-" = }
+ { "-" *
+ t #2 global.max$ substring$ 't :=
+ }
+ while$
+ }
+ if$
+ }
+ { t #1 #1 substring$ *
+ t #2 global.max$ substring$ 't :=
+ }
+ if$
+ }
+ while$
+ }
+
+ FUNCTION {word.in}
+ { bbl.in capitalize
+ " " * }
+
+ FUNCTION {format.date}
+ { year "year" bibinfo.check duplicate$ empty$
+ {
+ }
+ 'skip$
+ if$
+ extra.label *
+ before.all 'output.state :=
+ after.sentence 'output.state :=
+ }
+ FUNCTION {format.btitle}
+ { title "title" bibinfo.check
+ duplicate$ empty$ 'skip$
+ {
+ emphasize
+ }
+ if$
+ }
+ FUNCTION {either.or.check}
+ { empty$
+ 'pop$
+ { "can't use both " swap$ * " fields in " * cite$ * warning$ }
+ if$
+ }
+ FUNCTION {format.bvolume}
+ { volume empty$
+ { "" }
+ { bbl.volume volume tie.or.space.prefix
+ "volume" bibinfo.check * *
+ series "series" bibinfo.check
+ duplicate$ empty$ 'pop$
+ { swap$ bbl.of space.word * swap$
+ emphasize * }
+ if$
+ "volume and number" number either.or.check
+ }
+ if$
+ }
+ FUNCTION {format.number.series}
+ { volume empty$
+ { number empty$
+ { series field.or.null }
+ { series empty$
+ { number "number" bibinfo.check }
+ { output.state mid.sentence =
+ { bbl.number }
+ { bbl.number capitalize }
+ if$
+ number tie.or.space.prefix "number" bibinfo.check * *
+ bbl.in space.word *
+ series "series" bibinfo.check *
+ }
+ if$
+ }
+ if$
+ }
+ { "" }
+ if$
+ }
+
+ FUNCTION {format.edition}
+ { edition duplicate$ empty$ 'skip$
+ {
+ output.state mid.sentence =
+ { "l" }
+ { "t" }
+ if$ change.case$
+ "edition" bibinfo.check
+ " " * bbl.edition *
+ }
+ if$
+ }
+ INTEGERS { multiresult }
+ FUNCTION {multi.page.check}
+ { 't :=
+ #0 'multiresult :=
+ { multiresult not
+ t empty$ not
+ and
+ }
+ { t #1 #1 substring$
+ duplicate$ "-" =
+ swap$ duplicate$ "," =
+ swap$ "+" =
+ or or
+ { #1 'multiresult := }
+ { t #2 global.max$ substring$ 't := }
+ if$
+ }
+ while$
+ multiresult
+ }
+ FUNCTION {format.pages}
+ { pages duplicate$ empty$ 'skip$
+ { duplicate$ multi.page.check
+ {
+ bbl.pages swap$
+ n.dashify
+ }
+ {
+ bbl.page swap$
+ }
+ if$
+ tie.or.space.prefix
+ "pages" bibinfo.check
+ * *
+ }
+ if$
+ }
+ FUNCTION {format.journal.pages}
+ { pages duplicate$ empty$ 'pop$
+ { swap$ duplicate$ empty$
+ { pop$ pop$ format.pages }
+ {
+ ":" *
+ swap$
+ n.dashify
+ "pages" bibinfo.check
+ *
+ }
+ if$
+ }
+ if$
+ }
+ FUNCTION {format.vol.num.pages}
+ { volume field.or.null
+ duplicate$ empty$ 'skip$
+ {
+ "volume" bibinfo.check
+ }
+ if$
+ number "number" bibinfo.check duplicate$ empty$ 'skip$
+ {
+ swap$ duplicate$ empty$
+ { "there's a number but no volume in " cite$ * warning$ }
+ 'skip$
+ if$
+ swap$
+ "(" swap$ * ")" *
+ }
+ if$ *
+ format.journal.pages
+ }
+
+ FUNCTION {format.chapter}
+ { chapter empty$
+ 'format.pages
+ { type empty$
+ { bbl.chapter }
+ { type "l" change.case$
+ "type" bibinfo.check
+ }
+ if$
+ chapter tie.or.space.prefix
+ "chapter" bibinfo.check
+ * *
+ }
+ if$
+ }
+
+ FUNCTION {format.chapter.pages}
+ { chapter empty$
+ 'format.pages
+ { type empty$
+ { bbl.chapter }
+ { type "l" change.case$
+ "type" bibinfo.check
+ }
+ if$
+ chapter tie.or.space.prefix
+ "chapter" bibinfo.check
+ * *
+ pages empty$
+ 'skip$
+ { ", " * format.pages * }
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {format.booktitle}
+ {
+ booktitle "booktitle" bibinfo.check
+ emphasize
+ }
+ FUNCTION {format.in.booktitle}
+ { format.booktitle duplicate$ empty$ 'skip$
+ {
+ word.in swap$ *
+ }
+ if$
+ }
+ FUNCTION {format.in.ed.booktitle}
+ { format.booktitle duplicate$ empty$ 'skip$
+ {
+ editor "editor" format.names.ed duplicate$ empty$ 'pop$
+ {
+ "," *
+ " " *
+ get.bbl.editor
+ ", " *
+ * swap$
+ * }
+ if$
+ word.in swap$ *
+ }
+ if$
+ }
+ FUNCTION {format.thesis.type}
+ { type duplicate$ empty$
+ 'pop$
+ { swap$ pop$
+ "t" change.case$ "type" bibinfo.check
+ }
+ if$
+ }
+ FUNCTION {format.tr.number}
+ { number "number" bibinfo.check
+ type duplicate$ empty$
+ { pop$ bbl.techrep }
+ 'skip$
+ if$
+ "type" bibinfo.check
+ swap$ duplicate$ empty$
+ { pop$ "t" change.case$ }
+ { tie.or.space.prefix * * }
+ if$
+ }
+ FUNCTION {format.article.crossref}
+ {
+ word.in
+ " \cite{" * crossref * "}" *
+ }
+ FUNCTION {format.book.crossref}
+ { volume duplicate$ empty$
+ { "empty volume in " cite$ * "'s crossref of " * crossref * warning$
+ pop$ word.in
+ }
+ { bbl.volume
+ capitalize
+ swap$ tie.or.space.prefix "volume" bibinfo.check * * bbl.of space.word *
+ }
+ if$
+ " \cite{" * crossref * "}" *
+ }
+ FUNCTION {format.incoll.inproc.crossref}
+ {
+ word.in
+ " \cite{" * crossref * "}" *
+ }
+ FUNCTION {format.org.or.pub}
+ { 't :=
+ ""
+ address empty$ t empty$ and
+ 'skip$
+ {
+ t empty$
+ { address "address" bibinfo.check *
+ }
+ { t *
+ address empty$
+ 'skip$
+ { ", " * address "address" bibinfo.check * }
+ if$
+ }
+ if$
+ }
+ if$
+ }
+ FUNCTION {format.publisher.address}
+ { publisher "publisher" bibinfo.warn format.org.or.pub
+ }
+
+ FUNCTION {format.organization.address}
+ { organization "organization" bibinfo.check format.org.or.pub
+ }
+
+ % urlbst...
+ % Functions for making hypertext links.
+ % In all cases, the stack has (link-text href-url)
+ %
+ % make 'null' specials
+ FUNCTION {make.href.null}
+ {
+ pop$
+ }
+ % make hypertex specials
+ FUNCTION {make.href.hypertex}
+ {
+ "\special {html:<a href=" quote$ *
+ swap$ * quote$ * "> }" * swap$ *
+ "\special {html:</a>}" *
+ }
+ % make hyperref specials
+ FUNCTION {make.href.hyperref}
+ {
+ "\href {" swap$ * "} {\path{" * swap$ * "}}" *
+ }
+ FUNCTION {make.href}
+ { hrefform #2 =
+ 'make.href.hyperref % hrefform = 2
+ { hrefform #1 =
+ 'make.href.hypertex % hrefform = 1
+ 'make.href.null % hrefform = 0 (or anything else)
+ if$
+ }
+ if$
+ }
+
+ % If inlinelinks is true, then format.url should be a no-op, since it's
+ % (a) redundant, and (b) could end up as a link-within-a-link.
+ FUNCTION {format.url}
+ { inlinelinks #1 = url empty$ or
+ { "" }
+ { hrefform #1 =
+ { % special case -- add HyperTeX specials
+ urlintro "\url{" url * "}" * url make.href.hypertex * }
+ { urlintro "\url{" * url * "}" * }
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {format.eprint}
+ { eprint empty$
+ { "" }
+ { eprintprefix eprint * eprinturl eprint * make.href }
+ if$
+ }
+
+ FUNCTION {format.doi}
+ { doi empty$
+ { "" }
+ { doiprefix doi * doiurl doi * make.href }
+ if$
+ }
+
+ FUNCTION {format.pubmed}
+ { pubmed empty$
+ { "" }
+ { pubmedprefix pubmed * pubmedurl pubmed * make.href }
+ if$
+ }
+
+ % Output a URL. We can't use the more normal idiom (something like
+ % `format.url output'), because the `inbrackets' within
+ % format.lastchecked applies to everything between calls to `output',
+ % so that `format.url format.lastchecked * output' ends up with both
+ % the URL and the lastchecked in brackets.
+ FUNCTION {output.url}
+ { url empty$
+ 'skip$
+ { new.block
+ format.url output
+ format.lastchecked output
+ }
+ if$
+ }
+
+ FUNCTION {output.web.refs}
+ {
+ new.block
+ inlinelinks
+ 'skip$ % links were inline -- don't repeat them
+ {
+ output.url
+ addeprints eprint empty$ not and
+ { format.eprint output.nonnull }
+ 'skip$
+ if$
+ adddoiresolver doi empty$ not and
+ { format.doi output.nonnull }
+ 'skip$
+ if$
+ addpubmedresolver pubmed empty$ not and
+ { format.pubmed output.nonnull }
+ 'skip$
+ if$
+ }
+ if$
+ }
+
+ % Wrapper for output.bibitem.original.
+ % If the URL field is not empty, set makeinlinelink to be true,
+ % so that an inline link will be started at the next opportunity
+ FUNCTION {output.bibitem}
+ { outside.brackets 'bracket.state :=
+ output.bibitem.original
+ inlinelinks url empty$ not doi empty$ not or pubmed empty$ not or eprint empty$ not or and
+ { #1 'makeinlinelink := }
+ { #0 'makeinlinelink := }
+ if$
+ }
+
+ % Wrapper for fin.entry.original
+ FUNCTION {fin.entry}
+ { output.web.refs % urlbst
+ makeinlinelink % ooops, it appears we didn't have a title for inlinelink
+ { possibly.setup.inlinelink % add some artificial link text here, as a fallback
+ linktextstring output.nonnull }
+ 'skip$
+ if$
+ bracket.state close.brackets = % urlbst
+ { "]" * }
+ 'skip$
+ if$
+ fin.entry.original
+ }
+
+ % Webpage entry type.
+ % Title and url fields required;
+ % author, note, year, month, and lastchecked fields optional
+ % See references
+ % ISO 690-2 http://www.nlc-bnc.ca/iso/tc46sc9/standard/690-2e.htm
+ % http://www.classroom.net/classroom/CitingNetResources.html
+ % http://neal.ctstateu.edu/history/cite.html
+ % http://www.cas.usf.edu/english/walker/mla.html
+ % for citation formats for web pages.
1365
+ FUNCTION {webpage}
1366
+ { output.bibitem
1367
+ author empty$
1368
+ { editor empty$
1369
+ 'skip$ % author and editor both optional
1370
+ { format.editors output.nonnull }
1371
+ if$
1372
+ }
1373
+ { editor empty$
1374
+ { format.authors output.nonnull }
1375
+ { "can't use both author and editor fields in " cite$ * warning$ }
1376
+ if$
1377
+ }
1378
+ if$
1379
+ new.block
1380
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$
1381
+ format.title "title" output.check
1382
+ inbrackets onlinestring output
1383
+ new.block
1384
+ year empty$
1385
+ 'skip$
1386
+ { format.date "year" output.check }
1387
+ if$
1388
+ % We don't need to output the URL details ('lastchecked' and 'url'),
1389
+ % because fin.entry does that for us, using output.web.refs. The only
1390
+ % reason we would want to put them here is if we were to decide that
1391
+ % they should go in front of the rather miscellaneous information in 'note'.
1392
+ new.block
1393
+ note output
1394
+ fin.entry
1395
+ }
1396
+ % ...urlbst to here
1397
+
1398
+
1399
+ FUNCTION {article}
1400
+ { output.bibitem
1401
+ format.authors "author" output.check
1402
+ author format.key output
1403
+ format.date "year" output.check
1404
+ date.block
1405
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1406
+ format.title "title" output.check
1407
+ new.block
1408
+ crossref missing$
1409
+ {
1410
+ journal
1411
+ "journal" bibinfo.check
1412
+ emphasize
1413
+ "journal" output.check
1414
+ possibly.setup.inlinelink format.vol.num.pages output% urlbst
1415
+ }
1416
+ { format.article.crossref output.nonnull
1417
+ format.pages output
1418
+ }
1419
+ if$
1420
+ new.block
1421
+ format.note output
1422
+ fin.entry
1423
+ }
1424
+ FUNCTION {book}
1425
+ { output.bibitem
1426
+ author empty$
1427
+ { format.editors "author and editor" output.check
1428
+ editor format.key output
1429
+ }
1430
+ { format.authors output.nonnull
1431
+ crossref missing$
1432
+ { "author and editor" editor either.or.check }
1433
+ 'skip$
1434
+ if$
1435
+ }
1436
+ if$
1437
+ format.date "year" output.check
1438
+ date.block
1439
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1440
+ format.btitle "title" output.check
1441
+ format.edition output
1442
+ crossref missing$
1443
+ { format.bvolume output
1444
+ new.block
1445
+ format.number.series output
1446
+ new.sentence
1447
+ format.publisher.address output
1448
+ }
1449
+ {
1450
+ new.block
1451
+ format.book.crossref output.nonnull
1452
+ }
1453
+ if$
1454
+ new.block
1455
+ format.note output
1456
+ fin.entry
1457
+ }
1458
+ FUNCTION {booklet}
1459
+ { output.bibitem
1460
+ format.authors output
1461
+ author format.key output
1462
+ format.date "year" output.check
1463
+ date.block
1464
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1465
+ format.title "title" output.check
1466
+ new.block
1467
+ howpublished "howpublished" bibinfo.check output
1468
+ address "address" bibinfo.check output
1469
+ new.block
1470
+ format.note output
1471
+ fin.entry
1472
+ }
1473
+
1474
+ FUNCTION {inbook}
1475
+ { output.bibitem
1476
+ author empty$
1477
+ { format.editors "author and editor" output.check
1478
+ editor format.key output
1479
+ }
1480
+ { format.authors output.nonnull
1481
+ crossref missing$
1482
+ { "author and editor" editor either.or.check }
1483
+ 'skip$
1484
+ if$
1485
+ }
1486
+ if$
1487
+ format.date "year" output.check
1488
+ date.block
1489
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1490
+ format.btitle "title" output.check
1491
+ format.edition output
1492
+ crossref missing$
1493
+ {
1494
+ format.bvolume output
1495
+ format.number.series output
1496
+ format.chapter "chapter" output.check
1497
+ new.sentence
1498
+ format.publisher.address output
1499
+ new.block
1500
+ }
1501
+ {
1502
+ format.chapter "chapter" output.check
1503
+ new.block
1504
+ format.book.crossref output.nonnull
1505
+ }
1506
+ if$
1507
+ new.block
1508
+ format.note output
1509
+ fin.entry
1510
+ }
1511
+
1512
+ FUNCTION {incollection}
1513
+ { output.bibitem
1514
+ format.authors "author" output.check
1515
+ author format.key output
1516
+ format.date "year" output.check
1517
+ date.block
1518
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1519
+ format.title "title" output.check
1520
+ new.block
1521
+ crossref missing$
1522
+ { format.in.ed.booktitle "booktitle" output.check
1523
+ format.edition output
1524
+ format.bvolume output
1525
+ format.number.series output
1526
+ format.chapter.pages output
1527
+ new.sentence
1528
+ format.publisher.address output
1529
+ }
1530
+ { format.incoll.inproc.crossref output.nonnull
1531
+ format.chapter.pages output
1532
+ }
1533
+ if$
1534
+ new.block
1535
+ format.note output
1536
+ fin.entry
1537
+ }
1538
+ FUNCTION {inproceedings}
1539
+ { output.bibitem
1540
+ format.authors "author" output.check
1541
+ author format.key output
1542
+ format.date "year" output.check
1543
+ date.block
1544
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1545
+ format.title "title" output.check
1546
+ new.block
1547
+ crossref missing$
1548
+ { format.in.booktitle "booktitle" output.check
1549
+ format.bvolume output
1550
+ format.number.series output
1551
+ format.pages output
1552
+ address "address" bibinfo.check output
1553
+ new.sentence
1554
+ organization "organization" bibinfo.check output
1555
+ publisher "publisher" bibinfo.check output
1556
+ }
1557
+ { format.incoll.inproc.crossref output.nonnull
1558
+ format.pages output
1559
+ }
1560
+ if$
1561
+ new.block
1562
+ format.note output
1563
+ fin.entry
1564
+ }
1565
+ FUNCTION {conference} { inproceedings }
1566
+ FUNCTION {manual}
1567
+ { output.bibitem
1568
+ format.authors output
1569
+ author format.key output
1570
+ format.date "year" output.check
1571
+ date.block
1572
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1573
+ format.btitle "title" output.check
1574
+ format.edition output
1575
+ organization address new.block.checkb
1576
+ organization "organization" bibinfo.check output
1577
+ address "address" bibinfo.check output
1578
+ new.block
1579
+ format.note output
1580
+ fin.entry
1581
+ }
1582
+
1583
+ FUNCTION {mastersthesis}
1584
+ { output.bibitem
1585
+ format.authors "author" output.check
1586
+ author format.key output
1587
+ format.date "year" output.check
1588
+ date.block
1589
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1590
+ format.title
1591
+ "title" output.check
1592
+ new.block
1593
+ bbl.mthesis format.thesis.type output.nonnull
1594
+ school "school" bibinfo.warn output
1595
+ address "address" bibinfo.check output
1596
+ month "month" bibinfo.check output
1597
+ new.block
1598
+ format.note output
1599
+ fin.entry
1600
+ }
1601
+
1602
+ FUNCTION {misc}
1603
+ { output.bibitem
1604
+ format.authors output
1605
+ author format.key output
1606
+ format.date "year" output.check
1607
+ date.block
1608
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1609
+ format.title output
1610
+ new.block
1611
+ howpublished "howpublished" bibinfo.check output
1612
+ new.block
1613
+ format.note output
1614
+ fin.entry
1615
+ }
1616
+ FUNCTION {phdthesis}
1617
+ { output.bibitem
1618
+ format.authors "author" output.check
1619
+ author format.key output
1620
+ format.date "year" output.check
1621
+ date.block
1622
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1623
+ format.btitle
1624
+ "title" output.check
1625
+ new.block
1626
+ bbl.phdthesis format.thesis.type output.nonnull
1627
+ school "school" bibinfo.warn output
1628
+ address "address" bibinfo.check output
1629
+ new.block
1630
+ format.note output
1631
+ fin.entry
1632
+ }
1633
+
1634
+ FUNCTION {proceedings}
1635
+ { output.bibitem
1636
+ format.editors output
1637
+ editor format.key output
1638
+ format.date "year" output.check
1639
+ date.block
1640
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1641
+ format.btitle "title" output.check
1642
+ format.bvolume output
1643
+ format.number.series output
1644
+ new.sentence
1645
+ publisher empty$
1646
+ { format.organization.address output }
1647
+ { organization "organization" bibinfo.check output
1648
+ new.sentence
1649
+ format.publisher.address output
1650
+ }
1651
+ if$
1652
+ new.block
1653
+ format.note output
1654
+ fin.entry
1655
+ }
1656
+
1657
+ FUNCTION {techreport}
1658
+ { output.bibitem
1659
+ format.authors "author" output.check
1660
+ author format.key output
1661
+ format.date "year" output.check
1662
+ date.block
1663
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1664
+ format.title
1665
+ "title" output.check
1666
+ new.block
1667
+ format.tr.number output.nonnull
1668
+ institution "institution" bibinfo.warn output
1669
+ address "address" bibinfo.check output
1670
+ new.block
1671
+ format.note output
1672
+ fin.entry
1673
+ }
1674
+
1675
+ FUNCTION {unpublished}
1676
+ { output.bibitem
1677
+ format.authors "author" output.check
1678
+ author format.key output
1679
+ format.date "year" output.check
1680
+ date.block
1681
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1682
+ format.title "title" output.check
1683
+ new.block
1684
+ format.note "note" output.check
1685
+ fin.entry
1686
+ }
1687
+
1688
+ FUNCTION {default.type} { misc }
1689
+ READ
1690
+ FUNCTION {sortify}
1691
+ { purify$
1692
+ "l" change.case$
1693
+ }
1694
+ INTEGERS { len }
1695
+ FUNCTION {chop.word}
1696
+ { 's :=
1697
+ 'len :=
1698
+ s #1 len substring$ =
1699
+ { s len #1 + global.max$ substring$ }
1700
+ 's
1701
+ if$
1702
+ }
1703
+ FUNCTION {format.lab.names}
1704
+ { 's :=
1705
+ "" 't :=
1706
+ s #1 "{vv~}{ll}" format.name$
1707
+ s num.names$ duplicate$
1708
+ #2 >
1709
+ { pop$
1710
+ " " * bbl.etal *
1711
+ }
1712
+ { #2 <
1713
+ 'skip$
1714
+ { s #2 "{ff }{vv }{ll}{ jj}" format.name$ "others" =
1715
+ {
1716
+ " " * bbl.etal *
1717
+ }
1718
+ { bbl.and space.word * s #2 "{vv~}{ll}" format.name$
1719
+ * }
1720
+ if$
1721
+ }
1722
+ if$
1723
+ }
1724
+ if$
1725
+ }
1726
+
1727
+ FUNCTION {author.key.label}
1728
+ { author empty$
1729
+ { key empty$
1730
+ { cite$ #1 #3 substring$ }
1731
+ 'key
1732
+ if$
1733
+ }
1734
+ { author format.lab.names }
1735
+ if$
1736
+ }
1737
+
1738
+ FUNCTION {author.editor.key.label}
1739
+ { author empty$
1740
+ { editor empty$
1741
+ { key empty$
1742
+ { cite$ #1 #3 substring$ }
1743
+ 'key
1744
+ if$
1745
+ }
1746
+ { editor format.lab.names }
1747
+ if$
1748
+ }
1749
+ { author format.lab.names }
1750
+ if$
1751
+ }
1752
+
1753
+ FUNCTION {editor.key.label}
1754
+ { editor empty$
1755
+ { key empty$
1756
+ { cite$ #1 #3 substring$ }
1757
+ 'key
1758
+ if$
1759
+ }
1760
+ { editor format.lab.names }
1761
+ if$
1762
+ }
1763
+
1764
+ FUNCTION {calc.short.authors}
1765
+ { type$ "book" =
1766
+ type$ "inbook" =
1767
+ or
1768
+ 'author.editor.key.label
1769
+ { type$ "proceedings" =
1770
+ 'editor.key.label
1771
+ 'author.key.label
1772
+ if$
1773
+ }
1774
+ if$
1775
+ 'short.list :=
1776
+ }
1777
+
1778
+ FUNCTION {calc.label}
1779
+ { calc.short.authors
1780
+ short.list
1781
+ "("
1782
+ *
1783
+ year duplicate$ empty$
1784
+ short.list key field.or.null = or
1785
+ { pop$ "" }
1786
+ 'skip$
1787
+ if$
1788
+ *
1789
+ 'label :=
1790
+ }
1791
+
1792
+ FUNCTION {sort.format.names}
1793
+ { 's :=
1794
+ #1 'nameptr :=
1795
+ ""
1796
+ s num.names$ 'numnames :=
1797
+ numnames 'namesleft :=
1798
+ { namesleft #0 > }
1799
+ { s nameptr
1800
+ "{ll{ }}{ ff{ }}{ jj{ }}"
1801
+ format.name$ 't :=
1802
+ nameptr #1 >
1803
+ {
1804
+ " " *
1805
+ namesleft #1 = t "others" = and
1806
+ { "zzzzz" * }
1807
+ { t sortify * }
1808
+ if$
1809
+ }
1810
+ { t sortify * }
1811
+ if$
1812
+ nameptr #1 + 'nameptr :=
1813
+ namesleft #1 - 'namesleft :=
1814
+ }
1815
+ while$
1816
+ }
1817
+
1818
+ FUNCTION {sort.format.title}
1819
+ { 't :=
1820
+ "A " #2
1821
+ "An " #3
1822
+ "The " #4 t chop.word
1823
+ chop.word
1824
+ chop.word
1825
+ sortify
1826
+ #1 global.max$ substring$
1827
+ }
1828
+ FUNCTION {author.sort}
1829
+ { author empty$
1830
+ { key empty$
1831
+ { "to sort, need author or key in " cite$ * warning$
1832
+ ""
1833
+ }
1834
+ { key sortify }
1835
+ if$
1836
+ }
1837
+ { author sort.format.names }
1838
+ if$
1839
+ }
1840
+ FUNCTION {author.editor.sort}
1841
+ { author empty$
1842
+ { editor empty$
1843
+ { key empty$
1844
+ { "to sort, need author, editor, or key in " cite$ * warning$
1845
+ ""
1846
+ }
1847
+ { key sortify }
1848
+ if$
1849
+ }
1850
+ { editor sort.format.names }
1851
+ if$
1852
+ }
1853
+ { author sort.format.names }
1854
+ if$
1855
+ }
1856
+ FUNCTION {editor.sort}
1857
+ { editor empty$
1858
+ { key empty$
1859
+ { "to sort, need editor or key in " cite$ * warning$
1860
+ ""
1861
+ }
1862
+ { key sortify }
1863
+ if$
1864
+ }
1865
+ { editor sort.format.names }
1866
+ if$
1867
+ }
1868
+ FUNCTION {presort}
1869
+ { calc.label
1870
+ label sortify
1871
+ " "
1872
+ *
1873
+ type$ "book" =
1874
+ type$ "inbook" =
1875
+ or
1876
+ 'author.editor.sort
1877
+ { type$ "proceedings" =
1878
+ 'editor.sort
1879
+ 'author.sort
1880
+ if$
1881
+ }
1882
+ if$
1883
+ #1 entry.max$ substring$
1884
+ 'sort.label :=
1885
+ sort.label
1886
+ *
1887
+ " "
1888
+ *
1889
+ title field.or.null
1890
+ sort.format.title
1891
+ *
1892
+ #1 entry.max$ substring$
1893
+ 'sort.key$ :=
1894
+ }
1895
+
1896
+ ITERATE {presort}
1897
+ SORT
1898
+ STRINGS { last.label next.extra }
1899
+ INTEGERS { last.extra.num number.label }
1900
+ FUNCTION {initialize.extra.label.stuff}
1901
+ { #0 int.to.chr$ 'last.label :=
1902
+ "" 'next.extra :=
1903
+ #0 'last.extra.num :=
1904
+ #0 'number.label :=
1905
+ }
1906
+ FUNCTION {forward.pass}
1907
+ { last.label label =
1908
+ { last.extra.num #1 + 'last.extra.num :=
1909
+ last.extra.num int.to.chr$ 'extra.label :=
1910
+ }
1911
+ { "a" chr.to.int$ 'last.extra.num :=
1912
+ "" 'extra.label :=
1913
+ label 'last.label :=
1914
+ }
1915
+ if$
1916
+ number.label #1 + 'number.label :=
1917
+ }
1918
+ FUNCTION {reverse.pass}
1919
+ { next.extra "b" =
1920
+ { "a" 'extra.label := }
1921
+ 'skip$
1922
+ if$
1923
+ extra.label 'next.extra :=
1924
+ extra.label
1925
+ duplicate$ empty$
1926
+ 'skip$
1927
+ { year field.or.null #-1 #1 substring$ chr.to.int$ #65 <
1928
+ { "{\natexlab{" swap$ * "}}" * }
1929
+ { "{(\natexlab{" swap$ * "})}" * }
1930
+ if$ }
1931
+ if$
1932
+ 'extra.label :=
1933
+ label extra.label * 'label :=
1934
+ }
1935
+ EXECUTE {initialize.extra.label.stuff}
1936
+ ITERATE {forward.pass}
1937
+ REVERSE {reverse.pass}
1938
+ FUNCTION {bib.sort.order}
1939
+ { sort.label
1940
+ " "
1941
+ *
1942
+ year field.or.null sortify
1943
+ *
1944
+ " "
1945
+ *
1946
+ title field.or.null
1947
+ sort.format.title
1948
+ *
1949
+ #1 entry.max$ substring$
1950
+ 'sort.key$ :=
1951
+ }
1952
+ ITERATE {bib.sort.order}
1953
+ SORT
1954
+ FUNCTION {begin.bib}
1955
+ { preamble$ empty$
1956
+ 'skip$
1957
+ { preamble$ write$ newline$ }
1958
+ if$
1959
+ "\begin{thebibliography}{" number.label int.to.str$ * "}" *
1960
+ write$ newline$
1961
+ "\expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi"
1962
+ write$ newline$
1963
+ }
1964
+ EXECUTE {begin.bib}
1965
+ EXECUTE {init.urlbst.variables} % urlbst
1966
+ EXECUTE {init.state.consts}
1967
+ ITERATE {call.type$}
1968
+ FUNCTION {end.bib}
1969
+ { newline$
1970
+ "\end{thebibliography}" write$ newline$
1971
+ }
1972
+ EXECUTE {end.bib}
1973
+ %% End of customized bst file
1974
+ %%
1975
+ %% End of file `compling.bst'.
references/2019.arxiv.conneau/source/appendix.tex ADDED
@@ -0,0 +1,45 @@
1
+ \documentclass[11pt,a4paper]{article}
2
+ \usepackage[hyperref]{acl2020}
3
+ \usepackage{times}
4
+ \usepackage{latexsym}
5
+ \renewcommand{\UrlFont}{\ttfamily\small}
6
+
7
+ % This is not strictly necessary, and may be commented out,
8
+ % but it will improve the layout of the manuscript,
9
+ % and will typically save some space.
10
+ \usepackage{microtype}
11
+ \usepackage{graphicx}
12
+ \usepackage{subfigure}
13
+ \usepackage{booktabs} % for professional tables
14
+ \usepackage{url}
15
+ \usepackage{times}
16
+ \usepackage{latexsym}
17
+ \usepackage{array}
18
+ \usepackage{adjustbox}
19
+ \usepackage{multirow}
20
+ % \usepackage{subcaption}
21
+ \usepackage{hyperref}
22
+ \usepackage{longtable}
23
+ \usepackage{bibentry}
24
+ \newcommand{\xlmr}{\textit{XLM-R}\xspace}
25
+ \newcommand{\mbert}{mBERT\xspace}
26
+ \input{content/tables}
27
+
28
+ \begin{document}
29
+ \nobibliography{acl2020}
30
+ \bibliographystyle{acl_natbib}
31
+ \appendix
32
+ \onecolumn
33
+ \section*{Supplementary materials}
34
+ \section{Languages and statistics for CC-100 used by \xlmr}
35
+ In this section we present the list of languages in the CC-100 corpus we created for training \xlmr. We also report statistics such as the number of tokens and the size of each monolingual corpus.
36
+ \label{sec:appendix_A}
37
+ \insertDataStatistics
38
+
39
+ \newpage
40
+ \section{Model Architectures and Sizes}
41
+ As we showed in Section 5, capacity is an important parameter for learning strong cross-lingual representations. In the table below, we list multiple monolingual and multilingual models used by the research community and summarize their architectures and total number of parameters.
42
+ \label{sec:appendix_B}
43
+
44
+ \insertParameters
45
+ \end{document}
references/2019.arxiv.conneau/source/content/batchsize.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0e0c4e1c156379efeba93f0c1a6717bb12ab0b2aa0bdd361a7fda362ff01442e
3
+ size 14673
references/2019.arxiv.conneau/source/content/capacity.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:00087aeb1a14190e7800a77cecacb04e8ce1432c029e0276b4d8b02b7ff66edb
3
+ size 16459
references/2019.arxiv.conneau/source/content/datasize.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5d07fdd658101ef6caf7e2808faa6045ab175315b6435e25ff14ecedac584118
3
+ size 26052
references/2019.arxiv.conneau/source/content/dilution.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:80d1555811c23e2c521fbb007d84dfddb85e7020cc9333058368d3a1d63e240a
3
+ size 16376
references/2019.arxiv.conneau/source/content/langsampling.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c2f2f95649a23b0a46f8553f4e0e29000aff1971385b9addf6f478acc5a516a3
3
+ size 15612
references/2019.arxiv.conneau/source/content/tables.tex ADDED
@@ -0,0 +1,398 @@
1
+
2
+
3
+
4
+ \newcommand{\insertXNLItable}{
5
+ \begin{table*}[h!]
6
+ \begin{center}
7
+ % \scriptsize
8
+ \resizebox{1\linewidth}{!}{
9
+ \begin{tabular}[b]{l ccc ccccccccccccccc c}
10
+ \toprule
11
+ {\bf Model} & {\bf D }& {\bf \#M} & {\bf \#lg} & {\bf en} & {\bf fr} & {\bf es} & {\bf de} & {\bf el} & {\bf bg} & {\bf ru} & {\bf tr} & {\bf ar} & {\bf vi} & {\bf th} & {\bf zh} & {\bf hi} & {\bf sw} & {\bf ur} & {\bf Avg}\\
12
+ \midrule
13
+ %\cmidrule(r){1-1}
14
+ %\cmidrule(lr){2-4}
15
+ %\cmidrule(lr){5-19}
16
+ %\cmidrule(l){20-20}
17
+
18
+ \multicolumn{19}{l}{\it Fine-tune multilingual model on English training set (Cross-lingual Transfer)} \\
19
+ %\midrule
20
+ \midrule
21
+ \citet{lample2019cross} & Wiki+MT & N & 15 & 85.0 & 78.7 & 78.9 & 77.8 & 76.6 & 77.4 & 75.3 & 72.5 & 73.1 & 76.1 & 73.2 & 76.5 & 69.6 & 68.4 & 67.3 & 75.1 \\
22
+ \citet{huang2019unicoder} & Wiki+MT & N & 15 & 85.1 & 79.0 & 79.4 & 77.8 & 77.2 & 77.2 & 76.3 & 72.8 & 73.5 & 76.4 & 73.6 & 76.2 & 69.4 & 69.7 & 66.7 & 75.4 \\
23
+ %\midrule
24
+ \citet{devlin2018bert} & Wiki & N & 102 & 82.1 & 73.8 & 74.3 & 71.1 & 66.4 & 68.9 & 69.0 & 61.6 & 64.9 & 69.5 & 55.8 & 69.3 & 60.0 & 50.4 & 58.0 & 66.3 \\
25
+ \citet{lample2019cross} & Wiki & N & 100 & 83.7 & 76.2 & 76.6 & 73.7 & 72.4 & 73.0 & 72.1 & 68.1 & 68.4 & 72.0 & 68.2 & 71.5 & 64.5 & 58.0 & 62.4 & 71.3 \\
26
+ \citet{lample2019cross} & Wiki & 1 & 100 & 83.2 & 76.7 & 77.7 & 74.0 & 72.7 & 74.1 & 72.7 & 68.7 & 68.6 & 72.9 & 68.9 & 72.5 & 65.6 & 58.2 & 62.4 & 70.7 \\
27
+ \bf XLM-R\textsubscript{Base} & CC & 1 & 100 & 85.8 & 79.7 & 80.7 & 78.7 & 77.5 & 79.6 & 78.1 & 74.2 & 73.8 & 76.5 & 74.6 & 76.7 & 72.4 & 66.5 & 68.3 & 76.2 \\
28
+ \bf XLM-R & CC & 1 & 100 & \bf 89.1 & \bf 84.1 & \bf 85.1 & \bf 83.9 & \bf 82.9 & \bf 84.0 & \bf 81.2 & \bf 79.6 & \bf 79.8 & \bf 80.8 & \bf 78.1 & \bf 80.2 & \bf 76.9 & \bf 73.9 & \bf 73.8 & \bf 80.9 \\
29
+ \midrule
30
+ \multicolumn{19}{l}{\it Translate everything to English and use English-only model (TRANSLATE-TEST)} \\
31
+ \midrule
32
+ BERT-en & Wiki & 1 & 1 & 88.8 & 81.4 & 82.3 & 80.1 & 80.3 & 80.9 & 76.2 & 76.0 & 75.4 & 72.0 & 71.9 & 75.6 & 70.0 & 65.8 & 65.8 & 76.2 \\
33
+ RoBERTa & Wiki+CC & 1 & 1 & \underline{\bf 91.3} & 82.9 & 84.3 & 81.2 & 81.7 & 83.1 & 78.3 & 76.8 & 76.6 & 74.2 & 74.1 & 77.5 & 70.9 & 66.7 & 66.8 & 77.8 \\
34
+ % XLM-en & Wiki & 1 & 1 & 00.0 & 00.0 & 00.0 & 00.0 & 00.0 & 00.0 & 00.0 \\
35
+ \midrule
36
+ \multicolumn{19}{l}{\it Fine-tune multilingual model on each training set (TRANSLATE-TRAIN)} \\
37
+ \midrule
38
+ \citet{lample2019cross} & Wiki & N & 100 & 82.9 & 77.6 & 77.9 & 77.9 & 77.1 & 75.7 & 75.5 & 72.6 & 71.2 & 75.8 & 73.1 & 76.2 & 70.4 & 66.5 & 62.4 & 74.2 \\
39
+ \midrule
40
+ \multicolumn{19}{l}{\it Fine-tune multilingual model on all training sets (TRANSLATE-TRAIN-ALL)} \\
41
+ \midrule
42
+ \citet{lample2019cross}$^{\dagger}$ & Wiki+MT & 1 & 15 & 85.0 & 80.8 & 81.3 & 80.3 & 79.1 & 80.9 & 78.3 & 75.6 & 77.6 & 78.5 & 76.0 & 79.5 & 72.9 & 72.8 & 68.5 & 77.8 \\
43
+ \citet{huang2019unicoder} & Wiki+MT & 1 & 15 & 85.6 & 81.1 & 82.3 & 80.9 & 79.5 & 81.4 & 79.7 & 76.8 & 78.2 & 77.9 & 77.1 & 80.5 & 73.4 & 73.8 & 69.6 & 78.5 \\
44
+ %\midrule
45
+ \citet{lample2019cross} & Wiki & 1 & 100 & 84.5 & 80.1 & 81.3 & 79.3 & 78.6 & 79.4 & 77.5 & 75.2 & 75.6 & 78.3 & 75.7 & 78.3 & 72.1 & 69.2 & 67.7 & 76.9 \\
46
+ \bf XLM-R\textsubscript{Base} & CC & 1 & 100 & 85.4 & 81.4 & 82.2 & 80.3 & 80.4 & 81.3 & 79.7 & 78.6 & 77.3 & 79.7 & 77.9 & 80.2 & 76.1 & 73.1 & 73.0 & 79.1 \\
47
+ \bf XLM-R & CC & 1 & 100 & \bf 89.1 & \underline{\bf 85.1} & \underline{\bf 86.6} & \underline{\bf 85.7} & \underline{\bf 85.3} & \underline{\bf 85.9} & \underline{\bf 83.5} & \underline{\bf 83.2} & \underline{\bf 83.1} & \underline{\bf 83.7} & \underline{\bf 81.5} & \underline{\bf 83.7} & \underline{\bf 81.6} & \underline{\bf 78.0} & \underline{\bf 78.1} & \underline{\bf 83.6} \\
48
+ \bottomrule
49
+ \end{tabular}
50
+ }
51
+ \caption{\textbf{Results on cross-lingual classification.} We report the accuracy on each of the 15 XNLI languages and the average accuracy. We specify the dataset D used for pretraining, the number of models \#M the approach requires and the number of languages \#lg the model handles. Our \xlmr results are averaged over five different seeds. We show that using the translate-train-all approach which leverages training sets from multiple languages, \xlmr obtains a new state of the art on XNLI of $83.6$\% average accuracy. Results with $^{\dagger}$ are from \citet{huang2019unicoder}. %It also outperforms previous methods on cross-lingual transfer.
52
+ \label{tab:xnli}}
53
+ \end{center}
54
+ % \vspace{-0.4cm}
55
+ \end{table*}
56
+ }
57
+
58
+ % Evolution of performance w.r.t number of languages
59
+ \newcommand{\insertLanguagesize}{
60
+ \begin{table*}[h!]
61
+ \begin{minipage}{0.49\textwidth}
62
+ \includegraphics[scale=0.4]{content/wiki_vs_cc.pdf}
63
+ \end{minipage}
64
+ \hfill
65
+ \begin{minipage}{0.4\textwidth}
66
+ \captionof{figure}{\textbf{Distribution of the amount of data (in MB) per language for Wikipedia and CommonCrawl.} The Wikipedia data used in open-source mBERT and XLM is not sufficient for the model to develop an understanding of low-resource languages. The CommonCrawl data we collect alleviates that issue and creates the conditions for a single model to understand text coming from multiple languages. \label{fig:lgs}}
67
+ \end{minipage}
68
+ % \vspace{-0.5cm}
69
+ \end{table*}
70
+ }
71
+
72
+ % Evolution of performance w.r.t number of languages
73
+ \newcommand{\insertXLMmorelanguages}{
74
+ \begin{table*}[h!]
75
+ \begin{minipage}{0.49\textwidth}
76
+ \includegraphics[scale=0.4]{content/evolution_languages}
77
+ \end{minipage}
78
+ \hfill
79
+ \begin{minipage}{0.4\textwidth}
80
+ \captionof{figure}{\textbf{Evolution of XLM performance on SeqLab, XNLI and GLUE as the number of languages increases.} While there are subtlteties as to what languages lose more accuracy than others as we add more languages, we observe a steady decrease of the overall monolingual and cross-lingual performance. \label{fig:lgsunused}}
81
+ \end{minipage}
82
+ % \vspace{-0.5cm}
83
+ \end{table*}
84
+ }
85
+
86
+ \newcommand{\insertMLQA}{
87
+ \begin{table*}[h!]
88
+ \begin{center}
89
+ % \scriptsize
90
+ \resizebox{1\linewidth}{!}{
91
+ \begin{tabular}[h]{l cc ccccccc c}
92
+ \toprule
93
+ {\bf Model} & {\bf train} & {\bf \#lgs} & {\bf en} & {\bf es} & {\bf de} & {\bf ar} & {\bf hi} & {\bf vi} & {\bf zh} & {\bf Avg} \\
94
+ \midrule
95
+ BERT-Large$^{\dagger}$ & en & 1 & 80.2 / 67.4 & - & - & - & - & - & - & - \\
96
+ mBERT$^{\dagger}$ & en & 102 & 77.7 / 65.2 & 64.3 / 46.6 & 57.9 / 44.3 & 45.7 / 29.8 & 43.8 / 29.7 & 57.1 / 38.6 & 57.5 / 37.3 & 57.7 / 41.6 \\
97
+ XLM-15$^{\dagger}$ & en & 15 & 74.9 / 62.4 & 68.0 / 49.8 & 62.2 / 47.6 & 54.8 / 36.3 & 48.8 / 27.3 & 61.4 / 41.8 & 61.1 / 39.6 & 61.6 / 43.5 \\
98
+ XLM-R\textsubscript{Base} & en & 100 & 77.1 / 64.6 & 67.4 / 49.6 & 60.9 / 46.7 & 54.9 / 36.6 & 59.4 / 42.9 & 64.5 / 44.7 & 61.8 / 39.3 & 63.7 / 46.3 \\
99
+ \bf XLM-R & en & 100 & \bf 80.6 / 67.8 & \bf 74.1 / 56.0 & \bf 68.5 / 53.6 & \bf 63.1 / 43.5 & \bf 69.2 / 51.6 & \bf 71.3 / 50.9 & \bf 68.0 / 45.4 & \bf 70.7 / 52.7 \\
100
+ \bottomrule
101
+ \end{tabular}
102
+ }
103
+ \caption{\textbf{Results on MLQA question answering} We report the F1 and EM (exact match) scores for zero-shot classification where models are fine-tuned on the English Squad dataset and evaluated on the 7 languages of MLQA. Results with $\dagger$ are taken from the original MLQA paper \citet{lewis2019mlqa}.
104
+ \label{tab:mlqa}}
105
+ \end{center}
106
+ \end{table*}
107
+ }
108
+
109
+ \newcommand{\insertNER}{
110
+ \begin{table}[t]
111
+ \begin{center}
112
+ % \scriptsize
113
+ \resizebox{1\linewidth}{!}{
114
+ \begin{tabular}[b]{l cc cccc c}
115
+ \toprule
116
+ {\bf Model} & {\bf train} & {\bf \#M} & {\bf en} & {\bf nl} & {\bf es} & {\bf de} & {\bf Avg}\\
117
+ \midrule
118
+ \citet{lample-etal-2016-neural} & each & N & 90.74 & 81.74 & 85.75 & 78.76 & 84.25 \\
119
+ \citet{akbik2018coling} & each & N & \bf 93.18 & 90.44 & - & \bf 88.27 & - \\
120
+ \midrule
121
+ \multirow{2}{*}{mBERT$^{\dagger}$} & each & N & 91.97 & 90.94 & 87.38 & 82.82 & 88.28\\
122
+ & en & 1 & 91.97 & 77.57 & 74.96 & 69.56 & 78.52\\
123
+ \midrule
124
+ \multirow{3}{*}{XLM-R\textsubscript{Base}} & each & N & 92.25 & 90.39 & 87.99 & 84.60 & 88.81\\
125
+ & en & 1 & 92.25 & 78.08 & 76.53 & 69.60 & 79.11\\
126
+ & all & 1 & 91.08 & 89.09 & 87.28 & 83.17 & 87.66 \\
127
+ \midrule
128
+ \multirow{3}{*}{\bf XLM-R} & each & N & 92.92 & \bf 92.53 & \bf 89.72 & 85.81 & 90.24\\
129
+ & en & 1 & 92.92 & 80.80 & 78.64 & 71.40 & 80.94\\
130
+ & all & 1 & 92.00 & 91.60 & 89.52 & 84.60 & 89.43 \\
131
+ \bottomrule
132
+ \end{tabular}
133
+ }
134
+ \caption{\textbf{Results on named entity recognition} on CoNLL-2002 and CoNLL-2003 (F1 score). Results with $\dagger$ are from \citet{wu2019beto}. Note that mBERT and \xlmr do not use a linear-chain CRF, as opposed to \citet{akbik2018coling} and \citet{lample-etal-2016-neural}.
135
+ \label{tab:ner}}
136
+ \end{center}
137
+ \vspace{-0.6cm}
138
+ \end{table}
139
+ }
140
+
141
+
142
+ \newcommand{\insertAblationone}{
143
+ \begin{table*}[h!]
144
+ \begin{minipage}[t]{0.3\linewidth}
145
+ \begin{center}
146
+ %\includegraphics[width=\linewidth]{content/xlmroberta_transfer_dilution.pdf}
147
+ \includegraphics{content/dilution}
148
+ \captionof{figure}{The transfer-interference trade-off: Low-resource languages benefit from scaling to more languages, until dilution (interference) kicks in and degrades overall performance.}
149
+ \label{fig:transfer_dilution}
150
+ \vspace{-0.2cm}
151
+ \end{center}
152
+ \end{minipage}
153
+ \hfill
154
+ \begin{minipage}[t]{0.3\linewidth}
155
+ \begin{center}
156
+ %\includegraphics[width=\linewidth]{content/xlmroberta_evolution.pdf}
157
+ \includegraphics{content/wikicc}
158
+ \captionof{figure}{Wikipedia versus CommonCrawl: An XLM-7 obtains significantly better performance when trained on CC, in particular on low-resource languages.}
159
+ \label{fig:curse}
160
+ \end{center}
161
+ \end{minipage}
162
+ \hfill
163
+ \begin{minipage}[t]{0.3\linewidth}
164
+ \begin{center}
165
+ % \includegraphics[width=\linewidth]{content/xlmroberta_evolution.pdf}
166
+ \includegraphics{content/capacity}
167
+ \captionof{figure}{Adding more capacity to the model alleviates the curse of multilinguality, but remains an issue for models of moderate size.}
168
+ \label{fig:capacity}
169
+ \end{center}
170
+ \end{minipage}
171
+ \vspace{-0.2cm}
172
+ \end{table*}
173
+ }
174
+
175
+
176
+ \newcommand{\insertAblationtwo}{
177
+ \begin{table*}[h!]
178
+ \begin{minipage}[t]{0.3\linewidth}
179
+ \begin{center}
180
+ %\includegraphics[width=\columnwidth]{content/xlmroberta_alpha_tradeoff.pdf}
181
+ \includegraphics{content/langsampling}
182
+ \captionof{figure}{On the high-resource versus low-resource trade-off: impact of batch language sampling for XLM-100.
183
+ \label{fig:alpha}}
184
+ \end{center}
185
+ \end{minipage}
186
+ \hfill
187
+ \begin{minipage}[t]{0.3\linewidth}
188
+ \begin{center}
189
+ %\includegraphics[width=\columnwidth]{content/xlmroberta_vocab.pdf}
190
+ \includegraphics{content/vocabsize.pdf}
191
+ \captionof{figure}{On the impact of vocabulary size at fixed capacity and with increasing capacity for XLM-100.
192
+ \label{fig:vocab}}
193
+ \end{center}
194
+ \end{minipage}
195
+ \hfill
196
+ \begin{minipage}[t]{0.3\linewidth}
197
+ \begin{center}
198
+ %\includegraphics[width=\columnwidth]{content/xlmroberta_batch_and_tok.pdf}
199
+ \includegraphics{content/batchsize.pdf}
200
+ \captionof{figure}{On the impact of large-scale training, and preprocessing simplification from BPE with tokenization to SPM on raw text data.
201
+ \label{fig:batch}}
202
+ \end{center}
203
+ \end{minipage}
204
+ \vspace{-0.2cm}
205
+ \end{table*}
206
+ }
207
+
208
+
209
+ % Multilingual vs monolingual
210
+ \newcommand{\insertMultiMono}{
211
+ \begin{table}[h!]
212
+ \begin{center}
213
+ % \scriptsize
214
+ \resizebox{1\linewidth}{!}{
215
+ \begin{tabular}[b]{l cc ccccccc c}
216
+ \toprule
217
+ {\bf Model} & {\bf D } & {\bf \#vocab} & {\bf en} & {\bf fr} & {\bf de} & {\bf ru} & {\bf zh} & {\bf sw} & {\bf ur} & {\bf Avg}\\
218
+ \midrule
219
+ \multicolumn{11}{l}{\it Monolingual baselines}\\
220
+ \midrule
221
+ \multirow{2}{*}{BERT} & Wiki & 40k & 84.5 & 78.6 & 80.0 & 75.5 & 77.7 & 60.1 & 57.3 & 73.4 \\
222
+ & CC & 40k & 86.7 & 81.2 & 81.2 & 78.2 & 79.5 & 70.8 & 65.1 & 77.5 \\
223
+ \midrule
224
+ \multicolumn{11}{l}{\it Multilingual models (cross-lingual transfer)}\\
225
+ \midrule
226
+ \multirow{2}{*}{XLM-7} & Wiki & 150k & 82.3 & 76.8 & 74.7 & 72.5 & 73.1 & 60.8 & 62.3 & 71.8 \\
227
+ & CC & 150k & 85.7 & 78.6 & 79.5 & 76.4 & 74.8 & 71.2 & 66.9 & 76.2 \\
228
+ \midrule
229
+ \multicolumn{11}{l}{\it Multilingual models (translate-train-all)}\\
230
+ \midrule
231
+ \multirow{2}{*}{XLM-7} & Wiki & 150k & 84.6 & 80.1 & 80.2 & 75.7 & 78.0 & 68.7 & 66.7 & 76.3 \\
232
+ & CC & 150k & \bf 87.2 & \bf 82.5 & \bf 82.9 & \bf 79.7 & \bf 80.4 & \bf 75.7 & \bf 71.5 & \bf 80.0 \\
233
+ % \midrule
234
+ % XLM (sw,ar) & CC & 60k & N & 2-3 & - & - & - & - & - & 00.0 & - & 00.0 \\
235
+ % XLM (ur,hi,ar) & CC & 60k & N & 2-3 & - & - & - & - & - & - & 00.0 & 00.0 \\
236
+ \bottomrule
237
+ \end{tabular}
238
+ }
239
+ \caption{\textbf{Multilingual versus monolingual models (BERT-BASE).} We compare the performance of monolingual models (BERT) versus multilingual models (XLM) on seven languages, using a BERT-BASE architecture. We choose vocabulary sizes of 40k and 150k for monolingual and multilingual models, respectively.
240
+ \label{tab:multimono}}
241
+ \end{center}
242
+ \vspace{-0.4cm}
243
+ \end{table}
244
+ }
245
+
246
+ % GLUE benchmark results
247
+ \newcommand{\insertGlue}{
248
+ \begin{table}[h!]
249
+ \begin{center}
250
+ % \scriptsize
251
+ \resizebox{1\linewidth}{!}{
252
+ \begin{tabular}[b]{l|c|cccccc|c}
253
+ \toprule
254
+ {\bf Model} & {\bf \#lgs} & {\bf MNLI-m/mm} & {\bf QNLI} & {\bf QQP} & {\bf SST} & {\bf MRPC} & {\bf STS-B} & {\bf Avg}\\
255
+ \midrule
256
+ BERT\textsubscript{Large}$^{\dagger}$ & 1 & 86.6/- & 92.3 & 91.3 & 93.2 & 88.0 & 90.0 & 90.2 \\
257
+ XLNet\textsubscript{Large}$^{\dagger}$ & 1 & 89.8/- & 93.9 & 91.8 & 95.6 & 89.2 & 91.8 & 92.0 \\
258
+ RoBERTa$^{\dagger}$ & 1 & 90.2/90.2 & 94.7 & 92.2 & 96.4 & 90.9 & 92.4 & 92.8 \\
259
+ XLM-R & 100 & 88.9/89.0 & 93.8 & 92.3 & 95.0 & 89.5 & 91.2 & 91.8 \\
260
+ \bottomrule
261
+ \end{tabular}
262
+ }
263
+ \caption{\textbf{GLUE dev results.} Results with $^{\dagger}$ are from \citet{roberta2019}. We compare the performance of \xlmr to BERT\textsubscript{Large}, XLNet and RoBERTa on the English GLUE benchmark.
264
+ \label{tab:glue}}
265
+ \end{center}
266
+ \vspace{-0.4cm}
267
+ \end{table}
268
+ }
269
+
270
+
271
+ % Wiki vs CommonCrawl statistics
272
+ \newcommand{\insertWikivsCC}{
273
+ \begin{table*}[h]
274
+ \begin{center}
275
+ %\includegraphics[width=\linewidth]{content/wiki_vs_cc.pdf}
276
+ \includegraphics{content/datasize.pdf}
277
+ \captionof{figure}{Amount of data in GiB (log-scale) for the 88 languages that appear in both the Wiki-100 corpus used for mBERT and XLM-100, and the CC-100 used for XLM-R. CC-100 increases the amount of data by several orders of magnitude, in particular for low-resource languages.
278
+ \label{fig:wikivscc}}
279
+ \end{center}
280
+ % \vspace{-0.4cm}
281
+ \end{table*}
282
+ }
283
+
284
+ % Corpus statistics for CC-100
285
+ \newcommand{\insertDataStatistics}{
286
+ %\resizebox{1\linewidth}{!}{
287
+ \begin{table}[h!]
288
+ \begin{center}
289
+ \small
290
+ \begin{tabular}[b]{clrrclrr}
291
+ \toprule
292
+ \textbf{ISO code} & \textbf{Language} & \textbf{Tokens} (M) & \textbf{Size} (GiB) & \textbf{ISO code} & \textbf{Language} & \textbf{Tokens} (M) & \textbf{Size} (GiB)\\
293
+ \cmidrule(r){1-4}\cmidrule(l){5-8}
294
+ {\bf af }& Afrikaans & 242 & 1.3 &{\bf lo }& Lao & 17 & 0.6 \\
295
+ {\bf am }& Amharic & 68 & 0.8 &{\bf lt }& Lithuanian & 1835 & 13.7 \\
296
+ {\bf ar }& Arabic & 2869 & 28.0 &{\bf lv }& Latvian & 1198 & 8.8 \\
297
+ {\bf as }& Assamese & 5 & 0.1 &{\bf mg }& Malagasy & 25 & 0.2 \\
298
+ {\bf az }& Azerbaijani & 783 & 6.5 &{\bf mk }& Macedonian & 449 & 4.8 \\
299
+ {\bf be }& Belarusian & 362 & 4.3 &{\bf ml }& Malayalam & 313 & 7.6 \\
300
+ {\bf bg }& Bulgarian & 5487 & 57.5 &{\bf mn }& Mongolian & 248 & 3.0 \\
301
+ {\bf bn }& Bengali & 525 & 8.4 &{\bf mr }& Marathi & 175 & 2.8 \\
302
+ {\bf - }& Bengali Romanized & 77 & 0.5 &{\bf ms }& Malay & 1318 & 8.5 \\
303
+ {\bf br }& Breton & 16 & 0.1 &{\bf my }& Burmese & 15 & 0.4 \\
304
+ {\bf bs }& Bosnian & 14 & 0.1 &{\bf my }& Burmese & 56 & 1.6 \\
305
+ {\bf ca }& Catalan & 1752 & 10.1 &{\bf ne }& Nepali & 237 & 3.8 \\
306
+ {\bf cs }& Czech & 2498 & 16.3 &{\bf nl }& Dutch & 5025 & 29.3 \\
307
+ {\bf cy }& Welsh & 141 & 0.8 &{\bf no }& Norwegian & 8494 & 49.0 \\
308
+ {\bf da }& Danish & 7823 & 45.6 &{\bf om }& Oromo & 8 & 0.1 \\
309
+ {\bf de }& German & 10297 & 66.6 &{\bf or }& Oriya & 36 & 0.6 \\
310
+ {\bf el }& Greek & 4285 & 46.9 &{\bf pa }& Punjabi & 68 & 0.8 \\
311
+ {\bf en }& English & 55608 & 300.8 &{\bf pl }& Polish & 6490 & 44.6 \\
312
+ {\bf eo }& Esperanto & 157 & 0.9 &{\bf ps }& Pashto & 96 & 0.7 \\
313
+ {\bf es }& Spanish & 9374 & 53.3 &{\bf pt }& Portuguese & 8405 & 49.1 \\
314
+ {\bf et }& Estonian & 843 & 6.1 &{\bf ro }& Romanian & 10354 & 61.4 \\
315
+ {\bf eu }& Basque & 270 & 2.0 &{\bf ru }& Russian & 23408 & 278.0 \\
316
+ {\bf fa }& Persian & 13259 & 111.6 &{\bf sa }& Sanskrit & 17 & 0.3 \\
317
+ {\bf fi }& Finnish & 6730 & 54.3 &{\bf sd }& Sindhi & 50 & 0.4 \\
318
+ {\bf fr }& French & 9780 & 56.8 &{\bf si }& Sinhala & 243 & 3.6 \\
319
+ {\bf fy }& Western Frisian & 29 & 0.2 &{\bf sk }& Slovak & 3525 & 23.2 \\
320
+ {\bf ga }& Irish & 86 & 0.5 &{\bf sl }& Slovenian & 1669 & 10.3 \\
321
+ {\bf gd }& Scottish Gaelic & 21 & 0.1 &{\bf so }& Somali & 62 & 0.4 \\
322
+ {\bf gl }& Galician & 495 & 2.9 &{\bf sq }& Albanian & 918 & 5.4 \\
323
+ {\bf gu }& Gujarati & 140 & 1.9 &{\bf sr }& Serbian & 843 & 9.1 \\
324
+ {\bf ha }& Hausa & 56 & 0.3 &{\bf su }& Sundanese & 10 & 0.1 \\
325
+ {\bf he }& Hebrew & 3399 & 31.6 &{\bf sv }& Swedish & 77.8 & 12.1 \\
326
+ {\bf hi }& Hindi & 1715 & 20.2 &{\bf sw }& Swahili & 275 & 1.6 \\
327
+ {\bf - }& Hindi Romanized & 88 & 0.5 &{\bf ta }& Tamil & 595 & 12.2 \\
328
+ {\bf hr }& Croatian & 3297 & 20.5 &{\bf - }& Tamil Romanized & 36 & 0.3 \\
329
+ {\bf hu }& Hungarian & 7807 & 58.4 &{\bf te }& Telugu & 249 & 4.7 \\
330
+ {\bf hy }& Armenian & 421 & 5.5 &{\bf - }& Telugu Romanized & 39 & 0.3 \\
331
+ {\bf id }& Indonesian & 22704 & 148.3 &{\bf th }& Thai & 1834 & 71.7 \\
332
+ {\bf is }& Icelandic & 505 & 3.2 &{\bf tl }& Filipino & 556 & 3.1 \\
333
+ {\bf it }& Italian & 4983 & 30.2 &{\bf tr }& Turkish & 2736 & 20.9 \\
334
+ {\bf ja }& Japanese & 530 & 69.3 &{\bf ug }& Uyghur & 27 & 0.4 \\
335
+ {\bf jv }& Javanese & 24 & 0.2 &{\bf uk }& Ukrainian & 6.5 & 84.6 \\
336
+ {\bf ka }& Georgian & 469 & 9.1 &{\bf ur }& Urdu & 730 & 5.7 \\
337
+ {\bf kk }& Kazakh & 476 & 6.4 &{\bf - }& Urdu Romanized & 85 & 0.5 \\
338
+ {\bf km }& Khmer & 36 & 1.5 &{\bf uz }& Uzbek & 91 & 0.7 \\
339
+ {\bf kn }& Kannada & 169 & 3.3 &{\bf vi }& Vietnamese & 24757 & 137.3 \\
340
+ {\bf ko }& Korean & 5644 & 54.2 &{\bf xh }& Xhosa & 13 & 0.1 \\
341
+ {\bf ku }& Kurdish (Kurmanji) & 66 & 0.4 &{\bf yi }& Yiddish & 34 & 0.3 \\
342
+ {\bf ky }& Kyrgyz & 94 & 1.2 &{\bf zh }& Chinese (Simplified) & 259 & 46.9 \\
343
+ {\bf la }& Latin & 390 & 2.5 &{\bf zh }& Chinese (Traditional) & 176 & 16.6 \\
344
+
345
+ \bottomrule
346
+ \end{tabular}
347
+ \caption{\textbf{Languages and statistics of the CC-100 corpus.} We report the list of 100 languages and include the number of tokens (in millions) and the size of the data (in GiB) for each language. Note that we also include romanized variants of some non-Latin languages such as Bengali, Hindi, Tamil, Telugu and Urdu.\label{tab:datastats}}
348
+ \end{center}
349
+ \end{table}
350
+ %}
351
+ }
352
+
353
+
354
+ % Comparison of parameters for different models
355
+ \newcommand{\insertParameters}{
356
+ \begin{table*}[h!]
357
+ \begin{center}
358
+ % \scriptsize
359
+ %\resizebox{1\linewidth}{!}{
360
+ \begin{tabular}[b]{lrcrrrrrc}
361
+ \toprule
362
+ \textbf{Model} & \textbf{\#lgs} & \textbf{tokenization} & \textbf{L} & \textbf{$H_{m}$} & \textbf{$H_{ff}$} & \textbf{A} & \textbf{V} & \textbf{\#params}\\
363
+ \cmidrule(r){1-1}
364
+ \cmidrule(lr){2-3}
365
+ \cmidrule(lr){4-8}
366
+ \cmidrule(l){9-9}
367
+ % TODO: rank by number of parameters
368
+ BERT\textsubscript{Base} & 1 & WordPiece & 12 & 768 & 3072 & 12 & 30k & 110M \\
369
+ BERT\textsubscript{Large} & 1 & WordPiece & 24 & 1024 & 4096 & 16 & 30k & 335M \\
370
+ mBERT & 104 & WordPiece & 12 & 768 & 3072 & 12 & 110k & 172M \\
371
+ RoBERTa\textsubscript{Base} & 1 & bBPE & 12 & 768 & 3072 & 8 & 50k & 125M \\
372
+ RoBERTa & 1 & bBPE & 24 & 1024 & 4096 & 16 & 50k & 355M \\
373
+ XLM-15 & 15 & BPE & 12 & 1024 & 4096 & 8 & 95k & 250M \\
374
+ XLM-17 & 17 & BPE & 16 & 1280 & 5120 & 16 & 200k & 570M \\
375
+ XLM-100 & 100 & BPE & 16 & 1280 & 5120 & 16 & 200k & 570M \\
376
+ Unicoder & 15 & BPE & 12 & 1024 & 4096 & 8 & 95k & 250M \\
377
+ \xlmr\textsubscript{Base} & 100 & SPM & 12 & 768 & 3072 & 12 & 250k & 270M \\
378
+ \xlmr & 100 & SPM & 24 & 1024 & 4096 & 16 & 250k & 550M \\
379
+ GPT2 & 1 & bBPE & 48 & 1600 & 6400 & 32 & 50k & 1.5B \\
380
+ wide-mmNMT & 103 & SPM & 12 & 2048 & 16384 & 32 & 64k & 3B \\
381
+ deep-mmNMT & 103 & SPM & 24 & 1024 & 16384 & 32 & 64k & 3B \\
382
+ T5-3B & 1 & WordPiece & 24 & 1024 & 16384 & 32 & 32k & 3B \\
383
+ T5-11B & 1 & WordPiece & 24 & 1024 & 65536 & 32 & 32k & 11B \\
384
+ % XLNet\textsubscript{Large}$^{\dagger}$ & 1 & 89.8/- & 93.9 & 91.8 & 95.6 & 89.2 & 91.8 & 92.0 \\
385
+ % RoBERTa$^{\dagger}$ & 1 & 90.2/90.2 & 94.7 & 92.2 & 96.4 & 90.9 & 92.4 & 92.8 \\
386
+ % XLM-R & 100 & 88.4/88.5 & 93.1 & 92.2 & 95.1 & 89.7 & 90.4 & 91.5 \\
387
+ \bottomrule
388
+ \end{tabular}
389
+ %}
390
+ \caption{\textbf{Details on model sizes.}
391
+ We show the tokenization used by each Transformer model, the number of layers L, the number of hidden states of the model $H_{m}$, the dimension of the feed-forward layer $H_{ff}$, the number of attention heads A, the size of the vocabulary V and the total number of parameters \#params.
392
+ For Transformer encoders, the number of parameters can be approximated by $4LH_m^2 + 2LH_m H_{ff} + VH_m$.
393
+ GPT2 numbers are from \citet{radford2019language}, mm-NMT models are from the work of \citet{arivazhagan2019massively} on massively multilingual neural machine translation (mmNMT), and T5 numbers are from \citet{raffel2019exploring}. While \xlmr is among the largest models, partly due to its large embedding layer, it has a similar number of parameters to XLM-100, and remains significantly smaller than recently introduced Transformer models for multilingual MT and transfer learning. While this table gives more insight into the difference in capacity of each model, note that it does not highlight other critical differences between the models.
394
+ \label{tab:parameters}}
395
+ \end{center}
396
+ \vspace{-0.4cm}
397
+ \end{table*}
398
+ }
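The parameter approximation given in the table caption, $4LH_m^2 + 2LH_m H_{ff} + VH_m$, can be checked numerically against the reported model sizes. A minimal sketch (model shapes taken from the table; the helper name `approx_params` is ours, not from the paper):

```python
# Approximate Transformer-encoder parameter counts using the caption's formula:
# 4*L*Hm^2 (attention projections) + 2*L*Hm*Hff (feed-forward) + V*Hm (embeddings).
def approx_params(L, Hm, Hff, V):
    return 4 * L * Hm**2 + 2 * L * Hm * Hff + V * Hm

# Shapes (L, Hm, Hff, V) copied from the table above.
models = {
    "XLM-R_Base": (12, 768, 3072, 250_000),   # reported ~270M
    "XLM-R":      (24, 1024, 4096, 250_000),  # reported ~550M
    "BERT_Base":  (12, 768, 3072, 30_000),    # reported ~110M
}

for name, shape in models.items():
    print(f"{name}: ~{approx_params(*shape) / 1e6:.0f}M parameters")
```

For XLM-R the formula gives roughly 558M parameters, of which about 256M sit in the 250k-entry embedding matrix, which is why the caption singles out the large embedding layer.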
references/2019.arxiv.conneau/source/content/vocabsize.pdf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e45090856dc149265ada0062c8c2456c3057902dfaaade60aa80905785563506
3
+ size 15677
references/2019.arxiv.conneau/source/content/wikicc.pdf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f0d7e959db8240f283922c3ca7c6de6f5ad3750681f27f4fcf35d161506a7a21
3
+ size 16304
references/2019.arxiv.conneau/source/texput.log ADDED
@@ -0,0 +1,21 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ This is pdfTeX, Version 3.14159265-2.6-1.40.20 (TeX Live 2019) (preloaded format=pdflatex 2019.5.8) 7 APR 2020 17:41
2
+ entering extended mode
3
+ restricted \write18 enabled.
4
+ %&-line parsing enabled.
5
+ **acl2020.tex
6
+
7
+ ! Emergency stop.
8
+ <*> acl2020.tex
9
+
10
+ *** (job aborted, file error in nonstop mode)
11
+
12
+
13
+ Here is how much of TeX's memory you used:
14
+ 3 strings out of 492616
15
+ 102 string characters out of 6129482
16
+ 57117 words of memory out of 5000000
17
+ 4025 multiletter control sequences out of 15000+600000
18
+ 3640 words of font info for 14 fonts, out of 8000000 for 9000
19
+ 1141 hyphenation exceptions out of 8191
20
+ 0i,0n,0p,1b,6s stack positions out of 5000i,500n,10000p,200000b,80000s
21
+ ! ==> Fatal error occurred, no output PDF file produced!
references/2019.arxiv.conneau/source/xlmr.bbl ADDED
@@ -0,0 +1,285 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ \begin{thebibliography}{40}
2
+ \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
3
+
4
+ \bibitem[{Akbik et~al.(2018)Akbik, Blythe, and Vollgraf}]{akbik2018coling}
5
+ Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018.
6
+ \newblock Contextual string embeddings for sequence labeling.
7
+ \newblock In \emph{COLING}, pages 1638--1649.
8
+
9
+ \bibitem[{Arivazhagan et~al.(2019)Arivazhagan, Bapna, Firat, Lepikhin, Johnson,
10
+ Krikun, Chen, Cao, Foster, Cherry et~al.}]{arivazhagan2019massively}
11
+ Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson,
12
+ Maxim Krikun, Mia~Xu Chen, Yuan Cao, George Foster, Colin Cherry, et~al.
13
+ 2019.
14
+ \newblock Massively multilingual neural machine translation in the wild:
15
+ Findings and challenges.
16
+ \newblock \emph{arXiv preprint arXiv:1907.05019}.
17
+
18
+ \bibitem[{Bowman et~al.(2015)Bowman, Angeli, Potts, and
19
+ Manning}]{bowman2015large}
20
+ Samuel~R. Bowman, Gabor Angeli, Christopher Potts, and Christopher~D. Manning.
21
+ 2015.
22
+ \newblock A large annotated corpus for learning natural language inference.
23
+ \newblock In \emph{EMNLP}.
24
+
25
+ \bibitem[{Conneau et~al.(2018)Conneau, Rinott, Lample, Williams, Bowman,
26
+ Schwenk, and Stoyanov}]{conneau2018xnli}
27
+ Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel~R.
28
+ Bowman, Holger Schwenk, and Veselin Stoyanov. 2018.
29
+ \newblock Xnli: Evaluating cross-lingual sentence representations.
30
+ \newblock In \emph{EMNLP}. Association for Computational Linguistics.
31
+
32
+ \bibitem[{Devlin et~al.(2018)Devlin, Chang, Lee, and
33
+ Toutanova}]{devlin2018bert}
34
+ Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018.
35
+ \newblock Bert: Pre-training of deep bidirectional transformers for language
36
+ understanding.
37
+ \newblock \emph{NAACL}.
38
+
39
+ \bibitem[{Grave et~al.(2018)Grave, Bojanowski, Gupta, Joulin, and
40
+ Mikolov}]{grave2018learning}
41
+ Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and Tomas
42
+ Mikolov. 2018.
43
+ \newblock Learning word vectors for 157 languages.
44
+ \newblock In \emph{LREC}.
45
+
46
+ \bibitem[{Huang et~al.(2019)Huang, Liang, Duan, Gong, Shou, Jiang, and
47
+ Zhou}]{huang2019unicoder}
48
+ Haoyang Huang, Yaobo Liang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, and
49
+ Ming Zhou. 2019.
50
+ \newblock Unicoder: A universal language encoder by pre-training with multiple
51
+ cross-lingual tasks.
52
+ \newblock \emph{ACL}.
53
+
54
+ \bibitem[{Johnson et~al.(2017)Johnson, Schuster, Le, Krikun, Wu, Chen, Thorat,
55
+ Vi{\'e}gas, Wattenberg, Corrado et~al.}]{johnson2017google}
56
+ Melvin Johnson, Mike Schuster, Quoc~V Le, Maxim Krikun, Yonghui Wu, Zhifeng
57
+ Chen, Nikhil Thorat, Fernanda Vi{\'e}gas, Martin Wattenberg, Greg Corrado,
58
+ et~al. 2017.
59
+ \newblock Google’s multilingual neural machine translation system: Enabling
60
+ zero-shot translation.
61
+ \newblock \emph{TACL}, 5:339--351.
62
+
63
+ \bibitem[{Joulin et~al.(2017)Joulin, Grave, and Mikolov}]{joulin2017bag}
64
+ Armand Joulin, Edouard Grave, and Piotr Bojanowski~Tomas Mikolov. 2017.
65
+ \newblock Bag of tricks for efficient text classification.
66
+ \newblock \emph{EACL 2017}, page 427.
67
+
68
+ \bibitem[{Jozefowicz et~al.(2016)Jozefowicz, Vinyals, Schuster, Shazeer, and
69
+ Wu}]{jozefowicz2016exploring}
70
+ Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu.
71
+ 2016.
72
+ \newblock Exploring the limits of language modeling.
73
+ \newblock \emph{arXiv preprint arXiv:1602.02410}.
74
+
75
+ \bibitem[{Kudo(2018)}]{kudo2018subword}
76
+ Taku Kudo. 2018.
77
+ \newblock Subword regularization: Improving neural network translation models
78
+ with multiple subword candidates.
79
+ \newblock In \emph{ACL}, pages 66--75.
80
+
81
+ \bibitem[{Kudo and Richardson(2018)}]{kudo2018sentencepiece}
82
+ Taku Kudo and John Richardson. 2018.
83
+ \newblock Sentencepiece: A simple and language independent subword tokenizer
84
+ and detokenizer for neural text processing.
85
+ \newblock \emph{EMNLP}.
86
+
87
+ \bibitem[{Lample et~al.(2016)Lample, Ballesteros, Subramanian, Kawakami, and
88
+ Dyer}]{lample-etal-2016-neural}
89
+ Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and
90
+ Chris Dyer. 2016.
91
+ \newblock \href {https://doi.org/10.18653/v1/N16-1030} {Neural architectures
92
+ for named entity recognition}.
93
+ \newblock In \emph{NAACL}, pages 260--270, San Diego, California. Association
94
+ for Computational Linguistics.
95
+
96
+ \bibitem[{Lample and Conneau(2019)}]{lample2019cross}
97
+ Guillaume Lample and Alexis Conneau. 2019.
98
+ \newblock Cross-lingual language model pretraining.
99
+ \newblock \emph{NeurIPS}.
100
+
101
+ \bibitem[{Lewis et~al.(2019)Lewis, O\u{g}uz, Rinott, Riedel, and
102
+ Schwenk}]{lewis2019mlqa}
103
+ Patrick Lewis, Barlas O\u{g}uz, Ruty Rinott, Sebastian Riedel, and Holger
104
+ Schwenk. 2019.
105
+ \newblock Mlqa: Evaluating cross-lingual extractive question answering.
106
+ \newblock \emph{arXiv preprint arXiv:1910.07475}.
107
+
108
+ \bibitem[{Liu et~al.(2019)Liu, Ott, Goyal, Du, Joshi, Chen, Levy, Lewis,
109
+ Zettlemoyer, and Stoyanov}]{roberta2019}
110
+ Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
111
+ Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019.
112
+ \newblock Roberta: {A} robustly optimized {BERT} pretraining approach.
113
+ \newblock \emph{arXiv preprint arXiv:1907.11692}.
114
+
115
+ \bibitem[{Mikolov et~al.(2013{\natexlab{a}})Mikolov, Le, and
116
+ Sutskever}]{mikolov2013exploiting}
117
+ Tomas Mikolov, Quoc~V Le, and Ilya Sutskever. 2013{\natexlab{a}}.
118
+ \newblock Exploiting similarities among languages for machine translation.
119
+ \newblock \emph{arXiv preprint arXiv:1309.4168}.
120
+
121
+ \bibitem[{Mikolov et~al.(2013{\natexlab{b}})Mikolov, Sutskever, Chen, Corrado,
122
+ and Dean}]{mikolov2013distributed}
123
+ Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg~S Corrado, and Jeff Dean.
124
+ 2013{\natexlab{b}}.
125
+ \newblock Distributed representations of words and phrases and their
126
+ compositionality.
127
+ \newblock In \emph{NIPS}, pages 3111--3119.
128
+
129
+ \bibitem[{Pennington et~al.(2014)Pennington, Socher, and
130
+ Manning}]{pennington2014glove}
131
+ Jeffrey Pennington, Richard Socher, and Christopher~D. Manning. 2014.
132
+ \newblock \href {http://www.aclweb.org/anthology/D14-1162} {Glove: Global
133
+ vectors for word representation}.
134
+ \newblock In \emph{EMNLP}, pages 1532--1543.
135
+
136
+ \bibitem[{Peters et~al.(2018)Peters, Neumann, Iyyer, Gardner, Clark, Lee, and
137
+ Zettlemoyer}]{peters2018deep}
138
+ Matthew~E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark,
139
+ Kenton Lee, and Luke Zettlemoyer. 2018.
140
+ \newblock Deep contextualized word representations.
141
+ \newblock \emph{NAACL}.
142
+
143
+ \bibitem[{Pires et~al.(2019)Pires, Schlinger, and Garrette}]{Pires2019HowMI}
144
+ Telmo Pires, Eva Schlinger, and Dan Garrette. 2019.
145
+ \newblock How multilingual is multilingual bert?
146
+ \newblock In \emph{ACL}.
147
+
148
+ \bibitem[{Radford et~al.(2018)Radford, Narasimhan, Salimans, and
149
+ Sutskever}]{radford2018improving}
150
+ Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018.
151
+ \newblock \href
152
+ {https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf}
153
+ {Improving language understanding by generative pre-training}.
154
+ \newblock \emph{URL
155
+ https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language\_understanding\_paper.pdf}.
156
+
157
+ \bibitem[{Radford et~al.(2019)Radford, Wu, Child, Luan, Amodei, and
158
+ Sutskever}]{radford2019language}
159
+ Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya
160
+ Sutskever. 2019.
161
+ \newblock Language models are unsupervised multitask learners.
162
+ \newblock \emph{OpenAI Blog}, 1(8).
163
+
164
+ \bibitem[{Raffel et~al.(2019)Raffel, Shazeer, Roberts, Lee, Narang, Matena,
165
+ Zhou, Li, and Liu}]{raffel2019exploring}
166
+ Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael
167
+ Matena, Yanqi Zhou, Wei Li, and Peter~J. Liu. 2019.
168
+ \newblock Exploring the limits of transfer learning with a unified text-to-text
169
+ transformer.
170
+ \newblock \emph{arXiv preprint arXiv:1910.10683}.
171
+
172
+ \bibitem[{Rajpurkar et~al.(2018)Rajpurkar, Jia, and Liang}]{rajpurkar2018know}
173
+ Pranav Rajpurkar, Robin Jia, and Percy Liang. 2018.
174
+ \newblock Know what you don't know: Unanswerable questions for squad.
175
+ \newblock \emph{ACL}.
176
+
177
+ \bibitem[{Rajpurkar et~al.(2016)Rajpurkar, Zhang, Lopyrev, and
178
+ Liang}]{rajpurkar-etal-2016-squad}
179
+ Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016.
180
+ \newblock \href {https://doi.org/10.18653/v1/D16-1264} {{SQ}u{AD}: 100,000+
181
+ questions for machine comprehension of text}.
182
+ \newblock In \emph{EMNLP}, pages 2383--2392, Austin, Texas. Association for
183
+ Computational Linguistics.
184
+
185
+ \bibitem[{Sang(2002)}]{sang2002introduction}
186
+ Erik~F Sang. 2002.
187
+ \newblock Introduction to the conll-2002 shared task: Language-independent
188
+ named entity recognition.
189
+ \newblock \emph{CoNLL}.
190
+
191
+ \bibitem[{Schuster et~al.(2019)Schuster, Ram, Barzilay, and
192
+ Globerson}]{schuster2019cross}
193
+ Tal Schuster, Ori Ram, Regina Barzilay, and Amir Globerson. 2019.
194
+ \newblock Cross-lingual alignment of contextual word embeddings, with
195
+ applications to zero-shot dependency parsing.
196
+ \newblock \emph{NAACL}.
197
+
198
+ \bibitem[{Siddhant et~al.(2019)Siddhant, Johnson, Tsai, Arivazhagan, Riesa,
199
+ Bapna, Firat, and Raman}]{siddhant2019evaluating}
200
+ Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa,
201
+ Ankur Bapna, Orhan Firat, and Karthik Raman. 2019.
202
+ \newblock Evaluating the cross-lingual effectiveness of massively multilingual
203
+ neural machine translation.
204
+ \newblock \emph{AAAI}.
205
+
206
+ \bibitem[{Singh et~al.(2019)Singh, McCann, Keskar, Xiong, and
207
+ Socher}]{singh2019xlda}
208
+ Jasdeep Singh, Bryan McCann, Nitish~Shirish Keskar, Caiming Xiong, and Richard
209
+ Socher. 2019.
210
+ \newblock Xlda: Cross-lingual data augmentation for natural language inference
211
+ and question answering.
212
+ \newblock \emph{arXiv preprint arXiv:1905.11471}.
213
+
214
+ \bibitem[{Socher et~al.(2013)Socher, Perelygin, Wu, Chuang, Manning, Ng, and
215
+ Potts}]{socher2013recursive}
216
+ Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher~D Manning,
217
+ Andrew Ng, and Christopher Potts. 2013.
218
+ \newblock Recursive deep models for semantic compositionality over a sentiment
219
+ treebank.
220
+ \newblock In \emph{EMNLP}, pages 1631--1642.
221
+
222
+ \bibitem[{Tan et~al.(2019)Tan, Ren, He, Qin, Zhao, and
223
+ Liu}]{tan2019multilingual}
224
+ Xu~Tan, Yi~Ren, Di~He, Tao Qin, Zhou Zhao, and Tie-Yan Liu. 2019.
225
+ \newblock Multilingual neural machine translation with knowledge distillation.
226
+ \newblock \emph{ICLR}.
227
+
228
+ \bibitem[{Tjong Kim~Sang and De~Meulder(2003)}]{tjong2003introduction}
229
+ Erik~F Tjong Kim~Sang and Fien De~Meulder. 2003.
230
+ \newblock Introduction to the conll-2003 shared task: language-independent
231
+ named entity recognition.
232
+ \newblock In \emph{CoNLL}, pages 142--147. Association for Computational
233
+ Linguistics.
234
+
235
+ \bibitem[{Vaswani et~al.(2017)Vaswani, Shazeer, Parmar, Uszkoreit, Jones,
236
+ Gomez, Kaiser, and Polosukhin}]{transformer17}
237
+ Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
238
+ Aidan~N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017.
239
+ \newblock Attention is all you need.
240
+ \newblock In \emph{Advances in Neural Information Processing Systems}, pages
241
+ 6000--6010.
242
+
243
+ \bibitem[{Wang et~al.(2018)Wang, Singh, Michael, Hill, Levy, and
244
+ Bowman}]{wang2018glue}
245
+ Alex Wang, Amapreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel~R
246
+ Bowman. 2018.
247
+ \newblock Glue: A multi-task benchmark and analysis platform for natural
248
+ language understanding.
249
+ \newblock \emph{arXiv preprint arXiv:1804.07461}.
250
+
251
+ \bibitem[{Wenzek et~al.(2019)Wenzek, Lachaux, Conneau, Chaudhary, Guzman,
252
+ Joulin, and Grave}]{wenzek2019ccnet}
253
+ Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary,
254
+ Francisco Guzman, Armand Joulin, and Edouard Grave. 2019.
255
+ \newblock Ccnet: Extracting high quality monolingual datasets from web crawl
256
+ data.
257
+ \newblock \emph{arXiv preprint arXiv:1911.00359}.
258
+
259
+ \bibitem[{Williams et~al.(2017)Williams, Nangia, and
260
+ Bowman}]{williams2017broad}
261
+ Adina Williams, Nikita Nangia, and Samuel~R Bowman. 2017.
262
+ \newblock A broad-coverage challenge corpus for sentence understanding through
263
+ inference.
264
+ \newblock \emph{Proceedings of the 2nd Workshop on Evaluating Vector-Space
265
+ Representations for NLP}.
266
+
267
+ \bibitem[{Wu et~al.(2019)Wu, Conneau, Li, Zettlemoyer, and
268
+ Stoyanov}]{wu2019emerging}
269
+ Shijie Wu, Alexis Conneau, Haoran Li, Luke Zettlemoyer, and Veselin Stoyanov.
270
+ 2019.
271
+ \newblock Emerging cross-lingual structure in pretrained language models.
272
+ \newblock \emph{ACL}.
273
+
274
+ \bibitem[{Wu and Dredze(2019)}]{wu2019beto}
275
+ Shijie Wu and Mark Dredze. 2019.
276
+ \newblock Beto, bentz, becas: The surprising cross-lingual effectiveness of
277
+ bert.
278
+ \newblock \emph{EMNLP}.
279
+
280
+ \bibitem[{Xie et~al.(2019)Xie, Dai, Hovy, Luong, and Le}]{xie2019unsupervised}
281
+ Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, and Quoc~V Le. 2019.
282
+ \newblock Unsupervised data augmentation for consistency training.
283
+ \newblock \emph{arXiv preprint arXiv:1904.12848}.
284
+
285
+ \end{thebibliography}
references/2019.arxiv.conneau/source/xlmr.synctex ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:420af1ab9f337834c49b93240fd9062be0a9f1bd9135878e6c96a6d128aa6856
3
+ size 865236
references/2019.arxiv.conneau/source/xlmr.tex ADDED
@@ -0,0 +1,307 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+
2
+ %
3
+ % File acl2020.tex
4
+ %
5
+ %% Based on the style files for ACL 2020, which were
6
+ %% Based on the style files for ACL 2018, NAACL 2018/19, which were
7
+ %% Based on the style files for ACL-2015, with some improvements
8
+ %% taken from the NAACL-2016 style
9
+ %% Based on the style files for ACL-2014, which were, in turn,
10
+ %% based on ACL-2013, ACL-2012, ACL-2011, ACL-2010, ACL-IJCNLP-2009,
11
+ %% EACL-2009, IJCNLP-2008...
12
+ %% Based on the style files for EACL 2006 by
13
+ %%e.agirre@ehu.es or Sergi.Balari@uab.es
14
+ %% and that of ACL 08 by Joakim Nivre and Noah Smith
15
+
16
+ \documentclass[11pt,a4paper]{article}
17
+ \usepackage[hyperref]{acl2020}
18
+ \usepackage{times}
19
+ \usepackage{latexsym}
20
+ \renewcommand{\UrlFont}{\ttfamily\small}
21
+
22
+ % This is not strictly necessary, and may be commented out,
23
+ % but it will improve the layout of the manuscript,
24
+ % and will typically save some space.
25
+ \usepackage{microtype}
26
+ \usepackage{graphicx}
27
+ \usepackage{subfigure}
28
+ \usepackage{booktabs} % for professional tables
29
+ \usepackage{url}
30
+ \usepackage{times}
31
+ \usepackage{latexsym}
32
+ \usepackage{array}
33
+ \usepackage{adjustbox}
34
+ \usepackage{multirow}
35
+ % \usepackage{subcaption}
36
+ \usepackage{hyperref}
37
+ \usepackage{longtable}
38
+
39
+ \input{content/tables}
40
+
41
+
42
+ \aclfinalcopy % Uncomment this line for the final submission
43
+ \def\aclpaperid{479} % Enter the acl Paper ID here
44
+
45
+ %\setlength\titlebox{5cm}
46
+ % You can expand the titlebox if you need extra space
47
+ % to show all the authors. Please do not make the titlebox
48
+ % smaller than 5cm (the original size); we will check this
49
+ % in the camera-ready version and ask you to change it back.
50
+
51
+ \newcommand\BibTeX{B\textsc{ib}\TeX}
52
+ \usepackage{xspace}
53
+ \newcommand{\xlmr}{\textit{XLM-R}\xspace}
54
+ \newcommand{\mbert}{mBERT\xspace}
55
+ \newcommand{\XX}{\textcolor{red}{XX}\xspace}
56
+
57
+ \newcommand{\note}[3]{{\color{#2}[#1: #3]}}
58
+ \newcommand{\ves}[1]{\note{ves}{red}{#1}}
59
+ \newcommand{\luke}[1]{\note{luke}{green}{#1}}
60
+ \newcommand{\myle}[1]{\note{myle}{cyan}{#1}}
61
+ \newcommand{\paco}[1]{\note{paco}{blue}{#1}}
62
+ \newcommand{\eg}[1]{\note{edouard}{orange}{#1}}
63
+ \newcommand{\kk}[1]{\note{kartikay}{pink}{#1}}
64
+
65
+ \renewcommand{\UrlFont}{\scriptsize}
66
+ \title{Unsupervised Cross-lingual Representation Learning at Scale}
67
+
68
+ \author{Alexis Conneau\thanks{\ \ Equal contribution.} \space\space\space
69
+ Kartikay Khandelwal\footnotemark[1] \space\space\space \AND
70
+ \bf Naman Goyal \space\space\space
71
+ Vishrav Chaudhary \space\space\space
72
+ Guillaume Wenzek \space\space\space
73
+ Francisco Guzm\'an \space\space\space \AND
74
+ \bf Edouard Grave \space\space\space
75
+ Myle Ott \space\space\space
76
+ Luke Zettlemoyer \space\space\space
77
+ Veselin Stoyanov \space\space\space \\ \\ \\
78
+ \bf Facebook AI
79
+ }
80
+
81
+ \date{}
82
+
83
+ \begin{document}
84
+ \maketitle
85
+ \begin{abstract}
+ This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed \xlmr, significantly outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +14.6\% average accuracy on XNLI, +13\% average F1 score on MLQA, and +2.4\% F1 score on NER. \xlmr performs particularly well on low-resource languages, improving 15.7\% in XNLI accuracy for Swahili and 11.4\% for Urdu over previous XLM models. We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling without sacrificing per-language performance; \xlmr is very competitive with strong monolingual models on the GLUE and XNLI benchmarks. We will make our code, data and models publicly available.{\let\thefootnote\relax\footnotetext{\scriptsize Correspondence to {\tt \{aconneau,kartikayk\}@fb.com}}}\footnote{\url{https://github.com/facebookresearch/(fairseq-py,pytext,xlm)}}
+ \end{abstract}
+
+
+ \section{Introduction}
+
+ The goal of this paper is to improve cross-lingual language understanding (XLU) by carefully studying the effects of training unsupervised cross-lingual representations at a very large scale.
+ We present \xlmr, a transformer-based multilingual masked language model pre-trained on text in 100 languages, which obtains state-of-the-art performance on cross-lingual classification, sequence labeling and question answering.
+
+ Multilingual masked language models (MLM) like \mbert~\cite{devlin2018bert} and XLM \cite{lample2019cross} have pushed the state-of-the-art on cross-lingual understanding tasks by jointly pretraining large Transformer models~\cite{transformer17} on many languages. These models allow for effective cross-lingual transfer, as seen in a number of benchmarks including cross-lingual natural language inference~\cite{bowman2015large,williams2017broad,conneau2018xnli}, question answering~\cite{rajpurkar-etal-2016-squad,lewis2019mlqa}, and named entity recognition~\cite{Pires2019HowMI,wu2019beto}.
+ However, all of these studies pre-train on Wikipedia, which provides relatively limited scale, especially for lower-resource languages.
+
+
+ In this paper, we first present a comprehensive analysis of the trade-offs and limitations of multilingual language models at scale, inspired by recent monolingual scaling efforts~\cite{roberta2019}.
+ We measure the trade-off between high-resource and low-resource languages and the impact of language sampling and vocabulary size.
+ %By training models with an increasing number of languages,
+ The experiments expose a trade-off as we scale the number of languages for a fixed model capacity: more languages lead to better cross-lingual performance on low-resource languages up until a point, after which the overall performance on monolingual and cross-lingual benchmarks degrades. We refer to this trade-off as the \emph{curse of multilinguality}, and show that it can be alleviated by simply increasing model capacity.
+ We argue, however, that this remains an important limitation for future XLU systems which may aim to improve performance with more modest computational budgets.
+
+ Our best model XLM-RoBERTa (\xlmr) outperforms \mbert on cross-lingual classification by up to 23\% accuracy on low-resource languages.
+ %like Swahili and Urdu.
+ It outperforms the previous state of the art by 5.1\% average accuracy on XNLI, 2.42\% average F1-score on Named Entity Recognition, and 9.1\% average F1-score on cross-lingual Question Answering. We also evaluate monolingual fine-tuning on the GLUE and XNLI benchmarks, where \xlmr obtains results competitive with state-of-the-art monolingual models, including RoBERTa \cite{roberta2019}.
+ These results demonstrate, for the first time, that it is possible to have a single large model for all languages, without sacrificing per-language performance.
+ We will make our code, models and data publicly available, with the hope that this will help research in multilingual NLP and low-resource language understanding.
+
+ \section{Related Work}
+ From pretrained word embeddings~\citep{mikolov2013distributed, pennington2014glove} to pretrained contextualized representations~\citep{peters2018deep,schuster2019cross} and transformer based language models~\citep{radford2018improving,devlin2018bert}, unsupervised representation learning has significantly improved the state of the art in natural language understanding. Parallel work on cross-lingual understanding~\citep{mikolov2013exploiting,schuster2019cross,lample2019cross} extends these systems to more languages and to the cross-lingual setting in which a model is learned in one language and applied in other languages.
+
+ Most recently, \citet{devlin2018bert} and \citet{lample2019cross} introduced \mbert and XLM, masked language models trained on multiple languages, without any cross-lingual supervision.
+ \citet{lample2019cross} propose translation language modeling (TLM) as a way to leverage parallel data and obtain a new state of the art on the cross-lingual natural language inference (XNLI) benchmark~\cite{conneau2018xnli}.
+ They further show strong improvements on unsupervised machine translation and pretraining for sequence generation. \citet{wu2019emerging} show that monolingual BERT representations are similar across languages, explaining in part the natural emergence of multilinguality in bottleneck architectures. Separately, \citet{Pires2019HowMI} demonstrated the effectiveness of multilingual models like \mbert on sequence labeling tasks. \citet{huang2019unicoder} showed gains over XLM using cross-lingual multi-task learning, and \citet{singh2019xlda} demonstrated the efficiency of cross-lingual data augmentation for cross-lingual NLI. However, all of this work was at a relatively modest scale, in terms of the amount of training data, as compared to our approach.
+
+ \insertWikivsCC
+
+ The benefits of scaling language model pretraining by increasing the size of the model as well as the training data have been extensively studied in the literature. For the monolingual case, \citet{jozefowicz2016exploring} show how large-scale LSTM models can obtain much stronger performance on language modeling benchmarks when trained on billions of tokens.
+ %[Kartikay: TODO; CHange the reference to GPT2]
+ GPT~\cite{radford2018improving} also highlights the importance of scaling the amount of data, and RoBERTa \cite{roberta2019} shows that training BERT longer on more data leads to a significant boost in performance. Inspired by RoBERTa, we show that mBERT and XLM are undertuned, and that simple improvements in the learning procedure of unsupervised MLM lead to much better performance. We train on cleaned CommonCrawl corpora~\cite{wenzek2019ccnet}, which increase the amount of data for low-resource languages by two orders of magnitude on average. Similar data has also been shown to be effective for learning high quality word embeddings in multiple languages~\cite{grave2018learning}.
+
+
+ Several efforts have trained massively multilingual machine translation models from large parallel corpora. They uncover the high-resource and low-resource trade-off and the problem of capacity dilution~\citep{johnson2017google,tan2019multilingual}. The work most similar to ours is \citet{arivazhagan2019massively}, which trains a single model in 103 languages on over 25 billion parallel sentences.
+ \citet{siddhant2019evaluating} further analyze the representations obtained by the encoder of a massively multilingual machine translation system and show that it obtains similar results to mBERT on cross-lingual NLI.
+ %, which performs much wors that the XLM models we study.
+ Our work, in contrast, focuses on the unsupervised learning of cross-lingual representations and their transfer to discriminative tasks.
+
+
+ \section{Model and Data}
+ \label{sec:model+data}
+
+ In this section, we present the training objective, languages, and data we use. We follow the XLM approach~\cite{lample2019cross} as closely as possible, only introducing changes that improve performance at scale.
+
+ \paragraph{Masked Language Models.}
+ We use a Transformer model~\cite{transformer17} trained with the multilingual MLM objective~\cite{devlin2018bert,lample2019cross} using only monolingual data. We sample streams of text from each language and train the model to predict the masked tokens in the input.
+ We apply subword tokenization directly on raw text data using SentencePiece~\cite{kudo2018sentencepiece} with a unigram language model~\cite{kudo2018subword}. We sample batches from different languages using the same sampling distribution as \citet{lample2019cross}, but with $\alpha=0.3$. Unlike \citet{lample2019cross}, we do not use language embeddings, which allows our model to better deal with code-switching. We use a large vocabulary size of 250K with a full softmax and train two different models: \xlmr\textsubscript{Base} (L = 12, H = 768, A = 12, 270M params) and \xlmr (L = 24, H = 1024, A = 16, 550M params). For all of our ablation studies, we use a BERT\textsubscript{Base} architecture with a vocabulary of 150K tokens. Appendix~\ref{sec:appendix_B} goes into more detail about the architecture of the different models referenced in this paper.
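The masking step of the multilingual MLM objective is not spelled out in the text above; as a rough illustration only, the following sketch uses BERT-style corruption (15\% of positions selected; of those, 80\% replaced by a mask token, 10\% by a random vocabulary token, 10\% left unchanged). These proportions are BERT's defaults and are assumed here, not stated in this paper.

```python
import random

def mask_tokens(tokens, vocab, mask_token="[MASK]", seed=0):
    """BERT-style corruption: select ~15% of positions as prediction
    targets; of those, replace 80% with mask_token, 10% with a random
    vocabulary token, and keep 10% unchanged."""
    rng = random.Random(seed)
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < 0.15:
            targets[i] = tok  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted[i] = mask_token
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)
            # else: position is a target but the token is left unchanged
    return corrupted, targets
```

The loss is then computed only over the positions recorded in `targets`, which is what "predict the masked tokens in the input" refers to.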
+
+ \paragraph{Scaling to a hundred languages.}
+ \xlmr is trained on 100 languages; we provide a full list of languages and associated statistics in Appendix~\ref{sec:appendix_A}. Figure~\ref{fig:wikivscc} specifies the ISO codes of 88 languages that are shared across \xlmr and XLM-100, the model from~\citet{lample2019cross} trained on Wikipedia text in 100 languages.
+
+ Compared to previous work, we replace some languages with more commonly used ones such as romanized Hindi and traditional Chinese. In our ablation studies, we always include the 7 languages for which we have classification and sequence labeling evaluation benchmarks: English, French, German, Russian, Chinese, Swahili and Urdu. We chose this set as it covers a suitable range of language families and includes low-resource languages such as Swahili and Urdu.
+ We also consider larger sets of 15, 30, 60 and all 100 languages. When reporting results on high-resource and low-resource languages, we refer to the average of English and French results, and the average of Swahili and Urdu results respectively.
+
+ \paragraph{Scaling the Amount of Training Data.}
+ Following~\citet{wenzek2019ccnet}\footnote{\url{https://github.com/facebookresearch/cc_net}}, we build a clean CommonCrawl Corpus in 100 languages. We use an internal language identification model in combination with the one from fastText~\cite{joulin2017bag}. We train language models in each language and use them to filter documents as described in \citet{wenzek2019ccnet}. We consider one CommonCrawl dump for English and twelve dumps for all other languages, which significantly increases dataset sizes, especially for low-resource languages like Burmese and Swahili.
+
+ Figure~\ref{fig:wikivscc} shows the difference in size between the Wikipedia Corpus used by mBERT and XLM-100, and the CommonCrawl Corpus we use. As we show in Section~\ref{sec:multimono}, monolingual Wikipedia corpora are too small to enable unsupervised representation learning. Based on our experiments, we found that a few hundred MiB of text data is usually the minimum needed to learn a BERT model.
+
+ \section{Evaluation}
+ We consider four evaluation benchmarks.
+ For cross-lingual understanding, we use cross-lingual natural language inference, named entity recognition, and question answering. We use the GLUE benchmark to evaluate the English performance of \xlmr and compare it to other state-of-the-art models.
+
+ \paragraph{Cross-lingual Natural Language Inference (XNLI).}
+ The XNLI dataset comes with ground-truth dev and test sets in 15 languages, and a ground-truth English training set. The training set has been machine-translated to the remaining 14 languages, providing synthetic training data for these languages as well. We evaluate our model on cross-lingual transfer from English to other languages. We also consider three machine translation baselines: (i) \textit{translate-test}: dev and test sets are machine-translated to English and a single English model is used; (ii) \textit{translate-train} (per-language): the English training set is machine-translated to each language and we fine-tune a multilingual model on each training set; (iii) \textit{translate-train-all} (multi-language): we fine-tune a multilingual model on the concatenation of all training sets from translate-train. For the translations, we use the official data provided by the XNLI project.
+ % In case we want to add more details about the CC-100 corpora : We train language models in each language and use it to filter documents as described in Wenzek et al. (2019). We additionally apply a filter based on type-token ratio score of 0.6. We consider one CommonCrawl snapshot (December, 2018) for English and twelve snapshots from all months of 2018 for all other languages, which significantly increases dataset sizes, especially for low-resource languages like Burmese and Swahili.
+
+ \paragraph{Named Entity Recognition.}
+ % WikiAnn http://nlp.cs.rpi.edu/wikiann/
+ For NER, we consider the CoNLL-2002~\cite{sang2002introduction} and CoNLL-2003~\cite{tjong2003introduction} datasets in English, Dutch, Spanish and German. We fine-tune multilingual models either (1) on the English set to evaluate cross-lingual transfer, (2) on each set to evaluate per-language performance, or (3) on all sets to evaluate multilingual learning. We report the F1 score, and compare to baselines from \citet{lample-etal-2016-neural} and \citet{akbik2018coling}.
+
+ \paragraph{Cross-lingual Question Answering.}
+ We use the MLQA benchmark from \citet{lewis2019mlqa}, which extends the English SQuAD benchmark to Spanish, German, Arabic, Hindi, Vietnamese and Chinese. We report the F1 score as well as the exact match (EM) score for cross-lingual transfer from English.
+
+ \paragraph{GLUE Benchmark.}
+ Finally, we evaluate the English performance of our model on the GLUE benchmark~\cite{wang2018glue}, which gathers multiple classification tasks, such as MNLI~\cite{williams2017broad}, SST-2~\cite{socher2013recursive}, or QNLI~\cite{rajpurkar2018know}. We use BERT\textsubscript{Large} and RoBERTa as baselines.
+
+ \section{Analysis and Results}
+ \label{sec:analysis}
+
+ In this section, we perform a comprehensive analysis of multilingual masked language models. We conduct most of the analysis on XNLI, which we found to be representative of our findings on other tasks. We then present the results of \xlmr on cross-lingual understanding and GLUE. Finally, we compare multilingual and monolingual models, and present results on low-resource languages.
+
+ \subsection{Improving and Understanding Multilingual Masked Language Models}
+ % prior analysis necessary to build \xlmr
+ \insertAblationone
+ \insertAblationtwo
+
+ Much of the work done on understanding the cross-lingual effectiveness of \mbert or XLM~\cite{Pires2019HowMI,wu2019beto,lewis2019mlqa} has focused on analyzing the performance of fixed pretrained models on downstream tasks. In this section, we present a comprehensive study of different factors that are important to \textit{pretraining} large-scale multilingual models. We highlight the trade-offs and limitations of these models as we scale to one hundred languages.
+
+ \paragraph{Transfer-dilution Trade-off and Curse of Multilinguality.}
+ Model capacity (i.e. the number of parameters in the model) is constrained due to practical considerations such as memory and speed during training and inference. For a fixed-size model, the per-language capacity decreases as we increase the number of languages. While low-resource language performance can be improved by adding similar higher-resource languages during pretraining, the overall downstream performance suffers from this capacity dilution~\cite{arivazhagan2019massively}. Positive transfer and capacity dilution have to be traded off against each other.
+
+ We illustrate this trade-off in Figure~\ref{fig:transfer_dilution}, which shows XNLI performance vs.\ the number of languages the model is pretrained on. Initially, as we go from 7 to 15 languages, the model is able to take advantage of positive transfer, which improves performance, especially on low-resource languages. Beyond this point, the {\em curse of multilinguality} kicks in and degrades performance across all languages. Specifically, the overall XNLI accuracy decreases from 71.8\% to 67.7\% as we go from XLM-7 to XLM-100. The same trend can be observed for models trained on the larger CommonCrawl Corpus.
+
+ The issue is even more prominent when the capacity of the model is small. To show this, we pretrain models on Wikipedia Data in 7, 30 and 100 languages. As we add more languages, we make the Transformer wider by increasing the hidden size from 768 to 960 to 1152. In Figure~\ref{fig:capacity}, we show that the added capacity allows XLM-30 to be on par with XLM-7, thus overcoming the curse of multilinguality. The added capacity for XLM-100, however, is not enough and it still lags behind due to higher vocabulary dilution (recall from Section~\ref{sec:model+data} that we used a fixed vocabulary size of 150K for all models).
+
+ \paragraph{High-resource vs Low-resource Trade-off.}
+ The allocation of the model capacity across languages is controlled by several parameters: the training set size, the size of the shared subword vocabulary, and the rate at which we sample training examples from each language. We study the effect of sampling on the performance of high-resource (English and French) and low-resource (Swahili and Urdu) languages for an XLM-100 model trained on Wikipedia (we observe a similar trend for the construction of the subword vocabulary). Specifically, we investigate the impact of varying the $\alpha$ parameter, which controls the exponential smoothing of the language sampling rate. Similar to~\citet{lample2019cross}, we use a sampling rate proportional to the number of sentences in each corpus. Models trained with higher values of $\alpha$ see batches of high-resource languages more often.
+ Figure~\ref{fig:alpha} shows that the higher the value of $\alpha$, the better the performance on high-resource languages, and vice versa. When considering overall performance, we found $0.3$ to be an optimal value for $\alpha$, and use this for \xlmr.
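In formula form, with $q_i$ the empirical share of language $i$ in the corpus, the smoothed sampling probability is $p_i = q_i^{\alpha} / \sum_j q_j^{\alpha}$. A minimal sketch of this rescaling (the sentence counts below are illustrative, not the paper's actual corpus sizes):

```python
def sampling_probs(sentence_counts, alpha=0.3):
    """Exponentially smoothed language sampling: q_i is the empirical share
    of language i, and p_i = q_i**alpha / sum_j q_j**alpha. alpha=1 recovers
    proportional sampling; alpha<1 upweights low-resource languages."""
    total = sum(sentence_counts.values())
    q = {lang: n / total for lang, n in sentence_counts.items()}
    z = sum(v ** alpha for v in q.values())
    return {lang: v ** alpha / z for lang, v in q.items()}

# Illustrative counts only (not the real CC-100 sizes).
counts = {"en": 1_000_000, "sw": 10_000}
proportional = sampling_probs(counts, alpha=1.0)
smoothed = sampling_probs(counts, alpha=0.3)
```

With these illustrative counts, Swahili's sampling probability rises from under 1\% to roughly 20\%, which is exactly the high-resource/low-resource dial that Figure~\ref{fig:alpha} sweeps.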
+
+ \paragraph{Importance of Capacity and Vocabulary.}
+ In previous sections and in Figure~\ref{fig:capacity}, we showed the importance of scaling the model size as we increase the number of languages. Similar to the overall model size, we argue that scaling the size of the shared vocabulary (the vocabulary capacity) can improve the performance of multilingual models on downstream tasks. To illustrate this effect, we train XLM-100 models on Wikipedia data with different vocabulary sizes. We keep the overall number of parameters constant by adjusting the width of the transformer. Figure~\ref{fig:vocab} shows that even with a fixed capacity, we observe a 2.8\% increase in XNLI average accuracy as we increase the vocabulary size from 32K to 256K. This suggests that multilingual models can benefit from allocating a higher proportion of the total number of parameters to the embedding layer even though this reduces the size of the Transformer.
+ %With bigger models, we believe that using a vocabulary of up to 2 million tokens with an adaptive softmax~\cite{grave2017efficient,baevski2018adaptive} should improve performance even further, but we leave this exploration to future work.
+ For simplicity and given the softmax computational constraints, we use a vocabulary of 250K for \xlmr.
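To see why vocabulary capacity dominates here, note that the token embedding matrix alone contributes $V \times H$ parameters. A quick back-of-the-envelope check against the sizes stated earlier (270M total for \xlmr\textsubscript{Base}, 250K vocabulary, hidden size 768; tied input/output embeddings assumed, which is our assumption rather than a statement from the paper):

```python
def embedding_params(vocab_size, hidden_size):
    # Parameters in a single (tied) token embedding / output projection matrix.
    return vocab_size * hidden_size

base_total = 270_000_000              # XLM-R Base total, as reported in the paper
emb = embedding_params(250_000, 768)  # 250K vocabulary, hidden size 768
share = emb / base_total
# With a 250K vocabulary, roughly 70% of the Base model's parameters
# sit in the embedding layer alone.
```

Under a fixed parameter budget, growing $V$ therefore directly shrinks the Transformer's width, which is the trade-off the vocabulary ablation in this section measures.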
+
+ We further illustrate the importance of this parameter by training three models with the same transformer architecture (BERT\textsubscript{Base}) but with different vocabulary sizes: 128K, 256K and 512K. We observe more than 3\% gains in overall accuracy on XNLI by simply increasing the vocabulary size from 128K to 512K.
+
+ \paragraph{Larger-scale Datasets and Training.}
+ As shown in Figure~\ref{fig:wikivscc}, the CommonCrawl Corpus that we collected has significantly more monolingual data than the previously used Wikipedia corpora. Figure~\ref{fig:curse} shows that for the same BERT\textsubscript{Base} architecture, all models trained on CommonCrawl obtain significantly better performance.
+
+ Apart from scaling the training data, \citet{roberta2019} also showed the benefits of training MLMs longer. In our experiments, we observed similar effects of large-scale training, such as increasing batch size (see Figure~\ref{fig:batch}) and training time, on model performance. Specifically, we found that using validation perplexity as a stopping criterion for pretraining caused the multilingual MLM in \citet{lample2019cross} to be under-tuned. In our experience, performance on downstream tasks continues to improve even after validation perplexity has plateaued. Combining this observation with our implementation of the unsupervised XLM-MLM objective, we were able to improve the performance of \citet{lample2019cross} from 71.3\% to more than 75\% average accuracy on XNLI, which was on par with their supervised translation language modeling (TLM) objective. Based on these results, and given our focus on unsupervised learning, we decided not to use the supervised TLM objective for training our models.
+
+
+ \paragraph{Simplifying Multilingual Tokenization with SentencePiece.}
+ The different language-specific tokenization tools used by mBERT and XLM-100 make these models more difficult to use on raw text. Instead, we train a SentencePiece model (SPM) and apply it directly on raw text data for all languages. We did not observe any loss in performance for models trained with SPM when compared to models trained with language-specific preprocessing and byte-pair encoding (see Figure~\ref{fig:batch}) and hence use SPM for \xlmr.
+
+ \subsection{Cross-lingual Understanding Results}
+ Based on these results, we adapt the setting of \citet{lample2019cross} and use a large Transformer model with 24 layers and 1024 hidden states, with a 250K vocabulary. We use the multilingual MLM loss and train our \xlmr model for 1.5 million updates on five hundred 32GB Nvidia V100 GPUs with a batch size of 8192. We leverage the SPM-preprocessed text data from CommonCrawl in 100 languages and sample languages with $\alpha=0.3$. In this section, we show that it outperforms all previous techniques on cross-lingual benchmarks while getting performance on par with RoBERTa on the GLUE benchmark.
+
+
+ \insertXNLItable
+
+ \paragraph{XNLI.}
+ Table~\ref{tab:xnli} shows XNLI results and adds some additional details: (i) the number of models the approach induces (\#M), (ii) the data on which the model was trained (D), and (iii) the number of languages the model was pretrained on (\#lg). As we show in our results, these parameters significantly impact performance. Column \#M specifies whether model selection was done separately on the dev set of each language ($N$ models), or on the joint dev set of all the languages (single model). We observe a 0.6 decrease in overall accuracy when we go from $N$ models to a single model, going from 71.3 to 70.7. We encourage the community to adopt this setting. For cross-lingual transfer, while this approach is not fully zero-shot transfer, we argue that in real applications, a small amount of supervised data is often available for validation in each language.
+
+ \xlmr sets a new state of the art on XNLI.
+ On cross-lingual transfer, \xlmr obtains 80.9\% accuracy, outperforming the XLM-100 and \mbert open-source models by 10.2\% and 14.6\% average accuracy. On the Swahili and Urdu low-resource languages, \xlmr outperforms XLM-100 by 15.7\% and 11.4\%, and \mbert by 23.5\% and 15.8\%. While \xlmr handles 100 languages, we also show that it outperforms the previous state of the art, Unicoder~\citep{huang2019unicoder} and XLM (MLM+TLM), which handle only 15 languages, by 5.5\% and 5.8\% average accuracy respectively. Using the multilingual training of translate-train-all, \xlmr further improves performance and reaches 83.6\% accuracy, a new overall state of the art for XNLI, outperforming Unicoder by 5.1\%. Multilingual training is similar to practical applications where training sets are available in various languages for the same task. In the case of XNLI, datasets have been translated, and translate-train-all can be seen as some form of cross-lingual data augmentation~\cite{singh2019xlda}, similar to back-translation~\cite{xie2019unsupervised}.
+
+ \insertNER
+ \paragraph{Named Entity Recognition.}
+ In Table~\ref{tab:ner}, we report results of \xlmr and \mbert on CoNLL-2002 and CoNLL-2003. We consider the LSTM + CRF approach from \citet{lample-etal-2016-neural} and the Flair model from \citet{akbik2018coling} as baselines. We evaluate the performance of the model on each of the target languages in three different settings: (i) train on English data only (en); (ii) train on data in the target language (each); (iii) train on data in all languages (all). Results of \mbert are reported from \citet{wu2019beto}. Note that we do not use a linear-chain CRF on top of \xlmr and \mbert representations, which gives an advantage to \citet{akbik2018coling}. Without the CRF, our \xlmr model still performs on par with the state of the art, outperforming \citet{akbik2018coling} on Dutch by $2.09$ points. On this task, \xlmr also outperforms \mbert by 2.42 F1 on average for cross-lingual transfer, and 1.86 F1 when trained on each language. Training on all languages leads to an average F1 score of 89.43\%, outperforming the cross-lingual transfer approach by 8.49\%.
+
+ \paragraph{Question Answering.}
+ We also obtain new state of the art results on the MLQA cross-lingual question answering benchmark, introduced by \citet{lewis2019mlqa}. We follow their procedure by training on the English training data and evaluating on the 7 languages of the dataset.
+ We report results in Table~\ref{tab:mlqa}.
+ \xlmr obtains F1 and EM scores of 70.7\% and 52.7\%, while the previous state of the art was 61.6\% and 43.5\%. \xlmr also outperforms \mbert by 13.0\% F1-score and 11.1\% EM. It even outperforms BERT\textsubscript{Large} on English, confirming its strong monolingual performance.
+
+ \insertMLQA
+
+
+ \subsection{Multilingual versus Monolingual}
+ \label{sec:multimono}
+ In this section, we present results of multilingual XLM models against monolingual BERT models.
+
+ \paragraph{GLUE: \xlmr versus RoBERTa.}
+ Our goal is to obtain a multilingual model with strong performance on both cross-lingual understanding tasks and natural language understanding tasks for each language. To that end, we evaluate \xlmr on the GLUE benchmark. We show in Table~\ref{tab:glue} that \xlmr obtains better average dev performance than BERT\textsubscript{Large} by 1.6\% and reaches performance on par with XLNet\textsubscript{Large}. The RoBERTa model outperforms \xlmr by only 1.0\% on average. We believe future work can reduce this gap even further by alleviating the curse of multilinguality and vocabulary dilution. These results demonstrate the possibility of learning one model for many languages while maintaining strong performance on per-language downstream tasks.
+
+ \insertGlue
+
+ \paragraph{XNLI: XLM versus BERT.}
+ A recurrent criticism against multilingual models is that they obtain worse performance than their monolingual counterparts. In addition to the comparison of \xlmr and RoBERTa, we provide the first comprehensive study to assess this claim on the XNLI benchmark. We extend our comparison between multilingual XLM models and monolingual BERT models on 7 languages and compare performance in Table~\ref{tab:multimono}. We train 14 monolingual BERT models on Wikipedia and CommonCrawl (capped at 60 GiB),
+ %\footnote{For simplicity, we use a reduced version of our corpus by capping the size of each monolingual dataset to 60 GiB.}
+ and two XLM-7 models. We increase the vocabulary size of the multilingual model for a better comparison.
+ % To our surprise - and backed by further study on internal benchmarks -
+ We found that \textit{multilingual models can outperform their monolingual BERT counterparts}. Specifically, in Table~\ref{tab:multimono}, we show that for cross-lingual transfer, monolingual baselines outperform XLM-7 for both Wikipedia and CC by 1.6\% and 1.3\% average accuracy. However, by making use of multilingual training (translate-train-all) and leveraging training sets coming from multiple languages, XLM-7 can outperform the BERT models: our XLM-7 trained on CC obtains 80.0\% average accuracy on the 7 languages, while the average performance of BERT models trained on CC is 77.5\%. This is a surprising result that shows that the capacity of multilingual models to leverage training data coming from multiple languages for a particular task can overcome the capacity dilution problem to obtain better overall performance.
+
+
+ \insertMultiMono
+
+ \subsection{Representation Learning for Low-resource Languages}
+ We observed in Table~\ref{tab:multimono} that pretraining on Wikipedia for Swahili and Urdu performed similarly to a randomly initialized model, most likely due to the small size of the data for these languages. On the other hand, pretraining on CC improved performance by up to 10 points. This confirms our assumption that mBERT and XLM-100 rely heavily on cross-lingual transfer but do not model the low-resource languages as well as \xlmr. Specifically, in the translate-train-all setting, we observe that the biggest gains for XLM models trained on CC, compared to their Wikipedia counterparts, are on low-resource languages; 7\% and 4.8\% improvement on Swahili and Urdu respectively.
+
+ \section{Conclusion}
+ In this work, we introduced \xlmr, our new state-of-the-art multilingual masked language model trained on 2.5 TB of newly created clean CommonCrawl data in 100 languages. We showed that it provides strong gains over previous multilingual models like \mbert and XLM on classification, sequence labeling and question answering. We exposed the limitations of multilingual MLMs, in particular by uncovering the high-resource versus low-resource trade-off, the curse of multilinguality and the importance of key hyperparameters. We also exposed the surprising effectiveness of multilingual models over monolingual models, and showed strong improvements on low-resource languages.
+ % \section*{Acknowledgements}
+
+
+ \bibliography{acl2020}
+ \bibliographystyle{acl_natbib}
+
+
+ \newpage
+ \clearpage
+ \appendix
+ \onecolumn
+ \section*{Appendix}
+ \section{Languages and statistics for CC-100 used by \xlmr}
+ In this section we present the list of languages in the CC-100 corpus we created for training \xlmr. We also report statistics such as the number of tokens and the size of each monolingual corpus.
+ \label{sec:appendix_A}
+ \insertDataStatistics
+
+ % \newpage
+ \section{Model Architectures and Sizes}
+ As we showed in Section~\ref{sec:analysis}, capacity is an important parameter for learning strong cross-lingual representations. In the table below, we list multiple monolingual and multilingual models used by the research community and summarize their architectures and total number of parameters.
+ \label{sec:appendix_B}
+
+ \insertParameters
+
+
+ \end{document}
references/2019.arxiv.ho/paper.md ADDED
@@ -0,0 +1,220 @@
+ ---
+ title: "Emotion Recognition for Vietnamese Social Media Text"
+ authors:
+ - "Vong Anh Ho"
+ - "Duong Huynh-Cong Nguyen"
+ - "Danh Hoang Nguyen"
+ - "Linh Thi-Van Pham"
+ - "Duc-Vu Nguyen"
+ - "Kiet Van Nguyen"
+ - "Ngan Luu-Thuy Nguyen"
+ year: 2019
+ venue: "arXiv"
+ url: "https://arxiv.org/abs/1911.09339"
+ arxiv: "1911.09339"
+ ---
+
+ \title{Emotion Recognition\\for Vietnamese Social Media Text}
+
+ \titlerunning{Emotion Recognition for Vietnamese Social Media Text}
+
+ \author{Vong Anh Ho\inst{1,4}\textsuperscript{(\Letter)} \and
+ Duong Huynh-Cong Nguyen\inst{1,4} \and
+ Danh Hoang Nguyen\inst{1,4} \and
+ \\Linh Thi-Van Pham\inst{2,4} \and
+ Duc-Vu Nguyen\inst{3,4} \and
+ \\Kiet Van Nguyen\inst{1,4} \and
+ Ngan Luu-Thuy Nguyen\inst{1,4}}
+
+ \authorrunning{Vong Anh Ho et al.}
+
+ \institute{University of Information Technology, VNU-HCM, Vietnam\\
+ \email{\{15521025,15520148,15520090\}@gm.uit.edu.vn, \{kietnv,ngannlt\}@uit.edu.vn}\\
+ \and
+ University of Social Sciences and Humanities, VNU-HCM, Vietnam\\
+ \email{vanlinhpham888@gmail.com}\\
+ \and
+ Multimedia Communications Laboratory, University of Information Technology, VNU-HCM, Vietnam\\
+ \email{vund@uit.edu.vn}\\
+ \and
+ Vietnam National University, Ho Chi Minh City, Vietnam}
+ \maketitle
+
+ \begin{abstract}
+
+ Emotion recognition or emotion prediction is a higher approach or a special case of sentiment analysis. In this task, the result is not produced in terms of either polarity: positive or negative or in the form of rating (from 1 to 5) but of a more detailed level of analysis in which the results are depicted in more expressions like sadness, enjoyment, anger, disgust, fear and surprise. Emotion recognition plays a critical role in measuring brand value of a product by recognizing specific emotions of customers' comments. In this study, we have achieved two targets. First and foremost, we built a standard **V**ietnamese **S**ocial **M**edia **E**motion **C**orpus (UIT-VSMEC) with exactly 6,927 emotion-annotated sentences, contributing to emotion recognition research in Vietnamese which is a low-resource language in natural language processing (NLP). Secondly, we assessed and measured machine learning and deep neural network models on our UIT-VSMEC corpus. As a result, the CNN model achieved the highest performance with the weighted F1-score of 59.74%. Our corpus is available at our research website: https://sites.google.com/uit.edu.vn/uit-nlp/corpora-projects
+
+ \keywords{Emotion Recognition \and Emotion Prediction \and Vietnamese \and Machine Learning \and Deep Learning \and CNN \and LSTM \and SVM.}
+ \end{abstract}
+
+ \input{sections/1-introduction.tex}
+ \input{sections/2-relatedwork.tex}
+ \input{sections/3-corpus.tex}
+ \input{sections/4-method.tex}
+ \input{sections/5-experiments.tex}
+ \input{sections/6-conclusion.tex}
+
+ # Acknowledgment
+ We would like to give our thanks to the NLP@UIT research group and the Citynow-UIT Laboratory of the University of Information Technology - Vietnam National University Ho Chi Minh City for their supports with pragmatic and inspiring advice.
+
+ \bibliographystyle{splncs04}
+
+ \begin{thebibliography}{10}
+ \providecommand{\url}[1]{`#1`}
+ \providecommand{\urlprefix}{URL }
+ \providecommand{\doi}[1]{https://doi.org/#1}
+
+ \bibitem{PlabanKumarBhowmick}
+ {Bhowmick}, P.K., {Basu}, A., {Mitra}, P.: {An Agreement Measure for
+ Determining Inter-Annotator Reliability of Human Judgements on Affective
+ Tex}. In: {Proceedings of the workshop on Human Judgements in Computational
+ Linguistics}. pp. 58--65. COLING 2008, Manchester, United Kingdom (2008)
+
+ \bibitem{Jointstockcompany}
+ company, J.S.: {The habit of using social networks of Vietnamese people 2018}.
+ brands vietnam, Ho Chi Minh City, Vietnam (2018)
+
+ \bibitem{Ekman1993}
+ {Ekman}, P.: In: {Facial expression and emotion}. vol.~48, pp. 384--392.
+ {American Psychologist} (1993)
+
+ \bibitem{Ekman}
+ {Ekman}, P.: In: {Emotions revealed: Recognizing faces and feelings to improve
+ communication and emotional life}. p.~2007. {Macmillan} (2012)
+
+ \bibitem{PaulEkman}
+ {Ekman}, P., {Ekman}, E., {Lama}, D.: In: {The Ekmans' Atlas of Emotion} (2018)
+
+ \bibitem{Kim}
+ {Kim}, Y.: {Convolutional Neural Networks for Sentence Classifications}. In:
+ {Proceedings of the 2014 Conference on Empirical Methods in Natural Language
+ Processing (EMNLP)}. pp. 1746--1751. { Association for Computational
+ Linguistics}, Doha, Qatar (2014)
+
+ \bibitem{Kiritchenko}
+ {Kiritchenko}, S., {Mohammad}, S.: {Using Hashtags to Capture Fine Emotion
+ Categories from Tweets}. In: {Computational Intelligence}. pp. 301--326
+ (2015)
+
+ \bibitem{RomanKlinger}
+ {Klinger}, R., {Clerc}, O.D., {Mohammad}, S.M., {Balahur}, A.: {IEST:WASSA-2018
+ Implicit Emotions Shared Task}. pp. 31--42. 2017 AFNLP, Brussels, Belgium
+ (2018)
+
+ \bibitem{BernhardKratzwald}
+ {Kratzwald}, B., {Ilic}, S., {Kraus}, M., S.~{Feuerriegel}, H.P.: {Decision
+ support with text-based emotion recognition: Deep learning for affective
+ computing}. pp. 24 -- 35. {Decision Support Systems} (2018)
+
+ \bibitem{SaifMohammad2017}
+ {Mohammad}, S., {Bravo-Marquez}, F.: {Emotion Intensities in Tweets}. In:
+ {Proceedings of the Sixth Joint Conference on Lexical and Computational
+ Semantics (*SEM)}. pp. 65--77. Association for Computational Linguistics,
+ Vancouver, Canada (2017)
+
+ \bibitem{Mohammad}
+ {Mohammad}, S.M.: {\#Emotional Tweets}. In: {First Joint Conference on Lexical
+ and Computational Semantics (*SEM)}. pp. 246--255. {Association for
+ Computational Linguistics}, Montreal, Canada (2012)
+
+ \bibitem{Mohammad2018}
+ {Mohammad}, S.M., {Bravo-Marquez}, F., {Salameh}, M., {Kiritchenko}, S.:
+ {SemEval-2018 task 1: Affect in tweets}. pp. 1--17. Proceedings of
+ International Workshop on Semantic Evaluation, New Orleans, Louisiana (2018)
+
+ \bibitem{SaifMohammad}
+ {Mohammad}, S.M., {Xiaodan}, Z., {Kiritchenko}, S., {Martin}, J.: {Sentiment,
+ emotion, purpose, and style in electoral tweets}. pp. 480--499. Information
+ Processing and Management: an International Journal (2015)
+
+ \bibitem{Nguyen}
+ Nguyen: {Vietnam has the 7th largest number of Facebook users in the world}.
+ Dan Tri newspaper (2018)
+
+ \bibitem{VLSPX}
+ {Nguyen}, H.T.M., {Nguyen}, H.V., {Ngo}, Q.T., {Vu}, L.X., {Tran}, V.M., {Ngo},
+ B.X., {Le}, C.A.: {VLSP Shared Task: Sentiment Analysis}. In: {Journal of
+ Computer Science and Cybernetics}. pp. 295--310 (2018)
+
+
+ \bibitem{KietVanNguyen}
+ {Nguyen}, K.V., {Nguyen}, V.D., {Nguyen}, P., {Truong}, T., {Nguyen}, N.L.T.:
+ {UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis}.
+ In: {2018 10th International Conference on Knowledge and Systems Engineering
+ (KSE)}. pp. 19--24. {IEEE}, Ho Chi Minh City, Vietnam (2018)
+
+ \bibitem{PhuNguyen}
+ {Nguyen}, P.X.V., {Truong}, T.T.H., {Nguyen}, K.V., {Nguyen}, N.L.T.: {Deep
+ Learning versus Traditional Classifiers on Vietnamese Students' Feedback
+ Corpus}. In: {2018 5th NAFOSTED Conference on Information and Computer
+ Science (NICS)}. pp. 75--80. Ho Chi Minh City, Vietnam (2018)
+
+ \bibitem{VuDucNguyen}
+ {Nguyen}, V.D., {Nguyen}, K.V., {Nguyen}, N.L.T.: {Variants of Long Short-Term
+ Memory for Sentiment Analysis on Vietnamese Students’ Feedback Corpus}. In:
+ {2018 10th International Conference on Knowledge and Systems Engineering
+ (KSE)}. pp. 306--311. IEEE, Ho Chi Minh City, Vietnam (2018)
+
+ \bibitem{AurelienGeron}
+ Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel,
+ O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J.,
+ Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.:
+ Scikit-learn: Machine learning in python. Journal of Machine Learning
+ Research **12**, 2825--2830 (2011)
+
+ \bibitem{CarloStrapparava}
+ {Strapparava}, C., {Mihalcea}, R.: {SemEval-2007 Task 14: Affective Text}. In:
+ {Proceedings of the 4th International Workshop on Semantic Evaluations
+ (SemEval-2007)}. pp. 70--74. { Association for Computational Linguistics},
+ Prague (2007)
+
+ \bibitem{TingweiWang}
+ {T. {Wang} and X. {Yang} and C. {Ouyang}}: {A Multi-emotion Classification
+ Method Based on BLSTM-MC in Code-Switching Text: 7th CCF International
+ Conference, NLPCC 2018, Hohhot, China, August 26–30, 2018, Proceedings,
+ Part II.} pp. 190--199. Natural Language Processing and Chinese Computing
+ (2018)
+
+ \bibitem{ZhongqingWang}
+ {Wang}, Z., {Li}, S.: {Overview of NLPCC 2018 Shared Task 1: Emotion Detection
+ in Code-Switching Text: 7th CCF International Conference, NLPCC 2018, Hohhot,
+ China, August 26–30, 2018, Proceedings, Part II}. pp. 429--433. Natural
+ Language Processing and Chinese Computing (2018)
+
+ \bibitem{Facial2007}
+ {Zhang}, S., {Wu}, Z., {Meng}, H., {Cai}, L.: Facial expression synthesis using
+ pad emotional parameters for a chinese expressive avatar. vol.~4738, pp.
+ 24--35 (09 2007)
+
+ \bibitem{YingjieZhang}
+ {Zhang}, Y., {Wallace}, B.C.: {A Sensitivity Analysis of (and Practitioners’
+ Guide to Convolutional}. pp. 253--263. 2017 AFNLP, Taipei, Taiwan (2017)
+
+ \bibitem
+ {Nguyen}, K.V., {Nguyen}, N.L.T., 2016, October. {Vietnamese transition-based dependency parsing with supertag features}. In 2016 Eighth International Conference on Knowledge and Systems Engineering (KSE) (pp. 175-180). IEEE.
+
+ \bibitem{nguyen2014treebank}
+ Nguyen, D.Q., Pham, S.B., Nguyen, P.T., Le Nguyen, M., et al.: From treebank conversion to automatic dependency parsing for vietnamese. In: International Conference on Applications of Natural Language to Data Bases/Information Systems. pp. 196–207. Springer (2014)
+
+ \bibitem{nguyen2016vietnamese}
+ Nguyen, K.V., Nguyen, N.L.T.: Vietnamese transition-based dependency parsing with supertag features. In: 2016 Eighth International Conference on Knowledge and Systems Engineering (KSE). pp. 175–180. IEEE (2016)
+
+ \bibitem{bach2018empirical}
+ Bach, N.X., Linh, N.D., Phuong, T.M.: An empirical study on pos tagging for vietnamese social media text. Computer Speech \& Language 50, 1–15 (2018)
+
+ \bibitem{nguyen2017word}
+ Nguyen, D.Q., Vu, T., Nguyen, D.Q., Dras, M., Johnson, M.: From word segmentation to pos tagging for vietnamese. arXiv preprint arXiv:1711.04951 (2017)
+
+ \bibitem{thao2007named}
+ Thao, P.T.X., Tri, T.Q., Dien, D., Collier, N.: Named entity recognition in vietnamese using classifier voting. ACM Transactions on Asian Language Information Processing (TALIP) 6(4), 1–18 (2007)
+
+ \bibitem{nguyen2016approach}
+ Nguyen, L.H., Dinh, D., Tran, P.: An approach to construct a named entity annotated english-vietnamese bilingual corpus. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 16(2), 1–17 (2016)
+
+ \bibitem{Nguyen_2009}
+ Nguyen, D.Q., Nguyen, D.Q., Pham, S.B.: A vietnamese question answering system. 2009 International Conference on Knowledge and Systems Engineering (Oct 2009). https://doi.org/10.1109/kse.2009.42, http://dx.doi.org/10.1109/KSE.2009.42
+
+ \bibitem{le2018factoid}
+ Le-Hong, P., Bui, D.T.: A factoid question answering system for vietnamese. In: Companion Proceedings of The Web Conference 2018. pp. 1049–1055 (2018)
+
+ \end{thebibliography}
references/2019.arxiv.ho/paper.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:583f61e2334e8547aba92a311850d5fb7b6dbac7301b0d9af9186c3ffb7aed60
+ size 365205
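The `paper.pdf` entry above is a Git LFS pointer file rather than the PDF itself: per the commit's git-lfs configuration for binaries, the repository stores only the spec version, a SHA-256 object id, and the byte size, while the actual file lives in LFS storage and is fetched by `git lfs pull`. A minimal sketch of parsing such a pointer with standard tools (the `pointer.txt` filename here is illustrative, not part of the commit):

```shell
# Write a pointer file in the Git LFS v1 format (contents copied from the diff above).
cat > pointer.txt <<'EOF'
version https://git-lfs.github.com/spec/v1
oid sha256:583f61e2334e8547aba92a311850d5fb7b6dbac7301b0d9af9186c3ffb7aed60
size 365205
EOF

# Extract the object id and size; git-lfs uses these to locate and
# verify the real binary when checking it out from the LFS remote.
oid=$(awk '$1 == "oid" {print $2}' pointer.txt)
size=$(awk '$1 == "size" {print $2}' pointer.txt)
echo "$oid $size"
```

After cloning this repository, `git lfs pull` replaces each such pointer with the binary whose SHA-256 matches the recorded `oid`.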
references/2019.arxiv.ho/paper.tex ADDED
@@ -0,0 +1,239 @@
+ \documentclass[runningheads]{llncs}
+ \usepackage{graphicx}
+ \usepackage{marvosym}
+ \usepackage{amsmath,amssymb,amsfonts}
+ \usepackage{algorithmic}
+ \usepackage{graphicx}
+ \usepackage{textcomp}
+ \usepackage[T5]{fontenc}
+ \usepackage[utf8]{inputenc}
+ \usepackage[vietnamese,english]{babel}
+ \usepackage{pifont}
+ \usepackage{float}
+ \usepackage{caption}
+ \usepackage{placeins}
+ \usepackage{array}
+ \newcolumntype{P}[1]{>{\centering\arraybackslash}p{#1}}
+ \newcolumntype{M}[1]{>{\centering\arraybackslash}m{#1}}
+ \usepackage{multirow}
+ \usepackage{hyperref}
+ \usepackage{amsmath}
+ \hypersetup{colorlinks, citecolor=blue, linkcolor=blue, urlcolor=blue}
+
+ \begin{document}
+ % \title{Emotion Recognition for Vietnamese Social Media Text}
+ \title{Emotion Recognition\\for Vietnamese Social Media Text}
+
+ \titlerunning{Emotion Recognition for Vietnamese Social Media Text}
+
+ \author{Vong Anh Ho\inst{1,4}\textsuperscript{(\Letter)} \and
+ Duong Huynh-Cong Nguyen\inst{1,4} \and
+ Danh Hoang Nguyen\inst{1,4} \and
+ \\Linh Thi-Van Pham\inst{2,4} \and
+ Duc-Vu Nguyen\inst{3,4} \and
+ \\Kiet Van Nguyen\inst{1,4} \and
+ Ngan Luu-Thuy Nguyen\inst{1,4}}
+
+ \authorrunning{Vong Anh Ho et al.}
+
+ \institute{University of Information Technology, VNU-HCM, Vietnam\\
+ \email{\{15521025,15520148,15520090\}@gm.uit.edu.vn, \{kietnv,ngannlt\}@uit.edu.vn}\\
+ \and
+ University of Social Sciences and Humanities, VNU-HCM, Vietnam\\
+ \email{vanlinhpham888@gmail.com}\\
+ \and
+ Multimedia Communications Laboratory, University of Information Technology, VNU-HCM, Vietnam\\
+ \email{vund@uit.edu.vn}\\
+ \and
+ Vietnam National University, Ho Chi Minh City, Vietnam}
+ \maketitle
+
+ \begin{abstract}
+
+ Emotion recognition or emotion prediction is a higher approach or a special case of sentiment analysis. In this task, the result is not produced in terms of either polarity: positive or negative or in the form of rating (from 1 to 5) but of a more detailed level of analysis in which the results are depicted in more expressions like sadness, enjoyment, anger, disgust, fear and surprise. Emotion recognition plays a critical role in measuring brand value of a product by recognizing specific emotions of customers' comments. In this study, we have achieved two targets. First and foremost, we built a standard \textbf{V}ietnamese \textbf{S}ocial \textbf{M}edia \textbf{E}motion \textbf{C}orpus (UIT-VSMEC) with exactly 6,927 emotion-annotated sentences, contributing to emotion recognition research in Vietnamese which is a low-resource language in natural language processing (NLP). Secondly, we assessed and measured machine learning and deep neural network models on our UIT-VSMEC corpus. As a result, the CNN model achieved the highest performance with the weighted F1-score of 59.74\%. Our corpus is available at our research website \footnote[1]{\url{ https://sites.google.com/uit.edu.vn/uit-nlp/corpora-projects}}.
+
+ \keywords{Emotion Recognition \and Emotion Prediction \and Vietnamese \and Machine Learning \and Deep Learning \and CNN \and LSTM \and SVM.}
+ \end{abstract}
+
+ \input{sections/1-introduction.tex}
+ \input{sections/2-relatedwork.tex}
+ \input{sections/3-corpus.tex}
+ \input{sections/4-method.tex}
+ \input{sections/5-experiments.tex}
+ \input{sections/6-conclusion.tex}
+
+ \section*{Acknowledgment}
+ We would like to give our thanks to the NLP@UIT research group and the Citynow-UIT Laboratory of the University of Information Technology - Vietnam National University Ho Chi Minh City for their supports with pragmatic and inspiring advice.
+
+ \bibliographystyle{splncs04}
+ % \bibliography{bibliography}
+ \begin{thebibliography}{10}
+ \providecommand{\url}[1]{\texttt{#1}}
+ \providecommand{\urlprefix}{URL }
+ \providecommand{\doi}[1]{https://doi.org/#1}
+
+ \bibitem{PlabanKumarBhowmick}
+ {Bhowmick}, P.K., {Basu}, A., {Mitra}, P.: {An Agreement Measure for
+ Determining Inter-Annotator Reliability of Human Judgements on Affective
+ Tex}. In: {Proceedings of the workshop on Human Judgements in Computational
+ Linguistics}. pp. 58--65. COLING 2008, Manchester, United Kingdom (2008)
+
+ \bibitem{Jointstockcompany}
+ company, J.S.: {The habit of using social networks of Vietnamese people 2018}.
+ brands vietnam, Ho Chi Minh City, Vietnam (2018)
+
+ \bibitem{Ekman1993}
+ {Ekman}, P.: In: {Facial expression and emotion}. vol.~48, pp. 384--392.
+ {American Psychologist} (1993)
+
+ \bibitem{Ekman}
+ {Ekman}, P.: In: {Emotions revealed: Recognizing faces and feelings to improve
+ communication and emotional life}. p.~2007. {Macmillan} (2012)
+
+ \bibitem{PaulEkman}
+ {Ekman}, P., {Ekman}, E., {Lama}, D.: In: {The Ekmans' Atlas of Emotion} (2018)
+
+ \bibitem{Kim}
+ {Kim}, Y.: {Convolutional Neural Networks for Sentence Classifications}. In:
+ {Proceedings of the 2014 Conference on Empirical Methods in Natural Language
+ Processing (EMNLP)}. pp. 1746--1751. { Association for Computational
+ Linguistics}, Doha, Qatar (2014)
+
+ \bibitem{Kiritchenko}
+ {Kiritchenko}, S., {Mohammad}, S.: {Using Hashtags to Capture Fine Emotion
+ Categories from Tweets}. In: {Computational Intelligence}. pp. 301--326
+ (2015)
+
+ \bibitem{RomanKlinger}
+ {Klinger}, R., {Clerc}, O.D., {Mohammad}, S.M., {Balahur}, A.: {IEST:WASSA-2018
+ Implicit Emotions Shared Task}. pp. 31--42. 2017 AFNLP, Brussels, Belgium
+ (2018)
+
+ \bibitem{BernhardKratzwald}
+ {Kratzwald}, B., {Ilic}, S., {Kraus}, M., S.~{Feuerriegel}, H.P.: {Decision
+ support with text-based emotion recognition: Deep learning for affective
+ computing}. pp. 24 -- 35. {Decision Support Systems} (2018)
+
+ \bibitem{SaifMohammad2017}
+ {Mohammad}, S., {Bravo-Marquez}, F.: {Emotion Intensities in Tweets}. In:
+ {Proceedings of the Sixth Joint Conference on Lexical and Computational
+ Semantics (*SEM)}. pp. 65--77. Association for Computational Linguistics,
+ Vancouver, Canada (2017)
+
+ \bibitem{Mohammad}
+ {Mohammad}, S.M.: {\#Emotional Tweets}. In: {First Joint Conference on Lexical
+ and Computational Semantics (*SEM)}. pp. 246--255. {Association for
+ Computational Linguistics}, Montreal, Canada (2012)
+
+ \bibitem{Mohammad2018}
+ {Mohammad}, S.M., {Bravo-Marquez}, F., {Salameh}, M., {Kiritchenko}, S.:
+ {SemEval-2018 task 1: Affect in tweets}. pp. 1--17. Proceedings of
+ International Workshop on Semantic Evaluation, New Orleans, Louisiana (2018)
+
+ \bibitem{SaifMohammad}
+ {Mohammad}, S.M., {Xiaodan}, Z., {Kiritchenko}, S., {Martin}, J.: {Sentiment,
+ emotion, purpose, and style in electoral tweets}. pp. 480--499. Information
+ Processing and Management: an International Journal (2015)
+
+ \bibitem{Nguyen}
+ Nguyen: {Vietnam has the 7th largest number of Facebook users in the world}.
+ Dan Tri newspaper (2018)
+
+ \bibitem{VLSPX}
+ {Nguyen}, H.T.M., {Nguyen}, H.V., {Ngo}, Q.T., {Vu}, L.X., {Tran}, V.M., {Ngo},
+ B.X., {Le}, C.A.: {VLSP Shared Task: Sentiment Analysis}. In: {Journal of
+ Computer Science and Cybernetics}. pp. 295--310 (2018)
+
+
+ \bibitem{KietVanNguyen}
+ {Nguyen}, K.V., {Nguyen}, V.D., {Nguyen}, P., {Truong}, T., {Nguyen}, N.L.T.:
+ {UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis}.
+ In: {2018 10th International Conference on Knowledge and Systems Engineering
+ (KSE)}. pp. 19--24. {IEEE}, Ho Chi Minh City, Vietnam (2018)
+
+ \bibitem{PhuNguyen}
+ {Nguyen}, P.X.V., {Truong}, T.T.H., {Nguyen}, K.V., {Nguyen}, N.L.T.: {Deep
+ Learning versus Traditional Classifiers on Vietnamese Students' Feedback
+ Corpus}. In: {2018 5th NAFOSTED Conference on Information and Computer
+ Science (NICS)}. pp. 75--80. Ho Chi Minh City, Vietnam (2018)
+
+ \bibitem{VuDucNguyen}
+ {Nguyen}, V.D., {Nguyen}, K.V., {Nguyen}, N.L.T.: {Variants of Long Short-Term
+ Memory for Sentiment Analysis on Vietnamese Students’ Feedback Corpus}. In:
+ {2018 10th International Conference on Knowledge and Systems Engineering
+ (KSE)}. pp. 306--311. IEEE, Ho Chi Minh City, Vietnam (2018)
+
+ \bibitem{AurelienGeron}
+ Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel,
+ O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J.,
+ Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.:
+ Scikit-learn: Machine learning in python. Journal of Machine Learning
+ Research \textbf{12}, 2825--2830 (2011)
+
+ \bibitem{CarloStrapparava}
+ {Strapparava}, C., {Mihalcea}, R.: {SemEval-2007 Task 14: Affective Text}. In:
+ {Proceedings of the 4th International Workshop on Semantic Evaluations
+ (SemEval-2007)}. pp. 70--74. { Association for Computational Linguistics},
+ Prague (2007)
+
+ \bibitem{TingweiWang}
+ {T. {Wang} and X. {Yang} and C. {Ouyang}}: {A Multi-emotion Classification
+ Method Based on BLSTM-MC in Code-Switching Text: 7th CCF International
+ Conference, NLPCC 2018, Hohhot, China, August 26–30, 2018, Proceedings,
+ Part II.} pp. 190--199. Natural Language Processing and Chinese Computing
+ (2018)
+
+ \bibitem{ZhongqingWang}
+ {Wang}, Z., {Li}, S.: {Overview of NLPCC 2018 Shared Task 1: Emotion Detection
+ in Code-Switching Text: 7th CCF International Conference, NLPCC 2018, Hohhot,
+ China, August 26–30, 2018, Proceedings, Part II}. pp. 429--433. Natural
+ Language Processing and Chinese Computing (2018)
+
+ \bibitem{Facial2007}
+ {Zhang}, S., {Wu}, Z., {Meng}, H., {Cai}, L.: Facial expression synthesis using
+ pad emotional parameters for a chinese expressive avatar. vol.~4738, pp.
+ 24--35 (09 2007)
+
+ \bibitem{YingjieZhang}
+ {Zhang}, Y., {Wallace}, B.C.: {A Sensitivity Analysis of (and Practitioners’
+ Guide to Convolutional}. pp. 253--263. 2017 AFNLP, Taipei, Taiwan (2017)
+
+ \bibitem
+ {Nguyen}, K.V., {Nguyen}, N.L.T., 2016, October. {Vietnamese transition-based dependency parsing with supertag features}. In 2016 Eighth International Conference on Knowledge and Systems Engineering (KSE) (pp. 175-180). IEEE.
+
+ \bibitem{nguyen2014treebank}
+ Nguyen, D.Q., Pham, S.B., Nguyen, P.T., Le Nguyen, M., et al.: From treebank conversion to automatic dependency parsing for vietnamese. In: International Conference on Applications of Natural Language to Data Bases/Information Systems. pp. 196–207. Springer (2014)
+
+ \bibitem{nguyen2016vietnamese}
+ Nguyen, K.V., Nguyen, N.L.T.: Vietnamese transition-based dependency parsing with supertag features. In: 2016 Eighth International Conference on Knowledge and Systems Engineering (KSE). pp. 175–180. IEEE (2016)
+
+
+ \bibitem{bach2018empirical}
+ Bach, N.X., Linh, N.D., Phuong, T.M.: An empirical study on pos tagging for vietnamese social media text. Computer Speech \& Language 50, 1–15 (2018)
+
+ \bibitem{nguyen2017word}
+ Nguyen, D.Q., Vu, T., Nguyen, D.Q., Dras, M., Johnson, M.: From word segmentation to pos tagging for vietnamese. arXiv preprint arXiv:1711.04951 (2017)
+
+ \bibitem{thao2007named}
+ Thao, P.T.X., Tri, T.Q., Dien, D., Collier, N.: Named entity recognition in vietnamese using classifier voting. ACM Transactions on Asian Language Information Processing (TALIP) 6(4), 1–18 (2007)
+
+ \bibitem{nguyen2016approach}
+ Nguyen, L.H., Dinh, D., Tran, P.: An approach to construct a named entity annotated english-vietnamese bilingual corpus. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 16(2), 1–17 (2016)
+
+
+ \bibitem{Nguyen_2009}
+ Nguyen, D.Q., Nguyen, D.Q., Pham, S.B.: A vietnamese question answering system. 2009 International Conference on Knowledge and Systems Engineering (Oct 2009). https://doi.org/10.1109/kse.2009.42, http://dx.doi.org/10.1109/KSE.2009.42
+
+ \bibitem{le2018factoid}
+ Le-Hong, P., Bui, D.T.: A factoid question answering system for vietnamese. In: Companion Proceedings of The Web Conference 2018. pp. 1049–1055 (2018)
+
+
+ \end{thebibliography}
+
+
+ \end{document}
references/2019.arxiv.ho/source/bibliography.bib ADDED
@@ -0,0 +1,289 @@
1
+ @InProceedings{Kiritchenko,
2
+ title = "{Using Hashtags to Capture Fine Emotion Categories from Tweets}",
3
+ author = {S. {Kiritchenko} and S. {Mohammad}},
4
+ booktitle = "{Computational Intelligence}",
5
+ year = {2015},
6
+ pages = {301-326},
7
+ }
8
+
9
+ @InProceedings{BernhardKratzwald,
10
+ title = "{Decision support with text-based emotion recognition: Deep learning for affective computing}",
11
+ author = { B. {Kratzwald} and S. {Ilic} and M. {Kraus} and S. {Feuerriegel}, H. {Prendinger}},
12
+ year = {2018},
13
+ publisher = "{Decision Support Systems}",
14
+ pages = {24 - 35},
15
+ }
16
+
17
+ @InProceedings{ApurbaPaul,
18
+ title = "{Identification and Classification of Emotional Key Phrases from Psychological texts.}",
19
+ author = "{A. {Paul} and D. {Das}}",
20
+ booktitle = "{Proceedings of the ACL 2015 Workshop on Novel Computational Approaches to Keyphrase Extraction}",
21
+ year = {2015},
22
+ publisher = "{Association for Computational Linguistics}",
23
+ address = {Beijing, China},
24
+ pages = {32 - 38},
25
+ }
26
+
27
+ @InProceedings{CarloStrapparava,
28
+ title = "{SemEval-2007 Task 14: Affective Text}",
29
+ author = {C. {Strapparava} and R. {Mihalcea}},
30
+ booktitle = "{Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval-2007)}",
31
+ year = {2007},
32
+ publisher = "{ Association for Computational Linguistics}",
33
+ address = {Prague},
34
+ pages = {70-74},
35
+ }
36
+ @InProceedings{tran2009,
37
+ author = {O. T. {Tran} and C. A. {Le} and T. Q. {Ha} and Q. H. {Le}},
38
+ booktitle = "{2009 International Conference on Asian Language Processing}",
39
+ title = "{An Experimental Study on Vietnamese POS Tagging}",
40
+ year = {2009},
41
+ pages = {23-27}
42
+ }
43
+
44
+ @InProceedings{Facial2007,
45
+ author = {S. {Zhang} and Z. {Wu} and H. {Meng} and L. {Cai}},
46
+ year = {2007},
47
+ month = {09},
48
+ pages = {24-35},
49
+ title = {Facial Expression Synthesis Using PAD Emotional Parameters for a Chinese Expressive Avatar},
50
+ volume = {4738}
51
+ }
52
+
53
+ @InProceedings{vncorenlp,
54
+ title = "{{V}n{C}ore{NLP}: A {V}ietnamese Natural Language Processing Toolkit}",
55
+ author = {T. {Vu} and Q. D. {Nguyen} and Q. D. {Nguyen} and M. {Dras} and M. {Johnson}},
56
+ booktitle = "{Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Demonstrations}",
57
+ year = {2018},
58
+ address = "{New Orleans, Louisiana}",
59
+ publisher = "{Association for Computational Linguistics}",
60
+ pages = {56-60}
61
+ }
62
+
63
+ @InProceedings{Ekman,
64
+ booktitle = "{Emotions revealed: Recognizing faces and feelings to improve communication and emotional life}",
65
+ author = {P. {Ekman}},
66
+ year = {2012},
67
+ publisher = "{Macmillan}",
68
+ pages = {2007},
69
+ }
70
+
71
+ @InProceedings{Ekman1993,
72
+ booktitle = "{Facial expression and emotion}",
73
+ author = {P. {Ekman}},
74
+ year = {1993},
75
+ publisher = "{American Psychologist}",
76
+ pages = {384-392},
77
+ volume = {48}
78
+ }
79
+
80
+ @InProceedings{VLSPX,
81
+ title = "{VLSP Shared Task: Sentiment Analysis}",
82
+ author = {H. T. M. {Nguyen} and H. V. {Nguyen} and Q. T. {Ngo} and L. X. {Vu} and V. M. {Tran} and B. X. {Ngo} and C. A. {Le}},
83
+ booktitle = "{Journal of Computer Science and Cybernetics}",
84
+ year = {2018},
85
+ pages = {295-310},
86
+ }
87
+
88
+ @InProceedings{KietVanNguyen,
89
+ title = "{UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis}",
90
+ author = {K. V. {Nguyen} and V. D. {Nguyen} and P. {Nguyen} and T. {Truong} and N. L. T. {Nguyen}},
91
+ booktitle = "{2018 10th International Conference on Knowledge and Systems Engineering (KSE)}",
92
+ year = {2018},
93
+ publisher = "{IEEE}",
94
+ pages = {19-24},
95
+ address = {Ho Chi Minh City, Vietnam},
96
+ }
97
+
98
+ @InProceedings{PhuNguyen,
99
+ title = "{Deep Learning versus Traditional Classifiers on Vietnamese Students' Feedback Corpus}",
100
+ author = { P. X. V. {Nguyen} and T. T. H. {Truong} and K. V. {Nguyen} and N. L. T. {Nguyen}},
101
+ booktitle = "{2018 5th NAFOSTED Conference on Information and Computer Science (NICS)}",
102
+ year = {2018},
103
+ pages = {75-80},
104
+ address = {Ho Chi Minh City, Vietnam},
105
+ }
106
+
107
+ @InProceedings{Kim,
108
+ title = "{Convolutional Neural Networks for Sentence Classifications}",
109
+ author = {Y. {Kim}},
110
+ booktitle = "{Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)}",
111
+ year = {2014},
112
+ publisher = "{ Association for Computational Linguistics}",
113
+ pages = {1746-1751},
114
+ address = {Doha, Qatar}
115
+ }
116
+
117
+ @InProceedings{Mohammad,
+ title = "{\#Emotional Tweets}",
+ author = {S. M. {Mohammad}},
+ booktitle = "{First Joint Conference on Lexical and Computational Semantics (*SEM)}",
+ year = {2012},
+ publisher = "{Association for Computational Linguistics}",
+ pages = {246-255},
+ address = {Montreal, Canada}
+ }
+
+ @InProceedings{PaulEkman,
+ booktitle = "{The Ekmans' Atlas of Emotion}",
+ author = {P. {Ekman} and E. {Ekman} and D. {Lama}},
+ year = {2018}
+ }
+
+ @InProceedings{PlabanKumarBhowmick,
+ title = "{An Agreement Measure for Determining Inter-Annotator Reliability of Human Judgements on Affective Text}",
+ author = {P. K. {Bhowmick} and A. {Basu} and P. {Mitra}},
+ booktitle = "{Proceedings of the Workshop on Human Judgements in Computational Linguistics}",
+ publisher = {COLING 2008},
+ year = {2008},
+ address = {Manchester, United Kingdom},
+ pages = {58-65},
+ }
+
+ @InProceedings{Nguyen,
+ title = "{Vietnam has the 7th largest number of Facebook users in the world}",
+ author = {Nguyen},
+ publisher = {Dan Tri newspaper},
+ year = {2018}
+ }
+
+ @Article{SaifMohammad,
+ title = "{Sentiment, emotion, purpose, and style in electoral tweets}",
+ author = {S. M. {Mohammad} and X. {Zhu} and S. {Kiritchenko} and J. {Martin}},
+ journal = {Information Processing and Management: an International Journal},
+ year = {2015},
+ pages = {480-499},
+ }
+
+ @InProceedings{Mohammad2018,
+ title = "{SemEval-2018 Task 1: Affect in Tweets}",
+ author = {S. M. {Mohammad} and F. {Bravo-Marquez} and M. {Salameh} and S. {Kiritchenko}},
+ booktitle = "{Proceedings of the 12th International Workshop on Semantic Evaluation}",
+ year = {2018},
+ pages = {1-17},
+ address = {New Orleans, Louisiana},
+ }
+
+ @InProceedings{SaifMohammad2017,
+ title = "{Emotion Intensities in Tweets}",
+ author = {S. {Mohammad} and F. {Bravo-Marquez}},
+ booktitle = "{Proceedings of the Sixth Joint Conference on Lexical and Computational Semantics (*SEM)}",
+ publisher = {Association for Computational Linguistics},
+ year = {2017},
+ pages = {65-77},
+ address = {Vancouver, Canada},
+ }
+
+ @InProceedings{smd1,
+ title = "{Social media data}",
+ author = {Science and Information Technology},
+ publisher = {Science and Information Technology - University of Information Technology},
+ year = {2016}
+ }
+
+ @InProceedings{TingweiWang,
+ title = "{A Multi-emotion Classification Method Based on BLSTM-MC in Code-Switching Text}",
+ author = {T. {Wang} and X. {Yang} and C. {Ouyang}},
+ booktitle = "{Natural Language Processing and Chinese Computing: 7th CCF International Conference, NLPCC 2018, Hohhot, China, August 26–30, 2018, Proceedings, Part II}",
+ year = {2018},
+ pages = {190-199},
+ }
+
+ @InProceedings{VenkateshDuppada,
+ title = "{SeerNet at SemEval-2018 Task 1: Domain Adaptation for Affect in Tweets}",
+ author = {V. {Duppada} and R. {Jain} and S. {Hiray}},
+ booktitle = "{Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018)}",
+ publisher = {Association for Computational Linguistics},
+ address = {New Orleans, Louisiana},
+ year = {2018},
+ pages = {18-23},
+ }
+ @Article{VoNgocPhu,
+ title = "{A Vietnamese adjective emotion dictionary based on exploitation of Vietnamese language characteristics}",
+ author = {P. N. {Vo} and C. T. {Vo} and T. T. {Vo} and D. D. {Nguyen}},
+ journal = {Artificial Intelligence Review},
+ year = {2018},
+ pages = {93-159},
+ }
+
+ @InProceedings{VuDucNguyen,
+ title = "{Variants of Long Short-Term Memory for Sentiment Analysis on Vietnamese Students’ Feedback Corpus}",
+ author = {V. D. {Nguyen} and K. V. {Nguyen} and N. L. T. {Nguyen}},
+ booktitle = "{2018 10th International Conference on Knowledge and Systems Engineering (KSE)}",
+ publisher = {IEEE},
+ address = {Ho Chi Minh City, Vietnam},
+ year = {2018},
+ pages = {306-311},
+ }
+ @InProceedings{Jointstockcompany,
+ title = "{The habit of using social networks of Vietnamese people 2018}",
+ author = {Joint Stock Company},
+ publisher = {Brands Vietnam},
+ address = {Ho Chi Minh City, Vietnam},
+ year = {2018}
+ }
+ @InProceedings{Yam,
+ title = "{Emotion Detection and Recognition from Text Using Deep Learning}",
+ author = {C. Y. {Yam}},
+ year = {2018},
+ publisher = {Developer blog}
+ }
+ @InProceedings{YingjieZhang,
+ title = "{A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification}",
+ author = {Y. {Zhang} and B. C. {Wallace}},
+ booktitle = "{Proceedings of the 8th International Joint Conference on Natural Language Processing}",
+ publisher = {Asian Federation of Natural Language Processing},
+ address = {Taipei, Taiwan},
+ year = {2017},
+ pages = {253-263},
+ }
+ @InProceedings{ZhongqingWang,
+ title = "{Overview of NLPCC 2018 Shared Task 1: Emotion Detection in Code-Switching Text}",
+ author = {Z. {Wang} and S. {Li}},
+ booktitle = "{Natural Language Processing and Chinese Computing: 7th CCF International Conference, NLPCC 2018, Hohhot, China, August 26–30, 2018, Proceedings, Part II}",
+ year = {2018},
+ pages = {429-433},
+ }
+ @InProceedings{RomanKlinger,
+ title = "{IEST: WASSA-2018 Implicit Emotions Shared Task}",
+ author = {R. {Klinger} and O. {De Clercq} and S. M. {Mohammad} and A. {Balahur}},
+ booktitle = "{Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis}",
+ publisher = {Association for Computational Linguistics},
+ address = {Brussels, Belgium},
+ year = {2018},
+ pages = {31-42},
+ }
+ @Article{joulin2016fasttext,
+ title = "{FastText.zip: Compressing text classification models}",
+ author = {Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Douze, Matthijs and J{\'e}gou, Herv{\'e} and Mikolov, Tomas},
+ journal = "{arXiv preprint arXiv:1612.03651}",
+ year = {2016},
+ }
+
+ @Article{AurelienGeron,
+ author = {Pedregosa, Fabian and Varoquaux, Ga\"{e}l and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and Vanderplas, Jake and Passos, Alexandre and Cournapeau, David and Brucher, Matthieu and Perrot, Matthieu and Duchesnay, \'{E}douard},
+ title = {Scikit-learn: Machine Learning in Python},
+ journal = {Journal of Machine Learning Research},
+ volume = {12},
+ year = {2011},
+ pages = {2825-2830}
+ }
+
+ @InProceedings{KietVanNguyen1,
+ title = "{UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis}",
+ author = {K. V. {Nguyen} and V. D. {Nguyen} and P. {Nguyen} and T. {Truong} and N. L. T. {Nguyen}},
+ booktitle = "{2018 10th International Conference on Knowledge and Systems Engineering (KSE)}",
+ year = {2018},
+ pages = {19-24},
+ address = {Ho Chi Minh City, Vietnam},
+ }
+
+ @InProceedings{nguyen2016,
+ title = {Vietnamese transition-based dependency parsing with supertag features},
+ author = {Nguyen, Kiet V and Nguyen, Ngan Luu-Thuy},
+ booktitle = {2016 Eighth International Conference on Knowledge and Systems Engineering (KSE)},
+ pages = {175--180},
+ year = {2016},
+ organization = {IEEE}
+ }
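Several entries in the bibliography above originally carried misspelled field names (e.g. `booktiltle` for `booktitle`), which BibTeX silently ignores, dropping the venue from the rendered reference. A quick sanity check along these lines can catch such slips before compiling; this is a minimal sketch, and the field whitelist is illustrative rather than the full BibTeX field vocabulary.

```python
import re

# Illustrative whitelist of common BibTeX field names (not exhaustive).
KNOWN_FIELDS = {
    "title", "author", "booktitle", "journal", "publisher", "year",
    "pages", "volume", "number", "address", "organization", "note",
}

def check_fields(bibtex: str) -> list:
    """Return (line_number, field_name) pairs for unrecognized field names."""
    suspects = []
    for lineno, line in enumerate(bibtex.splitlines(), start=1):
        # A field assignment looks like:  fieldname = {...} or "..."
        m = re.match(r"\s*([A-Za-z]+)\s*=", line)
        if m and m.group(1).lower() not in KNOWN_FIELDS:
            suspects.append((lineno, m.group(1)))
    return suspects

entry = """@InProceedings{Yam,
  title = "{Emotion Detection and Recognition from Text Using Deep Learning}",
  booktiltle = "{Developer blog}",
  year = {2018},
}"""
print(check_fields(entry))  # [(3, 'booktiltle')]
```

Because BibTeX drops unknown fields without warning, a check like this surfaces typos that would otherwise only show up as a subtly incomplete reference list.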
references/2019.arxiv.ho/source/images/DataProcessing.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c2e2acb142678c169cb7523e8b6e151b726648fa3d048c149a8594e1791189d
+ size 10809
references/2019.arxiv.ho/source/images/cnnmodel.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e96bee03b1dec742297dbdf7c3a9f1fb99386c3b13372549c9e4dba07a59e571
+ size 74015
references/2019.arxiv.ho/source/images/con_matrix.png ADDED
  • SHA256: a153882bb69258af0fbe136853563cec63d231b2b3b4a36a4f4d7f643db4b2b0
  • Pointer size: 131 Bytes
  • Size of remote file: 154 kB
references/2019.arxiv.ho/source/images/confusion_matrix.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:523a704f183b8ebe06afda81dee17f7570f4782fd9b0a1ebcdd0f0710923fcd7
+ size 21538
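The binary files added above are stored as Git LFS pointers: three-line text files with `version`, `oid`, and `size` fields, per the LFS pointer spec referenced on the `version` line. As a minimal sketch (the function name `parse_lfs_pointer` is my own, not part of any library), such a pointer can be split into its fields like this:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, oid, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", e.g. "size 10809".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value is prefixed with the hash algorithm, e.g. "sha256:<digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0c2e2acb142678c169cb7523e8b6e151b726648fa3d048c149a8594e1791189d
size 10809
"""
info = parse_lfs_pointer(pointer)
print(info["oid_algo"], info["size"])  # sha256 10809
```

The `size` field is the byte count of the actual object, which is why the diffs above show only three-line additions regardless of how large the PDFs themselves are.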