# LaTeX Importer
Complete LaTeX to MDX (Markdown + JSX) importer optimized for Astro with advanced support for references, interactive equations, and components.
## 🚀 Quick Start
```bash
# Complete LaTeX → MDX conversion with all features
node index.mjs
# For step-by-step debugging
node latex-converter.mjs  # LaTeX → Markdown
node mdx-converter.mjs    # Markdown → MDX
```
## 📁 Structure
```
latex-importer/
├── index.mjs                    # Complete LaTeX → MDX pipeline
├── latex-converter.mjs          # LaTeX → Markdown with Pandoc
├── mdx-converter.mjs            # Markdown → MDX with Astro components
├── reference-preprocessor.mjs   # LaTeX references cleanup
├── post-processor.mjs           # Markdown post-processing
├── bib-cleaner.mjs              # Bibliography cleaner
├── filters/
│   └── equation-ids.lua         # Pandoc filter for KaTeX equations
├── input/                       # LaTeX sources
│   ├── main.tex
│   ├── main.bib
│   └── sections/
└── output/                      # Results
    ├── main.md                  # Intermediate Markdown
    └── main.mdx                 # Final MDX for Astro
```
## ✨ Key Features
### 🎯 **Smart References**
- **Invisible anchors**: Automatic conversion of `\label{}` to `<span id="..." style="position: absolute;"></span>`
- **Clean links**: Identifier cleanup (`:` → `-`, removal of the `sec:`, `fig:`, `eq:` prefixes)
- **Cross-references**: Full support for `\ref{}` with functional links
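
The reference cleanup can be sketched as plain string rewriting. This is a hypothetical sketch, not the code of `reference-preprocessor.mjs`: the helper names `cleanId`, `preprocessReferences`, and `labelToAnchor` are made up for illustration, it assumes label identifiers contain no nested braces, and it strips only the first matching prefix before replacing any remaining colons.

```javascript
// Hypothetical sketch of the reference cleanup (not the project's actual code).
// Assumes label identifiers contain no nested braces.
const PREFIXES = ['sec:', 'fig:', 'eq:'];

// Drop one known prefix, then replace any remaining ':' with '-'.
function cleanId(id) {
  let out = id.trim();
  for (const prefix of PREFIXES) {
    if (out.startsWith(prefix)) {
      out = out.slice(prefix.length);
      break;
    }
  }
  return out.replace(/:/g, '-');
}

// Rewrite the identifiers inside \label{...}, \ref{...} and \eqref{...}.
function preprocessReferences(tex) {
  return tex.replace(/\\(label|ref|eqref)\{([^{}]*)\}/g,
    (_, cmd, id) => `\\${cmd}{${cleanId(id)}}`);
}

// Build the invisible anchor that stands in for a label in the output.
function labelToAnchor(id) {
  return `<span id="${cleanId(id)}" style="position: absolute;"></span>`;
}
```

For example, `cleanId('sec:intro:goals')` gives `intro-goals`, so a `\ref{}` and its `\label{}` still point at the same cleaned identifier.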
### 🧮 **Interactive Equations**
- **KaTeX IDs**: Conversion of `\label{eq:...}` to `\htmlId{id}{equation}`
- **Equation references**: Clickable links to mathematical equations
- **Advanced KaTeX support**: Relies on KaTeX's `trust: true` option, which `\htmlId{}` requires
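
The equation step is implemented as a Pandoc Lua filter (`filters/equation-ids.lua`) that works on the Pandoc AST. Purely to show the shape of the transformation, here is a JavaScript sketch that applies it to a raw equation string; `labelEquation` is a hypothetical name, and the sketch assumes at most one `\label{eq:...}` per equation.

```javascript
// Illustrative sketch only: the project does this in a Pandoc Lua filter.
// This version rewrites a raw equation string.
// Assumes at most one \label{eq:...} per equation, with no nested braces.
function labelEquation(math) {
  const match = math.match(/\\label\{eq:([^{}]*)\}/);
  if (!match) return math; // no equation label: leave untouched
  const id = match[1].trim().replace(/:/g, '-');
  const body = math.replace(match[0], '').trim();
  // \htmlId{id}{body} is a KaTeX extension that only renders with trust: true.
  return `\\htmlId{${id}}{${body}}`;
}
```

With this sketch, `E = mc^2 \label{eq:energy}` becomes `\htmlId{energy}{E = mc^2}`, matching the form shown in the conversion result.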
### 🎨 **Automatic Styling**
- **Highlights**: `\highlight{text}` → `<span class="highlight">text</span>`
- **Auto cleanup**: Removal of numbering `(1)`, `(2)`, etc.
- **Astro components**: Images → `Figure` components with automatic imports
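
A brace-matching pass is one way to convert `\highlight{}` without breaking on nested groups such as `\highlight{a {b} c}`. Hypothetical sketch: `convertHighlights` is an illustrative name, it operates on raw text, and escaped braces (`\{`, `\}`) are not handled.

```javascript
// Hypothetical sketch: replace \highlight{...} with a highlight span,
// balancing braces so nested groups stay inside the span.
// Escaped braces (\{ and \}) are NOT handled.
function convertHighlights(text) {
  const marker = '\\highlight{';
  let out = '';
  let i = 0;
  let start;
  while ((start = text.indexOf(marker, i)) !== -1) {
    out += text.slice(i, start);
    // Walk forward from the opening brace until braces balance.
    let depth = 1;
    let j = start + marker.length;
    while (j < text.length && depth > 0) {
      if (text[j] === '{') depth++;
      else if (text[j] === '}') depth--;
      j++;
    }
    if (depth !== 0) {
      // Unbalanced braces: keep the remainder unchanged.
      return out + text.slice(start);
    }
    const inner = text.slice(start + marker.length, j - 1);
    // Recurse so highlights nested inside a highlight are converted too.
    out += `<span class="highlight">${convertHighlights(inner)}</span>`;
    i = j;
  }
  return out + text.slice(i);
}
```

Unbalanced input is left as-is rather than guessed at, so a malformed command stays visible in the output.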
### 🔧 **Robust Pipeline**
- **LaTeX preprocessor**: Reference cleanup before Pandoc
- **Lua filter**: Equation processing in Pandoc AST
- **Post-processor**: Markdown cleanup and optimization
- **MDX converter**: Final transformation with Astro components
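
The pipeline above is a chain of text-to-text stages. A minimal sketch of that shape, assuming each stage can be modelled as a synchronous function; `runPipeline` and `toyStages` are hypothetical, the stage bodies are toy placeholders rather than the behaviour of the real scripts, and the Pandoc call (an external process) is stubbed out.

```javascript
// Hypothetical sketch of the pipeline shape: each stage maps text to text
// and stages run in order. The real stages are separate .mjs files, and the
// Pandoc stage runs an external process, which is omitted here.
function runPipeline(input, stages) {
  return stages.reduce((text, { name, run }) => {
    const output = run(text);
    if (typeof output !== 'string') {
      throw new TypeError(`Stage "${name}" must return a string`);
    }
    return output;
  }, input);
}

// Toy stand-ins for the four stages, for illustration only.
const toyStages = [
  { name: 'latex-preprocessing', run: (s) => s.replace(/\\label\{sec:/g, '\\label{') },
  { name: 'pandoc', run: (s) => s }, // placeholder for Pandoc + Lua filter
  { name: 'post-processing', run: (s) => s.replace(/\n{3,}/g, '\n\n') },
  { name: 'mdx-conversion', run: (s) => `---\ntitle: "Untitled"\n---\n\n${s}` },
];
```

Checking each stage's return type makes a broken stage fail loudly by name instead of passing `undefined` downstream.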
## 📊 Example Workflow
```bash
# 1. Prepare LaTeX sources
cp my-paper/* input/
# 2. Complete automatic conversion
node index.mjs
# 3. Generated results
ls output/
# → main.md (Intermediate Markdown)
# → main.mdx (Final MDX for Astro)
# → assets/image/ (extracted images)
```
### 📋 Conversion Result
The pipeline generates an MDX file optimized for Astro with:
```mdx
---
title: "Your Article Title"
description: "Generated from LaTeX"
---

import Figure from '../components/Figure.astro';
import figure1 from '../assets/image/figure1.png';

## Section with invisible anchor

<span id="introduction" style="position: absolute;"></span>

Here is some text with <span class="highlight">highlighted words</span>.

Reference to an interactive [equation](#equation-name).

Equation with KaTeX ID:

$$\htmlId{equation-name}{E = mc^2}$$

<Figure src={figure1} alt="Description" />
```
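
The image-to-`Figure` rewrite and its generated imports could work roughly as follows. Hypothetical sketch: `imagesToFigures` and its `assetPrefix` parameter are illustrative, import names are derived from file names (assumed unique and not JavaScript reserved words), and Pandoc attribute blocks such as `{width=...}` are not handled.

```javascript
// Hypothetical sketch of the image -> <Figure> conversion with generated imports.
// Assumes images appear as ![alt](path) and file names are unique.
function imagesToFigures(markdown, assetPrefix = '../') {
  const imports = new Map(); // image path -> import identifier
  const body = markdown.replace(/!\[([^\]]*)\]\(([^)\s]+)\)/g, (_, alt, path) => {
    if (!imports.has(path)) {
      // "assets/image/figure-1.png" -> "figure_1" (a valid JS identifier).
      const base = path.split('/').pop().replace(/\.[^.]+$/, '');
      let name = base.replace(/[^A-Za-z0-9_$]/g, '_');
      if (/^[0-9]/.test(name)) name = `img_${name}`;
      imports.set(path, name);
    }
    // Swap double quotes for single quotes to keep the attribute valid.
    const safeAlt = alt.replace(/"/g, "'");
    return `<Figure src={${imports.get(path)}} alt="${safeAlt}" />`;
  });
  if (imports.size === 0) return body; // no images: no imports needed
  const header = [
    "import Figure from '../components/Figure.astro';",
    ...[...imports].map(([path, name]) => `import ${name} from '${assetPrefix}${path}';`),
  ];
  return `${header.join('\n')}\n\n${body}`;
}
```

The imports are followed by a blank line because MDX treats a run of `import` lines as one ESM block that ends at the next blank line.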
## ⚙️ Required Astro Configuration
To use equations with IDs, add to `astro.config.mjs`:
```javascript
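import { defineConfig } from 'astro/config';
import remarkMath from 'remark-math';
import rehypeKatex from 'rehype-katex';

export default defineConfig({
  markdown: {
    // remark-math parses $...$ and $$...$$ into math nodes for rehype-katex
    remarkPlugins: [remarkMath],
    rehypePlugins: [
      [rehypeKatex, { trust: true }], // ← Important for \htmlId{}
    ],
  },
});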
```
## 🛠️ Prerequisites
- **Node.js** with ESM support
- **Pandoc** (e.g. `brew install pandoc` on macOS)
- **Astro** to use the generated MDX
## 🎯 Technical Architecture
### 4-Stage Pipeline
1. **LaTeX Preprocessing** (`reference-preprocessor.mjs`)
   - Cleanup of `\label{}` and `\ref{}`
   - Conversion of `\highlight{}` to CSS spans
   - Removal of prefixes and problematic characters
2. **Pandoc + Lua Filter** (`equation-ids.lua`)
   - LaTeX → Markdown conversion with `gfm+tex_math_dollars+raw_html`
   - Equation processing: `\label{eq:name}` → `\htmlId{name}{equation}`
   - Automatic image extraction
3. **Markdown Post-processing** (`post-processor.mjs`)
   - Cleanup of KaTeX, Unicode, and grouping-command issues
   - Correction of attributes containing `:`
   - Code snippet injection
4. **MDX Conversion** (`mdx-converter.mjs`)
   - Transformation of images into `Figure` components
   - Correction of escaped HTML spans
   - Automatic generation of imports
   - MDX frontmatter generation
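
For the frontmatter step, one safe approach is to write every value as a JSON string, since YAML accepts JSON-style double-quoted strings; titles containing quotes, colons, or backslashes then stay valid. A sketch with a hypothetical `buildFrontmatter` helper, using the `title` and `description` fields from the example output:

```javascript
// Hypothetical sketch of frontmatter generation for the final MDX file.
// JSON-encoded strings are valid YAML double-quoted scalars.
// Assumes keys are simple identifiers; null/undefined values are skipped.
function buildFrontmatter(fields) {
  const lines = Object.entries(fields)
    .filter(([, value]) => value !== undefined && value !== null)
    .map(([key, value]) => `${key}: ${JSON.stringify(String(value))}`);
  return ['---', ...lines, '---', ''].join('\n');
}
```

For instance, `buildFrontmatter({ title: 'Your Article Title', description: 'Generated from LaTeX' })` yields the same three-line frontmatter block shown in the conversion result.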
## 📊 Conversion Statistics
For a typical scientific document:
- **87 labels** detected and processed
- **48 invisible anchors** created
- **13 highlight spans** with CSS class
- **4 equations** with `\htmlId{}` KaTeX
- **40 images** converted to components
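
Counts like these can be collected with a few regular expressions over the LaTeX source and the generated MDX. Hypothetical sketch: `conversionStats` is an illustrative name, and the patterns match the output shapes shown in this README and are approximate (for example, commented-out `\label{}` lines in the LaTeX source are still counted).

```javascript
// Hypothetical sketch: count conversion artefacts with regular expressions.
// Patterns follow the output shapes shown in this README; they are approximate.
function countMatches(text, regex) {
  return (text.match(regex) || []).length;
}

function conversionStats(latexSource, mdxOutput) {
  return {
    labels: countMatches(latexSource, /\\label\{/g),
    anchors: countMatches(mdxOutput, /<span id="[^"]*" style="position: absolute;"><\/span>/g),
    highlights: countMatches(mdxOutput, /<span class="highlight">/g),
    equationIds: countMatches(mdxOutput, /\\htmlId\{/g),
    figures: countMatches(mdxOutput, /<Figure\b/g),
  };
}
```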
## ✅ Project Status
### 🎉 **Complete Features**
- ✅ **LaTeX → MDX Pipeline**: Fully functional end-to-end conversion
- ✅ **Cross-references**: Fully functional internal links
- ✅ **Interactive equations**: KaTeX support with clickable IDs
- ✅ **Automatic styling**: Highlights and Astro components
- ✅ **Robustness**: Automatic cleanup of escaping artifacts
- ✅ **Optimization**: Clean code without unnecessary elements
### 🚀 **Production Ready**
The toolkit is now **100% operational** for converting complex scientific LaTeX documents to MDX/Astro with all advanced features (references, interactive equations, styling).