# LaTeX Importer
A complete LaTeX-to-MDX (Markdown + JSX) importer for Astro, with support for cross-references, interactive equations, and Astro components.
## Quick Start
```bash
# Complete LaTeX → MDX conversion with all features
node index.mjs
# For step-by-step debugging
node latex-converter.mjs  # LaTeX → Markdown
node mdx-converter.mjs    # Markdown → MDX
```
## Structure
```
latex-importer/
├── index.mjs                    # Complete LaTeX → MDX pipeline
├── latex-converter.mjs          # LaTeX → Markdown with Pandoc
├── mdx-converter.mjs            # Markdown → MDX with Astro components
├── reference-preprocessor.mjs   # LaTeX references cleanup
├── post-processor.mjs           # Markdown post-processing
├── bib-cleaner.mjs              # Bibliography cleaner
├── filters/
│   └── equation-ids.lua         # Pandoc filter for KaTeX equations
├── input/                       # LaTeX sources
│   ├── main.tex
│   ├── main.bib
│   └── sections/
└── output/                      # Results
    ├── main.md                  # Intermediate Markdown
    └── main.mdx                 # Final MDX for Astro
```
## Key Features
### **Smart References**
- **Invisible anchors**: Automatic conversion of `\label{}` to `<span id="..." style="position: absolute;"></span>`
- **Clean links**: Identifier cleanup (`:` → `-`, removing prefixes `sec:`, `fig:`, `eq:`)
- **Cross-references**: Full support for `\ref{}` with functional links
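
The reference handling listed above could look roughly like the sketch below. The function names (`cleanId`, `convertLabels`, `convertRefs`) and the choice of link text for `\ref{}` are assumptions made for illustration; the actual logic lives in `reference-preprocessor.mjs` and may differ.

```javascript
// Hypothetical helpers, not the actual reference-preprocessor.mjs code.

// Drop a leading sec:/fig:/eq: prefix, then turn remaining colons into hyphens.
function cleanId(label) {
  return label.replace(/^(sec|fig|eq):/, '').replace(/:/g, '-');
}

// \label{sec:intro} -> <span id="intro" style="position: absolute;"></span>
function convertLabels(tex) {
  return tex.replace(
    /\\label\{([^}]+)\}/g,
    (_, label) => `<span id="${cleanId(label)}" style="position: absolute;"></span>`
  );
}

// \ref{sec:intro} -> [intro](#intro); using the id as link text is an assumption.
function convertRefs(tex) {
  return tex.replace(/\\ref\{([^}]+)\}/g, (_, label) => {
    const id = cleanId(label);
    return `[${id}](#${id})`;
  });
}
```

Cleaning the identifier in one shared helper keeps anchors and links in sync: both sides of a reference go through the same `cleanId` call.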
### **Interactive Equations**
- **KaTeX IDs**: Conversion of `\label{eq:...}` to `\htmlId{id}{equation}`
- **Equation references**: Clickable links to mathematical equations
- **Advanced KaTeX support**: `trust: true` configuration for `\htmlId{}`
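
To illustrate the `\label{eq:...}` to `\htmlId{id}{equation}` rewrite, here is a minimal sketch in JavaScript. The project performs this step in a Pandoc Lua filter (`filters/equation-ids.lua`), so this is only an equivalent string-level illustration; the function name `addEquationId` is hypothetical.

```javascript
// Illustration only: the real transformation runs as a Lua filter on the Pandoc AST.
function addEquationId(mathSource) {
  const match = mathSource.match(/\\label\{([^}]+)\}/);
  if (!match) return mathSource; // unlabeled equations are left untouched

  // eq:energy -> energy; any remaining colons become hyphens
  const id = match[1].replace(/^eq:/, '').replace(/:/g, '-');

  // Remove the \label{} itself and wrap what is left in \htmlId{id}{...}
  const body = mathSource.replace(match[0], '').trim();
  return `\\htmlId{${id}}{${body}}`;
}
```

For example, `addEquationId('E = mc^2 \\label{eq:energy}')` yields `\htmlId{energy}{E = mc^2}`, which KaTeX renders with `id="energy"` when `trust` is enabled.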
### **Automatic Styling**
- **Highlights**: `\highlight{text}` → `<span class="highlight">text</span>`
- **Auto cleanup**: Removal of numbering `(1)`, `(2)`, etc.
- **Astro components**: Images → `Figure` with automatic imports
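
The highlight rule is the simplest of these transformations. A minimal sketch, assuming the argument of `\highlight{}` contains no nested braces (the function name `convertHighlights` is hypothetical):

```javascript
// Hypothetical helper: rewrites \highlight{text} into a CSS-classed span.
// Arguments containing nested braces are deliberately left unchanged.
function convertHighlights(tex) {
  return tex.replace(/\\highlight\{([^{}]*)\}/g, '<span class="highlight">$1</span>');
}
```

Handling nested commands inside the argument would need a brace-matching scanner rather than a single regular expression.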
### **Robust Pipeline**
- **LaTeX preprocessor**: Reference cleanup before Pandoc
- **Lua filter**: Equation processing in Pandoc AST
- **Post-processor**: Markdown cleanup and optimization
- **MDX converter**: Final transformation with Astro components
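
One way to picture how these stages chain together is as a list of string-to-string functions applied in order. The sketch below is an assumption about the overall shape, not the contents of `index.mjs`; every stage body is a placeholder, and the Pandoc stage in particular is stubbed out because the real one calls the external `pandoc` binary.

```javascript
// Run each stage in order, feeding the output of one into the next.
function runPipeline(source, stages) {
  let text = source;
  for (const { name, run } of stages) {
    text = run(text);
    if (typeof text !== 'string') {
      throw new Error(`Stage "${name}" did not return a string`);
    }
  }
  return text;
}

// Placeholder stages (hypothetical); each stands in for one real module.
const stages = [
  { name: 'preprocess', run: (tex) => tex.replace(/\\label\{sec:/g, '\\label{') },
  { name: 'pandoc', run: (tex) => tex }, // stub: the real stage shells out to Pandoc
  { name: 'post-process', run: (md) => md.trim() },
  { name: 'mdx', run: (md) => `---\ntitle: "Untitled"\n---\n${md}\n` },
];
```

Keeping each stage a pure function of its input text makes it possible to run and inspect them one at a time, which matches the step-by-step debugging commands in Quick Start.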
## Example Workflow
```bash
# 1. Prepare LaTeX sources
cp my-paper/* input/
# 2. Complete automatic conversion
node index.mjs
# 3. Generated results
ls output/
# → main.md (Intermediate Markdown)
# → main.mdx (Final MDX for Astro)
# → assets/image/ (extracted images)
```
### Conversion Result
The pipeline generates an MDX file optimized for Astro with:
```mdx
---
title: "Your Article Title"
description: "Generated from LaTeX"
---

import Figure from '../components/Figure.astro';
import figure1 from '../assets/image/figure1.png';

## Section with invisible anchor

<span id="introduction" style="position: absolute;"></span>

Here is some text with <span class="highlight">highlighted words</span>.

Reference to an interactive [equation](#equation-name).

Equation with KaTeX ID:

$$\htmlId{equation-name}{E = mc^2}$$

<Figure src={figure1} alt="Description" />
```
## Required Astro Configuration
To use equations with IDs, add to `astro.config.mjs`:
```javascript
import { defineConfig } from 'astro/config';
import rehypeKatex from 'rehype-katex';

export default defineConfig({
  markdown: {
    rehypePlugins: [
      // trust: true is required: KaTeX disables \htmlId{} by default
      [rehypeKatex, { trust: true }],
    ],
  },
});
```
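
If enabling `trust` globally feels too permissive, KaTeX also accepts a function for the `trust` option, which is called with a context object describing the command being rendered. The sketch below assumes that `rehype-katex` forwards this option to KaTeX unchanged and that the context for `\htmlId` carries `command: '\\htmlId'`, as the KaTeX options documentation describes. The names `trustHtmlIdOnly` and `katexOptions` are hypothetical.

```javascript
// Allow only \htmlId; other trust-gated commands (\href, \includegraphics, ...)
// stay blocked, as they are by default.
function trustHtmlIdOnly(context) {
  return context.command === '\\htmlId';
}

// Would be passed in place of { trust: true } in the rehype-katex options.
const katexOptions = { trust: trustHtmlIdOnly };
```

This keeps equation anchors working while avoiding a blanket opt-in to every command KaTeX treats as untrusted.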
## Prerequisites
- **Node.js** with ESM support
- **Pandoc** (`brew install pandoc`)
- **Astro** to use the generated MDX
## Technical Architecture
### 4-Stage Pipeline
1. **LaTeX Preprocessing** (`reference-preprocessor.mjs`)
   - Cleanup of `\label{}` and `\ref{}`
   - Conversion `\highlight{}` → CSS spans
   - Removal of prefixes and problematic characters
2. **Pandoc + Lua Filter** (`equation-ids.lua`)
   - LaTeX → Markdown conversion with `gfm+tex_math_dollars+raw_html`
   - Equation processing: `\label{eq:name}` → `\htmlId{name}{equation}`
   - Automatic image extraction
3. **Markdown Post-processing** (`post-processor.mjs`)
   - KaTeX, Unicode, grouping commands cleanup
   - Attribute correction with `:`
   - Code snippet injection
4. **MDX Conversion** (`mdx-converter.mjs`)
   - Images transformation → `Figure`
   - HTML span escaping correction
   - Automatic imports generation
   - MDX frontmatter
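
As an illustration of stage 4, the sketch below rewrites Markdown images into `Figure` components and collects the matching import lines. It is a simplified assumption about what `mdx-converter.mjs` does: the function name `convertImages` is hypothetical, image paths are used as-is, and two files with the same base name in different directories would collide.

```javascript
// Hypothetical sketch of the image -> Figure step, not the actual converter.
function convertImages(markdown) {
  const imports = new Map(); // image path -> JS identifier

  const body = markdown.replace(/!\[([^\]]*)\]\(([^)\s]+)\)/g, (_, alt, path) => {
    if (!imports.has(path)) {
      // figure1.png -> figure1; sanitize into a valid identifier
      const base = path.split('/').pop().replace(/\.[^.]+$/, '');
      const ident = base.replace(/[^A-Za-z0-9_$]/g, '_').replace(/^(\d)/, '_$1');
      imports.set(path, ident);
    }
    const safeAlt = alt.replace(/"/g, '&quot;');
    return `<Figure src={${imports.get(path)}} alt="${safeAlt}" />`;
  });

  if (imports.size === 0) return markdown; // nothing to import

  const header = [
    "import Figure from '../components/Figure.astro';",
    ...[...imports].map(([path, ident]) => `import ${ident} from '${path}';`),
  ].join('\n');

  // MDX needs a blank line between the import block and the content.
  return `${header}\n\n${body}`;
}
```

Tracking imports in a `Map` means an image used several times is imported only once.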
## Conversion Statistics
For a typical scientific document:
- **87 labels** detected and processed
- **48 invisible anchors** created
- **13 highlight spans** with CSS class
- **4 equations** with `\htmlId{}` KaTeX
- **40 images** converted to components
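
Counts like these can be gathered by scanning the generated MDX for the markers each stage emits. The sketch below is one possible way to do that, not the project's own reporting code; `countMatches` and `collectStats` are hypothetical names, and the label count is omitted because it comes from the LaTeX source rather than the output.

```javascript
// Count non-overlapping matches of a global regular expression.
function countMatches(text, pattern) {
  return (text.match(pattern) || []).length;
}

// Hypothetical helper: tally the markers produced by the conversion.
function collectStats(mdx) {
  return {
    anchors: countMatches(mdx, /<span id="[^"]*" style="position: absolute;"><\/span>/g),
    highlights: countMatches(mdx, /<span class="highlight">/g),
    equationIds: countMatches(mdx, /\\htmlId\{/g),
    figures: countMatches(mdx, /<Figure\b/g),
  };
}
```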
## Project Status
### **Complete Features**
- **LaTeX → MDX Pipeline**: Full end-to-end functional conversion
- **Cross-document references**: Perfectly functional internal links
- **Interactive equations**: KaTeX support with clickable IDs
- **Automatic styling**: Highlights and Astro components
- **Robustness**: Automatic cleanup of all escaping
- **Optimization**: Clean code without unnecessary elements
### **Production Ready**
The toolkit is now **100% operational** for converting complex scientific LaTeX documents to MDX/Astro with all advanced features (references, interactive equations, styling).