---
license: apache-2.0
language:
- en
- fr
- de
- es
pipeline_tag: token-classification
---
**BibTexer** is a specialized language model trained by PleIAs for the structured extraction of bibliographies in BibTeX format.

BibTexer acts like a reversed Zotero: given an unstructured list of references, the model returns a series of BibTeX entries that can be loaded into any bibliographic database.

Like all models from PleIAs' **Bad Data Toolbox**, BibTexer has been deliberately trained on diverse and challenging data sources, covering nearly all the styles featured on Zotero, as well as examples of broken text sources (line breaks, digitization artifacts).

BibTexer has been trained on multilingual citation styles and formats and should work correctly on most European languages.

Along with Segmentext and Bibstyle-Detector, BibTexer can be tested on the [Reversed-Zotero](https://huggingface.co/spaces/PleIAs/Reversed-Zotero) space.

## Example

This copy-pasted list of unstructured references includes unwelcome line breaks as well as a heading ("References") that is not part of the reference list itself:

> References
> 1. Postigo JAR. Leishmaniasis in the World Health Organization Eastern Mediterranean Region Int J Antimicrob Agents. 2010;36:S62-5.
> 2. Alvar J, Vélez ID, Bern C, Herrero M, Desjeux P, Cano J, et al. Leishmaniasis worldwide and global estimates of its incidence. PLoS One.
> 2012;7:35671.
> 3. World Health Organization. Control of the leishmaniases. World Health Organ Tech Rep Ser.
> 2010;7-8:1-186.
> 4. Wallace MR, Hale BR, Utz CC, Olson PE, Earhart KC, Thornton SA, et al. Endemic infectious diseases of Afghanistan. Clin Infect Dis. 2002;34:171-207.
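
To make the two defects in this input concrete, here is a minimal rule-based sketch in Python. It is only a naive baseline for numbered lists like the one above and is not how Segmentext works; the helper name `merge_numbered_references` is hypothetical. It drops any text before the first numbered item and glues continuation lines back onto the reference they belong to. It would fail on unnumbered styles, or on a continuation line that happens to start with a number followed by a period, which is the kind of case a trained model is meant to handle.

```python
import re

# Hypothetical helper: a naive baseline, NOT Segmentext's actual method.
# A new reference is assumed to start with "<number>. " at the start of a line.
NUMBERED = re.compile(r"^\s*\d+\.\s+")


def merge_numbered_references(text: str) -> list[str]:
    """Rejoin wrapped lines of a numbered reference list."""
    references: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if NUMBERED.match(line):
            # Start of a new reference: strip the leading "1. " marker.
            references.append(NUMBERED.sub("", line, count=1))
        elif references:
            # Continuation of the previous reference (an unwanted line break).
            references[-1] += " " + line
        # Lines before the first numbered item (e.g. "References") are dropped.
    return references
```

Applied to the list above, this yields four strings, with `2012;7:35671.` and `2010;7-8:1-186.` reattached to references 2 and 3.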

After the input has been segmented by Segmentext, BibTexer returns the following list of references:

```
@article{postigo2010,
  author = {Postigo JAR},
  title = {Leishmaniasis in the World Health Organization Eastern Mediterranean Region},
  journal = {Int J Antimicrob Agents},
  year = {2010},
  volume = {36},
  pages = {62-5}
}

@article{alvar2012,
  author = {Alvar J Vélez ID Bern C Herrero M Desjeux P Cano J et al},
  title = {Leishmaniasis worldwide and global estimates of its incidence},
  journal = {PLoS One},
  year = {2012},
  volume = {7},
  pages = {35671}
}

@article{world2010,
  author = {World Health Organization},
  title = {Control of the leishmaniases},
  journal = {World Health Organ Tech Rep Ser},
  year = {2010},
  volume = {78},
  pages = {1-186}
}

@article{wallace2002,
  author = {Wallace MR Hale BR Utz CC Olson PE Earhart KC Thornton SA et al},
  title = {Endemic infectious diseases of Afghanistan},
  journal = {Clin Infect Dis},
  year = {2002},
  volume = {34},
  pages = {171-207}
}
```
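
Before importing the output elsewhere, it can be useful to check it programmatically. The following is a minimal sketch using only the Python standard library; the function name `parse_flat_bibtex` is hypothetical and not part of BibTexer. It assumes the flat layout shown above: one `field = {value}` pair per line, no nested braces, and a closing `}` on its own line. For anything more complex, a dedicated BibTeX parser would be the safer choice.

```python
import re

# Assumes the flat layout shown above: "@type{key," on the first line,
# one "field = {value}" pair per line, and "}" alone on the last line.
ENTRY = re.compile(r"@(\w+)\{([^,\s]+),(.*?)\n\}", re.DOTALL)
FIELD = re.compile(r"(\w+)\s*=\s*\{(.*?)\}\s*,?\s*$", re.MULTILINE)


def parse_flat_bibtex(text: str) -> list[dict[str, str]]:
    """Parse simple BibTeX entries into dictionaries (hypothetical helper)."""
    entries = []
    for entry_type, key, body in ENTRY.findall(text):
        record = {"type": entry_type, "key": key}
        # Each field line becomes one dictionary item.
        record.update({name: value for name, value in FIELD.findall(body)})
        entries.append(record)
    return entries
```

Comparing the parsed fields with the source text is a quick way to spot extraction slips such as a missing page prefix or a collapsed issue range.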

The references can then be imported directly into Zotero:

![Export Zotero](https://huggingface.co/PleIAs/BibTexer/resolve/main/zotero_export.jpg)