kostyabuh21/DistilBART_forLaTeX


kostyabuh21 committed on Jun 9, 2024

Commit 2c05fa2 · verified · 1 Parent(s): f94fa54

Update README.md

Files changed (1)

README.md +53 -3

README.md CHANGED

@@ -1,3 +1,53 @@
----
-license: mit
----

+---
+license: mit
+language:
+- ru
+library_name: transformers
+pipeline_tag: text2text-generation
+tags:
+- math
+- normalization
+---
+### Description:
+A model for style transfer and markup restoration that converts educational mathematical texts into LaTeX.
+The model is a version of [sshleifer/distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6) fine-tuned on a translated and augmented version of the "[Mathematics Stack Exchange API Q&A Data](https://zenodo.org/records/1414384)" dataset.
+Usage example:
+---
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+from IPython.display import display, Latex
+
+# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
+model_dir = "kostyabuh21/DistilBART_forLaTeX"
+model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
+tokenizer = AutoTokenizer.from_pretrained(model_dir)
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+model.to(device)
+
+def get_latex(text):
+    """Generate LaTeX for a plain-text math phrase, then render and print it."""
+    inputs = tokenizer(text, return_tensors='pt').to(device)
+    with torch.no_grad():
+        hypotheses = model.generate(
+            **inputs,
+            do_sample=True,
+            top_p=0.95,
+            num_return_sequences=1,
+            repetition_penalty=1.2,
+            max_length=len(text),  # output token cap set to the input's character count
+            temperature=0.6,
+            min_length=10,
+            length_penalty=1.0,
+            no_repeat_ngram_size=2
+        )
+    for h in hypotheses:
+        latex = tokenizer.decode(h, skip_special_tokens=True)
+        display(Latex(latex))  # rendered output (Jupyter/IPython)
+        print(latex)           # raw LaTeX source
+
+# Russian input: "integral from 3 to 5 over x dx"
+text = 'интеграл от 3 до 5 по икс dx'
+get_latex(text)
+```
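
The usage example in the README sets `max_length=len(text)`, which limits the number of output tokens by the number of input characters. A minimal sketch of one alternative is shown below: deriving the cap from the input's token count and passing it as `max_new_tokens`. The helper name `generation_kwargs` and the `expansion`, `floor`, and `ceiling` values are hypothetical choices for illustration and do not come from the model card; the sampling settings are copied from the README example.

```python
def generation_kwargs(num_input_tokens, expansion=3.0, floor=32, ceiling=512):
    """Build keyword arguments for model.generate() with a token-based output cap.

    expansion, floor and ceiling are illustrative assumptions, not values
    recommended by the model author.
    """
    # Allow the output to be longer than the input (LaTeX markup adds tokens),
    # but keep it between `floor` and `ceiling` tokens.
    max_new_tokens = int(min(ceiling, max(floor, num_input_tokens * expansion)))
    return {
        # Sampling settings taken from the README usage example.
        "do_sample": True,
        "top_p": 0.95,
        "temperature": 0.6,
        "repetition_penalty": 1.2,
        "no_repeat_ngram_size": 2,
        "num_return_sequences": 1,
        # Token-based cap instead of max_length=len(text).
        "max_new_tokens": max_new_tokens,
    }


# Intended use with the objects from the README example (not executed here):
#   inputs = tokenizer(text, return_tensors='pt').to(device)
#   n_tokens = inputs["input_ids"].shape[1]
#   hypotheses = model.generate(**inputs, **generation_kwargs(n_tokens))
```

Whether a larger cap improves the generated LaTeX for this model is untested here; the sketch only makes the limit independent of character count.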