	---
	license: apache-2.0
	datasets:
	- gair-prox/RedPajama-pro
	language:
	- en
	base_model:
	- gair-prox/RedPJ-ProX-0.7B
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- llama
	- code
	---

	# Math-doc-refining-lm

	<p align="center">
	<img src="prox-teaser.png">
	</p>

	[ArXiv](http://arxiv.org/abs/2409.17115) \| [Code](https://github.com/GAIR-NLP/program-every-example)

	Math-doc-refining-lm is an adapted [0.7B-ProX](https://huggingface.co/gair-prox/RedPJ-ProX-0.7B) model, fine-tuned for document-level refining via program generation. It can be applied to math pre-training corpora such as open-web-math.
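
The document-level refining loop described above might look roughly like the sketch below. It rests on several assumptions that this card does not confirm: the repository id `gair-prox/math-doc-refining-lm`, a plain-text prompt consisting of the raw document (the model may in fact expect a specific prompt or chat template), and generated programs of the form `drop_doc()` or `keep_doc()`. The helpers `load_generator`, `apply_doc_program`, and `refine_corpus` are hypothetical names introduced for illustration. See the linked code repository for the authors' actual inference pipeline.

```python
from typing import Callable, Iterable, List, Optional

# Assumption: repository id as it would appear on the Hugging Face Hub.
MODEL_ID = "gair-prox/math-doc-refining-lm"


def load_generator(model_id: str = MODEL_ID) -> Callable[[str], str]:
    """Return a function that maps a document to the model's generated text."""
    # Imported lazily so the pure-Python helpers below work without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    def generate(document: str) -> str:
        # Assumption: the raw document is used as the prompt; the limits
        # below are illustrative, not values taken from the model card.
        inputs = tokenizer(
            document, return_tensors="pt", truncation=True, max_length=1024
        )
        output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
        # Keep only the newly generated tokens, not the echoed prompt.
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

    return generate


def apply_doc_program(document: str, program: str) -> Optional[str]:
    """Hypothetical interpreter for a document-level program.

    Assumes the program is `drop_doc()` or `keep_doc()`. Anything that is
    not an explicit drop keeps the document, which is the conservative choice.
    """
    if "drop_doc()" in program:
        return None
    return document


def refine_corpus(
    documents: Iterable[str], generate: Callable[[str], str]
) -> List[str]:
    """Run the generator on each document and keep those that are not dropped."""
    kept = []
    for doc in documents:
        refined = apply_doc_program(doc, generate(doc))
        if refined is not None:
            kept.append(refined)
    return kept
```

With the real model, usage would be along the lines of `refine_corpus(docs, load_generator())`. Because `refine_corpus` accepts any callable, the filtering logic can be checked with a stub generator before downloading the weights.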

	<p align="center">
	<img src="func_design.png">
	</p>

	### Citation
	```
	@article{zhou2024programming,
	title={Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale},
	author={Zhou, Fan and Wang, Zengzhi and Liu, Qian and Li, Junlong and Liu, Pengfei},
	journal={arXiv preprint arXiv:2409.17115},
	year={2024}
	}
	```