| | --- |
| | library_name: transformers |
| | tags: [] |
| | --- |
| | |
| | # Model Card for TRPaliGemma |
| |
|
| | This model is a fine-tuned PaliGemma model for the table recognition task. |
| | <!-- Provide a quick summary of what the model is/does. --> |
| |
|
| |
|
| | ## Model Details |
| |
|
| | ### Model Description |
| |
|
| | Table recognition is a branch of Document AI. |
| | Existing table recognition pipelines predict the table structure and the OCR results separately and then combine them. |
| | As a result, parsing a table sometimes involves intermediate predictions that the final output does not need (e.g., bounding boxes). |
| | With a vision language model (VLM), the structure and text of the table are predicted at the same time, which eliminates these unnecessary predictions and merges the two tasks into one. |
| |
|
| | <!-- Provide a longer summary of what this model is. --> |
| |
|
| | This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. |
| |
|
| | - **Developed by:** Seokhyun Choi |
| | - **Funded by [optional]:** [More Information Needed] |
| | - **Shared by [optional]:** [More Information Needed] |
| | - **Model type:** Vision Language Model |
| | - **Language(s) (NLP):** English |
| | - **License:** [More Information Needed] |
| | - **Finetuned from model [optional]:** PaliGemma |
| |
|
| | ### Model Sources [optional] |
| |
|
| | <!-- Provide the basic links for the model. --> |
| |
|
| | - **Repository:** [More Information Needed] |
| | - **Paper [optional]:** [More Information Needed] |
| | - **Demo [optional]:** [More Information Needed] |
| |
|
| | ## Uses |
| |
|
| | <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
| |
|
| | ### Direct Use |
| |
|
| | <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> |
| |
|
| | This model converts table images into HTML. |
| |
|
| | ### Downstream Use [optional] |
| |
|
| | <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> |
| |
|
| | It can serve as a component in Document AI pipelines for document automation. |
| |
|
| | ### Out-of-Scope Use |
| |
|
| | <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> |
| |
|
| | This model was fine-tuned only on table images extracted from PDFs, so it is unlikely to perform well on table images in the wild. |
| |
|
| | ## Bias, Risks, and Limitations |
| |
|
| | <!-- This section is meant to convey both technical and sociotechnical limitations. --> |
| |
|
| | This model only converts table images into HTML. |
| | To perform further analysis or extract knowledge from a table, |
| | you need to train a separate NLP model on the HTML output or fine-tune a new PaliGemma model on newly constructed data. |
| |
|
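Because the model's output is plain HTML, any further analysis starts by reading that HTML back into a data structure. The following is a minimal sketch using only the Python standard library. The names `TableRows` and `html_table_to_rows` are hypothetical helpers introduced here for illustration, not part of this model or of `transformers`. It assumes a single, non-nested table and ignores `rowspan`/`colspan`.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper: assumes one non-nested table and ignores spans.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row


def html_table_to_rows(html):
    """Return the table in `html` as a list of rows of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The resulting list of rows can be passed to a downstream NLP model or loaded into a dataframe. Tables with merged cells would need extra handling that this sketch does not attempt.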
| | ## How to Get Started with the Model |
| |
|
| | Inference notebook: https://www.kaggle.com/code/mldlchoidh/tr-inference |
| |
|
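For readers who prefer a local script, the following is a rough sketch of how a PaliGemma checkpoint is typically run with `transformers`. It is not taken from the notebook above, which remains the reference for this model. The Hub model ID and the prompt string are not stated in this card, so both are left as caller-supplied arguments. `load_model` and `image_to_html` are hypothetical helper names.

```python
def load_model(model_id):
    """Load a PaliGemma checkpoint and its processor from the Hub.

    `model_id` is whatever Hub ID this model is published under
    (not stated in this card, so it must be supplied by the caller).
    """
    # Imported lazily so the helper below can be used without loading transformers.
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
    model.eval()
    return model, processor


def image_to_html(model, processor, image, prompt, max_new_tokens=1024):
    """Generate HTML for one table image.

    `prompt` must match the prompt used during fine-tuning, which this
    card does not specify (assumption: the caller knows it).
    """
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[-1]

    # Greedy decoding; `generate` returns the prompt tokens followed by new tokens.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Keep only the newly generated tokens and decode them to text.
    new_tokens = output_ids[0][prompt_len:]
    return processor.decode(new_tokens, skip_special_tokens=True)
```

A typical call would load the model once with `load_model(...)`, open the table image with Pillow (`Image.open(path).convert("RGB")`), and pass both to `image_to_html` together with the fine-tuning prompt. The `max_new_tokens` default of 1024 is an arbitrary choice; large tables may need more.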
| | ## Training Details |
| |
|
| | ### Training Data |
| |
|
| | <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> |
| |
|
| | PubTables-1M |
| |
|