Update README.md
README.md
CHANGED
|
@@ -102,7 +102,7 @@ To enhance the model's capability in structured table reasoning, we conduct Supe
|
|
| 102 |
To fully leverage the trained model's reasoning capabilities on tabular data, this work proposes a workflow oriented toward structured data analysis. The workflow consists of four core components: table preprocessing, table sensing, tool-integrated reasoning, and prompt engineering. These components form an end-to-end pipeline designed to enhance the model's ability to understand and reason over tabular datasets.
|
| 103 |
|
| 104 |
### Table Preprocessing
|
| 105 |
-
Before table analysis and reasoning, our workflow preprocesses the input table so that it is clean, structured, and properly formatted. Table preprocessing involves handling missing values, splitting merged cells,
|
| 106 |
|
| 107 |
### Table Sensing
|
| 108 |
Table sensing refers to the model’s contextual understanding of a table’s structure, semantics, and relationships. In this stage, the column headers and sample rows of each table are provided to the model. The model identifies the type of each column (e.g., categorical, numerical, textual), infers potential relationships among columns, and detects any implicit hierarchies or grouping patterns. Table sensing also covers header semantics, including the disambiguation of abbreviations, units, and domain-specific terminology. By observing sample rows, the model gains insight into typical value ranges, formats, and anomalies, enabling it to develop a robust “sense” of the data context.
|
|
@@ -179,11 +179,11 @@ print(response)
|
|
| 179 |
</p>
|
| 180 |
<ul>
|
| 181 |
<li>
|
| 182 |
-
|
| 183 |
covering six core capabilities and 26 subtasks to support diverse table reasoning evaluations.
|
| 184 |
</li>
|
| 185 |
<li>
|
| 186 |
-
|
| 187 |
</li>
|
| 188 |
</ul>
|
| 189 |
|
|
|
|
| 102 |
To fully leverage the trained model's reasoning capabilities on tabular data, this work proposes a workflow oriented toward structured data analysis. The workflow consists of four core components: table preprocessing, table sensing, tool-integrated reasoning, and prompt engineering. These components form an end-to-end pipeline designed to enhance the model's ability to understand and reason over tabular datasets.
|
| 103 |
|
| 104 |
### Table Preprocessing
|
| 105 |
+
Before table analysis and reasoning, our workflow preprocesses the input table so that it is clean, structured, and properly formatted. Table preprocessing involves handling missing values, splitting merged cells, and identifying and standardizing column headers. After preprocessing, the table has a normalized structure that facilitates downstream table understanding and reasoning by the model. Moreover, the structured format reduces ambiguity in table layout and ensures consistent alignment between natural language queries and the corresponding data fields.
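
The preprocessing steps named above can be illustrated with a minimal, standard-library sketch. It is not the project's actual implementation: the function names, the `MISSING_TOKENS` set, and the convention that an empty cell in a `merged_columns` column inherits the value above it are all assumptions made for illustration. Duplicate header names are not handled here.

```python
import re

# Hypothetical set of strings treated as missing values (an assumption).
MISSING_TOKENS = {"", "-", "n/a", "na", "null", "none"}


def standardize_header(name, index):
    """Lowercase a header and collapse non-alphanumeric runs to underscores."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", str(name or "")).strip("_").lower()
    # Fall back to a positional name when the header is empty.
    return cleaned or f"column_{index}"


def preprocess_table(rows, merged_columns=()):
    """Normalize a raw table given as a list of rows (first row = header).

    merged_columns: indices of columns where an empty cell is assumed to
    belong to a vertically merged cell and inherits the value above it.
    """
    header = [standardize_header(h, i) for i, h in enumerate(rows[0])]
    width = len(header)
    cleaned, previous = [], [None] * width
    for raw in rows[1:]:
        # Pad or trim ragged rows so every row matches the header width.
        row = (list(raw) + [None] * width)[:width]
        out = []
        for i, cell in enumerate(row):
            text = "" if cell is None else str(cell).strip()
            value = None if text.lower() in MISSING_TOKENS else text
            if value is None and i in merged_columns:
                value = previous[i]  # split merged cell: copy the value downward
            out.append(value)
        previous = out
        cleaned.append(dict(zip(header, out)))
    return header, cleaned
```

Calling `preprocess_table(raw_rows, merged_columns={0})` returns the standardized header list and one dictionary per data row, with missing cells represented as `None`.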
|
| 106 |
|
| 107 |
### Table Sensing
|
| 108 |
Table sensing refers to the model’s contextual understanding of a table’s structure, semantics, and relationships. In this stage, the column headers and sample rows of each table are provided to the model. The model identifies the type of each column (e.g., categorical, numerical, textual), infers potential relationships among columns, and detects any implicit hierarchies or grouping patterns. Table sensing also covers header semantics, including the disambiguation of abbreviations, units, and domain-specific terminology. By observing sample rows, the model gains insight into typical value ranges, formats, and anomalies, enabling it to develop a robust “sense” of the data context.
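
As a rough sketch of how column headers and sample rows might be assembled into a prompt for this stage, consider the following. In the workflow described above the model itself infers column types; the heuristic `infer_column_type` here is only an illustrative stand-in, and both function names, the `max_categories` threshold, and the prompt layout are assumptions rather than the project's actual format.

```python
def infer_column_type(values, max_categories=5):
    """Heuristic type hint for one column (an illustrative assumption)."""
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"

    def is_number(value):
        try:
            float(str(value).replace(",", ""))  # tolerate thousands separators
            return True
        except ValueError:
            return False

    if all(is_number(v) for v in present):
        return "numerical"
    distinct = set(present)
    # Few distinct values that repeat suggest a categorical column.
    if len(distinct) <= max_categories and len(distinct) < len(present):
        return "categorical"
    return "textual"


def build_sensing_prompt(header, records, n_samples=3):
    """Render column headers, type hints, and sample rows as plain text."""
    lines = ["Columns:"]
    for name in header:
        col_type = infer_column_type([r.get(name) for r in records])
        lines.append(f"- {name} ({col_type})")
    lines.append("Sample rows:")
    for record in records[:n_samples]:
        cells = ["" if record.get(h) is None else str(record[h]) for h in header]
        lines.append(" | ".join(cells))
    return "\n".join(lines)
```

The returned string could be placed ahead of the user's question so the model sees the table's structure before reasoning over the full data.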
|
|
|
|
| 179 |
</p>
|
| 180 |
<ul>
|
| 181 |
<li>
|
| 182 |
+
An open-source <a href="https://huggingface.co/datasets/JT-LM/JIUTIAN-TReB">dataset</a> combining cleaned public benchmarks, real-world web tables, and proprietary data,
|
| 183 |
covering six core capabilities and 26 subtasks to support diverse table reasoning evaluations.
|
| 184 |
</li>
|
| 185 |
<li>
|
| 186 |
+
An open-source <a href="https://github.com/JT-LM/jiutian-treb">framework code</a> specifically designed to evaluate LLM performance on table reasoning tasks. It integrates diverse inference modes and reliable metrics, enabling precise and multi-dimensional evaluations.
|
| 187 |
</li>
|
| 188 |
</ul>
|
| 189 |
|