yahya-khoder committed · Commit 4c93e55 · 1 Parent(s): dd0dadd
update README

README.md CHANGED
---
license: apache-2.0
tags:
- text-to-json
- t5
- seq2seq
- text-generation
- json-conversion
- machine-learning
- nlp
base_model: t5-small
model_name: MD2JSON-T5-V1
version: V1
author: yahyakhoder
---

# MD2JSON-T5-V1: Text-to-JSON Converter with T5

This model uses the **T5 (Text-to-Text Transfer Transformer)** architecture to convert structured, tag-style text into valid JSON objects.
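The model expects tag-style input of the form `#key: value`, where nested values are written as JSON literals. Input in that shape could be built from a Python dict with a small helper along these lines (`dict_to_tagged` is a hypothetical convenience, not part of this repository):

```python
import json

def dict_to_tagged(record):
    """Serialize a flat dict into the '#key: value' format the model expects.

    Nested dicts and lists are emitted as JSON literals, and booleans are
    lowercased (e.g. '#married: true'), matching the README example.
    """
    lines = []
    for key, value in record.items():
        if isinstance(value, bool):  # must precede other checks: bool is an int subclass
            rendered = "true" if value else "false"
        elif isinstance(value, (dict, list)):
            rendered = json.dumps(value)
        else:
            rendered = str(value)
        lines.append(f"#{key}: {rendered}")
    return "\n".join(lines)
```

Strings are emitted unquoted here; the README example quotes only the URL value, so adjust quoting to taste.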
To use the model and perform inference, install the dependencies:

```bash
pip install torch transformers datasets
```

Then load the model and run generation:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import json

# Load the tokenizer and model from the Hugging Face Hub
model_name = "yahyakhoder/MD2JSON-T5-V1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Example input
input_text = """#firstname: John
#lastname: Doe
#age: 30
#married: true
#hobbies: ["gaming", "running"]
#address: {"city": "Berlin", "zipcode": 10115}
#url: "https://example.com" """

# Tokenize and generate the output
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True, max_length=256)
outputs = model.generate(**inputs, max_length=256, num_beams=4, early_stopping=True)

# Decode and parse as JSON
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
try:
    output_json = json.loads(result)
    print(json.dumps(output_json, indent=2, ensure_ascii=False))
except json.JSONDecodeError:
    print("Error: model output is not valid JSON:", result)
```
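Seq2seq output is not guaranteed to be well-formed JSON, so a slightly more defensive parser than a bare `json.loads` can help in practice. A minimal sketch (`parse_model_output` is an illustrative helper, not part of this repository):

```python
import json

def parse_model_output(text):
    """Try to parse decoded model output as JSON.

    If direct parsing fails, fall back to the substring between the first
    '{' and the last '}', which trims stray tokens around the object.
    Returns None when no JSON object can be recovered.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            try:
                return json.loads(text[start:end + 1])
            except json.JSONDecodeError:
                pass
        return None
```

This would replace the bare `try`/`except` around `json.loads(result)` above when the model occasionally emits leading or trailing noise.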