yahyakhoder/MD2JSON-T5-small-V1


	---
	license: apache-2.0
	tags:
	- text-to-json
	- t5
	- seq2seq
	- text-generation
	- json-conversion
	- machine-learning
	- nlp
	base_model: t5-small
	model_name: MD2JSON-T5-V1
	version: V1
	author: yahyakhoder
	---

	# MD2JSON-T5-V1: Text-to-JSON Converter with T5

	This model is a fine-tuned `t5-small` (T5, the Text-to-Text Transfer Transformer) that converts structured key-value text into a JSON object.

	## Description

	MD2JSON-T5-V1 is trained to interpret input in which each line starts with `#`, followed by a key, a colon, and a value (e.g., `#firstname: John`). It emits a single JSON object with one entry per input line. As the examples below show, values can be strings, numbers, booleans, arrays, or nested objects.
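For reference, the same line format can also be parsed deterministically. The sketch below is a rule-based baseline, not the model's method: `parse_md_lines` is a hypothetical helper name, and keeping values that are not JSON literals as plain strings is an assumption based on the `#firstname: John` example.

```python
import json


def parse_md_lines(text: str) -> dict:
    """Parse '#key: value' lines into a dict (rule-based baseline, not the model)."""
    result = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line.startswith("#") or ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, so URLs and nested objects stay intact.
        key, _, value = line[1:].partition(":")
        value = value.strip()
        try:
            # Numbers, booleans, arrays, objects and quoted strings are JSON literals.
            result[key.strip()] = json.loads(value)
        except json.JSONDecodeError:
            # Assumption: anything else (e.g. the bare word John) is a string.
            result[key.strip()] = value
    return result


sample = """#firstname: John
#age: 30
#married: true
#hobbies: ["gaming", "running"]
#address: {"city": "Berlin", "zipcode": 10115}
#url: "https://example.com"
"""
print(json.dumps(parse_md_lines(sample), indent=2))
```

A baseline like this can serve as an expected value when spot-checking what the model generates.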

	### Example:
	- Input:
	```text
	#firstname: John
	#lastname: Doe
	#age: 30
	#married: true
	#hobbies: ["gaming", "running"]
	#address: {"city": "Berlin", "zipcode": 10115}
	#url: "https://example.com"
	```

	- Generated JSON Output:
	```json
	{
	  "firstname": "John",
	  "lastname": "Doe",
	  "age": 30,
	  "married": true,
	  "hobbies": ["gaming", "running"],
	  "address": {
	    "city": "Berlin",
	    "zipcode": 10115
	  },
	  "url": "https://example.com"
	}
	```

	### Another Example:
	- Input:
	```text
	#name: Charlie
	#age: 29
	#isStudent: true
	#skills: ["Java", "Machine Learning"]
	#profile: {"github": "charlie29", "linkedin": "charlie-linkedin"}
	#height: 172.3
	```

	- Generated JSON Output:
	```json
	{
	  "name": "Charlie",
	  "age": 29,
	  "isStudent": true,
	  "skills": ["Java", "Machine Learning"],
	  "profile": {
	    "github": "charlie29",
	    "linkedin": "charlie-linkedin"
	  },
	  "height": 172.3
	}
	```
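Inputs in this format can also be built from existing Python data. The sketch below is one possible serializer: `to_md_lines` is a hypothetical helper name, and writing top-level strings without quotes is an assumption. The examples show both a bare form (`#firstname: John`) and a quoted form (the URL), so the quoting the model was trained on is not certain.

```python
import json


def to_md_lines(record: dict) -> str:
    """Render a dict as '#key: value' lines, one line per top-level key."""
    lines = []
    for key, value in record.items():
        if isinstance(value, str):
            rendered = value  # assumption: top-level strings are written bare
        else:
            # Numbers, booleans, lists and dicts are written as JSON literals.
            rendered = json.dumps(value, ensure_ascii=False)
        lines.append(f"#{key}: {rendered}")
    return "\n".join(lines)


record = {
    "name": "Charlie",
    "age": 29,
    "isStudent": True,
    "skills": ["Java", "Machine Learning"],
    "height": 172.3,
}
print(to_md_lines(record))
```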

	## Load the Model

	To use the model and perform inference, follow the steps below:

	### Install Dependencies

	```bash
	pip install torch transformers datasets
	```

	### Run Inference

	```python
	import json

	from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

	# Load the tokenizer and model from the Hugging Face Hub
	model_name = "yahyakhoder/MD2JSON-T5-small-V1"  # Hugging Face repository ID of this model
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

	# Example input: one "#key: value" pair per line
	input_text = """#firstname: John
	#lastname: Doe
	#age: 30
	#married: true
	#hobbies: ["gaming", "running"]
	#address: {"city": "Berlin", "zipcode": 10115}
	#url: "https://example.com"
	""".strip()

	# Tokenize the input and generate the output with beam search
	inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True, max_length=256)
	outputs = model.generate(**inputs, max_length=256, num_beams=4, early_stopping=True)

	# Decode the generated tokens and parse them as JSON
	result = tokenizer.decode(outputs[0], skip_special_tokens=True)
	try:
	    output_json = json.loads(result)
	    print(json.dumps(output_json, indent=2, ensure_ascii=False))
	except json.JSONDecodeError:
	    # The generated text is not guaranteed to parse; show it for inspection
	    print("Error during JSON conversion. Raw model output:", result)
	```
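A sequence-to-sequence model can emit text that does not parse, or that drops a key. The sketch below shows one way to post-check a decoded string before using it. The helper names `parse_generated_json` and `missing_keys` are hypothetical and not part of `transformers`, and the sample strings stand in for real model output.

```python
import json
from typing import List, Optional


def parse_generated_json(decoded: str) -> Optional[dict]:
    """Return the parsed JSON object, or None if the text is not a JSON object."""
    try:
        parsed = json.loads(decoded)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def missing_keys(input_text: str, parsed: dict) -> List[str]:
    """List keys that appear in the '#key: value' input but not in the output."""
    expected = []
    for raw_line in input_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#") and ":" in line:
            expected.append(line[1:].partition(":")[0].strip())
    return [key for key in expected if key not in parsed]


# Hypothetical decoded strings standing in for real model output.
source = "#name: Charlie\n#age: 29"
good = parse_generated_json('{"name": "Charlie"}')
bad = parse_generated_json('{"name": "Charlie"')  # truncated, so it does not parse
print(good, missing_keys(source, good))
print(bad)
```

Returning `None` instead of raising keeps the caller's control flow simple when many inputs are processed in a loop.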


