license: apache-2.0
tags:
  - text-to-json
  - t5
  - seq2seq
  - text-generation
  - json-conversion
  - machine-learning
  - nlp
base_model: t5-small
model_name: MD2JSON-T5-V1
version: V1
author: yahyakhoder

MD2JSON-T5-V1: Text-to-JSON Converter with T5

This model uses the T5 (Text-to-Text Transfer Transformer) architecture, with t5-small as its base model, to convert structured text into JSON objects.

Description

The MD2JSON-T5-V1 model is trained to interpret text in which each line holds one key-value pair: the key is prefixed with # and separated from its value by a colon (e.g., #firstname: John). It converts these lines into a single JSON object. Values can be strings, numbers, booleans, lists, or nested objects. The model is useful wherever key-value text of this form needs to be turned into JSON.

Example:

  • Input:

    #firstname: John
    #lastname: Doe
    #age: 30
    #married: true
    #hobbies: ["gaming", "running"]
    #address: {"city": "Berlin", "zipcode": 10115}
    #url: "https://example.com"
    
  • Generated JSON Output:

    {
        "firstname": "John",
        "lastname": "Doe",
        "age": 30,
        "married": true,
        "hobbies": ["gaming", "running"],
        "address": {
            "city": "Berlin",
            "zipcode": 10115
        },
        "url": "https://example.com"
    }
    

Another Example:

  • Input:

    #name: Charlie
    #age: 29
    #isStudent: true
    #skills: ["Java", "Machine Learning"]
    #profile: {"github": "charlie29", "linkedin": "charlie-linkedin"}
    #height: 172.3
    
  • Generated JSON Output:

    {
        "name": "Charlie",
        "age": 29,
        "isStudent": true,
        "skills": ["Java", "Machine Learning"],
        "profile": {
            "github": "charlie29",
            "linkedin": "charlie-linkedin"
        },
        "height": 172.3
    }
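
The mapping shown in the two examples can be sketched as a small rule-based converter. This is not the model and not how it works internally; it is only a deterministic reference for the '#key: value' format, which can be handy for building test cases or comparing against model output. The function name `parse_tagged_text` is hypothetical, and the sketch assumes one key-value pair per line.

```python
import json


def parse_tagged_text(text: str) -> dict:
    """Rule-based reference for the '#key: value' format (not the model itself)."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("#") or ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, so URLs and nested objects stay intact.
        key, _, raw_value = line[1:].partition(":")
        raw_value = raw_value.strip()
        try:
            # Numbers, booleans, lists, objects and quoted strings parse as JSON.
            value = json.loads(raw_value)
        except json.JSONDecodeError:
            # Bare words such as John are kept as plain strings.
            value = raw_value
        result[key.strip()] = value
    return result
```

Calling `parse_tagged_text` on the first example input yields a dictionary with the same keys and value types as the JSON output shown above.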
    

Load the Model

To use the model and perform inference, follow the steps below:

Install Dependencies

pip install torch transformers datasets

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import json

# Load the tokenizer and model
model_name = "yahyakhoder/MD2JSON-T5-V1"  # Model ID on the Hugging Face Hub; adjust if your repository path differs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Example Input
input_text = """#firstname: John
#lastname: Doe
#age: 30
#married: true
#hobbies: ["gaming", "running"]
#address: {"city": "Berlin", "zipcode": 10115}
#url: "https://example.com" """

# Tokenize and generate the output
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True, max_length=256)
outputs = model.generate(**inputs, max_length=256, num_beams=4, early_stopping=True)

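# Decode and convert to JSON
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
try:
    output_json = json.loads(result)
    print(json.dumps(output_json, indent=2, ensure_ascii=False))
except json.JSONDecodeError as err:
    # The generated string is not guaranteed to be valid JSON; show it for debugging
    print(f"Error during JSON conversion: {err}")
    print(f"Raw model output: {result}")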
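
Because the output is generated text, it may be malformed or may drop a field. Beyond catching a parse failure, it can help to check that every key from the input appears in the generated object. The following is a minimal sketch of such a check using only the standard library. The helper names `input_keys` and `check_generated_json` are hypothetical, and the sketch assumes each input line starts with # at the beginning of the line.

```python
import json
import re
from typing import List, Optional, Tuple


def input_keys(input_text: str) -> List[str]:
    """Collect the key names from lines of the form '#key: value'."""
    return re.findall(r"^#[ \t]*([^:\s]+)[ \t]*:", input_text, flags=re.MULTILINE)


def check_generated_json(input_text: str, generated: str) -> Tuple[Optional[dict], List[str]]:
    """Parse the generated string and list the input keys missing from it.

    Returns (None, all_input_keys) when the string is not a JSON object.
    """
    keys = input_keys(input_text)
    try:
        parsed = json.loads(generated)
    except json.JSONDecodeError:
        return None, keys
    if not isinstance(parsed, dict):
        return None, keys
    missing = [key for key in keys if key not in parsed]
    return parsed, missing
```

In the inference script, this could be called as `check_generated_json(input_text, result)`; an empty list of missing keys together with a parsed dictionary suggests the conversion kept every field, although it does not verify the values themselves.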


