| T5-Small Project Guide | |
| ===================== | |
| Welcome to the T5-Small Project Guide by RemiAI3, a free educational resource for students to learn AI model fine-tuning using | |
| Hugging Face's T5-small model. This project enables students to build a question-answering system, such as answering questions | |
| about the Chola Empire, using open-source tools. | |
| Objective | |
| --------- | |
| Our goal is to provide accessible AI resources for students to experiment with and learn from, promoting RemiAI3’s mission of | |
| democratizing AI education. This project is designed to be lightweight, avoiding the high costs of deploying large AI models like | |
| text-to-image generators. | |
| Prerequisites | |
| ------------- | |
| - Python Version: Python 3.10.9 - MUST USE THIS VERSION ONLY | |
| - Virtual Environment: Use `venv` to isolate dependencies | |
| - Hugging Face Account: Sign up at https://huggingface.co to get an access token | |
| To get the access token: | |
| 1. Click your profile picture on Hugging Face. | |
| 2. Scroll to the bottom, where you will see a section named Access Tokens. | |
| 3. Click it and enter your Hugging Face password. | |
| 4. Click the option to create a new token. | |
| 5. You will be redirected to a new page; select Write access there. | |
| 6. Click Create token if it is displayed at the top; otherwise scroll down until you see the Create button. | |
| 7. Click Create, and you will get your Hugging Face token (HF-TOKEN). | |
| - Dataset: A CSV or JSON file with question-answer pairs. Example JSON format: | |
| ```json | |
| [ | |
| {"input": "Who was the founder of the Chola Empire?", "response": "Vijayalaya Chola"}, | |
| {"input": "What was the main military force of the Cholas?", "response": "Well-organized army and navy"}, | |
| {"input": "What was a key administrative reform by the Cholas?", "response": "Efficient land revenue system"} | |
| ] | |
| ``` | |
| CSV format (if used): | |
| ```csv | |
| input,response | |
| "Who was the founder of the Chola Empire?","Vijayalaya Chola" | |
| "What was the main military force of the Cholas?","Well-organized army and navy" | |
| ``` | |
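The two dataset formats above carry the same fields, so a CSV file can be turned into the JSON list with the standard library alone. The following is a minimal sketch of such a conversion, not the code used inside `t5_project_all_in_one.py`; the function name `csv_to_json` is hypothetical, and it assumes the CSV header is exactly `input,response` as in the example.

```python
import csv
import json
from pathlib import Path


def csv_to_json(csv_path, json_path):
    """Convert an input,response CSV into a JSON list of {"input", "response"} records."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early if the header does not match the format shown in this guide.
        missing = {"input", "response"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"CSV is missing columns: {sorted(missing)}")
        records = [
            {
                "input": (row["input"] or "").strip(),
                "response": (row["response"] or "").strip(),
            }
            for row in reader
        ]
    # ensure_ascii=False keeps non-English characters readable in the output file.
    Path(json_path).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return records


if __name__ == "__main__" and Path("dataset.csv").exists():
    converted = csv_to_json("dataset.csv", "dataset.json")
    print(f"Wrote {len(converted)} question-answer pairs to dataset.json")
```

Running it in the project folder would write `dataset.json` next to `dataset.csv`; if no CSV is present it does nothing.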
| Setup Instructions | |
| ------------------ | |
| 1. Install Python: Download Python 3.10.9 from https://www.python.org/downloads/. | |
| 2. Clone the Repository: | |
| ``` | |
| git clone https://huggingface.co/remiai3/t5-small-project-guide | |
| cd t5-small-project-guide | |
| ``` | |
| 3. Create and Activate a Virtual Environment: | |
| ``` | |
| python -m venv venv | |
| source venv/bin/activate # On Windows: venv\Scripts\activate | |
| ``` | |
| 4. Install Dependencies: | |
| ``` | |
| pip install -r requirements.txt | |
| ``` | |
| 5. Prepare Your Dataset: Place your `dataset.csv` or `dataset.json` in the project folder. | |
| 6. Set Hugging Face Token: Open `t5_project_all_in_one.py` and replace "YOUR_HUGGING_FACE_TOKEN" with your Hugging Face token. | |
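Step 6 places the token directly in the script. If you would rather not save the token in a file you might share, one alternative is to read it from an environment variable. The sketch below is an assumption about how that could look, not how `t5_project_all_in_one.py` works; the helper name `get_hf_token` is hypothetical, and `HF_TOKEN` is used here only as the variable name.

```python
import os

PLACEHOLDER = "YOUR_HUGGING_FACE_TOKEN"  # placeholder string named in step 6


def get_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in the environment, or raise a clear error."""
    token = os.environ.get(env_var, "").strip()
    # Treat a missing value and the untouched placeholder the same way.
    if not token or token == PLACEHOLDER:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face token."
        )
    return token
```

You would set the variable in your terminal before running the script, then use `get_hf_token()` wherever the script expects the token string.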
| Running the Project | |
| ------------------- | |
| 1. Fine-Tune the Model: | |
| Run the all-in-one script to convert the dataset (if CSV), preprocess, download the model, and fine-tune: | |
| ``` | |
| python t5_project_all_in_one.py | |
| ``` | |
| This will: | |
| - Convert CSV to JSON (if needed) | |
| - Preprocess the dataset | |
| - Download T5-small weights | |
| - Fine-tune the model | |
| - Save the fine-tuned model to `./finetuned_t5` | |
| - Generate a plot of training and validation loss (`training_metrics.png`) | |
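Once fine-tuning has finished, the saved model can be asked a question. The sketch below shows one possible way to do that with the Transformers library. It rests on several assumptions that this guide does not confirm: that the tokenizer was saved into `./finetuned_t5` together with the model, that questions were fed to the model as plain text without a task prefix, and that 32 new tokens is enough for an answer. The function name `answer` is hypothetical.

```python
from pathlib import Path

MODEL_DIR = Path("./finetuned_t5")  # output folder named in this guide


def answer(question, model_dir=MODEL_DIR, max_new_tokens=32):
    """Generate an answer with the fine-tuned model (assumes tokenizer files are in model_dir)."""
    # Imported here so the heavy libraries load only when an answer is requested.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(str(model_dir))
    model = AutoModelForSeq2SeqLM.from_pretrained(str(model_dir))
    # Assumption: the raw question is the model input, with no task prefix.
    inputs = tokenizer(question, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    if MODEL_DIR.is_dir():
        print(answer("Who was the founder of the Chola Empire?"))
    else:
        print(f"{MODEL_DIR} not found; run the fine-tuning script first.")
```

If the script used a different input format during preprocessing, the question passed to `answer` should be formatted the same way.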
| Project Files | |
| ------------- | |
| - t5_project_all_in_one.py: Single script for dataset conversion, preprocessing, model downloading, and fine-tuning. | |
| - requirements.txt: Lists required Python libraries. | |
| - document.txt: This file, containing detailed instructions. | |
| - README.md: Model configuration and repo overview. | |
| Libraries and Versions | |
| ---------------------- | |
| - transformers==4.44.2 | |
| - datasets==3.0.1 | |
| - torch==2.4.1 | |
| - pandas==2.2.3 | |
| - matplotlib==3.9.2 | |
| - accelerate==1.0.1 | |
| - huggingface_hub==0.26.0 | |
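To confirm that the active virtual environment actually holds the versions listed above, the installed packages can be compared against the pins. This is a small sketch using only the standard library; the function name `find_mismatches` is hypothetical, and ignoring a local build suffix such as `+cu121` is a choice made here, not a rule from this guide.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the list above.
PINNED = {
    "transformers": "4.44.2",
    "datasets": "3.0.1",
    "torch": "2.4.1",
    "pandas": "2.2.3",
    "matplotlib": "3.9.2",
    "accelerate": "1.0.1",
    "huggingface_hub": "0.26.0",
}


def find_mismatches(pinned=PINNED, lookup=version):
    """Return {package: (expected, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        # Compare without any local build tag, e.g. "2.4.1+cu121" counts as "2.4.1".
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

No output means every listed library matches its pinned version.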
| Documentation | |
| ------------- | |
| - Hugging Face Transformers: https://huggingface.co/docs/transformers | |
| - Datasets Library: https://huggingface.co/docs/datasets | |
| - T5 Model: https://huggingface.co/docs/transformers/model_doc/t5 | |
| - Pandas: https://pandas.pydata.org/docs | |
| - Matplotlib: https://matplotlib.org/stable/contents.html | |
| - Accelerate: https://huggingface.co/docs/accelerate | |
| Troubleshooting | |
| --------------- | |
| - Inaccurate Answers: Ensure your dataset has 500+ clean question-answer pairs. Increase `num_train_epochs` or adjust `learning_rate` in `t5_project_all_in_one.py`. | |
| - Token Errors: Verify the Hugging Face token in `t5_project_all_in_one.py` is correct. | |
| - Library Issues: Reinstall dependencies with `pip install -r requirements.txt`. | |
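For the first troubleshooting item, it can help to count how many usable pairs the dataset really contains. The sketch below is one interpretation of "clean": it drops pairs with an empty question or answer and repeated questions (ignoring case), then compares the count with the 500 suggested above. The names `clean_pairs` and `MIN_PAIRS` are hypothetical, and the script's own preprocessing may apply different rules.

```python
import json
from pathlib import Path

MIN_PAIRS = 500  # threshold suggested in the troubleshooting note


def clean_pairs(records):
    """Drop pairs with an empty field and case-insensitive duplicate questions."""
    seen = set()
    cleaned = []
    for rec in records:
        question = str(rec.get("input") or "").strip()
        response = str(rec.get("response") or "").strip()
        key = question.lower()
        if not question or not response or key in seen:
            continue  # skip incomplete or repeated entries, keeping the first occurrence
        seen.add(key)
        cleaned.append({"input": question, "response": response})
    return cleaned


if __name__ == "__main__" and Path("dataset.json").exists():
    raw = json.loads(Path("dataset.json").read_text(encoding="utf-8"))
    kept = clean_pairs(raw)
    print(f"{len(kept)} clean pairs out of {len(raw)}")
    if len(kept) < MIN_PAIRS:
        print(f"Fewer than {MIN_PAIRS} clean pairs; consider adding more data.")
```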
| Contributing | |
| ------------ | |
| Fork the repository, make changes, and submit a pull request at https://huggingface.co/remiai3/t5-small-project-guide. | |
| About RemiAI3 | |
| ------------- | |
| RemiAI3 is committed to providing free AI educational resources to empower students. By using this project, you help promote our | |
| mission and build our brand for future AI innovations. | |