remiai3/t5-small-project-guide

remiai3 committed on Jul 28, 2025

Commit 7783036 (verified)

1 Parent(s): f9396fc

Update document.txt

Files changed (1)

document.txt +138 -103

@@ -1,103 +1,138 @@
-T5-Small Project Guide
-=====================
-Welcome to the T5-Small Project Guide by RemiAI3, a free educational resource for students to learn AI model fine-tuning using Hugging Face's T5-small model. This project enables students to build a question-answering system, such as answering questions about the Chola Empire, using open-source tools.
-Objective
----------
-Our goal is to provide accessible AI resources for students to experiment with and learn from, promoting RemiAI3’s mission of democratizing AI education. This project is designed to be lightweight, avoiding the high costs of deploying large AI models like text-to-image generators.
-Prerequisites
--------------
-- Python Version: Python 3.9 or higher (recommended: 3.10)
-- Virtual Environment: Use `venv` to isolate dependencies
-- Hugging Face Account: Sign up at https://huggingface.co to get an access token
-- Dataset: A CSV or JSON file with question-answer pairs. Example JSON format:
-  ```json
-  [
-    {"input": "Who was the founder of the Chola Empire?", "response": "Vijayalaya Chola"},
-    {"input": "What was the main military force of the Cholas?", "response": "Well-organized army and navy"},
-    {"input": "What was a key administrative reform by the Cholas?", "response": "Efficient land revenue system"}
-  ]
-  ```
-  CSV format (if used):
-  ```csv
-  input,response
-  "Who was the founder of the Chola Empire?","Vijayalaya Chola"
-  "What was the main military force of the Cholas?","Well-organized army and navy"
-  ```
-Setup Instructions
-------------------
-1. Install Python: Download Python 3.10 from https://www.python.org/downloads/.
-2. Clone the Repository:
-   ```
-   git clone https://huggingface.co/remiai3/t5-small-project-guide
-   cd t5-small-project-guide
-   ```
-3. Create and Activate a Virtual Environment:
-   ```
-   python -m venv venv
-   source venv/bin/activate  # On Windows: venv\Scripts\activate
-   ```
-4. Install Dependencies:
-   ```
-   pip install -r requirements.txt
-   ```
-5. Prepare Your Dataset: Place your `dataset.csv` or `dataset.json` in the project folder.
-6. Set Hugging Face Token: Open `t5_project_all_in_one.py` and replace "YOUR_HUGGING_FACE_TOKEN" with your Hugging Face token.
-Running the Project
-------------------
-1. Fine-Tune the Model:
-   Run the all-in-one script to convert the dataset (if CSV), preprocess, download the model, and fine-tune:
-   ```
-   python t5_project_all_in_one.py
-   ```
-   This will:
-   - Convert CSV to JSON (if needed)
-   - Preprocess the dataset
-   - Download T5-small weights
-   - Fine-tune the model
-   - Save the fine-tuned model to `./finetuned_t5`
-   - Generate a plot of training and validation loss (`training_metrics.png`)
-Project Files
-------------
-- t5_project_all_in_one.py: Single script for dataset conversion, preprocessing, model downloading, and fine-tuning.
-- requirements.txt: Lists required Python libraries.
-- document.txt: This file with detailed instructions.
-- README.md: Model configuration and repo overview.
-Libraries and Versions
-----------------------
-- transformers==4.44.2
-- datasets==3.0.1
-- torch==2.4.1
-- pandas==2.2.3
-- matplotlib==3.9.2
-- accelerate==1.0.1
-- huggingface_hub==0.26.0
-Documentation
--------------
-- Hugging Face Transformers: https://huggingface.co/docs/transformers
-- Datasets Library: https://huggingface.co/docs/datasets
-- T5 Model: https://huggingface.co/docs/transformers/model_doc/t5
-- Pandas: https://pandas.pydata.org/docs
-- Matplotlib: https://matplotlib.org/stable/contents.html
-- Accelerate: https://huggingface.co/docs/accelerate
-Troubleshooting
----------------
-- Inaccurate Answers: Ensure your dataset has 500+ clean question-answer pairs. Increase `num_train_epochs` or `learning_rate` in `t5_project_all_in_one.py`.
-- Token Errors: Verify the Hugging Face token in `t5_project_all_in_one.py` is correct.
-- Library Issues: Reinstall dependencies with `pip install -r requirements.txt`.
-Contributing
-------------
-Fork the repository, make changes, and submit a pull request at https://huggingface.co/remiai3/t5-small-project-guide.
-About RemiAI3
--------------
-RemiAI3 is committed to providing free AI educational resources to empower students. By using this project, you’re helping promote our mission to build our brand for future AI innovations.

+T5-Small Project Guide
+=====================
+Welcome to the T5-Small Project Guide by RemiAI3, a free educational resource for students to learn AI model fine-tuning using
+Hugging Face's T5-small model. This project enables students to build a question-answering system, such as answering questions
+about the Chola Empire, using open-source tools.
+Objective
+---------
+Our goal is to provide accessible AI resources for students to experiment with and learn from, promoting RemiAI3’s mission of
+democratizing AI education. This project is designed to be lightweight, avoiding the high costs of deploying large AI models like
+text-to-image generators.
+Prerequisites
+-------------
+- Python Version: Python 3.10.9 - MUST USE THIS VERSION ONLY
+- Virtual Environment: Use `venv` to isolate dependencies
+- Hugging Face Account: Sign up at https://huggingface.co to get an access token
+  To get the access token:
+  1. Click on your profile on Hugging Face.
+  2. Scroll down to the bottom to find the section named Access Token.
+  3. Click on it and enter your Hugging Face password.
+  4. Click on the option to create a new token.
+  5. You will be redirected to a new page; there, select write access.
+  6. If a Create Token button is displayed at the top, use it; otherwise scroll down to find the Create button.
+  7. Hit the Create button to get your Hugging Face token (HF-TOKEN).
+- Dataset: A CSV or JSON file with question-answer pairs. Example JSON format:
+  ```json
+  [
+    {"input": "Who was the founder of the Chola Empire?", "response": "Vijayalaya Chola"},
+    {"input": "What was the main military force of the Cholas?", "response": "Well-organized army and navy"},
+    {"input": "What was a key administrative reform by the Cholas?", "response": "Efficient land revenue system"}
+  ]
+  ```
+  CSV format (if used):
+  ```csv
+  input,response
+  "Who was the founder of the Chola Empire?","Vijayalaya Chola"
+  "What was the main military force of the Cholas?","Well-organized army and navy"
+  ```
+Setup Instructions
+------------------
+1. Install Python: Download Python 3.10.9 from https://www.python.org/downloads/.
+2. Clone the Repository:
+   ```
+   git clone https://huggingface.co/remiai3/t5-small-project-guide
+   cd t5-small-project-guide
+   ```
+3. Create and Activate a Virtual Environment:
+   ```
+   python -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   ```
+4. Install Dependencies:
+   ```
+   pip install -r requirements.txt
+   ```
+5. Prepare Your Dataset: Place your `dataset.csv` or `dataset.json` in the project folder.
+6. Set Hugging Face Token: Open `t5_project_all_in_one.py` and replace "YOUR_HUGGING_FACE_TOKEN" with your Hugging Face token.
+Running the Project
+------------------
+1. Fine-Tune the Model:
+   Run the all-in-one script to convert the dataset (if CSV), preprocess, download the model, and fine-tune:
+   ```
+   python t5_project_all_in_one.py
+   ```
+   This will:
+   - Convert CSV to JSON (if needed)
+   - Preprocess the dataset
+   - Download T5-small weights
+   - Fine-tune the model
+   - Save the fine-tuned model to `./finetuned_t5`
+   - Generate a plot of training and validation loss (`training_metrics.png`)
+Project Files
+------------
+- t5_project_all_in_one.py: Single script for dataset conversion, preprocessing, model downloading, and fine-tuning.
+- requirements.txt: Lists required Python libraries.
+- document.txt: This file with detailed instructions.
+- README.md: Model configuration and repo overview.
+Libraries and Versions
+----------------------
+- transformers==4.44.2
+- datasets==3.0.1
+- torch==2.4.1
+- pandas==2.2.3
+- matplotlib==3.9.2
+- accelerate==1.0.1
+- huggingface_hub==0.26.0
+Documentation
+-------------
+- Hugging Face Transformers: https://huggingface.co/docs/transformers
+- Datasets Library: https://huggingface.co/docs/datasets
+- T5 Model: https://huggingface.co/docs/transformers/model_doc/t5
+- Pandas: https://pandas.pydata.org/docs
+- Matplotlib: https://matplotlib.org/stable/contents.html
+- Accelerate: https://huggingface.co/docs/accelerate
+Troubleshooting
+---------------
+- Inaccurate Answers: Ensure your dataset has 500+ clean question-answer pairs. Increase `num_train_epochs` or `learning_rate` in `t5_project_all_in_one.py`.
+- Token Errors: Verify the Hugging Face token in `t5_project_all_in_one.py` is correct.
+- Library Issues: Reinstall dependencies with `pip install -r requirements.txt`.
+Contributing
+------------
+Fork the repository, make changes, and submit a pull request at https://huggingface.co/remiai3/t5-small-project-guide.
+About RemiAI3
+-------------
+RemiAI3 is committed to providing free AI educational resources to empower students. By using this project, you’re helping promote our
+mission to build our brand for future AI innovations.
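
Note on the "Convert CSV to JSON (if needed)" step in the guide above: this diff does not show how `t5_project_all_in_one.py` performs the conversion, so the following is only a minimal sketch of one way it could work. It assumes the `input,response` CSV header and the JSON record shape given under Prerequisites. The function name `csv_to_json` is hypothetical and is not taken from the repository, and the sketch uses only the Python standard library rather than the pinned `pandas` dependency.

```python
import csv
import json
from pathlib import Path


def csv_to_json(csv_path, json_path):
    """Convert a CSV with `input` and `response` columns into a JSON list.

    Hypothetical helper for illustration; not the repository's own code.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early if the header does not match the format in the guide.
        missing = {"input", "response"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"CSV is missing columns: {sorted(missing)}")
        # Keep only rows where both fields are present and non-empty.
        records = [
            {"input": row["input"].strip(), "response": row["response"].strip()}
            for row in reader
            if row["input"] and row["response"]
        ]
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return records


if __name__ == "__main__":
    # File names follow the guide; adjust the paths if yours differ.
    if Path("dataset.csv").exists():
        pairs = csv_to_json("dataset.csv", "dataset.json")
        print(f"Wrote {len(pairs)} question-answer pairs to dataset.json")
```

Skipping incomplete rows keeps the output consistent with the guide's advice to train on clean question-answer pairs; whether the actual script filters rows this way is not stated in the source.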