Improve language tag (#1)

- Improve language tag (abe9f207fd865dfd4ecf33d718510c316a0f8a34)

Co-authored-by: Loïck BOURDOIS <lbourdois@users.noreply.huggingface.co>

Files changed (1)

README.md +145 -134

README.md CHANGED

@@ -1,135 +1,146 @@
----
-license: mit
-datasets:
-- bkai-foundation-models/vi-alpaca-input-output-format
-- CausalLM/GPT-4-Self-Instruct-Japanese
-language:
-- vi
-- ja
-base_model:
-- Qwen/Qwen2.5-1.5B-Instruct
-pipeline_tag: question-answering
-library_name: transformers
----
-# Multilingual Question-Answering Model (Vietnamese and Japanese)
-## Overview
-This repository contains a fine-tuned multilingual question-answering model that supports both **Vietnamese** and **Japanese**. Built on top of the **Qwen/Qwen2.5-1.5B-Instruct** base model, this model leverages advanced transformer architectures to provide high-quality answers in both languages.
-The model has been fine-tuned using datasets such as:
-- **bkai-foundation-models/vi-alpaca-input-output-format**: A Vietnamese dataset designed for instruction-based input-output tasks.
-- **CausalLM/GPT-4-Self-Instruct-Japanese**: A Japanese dataset created with self-instruct techniques to improve language understanding and generation.
-This model is ideal for applications requiring cross-lingual support between Vietnamese and Japanese.
----
-## License
-This project is released under the **MIT License**, ensuring flexibility for both academic and commercial use. Please refer to the `LICENSE` file for more details.
----
-## Model Details
-### Base Model
-- **Qwen/Qwen2.5-1.5B-Instruct**: A powerful 1.5B parameter instruction-tuned model developed by Alibaba Cloud. It excels in understanding and generating natural language across various domains.
-### Supported Languages
-- **Vietnamese (vi)**
-- **Japanese (ja)**
-### Pipeline Tag
-- **Question-Answering**: The model is optimized for answering questions in both supported languages.
-### Library
-- **Transformers**: This model is built using the Hugging Face `transformers` library, making it easy to integrate into existing pipelines.
----
-## Installation
-To use this model, ensure you have the `transformers` library installed:
-```bash
-pip install transformers
-```
-You can then load the model directly from the Hugging Face Hub:
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-# Load the tokenizer and model
-tokenizer = AutoTokenizer.from_pretrained("haiFrHust/VNJPTranslate_base")
-model = AutoModelForCausalLM.from_pretrained("haiFrHust/VNJPTranslate_base")
-# Example usage
-input_text = "質問: ベトナムの首都はどこですか？"  # Japanese: What is the capital of Vietnam?
-inputs = tokenizer(input_text, return_tensors="pt")
-outputs = model.generate(**inputs)
-answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(answer)
-```
----
-## Dataset Information
-### Vietnamese Dataset
-- **Name**: `bkai-foundation-models/vi-alpaca-input-output-format`
-- **Description**: This dataset contains instruction-based input-output pairs in Vietnamese, enabling the model to understand and respond to structured queries effectively.
-### Japanese Dataset
-- **Name**: `CausalLM/GPT-4-Self-Instruct-Japanese`
-- **Description**: A self-instruct dataset in Japanese, designed to enhance the model's ability to generate accurate and contextually relevant responses.
----
-## Use Cases
-This model is suitable for a variety of applications, including but not limited to:
-- **Cross-Lingual Customer Support**: Answering user queries in both Vietnamese and Japanese.
-- **Educational Tools**: Assisting students in learning and understanding concepts in their native language.
-- **Multilingual Chatbots**: Building conversational agents capable of handling multiple languages seamlessly.
----
-## Performance
-The model demonstrates strong performance in both Vietnamese and Japanese, thanks to the high-quality datasets and the robust base model. However, performance may vary depending on the complexity of the questions and the domain-specific knowledge required.
-For optimal results:
-- Ensure your input questions are clear and concise.
-- Fine-tune the model further on domain-specific data if necessary.
----
-## Contributions
-Contributions to this project are welcome! If you have ideas for improvements, encounter issues, or wish to contribute additional datasets, please open an issue or submit a pull request.
----
-## Acknowledgments
-We would like to thank the following organizations and contributors:
-- **Alibaba Cloud** for providing the Qwen base model.
-- The creators of the `bkai-foundation-models/vi-alpaca-input-output-format` and `CausalLM/GPT-4-Self-Instruct-Japanese` datasets.
-- The Hugging Face community for their excellent `transformers` library and support.
----
-## Contact
-For any inquiries or feedback, feel free to reach out to us via:
-- Email: [hai.ph225715@sis.hust.edu.vn]
-- GitHub Issues: Open an issue in this repository.
----
 Thank you for using our multilingual question-answering model! We hope it serves your needs effectively.

+---
+license: mit
+datasets:
+- bkai-foundation-models/vi-alpaca-input-output-format
+- CausalLM/GPT-4-Self-Instruct-Japanese
+language:
+- zho
+- eng
+- fra
+- spa
+- por
+- deu
+- ita
+- rus
+- jpn
+- kor
+- vie
+- tha
+- ara
+base_model:
+- Qwen/Qwen2.5-1.5B-Instruct
+pipeline_tag: question-answering
+library_name: transformers
+---
+# Multilingual Question-Answering Model (Vietnamese and Japanese)
+## Overview
+This repository contains a fine-tuned multilingual question-answering model that supports both **Vietnamese** and **Japanese**. Built on top of the **Qwen/Qwen2.5-1.5B-Instruct** base model, this model leverages advanced transformer architectures to provide high-quality answers in both languages.
+The model has been fine-tuned using datasets such as:
+- **bkai-foundation-models/vi-alpaca-input-output-format**: A Vietnamese dataset designed for instruction-based input-output tasks.
+- **CausalLM/GPT-4-Self-Instruct-Japanese**: A Japanese dataset created with self-instruct techniques to improve language understanding and generation.
+This model is ideal for applications requiring cross-lingual support between Vietnamese and Japanese.
+---
+## License
+This project is released under the **MIT License**, ensuring flexibility for both academic and commercial use. Please refer to the `LICENSE` file for more details.
+---
+## Model Details
+### Base Model
+- **Qwen/Qwen2.5-1.5B-Instruct**: A powerful 1.5B parameter instruction-tuned model developed by Alibaba Cloud. It excels in understanding and generating natural language across various domains.
+### Supported Languages
+- **Vietnamese (vi)**
+- **Japanese (ja)**
+### Pipeline Tag
+- **Question-Answering**: The model is optimized for answering questions in both supported languages.
+### Library
+- **Transformers**: This model is built using the Hugging Face `transformers` library, making it easy to integrate into existing pipelines.
+---
+## Installation
+To use this model, ensure you have the `transformers` library installed:
+```bash
+pip install transformers
+```
+You can then load the model directly from the Hugging Face Hub:
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+# Load the tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained("haiFrHust/VNJPTranslate_base")
+model = AutoModelForCausalLM.from_pretrained("haiFrHust/VNJPTranslate_base")
+# Example usage
+input_text = "質問: ベトナムの首都はどこですか？"  # Japanese: What is the capital of Vietnam?
+inputs = tokenizer(input_text, return_tensors="pt")
+outputs = model.generate(**inputs)
+answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(answer)
+```
+---
+## Dataset Information
+### Vietnamese Dataset
+- **Name**: `bkai-foundation-models/vi-alpaca-input-output-format`
+- **Description**: This dataset contains instruction-based input-output pairs in Vietnamese, enabling the model to understand and respond to structured queries effectively.
+### Japanese Dataset
+- **Name**: `CausalLM/GPT-4-Self-Instruct-Japanese`
+- **Description**: A self-instruct dataset in Japanese, designed to enhance the model's ability to generate accurate and contextually relevant responses.
+---
+## Use Cases
+This model is suitable for a variety of applications, including but not limited to:
+- **Cross-Lingual Customer Support**: Answering user queries in both Vietnamese and Japanese.
+- **Educational Tools**: Assisting students in learning and understanding concepts in their native language.
+- **Multilingual Chatbots**: Building conversational agents capable of handling multiple languages seamlessly.
+---
+## Performance
+The model demonstrates strong performance in both Vietnamese and Japanese, thanks to the high-quality datasets and the robust base model. However, performance may vary depending on the complexity of the questions and the domain-specific knowledge required.
+For optimal results:
+- Ensure your input questions are clear and concise.
+- Fine-tune the model further on domain-specific data if necessary.
+---
+## Contributions
+Contributions to this project are welcome! If you have ideas for improvements, encounter issues, or wish to contribute additional datasets, please open an issue or submit a pull request.
+---
+## Acknowledgments
+We would like to thank the following organizations and contributors:
+- **Alibaba Cloud** for providing the Qwen base model.
+- The creators of the `bkai-foundation-models/vi-alpaca-input-output-format` and `CausalLM/GPT-4-Self-Instruct-Japanese` datasets.
+- The Hugging Face community for their excellent `transformers` library and support.
+---
+## Contact
+For any inquiries or feedback, feel free to reach out to us via:
+- Email: [hai.ph225715@sis.hust.edu.vn]
+- GitHub Issues: Open an issue in this repository.
+---
 Thank you for using our multilingual question-answering model! We hope it serves your needs effectively.
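
The only substantive change in this diff is the `language:` list in the front matter: the two-letter codes `vi` and `ja` are replaced by thirteen three-letter codes. As a rough illustration, the sketch below compares the two lists as copied from the diff above. The helper `language_tags` and the mapping `ISO_639_1_TO_3` are hypothetical names introduced here for illustration, not part of the repository, and the parser assumes the flat list layout shown in this front matter rather than general YAML.

```python
def language_tags(front_matter: str) -> list:
    """Collect the entries of the flat `language:` list (hypothetical helper)."""
    tags, in_block = [], False
    for line in front_matter.splitlines():
        stripped = line.strip()
        if stripped == "language:":
            in_block = True
        elif in_block and stripped.startswith("- "):
            tags.append(stripped[2:].strip())
        elif in_block:
            break  # the first non-list line ends the language block
    return tags


# Front-matter excerpts copied from the removed and added sides of the diff.
OLD = """license: mit
language:
- vi
- ja
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
"""

NEW = """license: mit
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
"""

# Assumed mapping for the two codes the old card used (ISO 639-1 to ISO 639-3).
ISO_639_1_TO_3 = {"vi": "vie", "ja": "jpn"}

old_tags = language_tags(OLD)
new_tags = language_tags(NEW)

# Old languages that survive under their three-letter form.
carried_over = [ISO_639_1_TO_3[t] for t in old_tags if ISO_639_1_TO_3[t] in new_tags]
# Codes that have no counterpart in the old list.
added = [t for t in new_tags if t not in ISO_639_1_TO_3.values()]

print("carried over:", carried_over)
print("newly added:", added)
```

Under these assumptions, both original languages are still present in the new list, and the remaining eleven codes are new. The README body in the diff continues to list only Vietnamese and Japanese under "Supported Languages".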