---
license: apache-2.0
language:
- en
- zh
metrics:
- accuracy
library_name: transformers
tags:
- text2sql
---

# text2sql-8b-instruct-v1

## 1. Summary

text2sql-8b-instruct-v1 is a natural-language-to-SQL model optimized for Chinese and English users. It is based on the llama-3-chinese-8b-instruct-v3 model. We applied recent optimization algorithms to improve the model's performance, especially on complex queries and multi-table joins.

### 1.1 Characteristics

- Bilingual support: handles natural language queries in both Chinese and English.
- High accuracy: extensive testing on real database queries shows that the generated SQL statements are highly accurate.

### 1.2 Training data

The model's training data comes from multiple sources, including:

- Open-source datasets (such as WikiSQL and Spider)
- An internally generated dataset covering a variety of query types and complexities
- User feedback data used to continuously improve model performance

The training data is strictly screened and cleaned to ensure quality and diversity.

### 1.3 Test results

Results on multiple benchmark datasets show that the model outperforms other existing models in accuracy and generation efficiency. For example:

- On the WikiSQL dataset, the model achieves an execution accuracy of 87.5%.
- On the Spider dataset, the model achieves an execution accuracy of 95.3%.

These results show that the model has significant advantages in handling complex queries and multi-table joins.

## 2. Usage

Please upgrade the `transformers` package to a version that supports Llama 3 models. We currently use version `4.41.2`.
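The `messages` in the pipeline example follow a fixed layout: the system message holds a fixed instruction preamble followed by a short description of the tables and their columns, and the user message wraps the question as `###Input:\n<question>\n\n###Response:`. The sketch below builds that layout for other schemas. `build_text2sql_messages` and `SYSTEM_PREAMBLE` are hypothetical names, not part of the model or of `transformers`. The wording for several tables is an assumption extrapolated from the single-table example, so it may not match the training format exactly.

```python
# Hypothetical helper: builds chat messages in the prompt layout used by the
# pipeline example. The layout is inferred from that single example; the
# multi-table wording is an assumption and may differ from the training data.

SYSTEM_PREAMBLE = (
    "I want you to act as a SQL terminal in front of an example database, "
    "you need only to return the sql command to me.Below is an instruction "
    "that describes a task, Write a response that appropriately completes "
    "the request.\n\"\n##Instruction:\n"
)


def build_text2sql_messages(tables: dict[str, list[str]], question: str) -> list[dict[str, str]]:
    """Return system/user messages describing `tables` and asking `question`."""
    names = ", ".join(tables)
    # One sentence per table listing its columns, mirroring the example prompt.
    column_sentences = " ".join(
        f"Table {name} has columns such as {', '.join(cols)}."
        for name, cols in tables.items()
    )
    system = f"{SYSTEM_PREAMBLE} database contains tables such as {names}. {column_sentences}"
    user = f"###Input:\n{question}\n\n###Response:"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

With `{"table_name_30": ["nfl_team", "draft_year"]}` and the question `"In 1978 what is the NFL team?"`, the helper reproduces the example's messages exactly, and its return value can be passed to the `text-generation` pipeline in place of the hand-written list.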
```python
# Use a pipeline as a high-level helper
from transformers import pipeline
import torch

model_id = "xbrain/text2sql-8b-instruct-v1"

messages = [
    {"role": "system", "content": "I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.\n\"\n##Instruction:\n database contains tables such as table_name_30. Table table_name_30 has columns such as nfl_team, draft_year."},
    {"role": "user", "content": "###Input:\nIn 1978 what is the NFL team?\n\n###Response:"},
]

pipe_msg = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

outputs = pipe_msg(
    messages,
    max_new_tokens=256,
)
# With chat-style input, "generated_text" holds the whole conversation;
# the last entry is the model's reply containing the SQL.
print(outputs[0]["generated_text"][-1])
```

## 3. Ethical Considerations

Although fine-tuned for text-to-SQL, this model inherits the ethical considerations of the base Llama 3 model. Use it responsibly and implement additional safeguards as needed for your application.

## 4. Availability

The model is available through:

- [Hugging Face](https://huggingface.co/xbrain/text2sql-8b-instruct-v1)

For full details on responsible use, ethical considerations, and the latest benchmarks, please refer to the [official Llama 3 documentation](https://llama.meta.com/).