---
license: apache-2.0
datasets:
- google/wiki40b
language:
- zh
base_model:
- openai-community/gpt2
---

# Dorami

A GPT-2-based pretrained Chinese language model that uses the BERT tokenizer.

## Model description

### Training data

[google/wiki40b](https://huggingface.co/datasets/google/wiki40b)

### Training code

[dorami](https://github.com/6zeus/dorami.git)

## How to use

### 1. Download the model from the Hugging Face Hub

```
git lfs install
git clone https://huggingface.co/lucky2me/Dorami
```

### 2. Use the downloaded model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Path to the local directory created by `git clone` above
model_path = "The path of the model downloaded above"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.eval()

text = "fill in any text you like."
encoded_input = tokenizer(text, return_tensors="pt")

# Forward pass without gradient tracking (inference only)
with torch.no_grad():
    output = model(**encoded_input)

# Greedy choice: take the highest-scoring token at the last position
predicted_token_id = torch.argmax(output.logits[:, -1, :], dim=-1)
decoded_text = tokenizer.decode(predicted_token_id, skip_special_tokens=True)
print("decoded text:", decoded_text)
```
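
The snippet above predicts only a single next token. To produce longer text, the same argmax step can be repeated, feeding each predicted token back in as input. The sketch below illustrates that loop in plain Python. The names `greedy_decode`, `next_token_scores`, `stop_id`, and `toy_scores` are hypothetical and not part of this repository, and the toy scoring function merely stands in for the model's last-position logits.

```python
from typing import Callable, List, Optional, Sequence


def greedy_decode(
    next_token_scores: Callable[[Sequence[int]], Sequence[float]],
    input_ids: Sequence[int],
    max_new_tokens: int = 20,
    stop_id: Optional[int] = None,
) -> List[int]:
    """Repeatedly append the highest-scoring next token (greedy decoding)."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        # One score per vocabulary entry for the token that follows `ids`
        scores = next_token_scores(ids)
        # Argmax over the vocabulary (first index wins on ties)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        # Optional early stop on a designated token id (an assumption here)
        if stop_id is not None and next_id == stop_id:
            break
    return ids


def toy_scores(ids: Sequence[int], vocab_size: int = 5) -> List[float]:
    """Hypothetical stand-in for a model: always favors (last id + 1) % vocab_size."""
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


generated = greedy_decode(toy_scores, [0], max_new_tokens=6, stop_id=4)
print(generated)
```

With the real model, `next_token_scores` could wrap a forward pass and return the logits at the last position. In practice, calling `model.generate(**encoded_input, max_new_tokens=...)` from `transformers` is likely the simpler route, since it performs this loop internally.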