doberst committed
Commit e8b0a17 · verified · 1 Parent(s): f2508e4

Update README.md

Files changed (1):
  1. README.md +16 -38

README.md CHANGED
@@ -3,16 +3,17 @@ license: apache-2.0
  inference: false
  ---

- # Model Card for Model ID

  <!-- Provide a quick summary of what the model is/does. -->

- bling-phi-2-v0 is part of the BLING ("Best Little Instruct No GPU Required ...") model series, RAG-instruct trained on top of a Microsoft Phi-2B base model.

- BLING models are fine-tuned with high-quality custom instruct datasets, designed for rapid prototyping in RAG scenarios.

- For models with comparable size and performance in RAG deployments, please see:

  [**bling-stable-lm-3b-4e1t-v0**](https://huggingface.co/llmware/bling-stable-lm-3b-4e1t-v0)
  [**bling-sheared-llama-2.7b-0.1**](https://huggingface.co/llmware/bling-sheared-llama-2.7b-0.1)
  [**bling-red-pajamas-3b-0.1**](https://huggingface.co/llmware/bling-red-pajamas-3b-0.1)
@@ -64,24 +65,24 @@ BLING models have been trained for common RAG scenarios, specifically: questio
  without the need for a lot of complex instruction verbiage - provide a text passage context, ask questions, and get clear fact-based responses.


- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->

- Any model can provide inaccurate or incomplete information, and should be used in conjunction with appropriate safeguards and fact-checking mechanisms.


- ## How to Get Started with the Model

- The fastest way to get started with BLING is through direct import in transformers:

-     from transformers import AutoTokenizer, AutoModelForCausalLM
-     tokenizer = AutoTokenizer.from_pretrained("bling-phi-2-v0", trust_remote_code=True)
-     model = AutoModelForCausalLM.from_pretrained("bling-phi-2-v0", trust_remote_code=True)

- Please refer to the generation_test.py files in the Files repository, which include 200 samples and a script to test the model. The **generation_test_llmware_script.py** includes built-in llmware capabilities for fact-checking, as well as easy integration with document parsing and actual retrieval, to swap out the test set for a RAG workflow consisting of business documents.

- The dRAGon model was fine-tuned with a simple "\<human> and \<bot>" wrapper, so to get the best results, wrap inference entries as:

  full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"
@@ -95,29 +96,6 @@ To get the best results, package "my_prompt" as follows:
  my_prompt = {{text_passage}} + "\n" + {{question/instruction}}


- If you are using a HuggingFace generation script:
-
-     # prepare prompt packaging used in fine-tuning process
-     new_prompt = "<human>: " + entries["context"] + "\n" + entries["query"] + "\n" + "<bot>:"
-
-     inputs = tokenizer(new_prompt, return_tensors="pt")
-     start_of_output = len(inputs.input_ids[0])
-
-     # temperature: set at 0.3 for consistency of output
-     # max_new_tokens: set at 100 - may prematurely stop a few of the summaries
-
-     outputs = model.generate(
-         inputs.input_ids.to(device),
-         eos_token_id=tokenizer.eos_token_id,
-         pad_token_id=tokenizer.eos_token_id,
-         do_sample=True,
-         temperature=0.3,
-         max_new_tokens=100,
-     )
-
-     output_only = tokenizer.decode(outputs[0][start_of_output:], skip_special_tokens=True)
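The decode step in the script above relies on a detail worth noting: `generate()` returns the prompt ids followed by the newly generated ids, so recording `start_of_output = len(inputs.input_ids[0])` before generation lets the script decode only the model's answer. A minimal sketch of that slicing idea with plain lists in place of tensors (all ids here are illustrative, not real tokenizations):

```python
# Sketch of the decode-slicing idea from the script above, using plain
# lists in place of tensors. All token id values are illustrative.

def slice_new_tokens(input_ids, output_ids):
    """Return only the tokens generated after the prompt.

    generate() output begins with the prompt ids, so slicing at
    len(input_ids) isolates the model's new tokens.
    """
    start_of_output = len(input_ids)
    return output_ids[start_of_output:]

prompt_ids = [101, 2054, 2003]              # pretend tokenized prompt
generated = prompt_ids + [1996, 3437, 102]  # generate() echoes the prompt first

print(slice_new_tokens(prompt_ids, generated))  # [1996, 3437, 102]
```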

  ## Model Card Contact

  Darren Oberst & llmware team
 
  inference: false
  ---

+ # BLING-PHI-2-GGUF

  <!-- Provide a quick summary of what the model is/does. -->

+ **bling-phi-2-gguf** is part of the BLING model series, RAG-instruct trained on top of a Microsoft Phi-2B base model.

+ BLING models are fine-tuned with high-quality custom instruct datasets, designed for rapid prototyping in RAG scenarios.

+ For other models with comparable size and performance in RAG deployments, please see:

+ [**bling-phi-3-gguf**](https://huggingface.co/llmware/bling-phi-3-gguf)
  [**bling-stable-lm-3b-4e1t-v0**](https://huggingface.co/llmware/bling-stable-lm-3b-4e1t-v0)
  [**bling-sheared-llama-2.7b-0.1**](https://huggingface.co/llmware/bling-sheared-llama-2.7b-0.1)
  [**bling-red-pajamas-3b-0.1**](https://huggingface.co/llmware/bling-red-pajamas-3b-0.1)
 
  without the need for a lot of complex instruction verbiage - provide a text passage context, ask questions, and get clear fact-based responses.


+ ## How to Get Started with the Model

+ To pull the model via API:

+     from huggingface_hub import snapshot_download
+     snapshot_download("llmware/bling-phi-2-gguf", local_dir="/path/on/your/machine/", local_dir_use_symlinks=False)

+ Load in your favorite GGUF inference engine, or try with llmware as follows:

+     from llmware.models import ModelCatalog
+     model = ModelCatalog().load_model("bling-phi-2-gguf")
+     response = model.inference(query, add_context=text_sample)

+ Note: please review [**config.json**](https://huggingface.co/llmware/bling-phi-2-gguf/blob/main/config.json) in the repository for prompt wrapping information, details on the model, and the full test set.

+ The BLING model was fine-tuned with a simple "\<human> and \<bot>" wrapper, so to get the best results, wrap inference entries as:

  full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"

  my_prompt = {{text_passage}} + "\n" + {{question/instruction}}
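Putting the two packaging rules above together, a minimal helper might look like the following sketch. The function name and sample strings are ours; the `<human>`/`<bot>` wrapper and passage-then-question ordering come from the model card:

```python
def build_bling_prompt(text_passage: str, question: str) -> str:
    """Package a context passage and a question in the <human>/<bot>
    wrapper that the model card says was used in fine-tuning."""
    my_prompt = text_passage + "\n" + question
    return "<human>: " + my_prompt + "\n" + "<bot>:"

# Illustrative usage with made-up inputs:
prompt = build_bling_prompt("The invoice total is $1,000.", "What is the total?")
print(prompt)
```

The generated text is then whatever the model produces after the trailing `<bot>:` marker.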


  ## Model Card Contact

  Darren Oberst & llmware team