---
license: llama2
datasets:
- hardikg2907/github-code-html-css-1
- mm-stack/html-sample
language:
- en
base_model:
- zstanjj/HTML-Pruner-Llama-1B
- cshulby/YourTTS
- CSgaoshouGroup/CSCupcakeCoder
- Qwen/Qwen2-VL-7B-Instruct
new_version: Qwen/Qwen2.5-Coder-32B-Instruct
---
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->

<div align="center">
<h1>
SlimPLM
</h1>
</div>

<p align="center">
📝 <a href="https://arxiv.org/abs/2402.12052" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/zstanjj/SlimPLM-Query-Rewriting/" target="_blank">Hugging Face</a> • 🧩 <a href="https://github.com/plageon/SlimPLM" target="_blank">Github</a>
</p>

🌹 If you use this model, please star our **[GitHub repository](https://github.com/plageon/SlimPlm)** to support us. Your star means a lot!

## ✨ Latest News

- [1/25/2024]: Retrieval Necessity Judgment Model released on [Hugging Face](https://huggingface.co/zstanjj/SlimPLM-Retrieval-Necessity-Judgment/).
- [2/20/2024]: Query Rewriting Model released on [Hugging Face](https://huggingface.co/zstanjj/SlimPLM-Query-Rewriting/).
- [5/19/2024]: Our new work, **[Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs](https://aclanthology.org/2024.acl-long.242/)**, has been accepted to the **ACL 2024** main conference.

## 🎬 Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Construct the prompt. The template below is kept verbatim to match the format
# the query-rewriting model was trained on.
question = "Who voices Darth Vader in Star Wars Episodes III-VI, IX Rogue One, and Rebels?"
heuristic_answer = "The voice of Darth Vader in Star Wars is provided by British actor James Earl Jones. He first voiced the character in the 1977 film \"Star Wars: Episode IV - A New Hope\", and his performance has been used in all subsequent Star Wars films, including the prequels and sequels."
prompt = (f"<s>[INST] <<SYS>>\nYou are a helpful assistant. Your task is to parse user input into"
          f" structured formats according to the coarse answer. Current datatime is 2023-12-20 9:47:28"
          f" <</SYS>>\n Course answer: (({heuristic_answer}))\nQuestion: (({question})) [/INST]")

# Generation parameters (greedy decoding; the sampling knobs only apply when do_sample=True)
params_query_rewrite = {"repetition_penalty": 1.05, "temperature": 0.01, "top_k": 1, "top_p": 0.85,
                        "max_new_tokens": 512, "do_sample": False}

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("zstanjj/SlimPLM-Query-Rewriting").eval()
if torch.cuda.is_available():
    model.cuda()
tokenizer = AutoTokenizer.from_pretrained("zstanjj/SlimPLM-Query-Rewriting")

# Run inference (the prompt is already fully formatted above)
input_ids = tokenizer.encode(prompt, return_tensors="pt")
len_input_ids = len(input_ids[0])
if torch.cuda.is_available():
    input_ids = input_ids.cuda()
with torch.no_grad():
    outputs = model.generate(input_ids, **params_query_rewrite)
res = tokenizer.decode(outputs[0][len_input_ids:], skip_special_tokens=True)
print(res)
```

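The prompt template above can be factored into a small helper so the same format is reused for other questions and coarse answers. This is a minimal sketch; the `build_prompt` name and the default timestamp argument are illustrative assumptions, not part of the released code.

```python
# Hypothetical helper wrapping the prompt template shown above.
def build_prompt(question: str, coarse_answer: str,
                 datetime_str: str = "2023-12-20 9:47:28") -> str:
    """Build a Llama-2-style [INST] prompt for the query-rewriting model."""
    return (f"<s>[INST] <<SYS>>\nYou are a helpful assistant. Your task is to parse user input into"
            f" structured formats according to the coarse answer. Current datatime is {datetime_str}"
            f" <</SYS>>\n Course answer: (({coarse_answer}))\nQuestion: (({question})) [/INST]")

# Example: reuse the template for a new question
prompt = build_prompt("Who wrote The Hobbit?",
                      "The Hobbit was written by J. R. R. Tolkien in 1937.")
print(prompt)
```

The helper only changes how the prompt string is assembled; the deployment and inference code stays the same.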
## ✏️ Citation

```
@inproceedings{Tan2024SmallMB,
  title={Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs},
  author={Jiejun Tan and Zhicheng Dou and Yutao Zhu and Peidong Guo and Kun Fang and Ji-Rong Wen},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year={2024},
  url={https://arxiv.org/abs/2402.12052}
}
```