Environment question and how to extract embeddings

by jim1616 - opened Jan 12

Jan 12

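Great work!
Can the PyTorch in this environment not be the GPU build? I see that the requirements only list a simple version pin for the environment.
Also, if I only want to convert some Chinese text into embeddings, how should I do that? Is there example code?
I'm a beginner and still learning, so any guidance would be appreciated.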

YuPeng0214

Kingsoft AI org Jan 26

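Our implementation was developed and run in a GPU environment; the torch installation is matched automatically to your CUDA version. Extracting embeddings from Chinese text is the model's most basic operation: just run the provided code example in your own environment.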
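For readers who want to confirm that their installed torch build actually sees a GPU before loading the model, a minimal sketch follows. The helper name `pick_device` is hypothetical and not part of the model's code; it only relies on `torch.cuda.is_available()` and falls back to CPU when torch is missing or no CUDA device is visible.

```python
def pick_device() -> str:
    """Return "cuda" if a CUDA-enabled torch build can see a GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the check also works when torch is absent
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Hypothetical usage: print the device that would be used.
print(pick_device())
```

If this prints "cpu" on a machine with a GPU, the installed torch build probably does not match the local CUDA setup.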

YuPeng0214

Kingsoft AI org Jan 26

from sentence_transformers import SentenceTransformer
from sklearn.preprocessing import normalize

def get_prompteol_input(text: str) -> str:
    return f"This sentence: <|im_start|>“{text}” means in one word: “"

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

model = SentenceTransformer(
    "Kingsoft-LLM/QZhou-Embedding-Zh",
    model_kwargs={"device_map": "cuda", "trust_remote_code": True},
    tokenizer_kwargs={"padding_side": "left", "trust_remote_code": True},
    trust_remote_code=True
)

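task = "Given a web search query, retrieve relevant passages that answer the query"
queries = [
    get_prompteol_input(get_detailed_instruct(task, "光合作用是什么？")),  # "What is photosynthesis?"
    get_prompteol_input(get_detailed_instruct(task, "电话是谁发明的？"))  # "Who invented the telephone?"
]

documents = [
    # "Photosynthesis is the process by which green plants use sunlight, carbon dioxide, and water
    # to produce glucose and oxygen. This biochemical reaction takes place in the chloroplasts."
    get_prompteol_input("光合作用是绿色植物利用阳光、二氧化碳和水生成葡萄糖和氧气的过程。这一生化反应发生在叶绿体中。"),
    # "Alexander Graham Bell is widely credited with inventing the first practical telephone in 1876
    # and received U.S. Patent No. 174,465 for the device."
    get_prompteol_input("亚历山大·格拉汉姆·贝尔（Alexander Graham Bell）因于1876年发明了第一台实用电话而广受认可，并为此设备获得了美国专利第174,465号。")
]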

query_embeddings = model.encode(queries, normalize_embeddings=False)
document_embeddings = model.encode(documents, normalize_embeddings=False)

dim = 1792  # 128, 256, 512, 768, 1024, 1280, 1536, 1792
query_embeddings = normalize(query_embeddings[:, :dim])
document_embeddings = normalize(document_embeddings[:, :dim])

similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)

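This is the original code. query_embeddings and document_embeddings are your text embeddings. If your text is a question, compute its embedding the way query_embeddings is computed; if it is a text passage, generate it the way document_embeddings is.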
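For the case raised in the original question (only turning plain Chinese passages into embeddings), the document-side path can be wrapped in one helper. This is a sketch under assumptions, not the authors' code: `embed_passages` is a hypothetical name, `model` is assumed to be the SentenceTransformer loaded as in the reply, and NumPy is used for the L2 normalization in place of `sklearn.preprocessing.normalize`.

```python
import numpy as np


def get_prompteol_input(text: str) -> str:
    # Same prompt wrapper as in the reply's code.
    return f"This sentence: <|im_start|>“{text}” means in one word: “"


def embed_passages(model, texts, dim=1792, batch_size=8):
    """Embed plain passages: wrap in the prompt, encode, truncate, L2-normalize.

    `model` is assumed to expose SentenceTransformer's
    `encode(sentences, normalize_embeddings=..., batch_size=...)`.
    `dim` should be one of the values listed in the reply's code comment.
    """
    prompts = [get_prompteol_input(t) for t in texts]
    raw = np.asarray(
        model.encode(prompts, normalize_embeddings=False, batch_size=batch_size)
    )
    truncated = raw[:, :dim]  # keep only the first `dim` components
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)  # guard against zero vectors
```

With the model from the reply loaded, a call such as `embed_passages(model, ["今天天气很好。"], dim=1024)` would return one unit-length row per input text. Questions would additionally go through get_detailed_instruct before the prompt wrapper, as in the queries list.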

jim1616

Jan 26

OK, thanks!

jim1616

Jan 26

Got it, thanks for your guidance!

jim1616 changed discussion status to closed Jan 26
