DRCD dataset

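The Delta Reading Comprehension Dataset (DRCD, 台達閱讀理解資料集) is a general-domain Traditional Chinese machine reading comprehension dataset. It contains 10,014 paragraphs drawn from 2,108 Wikipedia articles, with more than 30,000 questions annotated on those paragraphs.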

Available models

mT5 (based on google/mt5-base)

Abstract

We propose Abstracting from Confusion (AFC) and fine-tune the model on the DRCD dataset for 10 epochs.

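In this experimental design, each question in the DRCD benchmark is paired with the 10 paragraphs most similar to the question, plus 1 best passage (The Best Passage) that is guaranteed to contain the correct answer. For the BERT reader, each reading comprehension test takes the question and the best passage as input, then compares the reader's prediction with the gold answer to compute the F1 and EM scores. Testing the two readers separately, we find that the AFC reader is not inferior to the BERT reader and even achieves better scores.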

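In our setting, the Extractor built on the Text-to-Text Generation concept outperforms BERT on mixed (noisy) data. For details, see the original paper, 基於 Fusion-in-Decoder 之中文開放領域問答研究 (a study of Chinese open-domain question answering based on Fusion-in-Decoder).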
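The F1 and EM scores mentioned above can be sketched as follows. This is a minimal illustration, not the evaluation script used in the paper: the card does not say how answers are tokenized, so the character-level comparison (a common choice for Chinese) is an assumption, and the function names `exact_match` and `char_f1` are hypothetical.

```python
from collections import Counter


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the stripped strings are identical, else 0.0."""
    return float(prediction.strip() == gold.strip())


def char_f1(prediction: str, gold: str) -> float:
    """Character-level F1 (assumed granularity; whitespace is ignored)."""
    pred_chars = [c for c in prediction if not c.isspace()]
    gold_chars = [c for c in gold if not c.isspace()]
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_chars or not gold_chars:
        return float(pred_chars == gold_chars)
    # Multiset intersection counts each shared character at most
    # as often as it appears on both sides.
    overlap = sum((Counter(pred_chars) & Counter(gold_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)
```

Averaging both functions over all question/answer pairs gives dataset-level EM and F1 under these assumptions.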

Method

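The question (Question) and 10 independent sentences (Sentences) are combined into a single input, from which the model infers the answer that best matches the question.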

Input = question:<question> context:<sentence1>[SEP]<sentence2>[SEP]<sentence3>....
Output = abstracted answer
from transformers import MT5ForConditionalGeneration, MT5Tokenizer

# Repository ID as given in the card; this page lists the model as Liuchien/nlp-mt5-base-drcd.
model = MT5ForConditionalGeneration.from_pretrained("nchu-nlp-lct/nlp-mt5-base-drcd")
tokenizer = MT5Tokenizer.from_pretrained("nchu-nlp-lct/nlp-mt5-base-drcd")

# Encode one input: the question followed by candidate sentences joined with [SEP]
tokenized_inputs = tokenizer(
    ["confusion:2022年,你知道日本職棒上季僅有24個打席，卻奪下年度盜壘王獎項的球員是誰嗎？ context:和田康士朗,2021年-- 全季出賽96場，且僅有24打席、19打數，但季末仍以24次盜壘成功與荻野貴司、西川遥輝、源田壮亮等3人並列洋聯盜壘王，為2011年藤村大介後再次出現未達規定打席的盜壘王；且一舉刷新1966年山本公士158打席、1944年吳昌征93打席的兩聯盟暨單一聯盟最少打席獲得盜壘王的雙重紀錄，並為首位獲得日本職棒年度個人獎項的棒球挑戰聯盟出身球員[SEP]和田康士朗,個人年表,2021年-- 全季出賽96場，且僅有24打席、19打數，但季末仍以24次盜壘成功與荻野貴司、西川遥輝、源田壮亮等3人並列洋聯盜壘王，為2011年藤村大介後再次出現未達規定打席的盜壘王；且一舉刷新1966年山本公士158打席、1944年吳昌征93打席的兩聯盟暨單一聯盟最少打席獲得盜壘王的雙重紀錄，並為首位獲得日本職棒年度個人獎項的棒球挑戰聯盟出身球員[SEP]日本職棒年度盜壘王,單一聯盟時期,年度:1944,球員名:吳新亨,球隊:巨人隊,盜壘成功:19,盜壘次數:24,成功率:.792"],
    padding=True,
    return_tensors="pt",
)
# Passes input_ids together with attention_mask so padding tokens are ignored
out = model.generate(**tokenized_inputs)

# Get the answer
for ids in out:
    print(tokenizer.decode(ids, skip_special_tokens=True))
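Assembling the input string by hand is error-prone, so a small helper may be useful. The sketch below follows the format described in this section; `build_input` and its parameters are hypothetical names, not part of the model's code. Note that the format line uses the `question:` prefix while the example input starts with `confusion:`. The card does not say which one the model was trained with, so the prefix is left as a parameter.

```python
from typing import Sequence


def build_input(
    question: str,
    sentences: Sequence[str],
    prefix: str = "question",  # the card's example uses "confusion" instead
    sep: str = "[SEP]",
) -> str:
    """Join a question and candidate sentences into one model input string."""
    context = sep.join(sentences)
    return f"{prefix}:{question} context:{context}"


# Example with three placeholder sentences
text = build_input("Q", ["s1", "s2", "s3"])
print(text)
```

The resulting string can be passed to the tokenizer in place of the hand-written input.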
