Instructions to use numind/NuExtract-large with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use numind/NuExtract-large with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="numind/NuExtract-large", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("numind/NuExtract-large", trust_remote_code=True, dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use numind/NuExtract-large with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "numind/NuExtract-large"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "numind/NuExtract-large",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/numind/NuExtract-large

SGLang

How to use numind/NuExtract-large with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "numind/NuExtract-large" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "numind/NuExtract-large",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "numind/NuExtract-large" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "numind/NuExtract-large",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use numind/NuExtract-large with Docker Model Runner:
```
docker model run hf.co/numind/NuExtract-large
```

AssertionError: Flash Attention is not available, but is needed for dense attention

by tpadhi1 - opened Jul 29, 2024

Discussion

tpadhi1

Jul 29, 2024

•

edited Jul 29, 2024

model = AutoModelForCausalLM.from_pretrained(model_id, **model_kwargs)
  File "/home/ubuntu/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
 return model_class.from_pretrained(
  File "/home/ubuntu/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3788, in from_pretrained
 model = cls(config, *model_args, **model_kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/numind/NuExtract-large/fc8e001871f4a6be8e6079093b33de334a2316c9/modeling_phi3_small.py", line 903, in __init__
 self.model = Phi3SmallModel(config)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/numind/NuExtract-large/fc8e001871f4a6be8e6079093b33de334a2316c9/modeling_phi3_small.py", line 745, in __init__
 self.layers = nn.ModuleList([Phi3SmallDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)])
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/numind/NuExtract-large/fc8e001871f4a6be8e6079093b33de334a2316c9/modeling_phi3_small.py", line 745, in <listcomp>
 self.layers = nn.ModuleList([Phi3SmallDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)])
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/numind/NuExtract-large/fc8e001871f4a6be8e6079093b33de334a2316c9/modeling_phi3_small.py", line 651, in __init__
 self.self_attn = Phi3SmallSelfAttention(config, layer_idx)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/numind/NuExtract-large/fc8e001871f4a6be8e6079093b33de334a2316c9/modeling_phi3_small.py", line 218, in __init__
 assert is_flash_attention_available, "Flash Attention is not available, but is needed for dense attention"
AssertionError: Flash Attention is not available, but is needed for dense attention

Model has been initiated by following code:

model_id = "numind/NuExtract-large"
# model_id = "numind/NuExtract"
model_kwargs = dict(
    use_cache=False,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    torch_dtype="auto",
    device_map=None,
)
model = AutoModelForCausalLM.from_pretrained(model_id, **model_kwargs)

Transformers Version: 4.43.2
Python Version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
Operating System Info:
  System: Linux
  Release: 5.15.0-1064-aws
  Version: #70~20.04.1-Ubuntu SMP Fri Jun 14 15:42:13 UTC 2024
GPU Info:

tpadhi1

Jul 29, 2024

This comment has been hidden

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment