zwpride-iquestlab committed on
Commit 0bc1aa1 · verified · 1 Parent(s): bb6309e

Delete conversion_to_hf.log

Files changed (1)
  1. conversion_to_hf.log +0 -137
conversion_to_hf.log DELETED
@@ -1,137 +0,0 @@
- Loaded loader_megatron_core as the loader.
- Loaded saver_llama2_hf_bf as the saver.
- Starting saver...
- Starting loader...
- fused_indices_to_multihot has reached end of life. Please migrate to a non-experimental function.
- /usr/local/lib/python3.12/dist-packages/modelopt/torch/utils/import_utils.py:31: UserWarning: Failed to import apex plugin due to: AttributeError("module 'transformers.modeling_utils' has no attribute 'Conv1D'"). You may ignore this warning if you do not need this plugin.
- warnings.warn(
- /usr/local/lib/python3.12/dist-packages/modelopt/torch/utils/import_utils.py:31: UserWarning: Failed to import huggingface plugin due to: AttributeError("module 'transformers.modeling_utils' has no attribute 'Conv1D'"). You may ignore this warning if you do not need this plugin.
- warnings.warn(
- /usr/local/lib/python3.12/dist-packages/modelopt/torch/utils/import_utils.py:31: UserWarning: Failed to import megatron plugin due to: AttributeError("module 'transformers.modeling_utils' has no attribute 'Conv1D'"). You may ignore this warning if you do not need this plugin.
- warnings.warn(
- Setting num_layers to 28 from checkpoint
- Setting hidden_size to 5120 from checkpoint
- Setting ffn_hidden_size to 27648 from checkpoint
- Setting seq_length to 131072 from checkpoint
- Setting num_attention_heads to 40 from checkpoint
- Setting num_query_groups to 8 from checkpoint
- Setting group_query_attention to True from checkpoint
- Setting kv_channels to 128 from checkpoint
- Setting max_position_embeddings to 131072 from checkpoint
- Setting position_embedding_type to rope from checkpoint
- Setting add_position_embedding to True from checkpoint
- Setting use_rotary_position_embeddings to True from checkpoint
- Setting rotary_base to 500000 from checkpoint
- Setting rotary_percent to 1.0 from checkpoint
- Setting rotary_interleaved to False from checkpoint
- Setting add_bias_linear to False from checkpoint
- Setting add_qkv_bias to False from checkpoint
- Setting squared_relu to False from checkpoint
- Setting swiglu to True from checkpoint
- Setting untie_embeddings_and_output_weights to True from checkpoint
- Setting apply_layernorm_1p to False from checkpoint
- Setting normalization to RMSNorm from checkpoint
- Setting apply_query_key_layer_scaling to False from checkpoint
- Setting attention_dropout to 0.0 from checkpoint
- Setting hidden_dropout to 0.0 from checkpoint
- Checkpoint did not provide arguments hybrid_override_pattern
- Checkpoint did not provide arguments spec
- Setting hybrid_attention_ratio to 0.0 from checkpoint
- Setting hybrid_mlp_ratio to 0.0 from checkpoint
- Checkpoint did not provide arguments num_experts
- Setting moe_layer_freq to 1 from checkpoint
- Setting moe_router_topk to 2 from checkpoint
- Setting moe_router_pre_softmax to False from checkpoint
- Setting moe_grouped_gemm to False from checkpoint
- Checkpoint did not provide arguments moe_shared_expert_intermediate_size
- Setting mamba_state_dim to 128 from checkpoint
- Setting mamba_head_dim to 64 from checkpoint
- Setting mamba_num_groups to 8 from checkpoint
- Checkpoint did not provide arguments mamba_num_heads
- Setting is_hybrid_model to False from checkpoint
- Checkpoint did not provide arguments heterogeneous_layers_config_path
- Checkpoint did not provide arguments heterogeneous_layers_config_encoded_json
- Setting tokenizer_type to SFTTokenizer from checkpoint
- Setting tokenizer_model to /cpfs01/users/wzhang/iquest-coder-v1.1/RepoData-Ucoder-32B-128k-from2.5.2/97.09B_instruct_iquest-coder from checkpoint
- Checkpoint did not provide arguments tiktoken_pattern
- Setting padded_vocab_size to 76800 from checkpoint
- INFO:megatron.core.num_microbatches_calculator:setting number of microbatches to constant 1
- WARNING: one_logger package is required to enable e2e metrics tracking. please go to https://confluence.nvidia.com/display/MLWFO/Package+Repositories for details to install it
- building GPT model ...
- (TP, PP) mismatch after resume ((1, 1) vs (8, 1) from checkpoint): RNG state will be ignored
- sharded_state_dict metadata loaded from the checkpoint: {'distrib_optim_sharding_type': 'dp_reshardable', 'singleton_local_shards': False, 'chained_optim_avoid_prefix': True}
- Job sharding has changed: Rerun state will be ignored
- loading distributed checkpoint from /tmp/megatron_convert_iter1616_node0_pid360_42a53cb4 at iteration 1616
- /volume/pt-train/users/wzhang/wjj-workspace/code-sft/src/training/Megatron-LM/megatron/core/dist_checkpointing/strategies/torch.py:956: FutureWarning: `load_state_dict` is deprecated and will be removed in future versions. Please use `load` instead.
- checkpoint.load_state_dict(
- /usr/local/lib/python3.12/dist-packages/torch/distributed/checkpoint/planner_helpers.py:406: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
- device = getattr(value, "device", None)
- /usr/local/lib/python3.12/dist-packages/torch/distributed/checkpoint/default_planner.py:454: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
- and md.size != obj.size()
- checkpoint version 3.0
- successfully loaded checkpoint from /tmp/megatron_convert_iter1616_node0_pid360_42a53cb4 [ t 1/1, p 1/1 ] at iteration 1616
- sending embeddings
- sending transformer layer 0
- sending transformer layer 1
- sending transformer layer 2
- sending transformer layer 3
- sending transformer layer 4
- sending transformer layer 5
- sending transformer layer 6
- sending transformer layer 7
- sending transformer layer 8
- sending transformer layer 9
- sending transformer layer 10
- sending transformer layer 11
- sending transformer layer 12
- sending transformer layer 13
- sending transformer layer 14
- sending transformer layer 15
- sending transformer layer 16
- sending transformer layer 17
- sending transformer layer 18
- sending transformer layer 19
- sending transformer layer 20
- sending transformer layer 21
- sending transformer layer 22
- sending transformer layer 23
- sending transformer layer 24
- sending transformer layer 25
- sending transformer layer 26
- sending transformer layer 27
- sending final norm
- sending output layer
- Waiting for saver to complete...
- fused_indices_to_multihot has reached end of life. Please migrate to a non-experimental function.
- received embeddings
- received transformer layer 0
- received transformer layer 1
- received transformer layer 2
- received transformer layer 3
- received transformer layer 4
- received transformer layer 5
- received transformer layer 6
- received transformer layer 7
- received transformer layer 8
- received transformer layer 9
- received transformer layer 10
- received transformer layer 11
- received transformer layer 12
- received transformer layer 13
- received transformer layer 14
- received transformer layer 15
- received transformer layer 16
- received transformer layer 17
- received transformer layer 18
- received transformer layer 19
- received transformer layer 20
- received transformer layer 21
- received transformer layer 22
- received transformer layer 23
- received transformer layer 24
- received transformer layer 25
- received transformer layer 26
- received transformer layer 27
- received final norm
- received output layer
- Saving model to disk ...
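For reference, the checkpoint arguments reported in the deleted log map onto a Llama-style Hugging Face config roughly as sketched below. This is a minimal sketch, assuming the saver_llama2_hf_bf saver targets a standard transformers LlamaConfig; the HF field names, the rms_norm_eps value, the bf16 dtype, and the output path are assumptions not stated in the log, while the numeric values come from the "Setting ... from checkpoint" lines above.

```python
# Hypothetical sketch of the Llama-style config implied by the log;
# HF field names, rms_norm_eps, dtype, and the output path are assumptions.
from transformers import LlamaConfig

config = LlamaConfig(
    num_hidden_layers=28,            # num_layers
    hidden_size=5120,                # hidden_size
    intermediate_size=27648,         # ffn_hidden_size (SwiGLU MLP)
    num_attention_heads=40,          # num_attention_heads (5120/40 = 128 = kv_channels)
    num_key_value_heads=8,           # num_query_groups (grouped-query attention)
    max_position_embeddings=131072,  # seq_length / max_position_embeddings
    rope_theta=500000,               # rotary_base
    hidden_act="silu",               # swiglu=True
    attention_dropout=0.0,           # attention_dropout
    tie_word_embeddings=False,       # untie_embeddings_and_output_weights=True
    vocab_size=76800,                # padded_vocab_size
    rms_norm_eps=1e-5,               # assumption: epsilon is not reported in the log
    torch_dtype="bfloat16",          # assumption: the "_bf" saver suffix suggests bf16
)
config.save_pretrained("./iquest-coder-hf")  # hypothetical output directory
```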