---
language:
  - en
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-3B
tags:
  - network-diagnostics
  - grpc
  - telemetry
  - gnmi
  - yang
  - sysctl
  - tcp
  - fine-tuned
  - qwen2.5
pipeline_tag: text-generation
---

# DocLM

DocLM is a fine-tuned language model specialized in network telemetry diagnostics and debugging. It is built on top of Qwen2.5-Coder-3B and merged into a single FP16 model.

It is the inference engine behind the Telemetry Debugger CLI tool - a production-grade diagnostic assistant for network engineers working with gRPC, gNMI, YANG, and Linux kernel networking.


## Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-3B |
| Model Type | Causal Language Model |
| Precision | FP16 (merged) |
| Fine-tuning Method | LoRA (merged into base) |
| Parameters | ~3B |
| Context Length | 4096 tokens |
| License | Apache 2.0 |

## What DocLM Does

DocLM is trained to understand natural language requests from network engineers and respond with structured JSON function-calling plans. It operates within an agentic execution pipeline that includes RAG retrieval, transaction-based execution, and automatic rollback.
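As a hedged illustration of the execution side, the sketch below shows how a plan's `critical` flags, `depends_on` indices, and `stop_on_error` strategy could drive transaction-based execution with rollback. The `registry` and `rollback_registry` mappings and the executor itself are assumptions for illustration, not the CLI's actual implementation:

```python
# Hypothetical executor for DocLM-style plans. The plan schema mirrors the
# example later in this card; the execution and rollback logic is a sketch.
def execute_plan(plan, registry, rollback_registry):
    """Run each planned function in order; on a critical failure, undo
    completed steps in reverse order using the rollback registry."""
    completed = []  # list of (index, step, result) for finished steps
    for idx, step in enumerate(plan["functions"]):
        # Skip a step whose declared dependencies did not complete.
        done_ids = [i for i, _, _ in completed]
        if any(d not in done_ids for d in step.get("depends_on", [])):
            continue
        try:
            result = registry[step["name"]](**step["params"])
            completed.append((idx, step, result))
        except Exception:
            if step.get("critical") or plan["execution_strategy"] == "stop_on_error":
                # Roll back finished steps in reverse completion order,
                # mirroring transaction semantics.
                for _, done_step, _ in reversed(completed):
                    undo = rollback_registry.get(done_step["name"])
                    if undo:
                        undo(**done_step["params"])
                return {"status": "rolled_back", "failed_step": idx}
    return {"status": "completed", "results": [r for _, _, r in completed]}
```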

### Specialized Domains

- **gRPC diagnostics** - packet drop analysis, flow control events, stream health
- **TCP/network health checks** - retransmit analysis, buffer sizing, connection state
- **Telemetry / gNMI** - subscription tracing, path validation, stream monitoring
- **YANG model parsing** - schema validation, data conformance checking
- **sysctl tuning** - kernel parameter analysis and safe modification
- **General network debugging** - multi-step diagnostic workflows with rollback safety

## Intended Use

DocLM is designed to be used exclusively within the Telemetry Debugger CLI tool. It is not a general-purpose chat model. Its outputs are structured JSON function-calling plans, not free-form conversation.

```json
{
  "reasoning": "High retransmit count on port 50051 suggests TCP buffer exhaustion.",
  "execution_strategy": "stop_on_error",
  "functions": [
    {
      "name": "check_tcp_health",
      "params": {"interface": "eth0", "port": 50051},
      "critical": false
    },
    {
      "name": "execute_sysctl_command",
      "params": {
        "parameter": "net.core.rmem_max",
        "value": "${previous.recommended_buffer_size}"
      },
      "critical": true,
      "depends_on": [0]
    }
  ],
  "explanation": "Increasing TCP receive buffer should resolve the packet drop rate."
}
```
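The `${previous.recommended_buffer_size}` placeholder references a value produced by an earlier step. A minimal sketch of how such references could be resolved at execution time (the placeholder grammar and the `resolve_params` helper are assumptions, not the CLI's documented behavior):

```python
import re

def resolve_params(params, previous_result):
    """Substitute ${previous.<key>} placeholders in a step's params with
    values taken from the prior step's result dict."""
    resolved = {}
    for key, value in params.items():
        if isinstance(value, str):
            match = re.fullmatch(r"\$\{previous\.(\w+)\}", value)
            if match:
                value = previous_result[match.group(1)]
        resolved[key] = value
    return resolved
```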

## Training Data

DocLM was fine-tuned on a dataset combining:

- **Synthetic data** - structured Q&A pairs covering network diagnostic scenarios, function-calling examples, and multi-step remediation workflows
- **Public documentation** - gRPC, OpenConfig, YANG (RFC 6020/7950), gNMI specification, and Linux kernel networking documentation

The dataset was constructed to teach the model to produce valid, grounded JSON function calls rather than free-form text responses.


## Hardware Requirements

| Setup | Minimum |
|---|---|
| GPU VRAM | 8GB (for FP16 inference) |
| RAM | 16GB |
| Disk | 8GB |

Recommended: NVIDIA GPU with 16GB+ VRAM for comfortable inference at full context length. CPU-only inference is possible but significantly slower and not recommended for production use.


## How to Use

DocLM is intended to be run via the Telemetry Debugger CLI, which handles prompt construction, RAG retrieval, and structured output parsing automatically.

For direct inference via vLLM:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model ashutoshrp06/DocLM \
  --dtype float16 \
  --max-model-len 4096
```
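The command above exposes an OpenAI-compatible HTTP API. As a hedged sketch, a client could build a completions request like this; the localhost URL, sampling parameters, and plain-text prompt are placeholders, since real use goes through the Telemetry Debugger's own prompt construction:

```python
import json
import urllib.request

def build_request(prompt, url="http://localhost:8000/v1/completions"):
    """Build an HTTP request for vLLM's OpenAI-compatible completions
    endpoint. URL and sampling parameters are illustrative defaults."""
    body = json.dumps({
        "model": "ashutoshrp06/DocLM",
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.0,  # deterministic decoding suits JSON plan output
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

# With the server running, the request could be sent like so:
# req = build_request("Diagnose packet drops on eth0 port 50051")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```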

For direct inference via Hugging Face Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("ashutoshrp06/DocLM")
model = AutoModelForCausalLM.from_pretrained(
    "ashutoshrp06/DocLM",
    torch_dtype=torch.float16,
    device_map="auto"
)
```

Note: Without the full Telemetry Debugger pipeline (RAG context, function registry, system prompt), raw outputs will not be useful for end users. Direct inference is only recommended for developers integrating DocLM into their own tooling.
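For developers doing that integration, one practical step is checking that a raw generation actually contains a well-formed plan before executing it. A hypothetical sketch (`parse_plan` and the required-key set follow the example plan shown earlier in this card, not the CLI's actual validator):

```python
import json

# Top-level keys every plan is expected to carry, per the example above.
REQUIRED_KEYS = {"reasoning", "execution_strategy", "functions"}

def parse_plan(raw_text):
    """Extract and validate a JSON plan from the model's raw generation.

    Returns the parsed plan dict, or None if the output does not contain
    a valid plan. Tolerates preamble/trailing text around the JSON object.
    """
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        plan = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(plan, dict) or not REQUIRED_KEYS.issubset(plan):
        return None
    return plan
```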


## Limitations

- DocLM is trained for a specific function registry. Prompts outside the Telemetry Debugger system prompt format will produce unpredictable outputs.
- It is not suitable as a general-purpose assistant.
- It does not have knowledge of events after its training data cutoff.
- FP16 precision requires a CUDA-capable GPU for practical inference speeds.

## License

Apache 2.0 - inherited from the Qwen2.5-Coder-3B base model. See LICENSE for details.


## Citation

If you use DocLM in your work, please cite the base model:

```bibtex
@misc{qwen2.5-coder,
  title={Qwen2.5-Coder Technical Report},
  author={Qwen Team},
  year={2024},
  url={https://huggingface.co/Qwen/Qwen2.5-Coder-3B}
}
```