---
language:
  - en
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-3B
tags:
  - network-diagnostics
  - grpc
  - telemetry
  - gnmi
  - yang
  - sysctl
  - tcp
  - fine-tuned
  - qwen2.5
pipeline_tag: text-generation
---

# DocLM

DocLM is a fine-tuned language model specialized in network telemetry diagnostics and debugging. It is built on top of Qwen2.5-Coder-3B and merged into a single FP16 model.

It is the inference engine behind the Telemetry Debugger CLI tool - a production-grade diagnostic assistant for network engineers working with gRPC, gNMI, YANG, and Linux kernel networking.


## Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-3B |
| Model Type | Causal Language Model |
| Precision | FP16 (merged) |
| Fine-tuning Method | LoRA (merged into base) |
| Parameters | ~3B |
| Context Length | 4096 tokens |
| License | Apache 2.0 |

## What DocLM Does

DocLM is trained to understand natural language requests from network engineers and respond with structured JSON function-calling plans. It operates within an agentic execution pipeline that includes RAG retrieval, transaction-based execution, and automatic rollback.
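As a hedged illustration of the execution side, the sketch below shows how a plan's `critical` flags, `depends_on` indices, and `stop_on_error` strategy could drive transaction-based execution with rollback. The `registry` and `rollback_registry` mappings and the executor itself are assumptions for illustration, not the CLI's actual implementation:

```python
# Hypothetical executor for DocLM-style plans. The plan schema mirrors the
# example later in this card; the execution and rollback logic is a sketch.
def execute_plan(plan, registry, rollback_registry):
    """Run each planned function in order; on a critical failure, undo
    completed steps in reverse order using the rollback registry."""
    completed = []  # list of (index, step, result) for finished steps
    for idx, step in enumerate(plan["functions"]):
        # Skip a step whose declared dependencies did not complete.
        done_ids = [i for i, _, _ in completed]
        if any(d not in done_ids for d in step.get("depends_on", [])):
            continue
        try:
            result = registry[step["name"]](**step["params"])
            completed.append((idx, step, result))
        except Exception:
            if step.get("critical") or plan["execution_strategy"] == "stop_on_error":
                # Roll back finished steps in reverse completion order,
                # mirroring transaction semantics.
                for _, done_step, _ in reversed(completed):
                    undo = rollback_registry.get(done_step["name"])
                    if undo:
                        undo(**done_step["params"])
                return {"status": "rolled_back", "failed_step": idx}
    return {"status": "completed", "results": [r for _, _, r in completed]}
```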

### Specialized Domains

- **gRPC diagnostics** - packet drop analysis, flow control events, stream health
- **TCP/network health checks** - retransmit analysis, buffer sizing, connection state
- **Telemetry / gNMI** - subscription tracing, path validation, stream monitoring
- **YANG model parsing** - schema validation, data conformance checking
- **sysctl tuning** - kernel parameter analysis and safe modification
- **General network debugging** - multi-step diagnostic workflows with rollback safety

## Intended Use

DocLM is designed to be used exclusively within the Telemetry Debugger CLI tool. It is not a general-purpose chat model. Its outputs are structured JSON function-calling plans, not free-form conversation.

```json
{
  "reasoning": "High retransmit count on port 50051 suggests TCP buffer exhaustion.",
  "execution_strategy": "stop_on_error",
  "functions": [
    {
      "name": "check_tcp_health",
      "params": {"interface": "eth0", "port": 50051},
      "critical": false
    },
    {
      "name": "execute_sysctl_command",
      "params": {
        "parameter": "net.core.rmem_max",
        "value": "${previous.recommended_buffer_size}"
      },
      "critical": true,
      "depends_on": [0]
    }
  ],
  "explanation": "Increasing TCP receive buffer should resolve the packet drop rate."
}
```
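The `${previous.recommended_buffer_size}` placeholder references a value produced by an earlier step. A minimal sketch of how such references could be resolved at execution time (the placeholder grammar and the `resolve_params` helper are assumptions, not the CLI's documented behavior):

```python
import re

def resolve_params(params, previous_result):
    """Substitute ${previous.<key>} placeholders in a step's params with
    values taken from the prior step's result dict."""
    resolved = {}
    for key, value in params.items():
        if isinstance(value, str):
            match = re.fullmatch(r"\$\{previous\.(\w+)\}", value)
            if match:
                value = previous_result[match.group(1)]
        resolved[key] = value
    return resolved
```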

## Training Data

DocLM was fine-tuned on a dataset combining:

- **Synthetic data** - structured Q&A pairs covering network diagnostic scenarios, function-calling examples, and multi-step remediation workflows
- **Public documentation** - gRPC, OpenConfig, YANG (RFC 6020/7950), gNMI specification, and Linux kernel networking documentation

The dataset was constructed to teach the model to produce valid, grounded JSON function calls rather than free-form text responses.


## Hardware Requirements

| Setup | Minimum |
|---|---|
| GPU VRAM | 8GB (for FP16 inference) |
| RAM | 16GB |
| Disk | 8GB |

Recommended: NVIDIA GPU with 16GB+ VRAM for comfortable inference at full context length. CPU-only inference is possible but significantly slower and not recommended for production use.


## How to Use

DocLM is intended to be run via the Telemetry Debugger CLI, which handles prompt construction, RAG retrieval, and structured output parsing automatically.

For direct inference via vLLM:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model ashutoshrp06/DocLM \
  --dtype float16 \
  --max-model-len 4096
```
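The command above exposes an OpenAI-compatible HTTP API. As a hedged sketch, a client could build a completions request like this; the localhost URL, sampling parameters, and plain-text prompt are placeholders, since real use goes through the Telemetry Debugger's own prompt construction:

```python
import json
import urllib.request

def build_request(prompt, url="http://localhost:8000/v1/completions"):
    """Build an HTTP request for vLLM's OpenAI-compatible completions
    endpoint. URL and sampling parameters are illustrative defaults."""
    body = json.dumps({
        "model": "ashutoshrp06/DocLM",
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.0,  # deterministic decoding suits JSON plan output
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

# With the server running, the request could be sent like so:
# req = build_request("Diagnose packet drops on eth0 port 50051")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```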

For direct inference via Hugging Face Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("ashutoshrp06/DocLM")
model = AutoModelForCausalLM.from_pretrained(
    "ashutoshrp06/DocLM",
    torch_dtype=torch.float16,
    device_map="auto"
)
```

Note: Without the full Telemetry Debugger pipeline (RAG context, function registry, system prompt), raw outputs will not be useful for end users. Direct inference is only recommended for developers integrating DocLM into their own tooling.
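For developers doing that integration, one practical step is checking that a raw generation actually contains a well-formed plan before executing it. A hypothetical sketch (`parse_plan` and the required-key set follow the example plan shown earlier in this card, not the CLI's actual validator):

```python
import json

# Top-level keys every plan is expected to carry, per the example above.
REQUIRED_KEYS = {"reasoning", "execution_strategy", "functions"}

def parse_plan(raw_text):
    """Extract and validate a JSON plan from the model's raw generation.

    Returns the parsed plan dict, or None if the output does not contain
    a valid plan. Tolerates preamble/trailing text around the JSON object.
    """
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        plan = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(plan, dict) or not REQUIRED_KEYS.issubset(plan):
        return None
    return plan
```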


## Limitations

- DocLM is trained for a specific function registry. Prompts outside the Telemetry Debugger system prompt format will produce unpredictable outputs.
- It is not suitable as a general-purpose assistant.
- It does not have knowledge of events after its training data cutoff.
- FP16 precision requires a CUDA-capable GPU for practical inference speeds.

## License

Apache 2.0 - inherited from the Qwen2.5-Coder-3B base model. See LICENSE for details.


## Citation

If you use DocLM in your work, please cite the base model:

```bibtex
@misc{qwen2.5-coder,
  title={Qwen2.5-Coder Technical Report},
  author={Qwen Team},
  year={2024},
  url={https://huggingface.co/Qwen/Qwen2.5-Coder-3B}
}
```