---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-3B
tags:
- network-diagnostics
- grpc
- telemetry
- gnmi
- yang
- sysctl
- tcp
- fine-tuned
- qwen2.5
pipeline_tag: text-generation
---

# DocLM

DocLM is a fine-tuned language model specialized in network telemetry diagnostics and debugging. It is built on top of [Qwen2.5-Coder-3B](https://huggingface.co/Qwen/Qwen2.5-Coder-3B) and merged into a single FP16 model.

It is the inference engine behind the **Telemetry Debugger** CLI tool - a production-grade diagnostic assistant for network engineers working with gRPC, gNMI, YANG, and Linux kernel networking.

---

## Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-3B |
| Model Type | Causal Language Model |
| Precision | FP16 (merged) |
| Fine-tuning Method | LoRA (merged into base) |
| Parameters | ~3B |
| Context Length | 4096 tokens |
| License | Apache 2.0 |

---

## What DocLM Does

DocLM is trained to understand natural language requests from network engineers and respond with structured JSON function-calling plans. It operates within an agentic execution pipeline that includes RAG retrieval, transaction-based execution, and automatic rollback.

### Specialized Domains

- **gRPC diagnostics** - packet drop analysis, flow control events, stream health
- **TCP/network health checks** - retransmit analysis, buffer sizing, connection state
- **Telemetry / gNMI** - subscription tracing, path validation, stream monitoring
- **YANG model parsing** - schema validation, data conformance checking
- **sysctl tuning** - kernel parameter analysis and safe modification
- **General network debugging** - multi-step diagnostic workflows with rollback safety

---

## Intended Use

DocLM is designed to be used **exclusively within the Telemetry Debugger CLI tool**. It is not a general-purpose chat model. Its outputs are structured JSON function-calling plans, not free-form conversation.

```json
{
  "reasoning": "High retransmit count on port 50051 suggests TCP buffer exhaustion.",
  "execution_strategy": "stop_on_error",
  "functions": [
    {
      "name": "check_tcp_health",
      "params": {"interface": "eth0", "port": 50051},
      "critical": false
    },
    {
      "name": "execute_sysctl_command",
      "params": {
        "parameter": "net.core.rmem_max",
        "value": "${previous.recommended_buffer_size}"
      },
      "critical": true,
      "depends_on": [0]
    }
  ],
  "explanation": "Increasing TCP receive buffer should resolve the packet drop rate."
}
```
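The `depends_on` indices and `${previous.*}` placeholders imply a simple sequential executor on the client side. Below is a minimal illustrative sketch of such an executor; the function registry, result field names, and error-handling policy are hypothetical assumptions, not the Telemetry Debugger's actual internals:

```python
def run_plan(plan: dict, registry: dict) -> list:
    """Execute a DocLM plan sequentially, resolving ${previous.*} placeholders.

    `registry` maps function names to callables (hypothetical stand-ins for
    the real tool's function registry). Each result dict gets an "ok" flag.
    """
    results = []
    for call in plan["functions"]:
        # Skip a step whose dependency (an index into earlier functions) failed.
        if any(not results[d].get("ok") for d in call.get("depends_on", [])):
            results.append({"ok": False, "skipped": True})
            continue
        # Resolve "${previous.<field>}" against the most recent result.
        params = {
            k: (results[-1][v[len("${previous."):-1]]
                if isinstance(v, str) and v.startswith("${previous.") else v)
            for k, v in call["params"].items()
        }
        try:
            out = {"ok": True, **registry[call["name"]](**params)}
        except Exception:
            out = {"ok": False}
        results.append(out)
        if not out["ok"] and (call.get("critical")
                              or plan.get("execution_strategy") == "stop_on_error"):
            break
    return results


# Demo with stub functions standing in for real diagnostics:
registry = {
    "check_tcp_health": lambda interface, port: {"recommended_buffer_size": 4194304},
    "execute_sysctl_command": lambda parameter, value: {"applied": value},
}
```

With the example plan above, the second call receives `value=4194304`, pulled from the first call's `recommended_buffer_size` field.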

---

## Training Data

DocLM was fine-tuned on a dataset combining:

- **Synthetic data** - structured Q&A pairs covering network diagnostic scenarios, function-calling examples, and multi-step remediation workflows
- **Public documentation** - gRPC, OpenConfig, YANG (RFC 6020/7950), gNMI specification, and Linux kernel networking documentation

The dataset was constructed to teach the model to produce valid, grounded JSON function calls rather than free-form text responses.

---

## Hardware Requirements

| Setup | Minimum |
|---|---|
| GPU VRAM | 8GB (for FP16 inference) |
| RAM | 16GB |
| Disk | 8GB |

Recommended: NVIDIA GPU with 16GB+ VRAM for comfortable inference at full context length. CPU-only inference is possible but significantly slower and not recommended for production use.

---

## How to Use

DocLM is intended to be run via the Telemetry Debugger CLI, which handles prompt construction, RAG retrieval, and structured output parsing automatically.

For direct inference via vLLM:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model ashutoshrp06/DocLM \
  --dtype float16 \
  --max-model-len 4096
```
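The server above exposes vLLM's standard OpenAI-compatible API, so any OpenAI-style client can query it. A minimal sketch using only the standard library (the prompt text and `max_tokens` value are illustrative):

```python
import json
import urllib.request


def build_completion_request(prompt: str, base_url: str = "http://localhost:8000"):
    """Build an OpenAI-compatible /v1/completions request for the vLLM server."""
    payload = {
        "model": "ashutoshrp06/DocLM",
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.0,  # deterministic decoding for structured JSON plans
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


# With the server running:
# with urllib.request.urlopen(build_completion_request("Check TCP health on eth0")) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```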

For direct inference via Hugging Face Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("ashutoshrp06/DocLM")
model = AutoModelForCausalLM.from_pretrained(
    "ashutoshrp06/DocLM",
    torch_dtype=torch.float16,
    device_map="auto"
)

# The prompt must follow the Telemetry Debugger system prompt format;
# arbitrary prompts will not yield valid function-calling plans.
prompt = "..."  # system + user prompt in the expected format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

> Note: Without the full Telemetry Debugger pipeline (RAG context, function registry, system prompt), raw outputs will not be useful for end users. Direct inference is only recommended for developers integrating DocLM into their own tooling.

---

## Limitations

- DocLM is trained for a specific function registry. Prompts outside the Telemetry Debugger system prompt format will produce unpredictable outputs.
- It is not suitable as a general-purpose assistant.
- It does not have knowledge of events after its training data cutoff.
- FP16 precision requires a CUDA-capable GPU for practical inference speeds.

---

## License

Apache 2.0 - inherited from the Qwen2.5-Coder-3B base model. See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for details.

---

## Citation

If you use DocLM in your work, please cite the base model:

```
@misc{qwen2.5-coder,
  title={Qwen2.5-Coder Technical Report},
  author={Qwen Team},
  year={2024},
  url={https://huggingface.co/Qwen/Qwen2.5-Coder-3B}
}
```