
Building Convergence – A Journey from Network Observability to AI-Driven Automation Part 10: Building MCP Servers for NetClaw: How a Spec Framework I'd Never Heard Of Changed the Way I Write Code


I didn’t know what spec-driven development was three weeks ago. I’d never written a spec.md before writing code. I’d never asked an API “what fields do you actually have?” before designing an integration. I’d never written 1,249 lines of documentation for a project, nearly as many lines as the code itself.

I’m a network engineer. When I need a thing, I build the thing. I open a file, I write Python, I hit the API, I iterate until it works. That’s how every project I’ve ever done has started — including the ten phases of Convergence that I eventually tore apart and rebuilt inside NetClaw.

NetClaw has a different approach. The repo ships with a spec framework — templates in .specify/templates/ and a specs/ directory where the actual spec documents live. Each feature gets a numbered subdirectory with a structured set of files: spec, research, data model, contracts, plan, tasks, quickstart, checklist. Eight files per feature, written before implementation starts. Twenty-six MCP servers in the repo had been built this way before I touched anything.

I thought it was overkill. Then I used it to build the Nautobot MCP server v2, and it caught three API-breaking mistakes before I wrote a single line of server code. This is what happened.


What Was Already There

There was already a Nautobot MCP server in the repo. The community mcp-nautobot project by aiopnet was sitting at mcp-servers/mcp-nautobot/, wired into config/openclaw.json, and working. It had five tools: get_ip_addresses, get_prefixes, get_ip_address_by_id, search_ip_addresses, and test_connection. All read-only. All REST API. All IPAM — IP addresses and prefixes, nothing else.

It was well-engineered code. Async httpx client, Pydantic models, rate limiting, proper error hierarchy. 1,237 lines of Python across two files. For what it did — query Nautobot’s IPAM — it worked fine.

But I needed more. My Nautobot 3.1.0 instance has three devices, 34 VLANs, 63 interfaces per switch, cables between the switches, and five plugins (Golden Config, Firewall Models, BGP Models, IGP Models, SSoT). I needed the agent to see all of that. I needed it to write back to Nautobot when the source of truth drifted from reality. I needed it to compare what pyATS sees on the live switches against what Nautobot says should be there.

The community server couldn’t do any of that. And the gap wasn’t a few missing tools — it was architectural. REST-only vs GraphQL, read-only vs writes, IPAM-only vs full DCIM plus plugins. Patching it would have meant rewriting it. So I built v2 alongside it.


The Spec Framework: What It Is and Why I Was Skeptical

NetClaw’s specs/ directory had 26 numbered subdirectories when I started. Each one followed the same structure:

specs/027-nautobot-mcp-v2/
├── spec.md              # What to build and why
├── research.md          # What the API actually looks like (not what the docs say)
├── data-model.md        # Entities, field names, API patterns
├── contracts/
│   └── mcp-tools.md     # Every tool signature — parameters, returns, examples
├── plan.md              # Architecture decisions, project structure
├── quickstart.md        # How to install and test
├── tasks.md             # Ordered implementation tasks with dependencies
└── checklists/
    └── requirements.md  # Quality gate before implementation starts

The templates live in .specify/templates/. They’re opinionated. The spec template demands user stories with priorities, acceptance scenarios in Given/When/Then format, numbered functional requirements, measurable success criteria, and explicit assumptions. The plan template has a “Constitution Check” — a table where you verify the feature against NetClaw’s operating principles (safety-first, ITSM-gated changes, audit trail, credential safety, backwards compatibility).
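
For a feel of the format, an acceptance scenario for a simple read tool might look like this (my phrasing, not a line from an actual spec):

Given a device named HomeSwitch01 exists in Nautobot,
When the agent calls nautobot_get_devices with name="HomeSwitch01",
Then the response includes the device's role, platform, location, status, and primary IP.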

My first reaction was: this is a lot of writing for a Python script that talks to an API.

My second reaction, after hitting three different GraphQL field name mismatches during the research phase: this is why the other 26 servers work.


Research: The Part That Changed Everything

The spec framework says: before you design anything, go poke the actual API and write down what you find. Not what the docs say. What the live instance returns.

I’d never done this as a formal step. I’d always just started coding and fixed errors as they came up. The research phase made me do it systematically, and it caught three things that would have wasted hours:

Discovery 1: No GraphQL Mutations

The plan was “use GraphQL for everything.” I ran an introspection query:

curl -sk -X POST \
  -H "Authorization: Token $NAUTOBOT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ __schema { mutationType { fields { name } } } }"}' \
  "$NAUTOBOT_URL/api/graphql/"
{
  "data": {
    "__schema": {
      "mutationType": null
    }
  }
}

mutationType: null. Nautobot 3.1.0 uses graphene_django 3.2.3, which exposes GraphQL queries but no mutations. The entire write strategy had to change before I’d written a single tool.

I checked the REST API with an OPTIONS request:

curl -sk -X OPTIONS \
  -H "Authorization: Token $NAUTOBOT_TOKEN" \
  "$NAUTOBOT_URL/api/dcim/interfaces/"
{
  "actions": {
    "PUT": { ... },
    "POST": { ... }
  }
}

REST supports writes. GraphQL doesn’t. The architecture became: GraphQL for reads, REST for writes. If I’d started coding with the “GraphQL for everything” assumption, I would have built half the server before discovering writes don’t work.

Discovery 2: interface_assignments Not ip_address_assignments

The Nautobot docs and the community server reference ip_address_assignments. My instance uses interface_assignments:

# Nautobot 3.1.0:
{ ip_addresses { interface_assignments { interface { name device { name } } } } }

Every one of these went into research.md as a numbered decision. When I started writing server.py, there were no surprises left. That had never happened to me before on a project like this.


Why Not Just Patch the Existing Server?

The community server was built around Pydantic models for IPAddress and Prefix — two specific object types with hand-defined fields. Adding devices would mean a new model, a new REST endpoint method, a new tool definition, new filter parameters. Then interfaces. Then VLANs. Then cables. Each one is a new model, new method, new tool. By the time you’ve added six object types, you’ve rewritten the server anyway.

More fundamentally, the v1 server uses REST for everything. Nautobot’s REST API returns deeply nested JSON with full object representations for every related field. Ask for an interface and you get the entire device object, the entire status object, and every related VLAN object — whether you wanted them or not.

Nautobot’s GraphQL endpoint lets you ask for exactly what you need:

{
  interfaces(device: "HomeSwitch01", limit: 5) {
    name type enabled mtu mode
    untagged_vlan { vid name }
    ip_addresses { address }
    connected_interface { name device { name } }
  }
}

One HTTP request. The REST equivalent would be: fetch interfaces (paginated), then for each interface fetch the VLAN, fetch the IPs, fetch the cable, resolve the cable’s far-end interface, fetch that interface’s device. Five to ten requests per interface.
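
To make the comparison concrete, here is a minimal sketch of the single-request read path using httpx. It assumes NAUTOBOT_URL and NAUTOBOT_TOKEN environment variables and is illustrative, not the repo's actual client:

import os
import httpx

# The exact query from above, sent as one POST to the GraphQL endpoint.
INTERFACES_QUERY = """{
  interfaces(device: "HomeSwitch01", limit: 5) {
    name type enabled mtu mode
    untagged_vlan { vid name }
    ip_addresses { address }
    connected_interface { name device { name } }
  }
}"""

async def fetch_interfaces() -> dict:
    async with httpx.AsyncClient(verify=False) as http:
        resp = await http.post(
            f"{os.environ['NAUTOBOT_URL']}/api/graphql/",
            headers={"Authorization": f"Token {os.environ['NAUTOBOT_TOKEN']}"},
            json={"query": INTERFACES_QUERY},
        )
        resp.raise_for_status()
        # Interfaces, VLANs, IPs, and far-end interfaces in one round trip.
        return resp.json()["data"]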

So I needed GraphQL for reads, REST for writes, plugin support, reconciliation, and ITSM gating. The community server was a starting point for understanding the problem, but the solution was a different server.


The Architecture: Three Files, 30 Tools

The v2 server is three Python files. The spec framework’s contracts/mcp-tools.md defined every tool signature before I wrote any of them, so implementation was just filling in the bodies.

nautobot_client.py (207 lines) — The API client. The key design decision is ID resolution. Nautobot’s REST API requires UUIDs for related objects, but humans (and LLMs) think in names. The client resolves names to UUIDs automatically:

async def resolve_id(self, object_type: str, name: str) -> str:
    """Resolve a human-readable name to a Nautobot UUID."""
    cache_key = f"{object_type}:{name}"
    if cache_key in _id_cache:
        return _id_cache[cache_key]

    query_map = {
        "status": '{{ statuses(name: "{}") {{ id }} }}'.format(_esc(name)),
        "device": '{{ devices(name: "{}") {{ id }} }}'.format(_esc(name)),
        "location": '{{ locations(name: "{}") {{ id }} }}'.format(_esc(name)),
        # ... more types
    }
    data = await self.graphql(query_map[object_type])
    items = _first_list(data)
    if not items:
        raise NautobotError(f"{object_type} '{name}' not found in Nautobot.")
    uid = items[0]["id"]
    _id_cache[cache_key] = uid
    return uid

When the agent says “create VLAN 200 with status Active at location House,” the client resolves “Active” and “House” to UUIDs before hitting the REST API. The agent never sees a UUID.
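
The excerpt leans on a few pieces that aren't shown. Here is my sketch of what they plausibly look like (the repo's actual code will differ in details): _esc escapes GraphQL string literals, _first_list pulls the result list out of a response, and the client exposes the two transport methods every tool calls.

import os
import httpx

class NautobotError(Exception):
    """Raised for GraphQL errors and REST 4xx/5xx responses."""

_id_cache: dict[str, str] = {}

def _esc(value: str) -> str:
    # Escape backslashes and double quotes for a GraphQL string literal.
    return value.replace("\\", "\\\\").replace('"', '\\"')

def _first_list(data: dict) -> list:
    # GraphQL keys results by query name; grab the first list-valued entry.
    return next((v for v in data.values() if isinstance(v, list)), [])

class NautobotClient:
    def __init__(self) -> None:
        self.base = os.environ["NAUTOBOT_URL"].rstrip("/")
        self.http = httpx.AsyncClient(
            headers={"Authorization": f"Token {os.environ['NAUTOBOT_TOKEN']}"},
            verify=False,
        )

    async def graphql(self, query: str) -> dict:
        resp = await self.http.post(f"{self.base}/api/graphql/", json={"query": query})
        resp.raise_for_status()
        body = resp.json()
        if body.get("errors"):
            raise NautobotError(str(body["errors"]))
        return body["data"]

    async def rest_post(self, endpoint: str, payload: dict) -> dict:
        resp = await self.http.post(f"{self.base}/api/{endpoint}/", json=payload)
        if resp.status_code >= 400:
            raise NautobotError(f"REST POST {endpoint} failed: {resp.status_code} {resp.text}")
        return resp.json()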

server.py (1,153 lines) — All 30 tool definitions. Every read tool follows the same pattern — build a GraphQL query with filters, call the client, return JSON:

@mcp.tool()
async def nautobot_get_devices(
    name: Optional[str] = None,
    location: Optional[str] = None,
    role: Optional[str] = None,
    platform: Optional[str] = None,
    status: Optional[str] = None,
    q: Optional[str] = None,
    limit: int = 50,
    offset: int = 0,
) -> str:
    """Query devices from Nautobot. Returns name, role, platform, location, status, primary IP, serial."""
    filt = _gql_filters(
        name=name, location=location, role=role, platform=platform,
        status=status, q=q, limit=limit, offset=offset,
    )
    query = f"""{{
  devices{filt} {{
    name serial status {{ name }} role {{ name }} platform {{ name }}
    location {{ name }} device_type {{ model manufacturer {{ name }} }}
    primary_ip4 {{ address }} primary_ip6 {{ address }} comments
  }}
}}"""
    data = await client.graphql(query)
    devices = data.get("devices", [])
    return json.dumps({"count": len(devices), "devices": devices}, indent=2)

A helper builds the GraphQL filter string from keyword args so every tool gets filtering and pagination for free:

def _gql_filters(**kwargs) -> str:
    parts = []
    for k, v in kwargs.items():
        if v is None:
            continue
        if isinstance(v, bool):
            parts.append(f"{k}: {'true' if v else 'false'}")
        elif isinstance(v, int):
            parts.append(f"{k}: {v}")
        else:
            parts.append(f'{k}: "{_esc(str(v))}"')
    return f"({', '.join(parts)})" if parts else ""
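
The bool branch matters because True is an int in Python, so it has to be checked first. A quick example of what the helper emits:

# Assuming the helper above; None is dropped, bools and ints go unquoted.
print(_gql_filters(name="HomeSwitch01", status=None, enabled=True, limit=5))
# -> (name: "HomeSwitch01", enabled: true, limit: 5)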

Adding a new read tool is ~20 lines. Define the parameters, write the GraphQL query, call the client. The pattern is so consistent that the plugin tools — Golden Config, Firewall, BGP, OSPF — were each written in minutes.

Write tools follow a different pattern — ITSM gating first, then ID resolution, then REST POST:

@mcp.tool()
async def nautobot_create_vlan(
    vid: int,
    name: str,
    status: str = "Active",
    location: Optional[str] = None,
    cr_number: Optional[str] = None,
) -> str:
    """Create a VLAN in Nautobot. ITSM-gated."""
    blocked = _check_itsm(cr_number)
    if blocked:
        return json.dumps({"error": blocked})

    status_id = await client.resolve_id("status", status)
    payload = {"vid": vid, "name": name, "status": status_id}
    result = await client.rest_post("ipam/vlans", payload)

    if location:
        loc_id = await client.resolve_id("location", location)
        await client.rest_post(
            "ipam/vlan-location-assignments",
            {"vlan": result["id"], "location": loc_id},
        )

    return json.dumps({"created": True, "vlan": result}, indent=2)

Every write tool checks ITSM first. ITSM_LAB_MODE=true bypasses the gate for home lab use. In production, every write requires a ServiceNow Change Request number.
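
The gate itself isn't shown above. A minimal sketch of what _check_itsm could look like (the CHG number format is an assumption, and the repo's version may differ):

import os
import re

def _check_itsm(cr_number: str | None) -> str | None:
    """Return a block reason if the write isn't authorized, else None."""
    if os.environ.get("ITSM_LAB_MODE", "").lower() == "true":
        return None  # lab mode: writes allowed without a change request
    if not cr_number:
        return "Write blocked: a ServiceNow Change Request number is required."
    if not re.fullmatch(r"CHG\d{7}", cr_number):  # assumed CR format
        return f"Write blocked: '{cr_number}' is not a valid CR number."
    return None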

reconcile.py (103 lines) — The diff engine. The agent calls pyATS MCP to get live interfaces, then passes that data to nautobot_reconcile along with the device name. The tool queries Nautobot internally and returns a structured diff:

def reconcile_interfaces(nautobot_interfaces, live_interfaces):
    nb_map = {_norm(i["name"]): i for i in nautobot_interfaces}
    live_map = {_norm(i["name"]): i for i in live_interfaces}

    both = set(nb_map) & set(live_map)
    matches, mismatches = [], []
    for name in sorted(both):
        diffs = _compare(nb_map[name], live_map[name])
        if diffs:
            mismatches.append({"name": nb_map[name]["name"], "differences": diffs})
        else:
            matches.append({"name": nb_map[name]["name"]})

    return {
        "summary": { "matches": len(matches), "mismatches": len(mismatches), ... },
        "matches": matches,
        "mismatches": mismatches,
        "nautobot_only": [...],
        "device_only": [...],
    }
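
The _norm and _compare helpers are where the reconciliation judgment lives. Sketches of what they might do (the repo's versions will differ; the abbreviation map and compared fields are assumptions):

# Expand common IOS abbreviations so "Gi1/0/1" matches "GigabitEthernet1/0/1".
_ABBREV = {"Gi": "GigabitEthernet", "Te": "TenGigabitEthernet", "Po": "Port-channel"}

def _norm(name: str) -> str:
    for short, full in _ABBREV.items():
        if name.startswith(short) and not name.startswith(full):
            return full + name[len(short):]
    return name

def _compare(nb: dict, live: dict, fields=("enabled", "mtu", "description")) -> list:
    # Per-field diffs between the source of truth and the live device.
    return [
        {"field": f, "nautobot": nb.get(f), "device": live.get(f)}
        for f in fields
        if nb.get(f) != live.get(f)
    ]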

Plugin Tools: Why They’re in the Same Server

My Nautobot instance has five plugins: Golden Config, Firewall Models, BGP Models, IGP Models, and SSoT. Each adds its own GraphQL types and REST endpoints. I asked whether these should be separate MCP servers.

The answer is no, and it’s obvious once you think about it:

  1. Same API, same auth, same connection. Five separate servers would mean five httpx clients hitting the same Nautobot instance with the same token.

  2. The client infrastructure already exists. Adding nautobot_get_firewall_policies was 25 lines — a new GraphQL query string and a @mcp.tool() decorator. The client, error handling, and ITSM gating are shared.

  3. The agent thinks in terms of “Nautobot,” not “Nautobot Golden Config Plugin.” The tool boundary should match the mental model.

The Golden Config plugin got 11 tools because it has the most setup surface — the agent needs to create compliance features, compliance rules, git repositories, GraphQL queries, and wire the Golden Config Setting. These tools exist so the agent can bootstrap the entire Golden Config plugin from scratch, which is what spec 028 describes.
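
From the agent's side, bootstrapping one compliance feature is then a single gated call, something like this (the parameter shape is my guess; contracts/mcp-tools.md defines the real signature):

# Hypothetical invocation shape; the tool exists, the parameters are assumed.
await nautobot_create_compliance_feature(
    name="ntp",
    description="NTP server configuration (RFC 5905)",
)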


Introspection-Driven Development

Even with the research phase, the plugin tools hit field name mismatches during implementation. The Nautobot docs said protocol. The live schema said ip_protocol:

# Docs suggest:
{ service_objects { name port protocol } }
# Error: Cannot query field 'protocol'. Did you mean 'ip_protocol'?

The OSPF plugin’s configuration type doesn’t have a routing_instance field — it has instance, and it’s a nested object:

# First attempt:
{ ospf_configurations { routing_instance { device { name } } } }
# Error: Cannot query field 'routing_instance'

# Second attempt:
{ ospf_configurations { instance process_id } }
# Error: Field 'instance' must have a selection of subfields

# Correct:
{ ospf_configurations { instance { device { name } } process_id } }

Every one of these was caught by running the actual query against the live instance. This is why the spec framework insists on research.md. It’s not a formality — it’s the difference between a server that works and a server that throws 400 errors on every third tool call.
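
You can shorten that loop by asking the schema before writing the query. A small sketch reusing the client from above (the type name in the usage comment is a guess):

async def list_fields(type_name: str) -> list[str]:
    """Ask the live schema which fields a GraphQL type actually exposes."""
    query = '{ __type(name: "%s") { fields { name } } }' % type_name
    data = await client.graphql(query)
    type_info = data.get("__type") or {}
    return [f["name"] for f in type_info.get("fields") or []]

# e.g. print(await list_fields("ServiceObjectType"))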


Testing Against Reality

The final integration test hit all 30 tools against the live Nautobot instance:

NAUTOBOT MCP V2 — FULL INTEGRATION TEST
============================================================
  [OK] tool_count: PASS (30 tools)
  [OK] connection: PASS
  [OK] get_devices: PASS (3 devices)
  [OK] get_interfaces: PASS (3 ifaces)
  [OK] get_vlans: PASS (5 vlans)
  [OK] get_prefixes: PASS (2 prefixes)
  [OK] get_ip_addresses: PASS (3 ips)
  [OK] get_cables: PASS (2 cables)
  [OK] graphql: PASS
  [OK] reconcile: PASS
  [OK] itsm_lab_mode: PASS (lab mode, writes allowed)
  [OK] update_object: PASS
  [OK] get_golden_configs: PASS (0 configs)
  [OK] get_config_compliance: PASS (0 records)
  [OK] get_compliance_rules: PASS (0 features, 0 rules)
  [OK] get_gc_settings: PASS (1 settings)
  [OK] get_git_repos: PASS (1 repos)
  [OK] get_graphql_queries: PASS (0 queries)
  [OK] get_fw_policies: PASS (0 policies)
  [OK] get_fw_zones: PASS (0 zones)
  [OK] get_nat_policies: PASS (0 policies)
  [OK] get_bgp_routing: PASS (0 instances)
  [OK] get_asns: PASS (0 ASNs)
  [OK] get_ospf_routing: PASS
  [OK] create_compliance_feature: PASS
============================================================

The plugin tools return 0 records because the plugins are installed but unpopulated. That’s correct — the tools work, the data isn’t there yet. That’s what spec 028 (Golden Config Bootstrap) is for.

The update_object test actually wrote to the live Nautobot instance — set a comment on HomeSwitch01, verified the change, reverted it. Real write, real verification, real cleanup.
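
In test form, that pattern looks roughly like this. The update tool's exact signature is my guess; only the tool's existence and the write-verify-revert behavior are confirmed:

# Hypothetical shape of the write -> verify -> revert test.
before = json.loads(await nautobot_get_devices(name="HomeSwitch01"))
old_comment = before["devices"][0]["comments"]

await nautobot_update_object("dcim/devices", "HomeSwitch01", {"comments": "mcp-v2 test"})
after = json.loads(await nautobot_get_devices(name="HomeSwitch01"))
assert after["devices"][0]["comments"] == "mcp-v2 test"

# Revert, leaving the SoT exactly as we found it.
await nautobot_update_object("dcim/devices", "HomeSwitch01", {"comments": old_comment})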


What Spec 028 Does With All This

The plugin tools aren’t the end goal. They’re the API surface that enables the real workflow: bootstrapping Golden Config from live device state.

You have two Cisco 3850 switches with running configs. Those configs represent your actual network standards — NTP servers, AAA configuration, logging destinations, SNMP communities, VTY access lists. The Golden Config plugin needs Jinja templates that represent the intended version of those configs, compliance rules that define what to check, and a SoT aggregation query that feeds device data into the templates.

Setting that up manually is a full day of work. Spec 028 describes a workflow where the agent does it: collect running configs via pyATS, analyze them into compliance features, reference RFCs for best practices (RFC 5905 for NTP, RFC 5424 for syslog), generate Jinja templates with SoT variables, create the GitHub repo via GitHub MCP, register everything in Nautobot via the tools we just built, and run the first compliance check.

The agent has the tools. The tools have the API access. The spec defines the orchestration. The human reviews and approves at each step.


What I Learned

I came into this not knowing what the spec framework was. I left thinking it’s the reason the other 26 MCP servers in NetClaw actually work.

Here’s what changed my mind:

The research phase is not optional. I would have started building the entire server against GraphQL mutations that don’t exist. I would have used ip_address_assignments instead of interface_assignments. Two bugs, caught before line 1 of server code, because the spec framework made me hit the API first and write down what I found.

Tool contracts are LLM documentation. The contracts/mcp-tools.md file defines every parameter, return type, and example invocation. Those descriptions become the tool docstrings that the LLM reads when deciding which tool to call. If the contract is vague, the LLM calls the tool wrong. Writing the contract first forces you to think about the tool from the LLM’s perspective.

The task list prevents scope creep. tasks.md groups work by user story with explicit checkpoints. Phase 3 is the MVP — devices and interfaces. If I’d stopped there, I’d still have a useful server. The task list made it clear that I could ship incrementally instead of building all 30 tools before testing anything.

1,249 lines of spec, 1,463 lines of code. Nearly 1:1. That ratio felt wrong at first — why write as much documentation as code? Because the documentation is the design. When I sat down to write server.py, there were no decisions left to make. Every field name was verified. Every API pattern was documented. Every edge case was identified. I just typed.

If you’re building MCP servers for your network tools, try the spec framework. Not because it’s the “right” way to do it. Because it catches the mistakes you’d otherwise find at 2am when the agent calls a tool that returns a 400 error and you can’t figure out why.


The Numbers

|  | v1 (community) | v2 |
|---|---|---|
| Tools | 5 | 30 |
| API | REST only | GraphQL reads + REST writes |
| Scope | IP addresses, prefixes | Devices, interfaces, VLANs, prefixes, IPs, cables, golden config, firewall, BGP, OSPF |
| Write operations | None | 8 (ITSM-gated) |
| Plugin support | None | Golden Config (11 tools), Firewall (3), BGP (2), IGP (1) |
| Reconciliation | None | Live-vs-SoT interface diff |
| Raw GraphQL | None | Arbitrary query tool for any data model |
| Nautobot version | 2.x assumed | 3.1.0 verified |
| Lines of code | 1,237 | 1,463 |
| Lines of spec | 0 | 1,249 |

More tools, broader scope, write operations, plugin coverage, reconciliation, and a raw GraphQL escape hatch — in roughly the same amount of code. The spec is what made that possible.


The nautobot-mcp-v2 server lives at mcp-servers/nautobot-mcp-v2/ in the byrn-baker/netclaw-convergence repo. The original community server is preserved at mcp-servers/mcp-nautobot/ for reference. The spec is at specs/027-nautobot-mcp-v2/. The Golden Config bootstrap workflow spec is at specs/028-golden-config-bootstrap/. All 30 tools are tested against a live Nautobot 3.1.0 instance.

The Convergence platform lives at byrn-baker/netclaw-convergence on the main branch.

Need a real lab environment?

I run a small KVM-based lab VPS platform designed for Containerlab and EVE-NG workloads — without cloud pricing nonsense.

Visit localedgedatacenter.com →