Building Convergence – A Journey from Network Observability to AI-Driven Automation Part 8: Breaking the Anthropic Dependency: How We Made Our AI NOC Team Provider-Agnostic

Phase 7 gave us a team of AI agents that could monitor a home network better than any dashboard. Five specialists — NOC officer, network engineer, security expert, NAS engineer, interface reconciler — each with their own tools, their own system prompts encoding real domain expertise, and their own agentic loops querying VictoriaMetrics, Loki, Nautobot, and pfSense every five minutes.

There was one problem: every single agent had import anthropic at the top of the file.


The Anthropic Problem

It wasn’t just an import. Every agent had its own copy of the same 40-line while-True loop: create an AsyncAnthropic client, call messages.create() with tools, check stop_reason, handle tool_use blocks, build tool_result messages, repeat. Six agents plus the supervisor meant seven copies of this loop, all hardwired to Anthropic’s message format — content blocks, tool_use types, block.id, the whole wire protocol.
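For reference, here is roughly what that duplicated loop looked like: a condensed sketch reconstructed from the description above, using the standard anthropic SDK (the model name and handler signature are illustrative):

```python
import anthropic

async def poll_cycle(system: str, tools: list[dict], user_message: str, handle_tool_call):
    client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = await client.messages.create(
            model="claude-sonnet-4-20250514",  # illustrative model name
            max_tokens=4096,
            system=system,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response  # end_turn: the model is finished
        # Echo the assistant turn back, then answer every tool_use block.
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                output = await handle_tool_call(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": tool_results})
```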

This created three real problems:

No fallback. When the Anthropic API key ran out of credits at 6 PM on a Friday, every agent returned 0 findings. The entire NOC team went blind. Not degraded — blind. There was no way to say “if Anthropic fails, try something else.”

No flexibility. We had a perfectly good Ollama instance sitting on the LAN with a 5070 GPU. The threat-intel and automation-agent services already supported Ollama (Phase 6 added that). But the net-ops-team — the service that makes the most LLM calls, with the most complex multi-turn tool-use conversations — was Anthropic-only.

Vendor lock-in in the worst place. The tool-use wire format is the stickiest part of any LLM integration. Anthropic uses content blocks with type: "tool_use" and type: "tool_result". OpenAI uses tool_calls arrays with function objects and separate role: "tool" messages. Ollama speaks OpenAI’s format through its /v1/chat/completions shim. Switching providers meant rewriting every agent’s conversation loop — not just swapping an API key.
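To make the lock-in concrete, here is the same tool definition in both wire formats; the tool itself is just an example:

```python
# Anthropic style:
anthropic_tool = {
    "name": "query_netflow",
    "description": "Query NetFlow records for an IP address.",
    "input_schema": {
        "type": "object",
        "properties": {"ip": {"type": "string"}},
        "required": ["ip"],
    },
}

# OpenAI / Ollama (/v1/chat/completions) style:
openai_tool = {
    "type": "function",
    "function": {
        "name": "query_netflow",
        "description": "Query NetFlow records for an IP address.",
        "parameters": {
            "type": "object",
            "properties": {"ip": {"type": "string"}},
            "required": ["ip"],
        },
    },
}
```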


The Unified LLM Client

The fix was a single module — llm_client.py — that sits between the agents and whatever LLM backend is configured. The agents define their tools in Anthropic’s schema format (because that’s what they were already using) and call two functions:

```python
# For the poll cycle — runs the full tool-use loop
findings = await run_agentic_loop(
    system=_SYSTEM_PROMPT,
    tools=_TOOLS,
    user_message="Run your analysis.",
    handle_tool_call=_handle_tool_call,
    findings=[],
    caller="network_engineer",
)

# For Discord questions — same loop, returns text
answer = await run_agentic_question(
    system=_SYSTEM_PROMPT,
    tools=_TOOLS,
    question=question,
    handle_tool_call=_handle_tool_call,
    caller="network_engineer",
)
```

That’s it. The agent doesn’t know or care whether it’s talking to Anthropic, a local Ollama instance, Ollama Cloud, OpenRouter, or a vLLM deployment. The client handles:

  • Schema translation. Anthropic tool definitions (name, description, input_schema) get converted to OpenAI function-calling format (type: "function", function.parameters) when talking to non-Anthropic providers. Multi-turn conversations with tool results get translated bidirectionally. A sketch of the translation and fallback logic follows this list.

  • Automatic fallback. If the primary provider fails — rate limit, auth error, connection refused — the client tries the next available provider in the chain. Ollama → Anthropic → OpenAI, or whatever order you configure. Every failure is logged with the provider name and error so you can see exactly what happened.

  • The agentic loop itself. The while-True pattern that was duplicated seven times is now in one place. Build assistant content blocks, execute tool calls, build tool results, repeat until end_turn. One implementation, tested once, used everywhere.
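Here is a minimal sketch of how the translation and fallback pieces might fit together (the provider objects and function names are illustrative, not the actual llm_client.py API):

```python
import logging

def to_openai_tools(anthropic_tools: list[dict]) -> list[dict]:
    """Translate Anthropic tool definitions into OpenAI function-calling format."""
    return [
        {
            "type": "function",
            "function": {
                "name": tool["name"],
                "description": tool.get("description", ""),
                "parameters": tool["input_schema"],  # same JSON Schema, different field name
            },
        }
        for tool in anthropic_tools
    ]

async def call_with_fallback(providers: list, **kwargs):
    """Try each configured provider in order; log each failure and move on."""
    last_error = None
    for provider in providers:  # e.g. [ollama, anthropic, openai]
        try:
            return await provider.chat(**kwargs)
        except Exception as exc:  # rate limit, auth error, connection refused
            logging.warning("provider %s failed: %s", provider.name, exc)
            last_error = exc
    raise RuntimeError("all configured LLM providers failed") from last_error
```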

The result: 475 lines of duplicated code removed, 280 lines of shared abstraction added. Every agent file got simpler. The security expert went from 100 lines of Anthropic-specific loop code to a 5-line function call.


Ollama Cloud: The GPU Problem Solved

We had a 12GB RTX 5070 running Ollama locally. The best model that fit was qwen3.5:9b — a capable model, but when five agents hit it concurrently with complex multi-turn tool-calling conversations, the results were… garbage.

The 9b model completed a poll cycle with 7 findings and 0 escalations. The NOC officer said “all devices healthy.” The security expert recommended checking pfBlockerNG. Generic, surface-level stuff. It missed that SNMP monitoring was actually broken on both switches.

Ollama has a cloud offering. Same API, same local Ollama instance, but models with a -cloud suffix get offloaded to Ollama’s infrastructure. You pull the model like any other — ollama pull qwen3-coder:480b-cloud — and your local Ollama transparently routes inference to their servers.
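In practice that means the agents keep talking to localhost. A minimal sketch using Ollama's OpenAI-compatible endpoint (the prompt here is illustrative):

```python
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(
        base_url="http://localhost:11434/v1",  # local Ollama's OpenAI-compatible shim
        api_key="ollama",  # the SDK requires a key; Ollama ignores the value
    )
    response = await client.chat.completions.create(
        model="qwen3-coder:480b-cloud",  # -cloud suffix: inference runs on Ollama's servers
        messages=[{"role": "user", "content": "Run your analysis."}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```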

The qwen3-coder:480b model on the same poll cycle: 12 findings, 8 escalations. It found the SNMP failures. It found that the threat-intel service was returning 503s. It found that the automation agent’s API path was wrong (404). It found that SSH authentication to both switches was failing because the credentials weren’t configured. It identified the pattern — multiple monitoring systems being down simultaneously — as itself a critical finding.

Same agents, same tools, same system prompts. The only difference was the model behind the API. The 480b didn’t just find more things — it found the right things, and it understood why they mattered together.


The Bugs the Big Model Found

This is the part that surprised us. The 480b model didn’t just produce better analysis — it exposed real bugs in the code that the 9b had been silently working around.

Wrong API paths. The security expert was calling /api/v1/threats/summary on the threat-intel service and /api/v1/actions/pending on the automation agent. Neither endpoint exists. The correct paths are /api/report and /api/automation/pending. The 9b model got 404 errors from these tool calls and just… moved on. Reported generic findings without the data it was supposed to have. The 480b model recognized the 404 as a critical infrastructure failure and flagged it.

Missing SSH credentials. The .env file had NETWORK_USERNAME=cisco and NETWORK_PASSWORD=cisco, but the switch SSH tool reads from SWITCH_SSH_USER and SWITCH_SSH_PASS — which were empty. The interface reconciler was silently failing on every SSH connection. The 9b model reported this as a generic warning. The 480b model reported it as CRITICAL with the exact error message and remediation steps.

Missing services. When we were iterating on the LLM changes, we’d been running docker compose up -d net-ops-team threat-intel automation-agent — just the three services we were modifying. Grafana, the OTEL collector, Promtail, Alertmanager, and NetClaw were all down. The 480b model noticed. The 9b model didn’t.

After fixing all the bugs the 480b found, the next poll cycle came back clean: 3 INFO findings, 0 escalations. NOC officer: all devices operational. NAS engineer: both Synology devices healthy (40°C, all disks normal, RAID volumes intact). That’s the signal you want — not “I didn’t find anything because I couldn’t reach the data sources,” but “I checked everything and it’s fine.”


Credentials and the Cloud

When you’re sending prompts to a cloud-hosted model, you need to think about what’s in those prompts. Our agents use tools that query internal infrastructure — VictoriaMetrics, Loki, Nautobot, pfSense, Cisco switches via SSH. The tool code uses credentials to authenticate. The tool results get sent back to the LLM as context for the next turn.

The question is: can those results contain credentials?

The answer is yes, in edge cases. An HTTP error from Nautobot might include the request headers (which contain Authorization: Token <nautobot_token>). A pfSense XML-RPC failure message says “check PFSENSE_XMLRPC_PASS.” A netmiko SSH exception might reference the connection parameters.

We added a sanitizer that runs on every tool result before it enters the LLM context. It catches API tokens, Anthropic keys, Discord bot tokens, long hex strings (like Nautobot tokens), and credential environment variable names. It does not scrub IP addresses, MAC addresses, or interface names — the model needs those to do its job.
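A minimal sketch of what such a sanitizer can look like; the patterns below are illustrative stand-ins for the categories listed, not the exact regexes we ship:

```python
import re

# Illustrative patterns for the categories above.
_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}"),              # Anthropic API keys
    re.compile(r"[A-Za-z0-9_-]{23,}\.[A-Za-z0-9_-]{6,}\.[A-Za-z0-9_-]{27,}"),  # Discord bot tokens (approximate shape)
    re.compile(r"Token [0-9a-f]{40}"),                     # Nautobot-style token headers
    re.compile(r"\b[0-9a-f]{32,}\b"),                      # long hex strings
    re.compile(r"\w*(PASS|TOKEN|SECRET|API_KEY)\w*=\S+"),  # credential env var assignments
]

def sanitize(tool_result: str) -> str:
    """Scrub credential-shaped strings before a tool result enters LLM context.
    IPs, MACs, and interface names pass through untouched."""
    for pattern in _PATTERNS:
        tool_result = pattern.sub("[REDACTED]", tool_result)
    return tool_result
```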

We considered going further — anonymizing internal IPs, replacing device names with tokens, summarizing raw data instead of forwarding it. We decided against it. The security expert needs to correlate 192.168.100.23 in a NetFlow record with the same IP in the DHCP lease table and the Nautobot inventory. Anonymizing that breaks the reasoning chain. The interface reconciler literally writes VLAN10 | mylaptop | 192.168.3.42 to switch port descriptions — you can’t anonymize the data it needs to produce.

The practical security posture: credentials never touch the LLM. Network topology does, because the model can’t do its job without it. For a home/SOHO network with RFC 1918 addresses, that’s an acceptable tradeoff. For a regulated enterprise, you’d want the sensitivity-aware router that forces high-sensitivity agents to a local model — but you’d also want a GPU that can run a model worth routing to.


The NetClaw Submodule

A smaller cleanup that was overdue: NetClaw was a full git clone sitting inside the project directory, gitignored, with its own .git/ history. 17MB of someone else’s repo, not tracked, not reproducible. Anyone cloning Convergence wouldn’t get it.

We converted it to a proper git submodule. git clone --recurse-submodules now pulls everything. git submodule update --remote netclaw pulls upstream changes when you want them. The pinned commit is tracked in the repo, so builds are reproducible. The Dockerfile lives at docker/netclaw.Dockerfile — in our project, not in the submodule we don’t control.


Audit Logging

Every LLM call now gets an audit log entry:

```
caller=security_expert provider=ollama model=qwen3-coder:480b-cloud cloud=True
prompt_chars=12847 tools_called=['get_threat_intel', 'query_netflow'] latency_ms=3420 stop=tool_use
```

This tells you: which agent called, which provider handled it, whether data left the machine, how much data was in the prompt, what tools the model invoked, and how long it took. When you’re running five agents concurrently against a cloud model, this visibility matters.

The cloud field is the interesting one. It’s True for Anthropic, OpenAI, and any Ollama model with -cloud in the name. It’s False for local Ollama models. If you ever need to audit what data went where, this is the trail.
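A sketch of the emitter behind those entries, applying the cloud rule described above (the field names match the log format; the function itself is illustrative):

```python
import logging

def audit_log(caller: str, provider: str, model: str, prompt_chars: int,
              tools_called: list[str], latency_ms: int, stop: str) -> None:
    """Emit one line per LLM call: who asked, who answered, what left the machine."""
    # Anthropic and OpenAI are always cloud; Ollama is cloud only when the
    # model name carries the -cloud suffix.
    cloud = provider in ("anthropic", "openai") or "-cloud" in model
    logging.info(
        "caller=%s provider=%s model=%s cloud=%s prompt_chars=%d "
        "tools_called=%s latency_ms=%d stop=%s",
        caller, provider, model, cloud, prompt_chars, tools_called,
        latency_ms, stop,
    )
```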


What’s Different Now

The net-ops-team service went from 7 files with import anthropic to 0. Every agent talks through llm_client.py. The provider is a runtime configuration choice, not a code dependency.

The fallback chain means the NOC team doesn’t go blind when one provider fails. Ollama Cloud gives us access to 480B-parameter models without needing the GPU to run them. The credential sanitizer means API keys and passwords never reach the LLM context. The audit log means we can see exactly what’s happening.

The .env file is chmod 600 now. Small thing. Free thing. Should have been that way from the start.

And the agents are producing genuinely useful output. Not “check your pfBlockerNG settings” — but “your SNMP monitoring is broken on both switches, your threat intel service is returning 503s, your automation agent API paths are wrong, and SSH authentication is failing because the credentials aren’t configured.” That’s the difference between a 9B model running locally and a 480B model running in the cloud, given the same tools and the same system prompts.

The system prompts didn’t change. The tools didn’t change. The architecture didn’t change. We just gave the agents a better brain and made sure they could reach it reliably.


Closing the Loop: From Recommendations to Actions

With the LLM abstraction working and the 480b model producing real findings, we hit the next problem almost immediately. The security expert was finding genuine threats and posting Discord messages like this:

🟡 WARNING — pfSense-FW01 [RECOMMENDATION] Block High-Risk Attacker IPs
Action: Add firewall rules on pfSense-FW01 to block inbound traffic from IPs identified as critical/high threat level in threat intel: 5.187.35.26, 79.124.62.230, 78.128.114.42…

And this:

🔴 CRITICAL — all [RECOMMENDATION] Investigate Host 192.168.100.130
Action: Perform forensic analysis on host 192.168.100.130. Review its network activity for signs of compromise, especially any recent connections to or from the newly blocked malicious IPs.

The security expert was doing its job — finding threats, analyzing them, producing actionable recommendations. But it was posting those recommendations to Discord and hoping a human would act on them. Meanwhile, the automation-agent service — the one that actually knows how to block IPs on pfSense, with dedup, rate limiting, GAIT audit trail, and Discord approval — was sitting there running its own separate poll cycle, completely unaware of what the security expert had found.

The pieces existed. They just weren’t wired together.

submit_block_action

We added a new tool to the security expert: submit_block_action. When it finds a high-confidence threat (composite_score >= 80, is_known_bad_actor=true), instead of posting a recommendation to Discord, it POSTs the IP directly to the automation-agent’s new /api/automation/submit endpoint.
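A sketch of what the tool handler boils down to (the endpoint path is the real one from above; the service URL and payload fields are illustrative):

```python
import httpx

AUTOMATION_AGENT = "http://automation-agent:8080"  # illustrative service URL

async def submit_block_action(ip: str, composite_score: int, reason: str) -> str:
    """Hand a high-confidence threat straight to the automation-agent pipeline."""
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{AUTOMATION_AGENT}/api/automation/submit",
            json={"ip": ip, "score": composite_score, "reason": reason},  # illustrative payload
            timeout=10.0,
        )
        resp.raise_for_status()
    return f"submitted {ip} for blocking (status {resp.status_code})"
```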

The IP enters the same pipeline as scheduler-discovered threats: dedup check, rate limit, LLM action proposal, approval gate (auto-approve if score >= 95, Discord approval otherwise), pfSense execution via XML-RPC, and GAIT audit trail. Every step committed to an immutable git branch.

The security expert still uses recommend_action for things that can’t be automated — policy changes, service hardening, manual forensic analysis. But for “block this IP” — the most common recommendation — it now acts instead of advising.

investigate_host

The other gap was the “investigate 192.168.100.130” recommendation. The security expert found suspicious outbound traffic from an internal IP but had no way to identify what device it was. It would post a CRITICAL finding saying “perform forensic analysis” and leave the human to figure out which device, which switch port, which MAC address.

We added investigate_host. When the security expert finds suspicious internal traffic, it calls this tool with the IP. The tool queries pfSense DHCP leases (IP → MAC → hostname), the ARP table (fallback), and Nautobot’s interface inventory on both switches (MAC → physical port).
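A sketch of the lookup chain (the three query helpers are stand-ins for the real pfSense and Nautobot calls described above):

```python
# Stubs standing in for the real data-source queries.
async def query_dhcp_leases(ip: str) -> dict | None: ...      # pfSense: IP -> MAC, hostname
async def query_arp_table(ip: str) -> str: ...                # pfSense ARP fallback: IP -> MAC
async def query_nautobot_interfaces(mac: str) -> dict: ...    # Nautobot: MAC -> switch, port

async def investigate_host(ip: str) -> dict:
    """Resolve an internal IP to a device: DHCP lease first, ARP as fallback,
    then Nautobot to map the MAC to a physical switch port."""
    lease = await query_dhcp_leases(ip)
    mac = lease["mac"] if lease else await query_arp_table(ip)
    hostname = lease["hostname"] if lease else "unknown"
    port = await query_nautobot_interfaces(mac)
    return {
        "ip": ip,
        "mac": mac,
        "hostname": hostname,
        "switch": port["device"],
        "port": port["interface"],
        "port_description": port["description"],
    }
```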

The security expert gets back: IP, MAC address, hostname, which switch, which port, and the port description. Its finding now says “Host 192.168.100.130 (mac: aa:bb:cc:dd:ee:ff, hostname: desktop-PC, connected to HomeSwitch01 Gi1/0/24 [VLAN10 | desktop-PC | 192.168.100.130]) was observed making outbound connections to known C2 infrastructure” — not just “investigate this IP.”

The System Prompt Change

The key change was in the security expert’s system prompt. We added:

```
ACTION PROTOCOL — when you find a threat, DO NOT just recommend blocking. Take action:
- For high-risk IPs: use submit_block_action
- For suspicious internal hosts: use investigate_host
- Use recommend_action ONLY for things that cannot be automated
```

This is the difference between an AI that generates reports and an AI that operates infrastructure. The security expert now finds the threat, investigates it, and initiates the response — all within the same poll cycle, with the same safety controls (approval gates, rate limits, audit trail) that were already in place.


The Convergence platform is a home network AI observability project. Phase 8 source code lives in services/net-ops-team/app/llm_client.py and the updated agent files in services/net-ops-team/app/team/.

Ideas or homelab war stories? Find me on X @byrn_baker or LinkedIn @byrnbaker.

Code: https://github.com/byrn-baker/Convergence/tree/feature/netclaw-integration

Need a real lab environment?

I run a small KVM-based lab VPS platform designed for Containerlab and EVE-NG workloads — without cloud pricing nonsense.

Visit localedgedatacenter.com →