Building Convergence: A Journey from Network Observability to AI-Driven Automation

Posted Feb 6, 2026

By Byrn Baker

TL;DR: Convergence is an open-source, AI-driven network automation platform built with a monitoring-first mindset. I’m vibe coding the whole project with AI agents and prompt-based iteration, but the first three phases focus entirely on observability (OpenTelemetry, Grafana, VictoriaMetrics) before any AI agents take center stage. Automation is only as reliable as the data underneath it.

Part 1: Why I’m Building Monitoring First

What if you could ask your network what was wrong—and get a real answer? Not a vague alert, not a dashboard full of noise, but an explanation grounded in evidence. What if the network could catch problems early, or even correct them, before users ever noticed?

That idea isn’t science fiction anymore. The tooling exists. The standards exist. What’s usually missing is discipline.

This series documents my attempt to build Convergence, an AI-driven network observability and automation platform, from the ground up, and I’m vibe coding the entire thing. Instead of traditional line-by-line development, I’m leaning heavily on natural-language prompting and iterative AI assistance while deliberately starting with the boring foundation: monitoring.

Because if the monitoring is wrong, everything built on top of it is wrong too.

The Problem: Automation Built on Assumptions

I’ve watched plenty of network automation projects fail over the years. Rarely because the idea was bad. Rarely because the tooling was incapable. They failed because the automation had no trustworthy feedback.

Automation without observability is like driving with your eyes closed. You might stay on the road for a bit, but eventually you’re going to hit something—and you won’t know why.

The uncomfortable truth is this:

You cannot automate what you cannot observe.

Before any system is allowed to make decisions, it needs:

Reliable data it can trust
Clear baselines for what “normal” looks like
Alerts that actually mean something
Historical context to reason over
A known‑good operating state

Without those, AI doesn’t make automation smarter. It just makes mistakes faster.

The Monitoring‑First Approach

Convergence is planned as a multi‑phase build. Some of this may change. Some of it may fail entirely. That’s part of the point. But the intent is clear: no intelligence before visibility.

Phase 1: Infrastructure Foundation
   └─ Nautobot, Grafana, VictoriaMetrics, OpenTelemetry Collector

Phase 2: Telemetry Pipeline
   └─ Syslog, SNMP, streaming telemetry, pipeline hardening

Phase 3: Dashboards & Baselines
   └─ Visualization, alerting, understanding normal behavior

Phase 4: AI Agent Foundation
   └─ LangGraph agents, tool interfaces, Nautobot integration

Phase 5: AI‑Enhanced Discovery
   └─ Topology mapping, onboarding, inventory validation

Phase 6: Configuration Intelligence
   └─ Drift detection, compliance, controlled remediation

Phase 7: Autonomous Operations
   └─ Predictive analysis, self‑healing, multi‑agent workflows

I don’t expect this to be fast. I don’t even know how far I’ll get. What I do know is that skipping steps here would guarantee failure later.

Architecture: Observation Before Action

At its core, Convergence is built around a simple idea: telemetry is the source of truth for reality, and Nautobot is the source of truth for intent. Everything else sits on top of those two pillars.

Devices ──▶ OpenTelemetry Collector ──▶ VictoriaMetrics
   │                                     │
   │                                     ▼
   └───────────────▶ Grafana ◀──────── Nautobot
                          ▲
                          │
                    AI Agents (Phase 4+)

AI agents don’t enter the picture until the system is already stable, observable, and boring. That’s intentional.
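
The two-pillar idea can be sketched as a comparison between what the inventory says should be true and what telemetry says is true. This is only an illustration under stated assumptions: the `InterfaceState` record, the `find_mismatches` helper, and the sample device names are hypothetical, and nothing here calls Nautobot or a real telemetry backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceState:
    """Hypothetical record: one interface and whether it is (or should be) enabled."""
    device: str
    interface: str
    enabled: bool


def find_mismatches(intended, observed):
    """Return (device, interface, reason) tuples where reality differs from intent."""
    observed_by_key = {(s.device, s.interface): s for s in observed}
    mismatches = []
    for want in intended:
        have = observed_by_key.get((want.device, want.interface))
        if have is None:
            # Intent exists but no telemetry was seen for this interface.
            mismatches.append((want.device, want.interface, "no telemetry"))
        elif have.enabled != want.enabled:
            mismatches.append((want.device, want.interface, "state differs from intent"))
    return mismatches


# Hypothetical sample data standing in for the inventory (intent)
# and for collected telemetry (reality).
intended = [
    InterfaceState("edge-01", "Ethernet1", True),
    InterfaceState("edge-01", "Ethernet2", False),
    InterfaceState("edge-02", "Ethernet1", True),
]
observed = [
    InterfaceState("edge-01", "Ethernet1", True),
    InterfaceState("edge-01", "Ethernet2", True),
]

mismatches = find_mismatches(intended, observed)
for device, interface, reason in mismatches:
    print(f"{device} {interface}: {reason}")
```

Even in this toy form, a missing data point is reported separately from a wrong one, which is the kind of distinction an agent would need before it is trusted to act.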

Why OpenTelemetry Collector

The OpenTelemetry Collector is the backbone of this platform, and it earns that role.

It allows me to collect syslog, SNMP, traps, streaming telemetry, and metrics through a single, vendor‑neutral pipeline. More importantly, it lets me process and enrich that data before it ever lands in storage.

That means:

Normalized telemetry across vendors
Metadata enrichment from Nautobot
Filtering and rate‑limiting at the edge
Backpressure when things go sideways

If the telemetry pipeline can’t be trusted, nothing downstream matters.
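
To make "filtering and rate-limiting at the edge" concrete, here is a minimal plain-Python sketch of the idea: drop low-priority syslog events, then pass the rest through a token bucket. It is not how the OpenTelemetry Collector implements these features, and it does not model backpressure; `TokenBucket`, `filter_and_limit`, the severity cutoff, and the sample events are all assumptions made for illustration.

```python
class TokenBucket:
    """Allow roughly `rate` events per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        # Refill based on elapsed time, capped at the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def filter_and_limit(events, bucket, max_severity=4):
    """Drop events less severe than `max_severity`, then rate-limit the rest.

    Syslog severities run from 0 (emergency) to 7 (debug), so a higher
    number means a less important message.
    """
    kept = []
    for timestamp, severity, message in events:
        if severity > max_severity:
            continue  # filtered out before it consumes a token
        if bucket.allow(timestamp):
            kept.append((timestamp, severity, message))
    return kept


# Hypothetical events: (timestamp in seconds, syslog severity, message).
events = [
    (0.0, 3, "link down"),
    (0.1, 7, "debug chatter"),
    (0.2, 3, "link down"),
    (0.3, 3, "link down"),
    (2.0, 2, "fan failure"),
]
kept = filter_and_limit(events, TokenBucket(rate=1.0, capacity=2))
for event in kept:
    print(event)
```

Timestamps are passed in rather than read from the clock so the behavior is repeatable: the burst of identical "link down" messages is trimmed, while the later, more severe event still gets through.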

Nautobot as Intent and Context

Nautobot provides the context raw telemetry lacks. It knows what a device is supposed to be, where it lives, and how it fits into the network.

By tying Nautobot into the telemetry pipeline, syslog messages and metrics stop being anonymous numbers. They gain meaning: site, role, function, ownership. That context becomes critical once AI agents start asking questions like “is this behavior expected?”
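
A rough sketch of what that enrichment step might look like, assuming the inventory context has already been fetched into a lookup table keyed by source IP. The `INVENTORY` dictionary, the field names, and the `enrich` helper are hypothetical stand-ins; a real pipeline would pull this context from Nautobot rather than hard-code it.

```python
# Hypothetical inventory snapshot keyed by management IP.
INVENTORY = {
    "10.0.0.1": {"device": "edge-01", "site": "dal01", "role": "edge-router"},
    "10.0.0.2": {"device": "core-01", "site": "dal01", "role": "core-switch"},
}


def enrich(record, inventory):
    """Return a copy of a telemetry record with inventory context attached."""
    enriched = dict(record)  # never mutate the caller's record
    context = inventory.get(record["source_ip"])
    if context is None:
        # Telemetry from a source the inventory does not know about
        # is itself a finding, so flag it instead of dropping it.
        enriched["inventory_match"] = False
        return enriched
    enriched.update(context)
    enriched["inventory_match"] = True
    return enriched


known = enrich({"source_ip": "10.0.0.1", "message": "link down"}, INVENTORY)
unknown = enrich({"source_ip": "192.0.2.50", "message": "login failed"}, INVENTORY)
print(known)
print(unknown)
```

Keeping unmatched records, with an explicit flag, means gaps in the inventory show up in the data instead of silently disappearing.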

Storage and Visualization

VictoriaMetrics was an easy choice here. It’s fast, efficient, and built for scale without becoming fragile. Grafana sits on top as the primary interface for humans.

Before any automation happens, I want to be able to answer basic questions confidently:

What changed?
When did it change?
Is this actually a problem or just noise?

If I can’t answer those manually, an AI agent certainly shouldn’t be trusted to answer them autonomously.
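
As a small illustration of the "problem or just noise?" question, the sketch below compares a new sample against a baseline built from recent history and flags it only when it is far outside the usual spread. The z-score approach, the threshold of 3, and the sample utilization values are assumptions for the example, not a description of how Convergence will do baselining.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it sits more than `threshold` standard deviations
    away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to call anything a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return current != baseline
    return abs(current - baseline) / spread > threshold


# Hypothetical link utilization samples, in percent.
history = [41.0, 39.5, 40.2, 40.8, 39.9, 40.4]

print(is_anomalous(history, 40.6))  # a value within the usual range
print(is_anomalous(history, 75.0))  # a value far outside it
```

A check this simple will miss seasonality and slow drift, which is part of why the baselining phase comes before any agent is allowed to interpret the data.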

Value Before AI

One of the intentional outcomes of this approach is that the platform is useful long before AI shows up.

Phases 1 through 3 deliver a full‑featured observability stack on their own. Many teams would stop there and already be better off than where they started.

The AI layer doesn’t replace that value; it builds on it. Only once the system understands reality does it make sense to let software take action.

What’s Next

The next post focuses on Phase 1: standing up the infrastructure and getting the first real metrics flowing. That means containers, databases, and a telemetry pipeline that actually works. No AI. No magic. Just making sure the foundation is solid enough to build on.

Because the most impressive automation in the world is useless if it can’t tell you when it’s wrong.

The best time to build monitoring was before you needed it. The second‑best time is now.

This post is licensed under CC BY 4.0 by the author.