Building Convergence: A Journey from Network Observability to AI-Driven Automation
TL;DR: Convergence is an open-source, AI-driven network automation platform built with a monitoring-first mindset. I’m vibe coding the whole project with AI agents and prompt-based iteration. The first three phases focus entirely on observability (OpenTelemetry, Grafana, VictoriaMetrics) before the AI agents take center stage. Automation is only as reliable as the data underneath it.
Part 1: Why I’m Building Monitoring First
What if you could ask your network what was wrong—and get a real answer? Not a vague alert, not a dashboard full of noise, but an explanation grounded in evidence. What if the network could catch problems early, or even correct them, before users ever noticed?
That idea isn’t science fiction anymore. The tooling exists. The standards exist. What’s usually missing is discipline.
This series documents my attempt to build Convergence, an AI-driven network observability and automation platform, from the ground up, and I’m vibe coding the entire thing. Instead of traditional line-by-line development, I’m leaning heavily on natural-language prompting and iterative AI assistance while deliberately starting with the boring foundation: monitoring.
Because if the monitoring is wrong, everything built on top of it is wrong too.
The Problem: Automation Built on Assumptions
I’ve watched plenty of network automation projects fail over the years. Rarely because the idea was bad. Rarely because the tooling was incapable. They failed because the automation had no trustworthy feedback.
Automation without observability is like driving with your eyes closed. You might stay on the road for a bit, but eventually you’re going to hit something—and you won’t know why.
The uncomfortable truth is this:
You cannot automate what you cannot observe.
Before any system is allowed to make decisions, it needs:
- Reliable data it can trust
- Clear baselines for what “normal” looks like
- Alerts that actually mean something
- Historical context to reason over
- A known‑good operating state
Without those, AI doesn’t make automation smarter. It just makes mistakes faster.
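To make "clear baselines" concrete, here is a minimal Python sketch of the idea: summarize historical samples, then flag values that sit far outside them. The function names, the sample readings, and the three-deviation threshold are all assumptions for illustration, not part of Convergence itself.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical samples as (mean, standard deviation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Return True when value is more than `threshold` deviations from the mean."""
    avg, spread = baseline
    if spread == 0:
        # A perfectly flat history: any different value is a deviation.
        return value != avg
    return abs(value - avg) / spread > threshold


# Hypothetical CPU utilization readings (percent) from a quiet period.
history = [41.0, 39.5, 40.2, 40.8, 39.9, 40.1]
baseline = build_baseline(history)

print(is_anomalous(40.5, baseline))  # within the normal range
print(is_anomalous(75.0, baseline))  # far outside it
```

A real baseline would likely account for time of day and seasonality, but even this crude version only works if the history it is built from can be trusted.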
The Monitoring‑First Approach
Convergence is planned as a multi‑phase build. Some of this may change. Some of it may fail entirely. That’s part of the point. But the intent is clear: no intelligence before visibility.
Phase 1: Infrastructure Foundation
  └─ Nautobot, Grafana, VictoriaMetrics, OpenTelemetry Collector
Phase 2: Telemetry Pipeline
  └─ Syslog, SNMP, streaming telemetry, pipeline hardening
Phase 3: Dashboards & Baselines
  └─ Visualization, alerting, understanding normal behavior
Phase 4: AI Agent Foundation
  └─ LangGraph agents, tool interfaces, Nautobot integration
Phase 5: AI‑Enhanced Discovery
  └─ Topology mapping, onboarding, inventory validation
Phase 6: Configuration Intelligence
  └─ Drift detection, compliance, controlled remediation
Phase 7: Autonomous Operations
  └─ Predictive analysis, self‑healing, multi‑agent workflows
I don’t expect this to be fast. I don’t even know how far I’ll get. What I do know is that skipping steps here would guarantee failure later.
Architecture: Observation Before Action
At its core, Convergence is built around a simple idea: telemetry is the source of truth for reality, and Nautobot is the source of truth for intent. Everything else sits on top of those two pillars.
Devices ──▶ OpenTelemetry Collector ──▶ VictoriaMetrics
                     │                    │
                     │                    ▼
                     └───────────────▶ Grafana ◀──────── Nautobot
                                          ▲
                                          │
                                AI Agents (Phase 4+)
AI agents don’t enter the picture until the system is already stable, observable, and boring. That’s intentional.
Why OpenTelemetry Collector
The OpenTelemetry Collector is the backbone of this platform, and it earns that role.
It allows me to collect syslog, SNMP, traps, streaming telemetry, and metrics through a single, vendor‑neutral pipeline. More importantly, it lets me process and enrich that data before it ever lands in storage.
That means:
- Normalized telemetry across vendors
- Metadata enrichment from Nautobot
- Filtering and rate‑limiting at the edge
- Backpressure when things go sideways
If the telemetry pipeline can’t be trusted, nothing downstream matters.
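In the Collector itself, filtering and rate-limiting are configured through its processors rather than written by hand. To show the underlying idea, here is a small Python sketch of severity filtering followed by a token-bucket rate limit. The class, the function, and the message shape are hypothetical and are not Collector APIs.

```python
import time


class TokenBucket:
    """Allow roughly `rate` messages per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def filter_messages(messages, bucket, drop_severities=("debug",)):
    """Drop noisy severities first, then rate-limit whatever is left."""
    kept = []
    for msg in messages:
        if msg["severity"] in drop_severities:
            continue
        if bucket.allow():
            kept.append(msg)
    return kept
```

Filtering before rate-limiting matters: debug noise should never consume the budget that an error message might need.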
Nautobot as Intent and Context
Nautobot provides the context raw telemetry lacks. It knows what a device is supposed to be, where it lives, and how it fits into the network.
By tying Nautobot into the telemetry pipeline, syslog messages and metrics stop being anonymous numbers. They gain meaning: site, role, function, ownership. That context becomes critical once AI agents start asking questions like “is this behavior expected?”
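As a rough illustration of what enrichment means, the Python sketch below merges inventory context into a telemetry record. The `INVENTORY` dictionary is a hypothetical stand-in for data that would come from the source of truth, and the field names are assumptions. No real API is called here.

```python
# Hypothetical stand-in for device records pulled from the source of truth.
INVENTORY = {
    "10.0.0.1": {"name": "core-sw-01", "site": "dc1", "role": "core"},
}


def enrich(record, inventory):
    """Return a copy of `record` with inventory context merged in."""
    device = inventory.get(record.get("source_ip"))
    enriched = dict(record)
    if device is None:
        # Unknown senders are worth flagging: they may be unmanaged devices.
        enriched["inventory_match"] = False
        return enriched
    enriched["inventory_match"] = True
    for key, value in device.items():
        enriched[f"device_{key}"] = value
    return enriched


raw = {"source_ip": "10.0.0.1", "message": "Interface Gi0/1 changed state to down"}
print(enrich(raw, INVENTORY))
```

Flagging records with no inventory match, instead of silently dropping them, turns the pipeline into a cheap inventory check as well.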
Storage and Visualization
VictoriaMetrics was an easy choice here. It’s fast, efficient, and built for scale without becoming fragile. Grafana sits on top as the primary interface for humans.
Before any automation happens, I want to be able to answer basic questions confidently:
- What changed?
- When did it change?
- Is this actually a problem or just noise?
If I can’t answer those manually, an AI agent certainly shouldn’t be trusted to answer them autonomously.
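One way to start answering "what changed, and when" by hand is a range query against VictoriaMetrics, which exposes a Prometheus-compatible `/api/v1/query_range` endpoint. The Python sketch below only builds the request URL. The base URL, the `ifOperStatus` metric name, and the `device` label are assumptions for illustration and depend entirely on how the pipeline names things.

```python
from urllib.parse import urlencode


def build_range_query_url(base_url, query, start, end, step="60s"):
    """Build a Prometheus-style range query URL (start/end as Unix seconds)."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


# Hypothetical metric and label: count interface state changes per 5-minute window.
url = build_range_query_url(
    "http://localhost:8428",
    'changes(ifOperStatus{device="core-sw-01"}[5m])',
    start=1700000000,
    end=1700003600,
)
print(url)
```

Any non-zero value in the result marks a window where the interface state flipped, which narrows "when did it change" to a five-minute span.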
Value Before AI
One of the intentional outcomes of this approach is that the platform is useful long before AI shows up.
Phases 1 through 3 deliver a full‑featured observability stack on their own. Many teams would stop there and already be better off than where they started.
The AI layer doesn’t replace that value; it builds on it. Only once the system understands reality does it make sense to let software take action.
What’s Next
The next post focuses on Phase 1: standing up the infrastructure and getting the first real metrics flowing. That means containers, databases, and a telemetry pipeline that actually works. No AI. No magic. Just making sure the foundation is solid enough to build on.
Because the most impressive automation in the world is useless if it can’t tell you when it’s wrong.
The best time to build monitoring was before you needed it. The second‑best time is now.
