Brownfield Ingestion


Concept

A vendor translation layer that decouples network configuration from vendor-specific syntax. It uses LLM-powered RAG to extract network-level intent and topology relationships from vendor documentation and CLI configurations, then normalizes them into a vendor-neutral topology graph model. The intermediate representation is topology-centric (protocol adjacencies, link roles, VLAN membership) rather than device-centric like YANG, which is what makes genuine vendor abstraction possible.
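To make "topology-centric" concrete, here is a minimal sketch of what such an intermediate representation could look like. It is an illustration only: the class names (`Topology`, `Link`, `Adjacency`, `LinkRole`) and their fields are assumptions made for this example, not the project's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class LinkRole(Enum):
    """Hypothetical link roles; the real IR may use a different set."""
    ACCESS = "access"
    TRUNK = "trunk"
    ROUTED = "routed"


@dataclass(frozen=True)
class Link:
    a: str            # node name at one end
    b: str            # node name at the other end
    role: LinkRole


@dataclass(frozen=True)
class Adjacency:
    protocol: str     # e.g. "ospf" or "bgp"
    a: str
    b: str


@dataclass
class Topology:
    """Network-level facts only: no vendor syntax, no per-device config."""
    nodes: set[str] = field(default_factory=set)
    links: list[Link] = field(default_factory=list)
    adjacencies: list[Adjacency] = field(default_factory=list)
    vlan_members: dict[int, set[str]] = field(default_factory=dict)

    def add_link(self, a: str, b: str, role: LinkRole) -> None:
        self.nodes.update((a, b))
        self.links.append(Link(a, b, role))

    def add_adjacency(self, protocol: str, a: str, b: str) -> None:
        self.nodes.update((a, b))
        self.adjacencies.append(Adjacency(protocol, a, b))

    def add_vlan_member(self, vlan_id: int, node: str) -> None:
        self.nodes.add(node)
        self.vlan_members.setdefault(vlan_id, set()).add(node)

    def neighbors(self, node: str, protocol: str) -> set[str]:
        """Return the peers of `node` for one protocol, in either direction."""
        peers = set()
        for adj in self.adjacencies:
            if adj.protocol != protocol:
                continue
            if adj.a == node:
                peers.add(adj.b)
            elif adj.b == node:
                peers.add(adj.a)
        return peers


# Example: two routers with an OSPF adjacency, one switch in VLAN 10.
topo = Topology()
topo.add_link("r1", "r2", LinkRole.ROUTED)
topo.add_adjacency("ospf", "r1", "r2")
topo.add_vlan_member(10, "sw1")
```

Nothing in this structure names a vendor or a CLI keyword; the relationships are stated once at the network level and any device-specific rendering is left to a later stage.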


Architecture

The pipeline has four stages:

  1. Document ingestion: PDF/HTML vendor manuals are converted to Markdown and indexed into a vector database (ChromaDB) for RAG retrieval. Parsing is dual-engine: pymupdf4llm for fast extraction, with MinerU as a layout-aware fallback.

  2. Intent extraction: LLM + RAG extracts topology-level relationships from unstructured documentation and CLI configurations. Every extraction carries a confidence score and evidence citation.

  3. Human-in-the-loop review: Low-confidence extractions are routed to a web UI for operator review. Corrections feed back to improve model accuracy.

  4. Configuration generation: The topology model is compiled to vendor-specific CLI (Cisco IOS, Arista EOS). Batfish validates semantic correctness: compiled configs are simulated to verify that they produce the intended forwarding behavior.
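The dual-engine parsing in stage 1 can be sketched as a fast path with a fallback. This is a sketch under stated assumptions rather than the project's code: both engines are passed in as plain callables so that the example does not depend on either library's API, and the `looks_usable` quality check (a simple length threshold) is hypothetical. The real criterion for falling back is not described here.

```python
from typing import Callable

# An engine takes a file path and returns Markdown text.
Parser = Callable[[str], str]


def looks_usable(markdown: str, min_chars: int = 200) -> bool:
    """Hypothetical quality check: treat very short output as a failed parse."""
    return len(markdown.strip()) >= min_chars


def parse_with_fallback(
    path: str,
    fast: Parser,
    fallback: Parser,
    is_usable: Callable[[str], bool] = looks_usable,
) -> tuple[str, str]:
    """Try the fast engine first; use the layout-aware engine if it fails.

    Returns (engine_used, markdown).
    """
    try:
        markdown = fast(path)
        if is_usable(markdown):
            return "fast", markdown
    except Exception:
        # Any fast-path failure is treated as a reason to fall back.
        pass
    return "fallback", fallback(path)
```

In practice `fast` would wrap pymupdf4llm and `fallback` would wrap MinerU. Recording which engine produced each document makes it possible to trace later extraction problems back to the parser.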

Built with Python 3.12, FastAPI, ChromaDB, LangChain.


Quick Facts

   
Status: Recently Updated
Stack: Python

What This Is

A network automation framework that decouples network configuration from vendor-specific syntax. It uses LLM-powered RAG to extract network-level intent and topology relationships from vendor documentation and CLI configurations, normalizing them into a vendor-neutral topology graph model inspired by AutoNetKit. The system enables cross-vendor configuration generation and validation through semantic simulation.


Core Value

Extract network-level topology relationships (protocol adjacencies, link roles, VLAN membership) from vendor-specific CLI and documentation with high accuracy, enabling truly vendor-independent network configuration.


Current Milestone: v2.0 Production-Grade Translation Layer

Goal: Become the universal, production-ready vendor translation layer for the network automation ecosystem

Target features:


Requirements


# Validated

v1.0 Core Pipeline (shipped 2026-02-22):


# Active

v2.0 (in progress):


# Out of Scope


Context

v1.0 Status (shipped 2026-02-22): The full pipeline works end-to-end. v1.0 showed that LLM-powered extraction with RAG and human-in-the-loop review can translate vendor CLI to and from the topology IR, and the system was validated with real-world configs. Built with Python 3.12, FastAPI, ChromaDB, LangChain, and LLM APIs (Claude/GPT-4).

Ecosystem Position: This tool is the vendor translation layer in a larger network automation ecosystem (automationarch). It consumes vendor documentation and CLI, produces topology IR that feeds into autonetkit for modeling/simulation/visualization. Complementary to (not overlapping with) tools like autonetkit-config (design/compilation), netsim (protocol simulation), and netvis (visualization).

Key architectural insight: The intermediate representation is a topology-centric graph model, NOT a device-centric model like YANG. Network-level relationships (OSPF adjacencies, BGP peerings) are genuinely vendor-independent, while device-level configuration varies wildly across vendors. This enables true vendor abstraction.
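As an illustration of this point, the sketch below renders one vendor-neutral OSPF fact into two device-level syntaxes. It is a deliberately simplified example, not the project's compiler: the `OspfLink` class and both render functions are hypothetical, and the output covers only the `router ospf` stanza, not a complete device configuration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class OspfLink:
    """Hypothetical IR fact: this subnet participates in OSPF in this area."""
    subnet: str          # e.g. "10.0.0.0/30"
    area: int
    process_id: int = 1


def render_ios(link: OspfLink) -> str:
    """Cisco IOS style: network statement with a wildcard mask."""
    net = ipaddress.ip_network(link.subnet)
    return "\n".join([
        f"router ospf {link.process_id}",
        f" network {net.network_address} {net.hostmask} area {link.area}",
    ])


def render_eos(link: OspfLink) -> str:
    """Arista EOS style: network statement with a prefix length."""
    net = ipaddress.ip_network(link.subnet)
    area = ipaddress.IPv4Address(link.area)  # dotted-decimal area ID
    return "\n".join([
        f"router ospf {link.process_id}",
        f"   network {net.with_prefixlen} area {area}",
    ])


link = OspfLink(subnet="10.0.0.0/30", area=0)
print(render_ios(link))
print(render_eos(link))
```

The same `OspfLink` value feeds both renderers; only the mask notation, area format, and indentation differ. That difference is the device-level variation the IR is designed to keep out of the model.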

v1.0 Learnings: LLM + RAG is viable for extracting topology-level intent from unstructured documentation. Human-in-the-loop is essential to manage hallucination risks. Batfish provides semantic validation to ensure compiled configs behave correctly. Confidence scoring and evidence citation are critical for production use.
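A minimal sketch of how confidence scores and evidence citations could drive routing to human review follows. The `Extraction` record, the `route` function, and the 0.8 threshold are all assumptions made for illustration; the actual scoring scale and cutoff used by the project are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    """Hypothetical extraction record: a claim plus its score and citation."""
    claim: str         # e.g. "r1 and r2 form an OSPF adjacency"
    confidence: float  # assumed to lie in [0.0, 1.0]
    evidence: str      # cited doc passage or config line; empty if none


REVIEW_THRESHOLD = 0.8  # assumed cutoff, chosen only for this example


def route(
    extractions: list[Extraction],
    threshold: float = REVIEW_THRESHOLD,
) -> tuple[list[Extraction], list[Extraction]]:
    """Split extractions into (auto_accepted, needs_review).

    An extraction goes to review if its confidence is below the threshold
    or if it carries no evidence at all.
    """
    accepted: list[Extraction] = []
    review: list[Extraction] = []
    for item in extractions:
        if item.confidence < threshold or not item.evidence.strip():
            review.append(item)
        else:
            accepted.append(item)
    return accepted, review
```

Sending uncited extractions to review regardless of score is a design choice in this sketch: a high score with no evidence is exactly the pattern a hallucinated relationship would show.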


Constraints


Key Decisions

| Decision | Rationale | Outcome |
| --- | --- | --- |
| Topology-centric IR (not YANG) | YANG is device-centric; network relationships are truly vendor-independent | ✓ Good: enables genuine vendor abstraction |
| RAG + LLM for extraction | Handles diverse, unstructured vendor documentation better than rule-based parsers | ✓ Good: v1.0 validated with real-world configs |
| Batfish for validation | Industry-standard network simulator, validates semantic correctness | ✓ Good: v1.0 integration working, optional behind flag |
| Dual-engine PDF parsing | pymupdf4llm fast path + MinerU layout-aware fallback | ✓ Good: handles diverse PDF formats |
| ChromaDB vector store | Lightweight, embedded, good for RAG workloads | ✓ Good: fast retrieval, stable |
| HITL for quality assurance | LLMs hallucinate; human review essential for production | ✓ Good: v1.0 demonstrated viability with review UI |
| Confidence + evidence citation | Every extraction needs confidence score and doc/config evidence | ✓ Good: enables intelligent routing to HITL |
| Ecosystem integration focus | Translation layer only, not orchestration/intent/deployment | ✓ Good: clear boundaries with automationarch tools |

Last updated: 2026-02-22 after v1.0 completion and v2.0 milestone initialization


Current Status

2026-03-05 — Completed 07-01-PLAN.md