Performance Simulator


Concept

Network Flow Simulator uses analytic queuing models and Monte Carlo simulation to evaluate network performance without packet-level discrete event simulation. Given a topology and traffic demands, it pushes billions of flow iterations through queuing models in seconds, identifying congestion bottlenecks probabilistically and projecting capacity headroom across carrier-scale networks (100k+ nodes).

The core tradeoff: sacrifice per-packet fidelity for orders-of-magnitude speedups. An M/G/1 queuing model evaluated with the Pollaczek-Khinchine formula produces utilization, delay, and loss estimates that are analytically exact for the modeled traffic class, and it runs in seconds where a packet simulator would take hours.
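As a sketch of the delay side of that model, the Pollaczek-Khinchine formula yields the mean queueing delay from only three inputs: mean service time, utilization, and the squared coefficient of variation (CV²) of service times. The function and names below are illustrative assumptions, not the netflowsim API.

```rust
/// Mean waiting time in an M/G/1 queue via the Pollaczek-Khinchine formula:
/// Wq = rho * S * (1 + CV^2) / (2 * (1 - rho)).
/// Illustrative sketch; names are not from the netflowsim codebase.
fn pk_mean_wait(mean_service: f64, rho: f64, cv2: f64) -> f64 {
    assert!(rho < 1.0, "queue is unstable at utilization >= 1");
    rho * mean_service * (1.0 + cv2) / (2.0 * (1.0 - rho))
}

fn main() {
    // M/M/1 special case (exponential service, CV^2 = 1):
    // Wq = rho / (1 - rho) * S = 1.0 for S = 1, rho = 0.5.
    assert!((pk_mean_wait(1.0, 0.5, 1.0) - 1.0).abs() < 1e-12);
    // Deterministic service (M/D/1, CV^2 = 0) halves the mean wait.
    assert!((pk_mean_wait(1.0, 0.5, 0.0) - 0.5).abs() < 1e-12);
}
```

Note that for a Pareto service distribution with α ≤ 2 the variance, and hence CV², is infinite and the formula diverges, which is why the tool warns in that regime (see Key Decisions below).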


Usage

# Run a Monte Carlo simulation
netflowsim simulate --topology network.graphml --iterations 1000

# Compare queuing models side by side
netflowsim compare --topology network.graphml --models mm1,md1,mg1-pareto

# Generate routing matrix from FIBs
netflowsim generate-routing --fibs routing-tables/

# Run N-1 failure analysis
netflowsim n1-analysis --topology network.graphml

# Generate report with all analysis modules
netflowsim report --config simulation.json

Configuration is driven by JSON config files (--config), with CLI flags overriding config values, and defaults filling the rest.
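A minimal sketch of that precedence, with invented field and constant names (the real config schema is not shown here): each setting resolves from the CLI flag first, then the config file, then a built-in default.

```rust
/// Hypothetical illustration of the override order described above:
/// a CLI flag beats the config-file value, which beats the built-in default.
/// Field and constant names are invented for this example.
struct Settings {
    iterations: u64,
}

fn resolve_iterations(cli: Option<u64>, config: Option<u64>) -> Settings {
    const DEFAULT_ITERATIONS: u64 = 1000; // assumed default, not documented
    Settings {
        iterations: cli.or(config).unwrap_or(DEFAULT_ITERATIONS),
    }
}

fn main() {
    // --iterations 5000 on the CLI wins over "iterations": 1000 in JSON.
    assert_eq!(resolve_iterations(Some(5000), Some(1000)).iterations, 5000);
    // Absent from both: fall back to the default.
    assert_eq!(resolve_iterations(None, None).iterations, 1000);
}
```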


Quick Facts

   
Status: Recently Updated
Stack: Rust

Core Value

netflowsim provides rapid, massive-scale network performance analysis by using analytic queuing models and Monte Carlo simulations instead of packet-level discrete event simulation. It enables network engineers to validate topologies and routing strategies against billions of flow iterations in seconds, identify bottlenecks probabilistically, test network resilience under failure scenarios, and project capacity headroom for carrier-scale networks (100k+ nodes).


Primary Objectives

  1. Performance: Use Rust and Rayon to fully exploit multi-core hardware.
  2. Scalability: Handle massive carrier-scale topologies (100k+ nodes) via Petgraph and efficient data structures.
  3. Decoupling: Clearly separate the Routing Matrix generation (packet-sim logic) from the Flow Simulation (queuing logic).
  4. Visibility: Provide high-performance geographic visualization via MVT/Martin with multi-region support.

Milestones

See .planning/MILESTONES.md for full milestone history.


Current Milestone: v2.1 Performance Optimization

Goal: Reduce memory footprint and runtime overhead, and restore throughput, while maintaining all v2.0 features.

Target features:


Current State

Version: v2.0 (shipped 2026-03-01)
Codebase: 14,813 lines of Rust (+ from v1.1)
Tech Stack: Rust, Petgraph (StableGraph), Rayon, Serde, Plotters, Criterion, approx, kolmogorov_smirnov, statrs
CLI Commands: simulate, compare, generate-routing, report, n1-analysis

Shipped Features (v1.0 + v1.1 + v2.0):

Performance Characteristics (100k nodes, 1000 iterations, all v2.0 features):

Known Limitations:

User Feedback Themes:

Technical Debt:


Requirements


# Validated (v1.0)


# Validated (v1.1)


# Validated (v2.0)


# Active (v2.1)


# Active (Future Work)


# Out of Scope


Tech Stack


Ecosystem Context

This project is part of a seven-tool network automation ecosystem. netflowsim provides flow-based traffic analysis — the “analyze” stage of the pipeline.

Role: Validate network capacity and performance at scale using analytic queuing models and Monte Carlo simulation. Consume topologies and traffic demands from topogen; consume FIBs from netsim for path tracing.

Key integration points:

Architecture documents:


Key Decisions

| Decision | Rationale | Outcome | Status |
| --- | --- | --- | --- |
| Recursive path tracing with ECMP | Handles multi-path routing correctly | Works well, cycle detection robust | ✓ Good |
| Interface-to-link resolution via subnet matching | Automates FIB-to-topology mapping | Eliminates manual configuration | ✓ Good |
| Nearest-rank percentiles | Avoids interpolation complexity | Simple, robust, accurate | ✓ Good |
| Node bottleneck scoring: 1.0 - ∏(1-p) | Captures “at least one incident link congested” | Identifies aggregate hotspots | ✓ Good |
| StableGraph for dynamic mutations | Enables runtime topology changes | Zero breaking changes to earlier phases | ✓ Good |
| Separate schemas for static/dynamic | Different modes track different metrics | Clean separation, documented limitation | ✓ Good |
| Warn for Pareto α ≤ 2 (infinite variance) | Retains user flexibility for heavy-tailed exploration | Allows analysis with caveats | ✓ Good |
| Automatic CV² calculation via distribution methods | Eliminates manual input errors | Correct queuing theory application | ✓ Good |
| Deterministic seeded traffic for comparison mode | Ensures fair cross-model results | Reproducible performance comparisons | ✓ Good |
| Config-first merge (config → CLI → defaults) | Deterministic merge order for errors | Clear validation feedback | ✓ Good |
| Dual persistence (embedded + sidecar run_config) | Self-contained results + easy extraction | Perfect reproducibility | ✓ Good |
| Additive v1.1 schema with serde defaults | v1.0 backward compatibility | Seamless version migration | ✓ Good |
| [v2.0] DHAT profiling via feature flag | Avoids production overhead while enabling allocation tracking | No runtime penalty, profiling when needed | ✓ Good |
| [v2.0] Tick(u64) as unified time index | Single time axis works across Monte Carlo and dynamic modes | Simplifies time-series collection | ✓ Good |
| [v2.0] Adaptive sampling + fixed point budget | Min-interval plus change-triggered emission with downsampling | Prevents memory explosion at scale | ✓ Good |
| [v2.0] Bottleneck threshold util ≥ 0.80 | Industry-standard threshold balances sensitivity with actionability | Effective bottleneck detection | ✓ Good |
| [v2.0] Manual Pearson correlation (~30 LOC) | Avoids external dependency bloat (linfa/polars) | Lightweight, maintainable | ✓ Good |
| [v2.0] Adaptive max_lag using 2× median interval | Prevents false positives in correlation analysis | Robust causality detection | ✓ Good |
| [v2.0] Manual linear regression (least squares) | Avoids linfa/polars for simple use case | ~40 LOC, no external deps | ✓ Good |
| [v2.0] RegionLocalityConfig with locality_factor (0.0-1.0) | Controls same-region traffic bias | Flexible geo-distributed patterns | ✓ Good |
| [v2.0] Graceful degradation for nodes without region_id | Enables parallel plan execution and backward compatibility | Seamless v1.x → v2.0 migration | ✓ Good |
| [v2.0] LatencyZone enum for WAN links | Type-safe representation prevents invalid values | Clear inter-region link classification | ✓ Good |
| [v2.0] Optimized pairwise coverage (30 scenarios vs 540) | Keeps validation time under 40 minutes | Comprehensive validation without exhaustive tests | ✓ Good |
| [v2.0] Document failures honestly | SCALE-02 and SCALE-03 marked FAILED based on empirical evidence | Transparent performance characteristics | ⚠️ Revisit (v2.1 optimization) |
| [v2.0] Accept partial phase goal achievement | 1/3 performance targets met (runtime), 2/3 failed (memory, throughput) | Functional completeness prioritized over perf | ⚠️ Revisit (v2.1 optimization) |
| [v2.0] Arc for immutable flow fields | Eliminates 300k allocations in Monte Carlo hot path | Improved baseline performance | ✓ Good |
| [v2.0] HashMap::with_capacity() for known sizes | Eliminates reallocation overhead at scale | Reduced allocation churn | ✓ Good |
| [v2.0] Iteration-specific seed derivation (base_seed + index) | Deterministic parallel execution without rayon hooks | Reproducible parallel Monte Carlo | ✓ Good |
| [v2.0] Restart-with-seed resume semantics | Simpler than incremental, avoids rayon hooks | Post-execution checkpoints complete | Pending (mid-execution deferred) |
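The node bottleneck score listed among these decisions has a one-line implementation. The sketch below (function and parameter names invented for illustration) computes 1 - ∏(1 - p): the probability that at least one incident link is congested, given per-link congestion probabilities and assuming independence across links.

```rust
/// Node bottleneck score: 1 - prod(1 - p_i), the probability that at least
/// one incident link is congested, assuming independent links.
/// Illustrative sketch; names are not from the netflowsim codebase.
fn node_bottleneck_score(link_congestion_probs: &[f64]) -> f64 {
    1.0 - link_congestion_probs.iter().map(|p| 1.0 - p).product::<f64>()
}

fn main() {
    // Two incident links, each congested with probability 0.5:
    // score = 1 - 0.5 * 0.5 = 0.75.
    assert!((node_bottleneck_score(&[0.5, 0.5]) - 0.75).abs() < 1e-12);
    // No congestion on any incident link yields a score of 0.
    assert!(node_bottleneck_score(&[0.0, 0.0]).abs() < 1e-12);
}
```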

Last updated: 2026-03-01 after v2.1 milestone start


Current Status

2026-03-08 — tests pass.