This blog post is the third in a series highlighting Apstra’s participation in Networking Field Day 19 (NFD19).
We had a great time at NFD19. As I prepared to present Intent-Based Analytics to the world, I spent a lot of time contrasting how networks are traditionally operated with how they are operated under an Intent-Based Networking approach.
Today there is an obsession with visuals; people are mesmerized by graphical representations and are accustomed to seeing lots of them. But as networks grow larger, as the amount of available telemetry keeps increasing, and as the overall network fabric becomes more complex, a screen-based approach built on a limited set of metrics is simply not enough. Open-ended exploration of metrics is tempting, but it has limited value because it is manual, insufficient, and ineffective.
Instead, we need to identify the key invariants in a network design and continuously validate them in real time to assess the true health of the network. Invariants also come from lessons learned while analyzing service outages. Typically, engineers verify these invariants by hand while diagnosing an outage. For example, if virtual machines have connectivity problems, we start by checking which VLAN they use and verifying that the VLAN is indeed trunked on the fabric side. Intent-Based Networking can automate this verification with an Intent-Based Analytics (IBA) probe, just as it automates the configuration of your network.
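To make the VLAN example concrete, here is a minimal Python sketch of that invariant expressed as a check over inventory data. It is not how an AOS probe is implemented; the `Vm` and `LeafPort` records, their fields, and `missing_vlan_trunks` are hypothetical stand-ins for what intent and operational state would supply, and all device and interface names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Vm:
    name: str
    vlan: int
    hypervisor: str  # host the VM currently runs on (hypothetical field)


@dataclass
class LeafPort:
    leaf: str
    interface: str
    hypervisor: str  # server attached to this leaf port (hypothetical field)
    trunked_vlans: set[int] = field(default_factory=set)


def missing_vlan_trunks(vms: list[Vm], ports: list[LeafPort]) -> list[tuple[str, int, str, str]]:
    """Return (vm, vlan, leaf, interface) for each leaf port facing a VM's
    hypervisor that does not trunk that VM's VLAN (the invariant violation)."""
    violations = []
    for vm in vms:
        for port in ports:
            if port.hypervisor == vm.hypervisor and vm.vlan not in port.trunked_vlans:
                violations.append((vm.name, vm.vlan, port.leaf, port.interface))
    return violations


vms = [Vm("sql-01", 110, "esxi-1"), Vm("web-01", 120, "esxi-2")]
ports = [
    LeafPort("leaf1", "swp1", "esxi-1", {110, 120}),
    LeafPort("leaf2", "swp1", "esxi-1", {120}),  # VLAN 110 is missing here
    LeafPort("leaf2", "swp2", "esxi-2", {120}),
]
print(missing_vlan_trunks(vms, ports))  # [('sql-01', 110, 'leaf2', 'swp1')]
```

Run continuously against live state rather than on demand during an outage, a check like this turns a manual diagnostic step into an always-on validation.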
The operational model of Intent-Based Analytics in an Intent-Based Networking implementation can be summarized with the acronym “SEE”, which stands for Select, Enrich, and Extract:
- Select which telemetry data to collect, and from which network elements in your fabric
- Enrich the collected telemetry data with additional context from the intent or the operational state
- Extract knowledge by analyzing the context-rich telemetry data
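The three stages can be sketched as plain functions. This toy Python version assumes a hypothetical in-memory map of which VMs sit behind which leaf port; `select`, `enrich`, and `SustainedThreshold` are illustrative names, not AOS APIs, and "sustained" is modeled here as N consecutive samples above a threshold, which is only one possible definition.

```python
from collections import deque

# Hypothetical inventory: VMs behind each (leaf, interface). In AOS this
# context would come from the graph, not from a hard-coded dict.
VMS_BEHIND_PORT = {
    ("leaf1", "swp1"): ["sql-01", "sql-02"],
    ("leaf1", "swp2"): ["web-01"],
    ("leaf2", "swp1"): ["sql-03"],
}


def select(inventory, vm_filter):
    """Select: leaf ports carrying traffic for at least one VM matching vm_filter."""
    return [port for port, vms in inventory.items() if any(vm_filter(vm) for vm in vms)]


def enrich(samples, inventory):
    """Enrich: attach the list of VMs behind each port to its raw error counter."""
    return [{"port": port, "errors": errors, "vms": inventory[port]}
            for port, errors in samples.items()]


class SustainedThreshold:
    """Extract: report a port while it exceeded `threshold` in each of the last `window` samples."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.history = {}  # port -> recent over-threshold flags

    def update(self, record):
        flags = self.history.setdefault(record["port"], deque(maxlen=self.window))
        flags.append(record["errors"] > self.threshold)
        return len(flags) == self.window and all(flags)


ports = select(VMS_BEHIND_PORT, lambda vm: vm.startswith("sql-"))
detector = SustainedThreshold(threshold=10, window=3)
anomalies = []
# Three polling intervals of error counts, for the selected ports only.
for errors in ([12, 0], [15, 0], [20, 50]):
    samples = dict(zip(ports, errors))
    for record in enrich(samples, VMS_BEHIND_PORT):
        if detector.update(record):
            anomalies.append((record["port"], record["vms"]))

print(anomalies)  # [(('leaf1', 'swp1'), ['sql-01', 'sql-02'])]
```

The single spike of 50 errors on `leaf2`/`swp1` is not reported because it is not sustained, while the anomaly on `leaf1`/`swp1` arrives already annotated with the VMs it affects.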
Let’s look at an example to illustrate how this works. The application team is experiencing issues with its SQL server VMs and claims the network is the problem. As the network admin, you decide to start by monitoring error metrics on every interface that affects the SQL VMs to get a sense of any issues in the network. Without intent, how do you select the server-facing leaf interfaces from which to collect interface error metrics, buffer queue metrics, and so on? Even if you managed to assemble this list from tribal knowledge or manual entry, how do you keep it up to date? How do you know which VMs are behind which leaf interfaces? And can you raise anomalies only on the interfaces that affect SQL VMs?
With VMware vSphere vCenter integration, Apstra Operating System (AOS) discovers the virtual inventory and merges it into the Apstra Graph Datastore, correlating underlay infrastructure nodes such as servers with virtual infrastructure nodes such as hypervisors and their physical NICs. With this single source of truth, you can select ingestion of various metrics from only those leaf interfaces that carry SQL VM traffic, enrich the data with the list of VMs behind each leaf interface, and extract knowledge by raising anomalies when metric values cross thresholds on a sustained basis. You define this knowledge-extraction pipeline by posting JSON payloads over REST, which lets you create dynamic, real-time, Intent-Based data pipelines that automatically stay in sync with changes to the network intent or to the operational state of the physical or virtual infrastructure. In the example above, if new SQL server VMs are created, the probe automatically starts ingesting counters from the corresponding leaf interfaces; if SQL VMs are removed, it stops ingesting from interfaces that no longer carry their traffic.
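Staying in sync amounts to deriving the selection from current state instead of storing a static interface list. A minimal sketch, assuming a hypothetical `vm_locations` map from VM name to the leaf port it sits behind, and a made-up `sql-` naming convention standing in for whatever attribute actually identifies SQL VMs in the graph:

```python
def sql_ports(vm_locations):
    """Derive the set of (leaf, interface) ports carrying SQL VM traffic.

    vm_locations maps VM name -> (leaf, interface); this shape and the
    "sql-" prefix are illustrative assumptions, not the AOS data model.
    """
    return {port for vm, port in vm_locations.items() if vm.startswith("sql-")}


def resync(collecting, vm_locations):
    """Recompute the selection and return (ports_to_start, ports_to_stop)."""
    desired = sql_ports(vm_locations)
    return desired - collecting, collecting - desired


inventory = {"sql-01": ("leaf1", "swp1"), "web-01": ("leaf1", "swp2")}
collecting = sql_ports(inventory)  # {('leaf1', 'swp1')}

inventory["sql-02"] = ("leaf3", "swp4")  # a new SQL VM is created
del inventory["sql-01"]                  # an existing SQL VM is removed
start, stop = resync(collecting, inventory)
print(start, stop)  # {('leaf3', 'swp4')} {('leaf1', 'swp1')}
collecting = (collecting | start) - stop
```

Because the selection is a set computed over all VMs, a port stays selected as long as at least one SQL VM remains behind it.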
You can choose to export the output of any Intent-Based Analytics stage to an external endpoint using Google Protocol Buffers; an example is available in our GitHub project aosom-streaming.
There are many other complex but insightful Intent-Based Analytics probes that help detect grey failures, prevent service outages, and significantly reduce mean time to resolution (MTTR). Here are a few examples:
- Raise anomalies on high traffic imbalance between the member links of every MLAG (multi-chassis link aggregation group) interface in the fabric
- Detect ECMP imbalances within the fabric or between the fabric and external routers
- Measure the ratio of north-south to east-west traffic and ensure it meets your usage intent
- Monitor packet loss and measure latency with active ping tests to verify and demonstrate Service Level Agreements (SLAs)
- Many more examples are available in our public catalog of IBA probes
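As an illustration of the imbalance probes, one simple measure is the spread between the busiest and the idlest member link relative to the mean. This is a sketch under that assumption only; the metric, the `max_imbalance` threshold, and the group and link names are hypothetical, and in practice you would likely feed it rates averaged over a time window rather than single readings.

```python
def imbalance(member_bps: dict[str, float]) -> float:
    """Return (max - min) / mean across member links; 0.0 means perfectly even."""
    values = list(member_bps.values())
    if len(values) < 2:
        return 0.0  # a single link cannot be imbalanced
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0  # idle group: nothing to compare
    return (max(values) - min(values)) / mean


def imbalanced_groups(groups: dict[str, dict[str, float]], max_imbalance: float) -> list[str]:
    """Names of MLAG/ECMP groups whose imbalance exceeds max_imbalance."""
    return sorted(name for name, members in groups.items()
                  if imbalance(members) > max_imbalance)


groups = {
    "mlag-sql": {"leaf1:swp10": 4.0e9, "leaf2:swp10": 1.0e9},
    "ecmp-spine": {"spine1": 2.1e9, "spine2": 1.9e9, "spine3": 2.0e9},
}
print(imbalanced_groups(groups, max_imbalance=0.5))  # ['mlag-sql']
```

Normalizing by the mean keeps one threshold usable for groups of different speeds and sizes.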
You can also automate remediation of anomalies identified by Intent-Based Analytics probes. We demonstrated a sample remediation workflow using ServiceNow as part of our recent NFD19 presentation.
To summarize, Intent-Based Analytics is an intent-driven, dynamic, real-time knowledge-extraction engine that provides the foundation for more insightful network operations. It is an essential part of an overall Intent-Based Data Center Automation approach that significantly reduces the cost of network operations.