AOS FAQs

What is AOS?

AOS is an intent-based distributed operating system that helps network engineers rapidly and reliably design, build, and operate data center networks, independent of the choice of underlying network infrastructure (physical and virtual).

What can AOS do for CIOs?

  • Decouples data center network design and operations from hardware.
  • Provides order-of-magnitude improvement in networking agility, reliability and economics.
  • Removes the need for risky and costly DIY programming.

How do you use AOS?

Design

AOS gives you an interface (Web GUI or RESTful API) to declare your network service intent, that is, the desired outcome (service) at the network level, in plain language rather than vendor-specific, device-level jargon. A vendor-agnostic intent example: build an L3 scale-out fabric with virtualization (i.e., an overlay) providing inter-rack L2 segments for 100 servers with 1:1 oversubscription.
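
To make the idea of declaring intent concrete, here is a minimal Python sketch that expresses the example intent as data and derives a toy sizing plan from it. The class `FabricIntent`, the function `size_fabric`, and the default values (20 servers per rack, 10 Gbps server links, 40 Gbps uplinks) are assumptions made for illustration; they are not the AOS data model or API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FabricIntent:
    """Hypothetical, vendor-agnostic statement of the desired outcome."""
    servers: int
    oversubscription: float  # downlink:uplink ratio; 1.0 means 1:1
    overlay: bool            # True if inter-rack L2 segments are required


def size_fabric(intent, servers_per_rack=20, server_gbps=10, uplink_gbps=40):
    """Derive rack and uplink counts from the intent (illustrative defaults)."""
    racks = math.ceil(intent.servers / servers_per_rack)
    downlink_gbps = servers_per_rack * server_gbps
    # Uplink capacity must be at least downlink capacity / oversubscription.
    uplinks = math.ceil(downlink_gbps / (intent.oversubscription * uplink_gbps))
    return {"racks": racks, "uplinks_per_leaf": uplinks}


intent = FabricIntent(servers=100, oversubscription=1.0, overlay=True)
plan = size_fabric(intent)
print(plan)  # {'racks': 5, 'uplinks_per_leaf': 5}
```

Note that the intent itself names no vendor, device model, or CLI command; those details only appear later, when a design is chosen and rendered.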

Operate

Build: AOS presents you with a choice of reference design templates that can meet your intent. AOS then allocates resources to your chosen template, resulting in a blueprint. Finally, AOS uses the artifacts in the blueprint to generate the network service configurations and telemetry expectations.
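
The build step can be pictured roughly as follows. This is only a sketch under stated assumptions: the template format, the function names `build_blueprint` and `render_config`, the ASN range, the loopback pool, and the config syntax are all hypothetical and do not reflect how AOS stores or renders blueprints.

```python
import ipaddress
from itertools import count


def build_blueprint(template, asn_start, loopback_pool):
    """Allocate one ASN and one loopback address per device in the template."""
    asns = count(asn_start)
    loopbacks = iter(ipaddress.ip_network(loopback_pool).hosts())
    blueprint = {}
    for role, quantity in template.items():
        for index in range(1, quantity + 1):
            blueprint[f"{role}{index}"] = {
                "role": role,
                "asn": next(asns),
                "loopback": str(next(loopbacks)),
            }
    return blueprint


def render_config(name, device):
    """Render a made-up, vendor-neutral config stanza for one device."""
    return (
        f"hostname {name}\n"
        f"loopback {device['loopback']}/32\n"
        f"bgp-asn {device['asn']}"
    )


template = {"spine": 2, "leaf": 3}  # hypothetical reference design
blueprint = build_blueprint(template, asn_start=65000, loopback_pool="10.0.0.0/28")
print(render_config("leaf1", blueprint["leaf1"]))
```

In this sketch, `next(loopbacks)` raises `StopIteration` if the pool is too small for the template, which is the kind of resource problem a real system would need to report before anything is deployed.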

Deploy: With the push of a button, AOS deploys desired configurations (configuring resources according to the reference design).

Validate: AOS auto-validates your service expectations, executes test cases, and generates alerts and telemetry.

When you make changes dynamically, either to the physical infrastructure (add a rack, replace a switch) or to the virtual network (add a VXLAN or VLAN, delete a virtual network, add an endpoint to a virtual network), AOS implements them in an intent-driven, closed-loop manner.

When the user changes the intent, AOS implements the change and validates that the change was indeed implemented as intended.

How does AOS work?

AOS organizes your top-level intent, rendered device configurations, and real-time telemetry in a graph database. The database continuously and automatically updates all the interrelationships and dependencies among data center servers and network devices (physical and virtual), based on any change (intent, configuration, or live status) happening in the network.
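
As a rough mental model of why a graph is useful here, the sketch below links intent, configuration, and telemetry nodes and then asks which nodes are affected when one of them changes. The class `IntentGraph` and the node naming scheme are assumptions for illustration; the actual AOS graph schema is not described in this FAQ.

```python
from collections import deque


class IntentGraph:
    """Tiny in-memory dependency graph (illustrative only)."""

    def __init__(self):
        self._dependents = {}

    def relate(self, source, dependent):
        """Record that `dependent` is derived from, or depends on, `source`."""
        self._dependents.setdefault(source, set()).add(dependent)

    def impacted_by(self, node):
        """Return every node reachable from `node` via dependency edges."""
        seen = set()
        queue = deque([node])
        while queue:
            current = queue.popleft()
            for dependent in self._dependents.get(current, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        return seen


graph = IntentGraph()
graph.relate("intent:vn-blue", "config:leaf1")
graph.relate("intent:vn-blue", "config:leaf2")
graph.relate("config:leaf1", "telemetry:leaf1-vlan-state")
print(sorted(graph.impacted_by("intent:vn-blue")))
```

A change to the virtual network intent touches both leaf configurations and, through them, the telemetry that is expected from those devices.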

AOS provides turnkey automation of the management plane where engineers spend most of their time performing manual, error-prone tasks, such as configuring BGP, creating / deleting virtual networks (VXLAN), allocating IP addresses, configuring redundant links (MLAG or vPC), upgrading devices and OSes, and monitoring events and statistics. Turnkey means AOS eliminates the need for do-it-yourself programming.

What is Intent-Based Analytics (IBA)?

Intent-Based Analytics (IBA) is automated big data analytics designed to cut data center network outages and gray failures by at least 50%. It is embedded in AOS — turnkey software to help enterprises and service providers rapidly and reliably design, build and operate data center networks.

AOS IBA brings the industry closer to the vision of a Self-Operating Network™.

What are the core components of AOS?

There are two main components to AOS:
  1. The AOS server is the application with a Web GUI and API that engineers use to design, build, validate, and operate multiple spine-leaf networks.
  2. The AOS agent is lightweight software that pushes configs out to devices and collects closed-loop telemetry from them according to the original intent, without the admin having to configure monitoring tools. Customers can choose to install on-box agents directly on network devices, or off-box agents on the AOS server.

How is AOS different from other “intent-based” products?

Single-source-of-truth across lifecycle: AOS works for the entire networking lifecycle: design, build or change configuration, deploy configuration, and operate (monitor, troubleshoot, analyze).

Closed-loop: Network engineers’ intent, network configurations, and actual state (telemetry) are continuously validated by AOS in a closed-loop.

Vendor-agnostic: AOS allows you to completely decouple your services and operational model from vendor specificity. You can express your intent once, and then render and rerender detailed configurations for any vendor of your choice — without having to modify your intent.

Scalability: You might easily create an Ansible playbook for a small, discrete task such as creating a VLAN trunk map across a few devices. But AOS works for all day-to-day management tasks, whether they are for 10, 100, or 1,000 racks of compute, HPC, IP storage, or full spine / leaf implementations. Your operational procedure should stay the same. Intent-based networking (IBN) should take care of all the plumbing needed to shuttle data in and out of all devices, store and correlate all data, and version all racks and devices.

Is vendor-agnostic the same as multi-vendor?

No, they are not the same. Many networking tools support multiple hardware vendors, but typically require network engineers to create, maintain and debug a set of commands, scripts, playbooks, or programs that work only for a specific vendor. These playbooks or programs do not work for other vendor hardware platforms. You will have to write an equivalent set of programs for other vendors. Further, you can’t mix-and-match vendor A leaf switch with vendor B spine switch. In fact, you can’t even mix-and-match the same vendor switches if they have different OS versions or hardware models.

Vendor-agnostic allows you to completely decouple your services and operational model from vendor specificity. You can express your intent once, and then render and rerender detailed configurations for any vendor of your choice — without modifying your intent. You can render configurations for vendor A leaf switch interoperating with vendor B spine switch. You can swap out vendor A switch with an equivalent vendor B switch — all without changing your intent.

You can certainly change your intent — which can be done with a few mouse clicks — and have AOS auto-render new configurations for any vendor of your choice.
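
The difference can be sketched in a few lines of Python: the intent is written once, and a per-vendor renderer turns it into device configuration. Everything here is hypothetical, including the names `RENDERERS` and `render` and the two output syntaxes, which are placeholders and should not be read as any real vendor's CLI.

```python
def render_vendor_a(vn):
    """Placeholder syntax for a hypothetical vendor A."""
    return f"vlan add {vn['vlan_id']} name {vn['name']}"


def render_vendor_b(vn):
    """Placeholder syntax for a hypothetical vendor B."""
    return f"create-vlan id={vn['vlan_id']} name={vn['name']}"


RENDERERS = {"vendor_a": render_vendor_a, "vendor_b": render_vendor_b}


def render(intent, inventory):
    """Render the same intent for each device according to its vendor."""
    return {
        device: [RENDERERS[vendor](vn) for vn in intent["virtual_networks"]]
        for device, vendor in inventory.items()
    }


intent = {"virtual_networks": [{"name": "blue", "vlan_id": 10}]}

# Mixed-vendor fabric: vendor A leaf, vendor B spine.
before = render(intent, {"leaf1": "vendor_a", "spine1": "vendor_b"})

# Swap the leaf for a vendor B device; the intent is untouched.
after = render(intent, {"leaf1": "vendor_b", "spine1": "vendor_b"})

print(before["leaf1"], after["leaf1"])
```

In the multi-vendor approach described above, the engineer maintains the equivalent of each renderer by hand; in the vendor-agnostic approach, only the intent is the engineer's concern.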

What is closed-loop telemetry?

Closed-loop telemetry is a feedback mechanism to validate and ensure that network engineering intent is met. It is a new concept in networking but widely available in other control systems. For example, a thermostat lets you set a target temperature, continuously measures the actual temperature, and feeds that measurement back to the system to make sure the target is met.

In AOS, telemetry is tightly coupled with the target intent. A change of intent changes the telemetry that is required, which in turn changes the feedback. This is a critical step toward greatly reducing both the volume of raw, undirected telemetry data and the mean time to insight.
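
A minimal sketch of this coupling, assuming a toy intent that only lists leaf-to-spine links: the expected telemetry is derived from the intent, and only deviations from it are reported. The function names `expectations` and `anomalies` and the key format are hypothetical and are not AOS interfaces.

```python
def expectations(intent):
    """Derive the telemetry that must hold for the intent to be met."""
    expected = {}
    for leaf, spine in intent["links"]:
        expected[f"link:{leaf}-{spine}"] = "up"
        expected[f"bgp:{leaf}-{spine}"] = "established"
    return expected


def anomalies(expected, observed):
    """Return the expectations that the observed state does not satisfy."""
    return sorted(key for key, want in expected.items() if observed.get(key) != want)


intent = {"links": [("leaf1", "spine1"), ("leaf1", "spine2")]}
observed = {
    "link:leaf1-spine1": "up",
    "bgp:leaf1-spine1": "established",
    "link:leaf1-spine2": "up",
    "bgp:leaf1-spine2": "idle",
}
print(anomalies(expectations(intent), observed))  # ['bgp:leaf1-spine2']

# Changing the intent changes what telemetry is required.
intent["links"].append(("leaf2", "spine1"))
print(anomalies(expectations(intent), observed))
```

Because the expectations come from the intent, anything the intent does not mention is never collected or compared, which is where the reduction in raw telemetry comes from.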

What vendor devices does AOS intend to support?

We currently support Cisco, Arista and Cumulus with many more under development or by request. Please refer to this data sheet for more detail.

What are the benefits of using AOS?

  • Improved service agility: Enables the networking team to rapidly design, build, deploy and validate network services.
  • Reduced risks: Greatly reduces human error, loss of visibility, configuration drifts, and big data telemetry dumps, which are fundamental sources of outages and application performance issues.
  • Reduced costs: Reduces CapEx due to vendor hardware lock-in, and OpEx spent on complex, manual operations.

What can AOS do for network engineers?

  • Decouples data center network design and operations from hardware.
  • Provides order-of-magnitude improvement in networking agility, reliability and economics.
  • Removes the need for risky and costly DIY programming.
  • Allows network engineers and operators to express their high-level intent (the what).
  • Renders the intent into low-level device configuration, then validates and deploys it without human errors (the how).
  • Extracts actionable insights from raw telemetry based on the intent and network’s run-time state, to continuously ensure it does not deviate from the desired state (rendered config).
  • Allows network engineers and operators to declaratively apply Service Level Objectives (SLOs) as constraints to raw telemetry in minutes, and automatically receive actionable alerts in real-time when the SLOs are not being met.
  • Allows network engineers to safely make changes by starting with an intent, such as keeping the oversubscription unchanged while replacing an older vendor A device with a newer, better vendor B device.
  • Removes the need for massive programming and integration:
    • Learn the know-how (Ansible, Python, etc.)
    • Build programming environment (e.g. backend database, front-end UI, message bus)
    • Translate design and operational procedures into vendor-specific syntax and semantics
    • Identify use case (e.g. build me a VLAN trunk map across devices)
    • Scale to more use cases (e.g. get this piece of telemetry from Finisar transceivers on these Cisco switches)
    • Version control your code
    • Debug your code
    • Document your code
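
To illustrate the point above about applying Service Level Objectives as constraints on raw telemetry, here is a small Python sketch. The `SLO` class, its fields, and the "sustained for N consecutive samples" rule are assumptions chosen for the example; they are not the AOS SLO syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical declarative constraint on one telemetry metric."""
    metric: str
    max_value: float
    sustained_samples: int  # consecutive violations required before alerting


def violated(slo, samples):
    """Return True if samples exceed max_value for sustained_samples in a row."""
    streak = 0
    for value in samples:
        streak = streak + 1 if value > slo.max_value else 0
        if streak >= slo.sustained_samples:
            return True
    return False


slo = SLO(metric="uplink_utilization_pct", max_value=80.0, sustained_samples=3)
print(violated(slo, [70, 85, 90, 60, 95]))  # False: never three in a row
print(violated(slo, [70, 85, 90, 95, 60]))  # True: 85, 90, 95 are consecutive
```

The operator states only the objective; the loop that walks the samples is the part they would otherwise have to write and maintain themselves.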

What does 'intent' mean?

Intent is a network engineer’s declarative specification of the desired outcome (service). It describes the cooperative behavior needed from the network infrastructure without prescribing the imperative commands to achieve that outcome. It is the single source of truth.

An intent example: “Provide connectivity to 1,000 servers, using L2 and/or L3 access at the edge, with oversubscription in the core of 1:1 (no oversubscription), with endpoints such as hosts, VMs, or containers grouped into isolation domains (including both traffic and address space isolation). Make some endpoints reachable from the rest of the world and some not, with policies associated with isolation domains governing both security and load balancing, and with connectivity to the rest of the world via at least n links to support the external traffic and protect against possible failures.”

How does IBA work?

Apstra IBA allows network operators to specify exactly how they expect their network to operate and continuously validates their intent, generating anomalies when it detects a deviation.

IBA works across vendors and reduces the time, cost and risk of network operations by alerting the network operator of specific insights required to validate that the network is operating as intended.

With AOS IBA, network operators can quickly detect and prevent a wide range of service level violations, including security breaches, performance degradations, and traffic imbalances.

Operators use AOS IBA without having to worry about what hardware vendors they use — both established vendors and open alternatives.

How do operators use IBA?

Network operators specify, using a simple, dynamic, declarative interface, exactly how they expect their network to operate, going beyond mere connectivity to include traffic patterns, performance, and tolerance for gray failures. AOS then continuously validates the network operators’ intent, generating anomalies when it detects a deviation.

Unlike the traditional big-data analytics status quo, this approach relieves you of having to write complex, low-level imperative programs that need to be integrated and constantly kept in sync.

In AOS 2.1, IBA includes predefined, turnkey probes that network operators can use out-of-the-box to:

  • detect link traffic imbalance between leaves and spines
  • detect when links are reaching saturation
  • compare East-West and North-South traffic distributions
  • detect MLAG pair traffic imbalance
  • detect interface Error/Discard Counters
  • detect interface flapping
  • compute available bandwidth between servers or switches.

In addition, IBA provides an open source catalog of IBA probe configurations to enable an ecosystem with customers, partners, and other third parties. View the catalog here.

Finally, advanced services offerings are available to help customers tailor IBA probes to their exact network service goals.
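
As an illustration of what a probe such as the leaf-to-spine traffic imbalance check might compute, the sketch below flags a leaf whose uplink utilization varies too much across spines. The function name `imbalanced`, the use of standard deviation, and the 5-point threshold are assumptions for the example; the FAQ does not specify how the predefined probes are implemented.

```python
from statistics import pstdev


def imbalanced(utilization_pct, max_stdev=5.0):
    """Return True if uplink utilization is spread unevenly across spines.

    utilization_pct maps a spine name to the utilization (in percent) of the
    link from one leaf to that spine.
    """
    return pstdev(list(utilization_pct.values())) > max_stdev


balanced = {"spine1": 40, "spine2": 42, "spine3": 41, "spine4": 39}
skewed = {"spine1": 10, "spine2": 80, "spine3": 12, "spine4": 11}

print(imbalanced(balanced))  # False
print(imbalanced(skewed))    # True
```

A real probe would also need to decide how long an imbalance must persist before it counts as an anomaly, which this sketch leaves out.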

What alternatives exist in the market? How does AOS fit into my automation strategy?

Many people compare AOS with do-it-yourself automation tools and SDN. AOS is not just about pushing golden configs like scripting tools do. It does push golden configs too. But it does a lot more before and after config-push.

Before config-push, IBN allows you to express high-level design intent, then renders your intent into low-level device configs. It also validates config semantics so that the config can be pushed safely and will work. After config-push, IBN monitors devices’ operational state to ensure it does not deviate from the desired state (rendered config).

Do-it-yourself tools and scripting don’t understand the network “semantics”, don’t know how to translate your high-level intent into vendor-specific device details, and can’t validate that your intent is achieved and maintained. A script or config tool will happily take a phone number instead of an IP address from you as an input parameter and shove it down to devices.
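
The phone-number example shows the kind of semantic check a plain templating script typically skips. A minimal sketch using Python's standard `ipaddress` module follows; the function name `validate_management_ip` and the `10.0.0.0/24` pool are hypothetical, and this is not a description of how AOS performs its validation.

```python
import ipaddress


def validate_management_ip(value, allowed_pool="10.0.0.0/24"):
    """Reject input that is not an IP address or lies outside the pool."""
    address = ipaddress.ip_address(value)  # raises ValueError if malformed
    if address not in ipaddress.ip_network(allowed_pool):
        raise ValueError(f"{address} is outside the pool {allowed_pool}")
    return address


for candidate in ("10.0.0.7", "555-867-5309", "192.168.1.1"):
    try:
        print("accepted:", validate_management_ip(candidate))
    except ValueError as error:
        print("rejected:", error)
```

Only the first candidate is accepted. The second is not an address at all, and the third is a well-formed address that does not belong where it is being used, which a syntax-only check would miss.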

Besides, because network config is device-specific and distributed across many devices, network problems are device-specific and entail interactions across devices (gray problems). Using scripts or server config tools to treat multiple interdependent devices amplifies the risk, by making them all go wrong at the same time.

AOS focuses on the management plane of the network because, in practice, engineers typically spend much more time on the management plane than on the control and data planes. The control plane is mainly about programmatically manipulating how packets should be forwarded (e.g. SDN).

There are many different management tasks, such as configuring BGP, creating / deleting overlay networks (VXLAN), allocating IP addresses, configuring redundant links (MLAG or vPC), upgrading devices and OSes, and monitoring events and statistics.

These tasks involve many direct human interactions with devices, one by one, and are therefore highly risky and time-consuming.

AOS focuses on helping network engineers with these management tasks. Unlike compute automation tools that typically only push golden configuration out, AOS allows you to describe what service outcome you want, then renders / validates / deploys network-wide configurations safely, and notifies you when the live network state deviates from your intent.

AOS is therefore distinct from server automation and SDN, and should be funded as a critical element of your automation strategy.