Model-Driven DevOps for NetOps: Transforming DoDIN Cyber Operations with Network Infrastructure as Code (IaC)
By Andrew D. Stewart
April 21, 2022
Andrew Stewart will be speaking about Model-Driven DevOps for NetOps at HammerCon 2022 on May 19, 2022 - be sure to get your tickets here to hear him and our other exciting speakers before tickets run out!
Introduction
Model-Driven DevOps [1] represents a game-changing digital transformation approach for NetOps to deliver enhanced network infrastructure orchestration, optimization, agility, flexibility, and resiliency – the result: a DevOps-Driven Mission Intent-Based Infrastructure. Just as agile DevOps efforts transformed application development and created more responsive and timely mission outcomes, DevOps for NetOps is a critical next step in meeting today’s and, more importantly, tomorrow’s mission-driven demands. Adopting this approach in both our culture and our engineering of NetOps will enable military cyber professionals to finally begin operating the network like a mission platform.
Why now? With the capabilities and power enabled by modern software-defined networking (SDN), it is possible to leverage greater abstraction of the network infrastructure as a collection of Application Programming Interfaces (APIs). Applications can now render the network into infrastructure as code (IaC). Simultaneously, industry implementation of “digital twin” technology, the real-time digital model counterpart of a complex physical object, has dramatically transformed and accelerated manufacturing design-to-production capabilities and improved simulation of and insight into complex systems. Applying this same approach to a network infrastructure enables the establishment of a “network infrastructure digital twin” that enhances the ability to develop, test, and dynamically deploy critical infrastructure updates, changes, and optimizations at scale. DevOps for NetOps is not the goal; enabling DevOps for the network to enable Mission Transformation is The Goal.
Adopting this approach is likely as much a cultural challenge as a technical one, if not a greater one. Military cyber professionals must understand the technology and break down the cultural and technical impediments that prevent realizing the full potential of IaC. In fact, continued technology advances will accelerate both the ability and the need to deliver dynamic “mission-intent-based infrastructure” that supports all-domain command and control mission outcomes while, simultaneously, the Department of Defense (DoD) seeks more transformation from DevOps-driven application development – NetOps must respond!
The Future is Now
The network is fundamental to connecting users, devices, applications, data, and services no matter where they reside—from edge to cloud; however, much of network administration has not changed meaningfully in 30 years. As digital services are delivered more frequently through the adoption of DevOps for software development focused on services and applications, gaps and weaknesses are quickly exposed in the supporting hybrid cloud network infrastructure. Network operators face increasing pressure to move faster – often at the sacrifice of fundamental, scalable network architecture and security best practices – while at the same time being held responsible for helping mitigate risks and respond to threats. This challenge demands a cultural shift: a DevOps mindset that includes network infrastructure.
The demand for new features and faster delivery of services has driven the need to develop software and applications faster—thus driving the rapid virtualization and “cloudification” of IT infrastructure. Failure to transform to a DevOps approach for network infrastructure aligned with the Continuous Integration/Continuous Deployment (CI/CD) process is not an option. A model-driven DevOps approach enables network operators to maneuver the network at machine speed through a deliberate process that: 1) encapsulates the network as a data model; 2) renders that data model into a “digital twin;” 3) enables repeatable synthetic testing; and 4) provides the means to automatically deploy network changes (employ network maneuver) at machine speed in response to increasing application-driven data demands, evolving mission needs, and the Cyber Commander’s Intent — fighting in and winning the day in Cyber. This vision must be understood so that the impediments to change can be addressed.
What is it?
DevOps is often used as a term to describe a specific outcome. However, it is really an evolving organizational strategy used to deliver better value and mission outcomes. In the context of this paper, DevOps is described as a combination of culture, tools, and processes aimed at accelerating the delivery of new services, improving the scale and quality of services, and, when done deliberately, lowering risk. A model-driven DevOps approach is a structured way to enable network automation at scale built on data models, and it has been proven to address challenges and perceptions around complexity, standardization, and manual operations. Most APIs are driven by data models, and the most common model-driven APIs for network devices use the Network Configuration (NETCONF) protocol with Yet Another Next Generation (YANG) data models. NETCONF pushes the data models, encoded in Extensible Markup Language (XML), over a secure transport layer and provides several operational advantages over the command line interface (CLI), including:
• Install, manipulate, and delete methods for configuration data
• Multiple configuration data stores (e.g., candidate, running, startup)
• Configuration validation and testing
• Differentiation between configuration and state data
• Configuration rollback
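As a concrete illustration of these advantages, the following is a minimal sketch (not a production workflow) using the open-source ncclient Python library to push a YANG-modeled, XML-encoded change to a device’s candidate datastore, validate it, and commit it. The host, credentials, and interface payload are placeholder values, and the ietf-interfaces model is assumed to be supported by the device.

```python
# Minimal sketch: push an XML-encoded, YANG-modeled change over NETCONF,
# exercise the candidate datastore, validate, then commit.
# Host, credentials, and the ietf-interfaces payload are illustrative placeholders.
from ncclient import manager

EDIT = """
<config xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
    <interface>
      <name>GigabitEthernet0/0/1</name>
      <description>Uplink to core - managed as code</description>
    </interface>
  </interfaces>
</config>
"""

with manager.connect(
    host="192.0.2.10",        # placeholder management address
    port=830,                 # standard NETCONF-over-SSH port
    username="netops",
    password="changeme",
    hostkey_verify=False,     # lab-only convenience; verify host keys in production
) as m:
    m.edit_config(target="candidate", config=EDIT)  # stage the change in the candidate datastore
    m.validate(source="candidate")                  # device-side configuration validation
    m.commit()                                      # promote the candidate to the running config
```

Note how the candidate datastore, validation, and commit steps map directly to the advantages listed above; a failed validation leaves the running configuration untouched.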
To undertake a DevOps approach to NetOps by leveraging these capabilities and the power of abstraction, there are five key fundamentals that must be explored and understood in greater detail: Automation, Source of Truth (SoT), APIs, Infrastructure as Code (IaC), and Continuous Integration/Continuous Deployment (CI/CD).
Automation
Accelerating speed of execution and network maneuver becomes extremely challenging without automating processes that traditionally involve human interaction. When applied to network infrastructure, it is common to start with basic change requests such as adding an access control list (ACL) or modifying a domain name system (DNS) server. These tasks typically require operators to log in to individual devices and interact with the CLI, implementing the change on the device itself. When these changes are made directly on the device, the individual device becomes the SoT containing a unique configuration. When every device in the production environment contains data unique to that device, it is extremely challenging for the NetOps team to manage. To automate network infrastructure, it is necessary to centralize the SoT and remove errors inadvertently induced by multiple human-to-machine communications.
An initial step towards automation is centralizing configurations using a Source Code Manager (SCM). This is a very low impact task that alleviates the risk of losing a device and its configuration. Snapshots of all devices can be automatically taken at a chosen time interval and placed in the repository. If there are any issues due to failure or inaccurate configuration change, services can be quickly restored based on a previous working state. However, it is important to note that backups are a snapshot in time, and this does not prevent administrators from making changes directly on devices. If work is done between scheduled backups, there will be a period where the environment is out of sync and the SoT will be lost.
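As a sketch of that initial step, the snippet below pulls running configurations and commits them to a central repository as point-in-time snapshots. It assumes the open-source netmiko and GitPython libraries; the device inventory, credentials, and repository path are placeholders.

```python
# Sketch: scheduled configuration snapshots committed to a central repository.
# Device inventory, credentials, and the repository path are illustrative placeholders.
from datetime import datetime, timezone
from pathlib import Path

from git import Repo                 # GitPython
from netmiko import ConnectHandler   # multi-vendor SSH/CLI client

DEVICES = [
    {"device_type": "cisco_ios", "host": "192.0.2.11", "username": "netops", "password": "changeme"},
    {"device_type": "cisco_ios", "host": "192.0.2.12", "username": "netops", "password": "changeme"},
]
REPO_PATH = Path("/var/backups/network-configs")   # existing Git working tree

repo = Repo(REPO_PATH)
for device in DEVICES:
    with ConnectHandler(**device) as conn:
        running = conn.send_command("show running-config")
    cfg_file = REPO_PATH / f"{device['host']}.cfg"
    cfg_file.write_text(running)                   # one snapshot file per device
    repo.index.add([str(cfg_file)])

stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
repo.index.commit(f"Scheduled configuration snapshot {stamp}")
```

A scheduler such as cron would typically drive this at the chosen interval; the point is simply that the repository, not the individual device, holds the historical record.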
Source of Truth (SoT)
To centralize the SoT and maintain consistency with software developers, one can take advantage of an existing SCM such as Git for storing configurations. Using an SCM allows network operators to track changes, keep backups, and collaborate with others in a manageable way. Not only must code be stored and managed centrally, but operators must also transition away from modifying configurations directly on devices and make changes through the SCM. Once changes are reviewed, they can be deployed, with the SCM keeping an audit log and allowing rollback if any issues occur. Regardless of how far along a NetOps team is in its automation journey, using an existing SCM for basic code or configuration management has a very low impact on operations but provides tremendous benefit. There are three (3) fundamental disciplines of an SCM process that must be implemented to enable safe management of the configuration lifecycle in a collaborative environment: Version Control, Change Logging, and Branches. Implementing these processes requires organizational discipline that must be embedded in the culture. A well-disciplined SCM process enables multiple administrators to track and manage all changes centrally.
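The brief sketch below illustrates those three disciplines in practice, again using GitPython against a placeholder repository; the file name, change, and change-request identifier are all hypothetical.

```python
# Sketch of the three SCM disciplines applied to a stored configuration:
# Version Control (commits), Change Logging (messages/audit trail), and Branches (review before merge).
# The repository path, file name, and change-request identifier are illustrative placeholders.
from git import Repo

repo = Repo("/var/backups/network-configs")

# Branches: isolate the proposed change for peer review instead of editing production state.
change = repo.create_head("change/add-dns-server")
change.checkout()

cfg = repo.working_tree_dir + "/192.0.2.11.cfg"
with open(cfg, "a") as f:
    f.write("\nip name-server 198.51.100.53\n")

# Version Control + Change Logging: every change is attributable and reversible.
repo.index.add([cfg])
repo.index.commit("Add secondary DNS server per change request CR-0042")

# Audit trail available to every administrator.
for commit in repo.iter_commits(max_count=3):
    print(commit.hexsha[:8], commit.summary)
```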
APIs
To further enable automation, operators must begin moving away from the CLI and begin interacting with devices through their APIs. An API is a way for two applications to interact with each other; the CLI, in contrast, is the standard way for humans to interact with an application to retrieve data or make configuration changes. Continuing to manage network configurations through manual methods (CLI) directly conflicts with DevOps principles: by managing devices through the CLI, operators create deviations from the desired network state by default. Moving forward, committing changes through an SCM and having them deployed programmatically through APIs enables scale and consistency. As introduced, APIs provide operational advantages over CLI and, through the use of data models, are a critical component of model-driven DevOps. APIs are how the key/value pair data described by models is passed to a device. The API software on a device takes the data and, using the data model to decipher it, configures the various components of the device in the way the manufacturer intended. When the network infrastructure is treated as a set of APIs, configuration consists of moving data, generally in the form of JSON or XML, between those APIs. This brings network operations more in line with the rest of cloud and application development and away from the traditional human-optimized CLI interaction.
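To illustrate configuration as data movement, the sketch below pushes a small piece of JSON-encoded, model-described data to a device over RESTCONF (RFC 8040) using the standard ietf-interfaces model. It assumes the device exposes RESTCONF and that the Python requests library is available; the host, credentials, and interface values are placeholders, and a NETCONF/XML exchange would be equally valid.

```python
# Sketch: configuration as data movement - push JSON described by a YANG model
# (ietf-interfaces) to a device's RESTCONF API. Host, credentials, and values are placeholders.
import requests

HOST = "192.0.2.10"
URL = f"https://{HOST}/restconf/data/ietf-interfaces:interfaces/interface=GigabitEthernet0%2F0%2F1"
HEADERS = {
    "Content-Type": "application/yang-data+json",
    "Accept": "application/yang-data+json",
}
PAYLOAD = {
    "ietf-interfaces:interface": {
        "name": "GigabitEthernet0/0/1",
        "description": "Uplink to core - managed as code",
        "type": "iana-if-type:ethernetCsmacd",
        "enabled": True,
    }
}

resp = requests.put(
    URL, json=PAYLOAD, headers=HEADERS,
    auth=("netops", "changeme"),
    verify=False,           # lab-only; use proper certificates in production
    timeout=30,
)
resp.raise_for_status()     # a non-2xx response means the device rejected the change
```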
Infrastructure as Code (IaC)
IaC is the process of rendering the provisioning and configuration of infrastructure as code. The goal of automation is to remove the administrator’s need to interact manually with anything, allowing machine-to-machine communication. It is therefore necessary to present and deploy network device configurations in a format that machines understand natively. Due to the large amount of data necessary for a network device to function, using a data model is the most effective way to represent infrastructure as machine-readable code. A data model is simply a way to organize and structure data. In this light, network automation is more a problem of data management and movement than it is of “pure coding.” Data models can represent all possible configuration in a network; therefore, all configuration changes can be accommodated with a similar flow. Instead of having a playbook or script for every possible type of change (e.g., DNS, NTP, interface, BGP), one set will suffice. In the scope of model-driven DevOps, the code comprises the automation tooling (e.g., Ansible playbooks, Jinja templates, Python code) and the textual “code” (e.g., YAML files, JSON files) that contains the data describing the network. As described above, that data is referred to as the SoT.
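As a minimal sketch of that separation between data and tooling, the snippet below loads a YAML-style SoT describing a device and renders it through a Jinja2 template (using the PyYAML and Jinja2 libraries). The data values and template are illustrative, not a complete device model.

```python
# Sketch: the SoT is data (YAML/JSON); the tooling renders that data into device configuration.
# The data values and template below are illustrative, not a complete device model.
import yaml
from jinja2 import Template

SOT_YAML = """
hostname: bldg1-access-sw01
dns_servers: [198.51.100.53, 198.51.100.54]
interfaces:
  - name: GigabitEthernet1/0/1
    description: User access port
    vlan: 110
"""

TEMPLATE = Template(
    "hostname {{ hostname }}\n"
    "{% for dns in dns_servers %}ip name-server {{ dns }}\n{% endfor %}"
    "{% for intf in interfaces %}"
    "interface {{ intf.name }}\n"
    " description {{ intf.description }}\n"
    " switchport access vlan {{ intf.vlan }}\n"
    "{% endfor %}"
)

sot = yaml.safe_load(SOT_YAML)
print(TEMPLATE.render(**sot))   # rendered configuration ready to push through an API
```

A DNS change, an interface change, or a BGP change then all follow the same flow: edit the data in the SoT, re-render, and push.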
When the infrastructure is rendered as code, much like software, several key advantages can be realized:
• Representing infrastructure as code enables the ability to leverage common version control tools, such as Git, to track changes, keep backups, and collaborate with others in a manageable way.
• The formats used to render infrastructure as code are designed to be human readable, allowing operators to quickly make changes by editing a file stored centrally rather than interacting directly with devices.
• Describing infrastructure as code makes it repeatable and predictable. Operators can provision the infrastructure many times—knowing it will be deployed the same way, every time. This is a critical advantage as changes are tested and reviewed before being deployed into production.
Continuous Integration/Continuous Deployment (CI/CD)
While DevOps is a philosophy for removing friction between Development and Operations, Continuous Integration/Continuous Deployment (CI/CD) is a specific application of DevOps principles focused on improving the reliability and the speed of change in the environment. This has been challenging when applied to network infrastructure due to its dependency on physical devices and the potential for business disruption.
Continuous Integration (CI) is the process of continually integrating changes made to applications, services, or infrastructure into a “main” or up-to-date branch. Building on the notion of infrastructure as code, automation is used to instantiate a copy of the application, service, or infrastructure in a test environment and run a series of unit or functional tests anytime a change is detected. If all tests pass, then the change is integrated into the current “main” branch, otherwise, it is rejected.
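A minimal sketch of such tests, written in the pytest style against the illustrative SoT data shown earlier, might assert basic hygiene rules before a change is allowed to merge into the “main” branch. The SoT file path and the specific rules are hypothetical placeholders for whatever standards a NetOps team defines.

```python
# Sketch: pytest-style unit tests run by CI against the SoT whenever a change is proposed.
# The SoT path and the rules below are illustrative placeholders for a team's own standards.
import ipaddress

import yaml

def load_sot(path="sot/bldg1-access-sw01.yaml"):   # hypothetical SoT file in the repository
    with open(path) as f:
        return yaml.safe_load(f)

def test_dns_servers_defined_and_valid():
    sot = load_sot()
    assert sot.get("dns_servers"), "every device must define at least one DNS server"
    for server in sot["dns_servers"]:
        ipaddress.ip_address(server)               # raises ValueError on malformed addresses

def test_every_interface_has_a_description():
    sot = load_sot()
    for intf in sot.get("interfaces", []):
        assert intf.get("description"), f"{intf['name']} is missing a description"
```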
Continuous Deployment (CD) takes the process one step further and, upon successful completion of all unit and functional tests, automatically deploys the change into production. For new applications or services, CI is usually enabled first; then, once a certain level of comfort is reached that the process is functioning correctly and that the testing is comprehensive, CD is enabled.
Challenges
If the reasons to adopt DevOps for NetOps are so strong, then why are DevOps processes not already being more widely applied to better deliver mission-driven applications, services, and infrastructure? Although much of the focus around DevOps is being applied to applications, most NetOps teams are still operating the way they have for the last 30 years. The reasons fall into several challenges that warrant exploration in detail.
As with most challenges in the cyber domain, the challenges of implementing DevOps for network infrastructure span both the cultural and the technical. There are six (6) broad challenges that act as impediments to implementing a DevOps approach to Infrastructure as Code (IaC):
• Complexity
• Keeping Things Running
• Lack of Standardization
• Understanding of the SoT
• Flexible Testing Environment
• The Human Element
Complexity
Networks are more complex and interconnected than applications. A single network element (e.g., switch, router, firewall) can support the operation of hundreds of systems while providing users with stable connections to those systems. A change to an individual network element can therefore cause a much larger disruption in services than a change to a single application, and NetOps teams are, rightly so, risk averse. Because of this large potential impact, automation, when not applied correctly, introduces risk, and operators unfamiliar with a DevOps approach, its capabilities, and the CI/CD process are more likely to reject automation in the belief that doing so reduces risk. Ironically, a DevOps approach to programmability leverages greater abstraction, which actually reduces complexity and reduces risk by providing enhanced configuration control and management.
Keeping Things Running
Few will dispute that manual provisioning of devices often leads to situations where infrastructure quickly becomes outdated and less resilient over time. This approach also impedes immediate patching and the ability to address security vulnerabilities in a timely manner. Yet it is hard to make investments for the future when there are very real upfront costs and the future return on investment is not clear. DevOps falls into this category: it is not just a change in operating model, it is a change in skillset, culture, tools, and processes. Unfortunately, it is common today to focus only on “keeping things running,” and, given how much it costs to keep things running with today’s extremely labor-intensive operating model, it becomes clear why there is very little left over to invest in a new one.
Lack of Standardization
Because most network services are based on embedded systems, the APIs needed to configure them have taken longer to develop and standardize. As discussed, data models are commonly used to programmatically interact with network devices through an API; however, to date there is no single industry-standard data model for network infrastructure. In large multi-vendor environments, such as the DoD, creating data models per vendor and per device becomes difficult to maintain and scale. This obstacle can now be overcome with greater abstraction and the use of network element drivers (NEDs). NEDs communicate over the native protocol supported by the device, such as NETCONF, Representational State Transfer (REST), XML, CLI, or Simple Network Management Protocol (SNMP). These drivers are rendered based on a YANG data model, which provides significant agility and multi-vendor flexibility.
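Conceptually, a driver-based abstraction decouples “what to configure” (model-described data) from “how to deliver it” (NETCONF, REST, CLI, SNMP). The sketch below is a simplified, hypothetical illustration of that idea in Python; it is not the implementation of any vendor’s NED framework, and the device names and transports are placeholders.

```python
# Hypothetical illustration of the driver idea: the same model-described intent is delivered
# over whichever native protocol a given device supports. This is not any vendor's NED API.
from abc import ABC, abstractmethod

class ElementDriver(ABC):
    """Translate model-described intent into a device's native protocol."""
    @abstractmethod
    def apply(self, intent: dict) -> None: ...

class NetconfDriver(ElementDriver):
    def apply(self, intent: dict) -> None:
        print(f"NETCONF/XML edit-config for {intent['device']}")   # would call ncclient here

class CliDriver(ElementDriver):
    def apply(self, intent: dict) -> None:
        print(f"Rendered CLI lines pushed to {intent['device']}")  # would call netmiko here

DRIVERS = {"netconf": NetconfDriver(), "cli": CliDriver()}

def deploy(intent: dict, transport: str) -> None:
    DRIVERS[transport].apply(intent)    # same intent, protocol chosen per device capability

deploy({"device": "core-rtr-01", "dns_servers": ["198.51.100.53"]}, transport="netconf")
deploy({"device": "legacy-sw-07", "dns_servers": ["198.51.100.53"]}, transport="cli")
```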
Understanding of the SoT
The unique set of data, or key/value pairs, needed to configure a typical network device is significantly larger than that of an application or system being managed. A large router or firewall can require thousands of individual pieces of information that make up its device-specific configuration. It is a frequent practice to distribute these key/value pairs across several systems, such as IPAM (IP Address Management), CMDB (Configuration Management Database), and text files, which makes the information difficult to access programmatically. As described above, centralizing the SoT in an existing SCM such as Git, making all changes through the SCM rather than directly on devices, and relying on its audit log and rollback capability addresses this challenge. Regardless of how far along a network operations team is in its automation journey, using an existing SCM for basic code or configuration management has a very low impact and provides tremendous benefit.
Flexible Testing Environment
Testing network infrastructure still depends on physical hardware, which prevents rapid testing of different topologies or configurations. Even as network functions become virtualized, the challenges of fully representing a network virtually still exist, and the dependence on hardware for programmatic test and validation can make full automation difficult. Application CI/CD is typically easier because the testing environment is virtual: cloud infrastructure can be created on demand, whereas network infrastructure still relies on hardware, and a valid test environment for an enterprise network usually comes at a higher cost. In addition, an issue with an application typically impacts the users of only that application, while a network outage has the potential to disrupt a large portion of the users and applications that rely on a stable network. This means the ability to test and validate network changes before deploying them into production significantly reduces risk – a key outcome of realizing full Continuous Integration/Continuous Deployment (CI/CD) for NetOps. Simulation and testing are critical for DevOps, and testing must be automated so that it is performed regularly and rigorously. The success of DevOps is highly dependent on the coverage and accuracy of testing; if errors are not found before they are released, the resultant unstable system is likely to do more damage than the bottleneck being addressed.
The Human Element
The tools, processes, and vocabulary used in DevOps can look intimidating to somebody who does not come from a software development background – which is true for many network operators. For example, version control systems, automation languages, programming languages, APIs, data formats, and build servers are all topics that a typical software developer already understands. Unfortunately, these topics are still somewhat foreign to the typical NetOps cyber professional, who generally may not have access or exposure to this type of training and may even be discouraged from undertaking studies in these areas. Perhaps even more challenging for most military cyber professionals, an understanding of these tools and capabilities is often beyond the technical grasp of the leadership who must support, nurture, and cultivate the skills and community culture their teams need to succeed. Culture change that helps the community address this skills gap is the number one challenge to solve in order to realize change.
Realizing Change and Transforming the Mission
NetOps teams must re-evaluate how they operate network infrastructure – today – by committing to a DevOps roadmap and understanding the supporting DevOps for NetOps fundamentals. The physical network cannot be the bottleneck for digital mission transformation – it must be an enabler. Applying a DevOps roadmap for network infrastructure can be undertaken in five (5) deliberate steps that are aligned with the CI/CD process:
• Architecture – Build an architecture focused on standardization
• Simulation – Simulate the architecture as a virtual digital twin
• Automation – Automate deployment in the simulated environment
• Testing – Create and validate deployment tests in the simulation
• Deployment – Use automation to deploy into production
Realizing change means changing the old mindset and creating a new culture of thinking. Previously, most have thought of the network itself as the SoT. By embracing a DevOps approach, the SoT of a network is embodied in the central repository, or digital twin, of all information needed to configure the network to a desired state. With that view in mind, all network operations become a push of SoT data into a device, in whole or in part. Although many NetOps teams fundamentally understand this, moving to a model-driven approach is a hard, but necessary, transition to make. Once accepted, viewing all automation operations as simply a push of data from the SoT into the infrastructure simplifies the IaC approach. Further, by leveraging CI/CD principles to properly test and validate changes to infrastructure before they are made in production, network operators can enjoy all the possibilities and benefits of a model-driven DevOps approach and move at scale and speed. Embracing this approach in the culture of NetOps teams is a must to move forward.
Ultimately, moving to a DevOps approach for infrastructure improves network fighting agility/maneuverability and speeds new capabilities to the edge while helping mitigate threats/risks—enabling network operations to occur at machine speed and keep pace with evolving cyberspace operations. With the understanding of these fundamentals, realizing a transformation in NetOps is attainable through the rational implementation of new policies, processes and, above all, driving culture change. NetOps cannot be the reason to slow mission transformation – it must enable it.
References
[1] Carter, Steven and King, Jason. Model Driven DevOps. 1st ed. Hoboken, NJ: Pearson Education, 2022.
About the Author
Andrew D. Stewart is a National Security and Government Senior Strategist at Cisco Systems, Inc. He has been with Cisco for the last four years after retiring from almost 30 years in the U.S. Navy, where he last served as the Chief of Operations for Fleet Cyber Command/U.S. TENTH Fleet. Andy also served as the Commanding Officer and Program Manager for the Navy Cyber Warfare Development Group (NCWDG). He is a graduate of the Sellinger School of Business, Loyola University Maryland, and the Cybersecurity and Policy Executive Program at the Harvard Kennedy School. He is also a graduate of the Naval Postgraduate School in Monterey, CA, the United States Naval Academy, the National Defense University, and the Naval War College.