Expanding your edge applications? Consider how orchestration and monitoring help resilience
July 24, 2022
This article was originally published on EdgeIR and is republished here.
Organizations are shifting their focus to edge computing after leveraging the operational benefits and flexibility of cloud computing. This shift is driven by the need to run computationally intensive tasks closer to the devices generating the data; doing so reduces the latency and network bandwidth costs associated with cloud processing.
This type of distributed edge computing requires a complex hierarchy of application stacks and infrastructure that have smaller footprints yet need more data processing power. Guaranteeing delivery of services while meeting tight latency constraints for time-sensitive use cases is challenging for IT teams and software architects, though.
Edge computing has various deployment models adapted to different uses: the architecture will differ for data center providers, mobile network operators, communication service providers, corporate IT, industrial IoT, and healthcare facilities. In each case, the number of edge nodes and their distance from the central cloud vary with how the edge architecture is implemented. But despite the differences in scale, the challenges of managing such distributed computing platforms and services remain common.
Whatever the size of the deployment, enterprise technical teams need to think strategically about their architecture by considering all aspects of the environment.
Managing Kubernetes in edge operations
We know Kubernetes as the leading container orchestration platform, widely used to handle both containers and virtual machines in modern data centers. In edge deployment scenarios, Kubernetes is a vital platform for managing and orchestrating distributed workloads.
Several factors come into play when Kubernetes is implemented in cloud-edge architectures with clusters distributed across edge sites. One is which tools, solutions, and processes to use to manage application workloads across multi-cluster deployments. Another is that managing and operating the network edge becomes more complicated and costly as Kubernetes edge deployments grow. When DevOps principles are applied to orchestrating edge workloads and platform components, these factors overlap, since development and IT teams share the goal of smooth service delivery and management: development teams focus on process pipelines, while IT teams care about cluster management and network optimization.
A common, centralized abstraction layer is therefore required to help deploy new clusters and to handle lifecycle management of the platforms and applications on top of them.
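The core idea of such an abstraction layer can be sketched in a few lines: clusters register with the central layer along with descriptive labels, and workloads are dispatched to every cluster matching a placement selector (the pattern used by multi-cluster tools such as Karmada or Rancher Fleet). The cluster names, labels, and workload name below are illustrative, not drawn from any specific product.

```python
# Minimal sketch of multi-cluster placement through a central abstraction
# layer. Clusters are registered with labels; a workload is dispatched to
# every cluster whose labels match the workload's placement selector.
# All names and labels here are hypothetical examples.

clusters = [
    {"name": "edge-us-west",    "labels": {"tier": "edge", "region": "us-west"}},
    {"name": "edge-eu-central", "labels": {"tier": "edge", "region": "eu-central"}},
    {"name": "core-dc",         "labels": {"tier": "core", "region": "us-west"}},
]

def place(workload, selector):
    """Return the names of clusters whose labels match every key/value in selector."""
    return [c["name"] for c in clusters
            if all(c["labels"].get(k) == v for k, v in selector.items())]

# Dispatch a hypothetical workload to all edge-tier clusters.
print(place("video-analytics", {"tier": "edge"}))
```

In a real deployment the controller would then apply the workload manifest to each selected cluster; the label-selector pattern is what lets one central layer target many heterogeneous clusters uniformly.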
Zero-touch provisioning, automation
In a distributed edge computing architecture, the number of required edge nodes varies from case to case: a small-scale deployment may require 10 to 50 edge nodes, while extensive deployments may require hundreds or thousands. Every edge node runs applications on top of an infrastructure environment (an OS, platforms, and the like), as well as specific edge hardware and network resources that provide the needed performance.
For IT and development teams, it is difficult to manually deploy, configure, upgrade, and manage compute-intensive AI/ML applications, platforms, and system firmware at this scale. Zero-touch provisioning and automation are required to remotely handle the configuration and management of every component in the edge nodes.
In large-scale and telco edge deployments, a central controller is implemented to manage the far edge or cell sites. A single controller can govern hundreds of connected nodes, adding a layer that enables zero-touch, automated, centralized deployment and lifecycle management of all nodes.
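The essence of zero-touch management is declarative reconciliation: the central controller holds a desired state, each node reports its actual state, and the controller computes the actions needed to converge, with no operator typing commands on individual nodes. The sketch below illustrates that loop; the node names and component versions are made-up examples, not a real product's schema.

```python
# Minimal sketch of declarative, zero-touch reconciliation. A central
# controller compares each edge node's reported state against a single
# desired state and emits the upgrade actions needed to converge.
# Node names and component versions are hypothetical.

DESIRED_STATE = {"os": "5.15", "runtime": "containerd-1.6", "app": "v2"}

def reconcile(node_name, reported_state):
    """Return the actions required to bring one node to the desired state."""
    actions = []
    for component, want in DESIRED_STATE.items():
        have = reported_state.get(component)
        if have != want:
            actions.append(f"{node_name}: upgrade {component} {have} -> {want}")
    return actions

# A small fleet reporting its current state; only node 01 is out of date.
fleet = {
    "edge-node-01": {"os": "5.15", "runtime": "containerd-1.6", "app": "v1"},
    "edge-node-02": {"os": "5.15", "runtime": "containerd-1.6", "app": "v2"},
}

for name, state in fleet.items():
    for action in reconcile(name, state):
        print(action)
```

Because the controller only ever compares desired against reported state, the same loop scales from tens to thousands of nodes; the per-node work is pushed out as automated actions rather than manual steps.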
Taking computing closer to devices has several benefits, but recovering from the failure of any component in an edge node can be difficult to do remotely. Every component or layer in the edge node therefore needs self-healing capabilities. Given the number of edge nodes, it is essential for IT teams to configure every component to recover automatically from hardware or software failures.
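A self-healing loop of this kind is simple in outline: probe each component, restart any that fail, and escalate only the ones that stay unhealthy after a bounded number of attempts. The sketch below shows that pattern with toy probe/restart callbacks; the component names and the assumption that a restart repairs the component are illustrative.

```python
# Sketch of a per-node self-healing loop: probe each component and
# restart failed ones locally, so recovery does not depend on a remote
# operator. Only components that remain unhealthy after max_attempts
# restarts are escalated. Component names are hypothetical.

def self_heal(components, probe, restart, max_attempts=3):
    """Probe each component; restart failures up to max_attempts times.

    Returns (recovered, failed) lists of component names.
    """
    recovered, failed = [], []
    for name in components:
        attempts = 0
        while not probe(name) and attempts < max_attempts:
            restart(name)
            attempts += 1
        (recovered if probe(name) else failed).append(name)
    return recovered, failed

# Toy environment: one unhealthy component, one healthy one. We assume
# here that a restart repairs the failed component.
health = {"gpu-driver": False, "telemetry-agent": True}

def probe(name):
    return health[name]

def restart(name):
    health[name] = True

recovered, failed = self_heal(["gpu-driver", "telemetry-agent"], probe, restart)
print(recovered, failed)
```

In practice the same shape appears as Kubernetes liveness probes and restart policies, or as a hardware watchdog at the firmware layer; the bounded retry count is what keeps a permanently broken component from being restarted forever.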
Hardware and software accelerators for maximum performance
The various devices served by edge nodes produce large volumes of data that must be processed by CPUs, memory, and I/O units at the edge. This flood of data is driven by technologies such as AI/ML and AR/VR associated with 5G and Industry 4.0 applications. To support modern latency-sensitive use cases, edge nodes need to process huge data sets in near real-time. Scaling memory and I/O bandwidth is therefore critical, which points to hardware acceleration at the storage and networking layers to optimize data flow at the system level. This need has driven the evolution of dedicated accelerators for AI, ML, and deep learning workloads. Acceleration technologies leverage FPGAs, SmartNICs, and GPUs that can be integrated at the access (RAN DUs and CUs) and network edge, speeding up both infrastructure components and AI/ML workload processing.
Observability & Monitoring
Edge nodes can run diverse applications on diverse hardware from different vendors. Once edge nodes are deployed, it becomes important to track the health of the infrastructure and of the services/applications on top of it. A diversified edge environment may introduce interoperability issues, application runtime issues, and security vulnerabilities. Monitoring matters because it lets teams see critical issues and hunt them down quickly; an organization needs a strategy for implementing observability and monitoring techniques to address these issues in real time.
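At its simplest, such monitoring is a central evaluation of per-node metrics against alerting thresholds, so that a scattered fleet surfaces its problems in one place. The sketch below shows that aggregation step; the metric names, threshold values, and node names are illustrative assumptions, not a real monitoring stack's schema.

```python
# Sketch of centralized health evaluation: each node reports a few
# metrics, and the monitoring layer flags every threshold breach so
# teams can hunt issues down quickly. Metric names, thresholds, and
# node names are hypothetical.

THRESHOLDS = {"cpu_pct": 90, "disk_pct": 85, "heartbeat_age_s": 30}

def evaluate(node_metrics):
    """Return (node, metric, value) for every metric above its threshold."""
    alerts = []
    for node, metrics in node_metrics.items():
        for metric, limit in THRESHOLDS.items():
            value = metrics.get(metric, 0)
            if value > limit:
                alerts.append((node, metric, value))
    return alerts

# Sample scrape: node 07 is running hot, node 08 is healthy.
sample = {
    "edge-node-07": {"cpu_pct": 97, "disk_pct": 40, "heartbeat_age_s": 5},
    "edge-node-08": {"cpu_pct": 30, "disk_pct": 40, "heartbeat_age_s": 5},
}

print(evaluate(sample))
```

Production systems such as Prometheus implement the same idea with a pull-based scrape and a richer alerting language, but the core loop, collect per-node metrics and evaluate them centrally, is the same.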
In the edge computing model, many service-loaded edge nodes can be scattered geographically. These services are accessed by devices that connect to edge nodes for their computing and data transmission needs. An important challenge arises when services that must communicate in this distributed environment lack discoverability.
A similar challenge has been observed with microservices, where monolithic applications are fragmented into small services that must locate other running services. DNS can be the solution that helps find the closest instances of edge services, and it is already a popular service discovery mechanism that Kubernetes supports natively. When services are deployed in containers and orchestrated by Kubernetes, the internal DNS service maps a service's name to its IP address, making it easy for other pods to find it.
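Concretely, a Kubernetes Service named `inference` in namespace `edge` is reachable inside the cluster at the DNS name `inference.edge.svc.cluster.local`, which the cluster DNS (CoreDNS/kube-dns) resolves to the Service's ClusterIP. The sketch below builds that name and shows where a standard resolver call would be used; the service and namespace names are illustrative, and the actual resolution only succeeds inside a cluster where the name exists.

```python
import socket

# Sketch of Kubernetes-style service discovery via DNS. A Service gets
# a predictable in-cluster name of the form
# <service>.<namespace>.svc.<cluster-domain>, which cluster DNS maps to
# the Service's ClusterIP. Service/namespace names are hypothetical.

def service_fqdn(service, namespace, cluster_domain="cluster.local"):
    """Build the DNS name Kubernetes assigns to a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"

def resolve(hostname):
    """Resolve a hostname to an IP address (only works where the name exists,
    e.g. inside a pod for cluster-internal Service names)."""
    return socket.gethostbyname(hostname)

print(service_fqdn("inference", "edge"))
# Inside a pod, resolve(service_fqdn("inference", "edge")) would return
# the Service's ClusterIP; we do not call it here since the name only
# exists on a cluster's internal DNS.
```

Because the name is derived purely from the Service and namespace, pods can find peers without any hard-coded IPs, which is exactly the discoverability property distributed edge services need.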
Small-scale enterprises and organizations have started evaluating and implementing edge architectures to support innovative use cases, increase efficiency, and/or enhance their customer experience. But there is still a long way to go for enterprise edge deployments, as large-scale distributed computing brings further challenges. Design principles such as provisioning and automation, resilience, security, and observability must be considered, along with technology such as an orchestration platform, to help bring enterprises into the edge computing era.