By Frank Brockners
What Cloud-Native Really Means for Operators and Why the Internet of Agents Can Propel Progress
As a board member of the Linux Foundation Networking (LFN), I was recently invited by the LFN team to join a panel discussion at the Mobile Europe “Becoming a Cloud-Native Telco” event. In preparation for the discussion and afterward, I collected a few thoughts that I would like to share here.
The panel was moderated by Tiffany Amorós (SJP Businessmedia) and featured Philippe Ensarguet (Orange), Kai Steuernagel (Deutsche Telekom), Inderpreet Kaur (Omdia), and me. You can watch the discussion here: “Becoming a Cloud Native Telco 2025 – How close are operators to becoming cloud-native?”
____________________________________________________________________________________________________
The telecommunications industry stands on the brink of a profound transformation. As consumer expectations rise and technologies like 6G, AI, and immersive experiences reshape the connectivity landscape, operators are rethinking how networks are built and operated. “Cloud-native” has become the rallying term for this transformation – expanding from its original meaning.
In telecom, cloud-native is not just about running virtualized functions in containers managed by Kubernetes. It signifies a comprehensive operational and architectural overhaul. The drive to lower total cost of ownership (TCO) has motivated operators to pursue greater automation and reduced human intervention for more than a decade. While there’s been significant progress, true transformation is still limited to a few beachhead deployments. Now, AI might prove to be the game changer.
AI acts as both a driver for infrastructure and architectural evolution (“networking for AI”) and a catalyst for solving deeply rooted operational challenges (“AI for networking”). The latter is closely linked to the emerging concept of the Internet of Agents (IoA), which offers a new paradigm for cross-domain automation and decision-making.
Redefining “Cloud-Native” in the Telco Context
In enterprise IT, cloud-native typically focuses on microservices, DevOps, and container orchestration. For telecom, it goes much deeper. Operators must account for the high-performance, mission-critical nature of networks delivering voice, data, video, and AI-powered services.
At its core, telco cloud-native principles draw from the 12-factor app design model: resilience, self-healing, security, automation, separation of execution and state, and full observability across the entire stack. The aim is ambitious: to enable agile, responsive, and scalable operations that reduce TCO.
To use the by-now-classic “pets and cattle” analogy: the shift demands abandoning the “pets” approach to virtual network functions (VNFs) — where every VNF is treated as a hand-crafted, custom-configured asset — in favor of a “cattle” model, where services are designed to scale and recover automatically, with minimal human intervention.
Progress So Far
Achieving a truly cloud-native system—a resilient, fully automated service delivery platform—requires transformation on two fronts: the infrastructure that runs, secures, connects, and monitors the network functions, and the network functions themselves. Once containerized and operated this way, VNFs are referred to as cloud-native network functions (CNFs).
From an infrastructure standpoint, the industry has made significant progress. Operators are deploying compute, storage, and networking resources while adopting Kubernetes as a foundational middleware layer. This shift enables a rebalancing of responsibilities between the application/CNF and the underlying infrastructure/Kubernetes. Functions such as load balancing or firewalling can be moved from the application layer to the infrastructure layer, where they can be handled by Kubernetes and associated CNI plugins—for example, Cilium. Conversely, AI may enable a shift in the opposite direction, moving certain responsibilities from the infrastructure layer back to the application layer. We’ll explore that further below.
Several large-scale deployments have validated what’s possible. For example, T-Mobile US, in partnership with Cisco, launched the world’s largest cloud-native converged core gateway — supporting both 4G and 5G with unprecedented scale and flexibility. Saudi Telecom Company (STC) showcased a resilient, cloud-native network powering an esports gaming event — one of the most demanding use cases in terms of latency and reliability.
Remaining Challenges
Despite these advancements, high-performance networking remains one of the key hurdles to realizing fully cloud-native telecom environments.
Although VNFs are increasingly containerized—that is, delivered as CNFs—many still behave like static, stateful workloads. In effect, some still resemble “virtual machines inside containers,” lacking true cloud-native characteristics. These workloads require “pet-like” treatment—not just during operations, but also in testing and qualification. One ends up deploying a dedicated controller infrastructure for CNF configuration and operation, in parallel with and completely decoupled from Kubernetes.
Additionally, operators and vendors often face challenges from the tight coupling between CNFs and specific Kubernetes distributions or CNI plugins: a CNF frequently depends on a particular Kubernetes variant and a specific CNI plugin. These challenges are compounded by the fact that operators’ deployments are typically multi-vendor, requiring the integration of components from different suppliers across the ecosystem. The Linux Foundation Networking community is tackling these challenges through initiatives like Anuket and CNTi.
Moreover, Kubernetes, for all its flexibility, was never designed for high-performance networking. Most traditional applications (e.g., web apps or databases) are CPU-bound and don’t require multi-interface support or deterministic latency. Telecom, however, demands precisely that.
To compensate, the industry has adopted pragmatic approaches. Multus CNI allows Kubernetes pods to have multiple network interfaces. Multus acts as a “meta-plugin”, a CNI plugin that can call multiple other CNI plugins. For control plane traffic, Cilium (built on eBPF) provides observability, policy enforcement, and eliminates the need for external firewalls—offloading those functions to the network fabric itself as already mentioned above.
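To make the Multus “meta-plugin” idea concrete, here is a minimal sketch of a NetworkAttachmentDefinition—the custom resource Multus uses to describe a secondary network. The manifest is rendered as a Python dict for illustration; the network name `macvlan-data`, delegate plugin choice, and host interface `eth1` are assumptions for this example, not from the source.

```python
import json

# Sketch: a Multus NetworkAttachmentDefinition carries a standard CNI config
# (as a JSON string) in spec.config; Multus passes it to the named delegate
# plugin. Names ("macvlan-data", "eth1") are illustrative assumptions.
net_attach_def = {
    "apiVersion": "k8s.cni.cncf.io/v1",
    "kind": "NetworkAttachmentDefinition",
    "metadata": {"name": "macvlan-data"},
    "spec": {
        "config": json.dumps({
            "cniVersion": "0.3.1",
            "type": "macvlan",        # delegate CNI plugin Multus will invoke
            "master": "eth1",         # host interface backing the secondary network
            "ipam": {"type": "static"},
        })
    },
}

# A pod requests the extra interface via an annotation; Multus keeps the
# default cluster network (e.g., Cilium) and adds this one alongside it.
pod_annotation = {"k8s.v1.cni.cncf.io/networks": "macvlan-data"}

print(json.dumps(net_attach_def, indent=2))
```

The pod thus ends up with two interfaces: `eth0` on the default cluster network and a second interface wired by the delegate plugin.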
However, Cilium still relies on Linux kernel helper functions for many packet-processing operations, which limits forwarding performance. For data-plane traffic requiring high throughput and low latency, the industry uses SR-IOV and FD.io/VPP for kernel-bypassed, user-space packet forwarding (see, e.g., the FD.io CSIT test reports). This enables high throughput but sacrifices many Kubernetes-native benefits such as flexible workload migration, service discovery, and dynamic IP management. SR-IOV delivers raw performance at the cost of manageability: as noted above, it requires a dedicated SDN controller to manage networking, decoupled from workload management.
Bridging this gap remains an open challenge, especially as evolving user behavior, new applications, and emerging use cases continue to test existing assumptions and implementations. For instance, the rise of Fixed Wireless Access (FWA) in the U.S. has prompted a significant shift in how packets are processed and forwarded, compared to traditional mobile networks. Meanwhile, with the growing adoption of AI and agent-based systems, network traffic patterns are shifting—from simply delivering data to users, to enabling data exchange between AI agents and supporting distributed LLM deployments for inference and training.
Emerging Solutions for Flexible Networking in Kubernetes
Recognizing these limitations, the community is developing more robust, native solutions. One early initiative was the collaboration between FD.io and project Calico to create Calico-VPP—a high-performance, VPP-based CNI supporting multiple Kubernetes networks. It merges telco-grade networking with cloud-native principles.
Calico-VPP Multi-Network can be seen as a forerunner to broader efforts like Kubernetes SIG-Network Multi-Network (design document, meeting minutes of the working group), which aims to extend Kubernetes to natively support multi-interface, multi-network use cases—critical for telecom and other complex environments.
Combining such networking capabilities with flexible service chaining using Segment Routing over IPv6 (SRv6, RFC 8986) opens new possibilities—e.g., Leveraging Cilium and SRv6 for Telco Networking.
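The service-chaining idea behind SRv6 can be sketched in a few lines: a policy is an ordered list of segment IDs (SIDs), each steering the packet through one service. The sketch below is a toy model, not an SRH implementation; the SIDs use the IPv6 documentation range and the service names are assumptions.

```python
import ipaddress

# Toy model of an SRv6 (RFC 8986) service chain: each SID below is assumed
# to identify one service instance; addresses are from the documentation range.
firewall_sid = ipaddress.IPv6Address("2001:db8:100::1")
nat_sid      = ipaddress.IPv6Address("2001:db8:200::1")
egress_sid   = ipaddress.IPv6Address("2001:db8:300::1")

# On the wire the segment list is encoded in the SRH in reverse order;
# here we keep it in traversal order for readability.
service_chain = [firewall_sid, nat_sid, egress_sid]

def next_segment(chain, current):
    """Return the SID after `current`, or None at the end of the chain."""
    i = chain.index(current)
    return chain[i + 1] if i + 1 < len(chain) else None

assert next_segment(service_chain, firewall_sid) == nat_sid
```

The point of the analogy: re-chaining services is just reordering a list in the ingress policy, with no reconfiguration of the services themselves.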
The AI Factor: Changing the Game for Cloud-Native Networks
AI is reshaping telecom—not just in terms of what networks deliver (“networking for AI”) but in how they function (“AI for networking”).
Like CNFs, AI workloads require high I/O, low latency, and deterministic performance. Training and inference jobs are especially sensitive to tail latency. Groups like the Ultra Ethernet Consortium (UEC) are addressing these requirements by optimizing networking for AI workloads. The hope is that the demands of “networking for AI” will accelerate the effort to make networking a first-class citizen in Kubernetes.
Returning to the previously mentioned rebalancing of functionality between the infrastructure and AI layers: With AI and agentic concepts such as long-term memory, CNFs can manage their own state more intelligently. This enables a clear separation between code execution and state/memory, making CNFs more disposable in the cloud-native sense. By shifting state management from the infrastructure layer (e.g., the SDN controller) to the application layer (i.e., the CNF itself), Kubernetes can natively handle CNF scheduling and resiliency — furthering cloud-native integration.
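To illustrate the separation of execution and state described above, here is a minimal sketch (all class and method names are hypothetical, and an in-process dict stands in for an external memory service): because state lives outside the process, Kubernetes can kill and reschedule the CNF instance without a parallel controller restoring it.

```python
# Illustrative sketch only: the "store" stands in for an external long-term
# memory service; names (MemoryStore, StatelessCNF) are hypothetical.
class MemoryStore:
    """Stand-in for externalized state (e.g., a database or memory service)."""
    def __init__(self):
        self._data = {}

    def save(self, key, value):
        self._data[key] = value

    def load(self, key, default=None):
        return self._data.get(key, default)

class StatelessCNF:
    """Execution is disposable; all state lives in the store, not the instance."""
    def __init__(self, store):
        self.store = store

    def handle_session(self, session_id, event):
        sessions = self.store.load("sessions", {})
        sessions[session_id] = event
        self.store.save("sessions", sessions)

store = MemoryStore()
cnf_a = StatelessCNF(store)
cnf_a.handle_session("imsi-001", "attach")

# Kubernetes reschedules the pod: a fresh instance picks up where the old
# one left off, because no state was held inside the terminated process.
cnf_b = StatelessCNF(store)
print(cnf_b.store.load("sessions"))
```

The design choice this sketch captures: once the CNF is a pure executor over external state, scheduling and resiliency become Kubernetes-native concerns rather than SDN-controller concerns.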
The convergence of telecom and AI is accelerating, as shown by initiatives like the AI-RAN Alliance. Imagine a radio network that learns and optimizes itself in real time—this is no longer hypothetical.
Companies like T-Mobile, Cisco, NVIDIA, Mitre, ODC, and Booz Allen Hamilton are collaborating on AI-native wireless networks for 6G, aiming to unify radio processing and AI inference to boost spectral efficiency and reduce operational costs.
In March 2025, Jio Platforms (JPL), with AMD, Cisco, and Nokia, launched the Open Telecom AI Platform at Mobile World Congress. This platform integrates AI technologies—from LLMs to domain-specific ML—into telecom operations, accelerating automation and innovation.
Completing the Vision: Agentic AI and the Path to End-to-End Cloud Nativeness
Let’s take a step back. Even if we overcome the technical hurdles in Kubernetes and CNF infrastructure, another challenge remains: the human factor.
Over the past decade, teams have made impressive progress in automating their own domains. But for cross-domain tasks—like end-to-end service delivery or cross-domain troubleshooting—human intervention is still required.
Telecom operations are typically fragmented across multiple teams, each using their own preferred tools and interpretations of data—that is, their own data schemas. Software architectures and data schemas often reflect the structure of the organization itself, a phenomenon known as Conway’s law. A commonly suggested solution is to migrate toward a unified schema and toolset. While this may be feasible for some, it has proven challenging for large and diverse organizations that frequently rely on multi-vendor solutions.
Agentic, generative AI offers a promising new approach. It has the potential to assist—and eventually even replace—humans in many operational roles, enabling fully automated, machine-controlled workflows. AI can automatically derive schemas, models, and dependencies from data, relieving humans of the tedious and error-prone task of schema definition. Agentic AI systems can even generate knowledge graphs and derive ontologies. They can also enable AI agents to collaborate and reason like humans—using natural language as a common abstraction across organizations and domains.
I explored this topic in a short blog earlier this year: Rather than consolidating data interpretation into newly defined schemas, we shift toward an existing abstraction: natural language. Much like experts convening in a “war room” to address a cross-domain issue, AI agents can team up and engage in dialogue. Human experts typically begin by examining the initial issue, forming hypotheses about potential causes, exploring various data sources across domains, and seeking evidence to either reject or refine these hypotheses. AI agents can follow the same reasoning process—eliminating the need for humans to gather in a physical or virtual “war room.”
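The “war room” reasoning loop can be sketched as a simple hypothesize-and-test cycle across domain agents. This is a toy stand-in, not a real multi-agent framework; the domains, hypotheses, and telemetry strings below are invented for illustration.

```python
# Hedged sketch of the "war room" loop: each agent answers only about its
# own domain; hypotheses are kept when some domain produces evidence.
def domain_agent(domain, findings):
    """Build an agent that checks hypotheses against its domain's data."""
    def query(hypothesis):
        return findings.get((domain, hypothesis))  # None if no evidence
    return query

# Hypothetical telemetry: only the transport domain confirms a link flap.
findings = {("transport", "link flap"): "interface errors spiked at 10:02"}
agents = {d: domain_agent(d, findings) for d in ["ran", "core", "transport"]}

def war_room(hypotheses):
    """Test each hypothesis against every domain agent; keep supported ones."""
    supported = []
    for h in hypotheses:
        for domain, ask in agents.items():
            evidence = ask(h)
            if evidence is not None:
                supported.append((h, domain, evidence))
    return supported

result = war_room(["config drift", "link flap"])
print(result)  # only "link flap" survives, backed by the transport domain
```

In a real deployment the `ask` step would be a natural-language exchange with an LLM-backed agent over its domain's data sources; the loop structure—hypothesize, gather evidence, reject or refine—stays the same.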
Striking the right balance between deterministic, schema-based operations and probabilistic, data-driven processes using agentic AI remains an open question. What may start as simple assistance for Level 1 troubleshooting could evolve into a comprehensive solution that automates end-to-end service provisioning, change management, and operations—as agentic AI systems mature, large language models evolve, and legal considerations around machine decision-making in sensitive operator environments are addressed.
Much like human teams, AI agents will be distributed across domains and tethered to specific datasets, operational contexts or services. Abstractions such as the Model Context Protocol (MCP) make monetization APIs—like those defined by Project CAMARA—accessible to these agents, simplifying system- and service-level integration. Real-world implementations of truly cloud-native systems will therefore be inherently distributed. This is where the Internet of Agents (IoA) enters the picture.
IoA enables a distributed system of intelligent, autonomous AI agents that can manage and coordinate operations across layers and domains without human input.
Today’s automation is largely rule-based and domain-specific. Even within a single domain, critical workflows such as change management often rely on human decision-making. True autonomy—where systems adapt, reason, and collaborate across organizational boundaries—requires intelligent agents that, like humans, can make contextual decisions and interact dynamically with one another.
Open-source collectives like AGNTCY.ORG are laying the groundwork for this future by developing open frameworks that enable agents to discover, secure, observe, and coordinate across distributed environments. This paradigm shift could finally enable telecoms to achieve TMForum’s Levels 4 and 5 of network autonomy, replacing manual orchestration with intelligent, agent-based systems. By simplifying system and service integration—allowing agents to operate on top of APIs like those defined by Project CAMARA or TMForum and eliminating the need for tedious API-level wiring—it also opens the door to new revenue opportunities for telecoms looking to monetize their networks and related services.
A New Operating Model for Telecom Is Taking Shape
Fully embracing “cloud-native” and becoming “AI-native” is not merely a technical shift—it’s a redefinition of the telecom operating model.
Operators are beginning to think holistically: from infrastructure and orchestration to network functions and AI. They’re embracing the performance imperatives of modern networking while preparing for a future where intelligent agents—not humans—drive secure service provisioning, orchestration, and operations end to end.
And that future is arriving faster than most expect.
This blog reflects insights and contributions from many colleagues and peers. Special thanks to Philippe Ensarguet, Kai Steuernagel, Chase Wolfinger, Jerome Tollet, Brian Meaney, Ravi Guntupalli, Alberto Donzelli, Andre Czerwinski, Pablo Gonzalez Ornia, Guillaume De Saint Marc, Nathan Skrzypczak, and Ranny Haiby.