Kubernetes in the retail industry

How a $4 billion retail giant built an enterprise-grade Kubernetes platform

Elkjøp, the largest electronics retailer in the Nordics, built an internal Kubernetes platform that is now successfully hosting over 200 production-grade microservices to increase development speed—without compromising security or visibility.

Retail revolution powered by microservices

With over 400 retail locations and 12,000 employees across Norway, Sweden, Finland, Denmark, and franchises in Iceland, Greenland, and the Faroe Islands, Elkjøp is the largest electronics retailer in the Nordics. It also has a large e-commerce presence in all these markets.

Although Elkjøp relied on technology to power its point-of-sale (POS) systems, its IT department historically focused on integrating third-party products and externally developed solutions. Five years ago, this strategy changed and the team homed in on developing microservices to provide shared functionality between systems and increase development velocity.

Initially, Elkjøp hosted these microservices in individual Azure Web Apps, but as the environment grew, a new approach was needed. “Azure Web Apps is a great platform for simple systems, but when you start having 70 or 100 copies of web apps it becomes hard to manage and expensive,” said Henry Hagnäs, Elkjøp’s Cloud Solution Architect.

On top of this, Elkjøp was about to start an extensive project called “Next-Generation Retail” that would put even more pressure on its microservices. The project’s purpose was to replace the 20-year-old POS system with a modern, flexible, and scalable system that runs on a microservice architecture.

The indispensable service mesh

The team started the migration by dockerizing and deploying applications onto Kubernetes. But they quickly realized that they lacked the metrics and insight needed to assess performance. Additionally, since they terminated TLS at the ingress controller, all communication between the applications was unencrypted. They needed to solve both problems—and quickly.

To gain visibility into service health and encrypt all service-to-service communication, Hagnäs and his team chose Linkerd, the lightweight, ultra-fast Cloud Native Computing Foundation (CNCF) service mesh.

Linkerd injects an ultra-lightweight “micro-proxy” as a sidecar for each application. The proxy takes over many cross-cutting concerns: it encrypts traffic end to end, provides valuable metrics, and gives insight into service-to-service communication. These were precisely the problems the team needed to solve.
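As a minimal sketch of how an application is opted in to the mesh: Linkerd's proxy injector adds the sidecar to pods whose template carries the `linkerd.io/inject: enabled` annotation. The Python below sets that annotation on a Deployment manifest held as a plain dictionary. The manifest, the `checkout-api` name, the image, and the helper function are hypothetical and are not taken from Elkjøp's setup.

```python
import copy

# Annotation that Linkerd's proxy injector looks for on a pod template.
INJECT_ANNOTATION = "linkerd.io/inject"


def enable_linkerd_injection(deployment: dict) -> dict:
    """Return a copy of a Deployment manifest with the inject annotation set."""
    patched = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    template = patched.setdefault("spec", {}).setdefault("template", {})
    metadata = template.setdefault("metadata", {})
    annotations = metadata.setdefault("annotations", {})
    annotations[INJECT_ANNOTATION] = "enabled"
    return patched


# Hypothetical manifest, for illustration only.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "checkout-api"},
    "spec": {
        "template": {
            "metadata": {"labels": {"app": "checkout-api"}},
            "spec": {
                "containers": [
                    {"name": "app", "image": "example/checkout-api:1.0"}
                ]
            },
        }
    },
}

patched = enable_linkerd_injection(deployment)
print(patched["spec"]["template"]["metadata"]["annotations"])
```

The annotation goes on the pod template rather than on the Deployment's own metadata, because the sidecar is added when each pod is created.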

Linkerd was Elkjøp’s choice for several reasons. Importantly, they wanted a project backed by the CNCF, with all of its benefits, including a rigorous maturity framework, a community-based commitment to high-quality projects, and technical excellence.

Ease of setup was also a priority. Within a week, the team had run and tested Linkerd and was ready to move forward with it. “The initial setup was really quick,” said Fredrik Klingenberg. “Overall, it took very few hours to get it up and running and realize value.”

Kubernetes, retail, microservices: only the location differs

Nordstrom wanted to increase the efficiency and speed of its technology operations, which include the Nordstrom.com e-commerce site. At the same time, Nordstrom Technology was looking for ways to reduce its technology operating costs.

Every improvement counts, especially if it’s significant

Nordstrom’s Dev and Ops team members built a CI/CD pipeline, working with the company’s servers on premises. Before that, it took weeks for a developer to acquire a VM. With the pipeline, it now takes minutes.

While Kubernetes is often thought of as a platform for microservices, the first application to launch on Kubernetes in a critical production role at Nordstrom was Jira. “It was not the ideal microservice we were hoping to get as our first application,” Dhawal Patel, a senior engineer on the team building a Kubernetes enterprise platform for Nordstrom, admits, “but the team that was working on it was really passionate about Docker and Kubernetes, and they wanted to try it out. They had their application running on premises, and wanted to move it to Kubernetes.”

The benefits were immediate for the teams that came on board. “Teams running on our Kubernetes cluster loved the fact that they had fewer issues to worry about. They didn’t need to manage infrastructure or operating systems,” says Marius Grigoriu, Sr. Manager of the Kubernetes team at Nordstrom. “Early adopters loved the declarative nature of Kubernetes. They loved the reduced surface area they had to deal with.”

Nordstrom’s journey to the cloud resulted in immediate benefits. “With Kubernetes, without even trying to make our cluster efficient, we are currently at 40 percent CPU utilization, a 10x increase. We are running 2600+ customer pods that would have been 2600+ VMs if they had gone directly to the cloud. We are running them on 40 VMs now, so that’s a huge reduction in operational overhead.”
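The figures in the quote can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above and treats "2600+" as a lower bound. The earlier utilization figure is an inference from the "10x increase" wording, not a number stated by Nordstrom.

```python
# Figures taken from the quote; "2600+" is treated as a lower bound.
pods = 2600                  # customer pods running on the cluster
vms = 40                     # VMs hosting those pods
cpu_utilization_now = 40.0   # percent
improvement_factor = 10      # "a 10x increase"

# Average number of pods packed onto each VM.
pods_per_vm = pods / vms

# VMs avoided compared with the one-VM-per-workload alternative in the quote.
vms_avoided = pods - vms

# Inferred (not quoted): utilization before Kubernetes, if 40% is a 10x increase.
implied_previous_utilization = cpu_utilization_now / improvement_factor

print(f"{pods_per_vm:.0f} pods per VM")
print(f"{vms_avoided} fewer VMs to operate")
print(f"about {implied_previous_utilization:.0f}% CPU utilization before")
```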

More than 10,000 tills to keep an eye on
Monitoring, Security and Scalability at Tesco with Kubernetes

It became evident that Tesco needed to overhaul its technological infrastructure if it wanted to become the leader of the retail industry.

An infrastructure built in-house offers many advantages. Self-checkouts make shopping more convenient, and the whole process becomes more secure. These are the main goals of the Loss Prevention, Monitoring and Alerting developments at Tesco’s Budapest office. With them, the state of each till can be tracked in real time, and experts can intervene immediately when an error occurs or when a customer mistakenly or intentionally scans the wrong barcode.

All the tills in real time

Tesco Technology developed a robust monitoring and alerting system to cope with the hardware infrastructure of a supermarket. Scalability and robustness were essential because more than 70,000 devices need to be monitored in real time. Making the problem harder, every device runs at least 30 containerised services, whose state and metrics need to be tracked every 15 seconds.
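To get a feel for the scale, the figures above can be turned into a rough lower bound on monitoring throughput. This is a back-of-the-envelope sketch: it assumes exactly one state sample per service per 15-second interval, which is a simplification and not a description of Tesco's actual data model.

```python
# Lower bounds from the text: "more than 70,000 devices", "at least 30" services.
devices = 70_000
services_per_device = 30
interval_seconds = 15

# Assumption for illustration: one sample per service per interval.
monitored_services = devices * services_per_device
samples_per_second = monitored_services // interval_seconds
samples_per_day = samples_per_second * 24 * 60 * 60

print(f"{monitored_services:,} services to track")
print(f"{samples_per_second:,} samples per second")
print(f"{samples_per_day:,} samples per day")
```

Even under this conservative assumption the system has to ingest well over a hundred thousand samples every second, which helps explain the emphasis on scalability.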

It was obvious from day one that Kubernetes was going to be the platform to host and run this system.

To transport metrics effectively, the system needed a stable network. It is not uncommon for stores to lack the bandwidth to support volumes like this. It was therefore crucial to prioritise specific data flows and to enable offline operation.

The engineers and network specialists at Tesco found their solution in Envoy, the open source proxy written in C++ that works at both the transport layer (layer 4) and the application layer (layer 7). Envoy enabled them to prioritise traffic at the transport layer and to set rate limits, so that a software update does not bring down the network of the supermarket receiving it. This also guarantees that payment and authentication processes have the highest priority.
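Rate limiting of this kind is commonly built on a token bucket. The sketch below illustrates the idea in Python; it is neither Envoy configuration nor Tesco's implementation. The traffic class names, the rate, the bucket size, and the message sizes are all invented for the example, and time is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Allow traffic while tokens remain; tokens refill at a fixed rate."""

    def __init__(self, rate_per_s: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_s      # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.updated = now

    def allow(self, cost: float, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False


# Hypothetical policy: critical flows are never throttled,
# bulk transfers such as software updates share a limited budget.
PRIORITY_CLASSES = {"payment", "authentication"}
bulk_bucket = TokenBucket(rate_per_s=100.0, capacity=200.0)  # units: KB


def admit(traffic_class: str, size_kb: float, now: float) -> bool:
    if traffic_class in PRIORITY_CLASSES:
        return True
    return bulk_bucket.allow(size_kb, now)


results = [
    admit("update", 150, now=0.0),   # fits in the initial burst
    admit("update", 100, now=0.0),   # only 50 KB of budget left
    admit("payment", 100, now=0.0),  # priority traffic bypasses the limit
    admit("update", 100, now=1.0),   # one second of refill adds 100 KB
]
print(results)
```

In this toy policy the second update chunk is rejected while the payment message still goes through, which mirrors the goal described above: bulk transfers are slowed down so that critical traffic is not starved.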

Rock ‘n’ rollout

Managing development cycles is one thing; rolling the changes out to the general public is an art of its own. Tesco developed its own rollout process, with a complex pipeline to synchronise the work of 30-40 teams. This enables continuous releases to each individual supermarket’s central server, which is then responsible for distributing the images to the tills and other devices. All this happens with appropriate data prioritisation, so that everything else keeps running without a hitch.

As these examples show, Kubernetes’ flexibility, robustness, and ability to autoscale make the platform extremely appealing to retail companies all around the globe.
