The CHARITY Edge Storage (CH.E.S.)

2023-03-14

In edge computing, a large amount of data is generated and consumed by various edge applications. One of the key challenges in developing applications at the edge is efficient data sharing between multiple edge clients. Data sharing can be realized within individual application frameworks or through an external storage service. In general, edge computing moves the computational load to the edge of the network in order to exploit the computational capabilities of edge nodes. Moving data and computation closer to the client minimizes latency and reduces the load on the network backbone. Edge storage can therefore greatly improve data access, which in turn enables latency-sensitive applications. Nevertheless, implementing an efficient edge-enabled storage system is challenging due to the distributed and heterogeneous nature of the edge and its limited resource capabilities.

The CHARITY Edge Storage Component (CHES) provides optimized edge storage services to the CHARITY framework and its hosted applications. These services include data storage, retrieval and migration, security and privacy protection, quality of service (QoS) and quality of experience (QoE) violation prevention and mitigation, as well as other data-related services that serve the runtime requirements of CHARITY. In detail, the edge storage component provides a reliable, fast, stable and secure shared storage engine, accessible by all devices and users in the cloud-edge continuum. Furthermore, it is extremely lightweight, since it is designed for edge devices with very limited resources, such as Raspberry Pis and other micro-computer devices.

CHES is based on Kubernetes, MinIO and Prometheus, combining and tuning these technologies to better serve CHARITY's needs. More specifically, it uses K3s, a lightweight Kubernetes distribution built for IoT and edge computing. K3s is a highly available, certified Kubernetes distribution designed for production workloads in unattended, resource-constrained, remote locations or inside IoT appliances. As the storage backend, the open-source object storage solution MinIO is utilized. MinIO is an inherently decentralized and highly scalable distributed solution, which allows it to be deployed freely across the available nodes. It is designed to be cloud native and can run as lightweight containers managed by external orchestration services such as Kubernetes. Moreover, it exposes object storage on top of block storage, effectively combining the two: it preserves the lightweight, distributed nature of block storage while providing the rich metadata and ease of use of object storage. Unlike object storage solutions built only for archival use cases, MinIO is designed to deliver the high-performance object storage required by modern big data applications. Finally, Prometheus collects monitoring data about the real-time performance of the individual nodes and of the component as a whole, which is used to analyse the behaviour of the hosted applications and to optimize the cluster architecture, the configuration options and the data distribution.
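To give a feel for how an edge application interacts with the shared storage engine, the following is a minimal sketch that stores and retrieves an object through MinIO's Python SDK. The endpoint, credentials and bucket name are placeholders and would depend on how the CHES MinIO service is exposed in a concrete deployment.

```python
# Minimal sketch: storing and retrieving an object via the MinIO Python SDK.
# Endpoint, credentials and bucket name are illustrative placeholders, not the
# actual CHES configuration.
from io import BytesIO
from minio import Minio

client = Minio(
    "ches-minio.example.local:9000",   # hypothetical MinIO service endpoint
    access_key="CHES_ACCESS_KEY",      # placeholder credentials
    secret_key="CHES_SECRET_KEY",
    secure=False,                      # a real deployment would likely use TLS
)

bucket = "charity-demo"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Store a small object (e.g. a reading produced by an edge application).
payload = b'{"sensor": "cam-01", "frame": 42}'
client.put_object(bucket, "readings/cam-01.json", BytesIO(payload), length=len(payload))

# Retrieve it again from any node that can reach the shared storage engine.
response = client.get_object(bucket, "readings/cam-01.json")
try:
    print(response.read().decode())
finally:
    response.close()
    response.release_conn()
```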


Hybrid edge/cloud environments are rapidly becoming the new trend for organizations seeking the right mix of scalability, performance and security. As a result, it is now common for an organization to rely on a mix of on-premises data centers (private cloud) and cloud/edge solutions from different providers to store and manage its data. However, many obstacles arise when applications have to access that data. On the one hand, developers need to know the exact location of the data; on the other, they have to manage the correct credentials for the specific data sources holding it. In addition, access to cloud/edge storage is often invisible to the cloud management layer, making it difficult for infrastructure administrators to monitor which containers have access to which cloud storage solutions. Even though containerized components and micro-services are widely promoted as the appropriate way to deploy and manage storage over a hybrid edge/cloud infrastructure, containerization itself makes it harder for workloads to access shared file systems.


To address these issues, the Kubernetes Dataset Lifecycle Framework (DLF), provided by IBM's Datashim, is employed on top of MinIO; it enables automated and transparent access to data sources for containerized applications. DLF lets users access remote data sources via a mount point inside their containerized workloads and aims to improve usability, security and performance by providing a higher level of abstraction for the dynamic provisioning of storage for user applications. DLF is cloud-agnostic and makes use of Kubernetes access control and Secrets, so that pipelines do not need to run with escalated privileges or handle secret keys, which makes the platform more secure. It also introduces the Dataset as a Custom Resource Definition (CRD): a reference to an existing data source that is materialized as a Persistent Volume Claim (PVC). DLF uses the Operator SDK to create the Dataset Operator, a component that reacts to the various events in the lifecycle of the custom resource and implements the desired functionality; its main task is to react to the creation (or deletion) of a Dataset and materialize the corresponding object.
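As an illustration of how a Dataset could be declared against a MinIO endpoint, the sketch below creates a Dataset custom object with the Kubernetes Python client. The API group/version and spec fields mirror the publicly documented Datashim examples and are assumptions here; the endpoint, bucket and keys are placeholders, and the exact schema may differ between Datashim releases and in the actual CHARITY deployment.

```python
# Illustrative sketch only: declaring a Datashim Dataset that points at a
# MinIO bucket, so that pods can later consume it as a PVC of the same name.
# API group/version and field names follow public Datashim examples and are
# assumptions; endpoint, bucket and credentials are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

dataset = {
    "apiVersion": "com.ie.ibm.hpsys/v1alpha1",
    "kind": "Dataset",
    "metadata": {"name": "charity-dataset"},
    "spec": {
        "local": {
            "type": "COS",  # S3-compatible object store
            "endpoint": "http://ches-minio.example.local:9000",
            "bucket": "charity-demo",
            "accessKeyID": "CHES_ACCESS_KEY",
            "secretAccessKey": "CHES_SECRET_KEY",
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="com.ie.ibm.hpsys",
    version="v1alpha1",
    namespace="default",
    plural="datasets",
    body=dataset,
)
# Once the Dataset Operator materializes the object, a PVC named
# "charity-dataset" appears in the namespace; a pod can mount it like any
# other volume, or use the Datashim pod labels (e.g. dataset.0.id /
# dataset.0.useas) to have the mount injected automatically.
```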


Keywords: edge computing, edge storage, container-based virtualization, cloud computing, internet of things, Kubernetes, MinIO


Relevant Publications:

HUA Team.