RESOURCE USAGE PREDICTION IN CLOUD COMPUTING

2023-06-16

During the last decades, we have witnessed the emergence of applications with demanding Quality of Service (QoS) requirements. Contemporary applications, such as Extended Reality (XR), are often associated with stringent QoS requirements. The driving force behind this type of application is the desire to provide immersive experiences to end-users. Providing acceptable levels of immersion requires low latencies and high bandwidths, and in the field of XR applications these requirements are rather strict. The scientific literature has shown that, for the end-user experience to be acceptable, the end-to-end latency should be less than 15 ms and the bandwidth should be able to scale up to 30 Gbps. Furthermore, faults in task processing are inevitable and may have dire ramifications for the creation of immersive experiences, since they often result in disruption of service delivery and thus jeopardise the desired immersion. In order to keep up with such demanding QoS requirements, it is of paramount importance to orchestrate and manage the available computational resources in an optimal manner. The management and orchestration paradigm in the context of Cloud computing includes processes such as adaptive resource allocation, intelligent task offloading and proactive fault tolerance.

 

Adaptive Resource Allocation
Application performance depends on the computational capacity of the available resources. In order to overcome potential computational limitations, additional computational resources can be allocated to alleviate the burden imposed by the various tasks. There are two ways in which this can be achieved. The first, horizontal scaling, is the process of allocating additional computational nodes. The second, vertical scaling, is the process of increasing the computational capacity of the existing nodes. In the case of horizontal scaling, it is of vital importance to have a prior estimate of the resource consumption expected over the next minutes, because the creation of additional virtual machines takes several minutes. This time bound significantly limits the infrastructure's ability to react in time to a sudden burst in resource usage. At the same time, allocating additional computational resources increases the overall operational cost. The solution is therefore a "goldilocks zone" principle, which keeps the virtual machines working at a specified capacity that simultaneously keeps the overall cost to a minimum and allows the Cloud infrastructure to react to changes in traffic in time.
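As an illustration of such a principle, the sketch below decides whether to scale out, scale in, or hold, based on a multi-step forecast of aggregate CPU demand. The thresholds, the forecast input and the function names are assumptions made for the example, not the actual CHARITY mechanism.

```python
# Minimal sketch of a "goldilocks zone" horizontal-scaling policy.
# Thresholds and the predicted-utilisation input are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class ScalingDecision:
    action: str          # "scale_out", "scale_in" or "hold"
    target_replicas: int


def decide_scaling(predicted_cpu: list[float],
                   current_replicas: int,
                   low: float = 0.40,
                   high: float = 0.70,
                   min_replicas: int = 1) -> ScalingDecision:
    """Keep predicted per-replica CPU utilisation inside [low, high].

    `predicted_cpu` holds multi-step forecasts (e.g. the next 10 minutes) of the
    aggregate CPU demand, expressed as a fraction of one replica's capacity.
    Scaling out is triggered ahead of time because new virtual machines take
    several minutes to become available.
    """
    peak_demand = max(predicted_cpu)               # worst expected load
    per_replica = peak_demand / current_replicas   # utilisation per VM

    if per_replica > high:
        # Add enough replicas to bring utilisation back under the ceiling.
        needed = int(-(-peak_demand // high))      # ceiling division
        return ScalingDecision("scale_out", max(needed, current_replicas + 1))
    if per_replica < low and current_replicas > min_replicas:
        # Release idle capacity to keep the operational cost down.
        needed = max(min_replicas, int(-(-peak_demand // high)))
        return ScalingDecision("scale_in", min(needed, current_replicas - 1))
    return ScalingDecision("hold", current_replicas)


# Example: forecast peaks at the equivalent of 2.6 replicas, currently served
# by 3 replicas -> per-replica utilisation ~0.87, so the policy scales out.
print(decide_scaling([2.1, 2.4, 2.6], current_replicas=3))
```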

 

Intelligent Task Offloading
Task offloading refers to the process of designating specific resources to handle a number of tasks, depending on their requirements. It is based on the rather simple principle of properly distributing the workload among the available computational and storage resources in order to achieve better response times. The task offloading decision mechanism examines the ongoing resource utilization of each computational node and decides which one shall receive the next task. The main design goal of the node selection process is to avoid selecting a node that is already operating close to its maximum capacity. Furthermore, one cannot help but notice that intelligent task offloading and adaptive resource allocation are closely intertwined: by ensuring that the various computational nodes do not operate close to maximum capacity, the time-frame within which the infrastructure can respond to a potential, sudden burst in service demand becomes significantly wider.
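The selection logic can be illustrated with a small sketch that picks the least-loaded candidate while skipping nodes near saturation; the Node structure, the load score and the 0.8 saturation threshold are assumptions made for the example.

```python
# Illustrative offloading decision: choose the candidate node with the most
# headroom, skipping nodes that are already close to their maximum capacity.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    cpu_util: float   # current CPU utilisation in [0, 1]
    ram_util: float   # current RAM utilisation in [0, 1]

    @property
    def load(self) -> float:
        # Single load score; a real mechanism may weight metrics differently.
        return max(self.cpu_util, self.ram_util)


def select_node(nodes: list[Node], saturation: float = 0.8) -> Optional[Node]:
    """Return the least-loaded node whose load stays below `saturation`.

    Returns None when every node is near saturation, which would be the
    signal to trigger adaptive resource allocation instead.
    """
    candidates = [n for n in nodes if n.load < saturation]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.load)


nodes = [Node("edge-1", 0.85, 0.60), Node("edge-2", 0.55, 0.40), Node("edge-3", 0.30, 0.70)]
print(select_node(nodes))   # edge-2: edge-1 is near saturation, edge-3 carries a higher RAM load
```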

 

Proactive Fault Tolerance
Since Cloud environments consist of a vast number of computational nodes operating simultaneously, it is extremely important to regard component failure as an inevitability. By doing so, it is possible to ensure that the infrastructure keeps operating without interruption when one or more of its components fail. The main way of ensuring that an infrastructure can continue its operations even after a critical failure is to utilize backup components, which can automatically replace failed ones in a manner that guarantees no loss of service. As stated above, the creation of a new virtual machine requires a certain waiting period, which would have grave ramifications for the infrastructure's ability to keep up with the aforementioned QoS requirements. Thus, fault tolerance is better achieved through proactive strategies rather than reactive ones. At any given time, the infrastructure should include a specific number of computational nodes that remain idle until one of the operational components ceases to function properly. Given the additional operational cost of cloud-based computational nodes, it is important to keep this number to a minimum.
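One simple way to realise such a strategy is a small pool of pre-provisioned standby nodes, as in the sketch below; the pool size, node names and provisioning logic are illustrative assumptions rather than the actual implementation.

```python
# Sketch of a standby-pool replacement strategy: a small number of idle nodes
# is kept ready so that a failed (or predicted-to-fail) node can be replaced
# immediately, while a fresh VM is provisioned in the background.

from collections import deque


class StandbyPool:
    def __init__(self, standby_size: int = 2):
        self.standby_size = standby_size
        self.standby: deque[str] = deque()
        self._created = 0

    def provision_standby(self) -> None:
        """Create idle VMs until the pool reaches its target size.

        In a real deployment this would call the cloud provider's API and take
        several minutes, which is exactly why it happens ahead of time.
        """
        while len(self.standby) < self.standby_size:
            self.standby.append(f"standby-{self._created}")
            self._created += 1

    def replace(self, failed_node: str) -> str:
        """Swap a failing node for an idle one with no provisioning delay."""
        if not self.standby:
            raise RuntimeError("no standby capacity; fall back to reactive scaling")
        replacement = self.standby.popleft()
        # Refill the pool so the next failure is also covered.
        self.provision_standby()
        return replacement


pool = StandbyPool(standby_size=2)
pool.provision_standby()
print(pool.replace("edge-1"))   # an idle node takes over immediately
```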

 

By utilizing Deep Learning algorithms, it is possible to extract information regarding the resource usage patterns of failure-prone components, as well as the time-frame in which faults are expected to occur. This enables the proactive fault tolerance processes to ensure that the infrastructure's operations continue uninterrupted and that the overall cost is kept to a bare minimum. In fact, the management and orchestration processes of Cloud computing infrastructures can be greatly improved by incorporating Deep Learning models that are capable of predicting the time-evolving resource utilization metrics, most notably CPU, RAM, bandwidth, and disk I/O.
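To give an idea of what such a prediction task looks like in practice, the following sketch arranges a multivariate trace of these metrics into sliding input/target windows for multi-step forecasting; the window lengths and the synthetic data are assumptions made purely for illustration.

```python
# Sketch of turning a multivariate resource-usage trace (CPU, RAM, bandwidth,
# disk I/O sampled at fixed intervals) into sliding windows for multi-step
# forecasting. Window lengths are illustrative.

import numpy as np


def make_windows(series: np.ndarray, history: int = 30, horizon: int = 10):
    """Split a (timesteps, metrics) trace into (history -> horizon) pairs.

    Returns X of shape (samples, history, metrics) and
            y of shape (samples, horizon, metrics).
    """
    X, y = [], []
    for start in range(len(series) - history - horizon + 1):
        X.append(series[start:start + history])
        y.append(series[start + history:start + history + horizon])
    return np.stack(X), np.stack(y)


# Synthetic trace: 500 samples of the four metrics, purely for illustration.
trace = np.random.rand(500, 4)    # columns: CPU, RAM, bandwidth, disk I/O
X, y = make_windows(trace)
print(X.shape, y.shape)           # (461, 30, 4) (461, 10, 4)
```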

 

Predicting resource consumption in Cloud environments is a challenging endeavour, since these environments are dynamic and heterogeneous. In order to tackle this problem, CHARITY has developed a set of Deep Learning models capable of formulating accurate predictions of resource usage. These models are based on the Encoder-Decoder paradigm and extract information about the resource usage patterns present in Cloud topologies in order to produce accurate multi-step predictions, which are then leveraged in the context of optimal resource orchestration and management.
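As an illustration of the Encoder-Decoder paradigm, the sketch below builds a small sequence-to-sequence forecaster with Keras LSTM layers; the layer sizes, the choice of LSTM cells and the training setup are assumptions and do not reflect the exact CHARITY architecture.

```python
# One possible realisation of an Encoder-Decoder forecaster for the four
# resource-usage metrics, using Keras LSTM layers. All sizes are illustrative.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

HISTORY, HORIZON, METRICS = 30, 10, 4   # e.g. CPU, RAM, bandwidth, disk I/O

model = keras.Sequential([
    # Encoder: compress the observed window into a fixed-size state.
    layers.LSTM(64, input_shape=(HISTORY, METRICS)),
    # Repeat the state once per future step the decoder must produce.
    layers.RepeatVector(HORIZON),
    # Decoder: unroll the multi-step forecast.
    layers.LSTM(64, return_sequences=True),
    # One value per metric at every forecast step.
    layers.TimeDistributed(layers.Dense(METRICS)),
])
model.compile(optimizer="adam", loss="mse")

# Training on sliding windows such as those built above (synthetic data here).
X = np.random.rand(256, HISTORY, METRICS)
y = np.random.rand(256, HORIZON, METRICS)
model.fit(X, y, epochs=2, batch_size=32, verbose=0)

# Multi-step prediction for one fresh window of observations.
forecast = model.predict(X[:1])         # shape: (1, HORIZON, METRICS)
print(forecast.shape)
```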