Resource Indexing in CHARITY

2024-02-15

The Resource Indexing is an important service of the CHARITY platform: it collects performance-related cluster metrics and makes them available to the HLO (High Level Orchestrator). The HLO is responsible for choosing the most suitable cluster(s) for each application, based either on the resources required by a new deployment or on those needed by an application migrating to a new cluster. The Resource Indexing is therefore the key tool for gathering accurate information from the platform in an efficient, scalable way.

The Resource Indexing in CHARITY is thus implemented as a distributed service made up of service instances per cluster, as shown in Figure 1, that collect data from the cluster in which they are deployed. These instances communicate with the main Resource Indexing entity, an architectural component aggregating information from all clusters available in all domains and for all providers.

Within each cluster, the Cluster Resource Indexing contains a database where the latest value of each metric is stored, and two update helper tools, one for input and one for output.

The first updater makes periodic HTTP requests to Prometheus. Prometheus is typically used to collect numeric metric values from services that run 24/7, and it allows metric data to be accessed via HTTP endpoints, supporting queries in the PromQL format. This updater uploads the relevant information to the local Resource Indexing database.
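To make the first updater more concrete, here is a minimal sketch of one polling step. It relies on the Prometheus instant-query endpoint (`/api/v1/query`) and its standard JSON response format. Everything else is an assumption made for illustration and not the CHARITY implementation: the SQLite store, the `latest_metrics` table, and the function names are all hypothetical.

```python
import json
import sqlite3
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Extract (labels, timestamp, value) tuples from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    samples = []
    for series in payload["data"]["result"]:
        timestamp, value = series["value"]  # the value arrives as a string
        samples.append((series["metric"], float(timestamp), float(value)))
    return samples


def open_store(path=":memory:"):
    """Open the (hypothetical) local store holding one row per metric series."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS latest_metrics ("
        "name TEXT, labels TEXT, ts REAL, value REAL, "
        "PRIMARY KEY (name, labels))"
    )
    return conn


def store_latest(conn, name, samples):
    """Overwrite the stored sample of each series, so only the latest is kept."""
    for labels, ts, value in samples:
        conn.execute(
            "INSERT OR REPLACE INTO latest_metrics VALUES (?, ?, ?, ?)",
            (name, json.dumps(labels, sort_keys=True), ts, value),
        )
    conn.commit()


def update_once(conn, base_url, name, promql):
    """One polling step: query Prometheus over HTTP and store the result."""
    with urlopen(build_query_url(base_url, promql), timeout=10) as response:
        payload = json.load(response)
    store_latest(conn, name, parse_instant_vector(payload))
```

Calling `update_once` on a timer gives the periodic behaviour described above. Because the primary key is the metric name plus its label set, each new sample replaces the previous one and no history accumulates.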

The second helper periodically updates the main Resource Indexing entity. In order to take deployment decisions, the HLO will need to evaluate several cluster metrics provided by the central Resource Indexing entity, including CPU, memory, storage, and the bandwidth and latency between clusters. The first prototype of the Resource Indexing collects various values related to the first three metrics from the controlled clusters. Table 1 shows the Prometheus metrics used by the Resource Indexing prototype.
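The second helper can be pictured as a small loop that packages the latest local values and sends them to the main entity. The sketch below is only an illustration under stated assumptions: the report layout, the use of an HTTP POST with a JSON body, the `read_latest` callback, and the 30-second interval are all hypothetical, since the post does not describe the actual transport or message format.

```python
import json
import time
from urllib.request import Request, urlopen


def build_report(cluster_id, domain, latest_metrics, now=None):
    """Package the latest local metric values into one report (hypothetical layout)."""
    return {
        "cluster": cluster_id,
        "domain": domain,
        "reported_at": time.time() if now is None else now,
        "metrics": dict(latest_metrics),
    }


def make_push_request(main_url, report):
    """Build the HTTP POST carrying the report as JSON (assumed transport)."""
    body = json.dumps(report).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(main_url, data=body, headers=headers, method="POST")


def push_forever(main_url, cluster_id, domain, read_latest, interval_s=30.0):
    """Periodically send the current values; a failed push is retried next cycle."""
    while True:
        report = build_report(cluster_id, domain, read_latest())
        try:
            with urlopen(make_push_request(main_url, report), timeout=10):
                pass
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            print(f"push failed, will retry: {exc}")
        time.sleep(interval_s)
```

Here `read_latest` is any callable returning a mapping of metric names to their most recent values, for example a query against the local database. Since every report carries the full set of latest values, a missed push costs only freshness and nothing needs to be replayed.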

The HLO communicates with the Resource Indexing through a REST API, which defines two types of queries according to the needs of the HLO. The first type of query returns the metric values from all clusters and all domains, while the second one returns values from the clusters belonging to the domain specified by its input parameters. The Resource Indexing thus allows efficient information retrieval in the two main cases in which a status update is needed: (a) for the whole platform (e.g., platform startup, or periodic replacement of old information) or (b) for a specific domain (e.g., a new cluster is added to the platform, or a cluster status is known to be changing).
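The two query types can be sketched as two lookups over the aggregated index, with a small dispatcher standing in for the REST layer. This is a simplified illustration, not the actual CHARITY API: the `/metrics` path, the `domain` query parameter, and the entry layout are hypothetical names chosen for the example.

```python
from urllib.parse import parse_qs, urlparse


def query_all(index):
    """First query type: metric values from all clusters in all domains."""
    return [dict(entry) for entry in index]


def query_domain(index, domain):
    """Second query type: metric values from the clusters of one domain."""
    return [dict(entry) for entry in index if entry["domain"] == domain]


def handle_get(path, index):
    """Map a GET path to one of the two queries; returns (status, body)."""
    parsed = urlparse(path)
    if parsed.path != "/metrics":  # hypothetical endpoint name
        return 404, {"error": "unknown path"}
    params = parse_qs(parsed.query)
    if "domain" in params:
        return 200, query_domain(index, params["domain"][0])
    return 200, query_all(index)


# Example index with made-up values, one entry per cluster.
EXAMPLE_INDEX = [
    {"cluster": "c1", "domain": "edge-a", "cpu_free": 4.0},
    {"cluster": "c2", "domain": "edge-a", "cpu_free": 1.5},
    {"cluster": "c3", "domain": "cloud-b", "cpu_free": 32.0},
]
```

With this layout, `handle_get("/metrics", EXAMPLE_INDEX)` covers case (a), while `handle_get("/metrics?domain=edge-a", EXAMPLE_INDEX)` covers case (b) and returns only the two `edge-a` clusters.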

We have run some basic experiments on the communication between a main cluster and a couple of standalone clusters, to check that all the tools needed for the Resource Indexing work and interoperate as expected. To perform our tests, we deployed each standalone cluster with a set of servers, and set up the Prometheus tool to obtain the necessary information.

We verified that the Resource Indexing correctly stored the information in the database, that the information was sent to the Main Cluster, and that it was correctly received. Finally, we simulated the HLO calls to the Resource Indexing through the REST API, particularly the ones related to availability, to check whether the obtained data was useful to the HLO.

As the project continues, more experiments will be performed and features will be added, evaluating the Resource Indexing in more varied conditions and complex use cases. How will it fare with respect to other solutions, and how does its design compare with them? Let us know what approach you are following in your project!