Wissam Mallouli, Montimage, France (email@example.com) 13.02.2017
This blog post presents different architectures for security SLA monitoring in multi-cloud environments, with the aim of guaranteeing the security and efficiency of the overall complex application. It also lists the requirements, features and challenges to be faced when deploying this kind of monitoring solution in the real world.
Monitoring in virtualized environments
Monitoring is required to ensure the correct operation of the whole system. A malfunction, or even a minor problem, in one virtual machine could introduce vulnerabilities and instability in other virtual machines and compromise the integrity of the host machine. In the MUSA H2020 project, the monitoring function is needed to understand precisely what is going on at the network, system and application levels, with a twofold objective. First, it is necessary for improving the security of the communications and services offered by the virtual environments. Second, from the administration and management point of view, it helps ensure the environment's health and guarantee that the system functions as expected and respects its security SLAs.
Existing monitoring solutions for assessing security and performance can still be used in virtualized network environments. Nevertheless, they need to be adapted and correctly controlled, since they were designed mostly for physical rather than virtual systems and boundaries, and they do not allow the fine-grained analysis that cloud and virtualized networks need. The lack of visibility and control over internal virtual networks, and the heterogeneity of the devices used, make many performance assessment applications ineffective. On the one hand, the impact of virtualization on these technologies needs to be assessed. For instance, QoS monitoring applications need to be able to monitor virtual connections. On the other hand, these technologies need to cope with ever-changing contexts and with trade-offs between the costs of monitoring and its benefits. Virtualizing application components makes changes easier, so monitoring applications must keep up with this dynamicity.
Ceilometer, a monitoring solution for OpenStack, provides efficient collection of metering data in terms of CPU and network costs. However, it focuses on creating a single point of contact from which billing systems can acquire all the measurements they need, and it is not designed to take any action to improve the metrics it monitors. Furthermore, it does not consider security issues.
StackTach is another billing-oriented example; it monitors performance and audits OpenStack's Nova component. Similarly, but not specifically oriented to billing, collectd gathers system performance statistics and provides mechanisms to store the collected values.
A recent project from OPNFV, named Doctor, focuses on the creation of a fault management and maintenance framework for high availability of network services on top of virtualized infrastructures.
In terms of security, OpenStack publishes a security guide with best practices determined by cloud operators when deploying their OpenStack solutions. Some tools go further in order to guarantee specific security aspects in OpenStack. For instance, Bandit provides a framework for performing security analysis of Python source code, and Consul is a monitoring tool oriented to service discovery that also performs health checking to prevent routing requests to unhealthy hosts.
Monitoring virtualized applications
To assure end-to-end security in virtualized application components, a monitoring architecture needs to be defined and deployed. This makes it possible to measure and analyse the network and application flows at different observation points, which could include any component of the system, such as physical and virtual machines. The choice of observation point depends on the monitoring objective and on the monitoring administrator, who can be one of the following actors:
- The Cloud Service Provider: Can deploy a monitoring tool (e.g., MMT solution) in its own cloud infrastructure, including servers and routers. It cannot deploy its solution inside a customer's virtual machine or container, but it can offer its customers the option of deploying new VMs or containers from OS images that already integrate a monitoring solution.
- The application owner: Can deploy a monitoring tool (e.g., MMT solution) in each VM or container it deploys. A best practice is to have for each application component a monitoring agent or a set of monitoring agents to observe different behaviours at runtime and check security SLAs.
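To illustrate what a per-component agent checking security SLAs could look like, here is a minimal sketch. It is not the MMT implementation: the metric names, thresholds and the `SlaRule` and `check_sla` helpers are all hypothetical and only show the idea of comparing runtime measurements against SLA objectives.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SlaRule:
    """One security SLA objective, expressed as a predicate over a measured value."""
    metric: str                       # hypothetical metric name
    is_met: Callable[[float], bool]   # True when the objective is respected
    description: str


def check_sla(measurements: Dict[str, float], rules: List[SlaRule]) -> List[str]:
    """Return the descriptions of the rules violated by the measurements.

    A metric that is missing from the measurements counts as a violation,
    because the agent cannot show that the objective is met.
    """
    violations = []
    for rule in rules:
        value = measurements.get(rule.metric)
        if value is None or not rule.is_met(value):
            violations.append(rule.description)
    return violations


# Hypothetical SLA for one application component (illustrative values only).
RULES = [
    SlaRule("tls_min_version", lambda v: v >= 1.2,
            "TLS version must be at least 1.2"),
    SlaRule("failed_logins_per_min", lambda v: v <= 5,
            "At most 5 failed logins per minute"),
]

if __name__ == "__main__":
    sample = {"tls_min_version": 1.0, "failed_logins_per_min": 2}
    for violation in check_sla(sample, RULES):
        print("SLA violation:", violation)
```

Treating a missing measurement as a violation is a design choice of this sketch: an agent that stops reporting should raise attention rather than pass silently.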
Setting up several observation points will help to better diagnose the problems detected. In cloud environments, it is possible to create network monitoring applications that collect information and make decisions based on a network-wide holistic view. This enables centralized event correlation on the network controller, and allows new ways of mitigating network faults.
The monitoring probes can be deployed at different points of the system. Let’s consider a single hardware entity controlled by a hypervisor that manages the virtual machines. A first approach consists of installing the monitoring solution (MMT) in the host system (hypervisor) that operates and administers the virtual machines (see Figure 1), which provides a global view of the whole system. This approach requires less processing power and memory to perform the monitoring operations, since the protection enforcement is located at a central point. Network connections between the host and the virtual machines can be easily tracked, allowing early detection of security and performance issues. The main problem of this approach is the limited visibility that the host machine has inside the virtual machines: it cannot access key parameters such as the internal state, the intercommunication between virtual machines, or the memory content.
Figure 1. Network-based protection
Monitoring probes can also be located in a single privileged virtual machine that is responsible for inspecting and monitoring the rest (see Figure 2). This approach is called Virtual Machine Introspection (VMI). It offers good performance, since the monitoring function is co-located on the same machine as the host it is monitoring, and it leverages a virtual machine monitor to isolate the monitoring function from the monitored host. The monitoring probe analyses the activity of the host through direct observation of the hardware state and through inferences on software state based on a priori knowledge of the software structure. VMI allows the monitoring function to maintain high levels of visibility, evasion resistance (even if the host is compromised) and attack resistance (isolation), and it even enables the manipulation of the state of virtual machines. Unfortunately, VMI-based monitoring software is highly dependent on the particular deployment and requires privileged access that cloud providers need to authorize.
Figure 2. Virtual machine introspection
The approach that offers the best security performance is the deployment of the monitoring tools in every virtual machine. In this way robust protection can be achieved since the security software has a complete view of the internal state of every virtual machine, as well as the interactions with the host or any other virtual machine. Figure 3 shows how this approach can be deployed.
Figure 3. Host-based protection
This third solution offers good performance in terms of security, even though it loses visibility of the hypervisor's behaviour. Here, the processing power and memory required are distributed among the virtual machines. Furthermore, its deployment is simpler than that of the other approaches, since the probe can be included in the software image of the virtual machine and is therefore started automatically when each virtual machine is instantiated, with no further configuration needed. On the other hand, the probes lose control over the physical resources, and it is impossible to monitor what happens at the hardware and hypervisor levels. As an example, consider one physical CPU shared between two virtual machines, VM1 and VM2, assigned to two users, U1 and U2. If the virtualization engine moves CPU power from VM1 to VM2, assigning two time slots to VM2 and one slot to VM1, VM1 will run more slowly but will still perceive 100% CPU. The only way around this is to access and monitor the hypervisor, or to use its APIs if available.
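The CPU example can be made concrete with a little arithmetic. The sketch below is a simplified model, not a measurement of a real hypervisor: it assumes a round-based scheduler and a VM that is busy in every slot it receives, and the `cpu_views` helper is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from typing import Dict, Tuple


def cpu_views(slots_per_round: Dict[str, int], vm: str) -> Tuple[Fraction, Fraction]:
    """Return (guest_view, physical_share) for a VM that is always busy.

    guest_view:     fraction of its own scheduled slots the VM spent working,
                    which is what a probe inside the VM can observe.
    physical_share: fraction of all slots of the physical CPU given to the VM,
                    which is only visible at hypervisor level.
    """
    own = slots_per_round[vm]
    if own <= 0:
        raise ValueError("the VM must receive at least one slot per round")
    total = sum(slots_per_round.values())
    busy = own  # assumption: the VM works during every slot it is given
    return Fraction(busy, own), Fraction(own, total)


if __name__ == "__main__":
    before = {"VM1": 1, "VM2": 1}   # equal sharing of the physical CPU
    after = {"VM1": 1, "VM2": 2}    # the engine gives VM2 two slots, VM1 one
    for label, allocation in (("before", before), ("after", after)):
        guest, share = cpu_views(allocation, "VM1")
        print(f"{label}: VM1 perceives {float(guest):.0%}, "
              f"physical share {float(share):.0%}")
```

In this model VM1's perceived usage stays at 100% in both cases, while its share of the physical CPU drops from one half to one third. That drop is exactly the information an in-guest probe cannot see on its own.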
Even with individual probes installed on each virtual machine, there is a need for a global monitoring coordinator that supervises the monitoring tasks of each probe. For this, each probe must be able to interact directly with any other probe, as well as with the monitoring coordinator. Local decisions can be taken by the individual monitoring probes installed on each virtual machine, while the monitoring coordinator performs coordination, orchestration and complex event detection.
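The split between local decisions and coordinator-level correlation can be sketched as follows. This is an illustrative model only: the `Probe`, `Coordinator` and `Event` classes, the severity scale and the thresholds are assumptions, and probe-to-probe interaction is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Event:
    vm: str
    kind: str       # hypothetical event type, e.g. "port_scan"
    severity: int   # assumed scale: 1 (low) to 5 (critical)


class Coordinator:
    """Collects events from all probes and detects patterns across VMs."""

    def __init__(self, vm_threshold: int = 2) -> None:
        self.vm_threshold = vm_threshold
        self._vms_by_kind: Dict[str, Set[str]] = defaultdict(set)

    def report(self, event: Event) -> Optional[str]:
        """Record an event; alert once the same kind is seen on enough distinct VMs."""
        vms = self._vms_by_kind[event.kind]
        vms.add(event.vm)
        if len(vms) >= self.vm_threshold:
            return f"correlated {event.kind} on {sorted(vms)}"
        return None


class Probe:
    """Runs inside one VM: decides locally, then forwards the event."""

    def __init__(self, vm: str, coordinator: Coordinator, local_limit: int = 4) -> None:
        self.vm = vm
        self.coordinator = coordinator
        self.local_limit = local_limit

    def observe(self, kind: str, severity: int) -> List[str]:
        """Return the actions triggered by one observation."""
        actions = []
        if severity >= self.local_limit:
            # Local decision: no round trip to the coordinator is needed.
            actions.append(f"{self.vm}: local mitigation for {kind}")
        alert = self.coordinator.report(Event(self.vm, kind, severity))
        if alert:
            # Global decision: only the coordinator sees events from several VMs.
            actions.append(alert)
        return actions


if __name__ == "__main__":
    coordinator = Coordinator()
    probe1 = Probe("vm1", coordinator)
    probe2 = Probe("vm2", coordinator)
    print(probe1.observe("port_scan", 2))
    print(probe2.observe("port_scan", 5))
```

A low-severity event on a single VM triggers nothing by itself, but the coordinator raises an alert once the same kind of event has been reported from two different VMs, which no single probe could conclude alone.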