We are monitoring a business application that runs on multiple servers in our infrastructure. A business process sensor is used to verify the overall status. However, the traffic light state of the business process turns red very quickly.
For example: we have 10 servers with 20 sensors configured on each server. The servers are added to one channel in the business process, with an error threshold of 90%. If one server has a single error (1 of the 20 sensors on that specific server), the business sensor turns red. So in this situation, is the whole server treated as unavailable? The threshold is 90%, so one server being marked as "down" appears to be enough to trip it. In practice, the application still works.
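To make the numbers concrete, here is a small Python sketch of the example. It is only a model of what we observe, not the monitoring tool's actual logic: the server names, the position of the failing sensor, and the rule "a server counts as down if any of its sensors fails" are assumptions.

```python
# Hypothetical model of the example: 10 servers, 20 sensors each.
SERVERS = 10
SENSORS_PER_SERVER = 20

# True means the sensor is healthy. Server names are made up.
status = {
    f"server{s:02d}": [True] * SENSORS_PER_SERVER
    for s in range(1, SERVERS + 1)
}
status["server03"][7] = False  # a single failing sensor on one server


def server_level_health(status):
    """Share of servers (in %) on which every sensor is healthy.

    Assumption: one failing sensor marks the whole server as down.
    """
    healthy = sum(all(sensors) for sensors in status.values())
    return 100.0 * healthy / len(status)


def sensor_level_health(status):
    """Share of individual sensors (in %) that are healthy."""
    flat = [ok for sensors in status.values() for ok in sensors]
    return 100.0 * sum(flat) / len(flat)


print(server_level_health(status))  # 90.0
print(sensor_level_health(status))  # 99.5
```

Counted per server, one failing sensor already drops the healthy share to 90%, which is where our threshold sits. Counted per sensor, the same situation is 99.5% healthy.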
A possible solution is to add 20 channels per business sensor and configure separate thresholds for the "important" and "less important" sensors on those servers. But this causes a lot of rework whenever something changes in the infrastructure: when we add one server with 20 sensors, we have to add this server to all the channels.
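The kind of structure we have in mind can be sketched in Python as follows. This is a simplified variation with one channel per importance class instead of 20 channels, and it is not based on any API of the monitoring tool: the sensor names, the importance classes, and the threshold values are all hypothetical. The point is that channel membership is derived from a server list, so adding a server would not require editing every channel by hand.

```python
# Hypothetical mapping: sensor type -> importance class (invented names).
IMPORTANCE = {
    "http": "important",
    "database": "important",
    "disk": "less-important",
    "ping": "less-important",
}

# Hypothetical thresholds: minimum healthy share (in %) for a green channel.
THRESHOLDS = {"important": 100.0, "less-important": 50.0}


def build_channels(servers, importance):
    """Build one channel per importance class, covering all servers."""
    channels = {}
    for server in servers:
        for sensor, cls in importance.items():
            channels.setdefault(cls, []).append((server, sensor))
    return channels


def channel_states(channels, failed, thresholds):
    """Return 'green' or 'red' per channel from its healthy percentage."""
    states = {}
    for cls, members in channels.items():
        healthy = sum(member not in failed for member in members)
        pct = 100.0 * healthy / len(members)
        states[cls] = "green" if pct >= thresholds[cls] else "red"
    return states


servers = ["server01", "server02", "server03"]
channels = build_channels(servers, IMPORTANCE)

# A failing disk check leaves both channels green in this model.
print(channel_states(channels, {("server02", "disk")}, THRESHOLDS))

# A failing http check turns only the "important" channel red.
print(channel_states(channels, {("server02", "http")}, THRESHOLDS))
```

With this layout a new server only has to be appended to `servers`. Whether the channels can be generated this way in the product itself is exactly what we do not know.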
We want to use the traffic light and business sensors. Currently, the monitoring is based on the overall percentage of healthy checks, which says nothing about whether the application actually functions. We are looking for a suitable structure for this. How can we do this?
Thanks in advance!