    Reduce the cardinality of health endpoint metrics (#4650) · 9edfaed6
    Ben Kochie authored
    The health endpoint histogram has high cardinality for such a simple
    endpoint. Introduce a new "Slim" set of buckets for `/health` to reduce
    the metrics load on large deployments, especially those that run
    per-node DNS caching services.
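
    As a rough illustration of why fewer buckets matter: a Prometheus histogram exposes one series per bucket bound, plus the `+Inf` bucket, `_sum`, and `_count`, for every label combination. The sketch below counts those series for two bucket layouts. The helper names and the specific bucket parameters are assumptions chosen for illustration, not values taken from this commit.

    ```go
    package main

    import "fmt"

    // exponentialBuckets returns count upper bounds, starting at start and
    // multiplying by factor at each step. Hypothetical stdlib-only helper.
    func exponentialBuckets(start, factor float64, count int) []float64 {
    	buckets := make([]float64, count)
    	for i := range buckets {
    		buckets[i] = start
    		start *= factor
    	}
    	return buckets
    }

    // seriesPerHistogram returns how many time series one histogram exposes
    // per label combination: one per bucket bound, plus +Inf, _sum and _count.
    func seriesPerHistogram(buckets []float64) int {
    	return len(buckets) + 3
    }

    func main() {
    	// Assumed layouts: a fine-grained set and a coarser "slim" set.
    	full := exponentialBuckets(0.00025, 2, 16)
    	slim := exponentialBuckets(0.00025, 10, 5)

    	fmt.Println("full series per instance:", seriesPerHistogram(full))
    	fmt.Println("slim series per instance:", seriesPerHistogram(slim))
    }
    ```

    With these assumed layouts, each instance exposes 19 series for the fine-grained histogram and 8 for the slim one, and that saving is multiplied by every node that runs the endpoint.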
    
    Add a metric that counts internal health check failures instead of using
    the timeout value as a side-effect indicator of a check error. This avoids
    incorrectly recording the timeout value when the error is not a timeout
    (e.g. connection refused).
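
    The idea of separating failure counting from latency recording can be sketched as follows. This is a minimal stdlib-only sketch, not the code from this commit: `checkStats` and `record` are hypothetical stand-ins for a real counter and histogram.

    ```go
    package main

    import (
    	"errors"
    	"fmt"
    	"time"
    )

    // checkStats is a hypothetical stand-in for a failure counter and a
    // duration histogram.
    type checkStats struct {
    	failures  int             // every failed check, whatever the cause
    	durations []time.Duration // measured latency of each check
    }

    // record stores the measured latency and, if the check failed, bumps the
    // failure counter. The configured timeout is never written as a latency.
    func (s *checkStats) record(elapsed time.Duration, err error) {
    	s.durations = append(s.durations, elapsed)
    	if err != nil {
    		s.failures++
    	}
    }

    func main() {
    	var stats checkStats

    	// A successful check and a fast failure (e.g. connection refused).
    	stats.record(3*time.Millisecond, nil)
    	stats.record(2*time.Millisecond, errors.New("connection refused"))

    	fmt.Println("failures:", stats.failures)
    	fmt.Println("durations:", stats.durations)
    }
    ```

    In this sketch a refused connection shows up as one failure with its real 2 ms latency, rather than as a sample equal to the timeout.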
    Signed-off-by: SuperQ <superq@gmail.com>
plugin.go 3.79 KB