Palo Alto Networks HA link monitor failover: Monitor Fail Hold Up Time
Hello, good afternoon. Thank you very much for your usual collaboration.
I have a question regarding the HA link monitoring timers.
When HA link monitoring is configured to track some interfaces, the relevant HA timer is Monitor Fail Hold Up Time (ms). Its default is 0 ms in both the Recommended and Aggressive profiles. As I understand it, this means that even the smallest flap the Palo Alto detects on a monitored interface triggers a failover immediately, because the firewall waits 0 milliseconds.
The documentation says:
Monitor Fail Hold Up Time (ms): Interval during which the firewall will remain active following a path monitor or link monitor failure. This setting is recommended to avoid an HA failover due to the occasional flapping of neighboring devices.
https://docs.paloaltonetworks.com/pan-os/9-1/pan-os-admin/high-availability/ha-concepts/ha-timers
Consider a scenario where, as the documentation describes, the switch interface the firewall is connected to occasionally flaps briefly. The HA firewall will detect the link drop and trigger a failover. What is the recommendation here: leave it at 0 ms, or set it to 1000 ms (1 second) or 1500 ms (1.5 seconds)? The idea is that the Palo Alto would wait, confirm that the link has recovered, and not trigger a failover, because the event was only a flap and not a real interface outage, a physical link failure, or a failure of the entire switch.
As always, thank you very much for your support.
I look forward to your reply.
Best regards
By default, it is assumed that neighbours don't just flap. This holds in most situations, as modern hardware is pretty reliable. If you do expect occasional flaps, the best way to determine the hold time is to observe which 'gap' is normal in your environment and at what point a failover would be desirable, which is why there is no recommendation for a specific value. In some cases a 'normal' flap could be 5 ms, while in others it may be 2 seconds.
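
As a rough illustration of "seeing which gap is normal", here is a minimal Python sketch that measures how long each link outage lasted and checks it against a candidate hold time. Everything in it is an assumption made for illustration: the (timestamp, state) event format is hypothetical and is not a PAN-OS log format, the function names are invented, and would_fail_over() only models my reading of the documentation (a failover happens if the failure is still present once the hold-up interval has elapsed). You would need to collect the link up/down timestamps yourself, for example from the switch or firewall logs.

```python
from datetime import datetime, timedelta

ONE_MS = timedelta(milliseconds=1)


def outage_durations_ms(events):
    """Return the length of each outage in whole milliseconds.

    `events` is a hypothetical, time-sorted list of (timestamp, state)
    tuples, where state is "down" or "up". It is not a PAN-OS format.
    """
    durations = []
    down_at = None
    for ts, state in events:
        if state == "down" and down_at is None:
            down_at = ts  # start of an outage
        elif state == "up" and down_at is not None:
            durations.append((ts - down_at) // ONE_MS)  # outage ended
            down_at = None
    return durations


def would_fail_over(outage_ms, hold_up_ms):
    """Assumed model: fail over only if the link is still down when the
    hold-up interval expires. With hold_up_ms = 0 any outage qualifies."""
    return outage_ms >= hold_up_ms


# Made-up sample data: one 5 ms flap and one 2 second outage.
t0 = datetime(2024, 1, 1, 12, 0, 0)
events = [
    (t0, "down"),
    (t0 + timedelta(milliseconds=5), "up"),
    (t0 + timedelta(seconds=60), "down"),
    (t0 + timedelta(seconds=62), "up"),
]

gaps = outage_durations_ms(events)
for hold in (0, 1000):
    triggered = [g for g in gaps if would_fail_over(g, hold)]
    print(f"hold {hold} ms: outages that would fail over: {triggered}")
```

With this sample data, a 0 ms hold time fails over on both events, while a 1000 ms hold time rides through the 5 ms flap and still fails over on the 2 second outage. The useful value for your network is whatever sits above the flaps you consider normal and below the outage length at which you do want a failover.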