Debug health monitor for a single pool member in F5 LTM


Here’s an old post that shows how to debug bigd that gets you the debugs of all the health monitors that are running on the system. The rule of thumb with debugs is that the files get too large and may have an impact on other important services that may need that extra space.

What if you want to enable the debugs for just one pool member to see what’s going on with the health monitor associated with the pool member?

Monitor logging option is a better approach than debugging the bigd for this purpose.

You can find this setting under Local Traffic > Pools > pool_name > Members > Monitor Logging

Capture

By default, this option is disabled.

Once monitor logging has been enabled and you click on Update, the logs can be found in /var/log/monitors/ where each monitor log file is appended with the pool-member ip address and port as shown below;

# cat /var/log/monitors/Common_site1.lab.com-mon-80-Common_172.16.5.11-80.log

 

Here are a couple of examples showing the debug logs in different situations;

If logging is enabled (and health monitor is failing):

[0][5644] 2017-10-29 10:38:14.127070: ID 22    :(inst_to_service) Logging enabled. [ addr=::ffff:172.16.5.11:80 srcaddr=none ]
[0][5644] 2017-10-29 10:38:17.113836: ID 22    :(_send_active_service_ping): pinging [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:47877 ]
[0][5644] 2017-10-29 10:38:17.113875: ID 22    :(_send_active_service_ping): writing [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:47877 ] send=GET /site1.lab.com.html HTTP/1.1\x0d\x0aHost: site1.lab.com:80\x0d\x0aConnection: close\x0d\x0a\x0d\x0a
[0][5644] 2017-10-29 10:38:17.122313: ID 22    : Response Code: 404 (Not Found) [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:47877 ]
[0][5644] 2017-10-29 10:38:17.122379: ID 22    :(_recv_active_service_ping): Response did not match recv regex yet [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:47877 ]
[0][5644] 2017-10-29 10:38:17.122413: ID 22    :(_main_loop): rfd selected, now=[1509253697.122391][2017-10-29 10:38:17] [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:47877 fd=11 pend=0 #conn=0 ]
[0][5644] 2017-10-29 10:38:17.122427: ID 22    :(_recv_active_service_ping): reading [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:47877 ]
[0][5644] 2017-10-29 10:38:17.122440: ID 22    :(_recv_active_service_ping): read failed [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:47877 ]
[0][5644] 2017-10-29 10:38:17.122483: ID 22    :(_recv_active_service_ping): Response did not match recv regex yet [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:47877 ]
[0][5644] 2017-10-29 10:38:17.122498: ID 22    :(_recv_active_service_ping): Hit EOF [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:47877 ]

If logging is enabled (and health monitor is successful):

[0][5644] 2017-10-29 11:25:01.597539: ID 60    :(_do_ping): time to ping, now=[1509256501.597441][2017-10-29 11:25:01], status=UP [ addr=::ffff:172.16.5.11:80 mon=/Common/site2.lab.com fd=-1 pend=0 #conn=0 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1509256501.533661][2017-10-29 11:25:01] last_ping=[1509256496.576655][2017-10-29 11:24:56] deadline=[1509256517.533661][2017-10-29 11:25:17] on_service_list=True snd_cnt=11 rcv_cnt=11 ]
[0][5644] 2017-10-29 11:25:01.597582: ID 60    :(_send_active_service_ping): pinging [ addr=::ffff:172.16.5.11:80 srcaddr=none ]
[0][5644] 2017-10-29 11:25:01.597601: ID 60    :(_connect_to_service): creating new socket (rd0) [ addr=::ffff:172.16.5.11:80 ]
[0][5644] 2017-10-29 11:25:01.597679: ID 60    :(_connect_to_service): connect: Operation now in progress [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:53135 ]
[0][5644] 2017-10-29 11:25:01.597702: ID 60    :(_do_ping): post ping, status=UP [ addr=::ffff:172.16.5.11:80 mon=/Common/site2.lab.com fd=11 pend=1 #conn=1 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1509256506.533661][2017-10-29 11:25:06] last_ping=[1509256501.597441][2017-10-29 11:25:01] deadline=[1509256517.533661][2017-10-29 11:25:17] on_service_list=True snd_cnt=12 rcv_cnt=11 ]
[0][5644] 2017-10-29 11:25:01.602032: ID 60    :(_main_loop): wfd selected, now=[1509256501.597723][2017-10-29 11:25:01] [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:53135 fd=11 pend=1 #conn=1 ]
[0][5644] 2017-10-29 11:25:01.602067: ID 60    :(_send_active_service_ping): pinging [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:53135 ]
[0][5644] 2017-10-29 11:25:01.602095: ID 60    :(_send_active_service_ping): writing [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:53135 ] send=GET / HTTP/1.1\x0d\x0aHost: site2.lab.com\x0d\x0aConnection: close\x0d\x0a\x0d\x0a
[0][5644] 2017-10-29 11:25:01.602124: ID 60    :(_send_active_service_ping): sent ping [ addr=::ffff:172.16.5.11:80 mon=/Common/site2.lab.com fd=11 pend=0 #conn=0 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1509256506.533661][2017-10-29 11:25:06] last_ping=[1509256501.597441][2017-10-29 11:25:01] deadline=[1509256517.533661][2017-10-29 11:25:17] on_service_list=True snd_cnt=12 rcv_cnt=11 ]
[0][5644] 2017-10-29 11:25:01.605520: ID 60    :(_main_loop): rfd selected, now=[1509256501.602143][2017-10-29 11:25:01] [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:53135 fd=11 pend=0 #conn=0 ]
[0][5644] 2017-10-29 11:25:01.605549: ID 60    :(_recv_active_service_ping): reading [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:53135 ]
[0][5644] 2017-10-29 11:25:01.605580: ID 60    :(_recv_active_service_ping): rcvd 432 bytes: -->HTTP/1.1 200 OK\x0d\x0aContent-Type: text/html\x0d\x0aLast-Modified: Thu, 17 Aug 2017 21:14:10 GMT\x0d\x0aAccept-Ranges: bytes\x0d\x0aETag: "0f29c59d17d31:0"\x0d\x0aServer: Microsoft-IIS/7.5\x0d\x0aDate: Sun, 29 Oct 2017 11:26:52 GMT\x0d\x0aConnection: close\x0d\x0aContent-Length: 191\x0d\x0a\x0d\x0a<html>\x0d\x0a<head>\x0d\x0a\x0d\x0a<style>\x0d\x0abody {\x0d\x0a\x09background-color: ffcc99\x0d\x0a}\x0d\x0a</style>\x0d\x0a</head>\x0d\x0a<body>\x0d\x0a<h1 style="font-size:300%;text-align:center;">site2.lab.com on dc1-win-srv02</h1>\x0d\x0a</body>\x0d\x0a</html><-- [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:53135 ]
[0][5644] 2017-10-29 11:25:01.605734: ID 60    : Response Code: 200 (OK) [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:53135 ]
[0][5644] 2017-10-29 11:25:01.605757: ID 60    :(_recv_active_service_ping): Response matched regex [ addr=::ffff:172.16.5.11:80 srcaddr=::ffff:10.10.46.13%0:53135 ] send=GET / HTTP/1.1\x0d\x0aHost: site2.lab.com\x0d\x0aConnection: close\x0d\x0a\x0d\x0a
 recv=site2.lab.com
disable=
[0][5644] 2017-10-29 11:25:01.606071: ID 60    :(adjust_deadline): xxxxxxxx
[0][5644] 2017-10-29 11:25:01.606110: ID 60    :(_response_success): node was up and is still up [ addr=::ffff:172.16.5.11:80 srcaddr=none mon=/Common/site2.lab.com snd_cnt=12 rcv_cnt=11 ]

 

 

 

 

One thought on “Debug health monitor for a single pool member in F5 LTM

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s