I thought I would post a simplified version of best practices for
monitoring. If anyone would like to add to it or provide feedback, please
go right ahead. Cheers.
WebSphere MQ Monitoring Tips.
Monitoring, like anything else, is about process. We hope you will find
these tips beneficial regardless of the tool/product
you choose for monitoring WebSphere MQ.
1. Queue Manager
1.1 Set up Queue Manager Down Monitor
1.2 Set up Command Server Down Monitor. Many applications, including
Qflex, depend on the command server being available.
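For 1.1, here is a minimal sketch in Python. It assumes your monitor can run `dspmq` and capture its text output, with one `QMNAME(...) STATUS(...)` line per queue manager. The function name `down_queue_managers` is hypothetical. The check treats any status other than `Running` as down, so it would also report a standby instance of a multi-instance queue manager. Adjust that behaviour to suit your setup.

```python
import re

# Matches lines such as "QMNAME(QM1)    STATUS(Running)" from dspmq output.
DSPMQ_LINE = re.compile(r"QMNAME\((?P<name>[^)]*)\)\s+STATUS\((?P<status>[^)]*)\)")


def down_queue_managers(dspmq_output: str) -> list[str]:
    """Return names of queue managers whose reported status is not 'Running'."""
    down = []
    for line in dspmq_output.splitlines():
        match = DSPMQ_LINE.search(line)
        if match and match.group("status") != "Running":
            down.append(match.group("name"))
    return down


# Illustrative sample text, not captured from a real server.
sample = (
    "QMNAME(QM1)                STATUS(Running)\n"
    "QMNAME(QM2)                STATUS(Ended normally)\n"
)
for name in down_queue_managers(sample):
    print(f"ALERT: queue manager {name} is not running")
```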
1.3 Configure a "TCP connections equal to or greater than" monitor with a
threshold close to the connection limit set in the queue manager's qm.ini
file (for example, the MaxChannels setting in the CHANNELS stanza).
The IBM default is fairly low, and running out of connections is a common
cause of problems. You usually want to increase that value
and monitor for situations where the number of connections is close to
the maximum or has reached it.
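The threshold logic in 1.3 can be sketched as follows. How you obtain the current connection count and the configured maximum depends on your tool, so both are passed in here as plain integers. The 90% warning ratio is an arbitrary assumption that you should tune.

```python
from typing import Optional


def connection_alert(current: int, maximum: int, warn_ratio: float = 0.9) -> Optional[str]:
    """Classify a connection count against the configured maximum.

    Returns "CRITICAL" once the limit is reached, "WARNING" when the count is
    at or above warn_ratio of the limit, and None otherwise.
    """
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    if current >= maximum:
        return "CRITICAL"
    if current >= warn_ratio * maximum:
        return "WARNING"
    return None
```

For example, `connection_alert(95, 100)` returns `"WARNING"`. That gives you time to raise the limit before new connections start being refused.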
2. Important Queues
2.1 SYSTEM.DEAD.LETTER.QUEUE (or whichever queue the queue manager's
DEADQ attribute names). This should be monitored for any depth greater
than 0. Each message on the dead-letter queue is potentially a problem
that a developer might not know about. It is important both to know that
a message arrived on the dead-letter queue and to investigate why.
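Here is a sketch of the 2.1 check. It assumes the monitor captures the runmqsc output of `DISPLAY QLOCAL(SYSTEM.DEAD.LETTER.QUEUE) CURDEPTH`. The parser looks only for the `QUEUE(...)` and `CURDEPTH(...)` fields, and the function names are hypothetical.

```python
import re
from typing import Optional

QUEUE_RE = re.compile(r"\bQUEUE\(([^)]*)\)")
DEPTH_RE = re.compile(r"\bCURDEPTH\((\d+)\)")


def parse_depths(runmqsc_output: str) -> dict[str, int]:
    """Map queue name -> current depth from DISPLAY QLOCAL ... CURDEPTH output."""
    depths: dict[str, int] = {}
    current_queue = None
    for line in runmqsc_output.splitlines():
        queue_match = QUEUE_RE.search(line)
        if queue_match:
            current_queue = queue_match.group(1)
        depth_match = DEPTH_RE.search(line)
        if depth_match and current_queue is not None:
            depths[current_queue] = int(depth_match.group(1))
    return depths


def dlq_alert(depths: dict[str, int], dlq: str = "SYSTEM.DEAD.LETTER.QUEUE") -> Optional[str]:
    """Return an alert message if the dead-letter queue holds any messages."""
    depth = depths.get(dlq)
    if depth is None:
        # No data is itself suspicious: the display may have failed.
        return f"WARNING: no depth reported for {dlq}"
    if depth > 0:
        return f"ALERT: {depth} message(s) on {dlq}"
    return None
```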
2.2 All transmission queues should be monitored. If messages begin
to back up on a transmission queue (xmitq), it might indicate an
ongoing or intermittent problem with the channel that serves it.
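One way to turn 2.2 into a rule is to keep a short history of transmission queue depths, sampled at whatever interval your tool uses. The queue is flagged when the depth has not gone down across the last few samples and is still above zero. The window of 3 samples is an assumption. A busy channel may legitimately hold a small, steady depth, and this rule would flag it, so tune the window per queue.

```python
def is_backing_up(samples: list[int], window: int = 3) -> bool:
    """True if the last `window` depth samples never decrease and end above zero."""
    if len(samples) < window:
        return False  # not enough history yet
    recent = samples[-window:]
    never_drains = all(later >= earlier for earlier, later in zip(recent, recent[1:]))
    return never_drains and recent[-1] > 0
```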
3. Error, Failure and Backout Queues
3.1 Applications have their own error queues. It's good practice
for each application to have at least one reserved
error queue. It is wise to monitor those queues (and any backout queues)
for a depth greater than 0, just as with the dead-letter queue.
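For 3.1, if error and backout queues follow a naming convention, you can pick them out of the depth data by pattern. The `*.ERROR` and `*.BACKOUT` patterns below are hypothetical, so substitute your own naming standard. The input is a plain mapping of queue name to current depth.

```python
import fnmatch

# Hypothetical naming convention; adjust to match your own queue names.
ERROR_QUEUE_PATTERNS = ["*.ERROR", "*.BACKOUT"]


def nonempty_error_queues(depths: dict[str, int],
                          patterns: list[str] = ERROR_QUEUE_PATTERNS) -> list[str]:
    """Return error/backout queues (matched by name pattern) with depth above 0."""
    flagged = []
    for name, depth in sorted(depths.items()):
        # fnmatchcase keeps the match case-sensitive, like MQ object names.
        if depth > 0 and any(fnmatch.fnmatchcase(name, p) for p in patterns):
            flagged.append(name)
    return flagged
```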
4. Application Input Queues
4.1 Monitoring an application's input queue tells us how well the
application is processing messages, and whether it is processing them
at all. Each application will have a specific depth threshold that we
should know about. For example, for some applications it is not abnormal
to have more than a few thousand messages on the queue, whereas for
others anything more than 10 might indicate a serious problem.
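You can keep the per-application thresholds from 4.1 in a simple table. The queue names and limits below are hypothetical. They mirror the two examples in the text: a few thousand messages is normal for one application, while more than 10 is a problem for another.

```python
# Hypothetical queue names and limits; derive real values from each
# application's normal workload.
THRESHOLDS = {
    "BATCH.APP.INPUT": 5000,  # a few thousand messages is normal here
    "ONLINE.APP.INPUT": 10,   # anything above 10 suggests a problem
}


def over_threshold(depths: dict[str, int],
                   thresholds: dict[str, int] = THRESHOLDS) -> dict[str, int]:
    """Return {queue: depth} for monitored queues whose depth exceeds their limit."""
    return {
        name: depths[name]
        for name, limit in thresholds.items()
        if name in depths and depths[name] > limit
    }
```

Queues missing from `depths` are skipped here. Depending on your tool, you might prefer to alert on missing data instead.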
4.2 If an application is supposed to be reading from or writing to a
queue at all times, it is a good idea to monitor the queue's open input
and output counts (the IPPROCS and OPPROCS attributes). If the input
count drops to 0, the application that should be reading from the queue
might have crashed or stopped.
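Here is a sketch of the 4.2 check. It assumes the IPPROCS and OPPROCS values have already been collected per queue, for example from `DISPLAY QSTATUS(...) TYPE(QUEUE)` in runmqsc. The function name and input shape are hypothetical, and a queue missing from the data is treated as having no open handles. An application that opens the queue only briefly for each message may show 0 between messages, so this check suits long-running readers and writers.

```python
def missing_readers_writers(
    status: dict[str, dict[str, int]],
    needs_reader: set[str],
    needs_writer: set[str],
) -> list[str]:
    """Report queues lacking an expected reader (IPPROCS) or writer (OPPROCS)."""
    alerts = []
    for queue in sorted(needs_reader):
        # IPPROCS: number of handles currently open for input (reading).
        if status.get(queue, {}).get("IPPROCS", 0) < 1:
            alerts.append(f"{queue}: not open for input by any application")
    for queue in sorted(needs_writer):
        # OPPROCS: number of handles currently open for output (writing).
        if status.get(queue, {}).get("OPPROCS", 0) < 1:
            alerts.append(f"{queue}: not open for output by any application")
    return alerts
```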
5. Channels
5.1 If an application is supposed to be connected at all times via
an SVRCONN channel, it is a good idea to set up a monitor that detects
when that channel is not running. Like the input and output count
monitors, this can give early warning of a potential problem with the
application.
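For 5.1, here is a sketch that parses `DISPLAY CHSTATUS(*)` output from runmqsc and reports required SVRCONN channels that have no running instance. It assumes the channel name and status appear as `CHANNEL(...)` and `STATUS(...)` fields. The required channel names are hypothetical. A SVRCONN channel with no connected clients normally has no status entry at all, so absence from the output is treated as "not running".

```python
import re

CHANNEL_RE = re.compile(r"\bCHANNEL\(([^)]*)\)")
STATUS_RE = re.compile(r"\bSTATUS\(([^)]*)\)")


def parse_chstatus(runmqsc_output: str) -> list[tuple[str, str]]:
    """Return (channel, status) pairs from DISPLAY CHSTATUS(*) output."""
    pairs = []
    channel = None
    for line in runmqsc_output.splitlines():
        channel_match = CHANNEL_RE.search(line)
        if channel_match:
            channel = channel_match.group(1)
        status_match = STATUS_RE.search(line)
        if status_match and channel is not None:
            pairs.append((channel, status_match.group(1)))
    return pairs


def missing_svrconn(required: set[str], pairs: list[tuple[str, str]]) -> list[str]:
    """Required SVRCONN channels with no instance in RUNNING status."""
    running = {name for name, status in pairs if status == "RUNNING"}
    return sorted(required - running)
```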
5.2 Sender and receiver channels should also be monitored based on
their status, for example alerting when a channel is RETRYING or
STOPPED instead of RUNNING.
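For 5.2, you could sort the reported status values into alert levels. The split between warning and alert statuses below is an assumption to adjust, and the channel names in the example are hypothetical. The input is a mapping of channel name to status as reported by DISPLAY CHSTATUS.

```python
# Assumed grouping of channel statuses; tune to your own operating practice.
OK_STATUSES = {"RUNNING"}
WARN_STATUSES = {"BINDING", "STARTING", "INITIALIZING", "REQUESTING", "PAUSED"}
ALERT_STATUSES = {"RETRYING", "STOPPED", "STOPPING"}


def classify_channel(status: str) -> str:
    """Map a channel status string to OK, WARN, ALERT or UNKNOWN."""
    s = status.upper()
    if s in OK_STATUSES:
        return "OK"
    if s in ALERT_STATUSES:
        return "ALERT"
    if s in WARN_STATUSES:
        return "WARN"
    return "UNKNOWN"


def channel_report(statuses: dict[str, str]) -> dict[str, list[str]]:
    """Group channels that are not OK by alert level."""
    report: dict[str, list[str]] = {"ALERT": [], "WARN": [], "UNKNOWN": []}
    for channel, status in sorted(statuses.items()):
        level = classify_channel(status)
        if level != "OK":
            report[level].append(channel)
    return report
```

A triggered sender channel with nothing to send is normally inactive and has no status entry. It would therefore not appear in the input at all, rather than being reported as a problem.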
6. FDC Files and Error Logs
6.1 The presence of FDC files (written to the errors directory, such as
/var/mqm/errors on UNIX systems) sometimes means that a serious error
has occurred on the server and should be looked into, though that is
not always the case. Sometimes FDC files are generated by minor events
such as a client disconnecting abruptly.
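Here is a sketch for 6.1 that lists FDC files written since the last check. It assumes FDC file names end in `.FDC` in the errors directory (`/var/mqm/errors` on UNIX systems). It also assumes the monitor remembers the time of its previous run. `new_fdc_files` is a hypothetical name.

```python
from pathlib import Path


def new_fdc_files(errors_dir: str, since: float) -> list[Path]:
    """Return *.FDC files in errors_dir modified after the epoch time `since`."""
    directory = Path(errors_dir)
    if not directory.is_dir():
        return []
    # Sorted so repeated runs report files in a stable order.
    return sorted(p for p in directory.glob("*.FDC") if p.stat().st_mtime > since)
```

A monitor might call `new_fdc_files("/var/mqm/errors", since=last_run)` and open a ticket for each file returned. Someone can then decide whether the FDC is serious or a minor event like an abrupt client disconnect.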
6.2 Error log files (AMQERR01.LOG, AMQERR02.LOG and AMQERR03.LOG). It is
a good idea to keep an eye on these log files as well. There are certain
AMQ error codes that definitely should be monitored for. We are working
on a list of the most severe ones and will post it as soon as it is ready.
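Until that list of severe codes is available, here is a sketch of how watched AMQ codes could be counted in an error log. The codes in `WATCHED_CODES` are placeholders only, not a recommended list. The regular expression accepts both `AMQ9999:` and `AMQ9999E:` style prefixes, in case a severity letter follows the number.

```python
import re
from collections import Counter
from pathlib import Path

# Placeholder codes for illustration only; replace with your own watch list.
WATCHED_CODES = {"AMQ9999", "AMQ9526"}

# Matches a message identifier such as "AMQ9999:" or "AMQ9999E:".
AMQ_CODE_RE = re.compile(r"\b(AMQ\d{4})[A-Z]?:")


def count_watched_codes(log_text: str, watched: set[str] = WATCHED_CODES) -> Counter:
    """Count occurrences of watched AMQ codes in error log text."""
    counts: Counter = Counter()
    for match in AMQ_CODE_RE.finditer(log_text):
        code = match.group(1)
        if code in watched:
            counts[code] += 1
    return counts


def scan_log_file(path: str, watched: set[str] = WATCHED_CODES) -> Counter:
    """Read one error log file (e.g. AMQERR01.LOG) and count watched codes."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return count_watched_codes(text, watched)
```

On UNIX, queue manager error logs normally live under /var/mqm/qmgrs/<qmgr>/errors. Because AMQERR01.LOG is rotated into AMQERR02.LOG and AMQERR03.LOG as it fills, rescanning the whole file on every run recounts old entries. A real monitor would probably remember how far it had read.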