Re: Enormous transaction tables in Version: 5.0 (Build 14-e)

Amy Kang
 

Hello Will,

On 9/25/17, 6:07 PM, Will Hartung wrote:
We've been running two legs of OpenMQ 5.0 successfully for, heck, years. It's been really hands off.

It almost sounds like your recent issue described below had something to do with PostCres behavior regarding deleted rows/"empty pages" .. especially you mentioned below the "tables had few rows, but large amounts of dead space" and PostGres Vacuum facility was run because of that..

You mentioned below the unexpected large size of the 3rd table which is mqtx41cmqcluster.  This table records all transactions and the broker lazily and periodically purge completed transactions (committed/rolledback) transactions.  PostCres optimization on row deletions and empty space compact likely an area can be helpful to look at.  You can also run 'imqcmd list txn' to see how many transactions in the broker(s).  Are you using MQ cluster ?  if yes,  enhanced cluster or regular ?  Since data in these tables are internal to MQ broker(s),  shouldn't be arbitrarily truncked.

By the way since 5.0,  there have been many bugs fixed,  please upgrade to the latest MQ 5.1.1 release.

Thanks,
amy

We have 2 legs in a cluster.

We had to restart one of the machines, and when OpenMQ stopped, the other leg went in to a recovery mode to take over the loss of the other leg.

The problem was that this was taking absolutely forever. What we didn't at the time realize, was that the MQ was performing some SELECTs on the backing PostGres DB.

PostGres does not return empty pages back to the database after all of the rows are deleted from them, so anything that requires a table scan (in this case I think it was doing a SELECT DISTINCT on a session id or something), will cause the DB to load all of the empty pages.

We have 3 rather large tables:
mqconstate41cmqcluster -- 6GB
mqmsg41cmqcluster -- 6GB
mqtxn41cmqcluster -- 32GB

These tables had few rows (< 1000), but large amounts of dead space.

When we discovered this, we took some down time, and used the PostGres Vacuum facility to recover disk pages and such. We did that a week ago Sunday. We thought these table sizes were anomalies.

However, we had a different failure, ran out of connections on the database which caused a failover, and we see that these tables have exploded again.

Now the first two have few rows -- ~2000
The final one, has over 21M rows.

And our recovery are taking forever and shutting everything down.

Now, sure, we do a lot of JMS transactions, but even though our queues and such are persistent, we keep up, our queue depths are usually very small (< 5) but that 21M number is pretty crazy. All of these numbers are pretty crazy.

I'm tempted to TRUNCATE all of these tables and restart, not particularly caring about any in flight messages and what not at the moment. Just so I can restart.

Why are these tables so large? They literally exploded over a week, I thought the sizes were anomalies but I guess not.

What would happen if I truncated those tables and restarted? I've dredged through the source code, but it's quite difficult to find out what's being done where to these tables.

Thanks!









Join openmq@javaee.groups.io to automatically receive all group messages.