Jobs not processing or taking way too long

A locked table in Analytics database (e.g. Calendar Data Source table) may cause a job to take too long or not get processed at all. After releasing the lock (by killing the blocking/hanging session) the scheduled jobs start to work correctly.

It is common to have locks, but they usually resolve themselves, even though it can take a long time when you have a very slow query running elsewhere. And it can impact the system drastically when that happens. For them to stay there forever, it is very rare.

Troubleshooting

If there are hanging sessions in Greenplum, Analytics DB, you can end up with locked tables. Here are steps to resolve this issue.

  1. Find any sessions that have not (yet) been granted a necessary lock:
    select * from pg_locks where not granted;

    We see here that there are many, all for the same table, which is already suspicious.

  2. We get the table like this:
    select * from pg_locks where relation = 16829341

     

  3. Now we need to see which sessions are holding the locks, i.e. those that have a lock granted:
    select * from pg_locks where relation = 16829341 and granted

    There are two such sessions. Of course, when they complete, they will no longer be an issue. But when we see they remain there for a long time, it points to an issue.

  4. Find what the sessions owning a lock are doing:
    select * from pg_stat_activity where pid = 32370


    Notice how long this session has been up, and that it is in an 'idle' state. Sometimes it is 'active', when it is e.g. running SQL that takes many hours to complete. In the end, such a session will close, and the issue will be resolved automatically, even though such a slow SQL statement would need to be investigated.

    But here it looks like we have a client that disappeared without closing the txn/session, which can happen when the client dies unexpectedly. There should be a timeout for this session in the end, and it could be this is a Greenplum specific issue. Maybe it is just a DB setting that needs to be changed. This is best to be evaluated by TechOps.

  5. Resolve the issue, here by killing the idle session(s):
    select pg_cancel_backen(32370)


    The statement returns 'true', which means the session was cancelled, BUT that is not always the case. You need to check again, and resort to using pg_terminate_backend() if the session does not disappear.
    Here it worked fine. Also for the second session, which was also idle:

     

 

Now there are no more blocking sessions, all locks have been granted:

 

Found an issue in documentation? Write to us.