Ambari Database Cleanup: How to Speed It Up

I ran the following Ambari db-cleanup against the Ambari database of a 500-node production HDP cluster. It ran for more than 15 hours without success.

ambari-server db-cleanup -d 2016-09-30 --cluster-name=raj_hdp

I analyzed the Ambari server logs to see where the time was going. Most of it was spent on batch deletes against the ambari.alert_notice, ambari.alert_current, and ambari.alert_history tables.

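Because these tables are linked to each other through foreign keys, every delete has to check the constraints, and that is what slows the cleanup down. PostgreSQL does not automatically index the referencing column of a foreign key, so without an index each check may have to scan the whole child table.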
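
A quick way to check this is to ask PostgreSQL how it would find the child rows for a single parent row. The sketch below assumes that history_id is the column in ambari.alert_notice that references ambari.alert_history; the function names and the sample id 1 are only for illustration.

```shell
#!/bin/sh
# Sketch: show the query plan for a lookup by the foreign key column.
# Assumption: history_id is the referencing column in ambari.alert_notice.

# Builds the EXPLAIN statement from a table, a column and a sample id.
fk_lookup_plan_sql() {
  printf 'EXPLAIN SELECT 1 FROM %s WHERE %s = %s;' "$1" "$2" "$3"
}

# Runs the statement with the ambari database login.
show_fk_lookup_plan() {
  psql -U ambari -d ambari -c "$(fk_lookup_plan_sql "$1" "$2" "$3")"
}
```

Calling `show_fk_lookup_plan ambari.alert_notice history_id 1` prints the plan. A "Seq Scan" on a large table suggests that every deleted history row forces a full pass over alert_notice. The planner may still prefer a sequential scan on a small table, so treat the output as a hint rather than proof.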

I first tried disabling the constraints, which did make the cleanup finish faster. To keep the cleanup consistent, though, I left the constraints in place and tried an index on the ambari.alert_notice table instead.


To improve the performance of the db cleanup, I created an index on the ambari.alert_notice table:
-bash-4.1$ psql -U ambari -d ambari
Password for user ambari:
psql (8.4.20)
Type "help" for help.
ambari=> CREATE INDEX alert_notice_idx ON ambari.alert_notice(history_id);
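
If the step has to be repeated, for example after restoring a backup, it may help to create the index only when it is missing. `CREATE INDEX IF NOT EXISTS` only arrived in PostgreSQL 9.5, so this sketch checks the pg_indexes catalog view first. The `PSQL` and `DRY_RUN` variables and the function names are hypothetical helpers, not part of Ambari or PostgreSQL.

```shell
#!/bin/sh
# Sketch: create an index only if pg_indexes does not list it yet.
# Assumptions: PSQL, DRY_RUN and the function names are illustrative.

PSQL="${PSQL:-psql -U ambari -d ambari}"

# Prints the SQL that counts matching entries in pg_indexes.
index_lookup_sql() {
  printf "SELECT count(*) FROM pg_indexes WHERE schemaname = '%s' AND tablename = '%s' AND indexname = '%s';" "$1" "$2" "$3"
}

# Arguments: schema, table, index name, column.
ensure_index() {
  schema=$1 table=$2 index=$3 column=$4
  create_sql="CREATE INDEX ${index} ON ${schema}.${table}(${column});"
  # With DRY_RUN=1, only print the statement that would be executed.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$create_sql"
    return 0
  fi
  # -A (unaligned) and -t (tuples only) make psql print just the count.
  found=$($PSQL -At -c "$(index_lookup_sql "$schema" "$table" "$index")") || return 1
  if [ "$found" = "0" ]; then
    $PSQL -c "$create_sql"
  else
    echo "index ${index} already exists"
  fi
}
```

For the index above, the call would be `ensure_index ambari alert_notice alert_notice_idx history_id`.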

After this, I reloaded my Ambari database from the backup and ran the db-cleanup again. It completed in less than 2 minutes.

To reclaim disk space and rebuild the indexes after the cleanup, I also ran the following commands as the superuser "postgres":
VACUUM FULL;
REINDEX DATABASE ambari;
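
The same two statements can be scripted. Neither VACUUM nor REINDEX DATABASE can run inside a transaction block, and psql wraps several statements passed in one `-c` string into a single transaction, so this sketch issues one `psql -c` call per statement. The `DRY_RUN` variable and the function name are hypothetical, and the sketch assumes the "postgres" role can connect without an interactive prompt.

```shell
#!/bin/sh
# Sketch: run the post-cleanup maintenance statements one at a time.
# Assumptions: DRY_RUN and run_maintenance are illustrative names.

# Argument: the database name (REINDEX DATABASE must name the connected database).
run_maintenance() {
  db=$1
  for sql in "VACUUM FULL;" "REINDEX DATABASE ${db};"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Only print the command that would be executed.
      echo "psql -U postgres -d ${db} -c \"${sql}\""
    else
      # Stop at the first failing statement.
      psql -U postgres -d "$db" -c "$sql" || return 1
    fi
  done
}
```

Running `run_maintenance ambari` executes both statements against the ambari database. VACUUM FULL takes exclusive locks on the tables it rewrites, so it is probably safest to run this while the Ambari server is stopped.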


Note: This was tested on Ambari 2.5.1.
