Oracle GoldenGate 12.2 Heartbeat Table Issue Resolved

Monitoring lag has always been an important part of managing a GoldenGate environment, and lag is reported in several ways. The GGSCI lag command reports only the most recent lag, at a one-second resolution. This is not very accurate, and it provides no lag history. In the past, GoldenGate implementers have created heartbeat tables manually to work around this.

Oracle GoldenGate 12.2 adds a built-in heartbeat table feature that provides more accurate lag measurements along with a lag history. It works by generating an artificial transaction every minute that carries timing information used for the heartbeats. The heartbeat tables are exposed through views that provide accurate lag data and lag history.

Recently I set up a pair of Oracle databases with Oracle GoldenGate 12.2 in order to test the new heartbeat table feature introduced in version 12.2. The internal heartbeat mechanism is a great improvement in that it provides automatic, accurate lag times and includes a lag history.

I implemented GoldenGate 12.2 between two Oracle 11.2.0.4 databases using the standard parameter files I typically use as a starting point for GoldenGate projects and testing. In addition, database connectivity was set up using the GoldenGate credential store. For this test, I configured GoldenGate to replicate the HR example tables. Unfortunately, the heartbeat mechanism failed to work. This blog entry describes the issue I ran into and potential solutions.

Configuring GoldenGate

In order to configure replication I used the following GLOBALS and extract parameter files:

GLOBALS

GGSCHEMA ggadmin

EXT1HR

------------------------------------

-- Local extract for HR schema

------------------------------------

 

Extract EXT1HR

 

SETENV (NLS_LANG = AMERICAN_AMERICA.AL32UTF8)

 

USERIDALIAS ggadm

 

ReportCount Every 30 Minutes, Rate

Report at 01:00

ReportRollover at 01:15

 

DiscardFile dirrpt/EXT1HR.dsc, Append

DiscardRollover at 02:00 On Sunday

 

DDL INCLUDE MAPPED

TRANLOGOPTIONS EXCLUDEUSER ggadmin

GETTRUNCATES

 

ExtTrail dirdat/la

Table HR.*;

 

At first glance, I thought this should work. The GLOBALS parameter GGSCHEMA ggadmin is required for both DDL replication and the heartbeat table. Unfortunately, this turned out to be part of the problem.

The pump parameter file was configured for passthrough as shown here:

PUMP1HR

------------------------------------

-- Pump extract for HR schema

------------------------------------

 

Extract PUMP1HR

 

PASSTHRU

 

USERIDALIAS ggadm

 

ReportCount Every 1000 Records, Rate

Report at 01:00

ReportRollover at 01:15

 

DiscardFile dirrpt/PUMP1HR.dsc, Append

DiscardRollover at 02:00 ON SUNDAY

 

RmtHost target, MgrPort 7809

 

RmtTrail dirdat/ra

 

Table HR.*;

 

The initial extract and pump were registered using an obey file containing the following commands:

dblogin useridalias ggadm

 

add extract EXT1HR, tranlog, begin now

 

add exttrail dirdat/la, extract EXT1HR, megabytes 100

 

add extract PUMP1HR, exttrailsource dirdat/la

 

add rmttrail dirdat/ra, extract PUMP1HR, megabytes 100

 

On the target, GoldenGate 12.2 was set up using the following GLOBALS and replicat parameter files:

GLOBALS

GGSCHEMA ggadmin

REP1HR

------------------------------------

-- Replicat for HR schema

------------------------------------

 

replicat REP1HR

 

SETENV (NLS_LANG = AMERICAN_AMERICA.AL32UTF8)

 

USERIDALIAS ggadm

 

BATCHSQL

 

HandleCollisions

 

-- Only one of these is used at a time

AssumeTargetDefs

 

ReportCount Every 30 Minutes, Rate

Report at 01:00

ReportRollover at 01:15

 

DiscardFile dirrpt/REP1HR.dsc, Append

DiscardRollover at 02:00 ON SUNDAY

 

Map HR.COUNTRIES, Target HR.COUNTRIES ;

Map HR.DEPARTMENTS, Target HR.DEPARTMENTS ;

Map HR.EMPLOYEES, Target HR.EMPLOYEES ;

Map HR.JOBS, Target HR.JOBS ;

Map HR.JOB_HISTORY, Target HR.JOB_HISTORY ;

Map HR.LOCATIONS, Target HR.LOCATIONS ;

Map HR.REGIONS, Target HR.REGIONS ;

 

This replicat was registered using an obey file that contained the following commands:

dblogin USERIDALIAS ggadm

 

add checkpointtable ggadmin.rep1hr_chkpt

 

add replicat REP1HR, exttrail dirdat/ra, checkpointtable ggadmin.rep1hr_chkpt

 

All of the GoldenGate processes were started and test transactions were run. Once GoldenGate was verified to be running correctly, it was time to set up the 12.2 heartbeat table and see if it worked.

Setting up the Built-In Heartbeat Table

The heartbeat table was set up first on the target system and then on the source; the results are shown here:

ADD HEARTBEATTABLE on target

GGSCI > add heartbeattable

 

2016-02-18 10:00:13  INFO    OGG-14001  Successfully created heartbeat seed table ["GG_HEARTBEAT_SEED"].

2016-02-18 10:00:13  INFO    OGG-14032  Successfully added supplemental logging for heartbeat seed table ["GG_HEARTBEAT_SEED"].

2016-02-18 10:00:13  INFO    OGG-14000  Successfully created heartbeat table ["GG_HEARTBEAT"].

2016-02-18 10:00:13  INFO    OGG-14033  Successfully added supplemental logging for heartbeat table ["GG_HEARTBEAT"].

2016-02-18 10:00:13  INFO    OGG-14016  Successfully created heartbeat history table ["GG_HEARTBEAT_HISTORY"].

2016-02-18 10:00:13  INFO    OGG-14023  Successfully created heartbeat lag view ["GG_LAG"].

2016-02-18 10:00:13  INFO    OGG-14024  Successfully created heartbeat lag history view ["GG_LAG_HISTORY"].

2016-02-18 10:00:13  INFO    OGG-14003  Successfully populated heartbeat seed table with [ORCLS].

2016-02-18 10:00:13  INFO    OGG-14004  Successfully created procedure ["GG_UPDATE_HB_TAB"] to update the heartbeat tables.

2016-02-18 10:00:13  INFO    OGG-14017  Successfully created procedure ["GG_PURGE_HB_TAB"] to purge the heartbeat history table.

2016-02-18 10:00:13  INFO    OGG-14005  Successfully created scheduler job ["GG_UPDATE_HEARTBEATS"] to update the heartbeat tables.

2016-02-18 10:00:13  INFO    OGG-14018  Successfully created scheduler job ["GG_PURGE_HEARTBEATS"] to purge the heartbeat history table.

 

ADD HEARTBEATTABLE on source

GGSCI > add heartbeattable

 

2016-02-18 10:02:36  INFO    OGG-14001  Successfully created heartbeat seed table ["GG_HEARTBEAT_SEED"].

2016-02-18 10:02:37  INFO    OGG-14032  Successfully added supplemental logging for heartbeat seed table ["GG_HEARTBEAT_SEED"].

2016-02-18 10:02:37  INFO    OGG-14000  Successfully created heartbeat table ["GG_HEARTBEAT"].

2016-02-18 10:02:37  INFO    OGG-14033  Successfully added supplemental logging for heartbeat table ["GG_HEARTBEAT"].

2016-02-18 10:02:37  INFO    OGG-14016  Successfully created heartbeat history table ["GG_HEARTBEAT_HISTORY"].

2016-02-18 10:02:37  INFO    OGG-14023  Successfully created heartbeat lag view ["GG_LAG"].

2016-02-18 10:02:37  INFO    OGG-14024  Successfully created heartbeat lag history view ["GG_LAG_HISTORY"].

2016-02-18 10:02:37  INFO    OGG-14003  Successfully populated heartbeat seed table with [ORCLP].

2016-02-18 10:02:37  INFO    OGG-14004  Successfully created procedure ["GG_UPDATE_HB_TAB"] to update the heartbeat tables.

2016-02-18 10:02:37  INFO    OGG-14017  Successfully created procedure ["GG_PURGE_HB_TAB"] to purge the heartbeat history table.

2016-02-18 10:02:37  INFO    OGG-14005  Successfully created scheduler job ["GG_UPDATE_HEARTBEATS"] to update the heartbeat tables.

2016-02-18 10:02:37  INFO    OGG-14018  Successfully created scheduler job ["GG_PURGE_HEARTBEATS"] to purge the heartbeat history table.

 

Once the heartbeat table was created, it should have been a simple matter of going to the target system and querying the ggadmin.gg_heartbeat and ggadmin.gg_heartbeat_history tables to see the automated heartbeats. Unfortunately, at this point there were no rows in these tables. It took a bit of investigation to determine what the issue was.

Debugging the Issue

Since the heartbeat mechanism involves a number of parts, including the extract, pump and replicat, I had to decide where to start. I knew that the heartbeat mechanism takes advantage of the replication that is already configured in order to move its heartbeat information from the source to the target.

To see whether anything was moving in the trail file, I ran the GoldenGate stats command against the extract. I saw only the transactions I had run for my own testing of the replication, which led me to believe the problem was on the source side. I also examined the trail files with logdump and found no heartbeat records in the trail.
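For reference, the logdump session looked roughly like the following. This is a sketch from memory, with the trail file name taken from my configuration; the filter simply restricts output to records for the heartbeat seed table:

```
Logdump> open dirdat/la000000
Logdump> ghdr on
Logdump> detail data
Logdump> filter include filename ggadmin.GG_HEARTBEAT_SEED
Logdump> next
```

With the problem in place, stepping through the trail this way turned up no heartbeat records at all.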

I then checked the database scheduler to see whether the GG_UPDATE_HEARTBEATS job was running. It was, and it was updating the HEARTBEAT_TIMESTAMP column of GG_HEARTBEAT_SEED. Since the scheduler job was running and the column was being updated, yet nothing appeared in the trail file, the problem was most likely in replication itself.
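That scheduler check amounts to a couple of simple queries. This is a sketch, run as a suitably privileged user, against the standard DBA_SCHEDULER_JOBS dictionary view and the seed table itself:

```sql
-- Verify the heartbeat scheduler jobs exist and are enabled
select job_name, enabled, state
  from dba_scheduler_jobs
 where job_name like 'GG_%HEARTBEAT%';

-- Watch the seed table; the timestamp should advance every minute
select heartbeat_timestamp
  from ggadmin.gg_heartbeat_seed;
```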

Looking back at the extract parameter file, it became apparent that this might be related to the TRANLOGOPTIONS EXCLUDEUSER ggadmin parameter. I commented out that parameter, and suddenly the GG_HEARTBEAT and GG_HEARTBEAT_HISTORY tables began populating on the target side. In addition, after a while a stats command against the extract showed updates to the GG_HEARTBEAT_SEED table:

*** Daily statistics since 2016-02-18 00:00:00 ***

        Total inserts                                      0.00

        Total updates                                    603.00

        Total deletes                                      0.00

        Total discards                                     0.00

        Total operations                                 603.00

 

In addition, the gg_lag_history view showed the data I was looking for:

column heartbeat_received_ts format a30

column incoming_path format a40

column incoming_lag format 9.99999999

 

select heartbeat_received_ts, incoming_path, incoming_lag from gg_lag_history;

 

18-FEB-16 11.41.41.893670 AM    EXT1HR ==> PUMP1HR ==> REP1HR            4.868161000

18-FEB-16 11.42.42.944752 AM    EXT1HR ==> PUMP1HR ==> REP1HR            5.925239000

18-FEB-16 11.43.41.993427 AM    EXT1HR ==> PUMP1HR ==> REP1HR            4.970112000

18-FEB-16 11.44.42.041501 AM    EXT1HR ==> PUMP1HR ==> REP1HR            5.021219000

18-FEB-16 11.45.43.091402 AM    EXT1HR ==> PUMP1HR ==> REP1HR            6.065255000

18-FEB-16 11.46.42.140396 AM    EXT1HR ==> PUMP1HR ==> REP1HR            5.121159000

18-FEB-16 11.47.41.189519 AM    EXT1HR ==> PUMP1HR ==> REP1HR            4.136698000

18-FEB-16 11.48.43.240700 AM    EXT1HR ==> PUMP1HR ==> REP1HR            6.215541000

18-FEB-16 11.49.43.289275 AM    EXT1HR ==> PUMP1HR ==> REP1HR            6.268965000

18-FEB-16 11.50.42.338042 AM    EXT1HR ==> PUMP1HR ==> REP1HR            5.312789000

18-FEB-16 11.51.42.386426 AM    EXT1HR ==> PUMP1HR ==> REP1HR            5.368242000

18-FEB-16 11.52.42.435722 AM    EXT1HR ==> PUMP1HR ==> REP1HR            5.412282000

18-FEB-16 11.53.43.487633 AM    EXT1HR ==> PUMP1HR ==> REP1HR            6.470509000

18-FEB-16 11.54.42.538333 AM    EXT1HR ==> PUMP1HR ==> REP1HR            5.523112000

18-FEB-16 11.55.42.588896 AM    EXT1HR ==> PUMP1HR ==> REP1HR            5.563893000

 

The heartbeat mechanism was now working.

Problem Recap

Because I was setting up for bi-directional replication and using the ggadmin user on both source and target, I had configured the EXCLUDEUSER parameter. This keeps GoldenGate from re-replicating already-replicated transactions by ignoring transactions submitted by the ggadmin user. That was fine for normal transactions, but I did not expect the heartbeat transactions to be excluded as well.

EXCLUDEUSER ggadmin caused updates to the GG_HEARTBEAT_SEED table not to be replicated; you cannot exclude the ggadmin user in the extract. The heartbeat schema is defined by the GGSCHEMA parameter in the GLOBALS file, which also defines the schema for DDL replication. This creates a bit of a problem for bi-directional replication when you want to use the ggadmin user for both extract and replicat.

Solutions

I thought about a number of different solutions to this problem and consulted some of my colleagues. We decided that the best approach was simply to use a different Oracle database user for the replicat than for the extract. This allows the same GGSCHEMA user to be kept for heartbeat and DDL replication. The new account used by the replicat is excluded via the TRANLOGOPTIONS EXCLUDEUSER parameter, and everything should work.

This allows the ggadmin user at the source to submit both DDL and heartbeats, since GGSCHEMA can have only one setting, which both features must use. The separate user employed for replicating back to the source is excluded via TRANLOGOPTIONS EXCLUDEUSER &lt;new user&gt;.
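As a sketch of that configuration, suppose a new user named ggrep (a placeholder name of my own, not from the original setup) is created on each database for the replicat to use. The source extract would then exclude that user instead of ggadmin, so the heartbeat seed updates made by ggadmin are still captured:

```
-- GLOBALS (unchanged on both sides)
GGSCHEMA ggadmin

-- Source extract: exclude the replicat's user, not ggadmin
Extract EXT1HR
USERIDALIAS ggadm
TRANLOGOPTIONS EXCLUDEUSER ggrep
DDL INCLUDE MAPPED
GETTRUNCATES
ExtTrail dirdat/la
Table HR.*;
```

The replicat coming back into this database would log in via a credential store alias for ggrep, while the heartbeat jobs continue to run as ggadmin.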

I really would like to see the GoldenGate developers take a look at this and internally allow replication of the GoldenGate lag tables to be excluded from the excludeuser option.

Managing Heartbeat Data

As seen above, the heartbeat table is created via the ADD HEARTBEATTABLE command within GGSCI. By default a heartbeat is generated every minute, and history is retained for 30 days before being purged. The heartbeat frequency, the history retention, and how often the purge process runs are all configurable via the ALTER HEARTBEATTABLE command.
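For example, if I recall the option names correctly (check the GoldenGate reference guide for the exact syntax in your release), a faster heartbeat and a longer retention could be configured like this:

```
GGSCI > dblogin useridalias ggadm
GGSCI > alter heartbeattable, frequency 30, retention_time 45
```

Here FREQUENCY is the number of seconds between heartbeats and RETENTION_TIME is the number of days of history to keep.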

Viewing Heartbeat Data

Viewing the heartbeat data is done via the two heartbeat views, GG_LAG and GG_LAG_HISTORY. These views provide lag information for each extract -&gt; pump -&gt; replicat path that is configured. This information, along with the history, is valuable for monitoring the performance of the GoldenGate configuration.

I view these tables with the following scripts; the scripts and their output are shown here:

lag.sql

col local_database format a10

col current_local_ts format a30

col remote_database format a10

col incoming_path format a30

col incoming_lag format 999,999.999999

 

select local_database, current_local_ts, remote_database, incoming_path, incoming_lag from gg_lag;

 

Output of lag.sql

TARGET on GG16B:ggadmin > @lag

 

LOCAL_DATA CURRENT_LOCAL_TS               REMOTE_DAT INCOMING_PATH                     INCOMING_LAG

---------- ------------------------------ ---------- ------------------------------ ---------------

ORCLS      19-FEB-16 08.23.40.715171 PM   ORCLP       EXT1HR ==> PUMP1HR ==> REP1HR        5.848900

 

lag_history.sql

set pagesize 100

col local_database format a10

col heartbeat_received_ts format a30

col remote_database format a10

col incoming_path format a32

col incoming_lag format 999,999.999999

 

select local_database, heartbeat_received_ts, remote_database, incoming_path, incoming_lag from gg_lag_history;

 

Output of lag_history.sql

ORCLS      19-FEB-16 08.30.40.678817 PM   ORCLP       EXT1HR ==> PUMP1HR ==> REP1HR          3.656291

ORCLS      19-FEB-16 08.31.41.702019 PM   ORCLP       EXT1HR ==> PUMP1HR ==> REP1HR          4.681604

ORCLS      19-FEB-16 08.32.41.724873 PM   ORCLP       EXT1HR ==> PUMP1HR ==> REP1HR          4.700153

ORCLS      19-FEB-16 08.33.41.747616 PM   ORCLP       EXT1HR ==> PUMP1HR ==> REP1HR          4.729306

ORCLS      19-FEB-16 08.34.41.770055 PM   ORCLP       EXT1HR ==> PUMP1HR ==> REP1HR          4.748563

ORCLS      19-FEB-16 08.35.41.793370 PM   ORCLP       EXT1HR ==> PUMP1HR ==> REP1HR          4.769183

ORCLS      19-FEB-16 08.36.41.816100 PM   ORCLP       EXT1HR ==> PUMP1HR ==> REP1HR          4.798405

ORCLS      19-FEB-16 08.37.41.839139 PM   ORCLP       EXT1HR ==> PUMP1HR ==> REP1HR          4.817937

 

The lag history output can be used to monitor lag over long periods of time and can feed alerting and monitoring. The lag history can also be imported into a spreadsheet and graphed.
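As a minimal alerting sketch, a query along these lines could be scheduled to flag recent heartbeats whose lag crossed a threshold; the 300-second limit and one-hour window are arbitrary values of my own choosing:

```sql
-- Report heartbeat paths whose lag exceeded 5 minutes in the last hour
select heartbeat_received_ts, incoming_path, incoming_lag
  from ggadmin.gg_lag_history
 where incoming_lag > 300
   and heartbeat_received_ts > systimestamp - interval '1' hour
 order by heartbeat_received_ts;
```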

[Figure: graph of GoldenGate lag history over time]