Wednesday, July 16, 2014

DB2 for z/OS Buffer Pool Enlargement is NOT Just an Elapsed Time Thing

A couple of weeks ago, I got a question from a mainframe DB2 DBA about the impact of DB2 buffer pool enlargement on application and system performance. This individual had requested an increase in the size of a buffer pool on his system, and the system administration team had agreed to implement the change; however, one of the system administrators told the DBA that while the larger buffer pool would improve elapsed times for application processes accessing objects assigned to the pool, no CPU efficiency gains should be expected.

I am regularly surprised at the persistence of this notion that bigger DB2 for z/OS buffer pools do not drive CPU savings. Let me see if I can set the record straight in clear terms: YES, THEY DO. I'll explain herein why this is so, and I'll provide a means whereby you can measure the CPU -- yes, CPU -- impact of a DB2 buffer pool size increase.

I don't think that anyone would dispute that a larger buffer pool will decrease I/O activity (especially read I/O activity) for objects (table spaces and/or indexes) assigned to the pool. The disagreement is over the impact of I/O activity on the CPU consumption of DB2-accessing applications, and on the CPU consumption of DB2 itself. What I've found is that some people believe that a System z server's I/O assist processors handle ALL of the processing associated with I/O operations (this view seems to be more widely held by people who have been working with mainframes for a long time, perhaps because I/O assist processors were a more talked-about feature of the platform back in the day). This is not true. I/O assist processors offload from general-purpose engines a substantial portion -- but not all -- of the work involved in reading and writing data from and to disk. I/O assist processors are great, and they are one reason that System z has long excelled as a platform for I/O-intensive applications, but general-purpose engines (and zIIP engines, for that matter) still have to shoulder some of the read/write load.

Thus it is that a reduction in I/O activity will reduce CPU consumption on a mainframe system. If you enlarge a DB2 buffer pool (to reduce disk reads and writes) AND you change that buffer pool to be page-fixed in real storage (via -ALTER BUFFERPOOL bpname PGFIX(YES)), you'll get even more in the way of CPU savings, because one of the things that a general-purpose engine typically has to do in support of a DB2 I/O operation is fix in memory the page holding the DB2 buffer in question (the one into which data will be read into or written from) until the I/O action is complete, after which the page is released (i.e., made pageable again). This is done so that the buffer won't be paged out to auxiliary storage in the middle of the I/O operation. When a pool's buffers are fixed in memory from the get-go (true when PGFIX(YES) is in effect), the page-fix/page-release actions formerly needed for I/Os are not required, and CPU consumption is reduced accordingly. In a DB2 10 or 11 for z/OS system, you can get even more CPU efficiency benefits from page-fixed buffer pools, because in those environments DB2 will request that a page-fixed buffer pool be backed by 1 MB page frames, versus 4 KB page frames (the LFAREA parameter of the IEASYSxx member of PARMLIB specifies the amount of a z/OS LPAR's memory that is to be managed in 1 MB frames). The 1 MB page frames save CPU by improving the efficiency of virtual storage to real storage address translation.

OK, on now to measuring the effect of a buffer pool change (such as enlarging a pool, or page-fixing the buffers in a pool) on application and DB2 CPU efficiency. For the application-level CPU effect, use Accounting Long Reports that can be generated by your DB2 for z/OS monitor (depending on the monitor that you use, these might be called Accounting Detail Reports). Input to these reports is the data contained in records generated when DB2 accounting trace classes 1, 2, and 3 are active (these records are typically written to SMF). With those trace classes active (and BEFORE you've implemented the buffer pool change), do the following:
  • Generate an Accounting Long Report for a particular day of the week (e.g., Tuesday) and a particular time period. That time period could capture a "peak" of system activity (e.g., 9-11 AM in the morning), or it might be an entire 24 hours -- go with the FROM and TO times that are of interest to you. You can have the DB2 monitor aggregate information in the report in a variety of ways (using an ORDER or GROUP specification -- or something similar, depending on the monitor that you use -- in the report control statement in the SYSIN part of the JCL for the report-generating job). Use the aggregation level (or levels -- you could choose to generate several reports) of interest to you. Want to see the CPU impact on the overall application workload for the DB2 system? Have the data aggregated at the DB2 subsystem level. Want to see the impact for different subcomponents of the workload (e.g., CICS-DB2 work, DRDA work, call attach facility batch work, etc.)? Have the data aggregated by connection type. Note that, by default, a DB2 monitor will typically aggregate accounting information by primary DB2 authorization ID within DB2 plan name -- that is an aggregation that I usually find to be not very useful.
  • Implement the buffer pool change.
  • Generate an "after" Accounting Long Report, for the same day of the week (e.g., Tuesday) and the same time period (e.g., 9-11 AM) as for the "before" report. Use the same aggregation specification as before (e.g., at the DB2 subsystem level). Looking at the "before" and "after" reports, find the average in-DB2 CPU time (also known as the average class 2 CPU time), which is the average CPU time for SQL statement execution. Note that this time will be in two fields: general-purpose engine time, and "specialty engine" CPU time (this is typically zIIP engine time). Do NOT overlook the specialty engine time -- for some workloads, particularly the DRDA workload that comes through the DB2 DDF address space, specialty engine CPU time can be greater than general-purpose CPU time. See how these CPU times (general-purpose and specialty engine) have changed, and there's your effect at the application level (the "average" is per DB2 accounting trace record -- one of these is usually generated per online transaction, and per batch job). If you requested that the monitor aggregate data at (for example) the connection type level, you will have in the accounting report a sub-report for each connection type (one for the CICS connection type, one for DRDA, one for call attach, etc.), and there will be an average in-DB2 CPU time (again, both a general-purpose engine and a specialty engine time) in each of these sub-reports.

The procedure for measuring the impact of a buffer pool change on DB2's CPU consumption (i.e., on the CPU time charged to DB2 tasks versus tasks associated with DB2-accessing application programs) is similar to what I described above:
  • BEFORE making the buffer pool change, use your DB2 monitor to generate a Statistics Long Report for the subsystem (your monitor might refer to this as a Statistics Detail Report). Input to this report is the data in records generated by the "standard" DB2 statistics trace classes (1, 3, 4, 5, and 6). Use the same day of the week and same time period as for the aforementioned Accounting Long Reports.
  • AFTER making the buffer pool change, generate another Statistics Long Report, for the same day of the week and the same time period as before. In the "before" and "after" reports, find the section of the report in which the CPU times for the DB2 address spaces are provided. Look at the CPU times for the DB2 database services address space (the one most affected by I/O activity -- it handles prefetch reads and database writes), and there's your DB2 CPU impact. I say "look at the CPU times" because you should see both a total CPU time for the address space and a field with a name like "preemptable IIP SRB time." The latter is zIIP engine time, and it is NOT included in the former (reported "total" CPU time is general-purpose engine time).

To summarize this blog entry's message: buffer pool size increases should deliver CPU savings on your system, at both the application level and the DB2 subsystem level, by reducing I/O activity. Those CPU savings can be boosted further by page-fixing pools (usually done most effectively for your higher-activity pools), and page-fixed pools save additional CPU when they are backed by 1 MB page frames (automatic in DB2 10 and DB2 11 environments, when LFAREA in IEASYSxx sets aside some of the LPAR's memory resource to be managed in 1 MB frames). When you've made a buffer pool change that should provide enhanced CPU efficiency for your DB2 applications and subsystem, by all means measure that impact. Your best measurement tool for that purpose is your DB2 monitor, and the Accounting and Statistics Long Reports that it can generate.

I hope that this information will be useful to you.

No comments:

Post a Comment