Channel: Maris Elsins – Official Pythian Blog

Be Warned: cmclean.sql Is Dangerous!


I’m sure one of the most popular scripts for Apps DBAs on My Oracle Support is cmclean.sql from MOS Article ID 134007.1 “Concurrent Processing – CMCLEAN.SQL – Non Destructive Script to Clean Concurrent Manager Tables”. DBAs usually use the script to clean up stale data from the concurrent processing tables (FND_CONCURRENT_%) after incidents like a crash of the database or the concurrent processing node. The script sets correct completion phase and status codes for terminated concurrent requests, and sets correct control codes for terminated concurrent manager processes. Despite the reassuring “Non Destructive” claim in the title of the MOS article, it is possible to lose concurrent request schedules when cmclean.sql is executed.

First of all, it’s important to understand how scheduled concurrent requests are executed and resubmitted. A simplified view of the execution is:

  1. Concurrent manager process (e.g. FNDLIBR in case of Standard Manager) queries the FND_CONCURRENT_REQUESTS table for pending requests.
  2. When a pending request is found, the manager process updates it, setting PHASE_CODE='R' (Running) and STATUS_CODE='R' (Running).
  3. The next step is to start the executable of the concurrent program. If it’s a PL/SQL procedure, FNDLIBR connects to the DB and executes the PL/SQL code; if it’s a Java program, FNDLIBR starts up a Java process to execute the Java class; and so on.
  4. FNDLIBR catches the exit codes from the executable of the concurrent program and updates the statuses in FND_CONCURRENT_REQUESTS accordingly – PHASE_CODE=C (Completed) and STATUS_CODE = C (Normal), G (Warning) or E (Error).
  5. FNDLIBR checks if the concurrent request has a schedule and needs to be resubmitted. If yes – it resubmits a new concurrent request with the same parameters.
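The phase and status codes mentioned in the steps above live directly in FND_CONCURRENT_REQUESTS, so the polling in step 1 can be illustrated with a query. This is only a rough sketch of what a manager conceptually looks for, not the exact statement Oracle runs – the real logic also honors queue assignments, specialization rules, and priorities:

```sql
-- Sketch only: requests in the Pending phase that are due to run.
-- Column names are from the standard FND_CONCURRENT_REQUESTS table.
SELECT request_id,
       concurrent_program_id,
       phase_code,            -- P = Pending, R = Running, C = Completed
       status_code,
       requested_start_date
  FROM fnd_concurrent_requests
 WHERE phase_code = 'P'
   AND requested_start_date <= SYSDATE
 ORDER BY priority, requested_start_date;
```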

But what happens if the FNDLIBR process crashes, terminates, or gets killed while it’s running a concurrent request? Who takes care of the statuses in the FND_CONCURRENT_REQUESTS table, and how is the request resubmitted if the concurrent manager process is no longer there?

It appears the Internal Concurrent Manager (ICM) takes care of these tasks. It checks the running requests periodically (every two minutes by default), and if it finds any that are missing both the concurrent manager process and the DB session, it updates the statuses of the concurrent request and also resubmits it if it has a schedule. The action is recorded in the ICM log file:

                   Process monitor session started : 17-JUL-2013 04:24:24

Found running request 5829148 attached to dead manager process.
Setting request status to completed.

Found dead process: spid=(15160), cpid=(2032540), ORA pid=(35), manager=(0/0)

Starting STANDARD Concurrent Manager               : 17-JUL-2013 04:24:25

                     Process monitor session ended : 17-JUL-2013 04:24:25

Interestingly, if the Internal Concurrent Manager is terminated at the same time as the manager process and is restarted later by the reviver process or by running “adcmctl.sh start” manually, the ICM performs the same check of running requests as part of its startup sequence, but this time it restarts the request instead of terminating and resubmitting it. The ICM log contains the following lines:

Found running request 5829146 attached to dead manager process.
Attempting to restart request.

The concurrent request is started again with exactly the same request_id it had before it was terminated, and the log file of the request will contain information from both executions – the 1st, which didn’t complete, and then the 2nd, which probably did. I find this scenario very confusing; instead of restarting the request, it would be better to terminate it and submit a new one.

Let’s get back to the problem with cmclean.sql! The worst thing you can do is run cmclean.sql after a crash of the concurrent processing node, before starting up the concurrent managers. Why? Because cmclean.sql cleans up data in FND_CONCURRENT_REQUESTS with one simple update statement that changes the phase and status of any “Running” or “Terminating” request to “Completed/Error”:

UPDATE fnd_concurrent_requests
SET phase_code = 'C', status_code = 'E'
WHERE status_code ='T' OR phase_code = 'R';

cmclean.sql does not resubmit a request that has a schedule. Execute it and you risk losing some scheduled programs without any warning.

Similarly, never run cmclean.sql if you stopped the concurrent managers using “adcmctl.sh abort” or “kill -9” on the concurrent manager processes to speed up the shutdown procedure. The same risk of losing scheduled requests applies.
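Before running cmclean.sql, it is worth capturing the list of “Running”/“Terminating” requests together with their schedule information, so that any schedules wiped by the update can be recreated by hand afterwards. A sketch along these lines – verify the resubmit column names on your release first, as they are taken from the standard FND_CONCURRENT_REQUESTS definition:

```sql
-- Sketch: snapshot the requests cmclean.sql is about to "complete",
-- including schedule details, BEFORE running the script.
SELECT r.request_id,
       p.user_concurrent_program_name,
       r.phase_code,
       r.status_code,
       r.resubmit_interval,            -- non-NULL usually indicates a periodic schedule
       r.resubmit_interval_unit_code,
       r.resubmit_end_date
  FROM fnd_concurrent_requests   r,
       fnd_concurrent_programs_tl p
 WHERE r.concurrent_program_id = p.concurrent_program_id
   AND p.language = 'US'
   AND (r.phase_code = 'R' OR r.status_code = 'T');
```

Anything returned by this query with schedule information populated is a candidate for manual resubmission after the cleanup.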

Despite the risks, cmclean.sql is still a useful tool in case the concurrent managers don’t come up after a failure or there is stale data that isn’t otherwise cleaned up. But please, be careful when you run it! Closely check the list of requests reported in the following section of the cmclean.sql output, because these requests have to be resubmitted manually if they had schedules.

-- Updating any Running or Terminating requests to Completed/Error

Request ID Phase  Status
---------- ------ ------
6607       R      W
6700       R      W
893534056  R      R

3 rows updated.

“Concurrent Manager Recovery” wizard is even worse! (Added on Jul 21, 2013)

After posting this article, I started wondering whether the “Concurrent Manager Recovery” wizard, available from Oracle Applications Manager in e-Business Suite, was any better than cmclean.sql. As I didn’t have much experience with it, I decided to give it a try. This is what I did:

  1. I scheduled 2 concurrent programs (“CP Java Regression Test” and “CP PLSQL Regression Test”) to restart in 1 minute after the previous execution completes. These are simple test concurrent programs which sleep for some time and then complete.
  2. I made sure both programs were running, then terminated all concurrent manager processes and the DB sessions of these concurrent programs.
  3. The termination of the processes and sessions left the rows in FND_CONCURRENT_REQUESTS with PHASE_CODE=R and STATUS_CODE=R.
  4. I executed the “Concurrent Manager Recovery” wizard, which fixed the status codes of the concurrent manager processes but didn’t touch the statuses of the concurrent requests. I thought this was a good thing (I expected the ICM to clean up the statuses and resubmit the requests during its startup phase).
  5. I started up the concurrent managers, but the ICM didn’t clean up the 2 stale records in the FND_CONCURRENT_REQUESTS table. The 2 requests appeared as if they were running, while in fact they had no OS processes or DB sessions.

I didn’t have much time to look into the details, but it looks like the ICM only cleans up requests attached to dead managers (“Active” status in the FND_CONCURRENT_PROCESSES table and no OS process running). Here, the wizard updated the statuses of the manager processes as if they had completed normally, so the ICM couldn’t identify them as “dead”.
This actually means that the “Concurrent Manager Recovery” wizard can cause serious issues too: it doesn’t clean up the concurrent request statuses, and it prevents the ICM from doing so as well, so once we start up the system the terminated requests appear as if they were running. Because of this, the Conflict Resolution Manager might block the execution of other programs that have incompatibility rules against the terminated requests. You will need to stop the managers and run cmclean.sql to fix the statuses (and lose the schedules) to get out of this situation.

So what should we do to clean up the concurrent processing tables after crashes or cloning? (Added on Jul 21, 2013)

It appears to me that no reliable way exists to clean up the tables properly. cmclean.sql can remove some schedules without warning. The “Concurrent Manager Recovery” wizard may leave some requests in the running state even though they were terminated.
I’m going to open an SR with Oracle to request a proper solution, but meanwhile I’d suggest using cmclean.sql. However, make sure to check its outputs carefully and reschedule any requests that got cleaned up (as described above).


P.S. The description of the ICM behavior in this blog post is the result of an investigation performed on R12.1.3. I believe it behaves the same way in R12.1, and probably in R12.0 and 11i as well, but I didn’t check. MOS Article ID 134007.1, which contains the cmclean.sql script, is valid for Applications versions 10.7 to 12.1.3 – be careful when using it, regardless of the version of your e-Business Suite installation.


There’s Always Another Bug Hiding Just Around the Corner


We were using a 10.2.0.3 database that had been running without any issues for several years. What could possibly go wrong? Anything! Suddenly, we started getting “ORA-07445: exception encountered: core dump [qercoStart()+156] [SIGSEGV] [Address not mapped to object]” a few times a minute in the alert log. A closer investigation revealed that one of the popular SQL statements in the application couldn’t complete anymore. It looked like a bug, since only this one SQL was failing.

We found a few references for various releases with the same symptoms: ORA-07445 + qercoStart(). This list summarizes the possible causes of the error that I found on My Oracle Support:

  • Using ROWNUM < x condition in the where clause
  • Using ROWNUM condition and FULL OUTER joins
  • Using ROWNUM condition with UNION ALL set operation

The strange thing was that this started suddenly; no changes had been made to the code. Moreover, the SQL contained neither FULL OUTER JOIN operations nor UNION ALL set operations:

SELECT CBMD.CBMD_BASE_MDL_NUMBER,
 MFG_GROUP MFG_ID,
 MFG_NAME,
 CBMD_CATALOG_MODEL_NUMBER,
 CBMD_CATALOG_MODEL_SHORT_DESC,
 CBMD_IMAGE_PATH,
 SUM (CSMD_AVAILABLE_QTY) QTY
FROM CATALOG_BASE_MODEL_DATA CBMD,
 CATALOG_BASE_MODEL_CATEGORY CBMC,
 CATALOG_SUB_MODEL_DATA CSMD
WHERE CBMD.CBMD_BASE_MDL_NUMBER = CSMD.CBMD_BASE_MDL_NUMBER
 AND CBMD.CBMD_BASE_MDL_NUMBER = CBMC.CBMD_BASE_MDL_NUMBER
 AND (
 (CSMD.CSMD_AVAILABLE_FOR_SALE_FLAG = 'N'
 AND CBMC.DC_DIVISION_CODE = 1)
 OR (CBMC.DC_DIVISION_CODE = 2)
 )
 AND CBMD_PUT_ON_WEB_FLAG = 'Y'
 AND CSMD_AVAIL_FOR_WEB_DISP_FLAG = 'Y'
 AND CBMC.DC_DIVISION_CODE = :B1
 AND ROWNUM < 5
GROUP BY CBMD.CBMD_BASE_MDL_NUMBER,
 MFG_GROUP,
 MFG_NAME,
 CBMD_CATALOG_MODEL_NUMBER,
 CBMD_CATALOG_MODEL_SHORT_DESC,
 CBMD_IMAGE_PATH,
 CBMD_IMAGE_NAME
ORDER BY QTY DESC

We also tried all the possible workarounds listed in the bug descriptions, but nothing helped:

  • Flushing the shared pool
  • Setting “_complex_view_merging”=false
  • Bouncing the database

As raising an SR for our 10.2.0.3 DB was unlikely to help, I decided to dig deeper. I knew something had changed, and that change was what triggered the bug. I didn’t know where to start, so I decided to look more closely at the bug descriptions in My Oracle Support. All the bugs listed examples of SQL statements containing a “ROWNUM < x” condition. The second similarity was harder to notice. Here are some examples – I’ve highlighted the interesting lines:

  1. from bug 7704557 on 10.2.0.4
    select jsp1.name name , jsp1.value value
    from "SYSJCS".jcs_scheduler_parameters jsp1
    where jsp1.name in ('database_name', 'global_names', 'scheduler_hostname', 'remote_start_port', 'scheduler_connect_string', 'oracle_sid', 'listener_port', 'remote_http_output','remote_http_port')
    and rownum <= 9
    and scheduler_name = nvl (:scheduler, scheduler_name);
    
  2. from bug 7528596 on 10.2.0.3
    SELECT /*+ FIRST_ROWS(200) */ rv.STATUS_NAME H_STATUS_ID
    ,rv.DESCRIPTION H_DESCRIPTION  ,rv.PRIORITY_MEANING H_PRIORITY_CODE
    ,rv.CREATED_BY_NAME H_CREATED_BY  , rv.CREATED_BY_EMAIL H_CREATED_BY_E ,
    H_CREATED_BY_N  , rv.CREATION_DATE H_CREATION_DATE  ,rv.ASSIGNED_TO_NAME
    H_ASSIGNED_TO_USER_ID  , rv.ASSIGNED_TO_EMAIL H_ASSIGNED_TO_USER_ID_E ,
    rv.ASSIGNED_TO_USERNAME H_ASSIGNED_TO_USER_ID_N  ,rv.REQUEST_ID
    H_REQUEST_ID
    FROM itgadm.kcrt_requests_v rv,
         itgadm.kcrt_req_header_details rh WHERE (1=1
    AND (rv.batch_number =1 OR rv.batch_number is null)
    AND rv.REQUEST_TYPE_ID in (30593)
    AND rv.REQUEST_TYPE_ID in (30593)
    AND ( rv.STATUS_CODE NOT LIKE 'CLOSED%'
    AND rv.STATUS_CODE NOT LIKE 'CANCEL%' )
    AND exists(SELECT /*+ NO_UNNEST */
    pcv.REQUEST_ID FROM itgadm.KCRT_PARTICIPANT_CHECK_V pcv WHERE
    pcv.request_id = rv.request_id and pcv.user_id = 30481)
    AND rh.request_id = rv.request_id )
    AND ( rh.PARAMETER22 = 30481 OR rv.ASSIGNED_TO_USER_ID = 30481 )
    AND ROWNUM <= 200
    ORDER BY rv.REQUEST_ID DESC;
  3. from bug 7416171 on 10.2.0.3
    SELECT COUNT(*) FROM (
      SELECT (
        SELECT DECODE(COUNT(*),0,0,1) isStemData
        FROM ccd2.vw_policy_admins q
        WHERE q.system_id = a.system_id
          AND q.policy_no = a.policy_no
          AND ((q.policy_admin_id in (440502405,440502499)))
          AND rownum < 2)isStemData
      FROM ccd2.vw_ordf_partners_cnt a
      WHERE a.PARTNER_ID = 36977489
      /*AND a.PARTNER_ID2 = a.PARTNER_ID */
      AND a.ORDF_ID = 2
      AND 1=1) b
    WHERE isStemData=1
    AND rownum < 300
  4. from bug 3211315 on 9.2.0.4
    select dummy  from
      (SELECT dummy from dual where rownum < 2)  FULL OUTER JOIN
      (SELECT dummy from dual where rownum < 2)
    using (dummy)

The first 3 SQLs contain “IN” or “OR” operators, and the last one contains the FULL OUTER JOIN operation that was said to have issues. Knowing a bit of theory helped me identify some similarities:

  • Oracle introduced the native FULL OUTER JOIN operation in 10.2.0.5. Before that, it was implemented using the UNION ALL operation. (Christian Antognini explains it here and gives some examples.)
  • “OR” and “IN” predicates can sometimes be optimized by the “OR Expansion” transformation, which evaluates each disjunct separately and then combines the result sets using set operations, i.e. UNION ALL. (Maria Colgan explains it here better than anyone else could.)
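To make the second point concrete: conceptually, OR expansion rewrites a disjunction into a concatenation of branches, one per disjunct. A hypothetical illustration on a made-up table `t` (not the actual internal rewrite text the optimizer generates):

```sql
-- Original predicate with OR:
SELECT * FROM t WHERE a = 1 OR b = 2;

-- Roughly what OR expansion turns it into. The LNNVL() on the second
-- branch excludes rows already returned by the first one, so the
-- concatenated result has no duplicates:
SELECT * FROM t WHERE a = 1
UNION ALL
SELECT * FROM t WHERE b = 2 AND LNNVL(a = 1);
```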

At that moment, I started suspecting this could be our case too, because the SQL had an “OR” predicate. It was easy to check and confirm by looking at the execution plan. The highlighted line contained the CONCATENATION operation, which is equivalent to UNION ALL:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------
Plan hash value: 132832423

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                         |   434 | 69874 |    61   (4)| 00:00:01 |
|   1 |  SORT ORDER BY                     |                         |   434 | 69874 |    61   (4)| 00:00:01 |
|   2 |   HASH GROUP BY                    |                         |   434 | 69874 |            |          |
|*  3 |    COUNT STOPKEY                   |                         |       |       |            |          |
|   4 |     CONCATENATION                  |                         |       |       |            |          |
|*  5 |      FILTER                        |                         |       |       |            |          |
|   6 |       NESTED LOOPS                 |                         |     1 |   161 |    30   (0)| 00:00:01 |
|   7 |        NESTED LOOPS                |                         |     7 |  1008 |    23   (0)| 00:00:01 |
|*  8 |         TABLE ACCESS FULL          | CATALOG_SUB_MODEL_DATA  |    21 |   420 |     2   (0)| 00:00:01 |
|*  9 |         TABLE ACCESS BY INDEX ROWID| CATALOG_BASE_MODEL_DATA |     1 |   124 |     1   (0)| 00:00:01 |
|* 10 |          INDEX UNIQUE SCAN         | CBMD_C1_1_PK            |     1 |       |     0   (0)| 00:00:01 |
|* 11 |        INDEX RANGE SCAN            | CBMC_C1_1_PK            |     1 |    17 |     1   (0)| 00:00:01 |
|* 12 |      FILTER                        |                         |       |       |            |          |
|  13 |       NESTED LOOPS                 |                         |     2 |   322 |    29   (0)| 00:00:01 |
|  14 |        NESTED LOOPS                |                         |     7 |  1008 |    22   (0)| 00:00:01 |
|* 15 |         TABLE ACCESS FULL          | CATALOG_SUB_MODEL_DATA  |    20 |   400 |     2   (0)| 00:00:01 |
|* 16 |         TABLE ACCESS BY INDEX ROWID| CATALOG_BASE_MODEL_DATA |     1 |   124 |     1   (0)| 00:00:01 |
|* 17 |          INDEX UNIQUE SCAN         | CBMD_C1_1_PK            |     1 |       |     0   (0)| 00:00:01 |
|* 18 |        INDEX RANGE SCAN            | CBMC_C1_1_PK            |     1 |    17 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

A quick Google search turned up the NO_EXPAND hint, which disables OR expansion. However, I couldn’t use it, since it would have required a code change. I knew that the behavior of the optimizer is controlled by a large number of hidden parameters, which are also listed in the 10053 trace:

SQL> ALTER SESSION SET EVENTS '10053 trace name context forever,level 1';

Session altered.

SQL> alter session set tracefile_identifier=CR758708_2;

Session altered.

SQL> alter session set max_dump_file_size=unlimited;

Session altered.

SQL> explain plan for
 SELECT CBMD.CBMD_BASE_MDL_NUMBER,
 /*removed some lines for readability*/
Explained.

SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

$ more test_ora_17805_CR758708_2.trc
/*removed some lines for readability*/
...
***************************************
PARAMETERS USED BY THE OPTIMIZER
********************************
...
 *************************************
 PARAMETERS WITH DEFAULT VALUES
 ******************************
...
 _fast_full_scan_enabled = true
 _optim_enhance_nnull_detection = true
 _parallel_broadcast_enabled = true
 _px_broadcast_fudge_factor = 100
 _ordered_nested_loop = true
 _no_or_expansion = false
 optimizer_index_cost_adj = 100
 optimizer_index_caching = 0
 _system_index_caching = 0
 _disable_datalayer_sampling = false
...

I disabled the OR expansion by setting the parameter _no_or_expansion = true, checked the execution plan, and confirmed that the query transformation didn’t happen:

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
Plan hash value: 1045847658

-----------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                         |     4 |   644 |   324   (3)| 00:00:04 |
|   1 |  SORT ORDER BY            |                         |     4 |   644 |   324   (3)| 00:00:04 |
|   2 |   HASH GROUP BY           |                         |     4 |   644 |   324   (3)| 00:00:04 |
|*  3 |    COUNT STOPKEY          |                         |       |       |            |          |
|*  4 |     HASH JOIN             |                         |   434 | 69874 |   322   (2)| 00:00:04 |
|*  5 |      TABLE ACCESS FULL    | CATALOG_SUB_MODEL_DATA  |  3777 | 75540 |    62   (4)| 00:00:01 |
|*  6 |      HASH JOIN            |                         |  3087 |   425K|   260   (2)| 00:00:04 |
|*  7 |       INDEX FAST FULL SCAN| CBMC_C1_1_PK            |  1759 | 29903 |    13   (8)| 00:00:01 |
|*  8 |       TABLE ACCESS FULL   | CATALOG_BASE_MODEL_DATA |  2685 |   325K|   247   (2)| 00:00:03 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(ROWNUM<5)
   4 - access("CBMD"."CBMD_BASE_MDL_NUMBER"="CSMD"."CBMD_BASE_MDL_NUMBER")
       filter("CSMD"."CSMD_AVAILABLE_FOR_SALE_FLAG"='N' AND "CBMC"."DC_DIVISION_CODE"=1 OR
              "CBMC"."DC_DIVISION_CODE"=2)
   5 - filter("CSMD_AVAIL_FOR_WEB_DISP_FLAG"='Y')
   6 - access("CBMD"."CBMD_BASE_MDL_NUMBER"="CBMC"."CBMD_BASE_MDL_NUMBER")
   7 - filter(("CBMC"."DC_DIVISION_CODE"=1 OR "CBMC"."DC_DIVISION_CODE"=2) AND
              "CBMC"."DC_DIVISION_CODE"=TO_NUMBER(:B1))
   8 - filter("CBMD_PUT_ON_WEB_FLAG"='Y')

28 rows selected.

In our case, the optimizer had changed the execution plan after fresh statistics were collected – this was the change that triggered the bug. We set the parameter to disable OR expansion until we upgrade to 11.2.
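For reference, the ways of disabling the transformation look roughly like this. The hint requires a code change, and underscore parameters should only ever be set after confirming the approach with Oracle Support:

```sql
-- Option 1: per statement, requires editing the SQL text
SELECT /*+ NO_EXPAND */ cbmd.cbmd_base_mdl_number /* ... rest of the query ... */
  FROM catalog_base_model_data cbmd /* ... */;

-- Option 2: for one session only, useful to test the effect first
ALTER SESSION SET "_no_or_expansion" = TRUE;

-- Option 3: instance-wide, the kind of workaround we applied until the upgrade
ALTER SYSTEM SET "_no_or_expansion" = TRUE SCOPE=BOTH;
```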

I wanted to share this story because it’s interesting how different things (IN and OR predicates, UNION ALL and FULL OUTER JOIN operations, etc.) transform behind the scenes into the same conditions and trigger the same bug. I think this incident has also changed the way I’ll read bug descriptions on My Oracle Support in the future – there is information hidden between the lines.

Mining the AWR to Identify Performance Trends


Sometimes it’s useful to check how the performance of a SQL statement changes over time. The diagnostic pack features provide some really useful information to answer such questions. The data is there, but it’s not always easy to retrieve, especially if you want to see how the performance changes over time. I’ve been using three really simple scripts to retrieve this information from the AWR. These scripts help me answer the following questions:

  • How does the performance of a particular SQL change over time?
  • How do wait times of a particular wait event change over time?
  • How does a particular statistic change over time?


Please note: the scripts provided here require Diagnostic Pack licenses, and it is your responsibility to make sure you have them before running the scripts.

SQL performance

I use the script awr_sqlid_perf_trend.sql to check how the performance of a SQL statement changes over time. The script summarizes the data from DBA_HIST_SQLSTAT and reports the average statistics for a single execution of the query during the reporting interval. It requires 3 input parameters:

  1. SQL ID
  2. Days to report. It summarizes all AWR snapshots starting with “trunc(sysdate)-{days to report}+1”, so if you pass “1”, it summarizes all of today’s snapshots; if “2”, then yesterday and today are included.
  3. Interval in hours. “24” will produce one row per day; “6” will give four rows a day.
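The script itself isn’t reproduced in the post, but the core of such a query is a join of DBA_HIST_SQLSTAT to DBA_HIST_SNAPSHOT, summing the *_DELTA columns per time bucket. A minimal reconstruction of the idea – my own sketch that buckets by day, not the actual awr_sqlid_perf_trend.sql:

```sql
-- Sketch: average elapsed/CPU time per execution of one SQL_ID, bucketed by day.
-- The _DELTA columns already hold per-snapshot increments, so a plain SUM works.
SELECT TRUNC(sn.begin_interval_time)                    AS interval_start,
       SUM(st.executions_delta)                         AS executions,
       ROUND(SUM(st.elapsed_time_delta) / 1e6
             / NULLIF(SUM(st.executions_delta), 0), 3)  AS elapsed_s_1exec,
       ROUND(SUM(st.cpu_time_delta) / 1e6
             / NULLIF(SUM(st.executions_delta), 0), 3)  AS cpu_s_1exec,
       ROUND(SUM(st.buffer_gets_delta)
             / NULLIF(SUM(st.executions_delta), 0))     AS buffer_gets_1exec
  FROM dba_hist_sqlstat  st,
       dba_hist_snapshot sn
 WHERE st.snap_id         = sn.snap_id
   AND st.dbid            = sn.dbid
   AND st.instance_number = sn.instance_number
   AND st.sql_id          = '&sql_id'
   AND sn.begin_interval_time >= TRUNC(SYSDATE) - &days + 1
 GROUP BY TRUNC(sn.begin_interval_time)
 ORDER BY 1;
```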

Nothing shows it better than an example. Below you can see how I check the execution statistics for sql_id fd7rrqkn1k2xb by summarizing the AWR information captured in the last 2 weeks and reporting the average values over 2-day intervals. Then I take a closer look at the last 4 days for the same SQL by summarizing the data over 6-hour intervals. Note that the TIME column shows the beginning of the interval.


TIME                 EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
16.10.2013 00:00:00         351              195.571           74.995           .097           .000           .000           .000           134417.570      21319182.291        293731.556          304434.305
18.10.2013 00:00:00         364               91.225           47.474          1.687           .000           .000           .002           141140.228      20364053.544        270107.745          273343.709
20.10.2013 00:00:00         542               20.686            9.378           .004           .000           .000           .000           146436.875       4597922.220             3.168                .000
22.10.2013 00:00:00         531               25.060           12.086           .161           .000           .000           .000           146476.605       6026729.224         23999.684           23998.859
24.10.2013 00:00:00         542               51.611           40.675          1.880           .000           .000           .000           146814.220      21620264.039        287994.862          287994.701
26.10.2013 00:00:00         534               39.949           26.688          1.050           .000           .000           .000           147099.275      14081016.607        159704.463          159704.418
28.10.2013 00:00:00         245               37.837           29.384          1.150           .000           .000           .000           147135.216      15505533.959        179244.437          179244.367

7 rows selected.

 


TIME                 EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
26.10.2013 00:00:00          72               19.209            9.439           .000           .000           .000           .000           147076.000       4623816.597              .111                .000
26.10.2013 06:00:00          72               15.391            9.401           .000           .000           .000           .000           147086.403       4624153.819              .000                .000
26.10.2013 12:00:00          72               14.022            9.351           .000           .000           .000           .000           147099.000       4624579.639              .000                .000
26.10.2013 18:00:00          55               48.174           35.723          1.575           .000           .000           .000           147099.000      19192781.582        243584.055          243584.055
27.10.2013 00:00:00          72               76.723           43.350          2.116           .000           .000           .000           147099.000      23258326.875        314445.111          314445.111
27.10.2013 06:00:00          72               64.921           43.914          2.084           .000           .000           .000           147107.542      23258506.028        315673.000          315673.000
27.10.2013 12:00:00          72               52.567           43.383          2.041           .000           .000           .000           147116.000      23258739.403        315673.000          315673.000
27.10.2013 18:00:00          47               25.522           18.095           .523           .000           .000           .000           147117.532       9382873.851         80597.702           80597.362
28.10.2013 00:00:00          65               17.645            9.384           .000           .000           .000           .000           147120.000       4625354.262              .000                .000
28.10.2013 06:00:00          19               17.571            9.451           .000           .000           .000           .000           147122.421       4625411.263              .000                .000
28.10.2013 12:00:00           6               14.083            9.645           .000           .000           .000           .000           147208.167       4629315.167              .000                .000
28.10.2013 18:00:00          48               42.173           35.208          1.509           .000           .000           .000           147236.375      18606643.833        229433.750          229433.750
29.10.2013 00:00:00          72               53.015           43.517          2.022           .000           .000           .000           147245.125      23265547.847        314507.319          314507.083
29.10.2013 06:00:00          30               52.181           43.638          1.932           .000           .000           .000           147250.300      23265839.767        303949.000          303949.000
29.10.2013 12:00:00           5               59.576           43.836          1.177           .000           .000           .000           144049.800      23267109.200        227814.000          227814.000

15 rows selected.

I checked this SQL because users had reported inconsistent performance, and it can be observed in the outputs above. Take a look! The number of rows processed in each execution of the SQL doesn’t change – it’s always around 147K – but look at the disk reads and the direct writes! These values hover around zero, then suddenly jump up to 300K, and when they do, the buffer gets increase too and the CPU time goes up from 9 seconds to 43. Based on this information, it looks like two different execution plans are involved, and bind variable peeking could be causing one or the other plan to become active.
Additionally, you can use the same script to check how the execution statistics of a SQL change over time. Does the elapsed time increase? Does the number of processed rows or buffer gets per execution change?

Wait event performance

The script awr_wait_trend.sql can be used to show the changes in wait counts and wait durations of a particular event over time. Like the previous script, it requires 3 parameters; only instead of a SQL ID, you pass the name of the wait event. This time the data comes from DBA_HIST_SYSTEM_EVENT.

I typically use this script in two situations:

  • To check whether a particular wait event performs worse when an overall performance problem is reported (usually I’m looking at IO events)
  • To illustrate how an implemented change improved the situation
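The idea behind such a script can be sketched as follows. Note that, unlike the *_DELTA columns of DBA_HIST_SQLSTAT, the values in DBA_HIST_SYSTEM_EVENT are cumulative since instance startup, so per-snapshot deltas have to be computed first. This is my own sketch, not the actual awr_wait_trend.sql:

```sql
-- Sketch: daily waits and average wait time for one event.
-- LAG() converts the cumulative counters into per-snapshot deltas.
SELECT TRUNC(begin_time)                          AS interval_start,
       SUM(waits_delta)                           AS total_waits,
       ROUND(SUM(time_delta_us) / 1e6, 3)         AS total_time_s,
       ROUND(SUM(time_delta_us) / 1000
             / NULLIF(SUM(waits_delta), 0), 3)    AS avg_time_ms
  FROM (SELECT sn.begin_interval_time AS begin_time,
               ev.total_waits
                 - LAG(ev.total_waits) OVER
                     (PARTITION BY ev.instance_number
                      ORDER BY ev.snap_id)         AS waits_delta,
               ev.time_waited_micro
                 - LAG(ev.time_waited_micro) OVER
                     (PARTITION BY ev.instance_number
                      ORDER BY ev.snap_id)         AS time_delta_us
          FROM dba_hist_system_event ev,
               dba_hist_snapshot     sn
         WHERE ev.snap_id         = sn.snap_id
           AND ev.dbid            = sn.dbid
           AND ev.instance_number = sn.instance_number
           AND ev.event_name      = '&event_name')
 WHERE waits_delta >= 0   -- discard buckets that span an instance restart
 GROUP BY TRUNC(begin_time)
 ORDER BY 1;
```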

The example below shows how the performance of the log file parallel write event changed over 3 weeks. On October 19th, we moved the redo logs to dedicated high-performance LUNs. Before that, the 2 members of each redo log group were located on a saturated LUN, together with all the data files.


TIME                EVENT_NAME                       TOTAL_WAITS   TOTAL_TIME_S    AVG_TIME_MS
------------------- ---------------------------- --------------- -------------- --------------
09.10.2013 00:00:00 log file parallel write              4006177      31667.591          7.905
10.10.2013 00:00:00 log file parallel write              3625342      28296.640          7.805
11.10.2013 00:00:00 log file parallel write              3483249      31032.324          8.909
12.10.2013 00:00:00 log file parallel write              3293462      33351.490         10.127
13.10.2013 00:00:00 log file parallel write              2871091      36413.925         12.683
14.10.2013 00:00:00 log file parallel write              3763916      30262.718          8.040
15.10.2013 00:00:00 log file parallel write              3018760      28262.172          9.362
16.10.2013 00:00:00 log file parallel write              3303205      31062.276          9.404
17.10.2013 00:00:00 log file parallel write              3012105      31831.491         10.568
18.10.2013 00:00:00 log file parallel write              2692697      26981.966         10.020
19.10.2013 00:00:00 log file parallel write              1038399        512.950           .494
20.10.2013 00:00:00 log file parallel write               959443        427.554           .446
21.10.2013 00:00:00 log file parallel write              1520444        606.580           .399
22.10.2013 00:00:00 log file parallel write              1618490        655.873           .405
23.10.2013 00:00:00 log file parallel write              1889845        751.216           .398
24.10.2013 00:00:00 log file parallel write              1957384        760.656           .389
25.10.2013 00:00:00 log file parallel write              2204260        853.691           .387
26.10.2013 00:00:00 log file parallel write              2205783        856.731           .388
27.10.2013 00:00:00 log file parallel write              2033199        785.785           .386
28.10.2013 00:00:00 log file parallel write              2439092        923.368           .379
29.10.2013 00:00:00 log file parallel write              2233614        840.628           .376

21 rows selected.

Visualizing the data from output like that is easy too!

Creating a graph has never been easier

System Statistics

The last script from this set is awr_stat_trend.sql. It does for the system statistics collected in DBA_HIST_SYSSTAT what the previous scripts did for the performance of SQLs and wait events. The parameters are similar again – the name of the system statistic, the days to report, and the interval. I usually use the query to check how the redo size or the number of physical reads changes over time, but there’s a huge number of statistics available (638 different statistics in 11.2.0.3), so I’m sure you’ll find your own reasons to use this script.
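The underlying idea is the same delta computation, just against DBA_HIST_SYSSTAT, whose VALUE column is cumulative since instance startup. A minimal sketch (again my own simplified query, not the actual awr_stat_trend.sql):

```sql
-- Per-snapshot deltas of a cumulative statistic ('redo size' here);
-- the first row per instance, and rows right after a restart,
-- come out NULL or negative and would need filtering.
select sn.end_interval_time time, st.stat_name,
       st.value - lag(st.value) over
         (partition by st.dbid, st.instance_number
          order by st.snap_id) value_delta
from dba_hist_sysstat st
     join dba_hist_snapshot sn
       on  sn.snap_id = st.snap_id
       and sn.dbid = st.dbid
       and sn.instance_number = st.instance_number
where st.stat_name = 'redo size'
order by st.instance_number, st.snap_id;
```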


TIME                STAT_NAME                             VALUE
------------------- ------------------------- -----------------
27.10.2013 00:00:00 redo size                        1739466208
27.10.2013 04:00:00 redo size                        2809857936
27.10.2013 08:00:00 redo size                         648511376
27.10.2013 12:00:00 redo size                         533287888
27.10.2013 16:00:00 redo size                         704832684
27.10.2013 20:00:00 redo size                         819854908
28.10.2013 00:00:00 redo size                        2226799060
28.10.2013 04:00:00 redo size                        3875182764
28.10.2013 08:00:00 redo size                        1968024072
28.10.2013 12:00:00 redo size                        1125339352
28.10.2013 16:00:00 redo size                        1067175300
28.10.2013 20:00:00 redo size                         936404908
29.10.2013 00:00:00 redo size                        1758952428
29.10.2013 04:00:00 redo size                        3949193948
29.10.2013 08:00:00 redo size                        1715444632
29.10.2013 12:00:00 redo size                        1008385144
29.10.2013 16:00:00 redo size                         544946804

17 rows selected.

redo size


AWR is a gold mine, but you need the right tools for digging. I hope the scripts will be useful for you too!
P.S. You might have noticed the scripts are published on GitHub. Let me know if you find any issues using them, and perhaps one day you’ll find new versions of the scripts there.


Update (4-Nov-2013)

I’ve added the instance numbers to the outputs in all three scripts. This is how it looks now:


 INST TIME                 EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC
----- ------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
    1 28.10.2013 00:00:00         840                 .611             .014           .595           .007           .000           .000                1.000          1085.583           128.724                .000
      30.10.2013 00:00:00        1466                 .491             .011           .479           .005           .000           .000                1.000           976.001            88.744                .000
      01.11.2013 00:00:00         542                 .798             .023           .760           .025           .000           .000                1.000           896.978           114.196                .000
      03.11.2013 00:00:00         544                 .750             .021           .719           .017           .000           .000                1.000          1098.213           134.941                .000

    2 28.10.2013 00:00:00        1638                 .498             .017           .474           .013           .000           .000                1.001           953.514            96.287                .000
      30.10.2013 00:00:00        1014                 .745             .022           .712           .019           .000           .000                1.000          1034.249           131.057                .000
      01.11.2013 00:00:00        1904                 .633             .011           .624           .002           .000           .000                1.000          1045.668           104.568                .000
      03.11.2013 00:00:00         810                 .602             .017           .581           .010           .000           .000                1.000           929.778           108.998                .000


8 rows selected.

Meaning of “Disk Reads” Values in DBA_HIST_SQLSTAT


This post relates to my previous writing on mining the AWR. I noticed that it’s very easy to misinterpret the DISK_READS_TOTAL and DISK_READS_DELTA columns in DBA_HIST_SQLSTAT. Let’s see what the documentation says:

  • DISK_READS_TOTAL – Cumulative number of disk reads for this child cursor
  • DISK_READS_DELTA – Delta number of disk reads for this child cursor

You might think it’s clear enough and that’s exactly what I thought too. The number of disk reads is the number of IO requests to the storage. But is it really true?
I started suspecting something was not right after using my own awr_sqlid_perf_trend.sql script (see more details on this script here). I noticed the DISK_READS_DELTA values were too close to the BUFFER_GETS_DELTA values for queries that use full table scans, which are normally executed using multi-block IO requests to the storage. I was expecting disk reads to be at least two times lower than the buffer gets, but in a few cases the ratio was closer to 90%. So was I looking at the number of IO requests or the number of blocks read from disks? The best way to find out was a test case.
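If you want to check for the same symptom in your own database, a query like this shows how close the two columns are per snapshot (a hedged sketch of my own; &sql_id is a substitution variable):

```sql
-- Compare disk reads to buffer gets for one SQL across snapshots.
-- For a multi-block full scan you'd expect IO *requests* to be far
-- lower than buffer gets; a ratio near 100% suggests DISK_READS
-- is counting blocks rather than requests.
select snap_id, instance_number,
       disk_reads_delta, buffer_gets_delta,
       round(100 * disk_reads_delta /
             nullif(buffer_gets_delta, 0), 1) reads_to_gets_pct
from dba_hist_sqlstat
where sql_id = '&sql_id'
order by snap_id;
```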
The following testing was done in an 11.2.0.3 database:

  1. I created a new AWR snapshot and enabled tracing for my session. I made sure the db_file_multiblock_read_count parameter was set to a high value and then executed a SQL that was forced to use a full table scan (FTS) to read the data from disks. Another AWR snapshot was taken after that.
    SQL> alter session set tracefile_identifier='TEST1';
    Session altered.
    
    SQL> show parameter multiblock
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    db_file_multiblock_read_count        integer     26
    
    SQL> alter system set db_file_multiblock_read_count=128;
    System altered.
    
    SQL> show parameter multiblock
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    db_file_multiblock_read_count        integer     128
    
    SQL> exec dbms_workload_repository.create_snapshot();
    PL/SQL procedure successfully completed.
    
    SQL> alter session set max_dump_file_size=unlimited;
    Session altered.
    
    SQL> alter system set events '10046 trace name context forever, level 12';
    System altered.
    
    SQL> select /*+ full(a) */ count(I_DATA) from tpcc.item;
    COUNT(I_DATA)
    -------------
           100000
    
    SQL> exec dbms_workload_repository.create_snapshot();
    PL/SQL procedure successfully completed.
    
  2. I found the sql_id in the trace file (it was 036c3dmx2u3x9) and executed the awr_sqlid_perf_trend.sql to find out how many disk reads were made (I removed a few columns that are not important here).
    SQL> @awr_sqlid_perf_trend.sql 036c3dmx2u3x9 20 0.001
    
     INST TIME                BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC  EXECUTIONS ROWS_PROCESSED_1EXEC
    ----- ------------------- ----------------- ----------------- ------------------- ----------- --------------------
        1 24.10.2013 03:53:06          1092.000          1073.000                .000           1                1.000
    

    It was a single execution – and look at the numbers! 1073 disk reads and 1092 buffer gets. Could it be that DISK_READS_DELTA is actually the number of blocks read from disks? I needed to check the raw trace file to find out.

  3. I found the following lines in the trace file. I’ve highlighted all the lines that report waits on physical IO. Notice the first query (sqlid='96g93hntrzjtr') is a recursive SQL (dep=1) for the query I executed (sqlid='036c3dmx2u3x9'), and it was executed during the parse phase ("PARSE #7904600") of my query. There were a few other recursive statements, but they didn’t do any disk IOs (you’ll have to trust me here). It’s good to know that the lines are written to the trace file after the corresponding event completes; this is why the recursive statements of the parse phase are reported before the line describing the whole parse operation.
    PARSING IN CURSOR #25733316 len=210 dep=1 uid=0 oct=3 lid=0 tim=1382576068532471 hv=864012087 ad='3ecd4b88' sqlid='96g93hntrzjtr'
    select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
    END OF STMT
    PARSE #25733316:c=0,e=240,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=3,plh=0,tim=1382576068532470
    EXEC #25733316:c=0,e=404,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=3,plh=2239883476,tim=1382576068532919
    WAIT #25733316: nam='db file sequential read' ela= 885 file#=1 block#=64857 blocks=1 obj#=427 tim=1382576068533833
    WAIT #25733316: nam='db file sequential read' ela= 996 file#=1 block#=58629 blocks=1 obj#=425 tim=1382576068534935
    FETCH #25733316:c=0,e=2092,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=1382576068535022
    STAT #25733316 id=1 cnt=1 pid=0 pos=1 obj=425 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=2 pw=0 time=2079 us)'
    STAT #25733316 id=2 cnt=1 pid=1 pos=1 obj=427 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=1 pw=0 time=989 us)'
    CLOSE #25733316:c=0,e=58,dep=1,type=3,tim=1382576068535146
    =====================
    PARSING IN CURSOR #7904600 len=50 dep=0 uid=0 oct=3 lid=0 tim=1382576068535413 hv=4197257129 ad='3618a2a4' sqlid='036c3dmx2u3x9'
    select /*+ full(a) */ count(I_DATA) from tpcc.item
    END OF STMT
    PARSE #7904600:c=8001,e=12985,p=2,cr=19,cu=0,mis=1,r=0,dep=0,og=1,plh=1537583476,tim=1382576068535411
    EXEC #7904600:c=0,e=29,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1537583476,tim=1382576068535500
    WAIT #7904600: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1382576068535530
    WAIT #7904600: nam='db file sequential read' ela= 1960 file#=4 block#=113050 blocks=1 obj#=65019 tim=1382576068537566
    WAIT #7904600: nam='direct path read' ela= 1203 file number=4 first dba=113051 block cnt=5 obj#=65019 tim=1382576068539309
    WAIT #7904600: nam='direct path read' ela= 1531 file number=4 first dba=123392 block cnt=8 obj#=65019 tim=1382576068541567
    WAIT #7904600: nam='direct path read' ela= 1047 file number=4 first dba=123401 block cnt=15 obj#=65019 tim=1382576068542719
    WAIT #7904600: nam='direct path read' ela= 1081 file number=4 first dba=123417 block cnt=15 obj#=65019 tim=1382576068543895
    WAIT #7904600: nam='direct path read' ela= 956 file number=4 first dba=123433 block cnt=15 obj#=65019 tim=1382576068544997
    WAIT #7904600: nam='direct path read' ela= 950 file number=4 first dba=123449 block cnt=15 obj#=65019 tim=1382576068546096
    WAIT #7904600: nam='direct path read' ela= 1168 file number=4 first dba=123465 block cnt=15 obj#=65019 tim=1382576068547425
    WAIT #7904600: nam='direct path read' ela= 1151 file number=4 first dba=123481 block cnt=15 obj#=65019 tim=1382576068548784
    WAIT #7904600: nam='direct path read' ela= 1279 file number=4 first dba=123497 block cnt=15 obj#=65019 tim=1382576068550229
    WAIT #7904600: nam='direct path read' ela= 9481 file number=4 first dba=123522 block cnt=126 obj#=65019 tim=1382576068559912
    WAIT #7904600: nam='direct path read' ela= 6872 file number=4 first dba=123650 block cnt=126 obj#=65019 tim=1382576068566997
    WAIT #7904600: nam='direct path read' ela= 5562 file number=4 first dba=123778 block cnt=126 obj#=65019 tim=1382576068573516
    WAIT #7904600: nam='direct path read' ela= 7524 file number=4 first dba=123906 block cnt=126 obj#=65019 tim=1382576068582195
    WAIT #7904600: nam='direct path read' ela= 5858 file number=4 first dba=124034 block cnt=126 obj#=65019 tim=1382576068589263
    WAIT #7904600: nam='direct path read' ela= 5326 file number=4 first dba=124162 block cnt=126 obj#=65019 tim=1382576068595750
    WAIT #7904600: nam='direct path read' ela= 5788 file number=4 first dba=124290 block cnt=126 obj#=65019 tim=1382576068602627
    WAIT #7904600: nam='direct path read' ela= 2446 file number=4 first dba=124418 block cnt=70 obj#=65019 tim=1382576068607337
    FETCH #7904600:c=4000,e=73444,p=1071,cr=1073,cu=0,mis=0,r=1,dep=0,og=1,plh=1537583476,tim=1382576068608996
    STAT #7904600 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=1073 pr=1071 pw=0 time=73444 us)'
    STAT #7904600 id=2 cnt=100000 pid=1 pos=1 obj=65019 op='TABLE ACCESS FULL ITEM (cr=1073 pr=1071 pw=0 time=49672 us cost=198 size=3900000 card=100000)'
    WAIT #7904600: nam='SQL*Net message from client' ela= 148 driver id=1650815232 #bytes=1 p3=0 obj#=65019 tim=1382576068609235
    FETCH #7904600:c=0,e=1,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=1537583476,tim=1382576068609261
    WAIT #7904600: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=65019 tim=1382576068609276
    WAIT #7904600: nam='SQL*Net message from client' ela= 940 driver id=1650815232 #bytes=1 p3=0 obj#=65019 tim=1382576068610226
    CLOSE #7904600:c=0,e=17,dep=0,type=0,tim=1382576087713173
    
  4. The next task was to count the “blocks” for db file sequential reads and the “block cnt” for direct path reads. The recursive SQL (96g93hntrzjtr) read 2 data blocks from disks and the main SQL (036c3dmx2u3x9) read 1071 data blocks from disks. The total is 1073 – exactly what DISK_READS_DELTA (DISK_READS_1EXEC in the script outputs above) reported. So it’s the number of data blocks, not the number of IO requests!
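Counting those waits by hand is tedious; a quick pipeline over the raw trace does the same summation. This is just a convenience sketch of my own (the trace file name is made up, and the patterns assume the exact wait-line formats shown above):

```shell
# Sum the physical-read block counts from a 10046 trace:
# 'blocks=N' comes from db file sequential read waits,
# 'block cnt=N' from direct path read waits.
grep -Eo 'blocks=[0-9]+|block cnt=[0-9]+' PROD_ora_12345_TEST1.trc |
  grep -Eo '[0-9]+' |
  awk '{ total += $1 } END { print total }'
```

For the wait lines shown above it would report 1073, matching the DISK_READS_DELTA value.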

The investigation resulted in two obvious conclusions:

  • DISK_READS_TOTAL and DISK_READS_DELTA in DBA_HIST_SQLSTAT report the number of blocks read from disks.
  • The query statistics in DBA_HIST_SQLSTAT also include the data from execution of the recursive statements.

P.S. Later I found another column – DBA_HIST_SQLSTAT.PHYSICAL_READ_REQUESTS_DELTA – that was introduced in 11.2 along with a large number of additional columns. PHYSICAL_READ_REQUESTS_DELTA and PHYSICAL_READ_REQUESTS_TOTAL represent the number of IO requests that were executed. You can verify this by counting the highlighted wait lines above and comparing the result to the value I found in DBA_HIST_SQLSTAT below.

SQL> select DISK_READS_DELTA, PHYSICAL_READ_REQUESTS_DELTA from dba_hist_sqlstat where sql_id='036c3dmx2u3x9';

DISK_READS_DELTA PHYSICAL_READ_REQUESTS_DELTA
---------------- ----------------------------
            1073                           20

getMOSPatch.sh – Downloading Patches From My Oracle Support


How do you download patches from My Oracle Support (MOS) directly to the server? This has bothered me since FTP access was closed a few years ago. Of course, I’ve been given some options by Oracle: I could access MOS from the server using a browser (probably from a VNC desktop – thank you very much), or I could look up the patches on my workstation, download the wget script from MOS, upload it to the server, adjust it with the username and password of my MOS account, and then start the downloads. Not too convenient, is it?
Then, back in 2009, my teammate John published a blog post on Retrieving Oracle patches with wget. This eliminated the need to upload the wget script from MOS to the server; I only had to get the URLs of the patches and pass them to a shell function. While this was so much easier, I still needed to open a browser to find those URLs.
I think it’s time to get rid of the browser dependency altogether. So I’ve written a shell script, getMOSPatch.sh, that can be used to download patches directly to the server using only the patch number.
I’ve tested the tool on Linux, and there is a good chance it won’t work on some other platforms as it utilizes tools like awk, sed, grep, egrep and wget with options that probably only work on Linux, but if there’s much interest in this tool and I get many comments on this blog post I promise to change that :)
You can use wget to download the script to the server directly:

[oracle@mel1 Patches]$ wget --no-check-certificate -nv https://raw.github.com/MarisElsins/TOOLS/master/Shell/getMOSPatch.sh
WARNING: cannot verify raw.github.com's certificate, issued by `/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert High Assurance CA-3':
  Unable to locally verify the issuer's authority.
2013-11-10 17:42:17 URL:https://raw.github.com/MarisElsins/TOOLS/master/Shell/getMOSPatch.sh [4021/4021] -> "getMOSPatch.sh" [1]
[oracle@mel1 Patches]$ chmod u+x getMOSPatch.sh

The first time you run the script (or when you run it with the parameter reset=yes) it will let you choose which platforms and languages the patches need to be downloaded for, and the choices will be saved in a configuration file. The available platforms and languages are fetched from MOS.

[oracle@mel1 Patches]$ ./getMOSPatch.sh reset=yes
Oracle Support Userid: elsins@pythian.com
Oracle Support Password:

Getting the Platform/Language list
Available Platforms and Languages:
527P - Acme Packet OS
293P - Apple Mac OS X (Intel) (32-bit)
522P - Apple Mac OS X (Intel) (64-bit)
...
226P - Linux x86-64
912P - Microsoft Windows (32-bit)
...
7L - Finnish (SF)
2L - French (F)
4L - German (D)
104L - Greek (EL)
107L - Hebrew (IW)
...
39L - Ukrainian (UK)
43L - Vietnamese (VN)
999L - Worldwide Spanish (ESW)
Comma-delimited list of required platform and language codes: 226P,4L
[oracle@mel1 Patches]$

After this you simply run the script with the parameter patch=patchnr1,patchnr2,… and the listed patches will be downloaded. This is how it works:

  • The script looks up each of the patches for each platform and language, and:
    • if one patch is found – it is downloaded automatically
    • if multiple patches are found (this can happen if the same patch is available for multiple releases) – the tool will ask you to choose which patches to download.
  • You can also specify the parameter download=all to download all found patches without being asked to choose from the list.
  • You can also specify the parameter regexp to filter the filenames of the patches found. This is especially useful for Apps DBAs, as the filter regexp=".*A_R12.*" would be helpful for e-Business Suite Release 12.0 and regexp=".*B_R12.*" – for R12.1.
  • If you set the environment variables mosUser and mosPass before running the script, you won’t be asked to enter the user credentials.

Take a look at the following examples:

  • Downloading the latest CPU patch (patch 16902043, OCT2013) for 11gR2:
    [oracle@mel1 Patches]$ ./getMOSPatch.sh patch=16902043
    Oracle Support Userid: elsins@pythian.com
    Oracle Support Password:
    
    Getting patch 16902043 for "Linux x86-64"
    p16902043_112030_Linux-x86-64.zip completed with status: 0
    
    Getting patch 16902043 for "German (D)"
    no patch available
    
  • Downloading the latest patch for OPatch (there are multiple patches available on the same platform):
    [oracle@mel1 Patches]$ export mosUser=elsins@pythian.com
    [oracle@mel1 Patches]$ ./getMOSPatch.sh patch=6880880
    Oracle Support Password:
    
    Getting patch 6880880 for "Linux x86-64"
    1 - p6880880_112000_Linux-x86-64.zip
    2 - p6880880_111000_Linux-x86-64.zip
    3 - p6880880_121010_Linux-x86-64.zip
    4 - p6880880_131000_Generic.zip
    5 - p6880880_101000_Linux-x86-64.zip
    6 - p6880880_102000_Linux-x86-64.zip
    Comma-delimited list of files to download: 3
    p6880880_121010_Linux-x86-64.zip completed with status: 0
    
    Getting patch 6880880 for "German (D)"
    no patch available
    [oracle@mel1 Patches]$
    
  • Downloading multiple patches at the same time without prompting the user to choose which files to download when multiple files are found. (Don’t be confused that files with “LINUX” and not “Linux-x86-64” in the filename are downloaded here: these are e-Business Suite patches, and both the 32-bit and 64-bit platforms get the same patch.)
    [oracle@mel1 Patches]$ ./getMOSPatch.sh patch=10020251,10141333 download=all
    Oracle Support Userid: elsins@pythian.com
    Oracle Support Password:
    
    Getting patch 10020251 for "Linux x86-64"
    p10020251_R12.AR.B_R12_LINUX.zip completed with status: 0
    p10020251_R12.AR.A_R12_LINUX.zip completed with status: 0
    
    Getting patch 10020251 for "German (D)"
    p10020251_R12.AR.A_R12_d.zip completed with status: 0
    
    Getting patch 10141333 for "Linux x86-64"
    p10141333_R12.AR.A_R12_LINUX.zip completed with status: 0
    p10141333_R12.AR.B_R12_LINUX.zip completed with status: 0
    
    Getting patch 10141333 for "German (D)"
    p10141333_R12.AR.B_R12_d.zip completed with status: 0
    p10141333_R12.AR.A_R12_d.zip completed with status: 0
    
  • Downloading the same patches as in the previous example with an additional filter for e-Business Suite 12.1 patches only:
    [oracle@mel1 Patches]$ ./getMOSPatch.sh regexp=".*B_R12.*" patch=10020251,10141333 download=all
    Oracle Support Userid: elsins@pythian.com
    Oracle Support Password:
    
    Getting patch 10020251 for "Linux x86-64"
    p10020251_R12.AR.B_R12_LINUX.zip completed with status: 0
    
    Getting patch 10020251 for "German (D)"
    no patch available
    
    Getting patch 10141333 for "Linux x86-64"
    p10141333_R12.AR.B_R12_LINUX.zip completed with status: 0
    
    Getting patch 10141333 for "German (D)"
    p10141333_R12.AR.B_R12_d.zip completed with status: 0
    

My Experience at UKOUG Tech13


UKOUG Tech13 was great! Not only because of the amount of interesting presentations to choose from, but also because of the surrounding events. I felt this conference was quite different from previous UKOUG Oracle Technology & E-Business Suite events in Birmingham (I’ve attended six previous events so you can trust me when I say that.) I decided to take a few notes about my experience at the conference to reveal what made it such a great event to attend.

The Changes

This was the first year the Oracle Apps and Oracle Tech streams were separated into two different conferences. This is a great change for most Oracle DBAs, because the conference is able to concentrate more on Oracle Tech topics. However, I personally missed the opportunity to attend a couple of presentations for Apps DBAs. I usually looked for these presentations at previous events, as UKOUG conferences had been the only events (that I knew of) in Europe that gave a voice to Apps DBAs too. This year it was no longer possible – on the other hand, I’m not interested in Apps Tech stuff only. I like learning new things about the database too, like performance tuning, internals, and advanced troubleshooting. This conference had the widest selection of topics to choose from.

“Six presentations per company” is another change introduced this year. Alex Gorbachev discusses it more in his blog post here. I think this rule makes the conference worse – not because Pythian had to withdraw some of the presentations and I couldn’t meet a few other colleagues, but mainly because the overall quality of the topics presented at the conference was artificially decreased. If I were able to suggest an experiment, I’d ask UKOUG to remove that rule for next year’s conference to check whether the average marks from the session feedback surveys increase or decrease when the rule is not applied. I think the scores would increase.

Meeting and Making Friends and Having Fun

This is the most important part of the conference for me. Nothing attracts so many great people working with the same technology as a good conference. It was great to finally meet some of the community activists that I knew from Twitter and blogs only: Øyvind Isene, Osama Mustafa, David Kurtz, Philippe Fierens, Tim Hall, Alex Nuijten, Martin Bach, Elliot Zissman, Alex Zaballa, Fahd Mirza, Marcin Przepiorowski and others I forgot to put on the list. I checked and confirmed the rumors that a great way of making new friends is volunteering for some work. This time I volunteered to be one of the RAC Attack Ninjas, and in the end it was more fun and less work than I expected.

Me and Andrejs Karpovs practicing some useful RAC Attack Ninja moves.

Having fun is undeniably an important part of the conference. The brain needs to rest from the intense learning experience, and trust me, there were plenty of options to choose from. There were the official social events, and usually some unofficial events too. Take a look! These are my colleagues Michael McKee and Luke Davies discussing how useless Twitter is at one of the unofficial events.

“Luke, Twitter is useless!”

There are ways to spend quality time, even if you’re tired by the end of the conference. The city can offer something you won’t find anywhere else.

Attending Manchester United vs Everton

The Twitter

Oh man, this was intense. I’m glad I had prepared for this by buying an extra battery for my phone – the money was well spent. The statistics: 150 tweets in 5 days. This doesn’t sound like much, just 30 tweets a day, but for me, who had been averaging ~1 tweet a day, it was a lot. I even saw my name in third place on the Twitter analytics board ranked by mentions, but that is most likely because @UKOUG retweeted almost all of my tweets!

Twitter Analytics by Rittman Mead

At one point, I almost quit tweeting from the conference but one tweet kept me going:

If you asked me which of my tweets highlights the experience at the conference best, I’d choose this one, which also brings me to the next topic – my presentation.

My Presentation

OK, this was not really my presentation. The topic was submitted by a good friend of mine – Yury Velikanov, who was not able to attend the conference. I volunteered to present the paper as I thought it would be easy: topic by Yury, slides by Yury, speech by Maris. I was not a first-time presenter. I counted around 15 presentations I had delivered at international conferences, all in English (not my native language, as you might have noticed by reading this blog), so I felt confident. I had seen Yury present the topic, I had checked the slides carefully and changed a few things, and I understood the topic, so what could possibly go wrong? Anything. Two things went wrong for me.

  1. The Auditorium. This was the first time I presented in the largest room at the conference. I found out which room it was on Monday. The presentation was on Wednesday, so I had two full days to panic. Having the biggest room put additional pressure on the task, and yes, I was worried.
  2. Two other Pythian speakers – Luke Davies and Michael McKee – and I decided to rehearse the presentations in a hotel room on Tuesday afternoon. And I’m glad we did, because I struggled a lot – I was not able to formulate my thoughts fluently, and I was not able to do much more than read the bullet points. There were a couple of complicated slides that I couldn’t explain at all. My morale went down to zero, and panic levels skyrocketed. But I’m extremely glad we did the rehearsal, as I found out I was not ready at all by that time. That day I spent 6 hours preparing for the presentation. I reviewed all the slides again, and again. I put down some notes, which I never do for my own presentations. I tested a few things that I wasn’t 100% sure would work, as I had to present them. I exchanged a number of thoughts with Luke, and at 2AM I closed the laptop.

My presentation “10 Ways to improve your RMAN script” was scheduled for 10:05 on Wednesday. I got up early to rehearse it one more time in front of the hotel mirror – I felt I was much more fluent, but I was still not satisfied. At that point there was nothing more to do, and the auditorium was waiting for me. I can’t say much about my own performance. I only remember that all the panic disappeared in the first 3 minutes of the presentation; in fact, I panicked much more in front of the mirror than on the stage. I think I did OK. Not perfect, but OK. The most important thing for me was to come to these conclusions:

  • Presenting a topic which I didn’t write is much more difficult. I usually think a lot about a topic before submitting it, and I know a large part of the content I’m going to include in the presentation. I missed this stage of preparation and therefore had to spend far more time working through the slides to prepare the speech.
  • Rehearse the presentation at least a day before the show. This reveals how ready I am, and I can still use the time left to improve the presentation.
  • Rehearse the presentation shortly before it “goes live” on the day of the speech. This reduces my panic levels and puts my brain on track for the topic I present.

I’m looking forward to receiving the session evaluations, as this was a unique experience for me, and I really hope I didn’t disappoint anyone. I appreciate the support and feedback I received on Twitter. Thank you for that. And thanks to Yury for trusting the delivery of his presentation to me.

Learning

There were many great presentations, but these are the ones I enjoyed the most. I’ve already managed to use some of the lessons I learned from these sessions in my day-to-day work.

  • Jonathan Lewis: “Compression: Index, basic and OLTP”
  • James Morle: “Optimal Oracle Configuration for Efficient Table Scanning”
  • David Kurtz: “Partition, Archive, Compress, Purge – Keep your ERP on the Road”
  • Christo Kutrovsky: “Maximise Data Warehouse Performance with Parallel Queries”
  • Frits Hoogland: “Hacking session: Advanced profiling of Oracle using function calls” – this was the only presentation I attended at “OakTable World UK 2013” conference.

This is a short summary of my experiences. I’m really happy I was able to attend the conference and meet all the people I usually only meet virtually. I’m already waiting for UKOUG Tech14. See you there!

Do AWR Reports Show the Whole Picture?


The AWR report is a great source of aggregated information on the top activities happening in our databases. I use the data collected in AWR quite often, and obviously the easiest way of getting the data out of the AWR is by running the AWR report. In most cases that’s not an issue, but there are certain scenarios when it hides the information one is looking for, just because of how it’s designed.

If I’m trying to collect information about top queries by physical reads, I would normally look at the “SQL ordered by Reads” section and this is what I’d see:

AWR Disk reads

I have the top SQLs by physical reads – just what I’ve been looking for (except for the fact that the AWR report covers only one of my RAC nodes).

But wait a second, what if there are queries that don’t use bind variables? This might be a problem, as each such query would have its own SQL_ID, and they probably wouldn’t make it into the TOP 10 just because each of them is treated separately. Nothing to worry about – AWR also collects FORCE_MATCHING_SIGNATURE values (read this blog post to understand why I know they would help), and we can use them to identify and group “similar” statements; we just need a custom script to do that.

Here I use my custom script to report the TOP 20 SQL_IDs by physical reads in the last 7 days (reporting data from both RAC nodes in the same list) – you can see that the top few SQLs are the same as in the AWR report, but because I’m reporting database-wide statistics instead of instance-wide ones as AWR does, I have other SQLs on the list too. I’ve also included 2 additional columns:

  • DIFF_PLANS – number of different PLAN_HASH_VALUE values reported for this SQL_ID, and if only one is found – it shows the actual PLAN_HASH_VALUE
  • DIFF_FMS – number of different FORCE_MATCHING_SIGNATURE values reported for this SQL_ID, and if only one is found – it shows the actual FORCE_MATCHING_SIGNATURE
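
The post shows only the script’s output; a minimal sketch of such a query against DBA_HIST_SQLSTAT might look like this (simplified, and a stand-in for the actual custom script, which isn’t published here):

```sql
-- Hedged sketch: top 20 SQL_IDs by physical reads over the last 7 days,
-- aggregated across all RAC instances, with DIFF_PLANS / DIFF_FMS counts.
SELECT *
  FROM (SELECT s.sql_id,
               SUM(s.disk_reads_delta)                    AS disk_reads,
               COUNT(DISTINCT s.plan_hash_value)          AS diff_plans,
               COUNT(DISTINCT s.force_matching_signature) AS diff_fms
          FROM dba_hist_sqlstat s
          JOIN dba_hist_snapshot sn
            ON sn.snap_id = s.snap_id
           AND sn.dbid = s.dbid
           AND sn.instance_number = s.instance_number
         WHERE sn.begin_interval_time > SYSDATE - 7
         GROUP BY s.sql_id
         ORDER BY disk_reads DESC)
 WHERE ROWNUM <= 20;
```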

[Screenshot: custom script output aggregated by SQL_ID]

Now I can adjust the custom script to aggregate the data by FORCE_MATCHING_SIGNATURE instead of SQL_ID. I’ll still keep the DIFF_PLANS column and will add a new one – DIFF_SQLID.

[Screenshot: custom script output aggregated by FORCE_MATCHING_SIGNATURE]

The situation is a little bit different now. Notice how the second row reports FORCE_MATCHING_SIGNATURE = 0; this typically indicates PL/SQL blocks that execute SQL statements and aggregate statistics from them, so we’re not interested in those. Otherwise, the original report by SQL_ID showed quite accurate data in this situation, and my suspicions regarding the misuse of literal values where binds should be used didn’t materialize. Could I be missing anything else? Yes – even the FORCE_MATCHING_SIGNATURE can be misleading when identifying TOP resource consumers: you can write two completely different SQLs (e.g. “select * from dual a” and “select * from dual b”) that do the same thing and use the same execution plan. Let’s query the top consumers by PLAN_HASH_VALUE to check this theory!
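
The same idea, aggregated by PLAN_HASH_VALUE rather than SQL_ID, could be sketched as follows (again a simplified stand-in, not the author’s actual script):

```sql
-- Hedged sketch: top 20 plans by physical reads, last 7 days, all instances,
-- counting how many distinct SQL_IDs share each plan.
SELECT *
  FROM (SELECT s.plan_hash_value,
               SUM(s.disk_reads_delta)  AS disk_reads,
               COUNT(DISTINCT s.sql_id) AS diff_sqlid
          FROM dba_hist_sqlstat s
          JOIN dba_hist_snapshot sn
            ON sn.snap_id = s.snap_id
           AND sn.dbid = s.dbid
           AND sn.instance_number = s.instance_number
         WHERE sn.begin_interval_time > SYSDATE - 7
         GROUP BY s.plan_hash_value
         ORDER BY disk_reads DESC)
 WHERE ROWNUM <= 20;
```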

[Screenshot: custom script output aggregated by PLAN_HASH_VALUE]

I’ve highlighted the third row, as the same PLAN_HASH_VALUE is reported for 20 different SQL_IDs, which allowed it to take third place in the TOP list by physical reads (actually second place, as PLAN_HASH_VALUE=0 is ignorable). The next query expands the third row:

[Screenshot: SQL_IDs sharing the highlighted PLAN_HASH_VALUE]

And here are all the SQL statements:

[Screenshot: the full SQL statements]

What I have here is 20 different views generated by Oracle Discoverer that query the database using exactly the same execution plan. A closer look revealed that the views included hardcoded query parameters (date intervals for reporting), but in the end this was the same query! It’s the TOP 2 query by physical reads in the database, and if I tune it, all 20 Discoverer views will benefit.

I think one of the drawbacks of AWR reports is that they are not able to identify such situations; it would be great if the user could choose the column by which the aggregation is done. In the situation I described, I was able to identify one of the top queries by physical reads only when I aggregated the data by PLAN_HASH_VALUE.

Using the ILOM for Troubleshooting on ODA


I worked on root cause analysis for a strange node reboot on a client’s Oracle Database Appliance yesterday. The case was quite interesting, because none of the logs contained any information related to the cause of the reboot. I could only see the log entries for normal activities and then – BOOM! – the start-up sequence! It looked like someone had just power cycled the node. I also observed the heartbeat timeouts followed by the node eviction on the remaining node. There was still one place I hadn’t checked, and it revealed the cause of the issue.

One of the cool things about ODA is its service processor (SP), called Integrated Lights Out Manager (ILOM), which allows you to do many things that you’d normally do while physically located in the data center – power cycle the node, change the BIOS settings, choose boot devices, and … (drum roll) … see the console output from the server node! And it doesn’t only show the current console output, it keeps logging it too. Each ODA server has its own ILOM, so I found out the IP address for the ILOM of the failed node and connected to it using SSH.

$ ssh pythian@oda01a-mgmt
Password:

Oracle(R) Integrated Lights Out Manager

Version 3.0.14.13.a r70764

Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.

->
-> ls

 /
    Targets:
        HOST
        STORAGE
        SYS
        SP

    Properties:

    Commands:
        cd
        show

The ILOM can be browsed as if it were a directory structure. Here, the “Targets” are different components of the system. When you “cd” into a target, you see its sub-components, and so on. Each target can have properties, which are displayed as variable=value pairs under the “Properties” section. There is also a list of “Commands” that you can execute for the current target. The “ls” command shows the sub-targets, the properties and the commands for the current target. Here’s how I found the console output from the failed node:

-> cd HOST
/HOST

-> ls

 /HOST
    Targets:
        console
        diag

    Properties:
        boot_device = default
        generate_host_nmi = (Cannot show property)

    Commands:
        cd
        set
        show

-> cd console
/HOST/console

-> ls

 /HOST/console
    Targets:
        history

    Properties:
        line_count = 0
        pause_count = 0
        start_from = end

    Commands:
        cd
        show
        start
        stop

-> cd history
/HOST/console/history

-> ls

The last “ls” command started printing the whole history of console output to my screen, and look what I found just before the startup sequence (I removed some lines to make this shorter, and I’ve highlighted the most interesting lines):

divide error: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:09.0/0000:1f:00.0/host7/port-7:1/expander-7:1/port-7:1:2/end_device-7:1:2/target7:0:15/7:0:15:0/timeout
CPU 3
Modules linked in: iptable_filter(U) ip_tables(U) x_tables(U) oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U) mptctl(U) mptbase(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) rfkill(U) nfs(U) fscache(U) nfs_acl(U) auth_rpcgss(U) lockd(U) sunrpc(U) bonding(U) be2iscsi(U) ib_iser(U) rdma_cm(U) ib_cm(U) iw_cm(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) iscsi_tcp(U) bnx2i(U) cnic(U) uio(U) dm_round_robin(U) ipv6(U) cxgb3i(U) libcxgbi(U) cxgb3(U) mdio(U) libiscsi_tcp(U) libiscsi(U) scsi_transport_iscsi(U) video(U
) output(U) sbs(U) sbshc(U) parport_pc(U) lp(U) parport(U) ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) igb(U) ixgbe(U) joydev(U) ses(U) enclosure(U) e1000e(U) snd_seq_dummy(U) snd_seq_oss(U) snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U) snd_pcm_oss(U) snd_mixer_oss(U) snd_pcm(U) snd_timer(U) snd(U) soundcore(U) snd_page_alloc(U) iTCO_wdt(U) iTCO_vendor_support(U) shpchp(U) i2c_i801(U) i2c_core(U) ioatdma(U) dca(U) pcspkr(U) dm_multipath(U) usb_storage(U) mpt2sas(U) scsi_transport_sas(U) raid_class(U)
 ahci(U) raid1(U) [last unloaded: microcode]
Pid: 29478, comm: top Tainted: P        W  2.6.32-300.11.1.el5uek #1 SUN FIRE X4370 M2 SERVER
RIP: 0010:[<ffffffff8104b3e8>]  [<ffffffff8104b3e8>] thread_group_times+0x5b/0xab
...
Kernel panic - not syncing: Fatal exception
Pid: 29478, comm: top Tainted: P      D W  2.6.32-300.11.1.el5uek #1
Call Trace:
 [<ffffffff8105797e>] panic+0xa5/0x162
 [<ffffffff8107ae09>] ? up+0x39/0x3e
 [<ffffffff810580d1>] ? release_console_sem+0x194/0x19d
 [<ffffffff8105839a>] ? console_unblank+0x6a/0x6f
 [<ffffffff8105764b>] ? print_oops_end_marker+0x23/0x25
 [<ffffffff81456ea6>] oops_end+0xb7/0xc7
 [<ffffffff8101565d>] die+0x5a/0x63
 [<ffffffff8145677c>] do_trap+0x115/0x124
 [<ffffffff81013674>] do_divide_error+0x96/0x9f
 [<ffffffff8104b3e8>] ? thread_group_times+0x5b/0xab
 [<ffffffff810dd2f8>] ? get_page_from_freelist+0x4be/0x65e
 [<ffffffff81012b1b>] divide_error+0x1b/0x20
 [<ffffffff8104b3e8>] ? thread_group_times+0x5b/0xab
 [<ffffffff8104b3d4>] ? thread_group_times+0x47/0xab
 [<ffffffff8116ee13>] ? collect_sigign_sigcatch+0x46/0x5e
 [<ffffffff8116f366>] do_task_stat+0x354/0x8c3
 [<ffffffff81238267>] ? put_dec+0xcf/0xd2
 [<ffffffff81238396>] ? number+0x12c/0x244
 [<ffffffff8107419b>] ? get_pid_task+0xe/0x19
 [<ffffffff811eac34>] ? security_task_to_inode+0x16/0x18
 [<ffffffff8116a77b>] ? task_lock+0x15/0x17
 [<ffffffff8116add1>] ? task_dumpable+0x29/0x3c
 [<ffffffff8116c1c6>] ? pid_revalidate+0x80/0x99
 [<ffffffff81135992>] ? seq_open+0x25/0xba
 [<ffffffff81135a08>] ? seq_open+0x9b/0xba
 [<ffffffff8116d147>] ? proc_single_show+0x0/0x7a
 [<ffffffff81135b2e>] ? single_open+0x8f/0xb8
 [<ffffffff8116aa0e>] ? proc_single_open+0x23/0x3b
 [<ffffffff81127cc1>] ? do_filp_open+0x4f8/0x92d
 [<ffffffff8116f8e9>] proc_tgid_stat+0x14/0x16
 [<ffffffff8116d1a6>] proc_single_show+0x5f/0x7a
 [<ffffffff81135e73>] seq_read+0x193/0x350
 [<ffffffff811ea88c>] ? security_file_permission+0x16/0x18
 [<ffffffff8111a797>] vfs_read+0xad/0x107
 [<ffffffff8111b24b>] sys_read+0x4c/0x70
 [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b
Rebooting in 60 seconds..???

A quick search on My Oracle Support found a match: “Kernel Panic at “thread_group_times+0x5b/0xab” (Doc ID 1620097.1)”. The call stack and the messages are a 100% match, and the root cause is a kernel bug that’s fixed in more recent versions.
I’m not sure how I would have gotten to the root cause if this system had not been an ODA and the server had just bounced without logging the kernel panic anywhere. ODA’s ILOM definitely made the troubleshooting effort less painful, and it probably saved us from a couple more incidents caused by this bug in the future, as we were able to troubleshoot it quickly and will be able to implement the fix sooner.


Are you Ready for the Leap Second?


If you’re not aware of what the leap second is, look into it. The fact is, this year the last minute of June 30th will be one second longer, and “June 30, 2015 23:59:60” will be a valid and correct time. There are a few issues that could be caused by the leap second, so I’ve reviewed a number of MOS notes, and this blog post is a summary of the findings.

Update (June 4th, 2015): I’ve put together another blog post about handling the leap second on Linux here.

There are 2 potential issues, which are described below.

1. NTPD’s leap second update causes a server hang or excessive CPU usage

Any Linux distribution using a kernel version from 2.4 through and including 2.6.39 may be affected (including both UEK and RedHat-compatible kernels). This range is very wide and includes all RHEL and OEL releases except version 7, unless the kernels have been kept up to date.

Problems may be observed even a day before the leap second happens, so this year the symptoms could appear at any time on June 30. This is because the NTP server lets the host know about the upcoming leap second up to a day ahead of time, and this update from NTP triggers the issues.

There are 2 possible symptoms:

  1. Servers will become unresponsive and the following can be seen in system logs, console, netconsole or vmcore dump analysis outputs:
    INFO: task kjournald:1119 blocked for more than 120 seconds.
    "echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kjournald     D ffff880028087f00     0  1119      2 0x00000000
    ffff8807ac15dc40 0000000000000246 ffffffff8100e6a1 ffffffffb053069f
    ffff8807ac22e140 ffff8807ada96080 ffff8807ac22e510 ffff880028073000
    ffff8807ac15dcd0 ffff88002802ea60 ffff8807ac15dc20 ffff8807ac22e140
  2. Any Java application suddenly starts to use 100% CPU (leap second insertion causes futex to repeatedly time out).
    $top - 09:38:24 up 354 days,  5:48,  4 users,  load average: 6.49, 6.34, 6.44
    Tasks: 296 total,   4 running, 292 sleeping,   0 stopped,   0 zombie
    Cpu(s): 97.2%us,  1.8%sy,  0.0%ni,  0.7%id,  0.1%wa,  0.1%hi,  0.2%si,  0.0%st
    Mem:     15991M total,    15937M used,       53M free,      107M buffers
    Swap:     8110M total,       72M used,     8038M free,    13614M cached
    PID USER      PR    NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    22564 oracle    16   0 1400m 421m 109m S  353  2.6   2225:11 java
    7294 oracle     17   0 3096m 108m 104m S   22  0.7   0:02.61 oracle
    

The only workaround mentioned in the notes is to run these commands as root after the problem has occurred (obviously only for issue 2, as issue 1 requires a reboot):

# /etc/init.d/ntpd stop
#  date -s "`date`"    (reset the system clock)
# /etc/init.d/ntpd start

I think, as the problem is triggered by the update coming from NTP on June 30, it should also be possible to stop the NTPD service on June 29th and re-enable it on July 1st instead. This would bypass the problem conditions.
Because any Java application can be affected, we need to think about where Java is used. For Oracle DBAs, the typical ones to worry about are the Enterprise Manager agents as well as any Fusion Middleware products. So if you’re using Grid Control or Cloud Control to monitor your Oracle infrastructure, it’s very likely that most of your servers are at risk if the kernels are not up to date.

2. Inserts to DATE and TIMESTAMP columns fail with “ORA-01852: seconds must be between 0 and 59”

Any OS could be affected. Based on MOS note “Insert leap seconds into a timestamp column fails with ORA-01852 (Doc ID 1553906.1)”, any inserts of time values having “60” seconds into DATE or TIMESTAMP columns will result in ORA-01852.
This can’t be reliably mitigated by stopping the NTPD, as the up-to-date TZ information on the server may already contain the information about the extra second. The note also provides a “very efficient workaround”: *the leap second record can be stored in a varchar2 datatype instead*. You might be thinking, “What? Are you really suggesting that?” According to MOS note 1453523.1, the time representation during the leap second may differ depending on the OS/kernel/ntpd versions. For example, the clock could show “23:59:60”, or it could show “23:59:59” for 2 consecutive seconds, which would avoid the ORA-01852. Be sure to check with your OS admins and make sure that the clock never shows “23:59:60” to avoid this issue completely.
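
The failure itself is easy to reproduce in isolation: converting a leap-second value to a DATE raises the error regardless of where the value comes from, while storing the raw string works. (The table name below is illustrative, not from the note.)

```sql
-- Raises ORA-01852: seconds must be between 0 and 59
SELECT TO_DATE('2015-06-30 23:59:60', 'YYYY-MM-DD HH24:MI:SS') FROM dual;

-- The workaround from the note: store the raw string in a VARCHAR2 instead
CREATE TABLE leap_log (event_time VARCHAR2(19));
INSERT INTO leap_log VALUES ('2015-06-30 23:59:60');
```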

Consider your infrastructure

By no means is the list of issues described above exhaustive. There’s too much information to cover everything, but based on what I’m reading, the issues caused by the leap second can be quite severe. Please consider your infrastructure and look for information about issues and fixes to address the upcoming leap second. Search MOS for the products you use and add the “leap second” keyword too. If you’re using software or an OS from another vendor, check their support notes regarding leap seconds. Here are additional MOS notes for reading if you’re on one of Oracle’s engineered systems, but again, you’ll find more information if you search:

  • Leap Second Time Adjustment (e.g. on June 30, 2015 at 23:59:59 UTC) and Its Impact on Exadata Database Machine (Doc ID 1986986.1)
  • Exalogic: Affected EECS Releases and Patch Availability for Leap Second (Doc ID 2008413.1)
  • Leap Second on Oracle SuperCluster (Doc ID 1991954.1)
  • Leap Second Handling in Solaris – NTPv3 and NTPv4 (Doc ID 1019692.1)


Discover more about Pythian’s expertise in Oracle.

Handling the Leap Second – Linux


Last week I published a blog post titled “Are You Ready For the Leap Second?“, and by looking at the blog statistics I could tell that many of you read it, and that’s good, because you became aware of the risks that the leap second on June 30th, 2015 introduces. On the other hand, I must admit I didn’t provide clear instructions that you could use to avoid all possible scenarios. I’ve been looking into this for a good while and I think the official RedHat announcements and My Oracle Support notes are confusing. This blog post is my attempt to explain how to avoid the possible issues.

Update (June 9th, 2015): Made it clear in the text below that ntp’s slewing mode (ntp -x) is mandatory from Oracle Grid Infrastructure and therefore for RAC too.

The complexity of solving these problems comes from the fact that there are multiple contributing factors. The behavior of the system will depend on a combination of these factors.
In the coming sections I’ll try to explain what exactly you should pay attention to and what you should do to avoid problems. The content of this post is fully theoretical and based on the documentation I’ve read. I have NOT tested it, so it may behave differently. Please, if you notice any nonsense in what I’m writing, let me know by leaving a comment!

1. Collect the data

The following information will be required for you to understand what you’re dealing with:

  1. OS version and kernel version:
    $ cat /etc/issue
    Oracle Linux Server release 6.4
    Kernel \r on an \m
    
    $ uname -r
    2.6.39-400.17.1.el6uek.x86_64
    
  2. Is NTP used and which version of NTP is installed:
    $ ps -ef | grep ntp
    oracle    1627  1598  0 02:06 pts/0    00:00:00 grep ntp
    ntp       7419     1  0 May17 ?        00:00:17 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
    
    $ rpm -qa | grep ntp-
    ntp-4.2.4p8-3.el6.x86_64
    
  3. Version of tzdata and the configuration of /etc/localtime:
    $ rpm -qa | grep tzdata-
    tzdata-2012j-1.el6.noarch
    
    $ file /etc/localtime
    /etc/localtime: timezone data, version 2, 5 gmt time flags, 5 std time flags, no leap seconds, 235 transition times, 5 abbreviation chars
    

2. Check the kernel

Here’s a number of bugs that are related to leap second handling on Linux:

  1. System hangs on printing the leap second insertion message – This bug will hang your server at the time when the NTP notifies kernel about the leap second, and that can happen anytime on the day before the leap second (in our case anytime on June 30th, 2015). It’s fixed in kernel-2.6.9-89.EL (RHEL4) and kernel-2.6.18-164.el5 (RHEL5).
  2. Systems hang due to leap-second livelock – Because of this bug, systems repeatedly crash due to the NMI watchdog detecting a hang. This becomes effective when the leap second is added. The note doesn’t specify exactly which versions fix the bug.
  3. Why is there high CPU usage after inserting the leap second? – This bug causes futex-active applications (i.e. Java) to start consuming 100% CPU. Based on what’s discussed in this email in the Linux Kernel Mailing List archive, it’s triggered by a mismatch between the timekeeping and hrtimer structures, which the leap second introduces. The document again does not clearly specify which versions fix the problem; however, this “Kernel Bug Fix Update” mentions these symptoms as fixed in 2.6.32-279.5.2.el6.

MOS Note “How Leap Second Affects the OS Clock on Linux and Oracle VM (Doc ID 1453523.1)” mentions that kernels 2.4 to 2.6.39 are affected, but I’d like to know the exact versions. I’ve searched a lot but haven’t found much, so here are the ones that I did find:

I’m quite sure that by reading this you’re thinking: “What a mess!” And that’s true. I believe the safest approach is to be on kernel 2.6.39-200.29.3 or higher.

3. NTP is used

You’re using NTP if the ntpd process is running. In the outputs displayed above it’s running with the following arguments: ntpd -u ntp:ntp -p /var/run/ntpd.pid -g. The behavior of the system during the leap second depends on which version of NTP you use and on the environment.

  • ntp-4.2.2p1-9 or higher (but not ntp-4.2.6p5-19.el7, ntp-4.2.6p5-1.el6 and ntp-4.2.6p5-2.el6_6) configured in slew mode (with option “-x”) – The leap second is not added by the kernel; instead, the extra time is added by increasing the length of each second over a ~2000-second period, based on the difference between the server’s time and the time from NTP after the leap second. The clock is never turned backward. This is the configuration you want because:
    • Time never goes back, so there will be no impact to the application logic.
    • Strange time values like 23:59:60 are not used, so you won’t hit any DATE and TIMESTAMP datatype limitation issues.
    • As the leap second is not actually added, it should be possible to avoid all 3 kernel bugs mentioned above by using this configuration. In many cases updating NTP is much simpler than a kernel upgrade, so if you’re still on an affected kernel, use this option to bypass the bugs.

    The drawbacks of this configuration are related to the fact that the leap second is smeared out over a longer period of time:

    • This probably is not usable for applications requiring very accurate time.
    • This may not be usable for some clusters where all nodes must have exactly the same clock time, because NTP updates are usually received every 1 to 18 minutes; adding the ~2000 seconds of time adjustment in slew mode, the clocks could differ for as long as ~50 minutes. Please note that the slewing mode (ntp -x) is mandatory for Oracle Grid Infrastructure, as documented in the Oracle® Grid Infrastructure Installation Guides for 11g Release 2 and 12c Release 1.
  • ntp-4.2.2p1-9 or higher configured without slew mode (no “-x” option) – The NTP will notify the kernel about the upcoming leap second some time during June 30th, and the leap second will be added as an extra “23:59:59” second (time goes backward by one second). You will want to be on kernel with all fixes present.
  • below ntp-4.2.2p1-9 – The NTP will notify the kernel about the upcoming leap second some time during June 30th, and depending on the environment, the leap second will be added as an extra “23:59:59” second (time goes backward by one second), or the time will freeze for one second at midnight.

Extra precaution: if you’re running NTP, make sure your /etc/localtime does not include leap seconds by running “file /etc/localtime” and confirming that the output contains “no leap seconds”.

4. NTP is NOT used

If NTP is not used, the time is managed locally by the server. The time is most likely off already, so I really do recommend enabling NTP in slew mode as described above; this is the right moment to do so.

If you have tzdata-2015a or higher installed, the information about the leap second on June 30th, 2015 is also available locally on the server, but that doesn’t yet mean it’s going to be added. Also, if NTP is not used and the leap second is added locally, it will appear as “23:59:60”, which is an unsupported value for DATE and TIMESTAMP columns, so this is a configuration you don’t want to use. Here are the different conditions:

  • You’re below tzdata-2015a – the leap second will not be added.
  • You’re on tzdata-2015a or higher and “file /etc/localtime” includes the message “X leap seconds”, where X is a number – the leap second will be added as “23:59:60” and will cause problems for your DATE/TIMESTAMP datatypes. You don’t want this configuration. Disable the leap second by copying the appropriate timezone file from /usr/share/zoneinfo over /etc/localtime. It’s a dynamic change; no reboot is needed. (Timezone files including the leap seconds are located in /usr/share/zoneinfo/right.)
  • “file /etc/localtime” includes message “no leap seconds” – the leap second will not be added.

The recommendations

Again, I must say this is a theoretical summary of how to avoid leap second issues on Linux, based on what’s written above. Make sure you think it through before implementing, as you’re the one who knows your own systems:

  • Single node servers, or clusters where time between nodes can differ – Upgrade to ntp-4.2.2p1-9 or higher and configure it in slew mode (option “-x”). This should avoid the kernel bugs too, but due to lack of accurate documentation it’s still safer to be on kernel 2.6.39-200.29.3 or higher.
  • Clusters or applications with very accurate time requirements – NTP with slew mode is not suitable as it’s unpredictable when it will start adjusting the time on each server. You want to be on kernel 2.6.39-200.29.3 or higher. NTP should be enabled. Leap second will be added as an extra “23:59:59” second (the time will go backward by one second). Oracle Database/Clusterware should detect time drifting and should deal with it. Check MOS for any bugs related to time drifting for the versions you’re running.
  • I don’t care about time accuracy, I can’t update any packages, but I need my systems up at any cost – The simplest solution is stopping the NTP on June 29th and starting it up on July 1st, so that the server is left unaware of the leap second. You also need to make sure that /etc/localtime does not contain the leap second for June 30th, 2015, as explained above.
    -- on June 29th (UTC)
    # /etc/init.d/ntpd stop
    # date -s "`date`"    (reset the system clock)
    -- on July 1st (UTC)
    # /etc/init.d/ntpd start
  • Very accurate time requirements + time reduction is not allowed – I don’t know. I can’t see how this can be implemented. Does anyone have any ideas?

Post Scriptum

Initially I couldn’t understand why this extra second causes so much trouble. Don’t we change the time by a whole hour twice a year without any issues? I found the answer during the research, and it’s obvious: servers work in UTC time, which has no daylight saving time changes; the timezone information is added later, for representation purposes only. UTC time is continuous and predictable, but the leap second breaks this continuity, and that’s why it is so difficult to handle. It’s also a known fact that Oracle databases rely heavily on gettimeofday() system calls, and these work in UTC too.

 

Discover more about Pythian’s Oracle Ace Maris Elsins.

Is Oracle Smart Flash Cache a “SPOF”?


 

Oracle Smart Flash Cache (OSFC) is a nice feature that was introduced in Oracle 11g Release 2. As I only recently had a real use case for it, I looked into it with the main goal of determining whether adding this additional caching layer would introduce a new Single Point Of Failure (SPOF). This was a concern because the solid-state cards/disks used for caching would normally have no redundancy, to maximize the available space, and I couldn’t find out what happens when one of the devices fails by looking in the documentation or My Oracle Support, so my decision was to test it!
The idea behind the OSFC is to provide a second level of “buffer cache” on solid-state devices, which have better response times than re-reading data blocks from spinning disks. When the buffer cache runs out of space, clean (not “dirty”) blocks are evicted from it and written to the OSFC. Dirty blocks are written by DBWR to the data files first, and only then copied to the OSFC and evicted from the buffer cache. You can read more about what it is, how it works and how to configure OSFC in the Oracle Database Administrator’s Guide for 11.2 and 12.1 and in the Oracle white paper “Oracle Database Smart Flash Cache”.

In my case the OSFC was considered for a database running on an Amazon AWS EC2 instance. We used EBS volumes as ASM disks for data files, and as EBS volumes are basically network-attached behind the scenes, we wanted to remove that little bit of I/O latency by using the instance store (ephemeral SSDs) for the Smart Flash Cache. An additional benefit would be a reduction of the IOPS done on the EBS volumes, and that’s a big deal, as it’s not that difficult to reach the IOPS thresholds on EBS volumes.

 

Configuration

I did the testing on my VirtualBox VM, which ran Oracle Linux 7.2 and Oracle Database 12.1.0.2 EE. I simply added another VirtualBox disk to use for the OSFC (reminder: I was not doing performance testing here). The device was presented to the database via a separate ASM disk group named “FLASH”. Enabling the OSFC was done by setting the following parameters in the parameter file:

  • db_flash_cache_file=’+FLASH/flash.dat’
  • db_flash_cache_size=’8G’
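
Since these are static parameters, one possible way to set them (assuming an spfile is in use) is:

```sql
-- Both parameters require an instance restart to take effect.
ALTER SYSTEM SET db_flash_cache_file = '+FLASH/flash.dat' SCOPE = SPFILE;
ALTER SYSTEM SET db_flash_cache_size = 8G SCOPE = SPFILE;
```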

The first surprise came when I bounced the database to enable the new settings: the DB didn’t start, and the error “ORA-00439: feature not enabled: Server Flash Cache” was reported. Luckily, I found a known issue in MOS note “Database Startup Failing With ORA-00439 After Enabling Flash Cache (Doc ID 1550735.1)”, and after forcefully installing two RPMs from OL5 (enterprise-release and redhat-release-5Server), the database came up.

 

Testing

The test I chose was really simple. These are the preparation steps I did:

  • Reduced the buffer cache of the DB to approximately 700Mb.
  • Created table T1 of size ~1598Mb.
  • Set parameter _serial_direct_read=NEVER (to avoid direct path reads when scanning large tables; I really wanted to cache everything this time).
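
A sketch of that preparation (the CTAS source and the row count are illustrative; the post doesn’t show the exact statements used):

```sql
-- Illustrative only: build a table larger than the ~700MB buffer cache.
CREATE TABLE t1 AS
SELECT a.* FROM dba_objects a, dba_objects b WHERE ROWNUM <= 10000000;

-- Force buffered reads instead of direct path reads for serial full scans
-- (underscore parameters should normally only be set under Oracle Support guidance).
ALTER SESSION SET "_serial_direct_read" = NEVER;

SELECT COUNT(*) FROM t1;
```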

The next step was full-scanning the table by running “select count(*) from T1”, while also tracing the operation to see what was happening:

    • During the 1st execution I observed the following wait events (all multi-block reads from data files, as expected). However, I knew the buffer cache was too small to fit all the blocks, so a large volume of them would end up in the OSFC when they were flushed out of the buffer cache:
      WAIT #140182517664832: nam='db file scattered read' ela= 6057 file#=10 block#=90244 blocks=128 obj#=92736 tim=19152107066
      WAIT #140182517664832: nam='db file scattered read' ela= 4674 file#=10 block#=90372 blocks=128 obj#=92736 tim=19152113919
      WAIT #140182517664832: nam='db file scattered read' ela= 5486 file#=10 block#=90500 blocks=128 obj#=92736 tim=19152121510
      WAIT #140182517664832: nam='db file scattered read' ela= 4888 file#=10 block#=90628 blocks=128 obj#=92736 tim=19152129096
      WAIT #140182517664832: nam='db file scattered read' ela= 3754 file#=10 block#=90756 blocks=128 obj#=92736 tim=19152133997
      WAIT #140182517664832: nam='db file scattered read' ela= 8515 file#=10 block#=90884 blocks=124 obj#=92736 tim=19152143891
      WAIT #140182517664832: nam='db file scattered read' ela= 7177 file#=10 block#=91012 blocks=128 obj#=92736 tim=19152152344
      WAIT #140182517664832: nam='db file scattered read' ela= 6173 file#=10 block#=91140 blocks=128 obj#=92736 tim=19152161837
      
    • The 2nd execution of the query confirmed the reads from the OSFC:
      WAIT #140182517664832: nam='db flash cache single block physical read' ela= 989 p1=0 p2=0 p3=0 obj#=92736 tim=19288463835
      WAIT #140182517664832: nam='db file scattered read' ela= 931 file#=10 block#=176987 blocks=3 obj#=92736 tim=19288465203
      WAIT #140182517664832: nam='db flash cache single block physical read' ela= 589 p1=0 p2=0 p3=0 obj#=92736 tim=19288466044
      WAIT #140182517664832: nam='db file scattered read' ela= 2895 file#=10 block#=176991 blocks=3 obj#=92736 tim=19288469577
      WAIT #140182517664832: nam='db flash cache single block physical read' ela= 1582 p1=0 p2=0 p3=0 obj#=92736 tim=19288471506
      WAIT #140182517664832: nam='db file scattered read' ela= 1877 file#=10 block#=176995 blocks=3 obj#=92736 tim=19288473665
      WAIT #140182517664832: nam='db flash cache single block physical read' ela= 687 p1=0 p2=0 p3=0 obj#=92736 tim=19288474615
      

 

Crashing it?

Once the OSFC was in use, I decided to “pull out the SSD” by removing the device /dev/asm-disk03-flash, which I had created using udev rules and which the FLASH disk group consisted of.
After I did that, nothing happened, so I executed the query against the T1 table again, as it would access the data in the OSFC. This is what I saw:

    1. The query didn’t fail; it completed normally. The OSFC was not used, and the query transparently fell back to normal disk IOs.
    2. I/O errors for the removed disk were logged in the alert log, followed by messages about the disabling of the Flash Cache. It didn’t crash the instance!
      Tue Dec 15 17:07:49 2015
      Errors in file /u01/app/oracle/diag/rdbms/lab12c/LAB12c/trace/LAB12c_ora_24987.trc:
      ORA-15025: could not open disk "/dev/asm-disk03-flash"
      ORA-27041: unable to open file
      Linux-x86_64 Error: 2: No such file or directory
      Additional information: 3
      Tue Dec 15 17:07:49 2015
      WARNING: Read Failed. group:2 disk:0 AU:8243 offset:1040384 size:8192
      path:Unknown disk
               incarnation:0x0 synchronous result:'I/O error'
               subsys:Unknown library krq:0x7f7ec93eaac8 bufp:0x8a366000 osderr1:0x0 osderr2:0x0
               IO elapsed time: 0 usec Time waited on I/O: 0 usec
      WARNING: failed to read mirror side 1 of virtual extent 8191 logical extent 0 of file 256 in group [2.3848896167] from disk FLASH_0000  allocation unit 8243 reason error; if possible, will try another mirror side
      Tue Dec 15 17:07:49 2015
      Errors in file /u01/app/oracle/diag/rdbms/lab12c/LAB12c/trace/LAB12c_ora_24987.trc:
      ORA-15025: could not open disk "/dev/asm-disk03-flash"
      ORA-27041: unable to open file
      Linux-x86_64 Error: 2: No such file or directory
      Additional information: 3
      ORA-15081: failed to submit an I/O operation to a disk
      WARNING: Read Failed. group:2 disk:0 AU:8243 offset:1040384 size:8192
      path:Unknown disk
               incarnation:0x0 synchronous result:'I/O error'
               subsys:Unknown library krq:0x7f7ec93eaac8 bufp:0x8a366000 osderr1:0x0 osderr2:0x0
               IO elapsed time: 0 usec Time waited on I/O: 0 usec
      WARNING: failed to read mirror side 1 of virtual extent 8191 logical extent 0 of file 256 in group [2.3848896167] from disk FLASH_0000  allocation unit 8243 reason error; if possible, will try another mirror side
      Tue Dec 15 17:07:49 2015
      Errors in file /u01/app/oracle/diag/rdbms/lab12c/LAB12c/trace/LAB12c_ora_24987.trc:
      ORA-15025: could not open disk "/dev/asm-disk03-flash"
      ORA-27041: unable to open file
      Linux-x86_64 Error: 2: No such file or directory
      Additional information: 3
      ORA-15081: failed to submit an I/O operation to a disk
      ORA-15081: failed to submit an I/O operation to a disk
      WARNING: Read Failed. group:2 disk:0 AU:8243 offset:1040384 size:8192
      path:Unknown disk
               incarnation:0x0 synchronous result:'I/O error'
               subsys:Unknown library krq:0x7f7ec93eaac8 bufp:0x8a366000 osderr1:0x0 osderr2:0x0
               IO elapsed time: 0 usec Time waited on I/O: 0 usec
      WARNING: failed to read mirror side 1 of virtual extent 8191 logical extent 0 of file 256 in group [2.3848896167] from disk FLASH_0000  allocation unit 8243 reason error; if possible, will try another mirror side
      Tue Dec 15 17:07:49 2015
      Errors in file /u01/app/oracle/diag/rdbms/lab12c/LAB12c/trace/LAB12c_ora_24987.trc:
      ORA-15081: failed to submit an I/O operation to a disk
      ORA-15081: failed to submit an I/O operation to a disk
      ORA-15081: failed to submit an I/O operation to a disk
      Encounter unknown issue while accessing Flash Cache. Potentially a hardware issue
      Flash Cache: disabling started for file
      0
      
      Flash cache: future write-issues disabled
      Start disabling flash cache writes..
      Tue Dec 15 17:07:49 2015
      Flash cache: DBW0 stopping flash writes...
      Flash cache: DBW0 garbage-collecting for issued writes..
      Flash cache: DBW0 invalidating existing flash buffers..
      Flash cache: DBW0 done with write disabling. Checking other DBWs..
      Flash Cache file +FLASH/flash.dat (3, 0) closed by dbwr 0
      

     

    Re-enabling the OSFC

    Once the OSFC was automatically disabled, I wanted to know if it could be re-enabled without bouncing the database. I added back the missing ASM disk, but that didn’t trigger re-enabling of the OSFC automatically.
    I had to set the db_flash_cache_size=’8G’ parameter again, and then the cache was re-enabled, which was also confirmed by a message in the alert log:

    Tue Dec 15 17:09:46 2015
    Dynamically re-enabling db_flash_cache_file 0
    Tue Dec 15 17:09:46 2015
    ALTER SYSTEM SET db_flash_cache_size=8G SCOPE=MEMORY;
    

    Conclusions

    Good news! It appears to be safe (and also logical) to configure Oracle Smart Flash Cache on non-redundant solid-state devices, as their failure doesn’t affect the availability of the database. However, you may experience a performance impact while the OSFC is disabled. I did the testing on 12.1.0.2 only, so this may behave differently in older versions.

     

    Discover more about our expertise in the world of Oracle.

    Internals of Querying the Concurrent Requests’ Queue – Revisited for R12.2


    Once upon a time I wrote about the Internal Workflow of an E-Business Suite Concurrent Manager Process. Many things have changed since that blog post, the most obvious change being the release of Oracle E-Business Suite R12.2. I decided to check whether the way the concurrent manager queues are processed by concurrent manager processes is still the same. My main goal was to see if the manager processes still don’t attempt any coordination to distribute the requests among themselves.

    This is how I did the testing:

    • I used the VM templates provided by Oracle to build my R12.2.4 test environment. By the way, I didn’t expect that the process of getting the environment up would be so simple! Downloading the media files from edelivery.oracle.com was the most time-consuming step, once done – it took me just 1 hour to un-compress everything, import the Virtual Assembly file and bring up the R12.2.4 environment on my laptop.
    • 3 Standard managers are defined by default
    • Sleep seconds were left as is = 30 seconds
    • Cache size was increased from 1 to 5.
    • Identified the 3 DB processes that belong to the Standard managers:
      select sid, serial# from v$session where module='e:FND:cp:STANDARD'
    • I enabled tracing with binds and waits for each of them like this:
      exec dbms_monitor.session_trace_enable(sid,serial#,true,true);
    • Once that was done I submitted one concurrent program – “Active users” and waited for it to complete.
    • I disabled the tracing:
      exec dbms_monitor.session_trace_disable(sid,serial#);
    • Collected the trace files

    I found 2 of the trace files to be very interesting. To keep things simple, the manager process “A” will be the one that executed the concurrent request, and process “B” will be the one that didn’t.

    Before the “Active Users” Request Was Submitted

    No other requests were running at the time I did the testing, so I clearly observed how both Managers A and B queried the FND_CONCURRENT_REQUESTS table. Both of the trace files displayed the same method of how requests are picked up from the queue. Note, I’m showing only the lines relevant to the main query, and I have formatted the query text to make it more readable:

    PARSING IN CURSOR #139643743645920 len=1149 dep=0 uid=100 oct=3 lid=100 tim=1460211399835915 hv=3722997734 ad='d275f750' sqlid='cd23u4zfyhvz6'
    SELECT R.Rowid
    FROM Fnd_Concurrent_Requests R
    WHERE R.Hold_Flag                             = 'N'
    AND R.Status_Code                             = 'I'
    AND R.Requested_Start_Date                   <= Sysdate
    AND (R.Node_Name1                            IS NULL
    OR (R.Node_Name1                             IS NOT NULL
    AND FND_DCP.target_node_mgr_chk(R.request_id) = 1))
    AND (R.Edition_Name                          IS NULL
    OR R.Edition_Name                            <= sys_context('userenv', 'current_edition_name'))
    AND EXISTS
      (SELECT NULL
      FROM Fnd_Concurrent_Programs P
      WHERE P.Enabled_Flag         = 'Y'
      AND R.Program_Application_Id = P.Application_Id
      AND R.Concurrent_Program_Id  = P.Concurrent_Program_Id
      AND EXISTS
        (SELECT NULL
        FROM Fnd_Oracle_Userid O
        WHERE R.Oracle_Id = O.Oracle_Id
        AND EXISTS
          (SELECT NULL
          FROM Fnd_Conflicts_Domain C
          WHERE P.Run_Alone_Flag = C.RunAlone_Flag
          AND R.CD_Id            = C.CD_Id
          )
        )
      AND (P.Execution_Method_Code                          != 'S'
      OR (R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) IN ((0,98),(0,100),(0,31721),(0,31722),(0,31757)))
      )
    AND ((R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) NOT IN ((510,40032),(510,40033),(510,42156),(510,42157),(530,43793),(530,43794),(535,42626),(535,42627),(535,42628)))
    ORDER BY NVL(R.priority, 999999999),
      R.Priority_Request_ID,
      R.Request_ID
    END OF STMT
    EXEC #139643743645920:c=0,e=33,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211399835910
    FETCH #139643743645920:c=0,e=546,p=0,cr=106,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211399836507
    WAIT #139643743645920: nam='SQL*Net message to client' ela= 3 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211399836572
    
    *** 2016-04-09 10:17:09.837
    WAIT #139643743645920: nam='SQL*Net message from client' ela= 30000367 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211429836965
    ...
    EXEC #139643743645920:c=0,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211429838767
    FETCH #139643743645920:c=0,e=689,p=0,cr=106,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211429839587
    WAIT #139643743645920: nam='SQL*Net message to client' ela= 4 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211429839652
    
    *** 2016-04-09 10:17:39.840
    WAIT #139643743645920: nam='SQL*Net message from client' ela= 30000325 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211459840003
    ...
    

    It’s important to observe that:

    • All manager’s processes still compete for the same requests. If the query is executed at the same time, the same list of concurrent requests will be retrieved by all processes.
    • The constant literals used in lines 30-32 mean that the query for checking the queue is still built when the concurrent manager process starts up. These constants mainly implement the specialization rules in the query.
    • Only rowid for the pending requests’ rows in FND_CONCURRENT_REQUESTS are fetched.
    • The sleep time is clearly visible on lines 41,42 and 48,49

    After the “Active Users” Request Was Submitted – Starting the Concurrent Request

    The manager process A was the first to pick up the submitted request, which could be observed by the “r=1” (1 row fetched) in the FETCH call for the query we just reviewed:

    FETCH #139643743645920:c=0,e=437,p=0,cr=113,cu=0,mis=0,r=1,dep=0,og=1,plh=3984653669,tim=1460211519844640
    

    Immediately after this, the manager process A locked the row in the FND_CONCURRENT_REQUESTS table; this way, the request got assigned to this process. Notice the similar WHERE predicates used in this query; they are required to make sure the request has not been picked up by another manager process in the meantime. The main thing here, however, is that the request row is accessed by the rowid retrieved earlier (row 45; the value of the bind variable “:reqname” is “AAAjnSAA/AAAyn1AAH” in this case). Locking of the row is done by the “FOR UPDATE OF R.status_code NoWait” clause on line 49:

    PARSING IN CURSOR #139643743640368 len=4530 dep=0 uid=100 oct=3 lid=100 tim=1460211519864113 hv=4239777398 ad='cde86338' sqlid='6ya6bzgybbrmq'
    SELECT R.Conc_Login_Id,
      R.Request_Id,
      ... excluded other 156 columns for brevity...
    FROM fnd_concurrent_requests R,
      fnd_concurrent_programs P,
      fnd_application A,
      fnd_user U,
      fnd_oracle_userid O,
      fnd_conflicts_domain C,
      fnd_concurrent_queues Q,
      fnd_application A2,
      fnd_executables E,
      fnd_conc_request_arguments X
    WHERE R.Status_code             = 'I'
    AND (R.Edition_Name            IS NULL
    OR R.Edition_Name              <= sys_context('userenv', 'current_edition_name'))
    AND R.Request_ID                = X.Request_ID(+)
    AND R.Program_Application_Id    = P.Application_Id(+)
    AND R.Concurrent_Program_Id     = P.Concurrent_Program_Id(+)
    AND R.Program_Application_Id    = A.Application_Id(+)
    AND P.Executable_Application_Id = E.Application_Id(+)
    AND P.Executable_Id             = E.Executable_Id(+)
    AND P.Executable_Application_Id = A2.Application_Id(+)
    AND R.Requested_By              = U.User_Id(+)
    AND R.Cd_Id                     = C.Cd_Id(+)
    AND R.Oracle_Id                 = O.Oracle_Id(+)
    AND Q.Application_Id            = :q_applid
    AND Q.Concurrent_Queue_Id       = :queue_id
    AND (P.Enabled_Flag            IS NULL
    OR P.Enabled_Flag               = 'Y')
    AND R.Hold_Flag                 = 'N'
    AND R.Requested_Start_Date     <= Sysdate
    AND ( R.Enforce_Seriality_Flag  = 'N'
    OR ( C.RunAlone_Flag            = P.Run_Alone_Flag
    AND (P.Run_Alone_Flag           = 'N'
    OR NOT EXISTS
      (SELECT NULL
      FROM Fnd_Concurrent_Requests Sr
      WHERE Sr.Status_Code         IN ('R', 'T')
      AND Sr.Enforce_Seriality_Flag = 'Y'
      AND Sr.CD_id                  = C.CD_Id
      ))))
    AND Q.Running_Processes                                     <= Q.Max_Processes
    AND R.Rowid                                                  = :reqname
    AND ((P.Execution_Method_Code                               != 'S'
    OR (R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID)       IN ((0,98),(0,100),(0,31721),(0,31722),(0,31757))))
    AND ((R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) NOT IN ((510,40032),(510,40033),(510,42156),(510,42157),(530,43793),(530,43794),(535,42626),(535,42627),(535,42628)))
    FOR UPDATE OF R.status_code NoWait
    

    The behavior of the manager process B was a little more interesting. It too managed to fetch the same rowid from the FND_CONCURRENT_REQUESTS table, belonging to the submitted “Active Users” request. However, when it tried to lock the row in FND_CONCURRENT_REQUESTS (by using exactly the same query), this happened:

    PARSING IN CURSOR #139690311998256 len=4530 dep=0 uid=100 oct=3 lid=100 tim=1460211519900924 hv=4239777398 ad='cde86338' sqlid='6ya6bzgybbrmq'
    ...
    BINDS #139690311998256:
    ...
    Bind#2
      oacdty=01 mxl=32(18) mxlc=00 mal=00 scl=00 pre=00
      oacflg=20 fl2=1000001 frm=01 csi=873 siz=0 off=64
      kxsbbbfp=7f0c2f713f20  bln=32  avl=18  flg=01
      value="AAAjnSAA/AAAyn1AAH"
    EXEC #139690311998256:c=1000,e=1525,p=0,cr=25,cu=1,mis=0,r=0,dep=0,og=1,plh=4044729389,tim=1460211519902727
    ERROR #139690311998256:err=54 tim=1460211519902750
    

    The query failed with “ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired”.
    This is how access to pending concurrent requests is serialized to make sure only one of the manager processes can run each of them. And, I think, relying on the well-tuned and highly efficient locking mechanism of Oracle Database is a very smart idea.

    Conclusions

    • There is still no coordination between the manager processes to distribute the requests; the managers all query the queue the same way and then compete to lock the requests’ entries in the table first. The process that gets the lock also gets to execute the concurrent request.
    • The cache size variable couldn’t be observed in the trace files, but as far as I remember from my previous research, the process would only fetch “cache size” rowids using the first query in this post. This could be tested by submitting a larger volume of requests simultaneously.
    • The “sleep seconds” setting kicks in only when the manager process didn’t fetch any rowids from the queue. After all the cached requests are attempted/executed by the manager process, the queue is checked again immediately, without waiting for the “sleep seconds” (not explained in detail in this post, but it’s revealed in the trace files).
    • The statements used to query FND_CONCURRENT_REQUESTS and to lock the row are very similar to pre-R12.2 releases of e-Business Suite (another sign that the process hasn’t changed), though one change I do see is the addition of WHERE clause predicates for checking the Editions.
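    Putting the trace observations together, the per-manager loop I inferred can be sketched as follows (hypothetical Python pseudocode; fetch_pending, try_lock and run_request are stand-ins for the SQL shown above, not real EBS routines):

```python
import time

def manager_loop(fetch_pending, try_lock, run_request,
                 cache_size=5, sleep_seconds=30, cycles=3, sleep=time.sleep):
    """Sketch of the inferred behaviour: fetch up to cache_size rowids,
    try to lock and run each one, and sleep only when nothing was fetched."""
    for _ in range(cycles):                  # a real manager loops forever
        rowids = fetch_pending(cache_size)
        if not rowids:
            sleep(sleep_seconds)             # "sleep seconds" applies here only
            continue
        for rid in rowids:
            if try_lock(rid):                # compete via FOR UPDATE ... NOWAIT
                run_request(rid)
        # no sleep here: the queue is re-checked immediately

# With an empty queue the manager sleeps on every cycle:
naps = []
manager_loop(lambda n: [], lambda r: True, lambda r: None,
             cycles=2, sleep=naps.append)
print(naps)  # [30, 30]
```

    The sketch makes the second and third conclusions concrete: the sleep happens only on an empty fetch, and the cache is drained before the queue is queried again.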

    The getMOSPatch V2 Is Here!


    A while ago I created a Bash script called getMOSPatch.sh. Its purpose was to allow downloading Oracle patches directly from My Oracle Support to your server without having to visit the support site (assuming, of course, you already know the patch number). Today, I announce a new version of the tool: getMOSPatch V2!

    I received mostly very good feedback about the first version of the script, and it probably saved me a few hours on the MOS site too, e.g. when I was working on a project where 900+ patches needed to be downloaded. But there were some issues: sensitivity to different versions of the utilities the script uses (curl, wget, egrep), and, let’s be honest, it worked only on Linux (even though one could have bash on other platforms too). So I really wanted to make it less dependent on utility versions and more portable, which led me to the understanding that there was no future for the bash script. 3 Billion Devices “Run” Java, so I couldn’t do anything else but use Java for the new version of getMOSPatch. That was actually an easy decision, because 1) I needed portability, and now you have a good chance of running the tool on any platform with JRE 1.6 or higher, and 2) a JRE is normally bundled with most Oracle software (at least the software I work with). So, Java it was, and that’s why the new name goes without “.sh” at the end.

    The new home for getMOSPatch is here on GitHub: https://github.com/MarisElsins/getMOSPatch.
    The parameters you can use for getMOSPatch are the same as before, but if you need you’ll find the usage instructions here.
    One thing that changed for the new tool is the way it’s executed. After downloading getMOSPatch.jar you’ll use “java -jar getMOSPatch.jar” for JRE 1.7 and later, or “java -Dhttps.protocols=TLSv1 -jar getMOSPatch.jar” for most versions of JRE 1.6 (because support for TLSv1.1 and TLSv1.2 had not yet been added).

    Let’s take a look at a typical use case of the utility. I’ll be using the JRE bundled in the 12c Oracle Home on Linux x86-64 to download the newest version of OPatch:

    [oracle@lab12c ~]$ wget "https://github.com/MarisElsins/getMOSPatch/raw/master/getMOSPatch.jar" -q
    
    [oracle@lab12c ~]$ . oraenv
    ORACLE_SID = [LAB12c] ? LAB12c
    The Oracle base remains unchanged with value /u01/app/oracle
    
    [oracle@lab12c ~]$ export PATH=$ORACLE_HOME/jdk/jre/bin:$PATH
    
    [oracle@lab12c ~]$ java -version
    java version "1.6.0_75"
    Java(TM) SE Runtime Environment (build 1.6.0_75-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 20.75-b01, mixed mode)
    
    [oracle@lab12c ~]$ java -Dhttps.protocols=TLSv1 -jar getMOSPatch.jar patch=6880880 MOSUser=elsins@pythian.com
    Enter your MOS password:
    Platforms and languages need to be reset.
    Obtaining the list of platforms and languages:
    ...
    46P - Linux x86
    226P - Linux x86-64
    ...
    Enter Comma separated platforms to list: 226P
    
    We're going to download patches for the following Platforms/Languages:
     226P - Linux x86-64
    
    Processing patch 6880880 for Linux x86-64 and applying regexp .* to the filenames:
     1 - p6880880_112000_Linux-x86-64.zip
     2 - p6880880_121010_Linux-x86-64.zip
     3 - p6880880_132000_Generic.zip
     4 - p6880880_111000_Linux-x86-64.zip
     5 - p6880880_131000_Generic.zip
     6 - p6880880_101000_Linux-x86-64.zip
     7 - p6880880_102000_Linux-x86-64.zip
     Enter Comma separated files to download: 2
    
    Downloading all selected files:
     Downloading p6880880_121010_Linux-x86-64.zip: 120MB at average speed of 5671KB/s - DONE!
    

    And here’s another example of this tool running in a Windows PowerShell environment:

    getMOSPatch on Windows


    If you spot any issues or have ideas for improvements, let me know by commenting on this post, or submit the issues directly on GitHub here: https://github.com/MarisElsins/getMOSPatch/issues

    Investigating IO Performance on Amazon RDS for Oracle


    I’ve recently been involved in quite a few database migrations to Oracle RDS. One thing I noticed when dealing with post-migration performance issues was related to queries that used TABLE SCAN FULL in their execution plans. It seemed that, in many cases, it took just a single query to max out the allocated IOPS (IOs per second) or bandwidth, which in turn caused overall slowness of the RDS instance.

    A search of the documentation showed that this could be caused by how IO operations are counted on Amazon RDS, as it’s quite different from what a routine Oracle DBA like me would expect. For multi-block reads, the database (depending on storage) typically issues IOs of sizes up to 1MB; so, if an 8K block size is used, table scans read up to 128 blocks in a single IO of db file scattered read or direct path read.

    Now, pay attention to what the AWS documentation says:
    While Provisioned IOPS (io1 storage) can work with I/O sizes up to 256 KB, most databases do not typically use such large I/O. An I/O request smaller than 32 KB is handled as one I/O; for example, 1000 16 KB I/O requests are treated the same as 1000 32 KB requests. I/O requests larger than 32 KB consume more than one I/O request; Provisioned IOPS consumption is a linear function of I/O request size above 32 KB. For example, a 48 KB I/O request consumes 1.5 I/O requests of storage capacity; a 64 KB I/O request consumes 2 I/O requests, etc. … Note that I/O size does not affect the IOPS values reported by the metrics, which are based solely on the number of I/Os over time. This means that it is possible to consume all of the IOPS provisioned with fewer I/Os than specified if the I/O sizes are larger than 32 KB. For example, a system provisioned for 5,000 IOPS can attain a maximum of 2,500 IOPS with 64 KB I/O or 1,250 IOPS with 128 KB IO.
    … and …
    I/O requests larger than 32 KB are treated as more than one I/O for the purposes of PIOPS capacity consumption. A 40 KB I/O request will consume 1.25 I/Os, a 48 KB request will consume 1.5 I/Os, a 64 KB request will consume 2 I/Os, and so on. The I/O request is not split into separate I/Os; all I/O requests are presented to the storage device unchanged. For example, if the database submits a 128 KB I/O request, it goes to the storage device as a single 128 KB I/O request, but it will consume the same amount of PIOPS capacity as four 32 KB I/O requests.

    Based on the statements above, it looked like the large 1M IOs issued by the DB would be accounted as 32 separate IO operations, which would obviously exhaust the allocated IOPS much sooner than expected. The documentation talks only about Provisioned IOPS, but I think this also applies to General Purpose SSDs (gp2 storage), for which the IOPS baseline is 3 IOPS/GB (i.e. 300 IOPS if the allocated size is 100GB of gp2).
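    The accounting described in the quoted documentation can be expressed as a one-line formula (my reading of the docs, not an official AWS function; the name piops_consumed is mine):

```python
def piops_consumed(io_size_kb):
    # Requests up to 32 KB count as a single I/O; above 32 KB the
    # consumption grows linearly with the request size (per the quoted docs).
    return max(1.0, io_size_kb / 32.0)

print(piops_consumed(16))    # 1.0  - same cost as a 32 KB request
print(piops_consumed(48))    # 1.5
print(piops_consumed(1024))  # 32.0 - a single 1 MB multiblock read
```

    Under this model, a database issuing 1 MB scattered reads would exhaust 1000 provisioned IOPS with only about 31 reads per second.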

    I decided to do some testing to find out how RDS for Oracle handles large IOs.

    The Testing

    For testing purposes I used the following code to create a 1G table (Thanks Connor McDonald and AskTom):

    ORCL> create table t(a number, b varchar2(100)) pctfree 99 pctused 1;
    
    Table T created.
    
    ORCL> insert into t  values (1,lpad('x',100));
    
    1 row inserted.
    
    ORCL> commit;
    
    Commit complete.
    
    ORCL> alter table t minimize records_per_block;
    
    Table T altered.
    
    ORCL> insert into t select rownum+1,lpad('x',100) from dual connect by level<131072;
    
    131,071 rows inserted.
    
    ORCL> commit;
    
    Commit complete.
    
    ORCL> exec dbms_stats.gather_table_stats(user,'T');
    
    PL/SQL procedure successfully completed.
    
    ORCL> select sum(bytes)/1024/1024 sizemb from user_segments where segment_name='T';
    SIZEMB
    1088
    
    
    ORCL> select value/1024/1024 buffer_cache from v$sga where name='Database Buffers';
    BUFFER_CACHE
    1184
    

    The code for testing will be:

    exec rdsadmin.rdsadmin_util.flush_buffer_cache;
    alter session set "_serial_direct_read"=always;
    alter session set db_file_multiblock_read_count=&1;
    
    -- Run FTS against T table forever.
    declare
      n number:=1;
    begin
      while n>0
      loop
        select /*+ full(t) */ count(*) into n from t;
      end loop;
    end;
    /
    
    

    Basically, I’ll flush the buffer cache and force direct path reads by setting _serial_direct_read to “ALWAYS”, and then choose the db_file_multiblock_read_count based on how big the IOs I want to issue are (note: by default, db_file_multiblock_read_count is not set on RDS, and it resolves to 128, so the maximum size of an IO from the DB is 1 MB). I’ll test with different sizes of IOs, and will capture the throughput and effective IOPS by using the “Enhanced Monitoring” of the RDS instance.
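    The request size the database issues is simply the block size times db_file_multiblock_read_count, which is how the different IO sizes in these tests were produced (a quick illustration, assuming the default 8K db_block_size):

```python
BLOCK_SIZE_KB = 8  # assuming the default 8K db_block_size

def max_io_request_kb(db_file_multiblock_read_count):
    # the largest single read request the database issues for a full scan
    return BLOCK_SIZE_KB * db_file_multiblock_read_count

for mbrc in (1, 4, 16, 128):
    print(mbrc, max_io_request_kb(mbrc))
# mbrc=128 corresponds to the 1024 KB (1 MB) default mentioned above
```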

    Side-note: the testing turned out to be more complex than I had expected before I started. In a few cases, I was limited by the instance throughput before I could reach the maximum allocated IOPS, and due to this, the main testing needed to be done on a large enough instance (db.m4.4xlarge) that had more dedicated EBS throughput.

    The Results

    Provisioned IOPS storage

    Testing was done on a db.m4.4xlarge instance that was allocated 100GB of io1 storage with 1000 Provisioned IOPS. The EBS-optimized throughput for such an instance is 256000 KB/s.
    The tests were completed using db_file_multiblock_read_count values of 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 16, 32, 64 and 128.
    For each test the throughput and IO/s were captured (from RDS CloudWatch graphs), and the effective IO size was also derived.
    The DB instance was idle, but still, there could be a few small IOs happening during the tests.

    Provisioned IOPS Measured Throughput


    Provisioned IOPS Measured IO/s


    From the graphs above the following features that are not documented can be observed:

    • The RDS instance dynamically chooses the physical IO size (I’ll call them “physical”, just to differentiate that these are the IOs to storage; in fact, that’s only what I see in the CloudWatch graphs, and the real physical IO could be something different) based on the size of the IO request from the database. The possible physical IO sizes appear to be 16K, 32K, 64K, 128K and probably also 8K (this could in fact be a 16K physical IO reading just 8K of data).
    • The IOPS limit applies only to smaller physical IO sizes (up to 32K); for larger physical IOs (64K, 128K) the throughput is the limiting factor of the IO capability. The throughput limit appears to be quite close to the maximum throughput the instance is capable of delivering, but at this point it’s not clear how the throughput limit for a particular situation is calculated.
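    The two regimes can be combined into one back-of-the-envelope model: below some IO-size cutoff the IOPS limit caps the throughput, above it only the throughput cap matters (my simplification of the measurements, not an AWS formula; the function name and cutoff parameter are mine):

```python
def max_throughput_kbps(io_size_kb, iops_limit, throughput_cap_kbps,
                        iops_cutoff_kb=32):
    # Observed io1 behaviour: up to the cutoff the IOPS limit applies;
    # for larger physical IOs only the throughput cap seems to matter.
    if io_size_kb <= iops_cutoff_kb:
        return min(io_size_kb * iops_limit, throughput_cap_kbps)
    return throughput_cap_kbps

# 1000 PIOPS on the db.m4.4xlarge test setup (256000 KB/s instance cap):
print(max_throughput_kbps(16, 1000, 256000))   # 16000  - IOPS-bound
print(max_throughput_kbps(32, 1000, 256000))   # 32000  - IOPS-bound
print(max_throughput_kbps(128, 1000, 256000))  # 256000 - throughput-bound
```

    For gp2 the measurements later in this post suggest the cutoff sits at 16K rather than 32K (iops_cutoff_kb=16).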

    Throughput Limits for Provisioned IOPS

    I ran additional tests on differently sized instances with io1 storage to understand better how the maximum throughput was determined. The graph below represents the throughput achieved on different instances, all of which had the same 100G of 1000 PIOPS io1 storage. The throughput was measured using db_file_multiblock_read_count=128:

    PIOPS Throughput by Instance Type


    It appears that the maximum throughput is indeed limited by the instance type, except for the very largest instance, db.m4.10xlarge (for this instance the situation is somewhat odd even in the documentation, because the maximum instance throughput is stated as 500 MB/s, while the maximum throughput of a single io1 EBS volume, which should be underneath the RDS, is just 320 MB/s; I was unable to reach either of these limits).

    General Purpose SSD storage

    Testing was done on a db.m4.4xlarge instance that was allocated 100GB of gp2 storage with a 300 IOPS baseline. The EBS-optimized throughput for such an instance is 256000 KB/s.
    The tests were conducted similarly to how they were done for Provisioned IOPS above (note: this is the baseline performance, not burst performance).
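    The 300 IOPS baseline follows directly from the gp2 scaling rule mentioned earlier, 3 IOPS per allocated GB (a trivial sketch; the function name is mine):

```python
def gp2_baseline_iops(volume_gb):
    # gp2 baseline scales at 3 IOPS per GB of allocated storage
    return 3 * volume_gb

print(gp2_baseline_iops(100))  # 300 - the volume used in this test
print(gp2_baseline_iops(30))   # 90  - the db.m4.large volume tested later
```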

    General Purpose SSD Measured Throughput


    General Purpose SSD Measured IO/s


    Similarly to Provisioned IOPS, the General Purpose SSD storage behaves differently from what’s explained in the documentation:

    • The physical IO size is again calculated dynamically based on the size of the IO request from the database. The possible sizes appear to be the same as for io1: (8K), 16K, 32K, 64K and 128K.
    • The IOPS limit (at the baseline level) appears to apply to IO sizes only up to 16K (compared to 32K in the case of Provisioned IOPS); for larger physical IOs, starting from 32K, the limit appears to be throughput-driven.
    • It’s not clear how the throughput limit is determined for a particular instance/storage combination, but in this case it appeared to be around 30% of the maximum throughput for the instance; however, I didn’t confirm the same ratio for db.m4.large, where the maximum achievable throughput depended on the allocated size of the gp2 storage.

    Burst Performance

    I haven’t collected enough data to derive anything concrete, but during my testing I observed that burst performance applied to both the maximum IOPS and the maximum throughput. For example, while testing on db.m4.large (max instance throughput of 57600 KB/s) with 30G of gp2 and a 90 IOPS baseline, I saw that for small physical IOs it allowed bursting up to 3059 IOPS for short periods of time, while normally it indeed allowed only 300 IOPS. For larger IOs (32K+), the baseline maximum throughput was around 24500 KB/s, but the burst throughput was 55000 KB/s.

    Throughput Limits for General Purpose SSD storage

    I don’t really know how the maximum allowed throughput is calculated for different instance types and storage configurations on gp2, but one thing is clear: both the instance size and the size of the allocated gp2 storage are considered in determining the maximum throughput. I was able to achieve the following throughput measurements when gp2 storage was used:

    • 75144 KB/s (133776 KB/s burst) on db.m4.4xlarge (100G gp2)
    • 54500 KB/s (same as burst, this is close to the instance limit) on db.m4.large (100G gp2)
    • 24537 KB/s (54872 KB/s burst) on db.m4.large (30G gp2)
    • 29116 KB/s (burst was not measured) on db.m4.large (40G gp2)
    • 37291 KB/s (burst was not measured) on db.m4.large (50G gp2)

    Conclusions

    The testing provided some insight into how the maximum IO performance is determined on Amazon RDS for Oracle, based on the instance type, storage type, and volume size. Despite finding some clues, I also understood that managing IO performance on RDS is far more difficult than expected for the mixed-size IO requests that are typically issued by Oracle databases. There are many questions that still need to be answered (i.e. how the maximum throughput is calculated for gp2 storage instances), and it would take many, many hours to find all the answers.

    On the other hand, the testing already revealed a few valuable findings:

    1. Contrary to the documentation, which states that all IOs are measured and accounted in 32KB units, we found that the IO units reported by Amazon can be of sizes 8K (probably), 16K, 32K, 64K and 128K.
    2. For small physical IOs (up to 32K in case of Provisioned IOPS and up to 16K in case of General Purpose SSD), the allocated IOPS is used as the limit for the max performance.
    3. For larger physical IOs (from 64K in case of Provisioned IOPS and from 32K in case of General Purpose SSD), the throughput is used as the limit for the max performance, and the IOPS limit no longer applies.
    4. The burst performance applies to both IOPS and throughput.
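    Findings 2 and 3 can be summarized in a simple model: below the IO-size cutoff (32K for io1, 16K for gp2) the IOPS allocation is the binding limit; above it, only the throughput limit applies. A hedged sketch, using illustrative numbers rather than official AWS figures:

```python
def max_read_iops(io_kb, iops_limit, tput_kb_s, iops_bound_max_kb):
    """Estimated max IO/s for a given IO size (KB).

    Below the cutoff the IOPS allocation binds; above it, only throughput does.
    """
    if io_kb <= iops_bound_max_kb:
        return min(iops_limit, tput_kb_s // io_kb)
    return tput_kb_s // io_kb

# io1 with 1000 provisioned IOPS and a 57600 KB/s instance throughput limit:
print(max_read_iops(32, 1000, 57600, 32))   # 1000 -- 32K reads stay IOPS-bound
print(max_read_iops(128, 1000, 57600, 32))  # 450 -- 128K reads are throughput-bound
```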

    P.S. As to my original issue of a single TABLE SCAN FULL severely impacting the overall performance, I found that in many cases we were using small RDS instances (db.m3.large or db.m4.large) for which the maximum throughput was ridiculously small, and we were hitting the throughput limit, not the IOPS limit, which didn’t even apply to the larger physical IOs on gp2 storage.

    Investigating IO performance on Amazon EC2


    I published a blog post called “Investigating IO Performance on Amazon RDS for Oracle” recently, and soon after posting it I received several questions asking if IO worked the same way on EC2 instances. My immediate thought was that it did, mostly because RDS for Oracle is basically an EC2 instance with Oracle Database on top of it, where the configuration is fully managed by Amazon. But as always, it’s better to test than to assume, so here we go!

    Although the testing was done by running a workload in an Oracle database, the results will apply to any other type of workload, because the performance characteristics depend purely on the instance type, the EBS volume type and the size of the IO requests; it doesn’t matter how the data is processed after it’s retrieved from storage.

    The Testing

    The testing was done exactly the same way as described in the previous blog post; the only difference was that I had to create the Oracle database manually myself. I used database 12.1.0.2.0 Enterprise Edition with ASM, and the EBS volume was used as an ASM disk.

    Measuring the IO performance

    On RDS we had the nice Enhanced Monitoring, which I had set up with a refresh interval of a few seconds and used to collect performance statistics quickly. For EC2 (specifically for EBS volumes) there is no such thing as Enhanced Monitoring, so I would have needed to use the standard CloudWatch monitoring with its minimum refresh rate of 5 minutes (very inconvenient, because a single test case would have to run for 5 to 10 minutes to collect reliable data). This was not acceptable, so I looked for alternatives and found that iostat displayed the same values the monitoring graphs did:

    [root@ip-172-31-21-241 ~]# iostat 5 500
    ...
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.32    0.00    0.11    0.11    0.43   99.04
    
    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    xvda              0.00         0.00         0.00          0          0
    xvdf              0.40         0.00         8.00          0         40
    xvdg           3060.00     24496.00        14.40     122480         72
    ...

    The “tps” column showed IOs per second, and “kB_read/s” + “kB_wrtn/s” allowed me to calculate the throughput (I actually ended up using just kB_read/s, as my workload was 100% read-only and the values in kB_wrtn/s were tiny).
    iostat is even more convenient to use than Enhanced Monitoring; it didn’t take long to see the first benefit of EC2 over RDS!
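    Extracting those figures from an iostat line can also be scripted; this snippet parses the sample xvdg line from the output above:

```python
# A device line from the iostat sample output above
line = "xvdg           3060.00     24496.00        14.40     122480         72"

# Columns: device, tps (IO/s), kB_read/s, kB_wrtn/s, ...
device, tps, kb_read_s, kb_wrtn_s = line.split()[:4]
iops = float(tps)
throughput = float(kb_read_s) + float(kb_wrtn_s)  # total KB/s

print(device, iops, round(throughput, 1))  # xvdg 3060.0 24510.4
```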

    The Results

    Unsurprisingly, the outcome of testing on EC2 was quite similar to the results from testing on RDS.

    Provisioned IOPS Storage on an EC2 Instance

    As on RDS, the testing was done on db.m4.4xlarge with 100G of io1 storage and 1000 provisioned IO/s. The results are also very similar; the only notable difference I could observe (although I can’t explain it, and I’m not sure whether there is a pattern in it, as I didn’t do too many tests) was that the throughput for 64K–96K IOs didn’t reach the same level as for 128K+ IOs.

    Provisioned IOPS Throughput (on EC2)

    Provisioned IOPS Measured IO/s (on EC2)

    These results confirm that (just as with RDS) there are several sizes of physical IOs – (8K), 16K, 32K, 64K and 128K – and that starting with 64K the performance is throughput-bound, while with smaller IOs it’s IOPS-bound.

    General Purpose Storage on an EC2 Instance

    The testing with General Purpose SSDs (100G with 300 baseline IOPS) didn’t provide any surprises, and the results were exactly the same as for RDS.
    The only difference in the graphs is the “burst performance” measures for IOs of different sizes that I’ve added, to outline how “bursting” improves both IO/s and throughput.

    General Purpose SSD Throughput (on EC2)

    General Purpose SSD Measured IO/s (on EC2)

    These results also confirm that (just as with RDS) there are several sizes of physical IOs – 16K, 32K, 64K and 128K – and that starting with 32K the performance is throughput-bound, while with smaller IOs it’s IOPS-bound.

    Additional Flexibility with EC2

    Using Multiple gp2 Volumes

    Unlike RDS, EC2 lets me configure my storage and instance more freely, so instead of having just a single gp2 volume attached, I added five 1G-sized (yes, tiny) volumes to the +DATA disk group. The minimum IOPS for a gp2 volume is 100, so my 5 volumes gave a cumulative 500 baseline IOPS. As ASM was used, the IOs were more or less evenly distributed between the volumes.

    I didn’t test too thoroughly, but I still noticed a few things.
    Take a look at these iostat outputs from testing done with 8K reads (this is burst performance):

    [root@ip-172-31-21-241 ~]# iostat 5 500
    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    xvda              0.00         0.00         0.00          0          0
    xvdf              3.40         0.00        16.80          0         84
    xvdg           1203.40      9632.00         1.60      48160          8
    xvdi           1199.60      9596.80         0.00      47984          0
    xvdj           1211.60      9691.20         3.20      48456         16
    xvdk           1208.60      9670.40         0.00      48352          0
    xvdl           1203.00      9625.60         3.20      48128         16

     

    • Bursting applies to each volume separately. It should allow up to 3000 IOPS per volume, but I reached only ~1200 per volume with a cumulative throughput of 48214 KB/s (not even close to the limit). So there’s some other limit or threshold that applies to this configuration (and it’s not the CPU). But look! I got 6024 IO/s of burst performance, which is quite remarkable for just 5G.
    • As I was not hitting the maximum 3000 bursting IOPS per volume, the burst credit was running out much more slowly. If it normally lasts ~40 minutes at 3000 IOPS, it lasts ~3 times longer at ~1200 IOPS, which would allow running at better performance for longer (e.g. if one used 5×2G volumes instead of a single 10G volume).
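    The burst-credit arithmetic supports this: per the AWS documentation, each gp2 volume starts with a bucket of 5.4 million IO credits that drains at the actual IOPS rate and refills at the baseline rate. A rough sketch (the bucket size is the documented one; the ~31 minutes it yields at 3000 IOPS is in the same ballpark as the ~40 minutes mentioned above):

```python
def burst_minutes(actual_iops, baseline_iops, bucket_credits=5_400_000):
    """Roughly how long a full gp2 credit bucket lasts at a given IO rate."""
    drain_per_sec = actual_iops - baseline_iops
    if drain_per_sec <= 0:
        return float("inf")  # at or below baseline the credits never run out
    return bucket_credits / drain_per_sec / 60

print(round(burst_minutes(3000, 100)))  # 31 -- minutes at full burst on a 1G volume
print(round(burst_minutes(1200, 100)))  # 82 -- roughly 3x longer at the ~1200 IO/s observed
```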

    This iostat output is from testing done with 1M reads (this is burst performance):

    [root@ip-172-31-21-241 ~]# iostat 5 500
    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    xvda              0.00         0.00         0.00          0          0
    xvdf              3.40         0.00        16.80          0         84
    xvdg            384.40     48820.80         0.80     244104          4
    xvdi            385.80     49155.20         0.00     245776          0
    xvdj            385.00     49014.40         6.40     245072         32
    xvdk            386.80     49225.60         0.00     246128          0
    xvdl            385.00     48897.60         6.40     244488         32

     

    • The cumulative throughput is 245111 KB/s, which is very close to the throughput limit of the instance. I wasn’t able to reach such throughput on a single gp2 volume, where the maximum I observed was just 133824 KB/s; even the 163840 KB/s throughput limit of a single gp2 volume was exceeded. It appears that configuring multiple volumes allows reaching the instance throughput limit, which was not possible with a single volume.

    I didn’t run any non-burst tests as it required too much time (2 hours of waiting to exhaust the burst credits).

    Database with a 32K Block Size

    We have observed that starting with 32K block reads, the EBS volume becomes throughput-bound, not IOPS-bound. Obviously I wanted to see how it performed if the database was created with a 32K block size.
    I ran a few very simple tests using IOs of one data block (32K) on these two configurations:

    1. db.m4.4xlarge with 100G / 1000 PIOPS (io1)
    2. db.m4.4xlarge with 20G / 100 IOPS (gp2)

    There were no surprises on the Provisioned IOPS storage: I got the 1000 IOPS that were provisioned (actually slightly better – 1020 IO/s), and the throughput was 32576 KB/s.
    On General Purpose SSD, the story was different – we know that starting from 32K-sized IOs the performance becomes throughput-bound, and that was confirmed here too:

    • During the burst period I measured up to 4180 IO/s at 133779 KB/s, which was 4 times faster than Provisioned SSD.
    • During the non-burst period I measured up to 764 IO/s at 24748 KB/s of throughput, which is somewhat slower than Provisioned SSD. Also, 24748 KB/s was slower than the throughput I measured on a 100G gp2 volume (we already know that the non-burst throughput limit for gp2 depends on the size of the disk). If I used a 100G gp2 volume, I’d get 2359 IO/s at 75433 KB/s (this is from the graph above), which is also better than what one can get from a Provisioned SSD volume, and costs less.
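    A quick consistency check on these measurements: dividing each measured throughput by its IO/s should give roughly 32 KB per IO in every case, confirming the IOs really were one 32K block each:

```python
# (IO/s, KB/s) pairs measured above
measurements = [
    (1020, 32576),   # io1, 1000 provisioned IOPS
    (4180, 133779),  # gp2, burst period
    (764, 24748),    # gp2, non-burst period
]
for iops, kb_s in measurements:
    print(round(kb_s / iops, 1))  # each close to 32 KB per IO
```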

    Conclusions

    Most of the conclusions were already outlined in the previous blog post, and they also apply to the EC2 instances when a single EBS volume is used for storage.

    On the other hand, an EC2 instance allows system administrators and DBAs (or should I say “cloud administrators”) to work around some of the limitations by changing the “variables” that can’t be altered on RDS – such as the block size of the database (which is 8K on RDS) and the number of EBS volumes behind the configuration. Using a 32K block size for a database residing on a General Purpose volume allows bypassing the IOPS limitation completely, so that only the throughput limits stay in effect. However, if a 32K block size is not an option (as for Oracle e-Business Suite), then IOPS and throughput can still be maximized by using a configuration of multiple gp2 volumes.

    After having done all these tests, I think the only reason for using RDS instead of EC2 is the database management that Amazon provides. If that is critical for your requirements, it’s the way to go. If it’s not something you require, EC2 can be configured to perform better for the same price, but you need to take care of its maintenance yourself.


    Automating Password Rotation for Oracle Databases


    Password rotation is not the most exciting task in the world, and that’s exactly why it’s a perfect candidate for automation. Automating routine tasks like this is good for everyone – DBAs can work on something more exciting, companies save costs as less time is spent changing passwords, and there’s no room for human error, either. At Pythian, we typically use Ansible for task automation, and I like it mainly because of its non-intrusive configuration (no agents need to be installed on the target servers) and its scalability (tasks are executed in parallel on the target servers). This post briefly describes how I automated password rotation for Oracle database users using Ansible.

    Overview

    This blog post is not an intro to what Ansible is and how to use it; rather, it’s an example of how a simple task can be automated with Ansible in a way that’s scalable, flexible and easily reusable, and that also provides the ability for other tasks to pick up the new passwords from a secure password store.

    • Scalability – I’d like to take advantage of Ansible’s ability to execute tasks on multiple servers at the same time. For example, in a large environment of tens or hundreds of machines, a solution that executes password change tasks serially would not be suitable. Below is an example of a “serial” task (not a real task, just an illustration): it “hardcodes” a few “attributes” (the environment file, the username and the hostname), so a separate task would be required for every user/database whose password you wanted to change:
      - hosts: ora-serv01
        remote_user: oracle
        tasks:
        - name: change password for SYS
          shell: |
            . TEST1.env && \
        sqlplus / as sysdba @change_password.sql SYS \
            \"{{lookup('password','/dev/null length=8')}}\"
      
    • Flexible – I want to be able to adjust the list of users whose passwords are changed, and the list of servers/databases they are changed for, in a simple way that doesn’t involve changing the main task list.
    • Reusable – this comes together with flexibility. The idea is that the playbook would be so generic that it wouldn’t require any changes when implemented in a completely separate environment (i.e. for another client of Pythian).
    • Secure password store – the new passwords are to be generated by the automated password rotation tool, and a method of storing passwords securely is required so that the new passwords can be picked up by the DBAs, application owners, or the next automated task that reconfigures the application.

    The implementation

    Prerequisites

    I chose to do the implementation using Ansible 2.3, because it introduces the passwordstore lookup, which enables interaction with the pass utility (read more about it at Passwordstore.org). pass is very cool. It stores passwords in gpg-encrypted files, and it can also be configured to automatically push changes to a git repository, which relieves us of the headache of password distribution. The passwords can then be retrieved from git on the servers that need access to them.

    Ansible 2.3 runs on Python 2.6; unfortunately, the passwordstore lookup requires Python 2.7, which can be an issue if the Ansible control host runs Oracle Linux 6 or RHEL 6, as their official yum repositories don’t provide Python 2.7. Still, there are ways of getting it done, and I’ll write another blog post about it.

    So, what we’ll need is:

    • Ansible 2.3
    • jmespath plugin on Ansible control host (pip install jmespath)
    • jinja2 plugin on Ansible control host (I had to update it using pip install -U jinja2 in a few cases)
    • Python 2.7 (or Python 3.5)
    • pass utility

    The Playbook

    This is the whole list of files that are included in the playbook:

    ./chpwd.yml
    ./inventory/hosts
    ./inventory/orcl1-vagrant-private_key
    ./inventory/orcl2-vagrant-private_key
    ./roles/db_users/files/change_password.sql
    ./roles/db_users/files/exists_user.sql
    ./roles/db_users/defaults/main.yml
    ./roles/db_users/tasks/main.yml
    

    Let’s take a quick look at all of them:

    • ./chpwd.yml – is the playbook and (in this case) it’s extremely simple as I want to run the password change against all defined hosts:
      $ cat ./chpwd.yml
      ---
      
        - name: password change automation
          hosts: all
          roles:
            - db_users
      
    • ./inventory/hosts, ./inventory/orcl1-vagrant-private_key, ./inventory/orcl2-vagrant-private_key – these files define the hosts and the connectivity. In this case we have 2 hosts – orcl1 and orcl2 – and we’ll connect as the vagrant user using the private keys.
      $ cat ./inventory/hosts
      [orahosts]
      orcl1 ansible_host=127.0.0.1 ansible_port=2201 ansible_ssh_private_key_file=inventory/orcl1-vagrant-private_key ansible_user=vagrant
      orcl2 ansible_host=127.0.0.1 ansible_port=2202 ansible_ssh_private_key_file=inventory/orcl2-vagrant-private_key ansible_user=vagrant
    • ./roles/db_users/files/change_password.sql – a SQL script that I’ll execute on the database to change the passwords. It takes 2 parameters: the username and the password:
      $ cat ./roles/db_users/files/change_password.sql
      set ver off pages 0
      alter user &1 identified by "&2";
      exit;
    • ./roles/db_users/files/exists_user.sql – a SQL script that verifies the existence of a user. It takes 1 argument – the username. It outputs “User exists.” when the user is there, and “User {username} does not exist.” when it’s not.
      $ cat ./roles/db_users/files/exists_user.sql
      set ver off pages 0
      select 'User exists.' from all_users where username=upper('&1')
      union all
      select 'User '||upper('&1')||' does not exist.' from (select upper('&1') from dual minus select username from all_users);
      exit;
    • ./roles/db_users/defaults/main.yml – the defaults file for the db_users role. I use this file to define the users for each host and database whose passwords need to be changed:
      $ cat ./roles/db_users/defaults/main.yml
      ---
      
        db_users:
          - name: TEST1
            host: orcl1
            env: ". ~/.bash_profile && . ~/TEST1.env > /dev/null"
            pwdstore: "orcl1/TEST1/"
            os_user: oracle
            become_os_user: yes
            users:
              - dbsnmp
              - system
          - name: TEST2
            host: orcl2
            env: ". ~/.bash_profile && . ~/TEST2.env > /dev/null"
            pwdstore: "orcl2/TEST2/"
            os_user: oracle
            become_os_user: yes
            users:
              - sys
              - system
              - ctxsys
          - name: TEST3
            host: orcl2
            env: ". ~/.bash_profile && . ~/TEST3.env > /dev/null"
            pwdstore: "orcl2/TEST3/"
            os_user: oracle
            become_os_user: yes
            users:
              - dbsnmp

      In this data structure, we define everything that needs to be known to connect to the databases and change the passwords. Each entry in the list contains the following data:

      • name – just a descriptive name of the entry in this list, normally it would be the name of the database that’s described below.
      • host – the host on which the database resides. It should match one of the hosts defined in ./inventory/hosts.
      • env – how to set the correct environment to be able to connect to the DB (currently it requires sysdba connectivity).
      • pwdstore – the path to the folder in the passwordstore where the new passwords will be stored.
      • os_user and become_os_user – these are used when sudo to another user on the target host is required. In a typical configuration, I connect to the target host using a dedicated user for Ansible, and then sudo to the DB owner. If Ansible connects as the DB owner directly, then become_os_user should be set to “no”.
      • users – this is the list of all users for which the passwords need to be changed.

      As you can see, this structure greatly enhances flexibility and reusability, because adding new databases, hosts or users to the list is a simple change to the “db_users:” structure in this defaults file. In this example, the dbsnmp and system passwords are rotated for TEST1@orcl1; the sys, system and ctxsys passwords for TEST2@orcl2; and the dbsnmp password for TEST3@orcl2.

    • ./roles/db_users/tasks/main.yml – the task file of the db_users role: the soul of the playbook and the main part that changes the passwords based on the contents of the defaults file described above. Instead of pasting the whole file at once, I’ll break it up task by task and provide some comments about what’s being done.
      • populate host_db_users – this task simply filters the whole db_users data structure defined in the defaults file, and creates a host_db_users fact with only the DBs that belong to the host the task is currently running on. Using the Ansible “when” conditional to filter the list would also be possible; however, in that case a lot of “skipped” entries are displayed when the task executes, so I prefer filtering the list before it’s even passed to the Ansible task.
        ---
        
          - name: populate host_db_users
            set_fact: host_db_users="{{ db_users | selectattr('host','equalto',ansible_hostname) | list }}"
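      In plain Python terms, the selectattr filter above is equivalent to a list comprehension (illustrated with a trimmed-down version of the db_users structure):

```python
# Trimmed-down version of the db_users structure from the defaults file
db_users = [
    {"name": "TEST1", "host": "orcl1"},
    {"name": "TEST2", "host": "orcl2"},
    {"name": "TEST3", "host": "orcl2"},
]
ansible_hostname = "orcl2"

# keep only the databases that reside on the current host
host_db_users = [db for db in db_users if db["host"] == ansible_hostname]
print([db["name"] for db in host_db_users])  # ['TEST2', 'TEST3']
```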
        
      • create directory for target on db hosts – for each unique combination of os_user and become_os_user on the target host, an “ansible” directory is created. A json_query is used here to filter out just the os_user and become_os_user attributes that are needed. It would also work with with_items: "{{ host_db_users }}", but in that case the outputs become cluttered, as all the attributes are displayed during the execution.
          - name: create directory for target on db hosts
            file:
              path: "ansible"
              state: directory
            become_user: "{{ item.os_user }}"
            become: "{{ item.become_os_user }}"
            with_items: "{{ host_db_users | json_query('[*].{os_user: os_user, become_os_user: become_os_user }') | unique | list }}"
        
      • copy sql scripts to db_hosts – the missing scripts are copied from the Ansible control host to the target “ansible” directories. “with_nested” is a way of creating a nested loop in Ansible.
          - name: copy sql scripts to db_hosts
            copy:
              src="{{ item[1] }}"
              dest=ansible/
              mode=0644
            become_user: "{{ item[0].os_user }}"
            become: "{{ item[0].become_os_user }}"
            with_nested:
              - "{{ host_db_users | json_query('[*].{os_user: os_user, become_os_user: become_os_user }') | unique | list }}"
              - ['files/change_password.sql','files/exists_user.sql']
        
      • verify user existence – I’m using the shell module to execute the SQL script after setting the environment. The outputs are collected in the “exists_output” variable. This task will not fail and will not show as “changed”, because both failed_when and changed_when are set to “false”.
          - name: verify user existence
            shell: |
               {{ item[0].env }} && \
               sqlplus -S / as sysdba \
               @ansible/exists_user.sql {{ item[1] }}
            register: exists_output
            become_user: "{{ item[0].os_user }}"
            become: "{{ item[0].become_os_user }}"
            with_subelements:
              - "{{ host_db_users |json_query('[*].{env: env, os_user: os_user, users: users, become_os_user: become_os_user }') }}"
              - users
            failed_when: false
            changed_when: false
        
      • User existence results – this task will fail if any of the users doesn’t exist, and will display which one. This is done in a separate task to produce cleaner output. If you don’t want the play to fail when some users don’t exist (i.e. you want to continue changing passwords for the existing users), this task can simply be commented out, or the “failed_when: false” line can be uncommented.
          - name: User existence results
            fail: msg="{{ item }}"
            with_items: "{{ exists_output.results|rejectattr('stdout','equalto','User exists.')|map(attribute='stdout')|list }}"
            #failed_when: false
        
      • generate and change the user passwords – finally, this is the task that actually changes the passwords. A successful password change is detected by checking the output of the SQL script, which should produce “User altered.” The rather complex use of lookups is there for a reason: the passwordstore lookup can also generate passwords, but it doesn’t allow defining the character classes the new password should contain, whereas the “password” lookup does. Additionally, the 1st character is generated from “ascii_letters” only, as there are usually some applications that “don’t like” passwords starting with numbers (this is why generating the 1st character is separated from generating the remaining 11 characters). And lastly, the “passwordstore” lookup is used with the “userpass=” parameter to pass the generated password into the passwordstore (which also keeps the previous passwords). This part could use some improvement, as in some cases different rules for the generated password complexity may be required. The password change outputs are recorded in “change_output”, which is checked in the last task.
          - name: generate and change the user passwords
            shell: |
               {{ item[0].env }} && \
               sqlplus -S / as sysdba \
               @ansible/change_password.sql \
               {{ item[1] }} \"{{ lookup('passwordstore',item[0].pwdstore + item[1] + ' create=true overwrite=true userpass=' +
                                         lookup('password','/dev/null chars=ascii_letters length=1') +
                                         lookup('password','/dev/null chars=ascii_letters,digits,hexdigits length=11')) }}\"
            register: change_output
            become_user: "{{ item[0].os_user }}"
            become: "{{ item[0].become_os_user }}"
            with_subelements:
              - "{{ host_db_users |json_query('[*].{env: env, os_user: os_user, users: users, pwdstore: pwdstore, become_os_user: become_os_user}') }}"
              - users
            failed_when: false
            changed_when: "'User altered.' in change_output.stdout"
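      The effect of the chained lookups can be sketched in Python (generate_password is a hypothetical helper, not part of the playbook): the first character comes from letters only, the remaining 11 from letters and digits:

```python
import random
import string

def generate_password(length=12):
    """Sketch of the lookup chain: a letter first (some applications
    reject passwords starting with a digit), then letters and digits."""
    first = random.choice(string.ascii_letters)
    rest = "".join(random.choice(string.ascii_letters + string.digits)
                   for _ in range(length - 1))
    return first + rest

print(generate_password())  # a 12-character password such as 'HDecEbjc6xoO'
```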
        
      • Password change errors – the “change_output” data is verified here, and failed password changes are reported.
           # fail if the password change failed.
          - name: Password change errors
            fail: msg="{{ item }}"
            with_items: "{{ change_output.results|rejectattr('stdout','equalto','\nUser altered.')|map(attribute='stdout')|list }}"
        

    It really works!

    Now that you know how it’s built, it’s time to show how it works!
    Please pay attention to the following:

    • The password store is empty at first
    • The whole password change playbook completes in 12 seconds
    • The tasks on both hosts are executed in parallel (see the order of execution feedback for each task)
    • The passwordstore contains the password entries after the playbook completes, and they can be retrieved by using the pass command
    $ pass
    Password Store
    
    $ time ansible-playbook -i inventory/hosts chpwd.yml
    
    PLAY [pasword change automation] *******************************************************
    
    TASK [Gathering Facts] *****************************************************************
    ok: [orcl1]
    ok: [orcl2]
    
    TASK [db_users : populate host_db_users] ***********************************************
    ok: [orcl1]
    ok: [orcl2]
    
    TASK [db_users : create directory for target on db hosts] ******************************
    changed: [orcl1] => (item={'become_os_user': True, 'os_user': u'oracle'})
    changed: [orcl2] => (item={'become_os_user': True, 'os_user': u'oracle'})
    
    TASK [db_users : copy sql scripts to db_hosts] *****************************************
    changed: [orcl1] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/change_password.sql'])
    changed: [orcl2] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/change_password.sql'])
    changed: [orcl1] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/exists_user.sql'])
    changed: [orcl2] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/exists_user.sql'])
    
    TASK [db_users : verify user existance] ************************************************
    ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'sys'))
    ok: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'dbsnmp'))
    ok: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'system'))
    ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'system'))
    ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'ctxsys'))
    ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST3.env > /dev/null'}, u'dbsnmp'))
    
    TASK [db_users : User existance results] ***********************************************
    
    TASK [db_users : generate and change the user passwords] *******************************
    changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST2/', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'sys'))
    changed: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl1/TEST1/', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'dbsnmp'))
    changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST2/', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'system'))
    changed: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl1/TEST1/', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'system'))
    changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST2/', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'ctxsys'))
    changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST3/', 'env': u'. ~/.bash_profile && . ~/TEST3.env > /dev/null'}, u'dbsnmp'))
    
    TASK [db_users : Password change errors] ***********************************************
    
    PLAY RECAP *****************************************************************************
    orcl1                      : ok=6    changed=3    unreachable=0    failed=0
    orcl2                      : ok=6    changed=3    unreachable=0    failed=0
    
    real    0m12.418s
    user    0m8.590s
    sys     0m3.900s
    
    $ pass
    Password Store
    |-- orcl1
    |   |-- TEST1
    |       |-- dbsnmp
    |       |-- system
    |-- orcl2
        |-- TEST2
        |   |-- ctxsys
        |   |-- sys
        |   |-- system
        |-- TEST3
            |-- dbsnmp
    
    $ pass orcl1/TEST1/system
    HDecEbjc6xoO
    lookup_pass: First generated by ansible on 26/05/2017 14:28:50
    

    Conclusions

    For the past 2 months I’ve been learning Ansible and trying it out on various DBA tasks. It hasn’t always been a smooth ride, as I had to learn quite a lot: I hadn’t been exposed much to beasts like jinja2, json_query, YAML, Python (very handy for troubleshooting) or Ansible itself before. I feel that my former PL/SQL coder’s experience had created some expectations of Ansible that turned out not to be true. The biggest challenges for me were getting used to the linear execution of a playbook (whereas with PL/SQL I can call packages, functions, etc. to process data “outside” the main linear code line), and the lack of execution feedback: one has to learn to create Ansible tasks so that they either succeed or fail (no middle states like “this is a special case – process it differently”), and the amount of visual output is close to none. That does make sense to some degree – it’s “automation” after all, right? Nobody should be watching :)
    A separate struggle for me was working with the complex data structure that I created for storing the host/database/user information. It’s a mix of YAML dictionaries and lists, and it turned out to be difficult to process it the way I wanted – this is why I used json_query at times (although not in a very complex way in this case). There are probably simpler ways that I didn’t find, and I’d be glad if you’d let me know of possible improvements, or even other approaches to such tasks that you have worked on and implemented.
    Despite all the complaining above, I think it’s really worth investing time in automating tasks like this: it really works, and once done it doesn’t require much attention. Happy Automating!

    Internals of querying the concurrent requests’ queue – revisited for R12.2


    Once upon a time I wrote about the Internal Workflow of an E-Business Suite Concurrent Manager Process. Many things have changed since that blog post, the most obvious change being the release of Oracle e-Business Suite R12.2. I decided to check if the way the concurrent manager queues are processed by concurrent manager processes was still the same. My main goal was to see if the manager processes still don’t attempt any kind of coordination to distribute the requests among themselves.

    This is how I did the testing:

    • I used the VM templates provided by Oracle to build my R12.2.4 test environment. By the way, I didn’t expect that the process of getting the environment up would be so simple! Downloading the media files from edelivery.oracle.com was the most time-consuming step; once that was done, it took me just 1 hour to un-compress everything, import the Virtual Assembly file and bring up the R12.2.4 environment on my laptop.
    • 3 Standard managers are defined by default
    • Sleep seconds were left as is = 30 seconds
    • Cache size was increased from 1 to 5.
    • Identified the 3 DB processes that belong to the Standard managers:
      select sid, serial# from v$session where module='e:FND:cp:STANDARD'
    • I enabled tracing with binds and waits for each of them like this:
      exec dbms_monitor.session_trace_enable(sid,serial#,true,true);
    • Once that was done, I submitted one concurrent program – “Active Users” – and waited for it to complete.
    • I disabled the tracing:
      exec dbms_monitor.session_trace_disable(sid,serial#);
    • Collected the trace files

    I found 2 of the trace files to be very interesting. To keep things simple, manager process “A” will be the one that executed the concurrent request, and process “B” will be the one that didn’t.

    Before the “Active Users” Request Was Submitted

    No other requests were running at the time I did the testing, so I clearly observed how both managers A and B queried the FND_CONCURRENT_REQUESTS table. Both trace files displayed the same method of picking requests up from the queue. Note that I’m showing only the lines relevant to the main query, and I have formatted the query text to make it more readable:

    PARSING IN CURSOR #139643743645920 len=1149 dep=0 uid=100 oct=3 lid=100 tim=1460211399835915 hv=3722997734 ad='d275f750' sqlid='cd23u4zfyhvz6'
    SELECT R.Rowid
    FROM Fnd_Concurrent_Requests R
    WHERE R.Hold_Flag                             = 'N'
    AND R.Status_Code                             = 'I'
    AND R.Requested_Start_Date                   <= Sysdate
    AND (R.Node_Name1                            IS NULL
    OR (R.Node_Name1                             IS NOT NULL
    AND FND_DCP.target_node_mgr_chk(R.request_id) = 1))
    AND (R.Edition_Name                          IS NULL
    OR R.Edition_Name                            <= sys_context('userenv', 'current_edition_name'))
    AND EXISTS
      (SELECT NULL
      FROM Fnd_Concurrent_Programs P
      WHERE P.Enabled_Flag         = 'Y'
      AND R.Program_Application_Id = P.Application_Id
      AND R.Concurrent_Program_Id  = P.Concurrent_Program_Id
      AND EXISTS
        (SELECT NULL
        FROM Fnd_Oracle_Userid O
        WHERE R.Oracle_Id = O.Oracle_Id
        AND EXISTS
          (SELECT NULL
          FROM Fnd_Conflicts_Domain C
          WHERE P.Run_Alone_Flag = C.RunAlone_Flag
          AND R.CD_Id            = C.CD_Id
          )
        )
      AND (P.Execution_Method_Code                          != 'S'
      OR (R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) IN ((0,98),(0,100),(0,31721),(0,31722),(0,31757)))
      )
    AND ((R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) NOT IN ((510,40032),(510,40033),(510,42156),(510,42157),(530,43793),(530,43794),(535,42626),(535,42627),(535,42628)))
    ORDER BY NVL(R.priority, 999999999),
      R.Priority_Request_ID,
      R.Request_ID
    END OF STMT
    EXEC #139643743645920:c=0,e=33,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211399835910
    FETCH #139643743645920:c=0,e=546,p=0,cr=106,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211399836507
    WAIT #139643743645920: nam='SQL*Net message to client' ela= 3 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211399836572
    *** 2016-04-09 10:17:09.837
    WAIT #139643743645920: nam='SQL*Net message from client' ela= 30000367 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211429836965
    …
    EXEC #139643743645920:c=0,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211429838767
    FETCH #139643743645920:c=0,e=689,p=0,cr=106,cu=0,mis=0,r=0,dep=0,og=1,plh=3984653669,tim=1460211429839587
    WAIT #139643743645920: nam='SQL*Net message to client' ela= 4 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211429839652
    *** 2016-04-09 10:17:39.840
    WAIT #139643743645920: nam='SQL*Net message from client' ela= 30000325 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1460211459840003
    …
    

    It’s important to observe that:

    • All manager processes still compete for the same requests. If the query is executed at the same time, the same list of concurrent requests will be retrieved by all processes.
    • The constant literals in the IN-lists near the end of the query show that the query for checking the queue is still built when the concurrent manager process starts up. These constants are mainly used to implement the specialization rules in the query.
    • Only the rowids of the pending requests’ rows in FND_CONCURRENT_REQUESTS are fetched.
    • The sleep time is clearly visible in the repeated ~30-second “SQL*Net message from client” waits between the executions of the query.

    After the “Active Users” Request Was Submitted – Starting the Concurrent Request

    Manager process A was the first to pick up the submitted request, as can be seen from the “r=1” (1 row fetched) in the FETCH call of the query we just reviewed:

    FETCH #139643743645920:c=0,e=437,p=0,cr=113,cu=0,mis=0,r=1,dep=0,og=1,plh=3984653669,tim=1460211519844640
    

    Immediately after this, manager process A locked the row in the FND_CONCURRENT_REQUESTS table; this way, the request got assigned to this process. Notice the similar WHERE predicates used in this query – they are actually required to make sure that the request has not been picked up by another manager process in the meantime. However, the main thing here is the fact that the request row is accessed by the rowid retrieved earlier (the value of the bind variable “:reqname” is “AAAjnSAA/AAAyn1AAH” in this case). The locking of the row is done by the “FOR UPDATE OF R.status_code NoWait” clause at the very end of the statement:

    PARSING IN CURSOR #139643743640368 len=4530 dep=0 uid=100 oct=3 lid=100 tim=1460211519864113 hv=4239777398 ad='cde86338' sqlid='6ya6bzgybbrmq'
    SELECT R.Conc_Login_Id,
      R.Request_Id,
      … excluded other 156 columns for brevity…
    FROM fnd_concurrent_requests R,
      fnd_concurrent_programs P,
      fnd_application A,
      fnd_user U,
      fnd_oracle_userid O,
      fnd_conflicts_domain C,
      fnd_concurrent_queues Q,
      fnd_application A2,
      fnd_executables E,
      fnd_conc_request_arguments X
    WHERE R.Status_code             = 'I'
    AND (R.Edition_Name            IS NULL
    OR R.Edition_Name              <= sys_context('userenv', 'current_edition_name'))
    AND R.Request_ID                = X.Request_ID(+)
    AND R.Program_Application_Id    = P.Application_Id(+)
    AND R.Concurrent_Program_Id     = P.Concurrent_Program_Id(+)
    AND R.Program_Application_Id    = A.Application_Id(+)
    AND P.Executable_Application_Id = E.Application_Id(+)
    AND P.Executable_Id             = E.Executable_Id(+)
    AND P.Executable_Application_Id = A2.Application_Id(+)
    AND R.Requested_By              = U.User_Id(+)
    AND R.Cd_Id                     = C.Cd_Id(+)
    AND R.Oracle_Id                 = O.Oracle_Id(+)
    AND Q.Application_Id            = :q_applid
    AND Q.Concurrent_Queue_Id       = :queue_id
    AND (P.Enabled_Flag            IS NULL
    OR P.Enabled_Flag               = 'Y')
    AND R.Hold_Flag                 = 'N'
    AND R.Requested_Start_Date     <= Sysdate
    AND ( R.Enforce_Seriality_Flag  = 'N'
    OR ( C.RunAlone_Flag            = P.Run_Alone_Flag
    AND (P.Run_Alone_Flag           = 'N'
    OR NOT EXISTS
      (SELECT NULL
      FROM Fnd_Concurrent_Requests Sr
      WHERE Sr.Status_Code         IN ('R', 'T')
      AND Sr.Enforce_Seriality_Flag = 'Y'
      AND Sr.CD_id                  = C.CD_Id
      ))))
    AND Q.Running_Processes                                     <= Q.Max_Processes
    AND R.Rowid                                                  = :reqname
    AND ((P.Execution_Method_Code                               != 'S'
    OR (R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID)       IN ((0,98),(0,100),(0,31721),(0,31722),(0,31757))))
    AND ((R.PROGRAM_APPLICATION_ID,R.CONCURRENT_PROGRAM_ID) NOT IN ((510,40032),(510,40033),(510,42156),(510,42157),(530,43793),(530,43794),(535,42626),(535,42627),(535,42628)))
    FOR UPDATE OF R.status_code NoWait
    

    The behavior of manager process B was a little bit more interesting. It too managed to fetch the same rowid from the FND_CONCURRENT_REQUESTS table, belonging to the submitted “Active Users” request. However, when it tried to lock the row (by using exactly the same query), this happened:

    PARSING IN CURSOR #139690311998256 len=4530 dep=0 uid=100 oct=3 lid=100 tim=1460211519900924 hv=4239777398 ad='cde86338' sqlid='6ya6bzgybbrmq'
    …
    BINDS #139690311998256:
    …
    Bind#2
      oacdty=01 mxl=32(18) mxlc=00 mal=00 scl=00 pre=00
      oacflg=20 fl2=1000001 frm=01 csi=873 siz=0 off=64
      kxsbbbfp=7f0c2f713f20  bln=32  avl=18  flg=01
      value="AAAjnSAA/AAAyn1AAH"
    EXEC #139690311998256:c=1000,e=1525,p=0,cr=25,cu=1,mis=0,r=0,dep=0,og=1,plh=4044729389,tim=1460211519902727
    ERROR #139690311998256:err=54 tim=1460211519902750
    

    The query failed with “ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired”.
    This is how access to pending concurrent requests is serialized to make sure only one of the manager processes can run each of them. And, I think, relying on the well-tuned and highly efficient locking mechanism of Oracle Database is a very smart idea.

    Conclusions

    • Coordination between manager processes still doesn’t happen to distribute the requests: the managers all query the queue the same way and then compete between themselves to lock the requests’ entries in the table first. The process that gets the lock also gets to execute the concurrent request.
    • The cache size variable couldn’t be observed in the trace files, but as far as I remember from my previous research, the process fetches only a “cache size” number of rowids using the first query in this post. This could be tested by submitting a larger volume of requests simultaneously.
    • The “sleep seconds” kick in only when the manager process didn’t fetch any rowids from the queue. After all the cached requests are attempted/executed by the manager process, the queue is checked again immediately, without waiting for the “sleep seconds” (not explained in detail in this post, but it’s revealed in the trace files).
    • The queries used to check FND_CONCURRENT_REQUESTS and to lock the row are very similar to pre-R12.2 releases of e-Business Suite – another sign that the process hasn’t changed, though one change that I do see is the addition of WHERE clause predicates for checking the Editions.

    The getMOSPatch V2 is here!


    A while ago I created a Bash script called getMOSPatch.sh. Its purpose was to allow downloading Oracle patches directly from My Oracle Support to your server without having to visit the support site (of course, if you know the patch number already). Today, I announce a new version of the tool: “getMOSPatch V2”!

    I received mostly very good feedback about the first version of the script, and it probably saved a few hours I’d otherwise have spent on the MOS site downloading patches myself – e.g. when I was working on a project where 900+ patches needed to be downloaded. But there were some issues too, like sensitivity to different versions of the utilities the script uses (curl, wget, egrep), and, let’s be honest, it worked only on Linux (even though one could have bash on other platforms too). So, I really wanted to make it less dependent on utility versions and more portable, which led me to the understanding that there was no future for the bash script. 3 Billion Devices “Run” Java, so I couldn’t do anything else but use Java for the new version of getMOSPatch. That was actually an easy decision, because 1) I needed portability, and now you can run the tool with good chances on any platform if you have JRE 1.6 or higher, and 2) a JRE is normally bundled with most Oracle software (at least the software I work with). So, Java it was, and that’s why the new name goes without “.sh” at the end.

    The new home for getMOSPatch is here on GitHub: https://github.com/MarisElsins/getMOSPatch.
    The parameters you can use for getMOSPatch are the same as before, and if you need them, you’ll find the usage instructions here.
    One thing that changed with the new tool is the way it’s executed. After downloading getMOSPatch.jar you’ll use “java -jar getMOSPatch.jar” for JRE 1.7 and later, or “java -Dhttps.protocols=TLSv1 -jar getMOSPatch.jar” for most versions of JRE 1.6 (because support for TLSv1.1 and TLSv1.2 had not been added yet).

    Let’s take a look at a typical use case of the utility. I’ll be using the JRE bundled in the 12c Oracle Home on Linux x86-64 to download the newest version of OPatch:

    [oracle@lab12c ~]$ wget "https://github.com/MarisElsins/getMOSPatch/raw/master/getMOSPatch.jar" -q
    [oracle@lab12c ~]$ . oraenv
    ORACLE_SID = [LAB12c] ? LAB12c
    The Oracle base remains unchanged with value /u01/app/oracle
    [oracle@lab12c ~]$ export PATH=$ORACLE_HOME/jdk/jre/bin:$PATH
    [oracle@lab12c ~]$ java -version
    java version "1.6.0_75"
    Java(TM) SE Runtime Environment (build 1.6.0_75-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 20.75-b01, mixed mode)
    [oracle@lab12c ~]$ java -Dhttps.protocols=TLSv1 -jar getMOSPatch.jar patch=6880880 MOSUser=elsins@pythian.com
    Enter your MOS password:
    Platforms and languages need to be reset.
    Obtaining the list of platforms and languages:
    …
    46P – Linux x86
    226P – Linux x86-64
    …
    Enter Comma separated platforms to list: 226P
    We’re going to download patches for the following Platforms/Languages:
     226P – Linux x86-64
    Processing patch 6880880 for Linux x86-64 and applying regexp .* to the filenames:
     1 – p6880880_112000_Linux-x86-64.zip
     2 – p6880880_121010_Linux-x86-64.zip
     3 – p6880880_132000_Generic.zip
     4 – p6880880_111000_Linux-x86-64.zip
     5 – p6880880_131000_Generic.zip
     6 – p6880880_101000_Linux-x86-64.zip
     7 – p6880880_102000_Linux-x86-64.zip
     Enter Comma separated files to download: 2
    Downloading all selected files:
     Downloading p6880880_121010_Linux-x86-64.zip: 120MB at average speed of 5671KB/s – DONE!
    

    And here’s another example of the tool running in a Windows PowerShell environment:

    getMOSPatch on Windows

    If you spot any issues or have ideas for improvements, let me know by commenting on this post, or submit the issues directly on GitHub here: https://github.com/MarisElsins/getMOSPatch/issues

    Investigating IO performance on Amazon RDS for Oracle


    I’ve recently been involved in quite a few database migrations to Oracle RDS. One thing I noticed when dealing with post-migration performance issues was related to queries that used TABLE ACCESS FULL in their execution plans. It seemed that, in many cases, it took just a single query to max out the allocated IOPS (IOs per second) or bandwidth, which in turn caused overall slowness of the RDS instance.

    A search of the documentation showed that it could be caused by how IO operations are counted on Amazon RDS, as it’s quite different from what a routine Oracle DBA like me would expect. For multi-block reads the database (depending on storage) typically issues IOs of up to 1 MB, so with an 8 KB block size, table scans read up to 128 blocks in a single “db file scattered read” or “direct path read” IO.
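    To put a number on that, the largest multiblock read is simply the block size times db_file_multiblock_read_count – a trivial sketch using the 8 KB / 128-block defaults mentioned above:

    ```python
    def max_multiblock_io_kb(block_size_bytes=8192, db_file_multiblock_read_count=128):
        """Largest single multiblock read the database will issue, in KB."""
        return block_size_bytes * db_file_multiblock_read_count // 1024

    print(max_multiblock_io_kb())         # 1024 KB, i.e. 1 MB per multiblock read
    print(max_multiblock_io_kb(8192, 16)) # 128 KB with a smaller read count
    ```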

    Now, pay attention to what the AWS documentation says:
    While Provisioned IOPS (io1 storage) can work with I/O sizes up to 256 KB, most databases do not typically use such large I/O. An I/O request smaller than 32 KB is handled as one I/O; for example, 1000 16 KB I/O requests are treated the same as 1000 32 KB requests. I/O requests larger than 32 KB consume more than one I/O request; Provisioned IOPS consumption is a linear function of I/O request size above 32 KB. For example, a 48 KB I/O request consumes 1.5 I/O requests of storage capacity; a 64 KB I/O request consumes 2 I/O requests, etc. … Note that I/O size does not affect the IOPS values reported by the metrics, which are based solely on the number of I/Os over time. This means that it is possible to consume all of the IOPS provisioned with fewer I/Os than specified if the I/O sizes are larger than 32 KB. For example, a system provisioned for 5,000 IOPS can attain a maximum of 2,500 IOPS with 64 KB I/O or 1,250 IOPS with 128 KB IO.
    … and …
    I/O requests larger than 32 KB are treated as more than one I/O for the purposes of PIOPS capacity consumption. A 40 KB I/O request will consume 1.25 I/Os, a 48 KB request will consume 1.5 I/Os, a 64 KB request will consume 2 I/Os, and so on. The I/O request is not split into separate I/Os; all I/O requests are presented to the storage device unchanged. For example, if the database submits a 128 KB I/O request, it goes to the storage device as a single 128 KB I/O request, but it will consume the same amount of PIOPS capacity as four 32 KB I/O requests.

    Based on the statements above, it looked like a large 1 MB IO issued by the DB would be accounted as 32 separate IO operations, which would obviously exhaust the allocated IOPS much sooner than expected. The documentation talks only about Provisioned IOPS, but I think this applies to General Purpose SSDs (gp2 storage) too, for which the IOPS baseline is 3 IOPS/GB (i.e. 300 IOPS if the allocated size is 100 GB of gp2).
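    My reading of the quoted accounting rules can be sketched as a one-line function: anything up to 32 KB counts as one IO, and consumption grows linearly above that. (This is the documented model, not an official AWS formula, and the testing below ends up contradicting it in places.)

    ```python
    def piops_consumed(io_size_kb):
        """PIOPS capacity consumed by one IO request, per the quoted AWS rules."""
        return max(1.0, io_size_kb / 32.0)

    for size in (16, 32, 48, 64, 1024):
        print(size, "KB ->", piops_consumed(size))
    # a 1 MB multiblock read would consume 32 IOs' worth of PIOPS capacity
    ```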

    I decided to do some testing to find out how RDS for Oracle handles large IOs.

    The Testing

    For testing purposes I used the following code to create a 1 GB table (thanks Connor McDonald and AskTom):

    ORCL> create table t(a number, b varchar2(100)) pctfree 99 pctused 1;
    Table T created.
    ORCL> insert into t  values (1,lpad('x',100));
    1 row inserted.
    ORCL> commit;
    Commit complete.
    ORCL> alter table t minimize records_per_block;
    Table T altered.
    ORCL> insert into t select rownum+1,lpad('x',100) from dual connect by level<131072;
    131,071 rows inserted.
    ORCL> commit;
    Commit complete.
    ORCL> exec dbms_stats.gather_table_stats(user,'T');
    PL/SQL procedure successfully completed.
    ORCL> select sum(bytes)/1024/1024 sizemb from user_segments where segment_name='T';
    SIZEMB
    1088
    ORCL> select value/1024/1024 buffer_cache from v$sga where name='Database Buffers';
    BUFFER_CACHE
    1184
    

    The code for testing will be:

    exec rdsadmin.rdsadmin_util.flush_buffer_cache;
    alter session set "_serial_direct_read"=always;
    alter session set db_file_multiblock_read_count=&1;
    -- Run FTS against T table forever.
    declare
      n number:=1;
    begin
      while n>0
      loop
        select /*+ full(t) */ count(*) into n from t;
      end loop;
    end;
    /
    

    Basically, I’ll flush the buffer cache, force direct path reads by setting _serial_direct_read to “ALWAYS”, and then choose the db_file_multiblock_read_count based on how big the IOs are that I want to issue (note: by default db_file_multiblock_read_count is not set on RDS, and it resolves to 128, so the maximum size of an IO from the DB is 1 MB). I’ll test with different sizes of IOs, and will capture the throughput and effective IOPS by using the “Enhanced Monitoring” of the RDS instance.

    Side note: the testing turned out to be more complex than I had expected before I started. In a few cases, I was limited by the instance throughput before I could reach the maximum allocated IOPS, and due to this, the main testing needed to be done on a large enough instance (db.m4.4xlarge) that had more dedicated EBS throughput.

    The Results

    Provisioned IOPS storage

    Testing was done on a db.m4.4xlarge instance that was allocated 100GB of io1 storage of 1000 Provisioned IOPS. The EBS-optimized throughput for such instance is 256000 KB/s.
    The tests were completed by using db_file_multiblock_read_count of 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 16, 32, 64 and 128.
    For each test the throughput and IO/s were captured (from RDS CloudWatch graphs), and the effective IO size was derived.
    The DB instance was otherwise idle, but still, there could be a few small IOs happening during the test.
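    For reference, the “effective IO size” mentioned above is just the measured throughput divided by the measured IO/s – a trivial derivation sketch, nothing AWS-specific:

    ```python
    def effective_io_size_kb(throughput_kb_s, io_per_s):
        """Average size of one IO, derived from measured throughput and IOPS."""
        return throughput_kb_s / io_per_s

    # e.g. 32000 KB/s at 1000 IO/s suggests ~32 KB physical IOs
    print(effective_io_size_kb(32000, 1000))
    ```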

    Provisioned IOPS Measured Throughput

    Provisioned IOPS Measured IO/s

    From the graphs above, the following undocumented behaviors can be observed:

    • The RDS instance is dynamically choosing the physical IO size (I’ll call them “physical” just to differentiate that these are the IOs to storage; in fact, that’s only what I see in the CloudWatch graphs – the real physical IO could be something different) based on the size of the IO request from the database. The possible physical IO sizes appear to be 16K, 32K, 64K, 128K and probably also 8K (this could in fact also be a 16K physical IO reading just 8K of data).
    • The IOPS limit applies only to smaller physical IO sizes (up to 32K); for larger physical IOs (64K, 128K) the throughput is the limiting factor of the IO capability. The throughput limit appears to be quite close to the maximum throughput the instance is capable of delivering, but at this point it’s not clear how the throughput limit for a particular situation is calculated.

    Throughput Limits for Provisioned IOPS

    I ran additional tests on differently sized instances with io1 storage to understand better how the maximum throughput was determined. The graph below represents the throughput achieved on different instances, all with the same 100G of 1000 PIOPS io1 storage. The throughput was measured using db_file_multiblock_read_count=128:

    PIOPS Throughput by Instance Type

    It appears that the maximum throughput is indeed limited by the instance type, except for the very largest instance, db.m4.10xlarge (for this instance the situation is somewhat weird even in the documentation, because the maximum throughput is mentioned as 500 MB/s, but the maximum throughput of a single io1 EBS volume, which should be there underneath the RDS, is just 320 MB/s – and I was unable to reach either of these limits).

    General Purpose SSD storage

    Testing was done on a db.m4.4xlarge instance that was allocated 100GB of gp2 storage with 300 IOPS baseline. The EBS-optimized throughput for such instance is 256000 KB/s.
    The tests were conducted similarly to how they were done for Provisioned IOPS above (note: this is the baseline performance, not burst performance).

    General Purpose SSD Measured Throughput

    General Purpose SSD Measured IO/s

    Similarly to Provisioned IOPS, the General Purpose SSD storage behaves differently from what’s explained in the documentation:

    • The physical IO size is again calculated dynamically based on the size of the IO request from the database. The possible sizes appear to be the same as for io1: (8K), 16K, 32K, 64K and 128K.
    • The IOPS limit (at the baseline level) appears to apply to IO sizes only up to 16K (compared to 32K in the case of Provisioned IOPS); for larger physical IOs, starting from 32K, the limit appears to be throughput-driven.
    • It’s not clear how the throughput limit is determined for a particular instance/storage combination, but in this case it appeared to be around 30% of the maximum throughput of the instance; however, I couldn’t confirm the same ratio for db.m4.large, where the maximum achievable throughput depended on the allocated size of the gp2 storage.

    Burst Performance

    I haven’t collected enough data to derive anything concrete, but during my testing I observed that burst performance applied to both the maximum IOPS and the maximum throughput. For example, while testing on db.m4.large (max instance throughput of 57600 KB/s) with 30G of 90 IOPS baseline performance, I saw that for small physical IOs it allowed bursting up to 3059 IOPS for short periods of time, while normally it indeed allowed only 300 IOPS. For larger IOs (32K+), the baseline maximum throughput was around 24500 KB/s, but the burst throughput was 55000 KB/s.

    Throughput Limits for General Purpose SSD storage

    I don’t really know how the maximum allowed throughput is calculated for different instance types and storage configurations on gp2, but one thing is clear: both the instance size and the size of the allocated gp2 storage are considered in determining the maximum throughput. I was able to achieve the following throughput measures when gp2 storage was used:

    • 75144 KB/s (133776 KB/s burst) on db.m4.4xlarge (100G gp2)
    • 54500 KB/s (same as burst, this is close to the instance limit) on db.m4.large (100G gp2)
    • 24537 KB/s (54872 KB/s burst) on db.m4.large (30G gp2)
    • 29116 KB/s (burst was not measured) on db.m4.large (40G gp2)
    • 37291 KB/s (burst was not measured) on db.m4.large (50G gp2)

    Conclusions

    The testing provided some insight into how the maximum IO performance is determined on Amazon RDS for Oracle, based on the instance type, storage type, and volume size. Despite finding some clues, I also understood that managing IO performance on RDS is far more difficult than expected for the mixed-size IO requests that are typically issued by Oracle databases. There are many questions that still need to be answered (e.g. how the maximum throughput is calculated for gp2 storage instances), and it’d take many, many hours to find all the answers.

    On the other hand, the testing already revealed a few valuable findings:

    1. Opposite to the documentation, which states that all IOs are measured and accounted in 32 KB units, we found that the IO units reported by Amazon can be of sizes 8K (probably), 16K, 32K, 64K and 128K.
    2. For small physical IOs (up to 32K in case of Provisioned IOPS and up to 16K in case of General Purpose SSD) the allocated IOPS is used as the limit for the max performance.
    3. For larger physical IOs (from 64K in case of Provisioned IOPS and from 32K in case of General Purpose SSD) the throughput is used as the limit for the max performance, and the IOPS limit no longer applies.
    4. Burst performance applies to both IOPS and throughput.

    P.S. As to my original issue of a single TABLE ACCESS FULL severely impacting overall performance: I found that in many cases we were using small RDS instances (db.m3.large or db.m4.large), for which the maximum throughput was ridiculously small, and we were hitting the throughput limitation – not the IOPS limit, which actually didn’t apply to the larger physical IOs on gp2 storage.

    Investigating IO performance on Amazon EC2


    I published a blog post called “Investigating IO Performance on Amazon RDS for Oracle” recently, and soon after posting it I received several questions asking if IO worked the same way on EC2 instances. My immediate thought was that it did, mostly because RDS for Oracle is basically an EC2 instance with Oracle Database on top of it, where the configuration is fully managed by Amazon. But as always, it’s better to test than to assume, so here we go!

    Although the testing was done by running a workload in an Oracle database, the results apply to any other type of workload, because the performance characteristics depend purely on the instance type, the EBS volume type, and the size of the IO requests – it doesn’t matter how the data is processed after it’s retrieved from the storage.

    The Testing

    The testing was done exactly the same way as described in the previous blog post; the only difference was that I had to create the Oracle database manually myself. I used a 12.1.0.2.0 Enterprise Edition database with ASM, and the EBS volume was used as an ASM disk.

    Measuring the IO performance

    On RDS we had the nice Enhanced Monitoring, which I set up with a refresh interval of a few seconds and used to collect performance statistics quickly. For EC2 (specifically for EBS volumes) there is no such thing as Enhanced Monitoring, so I would have had to use the standard CloudWatch monitoring with its minimum refresh rate of 5 minutes (very inconvenient, because a single test case would have to be run for 5 to 10 minutes to collect reliable data). This was not acceptable, so I looked for alternatives and found that iostat displayed the same values the monitoring graphs did:

    [root@ip-172-31-21-241 ~]# iostat 5 500
    ...
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.32    0.00    0.11    0.11    0.43   99.04
    
    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    xvda              0.00         0.00         0.00          0          0
    xvdf              0.40         0.00         8.00          0         40
    xvdg           3060.00     24496.00        14.40     122480         72
    ...

    The “tps” column showed IOs per second, and “kB_read/s” + “kB_wrtn/s” allowed me to calculate the throughput (I actually ended up using just kB_read/s, as my workload was 100% read-only and the values in kB_wrtn/s were tiny).
    iostat is even more convenient to use than Enhanced Monitoring – it didn’t take long to see the first benefit of EC2 over RDS!
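    Since I leaned on iostat for the measurements, here’s a minimal parser sketch for the device lines shown above (the column layout is assumed from this sample output; a real script should skip headers and handle multiple samples):

    ```python
    def parse_iostat_line(line):
        """Parse one iostat device line into (device, tps, total KB/s)."""
        fields = line.split()
        device, tps = fields[0], float(fields[1])
        throughput = float(fields[2]) + float(fields[3])  # kB_read/s + kB_wrtn/s
        return device, tps, throughput

    dev, tps, kbps = parse_iostat_line("xvdg 3060.00 24496.00 14.40 122480 72")
    print(dev, tps, kbps)  # throughput here is dominated by reads
    ```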

    The Results

    It was no surprise that the outcome of testing on EC2 was quite similar to the results from RDS.

    Provisioned IOPS Storage on an EC2 Instance

    As on RDS, the testing was done on an m4.4xlarge instance with 100G of io1 storage with 1000 provisioned IOPS. The results, too, are very similar; the only notable difference I could observe (although I can’t explain it, and I’m not sure whether there is a pattern in it, as I didn’t do too many tests) was the fact that the throughput for 64K-96K IOs didn’t reach the same level as for 128K+ IOs.

    Provisioned IOPS Throughput (on EC2)

    Provisioned IOPS Measured IO/s (on EC2)

    These results confirm that, the same as with RDS, there are several sizes of physical IOs: (8), 16, 32, 64 and 128K. Starting with 64K the performance is throughput-bound, while with IOs of smaller size it’s IOPS-bound.

    General Purpose Storage on an EC2 Instance

    The testing with General Purpose SSDs (100G with 300 baseline IOPS) brought no surprises, and the results were exactly the same as for RDS.
    The only difference in the graphs is the “burst performance” measures for IOs of different sizes, which I’ve added to outline how bursting improves both IO/s and throughput.

    General Purpose SSD Throughput (on EC2)

    General Purpose SSD Measured IO/s (on EC2)

    These results also confirm that, the same as with RDS, there are several sizes of physical IOs: 16, 32, 64 and 128K. Starting with 32K the performance is throughput-bound, while with IOs of smaller size it’s IOPS-bound.

    Additional Flexibility with EC2

    Using Multiple gp2 Volumes

    Unlike RDS, EC2 lets me configure the storage and the instance more freely, so instead of having just a single gp2 volume attached, I added five 1G-sized (yes, tiny) volumes to the +DATA disk group. The minimum IOPS for a gp2 volume is 100, so my 5 volumes gave a cumulative 500 baseline IOPS. As ASM was used, the IOs were more or less evenly distributed between the volumes.
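    For reference, growing the disk group with the extra volumes is a one-statement change in ASM. This is only a sketch: the device paths and the DATA disk group name come from my setup, and it assumes the devices are already visible to ASM with the right permissions.

    ```shell
    # Add the five newly attached 1G EBS volumes (xvdg, xvdi..xvdl here) to
    # the +DATA disk group; ASM then rebalances extents across all disks.
    sqlplus -s / as sysasm <<'EOF'
    ALTER DISKGROUP DATA ADD DISK
      '/dev/xvdg', '/dev/xvdi', '/dev/xvdj', '/dev/xvdk', '/dev/xvdl'
      REBALANCE POWER 4;
    EOF
    ```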

    I didn’t do very thorough testing, but I still noticed a few things.
    Take a look at this iostat output from testing done with 8K reads (this is burst performance):

    [root@ip-172-31-21-241 ~]# iostat 5 500
    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    xvda              0.00         0.00         0.00          0          0
    xvdf              3.40         0.00        16.80          0         84
    xvdg           1203.40      9632.00         1.60      48160          8
    xvdi           1199.60      9596.80         0.00      47984          0
    xvdj           1211.60      9691.20         3.20      48456         16
    xvdk           1208.60      9670.40         0.00      48352          0
    xvdl           1203.00      9625.60         3.20      48128         16


    • Bursting performance applies to each volume separately. It should allow up to 3000 IOPS per volume, but I reached only ~1200 per volume, with a cumulative throughput of 48214 KB/s (not even close to the limit). So some other limit or threshold applies to this configuration (and it’s not the CPU). But look! I got 6024 IO/s of burst performance, which is quite remarkable for just 5G.
    • As I was not hitting the maximum 3000 bursting IOPS per volume, the burst credit was running out much more slowly. If it normally lasts ~40 minutes at 3000 IOPS, it lasts ~3 times longer at ~1200 IOPS, which would allow running at better performance for longer (i.e. if one used 5x2G volumes instead of a single 10G volume).
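    The burst-credit arithmetic roughly backs this up. AWS documents a 5.4-million-I/O credit bucket per gp2 volume, refilled at the baseline rate (3 IOPS per GiB with a floor of 100), so for a 1G volume the burst window is approximately bucket / (burst rate − baseline):

    ```shell
    # Approximate gp2 burst duration for a 1 GiB volume (baseline 100 IOPS,
    # 5.4M credit bucket) at the full 3000 IOPS burst vs the ~1200 IO/s I saw.
    awk 'BEGIN {
      bucket = 5400000; baseline = 100
      split("3000 1200", rates, " ")
      for (i = 1; i <= 2; i++) {
        mins = bucket / (rates[i] - baseline) / 60
        printf "burst at %d IO/s lasts ~%.0f minutes\n", rates[i], mins
      }
    }'
    # -> burst at 3000 IO/s lasts ~31 minutes
    # -> burst at 1200 IO/s lasts ~82 minutes
    ```

    The documented model gives a slightly shorter window than the ~40 minutes quoted above, and it ignores idle-time refills, so treat these as ballpark figures; the ~2.5-3x stretch at the lower rate is the point.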

    This iostat output is from testing done with 1M reads (this is burst performance):

    [root@ip-172-31-21-241 ~]# iostat 5 500
    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    xvda              0.00         0.00         0.00          0          0
    xvdf              3.40         0.00        16.80          0         84
    xvdg            384.40     48820.80         0.80     244104          4
    xvdi            385.80     49155.20         0.00     245776          0
    xvdj            385.00     49014.40         6.40     245072         32
    xvdk            386.80     49225.60         0.00     246128          0
    xvdl            385.00     48897.60         6.40     244488         32


    • The cumulative throughput is 245111 KB/s, which is very close to the throughput limit of the instance. I wasn’t able to reach such throughput on a single gp2 volume, where the maximum I observed was just 133824 KB/s; even the 163840 KB/s throughput limit of a single gp2 volume was exceeded here. It appears that configuring multiple volumes allows reaching the instance throughput limit, which was not possible with a single volume.
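    A quick unit conversion suggests where that ceiling comes from: m4.4xlarge is documented with 2,000 Mbps of dedicated EBS bandwidth, which in the kB/s units iostat reports works out to:

    ```shell
    # 2,000 megabits/s of EBS bandwidth expressed in kB/s (1 kB = 1024 bytes),
    # the unit iostat reports -- right about the ~245000 kB/s measured above.
    awk 'BEGIN { printf "%.0f kB/s\n", 2000 * 1000 * 1000 / 8 / 1024 }'
    # -> 244141 kB/s
    ```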

    I didn’t run any non-burst tests as it required too much time (2 hours of waiting to exhaust the burst credits).

    Database with a 32K Block Size

    We have observed that starting with 32K block reads the EBS volume becomes throughput-bound, not IOPS-bound. Obviously, I wanted to see how it performed if the database was created with a 32K block size.
    I ran a few very simple tests using IOs of one data block (32K) on these two configurations:

    1. m4.4xlarge with 100G / 1000 PIOPS (io1)
    2. m4.4xlarge with 20G / 100 IOPS (gp2)
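    Note that the block size has to be chosen before the database is created; a minimal sketch of the relevant parameter (the init file path and SID here are placeholders):

    ```shell
    # db_block_size can only be set before CREATE DATABASE -- it cannot be
    # changed afterwards, which is why RDS (fixed at 8K) can't do this at all.
    cat >> $ORACLE_HOME/dbs/initORCL.ora <<'EOF'
    db_block_size=32768
    EOF
    ```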

    There were no surprises on the Provisioned IOPS storage: I got the 1000 IOPS that were provisioned (actually slightly better – 1020 IO/s), and the throughput was 32576.00 KB/s.
    On General Purpose SSD the story was different – we know that starting from 32K-sized IOs the performance becomes throughput-bound, and that was confirmed here too:

    • During the burst period I measured up to 4180 IO/s at 133779 KB/s, which was 4 times faster than Provisioned SSD.
    • During the non-burst period I measured up to 764 IO/s at 24748 KB/s throughput, which is somewhat slower than Provisioned SSD. 24748 KB/s was also slower than the throughput I measured on a 100G gp2 volume (we already know that the non-burst throughput limit for gp2 depends on the size of the disk). If I used a 100G gp2 volume, I’d get 2359 IO/s at 75433 KB/s (this is from the graph above), which is also better than what one can get from a Provisioned SSD volume, and costs less.
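    The gp2 sizes used above map to baseline IOPS through AWS’s documented formula (3 IOPS per GiB with a floor of 100), which is easy to check:

    ```shell
    # Baseline gp2 IOPS: 3 IOPS per GiB, but never below 100.
    for size in 1 20 100; do
      awk -v g="$size" 'BEGIN {
        iops = g * 3; if (iops < 100) iops = 100
        printf "%d GiB gp2 -> %d baseline IOPS\n", g, iops }'
    done
    # -> 1 GiB gives 100, 20 GiB gives 100, 100 GiB gives 300
    ```

    This matches the volumes in these tests: the 20G volume sits on the 100-IOPS floor, and the 100G volume gets 300 baseline IOPS.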

    Conclusions

    Most of the conclusions were already outlined in the previous blog post, and they also apply to the EC2 instances when a single EBS volume is used for storage.

    On the other hand, an EC2 instance allows system administrators and DBAs (or should I say “Cloud Administrators”) to work around some of the limitations by changing the “variables” that can’t be altered on RDS – like the block size of the database (which is 8K on RDS) and the number of EBS volumes behind the configuration. Using a 32K block size for a database residing on a General Purpose volume allows bypassing the IOPS limitation completely, leaving only the throughput limits in effect. However, if a 32K block size is not an option (as for Oracle e-Business Suite), then IOPS and throughput can still be maximized by using a configuration of multiple gp2 volumes.

    Having done all these tests, I think the only reason for using RDS instead of EC2 is the database management that Amazon provides. If that is critical for your requirements, it’s the way to go. If it’s not something you require, EC2 can be configured to perform better for the same price, but you need to handle its maintenance yourself.
