sky 的个人资料DBA日记照片日志列表更多 工具 帮助
2009/6/18

RAC节点一挂起,日志报pmon failed to accuire latch

 

hpux+10.2.0.4RAC

数据库挂起

alert日志如下:

Thu Jun 18 03:28:17 2009
Thread 2 advanced to log sequence 3758 (LGWR switch)
  Current log# 5 seq# 3758 mem# 0: /dev/vg6_dss_data03/rredo212
  Current log# 5 seq# 3758 mem# 1: /dev/vg6_dss_data04/rredo222
Thu Jun 18 09:19:25 2009
PMON failed to acquire latch, see PMON dump
Thu Jun 18 09:20:27 2009
PMON failed to acquire latch, see PMON dump
Thu Jun 18 09:21:30 2009
PMON failed to acquire latch, see PMON dump
Thu Jun 18 09:22:59 2009
PMON failed to acquire latch, see PMON dump
Thu Jun 18 09:24:01 2009
PMON failed to acquire latch, see PMON dump
Thu Jun 18 09:25:03 2009
PMON failed to acquire latch, see PMON dump
Thu Jun 18 09:26:06 2009
PMON failed to acquire latch, see PMON dump
Thu Jun 18 09:27:09 2009
PMON failed to acquire latch, see PMON dump
Thu Jun 18 09:28:12 2009
PMON failed to acquire latch, see PMON dump
Thu Jun 18 09:36:39 2009
Errors in file /oracle/app/oracle/admin/jldw/bdump/jldw2_q000_5811.trc:
ORA-00018: 超出最大会话数
Thu Jun 18 09:37:01 2009
Errors in file /oracle/app/oracle/admin/jldw/bdump/jldw2_q000_7101.trc:
ORA-00018: 超出最大会话数
………………………………………………………
………………………………………………………
PMON failed to acquire latch, see PMON dump
Thu Jun 18 10:26:44 2009
PMON failed to acquire latch, see PMON dump
Thu Jun 18 10:26:49 2009
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=131
System State dumped to trace file /oracle/app/oracle/admin/jldw/bdump/jldw2_q002_4466.trc
Thu Jun 18 10:28:29 2009
PMON failed to acquire latch, see PMON dump
Thu Jun 18 10:30:04 2009
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=885
System State dumped to trace file /oracle/app/oracle/admin/jldw/udump/jldw2_ora_10341.trc
Thu Jun 18 10:30:10 2009
PMON failed to acquire latch, see PMON dump

 

/oracle/app/oracle/admin/jldw/bdump/jldw2_q000_5811.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/10.2.0/db10g
System name:    HP-UX
Node name:      jldss2
Release:        B.11.31
Version:        U
Machine:        ia64
Instance name: jldw2
Redo thread mounted by this instance: 2
Oracle process number: 20
Unix process pid: 5811, image: oracle@jldss2 (q000)

*** 2009-06-18 09:36:39.162
*** SERVICE NAME:(SYS$BACKGROUND) 2009-06-18 09:36:39.161
*** SESSION ID:(1628.3) 2009-06-18 09:36:39.161
KSV 18 error in slave process
*** 2009-06-18 09:36:39.162
ORA-00018: 超出最大会话数
OPIRIP: Uncaught error 447. Error stack:
ORA-00447: fatal error in background process
ORA-00018: maximum number of sessions exceeded
~

/oracle/app/oracle/admin/jldw/udump/jldw2_ora_10341.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/10.2.0/db10g
System name:    HP-UX
Node name:      jldss2
Release:        B.11.31
Version:        U
Machine:        ia64
Instance name: jldw2
Redo thread mounted by this instance: 2
Oracle process number: 885
Unix process pid: 10341, image: oracle@jldss2

*** 2009-06-18 10:30:04.901
*** SERVICE NAME:(SYS$USERS) 2009-06-18 10:30:04.900
*** SESSION ID:(1628.199) 2009-06-18 10:30:04.900
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! <<<
row cache enqueue: session: c00000023ee7f110, mode: N, request: X
===================================================
SYSTEM STATE
------------
System global information:
     processes: base c00000023e864db0, size 1500, cleanup c00000023e8c77f0
     allocation: free sessions c00000023ba51890, free calls 0000000000000000

………………………………………………………
………………………………………………………

AST_Q:
PENDING_Q:
GRANTED_Q:
----------------------------------------
SO: c00000023ef54ec0, type: 3, owner: c00000023b8d6b70, flag: INIT/-/-/0x00
(call) sess: cur c00000023ee7dba8, rec c00000023be95418, usr c00000023ee7dba8; depth: 0
   ----------------------------------------
   SO: c00000023ef58638, type: 3, owner: c00000023ef54ec0, flag: INIT/-/-/0x00
   (call) sess: cur c00000023be95418, rec c00000023be95418, usr c00000023ee7dba8; depth: 1
   ----------------------------------------
   SO: c00000023be95418, type: 4, owner: c00000023ef54ec0, flag: INIT/-/-/0x00
   (session) sid: 1633 trans: 0000000000000000, creator: 0000000000000000, flag: (2) -/REC -/-/-/-/-/-
             DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000
             txn branch: 0000000000000000
             oct: 0, prv: 0, sql: c00000023d7e47a0, psql: c00000023d7e47a0, user: 0/SYS
   temporary object counter: 0
   ----------------------------------------
   SO: c0000001585b1ec8, type: 50, owner: c00000023ef54ec0, flag: INIT/-/-/0x00
   row cache enqueue: count=1 session=c00000023ee7dba8 object=c00000023cd495a8, mode=S
   savepoint=0x5
   row cache parent object: address=c00000023cd495a8 cid=7(dc_users)
   hash=b0e2d11f typ=11 transaction=0000000000000000 flags=00000002
   own=c00000023cd49678[c000000158800c30,c00000022e71a820] wat=c00000023cd49688[c00000023cd49688,c00000023cd49688] mode=S
   status=VALID/-/-/-/-/-/-/-/-
   request=N release=FALSE flags=0
   instance lock id=QH a2279543 5fd5248a
   set=0, complete=FALSE
   set=1, complete=FALSE
   set=2, complete=FALSE
   data=
   00000042 00044457 5f560000 00000000 00000000 00000000 00000000 00000000
   00000000 00103732 32443042 33323241 41453231 35330000 00000000 00000000
   00000000 00164445 4641554c 545f434f 4e53554d 45525f47 524f5550 00000000
   00000000 0000000a 00000003 00000000 00000101 786d0401 0d181e00 00000000
   00000000 00000000 00000000 00000000 00000000 00000000
----------------------------------------
SO: c00000022e5c2820, type: 16, owner: c00000023b8d6b70, flag: INIT/-/-/0x00
(osp req holder)

虽然bug的版本差点,不过看现象差不多。重启节点2后正常。

Subject:
Potential reasons for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! "

Doc ID:
278316.1
Type:
BULLETIN

Modified Date :
31-JAN-2008
Status:
PUBLISHED

*************************************************************
This article is being delivered in Draft form and may contain
errors.  Please use the MetaLink "Feedback" button to advise
Oracle of any issues related to this article.
*************************************************************

PURPOSE

Suggest potential reasons behind "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! "

SCOPE & APPLICATION

DBAs and Support Engineers

Potential reasons for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! "

When Row cache contention occurs, if the enqueue cannot be gotten within a certain time period,
a trace file will be generated in the <Parameter:user_dump_dest> location with some trace details.
The trace file tends to contain the words:
  >> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! <<
in it at the start.
This trace can provide some useful information for diagnosing the cause of the contention.
It is possible to check the row cache enqueue trace to determine which enqueue has the contention: For example:
...
>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! <<
row cache enqueue: session: 70000001b542d48, mode: N, request: S
  row cache parent object: address=700000036f27628 cid=0(dc_tablespaces)
  hash=a6840aa5 typ=9 transaction=0 flags=00008000
...

The trace will often contain a systemstate dump, although most useful information is in the header section. Typically a session holding the row cache resource will either be on cpu or blocked by another session. If it is on cpu then errorstacks are likely to be required to diagnose, unless tuning can be done to reduce the enqueue hold time. Remember that on a multi node system (RAC) the holder may be on another node and so multiple systemstates from each node will be required.

For each enqueue type, there are a limited number of operations that require each enqueue.
DC_TABLESPACES
Probably the most likely is allocation of new extents. If extent sizes are set low then the application may constantly be requesting new extents and causing contention. Do you have objects with small extent sizes that are rapidly growing? (You may be able to spot these by looking for objects with large numbers of extents). Check the trace for insert/update activity, check the objects inserted into for number of extents.
DC_SEQUENCES
Check for appropriate caching of sequences for the application requirements.
DC_USERS
Deadlock and resulting "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" can occur if a session issues a GRANT to a user, and that user is in the process of logging on to the database.
This is Bug 2615271, fixed in 9.2.0.6
DC_OBJECTS
Look for any object compilation activity which might require an exclusive lock and thus block online activity.
DC_SEGMENTS
This is likely to be down to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose.

RELATED DOCUMENTS