2009/3/25
RAC节点报GES: Potential blocker (pid=373) on resource TX-0x370008-0x147
环境:10.2.0.1 RAC
RHEL 5.2
用户发现系统性能下降厉害,检查日志发现:
节点2报错:
| Tue Mar 24 17:16:44 2009 GES: Potential blocker (pid=373) on resource TX-0x370008-0x147; enqueue info in file /u01/app/oracle/admin/jldb/udump/jldb2_ora_24975.trc and D IAG trace file Tue Mar 24 17:20:41 2009 GES: Potential blocker (pid=7664) on resource TX-0x490011-0x2a; enqueue info in file /u01/app/oracle/admin/jldb/bdump/jldb2_lmd0_29076.trc and DIAG trace file |
trc文件中内容如下
| *** 2009-03-23 21:24:58.021 user session for deadlock lock 0x1807f86f0 pid=217 serial=431 audsid=89970 user: 64/JLDBCS O/S info: user: Administrator, term: unknown, ospid: , machine: APP3 program: JDBC Thin Client application name: JDBC Thin Client, hash value=0 Current SQL Statement: update multisalvationobject mso set mso_order = :"SYS_B_0", mso_sign = :"SYS_B_1", mso_time = sysdate, mso_emp = :"SYS_B_2", mso_result = :"SYS_B_3", mso.mso_money =(select o.tempmoney from opttemp o where o.sign = :"SYS_B_4" and mso.mso_ot_id=o.tempid) where exists (select tempid from opttemp o where o.sign = :"SYS_B_5" and so.mso_ot_id=tempid ) and mso.sb_id = :"SYS_B_6" and mso.st_id= :"SYS_B_7" and mso.mso_objecttype = :"SYS_B_8" and mso.mso_flag = :"SYS_B_9" Global blockers dump start:--------------------------------- DUMP LOCAL BLOCKER/HOLDER: block level 5 res [0x620020][0x15],[TX] ----------resource 0x0x16cb8cdc8---------------------- resname : [0x620020][0x15],[TX] Local node : 1 dir_node : 1 master_node : 1 hv idx : 9 hv last r.inc : 6 current inc : 12 hv status : 0 hv master : 0 open options : dd grant_bits : KJUSERNL KJUSEREX grant mode : KJUSERNL KJUSERCR KJUSERCW KJUSERPR KJUSERPW KJUSEREX count : 1 0 0 0 0 1 val_state : KJUSERVS_NOVALUE valblk : 0x00000000000000000000000000000000 . access_node : 1 vbreq_state : 0 state : x0 resp : 0x16cb8cdc8 On Scan_q? : N Total accesses: 77 Imm. accesses: 58 Granted_locks : 1 Cvting_locks : 1 value_block: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 GRANTED_Q : lp 0x1807f85a0 gl KJUSEREX rp 0x16cb8cdc8 [0x620020][0x15],[TX] master 1 gl owner 0x18ada82e0 possible pid 12343 xid DB000-0002-00000008 bast 0 rseq 1 mseq 0 history 0x14951495 open opt KJUSERDEADLOCK CONVERT_Q: lp 0x1807f86f0 gl KJUSERNL rl KJUSEREX rp 0x16cb8cdc8 [0x620020][0x15],[TX] master 1 gl owner 0x183db6e80 possible pid 11575 xid D9000-0002-0000002F bast 0 rseq 1 mseq 0 history 0x1495149a convert opt KJUSERGETVALUE ----------enqueue 0x0x1807f85a0------------------------ lock version : 309 Owner node : 1 grant_level : KJUSEREX req_level : KJUSEREX bast_level : KJUSERNL notify_func : (nil) resp : 0x16cb8cdc8 procp : 0x1801090e8 ................. kjdrchkdrm: found an RM request in the request queue |
同时,另外一个节点报:
| Tue Mar 24 17:26:41 2009 Global Enqueue Services Deadlock detected. More info in file /u01/app/oracle/admin/jldb/bdump/jldb1_lmd0_11622.trc. Tue Mar 24 17:28:04 2009 Global Enqueue Services Deadlock detected. More info in file |
trace文件内容:
| *** 2009-03-23 11:23:17.051 user session for deadlock lock 0x17f6c90a8 pid=362 serial=3 audsid=86656 user: 64/JLDBCS O/S info: user: Administrator, term: unknown, ospid: , machine: APP3 program: JDBC Thin Client application name: JDBC Thin Client, hash value=0 Current SQL Statement: delete multisalvationobject ss where ss.mso_order= :"SYS_B_0" and exists (sel ect :"SYS_B_1" from family f where f.f_familyid = ss.mso_ot_id and f.on_no like :"SYS_B_2" ) ENQUEUE DUMP REQUEST: from 1.22126 on [0x2f0029][0x10a],[TX] for reason 3 mtype 0 DUMP LOCAL BLOCKER/HOLDER: block level 5 res [0x2f0029][0x10a],[TX] ----------resource 0x0x16dab1890---------------------- resname : [0x2f0029][0x10a],[TX] Local node : 0 dir_node : 1 master_node : 1 hv idx : 7 hv last r.inc : 6 current inc : 8 hv status : 0 hv master : 0 open options : dd Held mode : KJUSERNL Cvt mode : KJUSEREX Next Cvt mode : KJUSERNL msg_seq : 0x1 res_seq : 3 grant_bits : KJUSERNL grant mode : KJUSERNL KJUSERCR KJUSERCW KJUSERPR KJUSERPW KJUSEREX count : 1 0 0 0 0 0 val_state : KJUSERVS_NOVALUE valblk : 0x00000000000000000000000000000000 . access_node : 1 vbreq_state : 0 state : x8 resp : 0x16dab1890 On Scan_q? : N Total accesses: 73 Imm. accesses: 62 Granted_locks : 0 Cvting_locks : 1 value_block: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 GRANTED_Q : CONVERT_Q: lp 0x17f6c90a8 gl KJUSERNL rl KJUSEREX rp 0x16dab1890 [0x2f0029][0x10a],[TX] master 1 gl owner 0x18ad7ab00 possible pid 32590 xid 16A000-0001-00000002 bast 0 rseq 3 mseq 0 history 0x1495149a convert opt KJUSERGETVALUE ----------enqueue 0x0x17f6c90a8------------------------ lock version : 3709 Owner node : 0 grant_level : KJUSERNL req_level : KJUSEREX bast_level : KJUSERNL notify_func : (nil) group lock owner : 0x18ad7ab00 possible pid : 32590 xid : 16A000-0001-00000002 dd_time : 74.0 secs dd_count : 3 timeout : 0.0 secs On_timer_q? : N On_dd_q? : Y lock_state : OPENING CONVERTING Open Options : KJUSERDEADLOCK Convert options : KJUSERGETVALUE History : 0x1495149a Msg_Seq : 0x0 res_seq : 3 valblk : 0x00000000000000000000000000000000 . DUMP LOCAL BLOCKER: initiate state dump for possible owner[362.32590] Submitting asynchronized dump request [28] Global blockers dump start:--------------------------------- DUMP LOCAL BLOCKER/HOLDER: block level 5 res [0x2f0029][0x10a],[TX] ----------resource 0x0x16dab1890---------------------- resname : [0x2f0029][0x10a],[TX] Local node : 0 dir_node : 1 master_node : 1 hv idx : 7 hv last r.inc : 6 current inc : 8 hv status : 0 hv master : 0 open options : dd Held mode : KJUSERNL Cvt mode : KJUSEREX Next Cvt mode : KJUSERNL msg_seq : 0x1 res_seq : 3 grant_bits : KJUSERNL grant mode : KJUSERNL KJUSERCR KJUSERCW KJUSERPR KJUSERPW KJUSEREX count : 1 0 0 0 0 0 val_state : KJUSERVS_NOVALUE valblk : 0x00000000000000000000000000000000 . access_node : 1 vbreq_state : 0 state : x8 resp : 0x16dab1890 On Scan_q? : N Total accesses: 75 Imm. accesses: 63 Granted_locks : 0 Cvting_locks : 1 value_block: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 GRANTED_Q : CONVERT_Q: lp 0x17f6c90a8 gl KJUSERNL rl KJUSEREX rp 0x16dab1890 [0x2f0029][0x10a],[TX] master 1 gl owner 0x18ad7ab00 possible pid 32590 xid 16A000-0001-00000002 bast 0 rseq 3 mseq 0 history 0x1495149a convert opt KJUSERGETVALUE ----------enqueue 0x0x17f6c90a8------------------------ |
metalink上查询了trace文件中很多关键字,出来都是为解决的bug信息。
因为只有这个模块会报错,所以考虑调整这个模块的性能,希望能减少死锁的发生。昨天做了如下工作:
| 1. | 删除opttemp表内遗留的错误数据(80多万条),并在sign列上加索引 |
| 2. | 在mso表上增加索引2个 |
完成以上工作,update 语句的执行计划不错,希望今天不会再次出现该问题.