Author Message
vinothrajendran2
Joined: Nov 12, 2013
Messages: 101
Offline
Hi,

We have OD 7 application deployed in Websphere 8.5

WebSphere server went into hung state on 26/03/2017.

Websphere team analysed the dump and here is their findings.

I have attached the aesconnector logs. Please advise.


"From Javacore, I see a possible thread leaks in "DistributeCSTAEvent" and "Timer-xxxxxxx" thread pools.
747 threads of "DistributeCSTAEvent" threads were found and these all named same.
758 of threads of type "Timer-xxxxxxx" were found and these mostly blocked on log4j code.
Native memory is associated with each thread and the more the number of the threads the more native memory and if there is a thread pool leak, eventually the available native memory is exhuasted and native OOM is thrown.
Please find complete analysis below.

nvironment:

WebSphere version: BASE 8.5.5.11
Operating system version: Windows Server 2008 R2 6.1 build 7601 Service Pack 1 Java version: JRE 1.7.0 Windows Server 2008 R2 amd64-64 build
(pwa6470sr8fp10-20141219_01(SR8 FP10) )

From Javacore, found a native OOM issue due to large number of application threads and logging code.

Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError"
"Failed to allocate JNIEnv" received
Date:2017/03/26 at 17:22:18
Javacore filename:D:\Program Files (x86)\IBM\WebSphere\AppServer
\profiles\AppSrv01\javacore.20170326.172105.6160.0003.txt
OS Level: Windows Server 2008 R2 6.1 build 7601 Service Pack 1 JRE 1.7.0 Windows Server 2008 R2 amd64-64 build
(pwa6470sr8fp10-20141219_01(SR8 FP10) )

2CIUSERARG -verbose:gc
2CIUSERARG -Xms512m
2CIUSERARG -Xmx1536m
2CIUSERARG -Xcompressedrefs
22CIUSERARG -javaagent:D:\Quest_Software
\Foglight_Agent_Manager\agents\JavaEE\5.9.8-20160405-1524
\lib\performasure-agent.jar

-Threads: 1,392,605,664 bytes / 24299 allocations
+--Trace: 40,289,568 bytes / 13710 allocations

Total memory: 1318912000 <-- 1257MB
Total memory in use: 450925576 <-- 430MB
Total memory free: 867986424 <-- 827MB

Current thread
"Thread-83" J9VMThread:0x000000006BEA2E00, j9thread_t:0x000000007796BA80, java/lang/Thread:0x000000000B1D33F0,
state:R, prio=5 (java/lang/Thread getId:0x9B, isDaemon:true) (native thread ID:0x1D1C, native priority:0x5, native policy:UNKNOWN,
vmstate:R, vm thread flags:0x00000020) CPU usage total: 179.026747600 secs, user: 144.690927500 secs,
system: 34.335820100 secs Heap bytes allocated since last GC cycle=0 (0x0) Java callstack:
at java/lang/Thread.startImpl(Native Method) at java/lang/Thread.start(Thread.java:964(Compiled Code)) (entered lock: java/lang/Thread$ThreadLock@0x0000000049A05308,
entry count: 1) (entered lock: com/avaya/sce/cticonnector/servlet/server
/CTIConnectorManager$startObserver@0x0000000049803158, entry count: 1) at com/avaya/sce/cticonnector/servlet/server/CTIConnectorManager.
prepareObservers(CTIConnectorManager.java:1793(Compiled Code)) at com/avaya/sce/cticonnector/servlet/server/CTIConnectorManager.
configureFromApplicationConfig(CTIConnectorManager.java:1846)
at com/avaya/sce/cticonnector/servlet/server/CTIConnectorManager.
initProvider(CTIConnectorManager.java:2158)
at com/avaya/sce/cticonnector/servlet/server/CTIConnectorManager$
ProviderMonitor.run(CTIConnectorManager.java:1499)
747 threads of type "DistributeCSTAEvent"
758 of threads of type "Timer-xxxxxxx"

com/ibm/ejs/ras/SystemOutStream@0x0000000002301418:
owner "Thread-13477552" (J9VMThread:0x000000007528B700),
entry count 1 Waiting to enter:
"WebContainer : 39" (J9VMThread:0x000000006DC8CE00)

java/util/Hashtable@0x00000000496927E8: owner "Thread-13477552"
(J9VMThread:0x000000007528B700), entry count 1 Waiting to enter:
"Finalizer thread" (J9VMThread:0x0000000069552300) "Thread-13477533" (J9VMThread:0x00000000738D3E00) "Thread-13477540" (J9VMThread:0x0000000075309600) "Thread-13477541" (J9VMThread:0x00000000759F9600) "Thread-13477549" (J9VMThread:0x00000000741B7400) "Thread-13477550" (J9VMThread:0x00000000748CF900) "Thread-13477551" (J9VMThread:0x0000000075107B00) "Thread-13477558" (J9VMThread:0x0000000075534700)

org/apache/log4j/spi/RootLogger@0x000000000B1F5B00:
owner "Thread-13477552" (J9VMThread:0x000000007528B700), entry count 1 Waiting to enter:
"WsSessionInvalidatorThread" (J9VMThread:0x000000006B6D3B00) "WebContainer : 2" (J9VMThread:0x000000006C513900) "WebContainer : 5" (J9VMThread:0x000000006C27DB00) "WebContainer : 6" (J9VMThread:0x000000006AF9A600) "WebContainer : 10" (J9VMThread:0x000000006C4F3600) "WebContainer : 12" (J9VMThread:0x000000006B738E00) <total 945 threads>

The code stack of thread - "Thread-13477552"
is missing from Javacore.

All the blocked threads are blocked on same code as below.
Blocked on: org/apache/log4j/spi/RootLogger@0x000000000B1F5B00
Owned by: "Thread-13477552"
(J9VMThread:0x000000007528B700, java/lang/Thread:0x0000000049802B48)
Heap bytes allocated since last GC cycle=0 (0x0) Java callstack:
at org/apache/log4j/Category.callAppenders
(Category.java:204(Compiled Code)) at org/apache/log4j/Category.forcedLog
(Category.java:391(Compiled Code)) at org/apache/log4j/Category.debug(Category.java:260(Compiled Code)) at com/avaya/jtapi/tsapi/tsapiInterface/TsapiSession.handleRead
(TsapiSession.java(Compiled Code)) at com/avaya/jtapi/tsapi/tsapiInterface/oio
/TsapiChannelOio$OioChannelThread.run
(TsapiChannelOio.java:259(Compiled Code))

You need to engage log4j vendor for any known issues for native memory issues in "org/apache/log4j/spi/RootLogger"
"org/apache/log4j/Category.callAppenders"
and also need to engage code owners of
com/avaya/jtapi/tsapi/tsapiInterface/TsapiEventQueue.get
(TsapiEventQue ue .java(Compiled Code)) (entered lock:
com/avaya/jtapi/tsapi/tsapiInterface/TsapiEventQueue@0x0000000008185D18,
entry count: 1) at com/avaya/jtapi/tsapi/tsapiInterface/TsapiEventQueue.run(TsapiEventQueue
.java(Compiled Code)) to find out why 747 these threads -"DistributeCSTAEvent" are getting created.
Another work around is, turn off all the logging and restart JVM and try to recreate the issue.

Thanks
Filename trace.log.2017-03-26 [Disk] Download
RossYakulis
Joined: Nov 6, 2013
Messages: 2652
Offline
It looks like memory issue. You need to monitor the JVM and see if the issue occurs when calls are at the peak. It also depends on how many ports you have. Also, reducing the threadstacksize can help. You may need to adjust the garbage collection parameters as well.
vinothrajendran2
Joined: Nov 12, 2013
Messages: 101
Offline
Hi Ross,

Thanks.
We have two WAS servers.
This issue happened only on WAS2. I think load may not be the issue as WAS1 is primary and it takes more load.
There was network issue and looks like threads were blocked . is it not releasing the threads?. Does it require restart during such scenarios?

Can you advise on the reasons for below errors . Does it indicate there was network error?

26/03/2017 01:02:33:685 ERROR - CTICallObserver.CTICallObserver: failed to add an observer on extension: 31096 error is client not in service
26/03/2017 01:02:33:685 ERROR - CTICallObserver.CTICallObserver: failed to add an observer on extension: 31404 error is client not in service

26/03/2017 01:10:59:219 ERROR - No observer on extension:31330
26/03/2017 01:11:13:228 ERROR - No dateTime found. No connection to server

26/03/2017 01:14:46:745 ERROR - BlindCall.doBlindCall 31233: error performing blind call. This error can be valid if callee is busy or unreachable java.lang.NullPointerException
26/03/2017 01:14:46:761 ERROR -
26/03/2017 01:14:46:761 ERROR - BlindCall.doBlindCall 31233: error attempting to disconnect caller com.avaya.sce.cticonnector.servlet.CTIConnectorException


Thanks
RossYakulis
Joined: Nov 6, 2013
Messages: 2652
Offline
You should only need to restart the connector aesconnector.war/ear.
Go to:   
Mobile view