Author Message
EsmeraldaH
Joined: May 24, 2018
Messages: 28
Offline
Hello,

I have developed an application that monitors incoming calls and events for 100 VDNs. I'm using DMCC version 6.3.3.
My issue is that about once every 48 hours the connection to the server goes down.

From my application logs:

2018-10-27 09:32:00.322 +02:00 [ERR] AES Server connection error. Error: Number of Consecutive keep alives missed: 1. Error type: Missed At Least One Keep Alive Event

2018-10-27 09:32:00.323 +02:00 [INF] Trying to reconnect to the session with ID = 3C1452555FDE19F95B0AD6AE139D635F-9100
2018-10-27 09:32:00.351 +02:00 [ERR] AES Server connection error. Error: Unable to read from socket. Socket to server has been closed.. Error type: Server Connection Down Event
2018-10-27 09:32:00.352 +02:00 [INF] Trying to reconnect to the session with ID = 3C1452555FDE19F95B0AD6AE139D635F-9100
2018-10-27 09:32:00.353 +02:00 [ERR] AES Server connection error. Error: Unable to read from socket. Socket to server has been closed.. Error type: Server Connection Down Event
2018-10-27 09:32:00.355 +02:00 [INF] Trying to reconnect to the session with ID = 3C1452555FDE19F95B0AD6AE139D635F-9100
2018-10-27 09:32:00.415 +02:00 [INF] Number Returned From Reconnectiog = 767
2018-10-27 09:32:00.417 +02:00 [ERR] AES Server connection error. Error: Unable to read from socket. Socket to server has been closed.. Error type: Server Connection Down Event
2018-10-27 09:32:00.418 +02:00 [INF] Trying to reconnect to the session with ID = 3C1452555FDE19F95B0AD6AE139D635F-9100
2018-10-27 09:32:00.509 +02:00 [INF] Number Returned From Reconnectiog = 769



This repeats endlessly until the session gets cleaned up (after 60 seconds).
I am trying to reconnect using the Reconnect method, passing it the current session ID. My error handler looks something like this:

private void Dmcc_OnConnectionError(object sender, ConnectionErrorArgs e)
{
    // Log the error, then immediately try to recover the same session.
    logger.Write($"AES Server connection error. Error: {e.Message}. Error type: {e.ErrorType}", LogEventLevel.Error);
    dmcc.Reconnect(e.SessionId);
}



And this is how I try to reconnect using the current service provider instance:

public void Reconnect(string sessionId)
{
    logger.Write($"Trying to reconnect to the session with ID = {sessionId}", LogEventLevel.Information);
    // serviceProvider may be null, in which case nothing is sent and the logged value is empty.
    var numberReturnedFromReconnectiog = serviceProvider?.Reconnect(sessionId, null);
    logger.Write($"Number Returned From Reconnectiog = {numberReturnedFromReconnectiog}", LogEventLevel.Information);
}



I want to understand 2 things:
1. Why do I get this connection error so often?
2. Why isn't the Reconnect method working? Am I using it wrong?


Thank you in advance.
Esmeralda
JohnBiggs
Joined: Jun 20, 2005
Messages: 1139
Location: Rural, Virginia
Offline
It is hard to say what may be happening on the wire or in your application that causes the session keep alives to not arrive at AE Services when all you provided is the AE Services view of the keep alives timing out.
1) Are you using the latest version of the SDK (or at least doing a test with it)? I would try that. There was a very old bug in the DMCC .NET SDK where the invokeIDs did not wrap around properly (did you say which SDK variant you are using?).
2) Does a packet sniff show the keep alive being sent?
3) Does a trace from the SDK show the keep alive being sent?

As to why recovery doesn't work for you:
1) You showed us your code, but have you reviewed what the AES has to say when handling those requests (set logging to FINEST)?
2) Have you used the DMCC Dashboard and educated yourself about the session recovery process (you may need to temporarily disable the NIC on the PC, or pull the LAN cable, to test session failure)?
3) Session recovery is available to the application for a limited window of time. Are you doing the recovery within that window? IIRC the window is specified when you initially create the session.
EsmeraldaH
Joined: May 24, 2018
Messages: 28
Offline
1. I am using version 6.3.3. Anything wrong with using this version?
2. I am not using any packet sniffer to see the requests being sent
3. I am not logging the keep alives being sent since it is something the SDK takes care of. Is there any example of how I can log those requests?

1. Unfortunately I have no access to the server logs, but I have asked for them, so I believe I will have them soon and can study them
2. No, I haven't done that
3. When you say window of time, you're talking about SessionCleanupDelay and SessionDuration? And yes, I believe I am doing the request in that time interval. I have set SessionCleanupDelay = 60 secs and SessionDuration = 180 secs
JohnBiggs
Joined: Jun 20, 2005
Messages: 1139
Location: Rural, Virginia
Offline
1. I am using version 6.3.3. Anything wrong with using this version? - Not that I know of or recall. 6.3 was released a good while ago; you can use newer versions of the SDK with older releases of AE Services, at least as a test to make sure there isn't an SDK issue.

2. I am not using any packet sniffer to see the requests being sent - That would help ensure the AKA message is leaving your server.
3. I am not logging the keep alives being sent since it is something the SDK takes care of. Is there any example of how I can log those requests? - Enabling SDK logging is described in the programmer's guides for the SDK versions.

1. Unfortunately I have no access to the server logs, but I have asked for them, so I believe I will have them soon and can study them
2. No, I haven't done that
3. When you say window of time, you're talking about SessionCleanupDelay and SessionDuration? And yes, I believe I am doing the request in that time interval. I have set SessionCleanupDelay = 60 secs and SessionDuration = 180 secs - Yes, the sessionCleanupDelay is what I was referring to. How do you know you are initiating recovery in that 1 minute window? Are you reacting to the SDK informing you that it didn't get responses to the KA messages? How do you know the recovery request made it to AE Services? (You need the AES side logging.)
EsmeraldaH
Joined: May 24, 2018
Messages: 28
Offline
you can use newer versions of the SDK with older releases of AE Services at least for a test to make sure there isn't an SDK issue.
=> I will try to do that.

yes the sessionCleanupDelay is what I was referring to. How do you know you are initiating recovery in that 1 minute window? are you reacting to the SDK informing you that it didn't get responses to the KA messages? How do you know the recovery request made it to AE Services (you need the AES side logging).
=> I am actually reacting to the OnMissedAtLeastOneKeepAliveEvent and OnServerConnectionNotActiveEvent events.
I know I am making the request within those 60 seconds before the session gets cleaned up because I saw in the logs that my application was sending Reconnect requests for 1 minute.

Is it OK to handle OnMissedAtLeastOneKeepAliveEvent by trying to reconnect to the session? From the documentation, that's how I understood it should be done.

Would it be possible, when the connection is down, to disconnect and connect again (new session) without stopping my console application?
JohnBiggs
Joined: Jun 20, 2005
Messages: 1139
Location: Rural, Virginia
Offline
If you swap out the AES SDK, make sure you set the protocol version to whatever was appropriate for the 6.3 version of the SDK in your code when creating the session.

Disconnecting the session will require you to re-establish the monitors on the VDNs. Session recovery works; if it fails, then creating a new session is appropriate because the prior session will have been terminated.

You have to wait for the session to be down (3rd missed KA) before you can reconnect, but given there appears to be an issue with the KAs arriving at the AES that lasts 3 minutes, I am not sure that the recover session request is making it to the AES in the window where it would be valid (those 60 seconds after it goes down; not after the first missed KA, but rather after the third). Again, the AES logs for the window of time when the last KA is missed and the recover message should be received would be quite informative, as would enabling logging in the SDK to make sure you have the timing right.

A session failure and recovery can be simulated in a lab environment by disconnecting the LAN connection at the AES (no need to wait 2 days) to prove your code is sound.
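To make the timing concrete, here is a minimal C# sketch of "recover within the cleanup window, otherwise start a new session and re-establish the monitors". It is only an illustration under stated assumptions: RecoveryPolicy and every delegate passed to it are hypothetical names, not part of the DMCC SDK, and you would wire them to whatever reconnect, start-session and monitor calls your application actually uses.

```csharp
using System;

// Hypothetical helper: none of these names come from the DMCC SDK.
public sealed class RecoveryPolicy
{
    private readonly Func<string, bool> tryRecover;   // one recovery attempt for a session ID
    private readonly Func<string> startNewSession;    // creates a session, returns its ID
    private readonly Action restoreMonitors;          // re-establishes the VDN monitors
    private readonly Func<DateTime> clock;            // injectable clock (for testing)
    private readonly Action<TimeSpan> wait;           // injectable delay (for testing)

    public RecoveryPolicy(Func<string, bool> tryRecover, Func<string> startNewSession,
                          Action restoreMonitors, Func<DateTime> clock, Action<TimeSpan> wait)
    {
        this.tryRecover = tryRecover;
        this.startNewSession = startNewSession;
        this.restoreMonitors = restoreMonitors;
        this.clock = clock;
        this.wait = wait;
    }

    // downAt is the moment the connection-down notification arrived,
    // not the moment of the first missed keep alive.
    public string Recover(string sessionId, DateTime downAt, TimeSpan cleanupDelay, TimeSpan retryEvery)
    {
        DateTime deadline = downAt + cleanupDelay;
        while (clock() < deadline)
        {
            if (tryRecover(sessionId))
            {
                return sessionId; // same session recovered
            }
            wait(retryEvery);
        }

        // The window has passed, so the old session is assumed to be gone.
        string newId = startNewSession();
        restoreMonitors();
        return newId;
    }
}
```

With a 60 second cleanup delay you would pass TimeSpan.FromSeconds(60) and a retry interval of a few seconds, and run Recover on your own thread rather than inside an SDK callback.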
EsmeraldaH
Joined: May 24, 2018
Messages: 28
Offline
I enabled SDK logging. It logs all XML messages exchanged between my application and the server, but isn't it going to load my application too much when it goes into production? I can see a huge flow of data being written, considering we are monitoring 100 VDNs.

A session failure and recovery can be simulated in a lab environment by disconnecting the LAN connection at the AES (don't have to wait 2 days) to prove your code is sound. ---> You mean disconnecting the server from the LAN? I can't do that since it is being used in production for different products.
EsmeraldaH
Joined: May 24, 2018
Messages: 28
Offline
Now when I try to disconnect and then reconnect, I get the following error:
<StartApplicationSessionNegResponse xmlns="http://www.ecma-international.org/standards/ecma-354/appl_session"><errorCode><applError>Terminated session can not be reconnected. Reason: Client socket closed</applError></errorCode></StartApplicationSessionNegResponse>


Why do I get that, if I'm actually creating a new session and trying to connect to it?
JohnBiggs
Joined: Jun 20, 2005
Messages: 1139
Location: Rural, Virginia
Offline
1) Don't you have a lab testing environment? If you do your work against production systems you will be hard pressed to isolate issues with your application.
2) If you send a session disconnect, the session is gone. You cannot reconnect to it.
3) You may be able to use a remote lab session to do this testing: https://www.devconnectprogram.com/site/global/products_resources/avaya_aura_application_enablement_services/development_tools_configurations/aes_cm_remote_lab/index.gsp
MartinFlynn
Joined: Nov 30, 2009
Messages: 1922
Online
Hi Esmeralda,

Please take a look at the code in your callback/listener methods that receive events from AE Services. If these methods perform significant processing (e.g. make database accesses, send DMCC requests to AE Services, make file accesses etc.) then this is almost certainly the cause of your problems.

It is very important that applications exit the callback/listener methods as quickly as possible. Otherwise the SDK will not be able to process other incoming events and, eventually, AE Services outgoing buffers will overflow and it will be forced to bring down the connection.

Make sure that all listeners/callbacks take the incoming event and place it on an internal queue. You should perform the actual event processing using your own, internal, threads.
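As a rough illustration of the queue approach described above, the following C# sketch hands each incoming event to a single worker thread through a BlockingCollection. EventPump is a hypothetical name, not an SDK type, and the event type T stands in for whatever event arguments your callbacks receive.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: keeps SDK callbacks short by deferring the real work.
public sealed class EventPump<T> : IDisposable
{
    private readonly BlockingCollection<T> queue = new BlockingCollection<T>();
    private readonly Thread worker;

    public EventPump(Action<T> handler)
    {
        worker = new Thread(() =>
        {
            // Blocks until items arrive; ends once CompleteAdding() was called
            // and the queue has been drained.
            foreach (var item in queue.GetConsumingEnumerable())
            {
                try { handler(item); }
                catch (Exception ex) { Console.Error.WriteLine(ex); }
            }
        }) { IsBackground = true };
        worker.Start();
    }

    // Call this from the SDK callback: it only enqueues and returns.
    public void Post(T item) => queue.Add(item);

    public void Dispose()
    {
        queue.CompleteAdding(); // no further Post calls are allowed after this
        worker.Join();          // wait for queued events to be processed
        queue.Dispose();
    }
}
```

In a callback you would then do nothing except something like pump.Post(e), and move logging, database access and any DMCC requests into the handler that runs on the worker thread.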

Martin
EsmeraldaH
Joined: May 24, 2018
Messages: 28
Offline
JohnBiggs wrote:
1) don't you have a lab testing environment? If you do your work against production systems you will be hard pressed to isolate issues with your application
2) if you send a session disconnect, the session is gone. You can not reconnect to it.
3) you may be able to use a remote lab session to do this testing: https://www.devconnectprogram.com/site/global/products_resources/avaya_aura_application_enablement_services/development_tools_configurations/aes_cm_remote_lab/index.gsp


1) No, unfortunately I don't. I will try to use the remote lab session you mentioned, thank you very much.
2) But I am creating a new session (a new ServiceProvider instance); I am not trying to reconnect to the last one. That's why it doesn't make sense to me.
EsmeraldaH
Joined: May 24, 2018
Messages: 28
Offline
MartinFlynn wrote:
Hi Esmeralda,

Please take a look at the code in your callback/listener methods that receive events from AE Services. If these methods perform significant processing (e.g. make database accesses, send DMCC requests to AE Services, make file accesses etc.) then this is almost certainly the cause of your problems.

It is very important that applications exit the callback/listener methods as quickly as possible. Otherwise the SDK will not be able to process other incoming events and, eventually, AE Services outgoing buffers will overflow and it will be forced to bring down the connection.

Make sure that all listeners/callbacks take the incoming event and place it on an internal queue. You should perform the actual event processing using your own, internal, threads.

Martin


Hi Martin, thanks for your advice.
The only thing I am doing is file access for logging; maybe I should defer the logging and do it later, after the event has been handled. Would you suggest doing the different kinds of processing in different threads, so as not to block the threads communicating with the server?
MartinFlynn
Joined: Nov 30, 2009
Messages: 1922
Online
> The only thing i am doing is file accessing for logging, maybe i should take care of logging when handling those events later on

If your logging is directly to a file, this could possibly get blocked for short periods. If, however, you use something like log4j, this may buffer the output before writing to file so that may be OK.

> Would you suggest making different processings in different threads? so as not to block the threads communicating with the server?

I am not sure what you are asking here. As long as the SDK thread is not blocked, you are free to use any threading policy that you like. Just remember:

a. If you use one worker thread, you may still have a case where incoming events are getting backlogged. This time, they are on your queue, so the connection to AE Services should be OK, but this may still degrade the performance of the application (e.g. the application updates the agent's screen tens of seconds after the call arrives).

b. If you process each event with a different thread (or use a pool of threads), you can end up with a case where the application is processing two events for a call at the same time. This can be difficult to handle. For example, you can have one thread processing a Delivered event and another processing the Established event for the same call.
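One possible middle ground between (a) and (b), sketched here in C# purely as an assumption and not as anything the SDK provides: use several worker lanes, but always route events with the same key (for example a call ID) to the same lane, so events for one call are never processed concurrently or out of order. KeyedDispatcher and the key selector are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper: N worker lanes, with all events for one key on one lane.
public sealed class KeyedDispatcher<T> : IDisposable
{
    private readonly BlockingCollection<T>[] lanes;
    private readonly Task[] workers;
    private readonly Func<T, string> keyOf;

    public KeyedDispatcher(int laneCount, Func<T, string> keyOf, Action<T> handler)
    {
        this.keyOf = keyOf;
        lanes = new BlockingCollection<T>[laneCount];
        workers = new Task[laneCount];
        for (int i = 0; i < laneCount; i++)
        {
            var lane = lanes[i] = new BlockingCollection<T>();
            workers[i] = Task.Factory.StartNew(() =>
            {
                foreach (var item in lane.GetConsumingEnumerable())
                {
                    try { handler(item); }
                    catch (Exception ex) { Console.Error.WriteLine(ex); }
                }
            }, TaskCreationOptions.LongRunning);
        }
    }

    // Safe to call from an SDK callback: it only picks a lane and enqueues.
    public void Post(T item)
    {
        // Mask the sign bit so the lane index is never negative.
        int index = (keyOf(item).GetHashCode() & 0x7FFFFFFF) % lanes.Length;
        lanes[index].Add(item);
    }

    public void Dispose()
    {
        foreach (var lane in lanes) lane.CompleteAdding();
        Task.WaitAll(workers);
        foreach (var lane in lanes) lane.Dispose();
    }
}
```

Events for different calls can still run in parallel, while a Delivered and an Established event for the same call stay on one lane in arrival order. The trade-off is that one slow call delays the other calls that happen to share its lane.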

Martin
JohnBiggs
Joined: Jun 20, 2005
Messages: 1139
Location: Rural, Virginia
Offline
Looking back at this whole thread, maybe I overlooked the obvious. Are you reacting to the first missed keep alive and trying to do session recovery then, or are you waiting until you get the server connection down event from the library? You should wait for the latter before initiating session recovery. Please review the session recovery section in the programmer's guide.
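A minimal C# sketch of that rule, under the assumption that the application maps the SDK's error notifications onto its own enum: recovery starts only on the connection-down notification, and a second notification arriving while recovery is already running is ignored. ErrorKind and RecoveryGate are hypothetical names, not SDK types.

```csharp
using System.Threading;

// Hypothetical classification of the SDK's error notifications.
public enum ErrorKind { MissedKeepAlive, ConnectionNotActive, ConnectionDown }

// Hypothetical guard deciding whether a notification should start recovery.
public sealed class RecoveryGate
{
    private int inProgress; // 0 = idle, 1 = a recovery attempt is running

    // Returns true only for a connection-down notification while no other
    // recovery attempt is running; the caller then owns the attempt.
    public bool TryBegin(ErrorKind kind)
    {
        if (kind != ErrorKind.ConnectionDown)
        {
            return false; // a missed keep alive alone is only worth logging
        }
        return Interlocked.CompareExchange(ref inProgress, 1, 0) == 0;
    }

    // Call when the attempt has finished, whether it succeeded or not.
    public void End() => Interlocked.Exchange(ref inProgress, 0);
}
```

The error handler would call TryBegin first and attempt session recovery only when it returns true, calling End afterwards. That also avoids the back-to-back reconnect attempts a few milliseconds apart that show up in the application log at the top of this thread.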
EsmeraldaH
Joined: May 24, 2018
Messages: 28
Offline
Hi John, you are right, I am reacting to each of the 3 events (ServerConnectionDownEvents, ServerConnectionNotActiveEvents, and MissedAtLeastOneKeepAliveEvents) and trying to reconnect on each of them.

And when I receive the StartApplicationSessionNegResponse, I stop the current session and initiate a new ServiceProvider.StartApplicationSession request.

Now I'm thinking that I should handle each negative response from the server in a different way.
When I get MissedAtLeastOneKeepAlive, should I not react to it?