We are developing a custom device based on the nRF9160 SiP Revision 2 that maintains multiple connections over LTE-M, and we are having trouble keeping our only TCP connection alive. The server the device connects to has TCP keepalive enabled with an interval of 1 minute. Additionally, we have implemented another layer of keepalives on top of the TCP layer with an interval of 10 minutes.
What we noticed is that, at seemingly random times, the device stops responding to the TCP keepalives, while the other connections remain active and the modem stays registered to the network. Sometimes the connection stays up for multiple hours, and at other times it is lost after a mere minute.
We have found no way of also enabling TCP keepalive on the device itself, so the device is unaware that the connection has been closed by the server. The device only becomes aware of a problem with the connection when the 10-minute keepalive timer expires; it then closes its socket and reconnects to the server. When the server closes the connection while the device is still connected, the device is aware that the connection is closed. We are aware of the issue with TCP connections in combination with eDRX and PSM (https://devzone.nordicsemi.com/f/nordic-q-a/55473/fundamental-edrx-design-choices-by-verizon-breaking-functionality) and have therefore disabled both features (and we are not using Verizon).
One of the suggestions made in that thread was to subscribe to the CSCON notifications, but that would only work if no other connections were using LTE-M (as they would wake up the modem).
We ran a test with multiple devices, where each device sends a 1-byte packet to the server every 59 seconds to preempt the server's TCP keepalives and reverse the direction of the traffic. This way the connections stayed up for over a month. This is not an ideal solution, as it increases data usage.
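A minimal sketch of the kind of heartbeat described above, assuming POSIX-style socket calls (on Zephyr the same calls would come from the SDK's socket header rather than `<sys/socket.h>`). The names `send_heartbeat`, `heartbeat_loop`, `HEARTBEAT_INTERVAL_S` and `HEARTBEAT_BYTE` are hypothetical, as is the payload value; this illustrates the idea and is not the firmware used in the test.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: just under the server's 60 s TCP keepalive interval. */
#define HEARTBEAT_INTERVAL_S 59
/* Assumption: hypothetical payload that the server is expected to ignore. */
#define HEARTBEAT_BYTE 0x00

/* Send a single heartbeat byte on an already connected TCP socket.
 * Returns 0 on success, -1 if the caller should close and reconnect. */
int send_heartbeat(int fd)
{
    const uint8_t beat = HEARTBEAT_BYTE;

    for (;;) {
        ssize_t n = send(fd, &beat, sizeof(beat), 0);
        if (n == (ssize_t)sizeof(beat)) {
            return 0;
        }
        if (n < 0 && errno == EINTR) {
            continue; /* interrupted before any data was sent: retry */
        }
        return -1;
    }
}

/* Send heartbeats every interval_s seconds.
 * max_beats == 0 means "run until a send fails".
 * Returns the number of heartbeats sent, or -1 on a send failure. */
int heartbeat_loop(int fd, unsigned int interval_s, unsigned int max_beats)
{
    unsigned int sent = 0;

    while (max_beats == 0 || sent < max_beats) {
        if (send_heartbeat(fd) != 0) {
            return -1;
        }
        sent++;
        if (max_beats == 0 || sent < max_beats) {
            sleep(interval_s);
        }
    }
    return (int)sent;
}
```

In an application this would be called as `heartbeat_loop(fd, HEARTBEAT_INTERVAL_S, 0)` from a dedicated thread, with a return value of -1 treated as the signal to close the socket and reconnect.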
Currently, we have socket offloading enabled and are using nRF Connect SDK v1.7.0 and modem firmware 1.3.0. Newer versions of both are available, but upgrading would require rewriting our AT command layer, as this interface was changed in SDK v1.8.0, and the release notes don't suggest that it would fix our problem.
Are there other suggestions that we might have missed?
Is it possible (in the newer versions) to send 0 bytes (a TCP keepalive message) via the socket interface? In v1.7.0, sending 0 bytes returns an error.