4G debugging, all the common problems
Post date: Aug 2, 2014 3:24:09 PM
Yet another frustrating debugging session for things that should have been extremely simple.
Situation.
1. SIM cards from three different providers: A, B, C
2. Embedded device with 4G data
3. TCP connections over 4G data to embedded device supplier
Shouldn't be too hard, should it?
1. Stick SIM card in
2. Enter PIN
3. Check APN information
4. Fire away
Yes, that's the ideal scenario. I guess it should take a few minutes per device at most. The truth? It will take months, because someone wants to use provider A. So does it make any difference which provider you use? No, it shouldn't. And yet it does. Provider A's service simply doesn't work. After I contacted provider A, they told me that there's nothing wrong with their connections. OK, that's fine, just as I suspected. Then I contacted the device provider. They claim that their devices do work, and that there must be something wrong with the service provider.
Yet with service providers B and C it does work. Most curiously, the device also works with a provider A SIM card when it's roaming in another country. So yes, it's clear that it has something to do with service provider A's network. But what? That's the great question.
Ahem, this is the classic setup which I've described so many times in this blog and have seen in different integration projects countless times. There's nothing wrong, it just doesn't work. As a technical guy, I can say that someone is now lying, and lying hard. Because it doesn't work, there has to be something wrong. That's the 'secret' I know. And that's exactly why I'm the guy who gets sent in when others fail.
After doing some analysis, my conclusion was that the device probably misinterprets the address assignment from the service provider. I checked with the service provider that they assign address X, and the device doesn't show it. Most interestingly, the device does show gateway address Y, which is assigned by the service provider. This is curious, because it indicates that the device receives at least some address information from the service provider but doesn't seem to decode all of it correctly.
Well, the help desk of the device provider told me that the IP address seems to be OK. When I reported my findings to them, their reply was that there seems to be an IP address, and it's configured by the service provider. Fail. It wasn't the IP address assigned by the service provider, the address itself was invalid, the subnet mask was completely missing, and the gateway address was in the 10. network while the device IP was in the 192. network. And the service provider was assigning a globally addressable IPv4 address to the device.
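The mismatches listed above are the kind of thing that can be checked mechanically before arguing with a help desk. Here is a minimal sketch using Python's standard `ipaddress` module. The function name `audit_reported_config` and all the addresses are made up for illustration, they are not the real values from this case. Note also that on point-to-point style mobile links a gateway outside the device's subnet is not always an error, so treat the findings as hints rather than proof.

```python
import ipaddress

def audit_reported_config(reported_ip, reported_mask, reported_gw, expected_ip):
    """Return a list of suspicious things in a device's reported IP configuration."""
    problems = []
    ip = ipaddress.ip_address(reported_ip)
    gw = ipaddress.ip_address(reported_gw)

    # The operator says which address it assigned; the device should show the same one.
    if ip != ipaddress.ip_address(expected_ip):
        problems.append("reported IP differs from the address the operator assigned")

    # In this case the operator handed out globally addressable IPv4 addresses.
    if ip.is_private:
        problems.append("reported IP is private, but a global address was expected")

    if not reported_mask:
        problems.append("subnet mask is missing")
    else:
        # strict=False lets us pass a host address together with its mask.
        net = ipaddress.ip_network(f"{reported_ip}/{reported_mask}", strict=False)
        if gw not in net:
            problems.append("gateway is outside the device's subnet")

    return problems

# Hypothetical values only: device shows a 192.x address, a 10.x gateway and no mask.
for problem in audit_reported_config("192.168.1.10", None, "10.0.0.1", "203.0.113.5"):
    print(problem)
```

A check like this doesn't tell you who is at fault, but it turns "the IP address seems to be OK" into a list of concrete points that someone has to explain.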
Next I did some analysis based purely on an educated guess. What are the chances that a small embedded device provider ships problematic firmware that doesn't work properly with every provider around the world, versus the chance that a large national 4G service provider doesn't send correct address assignment information? I would say it's 90% sure that the problem is on the firmware / device configuration side. Yet they had just told me that it's not their fault. How do they know it's not their fault, unless they have checked a packet capture and confirmed that their software receives and decodes the packets correctly? Unfortunately, because it's an embedded device with an internal 4G adapter, I don't have any easy way to capture the traffic between the service provider and the device, so that I could pinpoint the problem or modify packets on the fly to make it work. Even if the problem were on the operator side, they could easily tell what the exact problem is.
How could the problem have been solved very quickly? The embedded device provider should load debug firmware onto the device, collect the required logs and see what happens. Or if that's too hard, simply send someone competent with testing equipment to verify the matter. I'm sure that detecting, locating and fixing this kind of issue wouldn't take more than 15 minutes if they're competent and, instead of lying, really do something about it. Unfortunately this scenario is just way too familiar to me.
Finally the problem was resolved with the operator's technical department. The operator said that the problem is that the embedded device is using an MTU of 1500 bytes, but the MTU should not be larger than 1428 bytes because of L2TP/PPP. That's it. I just wish I had known that earlier. As a bonus, of course that parameter can't be changed using the user interface at all.
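To put numbers on the operator's explanation, here is a small arithmetic sketch. The two MTU values are the ones the operator gave. The header sizes are the standard minimum IPv4 and TCP header lengths without options, and the helper names are my own. The operator didn't break down how the L2TP/PPP overhead is made up, so the code only computes the total difference.

```python
IPV4_HEADER = 20  # minimum IPv4 header in bytes, no options
TCP_HEADER = 20   # minimum TCP header in bytes, no options

DEVICE_MTU = 1500    # what the embedded device uses
OPERATOR_MTU = 1428  # the maximum the operator says works over L2TP/PPP

def tcp_mss(mtu):
    """Largest TCP payload that fits into one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER

def fits(packet_size, mtu):
    """True if an IP packet of packet_size bytes passes without fragmentation."""
    return packet_size <= mtu

print("tunnel overhead implied by the operator:", DEVICE_MTU - OPERATOR_MTU, "bytes")
print("TCP MSS at device MTU:  ", tcp_mss(DEVICE_MTU))
print("TCP MSS at operator MTU:", tcp_mss(OPERATOR_MTU))
print("full-size device packet fits:", fits(DEVICE_MTU, OPERATOR_MTU))
```

So every packet the device builds that is larger than 1428 bytes is 1 to 72 bytes too big for the operator's side of the link.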
Yet resolving that issue clearly required that their techies took a good look at debug logs, and the embedded device provider didn't provide any practical help. In the end, the operator implemented a workaround which allows the solution to work, and the embedded device remains (at least for now) as it was, using the 1500 MTU.
Other interesting things. Even with a working connection, the device debug printout still misreports the IP address and subnet. Really wonderful. These kinds of minor things can unfortunately throw off problem resolution, especially when you're resolving the problem without deeper knowledge of how the device / system behaves in abnormal situations. So you first have to run a debugging session and create problems on purpose to see how things look and behave, and only after that start really looking into the problem. Otherwise you can't trust debug prints, or even worse, you can get badly misled by the disinformation you receive.
Yet it remains untested whether the APN and systems would work with the smaller MTU of 1428 bytes, without other changes. I would always love to verify things like that.
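One way such a verification could be done, if you can get a normal Linux machine behind the same 4G link, is to probe for the largest packet that passes without fragmentation. The sketch below is only an assumption of how I would approach it, not something I ran in this case. `largest_working_mtu` and `ping_probe` are made-up helper names, and `ping_probe` assumes the Linux iputils `ping` with its `-M do` (don't fragment) and `-s` (payload size) options.

```python
import subprocess

def largest_working_mtu(probe, low=576, high=1500):
    """Binary search for the largest MTU in [low, high] for which probe(mtu) is True.

    probe is any callable taking an MTU in bytes and returning True when a
    packet of that size gets through unfragmented. Returns None if even
    the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid works, the answer is mid or larger
        else:
            high = mid - 1  # mid fails, the answer is smaller
    return low

def ping_probe(host):
    """Build a probe that uses Linux iputils ping with the don't-fragment flag set."""
    def probe(mtu):
        payload = mtu - 28  # subtract 20 bytes IPv4 header and 8 bytes ICMP header
        cmd = ["ping", "-c", "1", "-W", "2", "-M", "do", "-s", str(payload), host]
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode == 0
    return probe

# Dry run with a simulated link that drops everything above 1428 bytes.
print(largest_working_mtu(lambda mtu: mtu <= 1428))
```

Against a real link you would pass `ping_probe("some.host.example")` instead of the lambda, with a host of your own choosing that answers pings.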