Updating firmware is a key element of any scalable IoT device deployment. IoT devices, as small and simple as they appear, are complex systems under the hood, often containing many thousands of lines of code. After deployment to the field, IoT device operators need their devices to adapt to changing conditions, new functionality, lessons from field trials, customer feedback, and newly discovered security threats. Updating firmware over-the-air (OTA), over the very same long-range, low-power wireless links that the devices rely on for normal operation, is the only practical way to achieve the adaptability that scalable IoT deployments require.
Many IoT devices are designed to operate in so-called "smart city" environments, where low-power, long-range wireless links are the only means of communicating with the device. An emerging technology for connecting many inexpensive, low-power devices over long-range links is LoRaWAN, from the LoRa Alliance. LoRaWAN is characterized by tiny, inexpensive radios on the devices, relatively inexpensive gateways, and a network protocol stack that implements the MAC (medium access control) layer in the cloud. The LoRaWAN MAC, running on network servers, is responsible for scheduling transmissions to the devices, including acknowledgements and link control messages.
The Things Network (TTN) is one such LoRaWAN network operator. Beyond championing a global grassroots LoRaWAN deployment, TTN also designs, builds, and deploys LoRaWAN technology, including software and hardware. One of TTN's initiatives is a proposal for updating device firmware over a LoRaWAN network. The technical aspects of this proposal are discussed here.
One of the main challenges in delivering data in bulk to a large number of devices is that LoRaWAN is heavily biased towards sending data from devices to gateways. Communication is initiated by the devices because they are supposed to be asleep (powered down) most of the time. When a device does initiate a transmission, the LoRaWAN protocol opens up to two receive windows that the network may use to send data from a gateway back to that device. However, one of these windows is typically used for acknowledgements, the other is still reserved for that one device, and often only a few bytes can be delivered in it. Under this paradigm, an OTA firmware update would require each device to repeatedly initiate a transmission (consuming energy), then wait for and collect a few bytes of the update in the ensuing response. With thousands of devices all competing ad hoc for the radio medium to download their own copy of the same update data, such a protocol would clearly be very inefficient.
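A rough count of downlink messages shows the scale of the problem. This is back-of-the-envelope arithmetic under assumed figures (the image size, usable downlink payload per message, and fleet size are placeholders, not measured LoRaWAN numbers), and it ignores retransmissions.

```python
import math


def unicast_downlinks(image_bytes: int, payload_bytes: int, devices: int) -> int:
    """Downlinks needed if every device pulls its own copy, one payload per uplink."""
    per_device = math.ceil(image_bytes / payload_bytes)
    return per_device * devices


def multicast_downlinks(image_bytes: int, payload_bytes: int) -> int:
    """Downlinks needed if one broadcast copy is heard by every device at once."""
    return math.ceil(image_bytes / payload_bytes)


if __name__ == "__main__":
    # Assumed figures: 100 KiB image, 50-byte downlink payload, 1000 devices.
    print(unicast_downlinks(100 * 1024, 50, 1000))  # 2048000
    print(multicast_downlinks(100 * 1024, 50))      # 2048
```

In the unicast case each of those downlinks also costs the device an uplink to open a receive window, so the energy cost grows with the image size on every device.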
TTN's proposal for OTA updates over LoRaWAN addresses these deficiencies by allowing groups of devices, on command, to simultaneously enter a special receive state called a multicast session. While in this state, the devices' radios stay on, and the LoRaWAN network can schedule a bulk transfer of data as many sequential messages. These messages carry the OTA firmware update data and are addressed to all of the listening devices at once.
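On the server side, a bulk transfer of this kind starts by cutting the image into fixed-size, numbered messages so that each listening device can place whatever it receives. The 2-byte index header and zero-padding of the last fragment below are hypothetical choices for illustration, not the proposal's wire format; the device is assumed to learn the true image length separately, for example when it is told to join the session.

```python
import struct

# Hypothetical header: 2-byte big-endian fragment index (limits images to 65536 fragments).
FRAG_HEADER = struct.Struct(">H")


def fragment_image(image: bytes, frag_size: int) -> list[bytes]:
    """Split an image into fixed-size, sequentially numbered multicast messages.

    The last fragment is zero-padded so every message has the same length."""
    if frag_size <= 0:
        raise ValueError("frag_size must be positive")
    frags = []
    for index, offset in enumerate(range(0, len(image), frag_size)):
        chunk = image[offset:offset + frag_size].ljust(frag_size, b"\x00")
        frags.append(FRAG_HEADER.pack(index) + chunk)
    return frags


def parse_fragment(message: bytes) -> tuple[int, bytes]:
    """Recover (index, payload) from one received multicast message."""
    (index,) = FRAG_HEADER.unpack_from(message)
    return index, message[FRAG_HEADER.size:]
```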
However, it could take some time to get all devices set up and ready, because the network must wait for each device to wake up and transmit at its regular interval. A device manufacturer would be wise to ensure that devices wake up regularly, but not all at the same time, to avoid congestion. For example, devices could be designed to report once a day regardless of other stimuli. In this scenario it could still take many hours to get all targeted devices ready for a bulk download session, because the session can only be scheduled once every targeted device has reported in and received its instructions; the devices then switch on their receivers together at the agreed time, rather than keeping their radios on unnecessarily while they wait. Furthermore, devices need a means to coordinate this activity, and so must set their internal real-time clocks (RTCs) to a common, accurate time base.
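One way to realize both ideas, staggered daily reports and a common session start time, is sketched below. Deriving a report offset from a hash of the device's DevEUI is an assumption (any stable, well-spread per-device value would do), and times are assumed to be seconds on a shared time base whose days start at multiples of 86,400 s.

```python
import hashlib

DAY_S = 24 * 60 * 60


def daily_report_offset(dev_eui: bytes) -> int:
    """Hypothetical scheme: spread daily reports over the day by hashing the DevEUI."""
    digest = hashlib.sha256(dev_eui).digest()
    return int.from_bytes(digest[:4], "big") % DAY_S


def earliest_session_start(now_s: int, offsets: list[int], margin_s: int) -> int:
    """Earliest common start time after every device has had a chance to report.

    Each device reports once a day at its offset, so the wait until its next
    report is (offset - now) modulo one day; the session must follow the longest wait.
    """
    waits = [(offset - now_s) % DAY_S for offset in offsets]
    return now_s + max(waits) + margin_s
```

With once-a-day reporting, the longest wait approaches a full day, which is why the lead time before a session can run to many hours.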
During the bulk transmission of OTA update data, some messages are expected to be lost by some devices due to interference and other impairments. To deal with the lossy link, the TTN OTA update protocol instructs devices to open a second bulk receive window, during which the gateway sends a series of data correction packets that each device can use to fill in the gaps remaining in its OTA update image after the first window. These correction packets keep cycling until every device reports that it has a complete and valid firmware image, or until a timeout period elapses.
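The proposal's correction coding is not detailed here, so the sketch below uses plain XOR parity as a simplified stand-in: one correction packet is the byte-wise XOR of a group of equal-length fragments, and a device that lost exactly one fragment of that group can rebuild it from the fragments it did receive. Group membership and packet layout are assumptions.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_packet(fragments: list[bytes]) -> bytes:
    """Correction packet: byte-wise XOR of a group of equal-length fragments."""
    return reduce(xor_bytes, fragments)


def recover_missing(received: dict[int, bytes], group: list[int],
                    parity: bytes) -> dict[int, bytes]:
    """Fill in one missing fragment of a group from the others plus the parity.

    Works only when exactly one fragment of the group was lost; otherwise the
    input is returned unchanged and the device waits for more correction packets.
    """
    missing = [i for i in group if i not in received]
    if len(missing) != 1:
        return received
    rebuilt = reduce(xor_bytes, (received[i] for i in group if i in received), parity)
    return {**received, missing[0]: rebuilt}
```

Because different devices lose different fragments, the same correction packet can repair a different gap on each device, which is what makes broadcasting corrections worthwhile.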
Although the proposal does not discuss it, the OTA update process may well time out before all devices signal that they have received a complete update. In that case, the OTA update process would need to be scheduled again, possibly commanding only the devices that did not complete the last update to take part in the next cycle. This amounts to an OTA update retry at a higher level, which may not be part of TTN's core protocol and would have to be implemented by the customer's cloud application.
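Such an application-level retry could look like the loop below. `run_session` is a hypothetical hook supplied by the customer's cloud application; it is assumed to schedule one multicast session for the given devices and return the set that confirmed a valid image before the timeout.

```python
from typing import Callable


def run_update_campaign(
    devices: set[str],
    run_session: Callable[[set[str]], set[str]],
    max_attempts: int = 3,
) -> set[str]:
    """Repeat multicast sessions for devices that have not yet confirmed.

    Returns the devices still outstanding after max_attempts sessions."""
    pending = set(devices)
    for _ in range(max_attempts):
        if not pending:
            break
        completed = run_session(pending)
        pending -= completed  # only stragglers are targeted next time
    return pending
```

Capping the number of attempts leaves the application free to flag persistent stragglers for manual attention instead of retrying indefinitely.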
Securing the OTA firmware update process is part of TTN's proposal. Each device establishes its own unique secure session with the LoRaWAN network for normal communications. This session is used to securely deliver, to each device commanded to join the multicast session, a common OTA download key that protects the OTA download packets. An attacker who compromises one device could obtain this shared multicast key and inject packets. To protect against this, a checksum of the firmware binary must be sent to the server with the completion message. If the checksum reported by the device matches the one computed by the server, the server responds over the device's unique secure session. That session is independent of the multicast session, so an attacker's knowledge of the multicast key cannot be used to spoof server communications with any other device. A device shall commit the update only after receiving a successful response from the server.
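The device-side rule, report a checksum and commit only after a confirmation arrives over the unicast session, can be sketched as follows. CRC-32 (via Python's `zlib.crc32`) is an assumed choice of checksum, and `StagedUpdate` and `finish_update` are hypothetical names; on a real device the server's reply would be a downlink on the device's own session rather than a function call.

```python
import zlib


def completion_checksum(image: bytes) -> int:
    """Checksum a device sends with its completion message (CRC-32 assumed)."""
    return zlib.crc32(image) & 0xFFFFFFFF


def server_confirms(reported: int, reference_image: bytes) -> bool:
    """Server side: answer only if the device's checksum matches the server's own."""
    return reported == completion_checksum(reference_image)


class StagedUpdate:
    """Hypothetical device-side holder for a fully received, uncommitted image."""

    def __init__(self, image: bytes) -> None:
        self.image = image
        self.committed = False

    def report(self) -> int:
        return completion_checksum(self.image)

    def on_unicast_confirmation(self) -> None:
        # Only a reply on the device's own session may trigger the commit,
        # never a message on the shared multicast session.
        self.committed = True


def finish_update(staged: StagedUpdate, reference_image: bytes) -> bool:
    """Simulate one round trip: report, server check, commit only if confirmed."""
    if server_confirms(staged.report(), reference_image):
        staged.on_unicast_confirmation()
    return staged.committed
```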
However, this alone is not sufficient to protect against malicious packet injection, because it may be possible to construct an altered firmware image that produces the same checksum (how easily this can be done depends on the checksum algorithm). To address this, the TTN proposal has both the device and the server compute a MIC (message integrity code, a form of message authentication code) based on both the device's unique private session key and the firmware image checksum. Such a MIC is very difficult for an attacker to forge without knowledge of the device's unique session key.
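A minimal sketch of the MIC exchange follows. LoRaWAN computes its 4-byte MICs with AES-CMAC; because that is not in Python's standard library, HMAC-SHA256 truncated to 4 bytes is swapped in here purely for illustration, and the 4-byte big-endian encoding of the checksum is an assumption.

```python
import hashlib
import hmac
import struct


def completion_mic(session_key: bytes, image_checksum: int) -> bytes:
    """MIC binding the firmware checksum to the device's unique session key.

    Stand-in construction: HMAC-SHA256 truncated to 4 bytes (LoRaWAN uses AES-CMAC)."""
    tag = hmac.new(session_key, struct.pack(">I", image_checksum), hashlib.sha256).digest()
    return tag[:4]


def server_verifies(session_key: bytes, expected_checksum: int, reported_mic: bytes) -> bool:
    """Server recomputes the MIC with its copy of the device key and compares."""
    expected = completion_mic(session_key, expected_checksum)
    return hmac.compare_digest(expected, reported_mic)  # constant-time comparison
```

`hmac.compare_digest` is used so the comparison time does not leak how many leading bytes matched.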
The update process, combined with the lossy nature of the network, requires firmware data to be stored as it is received. Additionally, the security requirements of the update process call for a complete, verified image to be present on the device before it is committed. As a result, devices must include persistent storage for a complete firmware update image in addition to the running firmware image. In practice, this means that device microcontrollers need internal flash, or a connection to external flash, of at least twice the expected future firmware image size.
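The sketch below models such a staging area: each fragment is written at its final offset as it arrives, and a bitmap tracks which fragments are present, so later correction packets only need to fill the gaps. `StagingSlot` and `required_flash` are hypothetical names, the `bytearray` stands in for flash, and the sizing helper simply encodes the "running image plus an equally sized staging slot" rule above, with an optional bootloader.

```python
class StagingSlot:
    """Hypothetical staging area (internal or external flash) for an incoming image."""

    def __init__(self, image_size: int, frag_size: int) -> None:
        self.image_size = image_size
        self.frag_size = frag_size
        self.count = -(-image_size // frag_size)           # ceiling division
        self.buffer = bytearray(self.count * frag_size)     # stands in for flash
        self.received = bytearray((self.count + 7) // 8)    # one bit per fragment

    def store(self, index: int, payload: bytes) -> None:
        """Write a fragment at its final offset and mark it as received."""
        if not 0 <= index < self.count or len(payload) != self.frag_size:
            raise ValueError("fragment does not fit this slot")
        start = index * self.frag_size
        self.buffer[start:start + self.frag_size] = payload
        self.received[index // 8] |= 1 << (index % 8)

    def missing(self) -> list[int]:
        """Indices of fragments not yet received."""
        return [i for i in range(self.count)
                if not ((self.received[i // 8] >> (i % 8)) & 1)]

    def image(self) -> bytes:
        """Complete image, trimmed to its true length; fails if gaps remain."""
        if self.missing():
            raise RuntimeError("image incomplete")
        return bytes(self.buffer[:self.image_size])


def required_flash(max_image_size: int, bootloader_size: int = 0) -> int:
    """Running image plus a staging slot of the same size, plus an optional bootloader."""
    return bootloader_size + 2 * max_image_size
```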
Concept Introduction: Firmware Patches
To further reduce transfer time, save power, and improve the likelihood of a successful update on all devices, the concept of delivering firmware patches over-the-air is introduced: only the differences between a device's current firmware and the new firmware are transmitted, rather than the entire new image. This option is available to customers of Firmware Modules' IoT Core OTA update technology and would be transparent to TTN's firmware update implementation.
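To illustrate the idea only, the sketch below computes a naive block-level delta: it keeps the blocks of the new image that differ from the old one and rebuilds the new image from the old image plus those blocks. This is not Firmware Modules' patch format; production delta tools use far more compact binary-diff encodings, and a device would write the result into its staging slot rather than over the running image.

```python
Block = tuple[int, bytes]  # (offset in new image, replacement bytes)


def make_patch(old: bytes, new: bytes, block: int = 64) -> tuple[int, list[Block]]:
    """Keep only the blocks of new that differ from the same range of old.

    Saves space only when changes are confined to a few aligned blocks."""
    changed = []
    for offset in range(0, len(new), block):
        chunk = new[offset:offset + block]
        if old[offset:offset + block] != chunk:
            changed.append((offset, chunk))
    return len(new), changed


def apply_patch(old: bytes, patch: tuple[int, list[Block]]) -> bytes:
    """Rebuild the new image from the old one plus the changed blocks."""
    new_len, changed = patch
    out = bytearray(old[:new_len].ljust(new_len, b"\x00"))  # resize to new length
    for offset, chunk in changed:
        out[offset:offset + len(chunk)] = chunk
    return bytes(out)
```

For a one-byte change in a 256-byte image with 64-byte blocks, only one block (64 bytes) needs to be sent instead of all 256.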
In addition, Firmware Modules' IoT Core OTA update technology encrypts firmware update images, on top of any encryption that TTN performs, to protect against cloning and IP theft. Customers can therefore store update images in low-cost, unsecured areas such as external SPI flash, should they choose reduced-cost MCUs.