When you have a small number of easily accessible IoT devices, you can update firmware or device configuration over a direct connection to each device. As the number of devices grows, or as they become geographically dispersed, updating them this way becomes less feasible both logistically and financially.
This is where Over the Air (OTA) updates become crucial. OTA enables devices to upgrade themselves once they are told that new firmware is available to download. Updating remote machines this way is not new, but we increasingly see devices compromised because of poor security measures during OTA upgrades, or bricked because the update process was not robust.
1. Encrypt firmware updates
Embedded software is expensive to develop. Building the firmware is costly and time consuming, and so is developing and running test cases against it. In some cases, the research and intellectual property that give the device its monetary or marketing value would be diminished if the firmware became publicly available. This value is worth protecting by ensuring that the firmware at rest is properly encrypted.
Encrypting the firmware makes the binary file appear to be random data to anyone who gains unauthorised access to it, making it near impossible to analyse or reverse engineer. Encryption is also part of a wider process in which public and private keys are used to ensure that the file originates only from the intended manufacturer.
2. Update integrity checks
When a device downloads firmware from a remote server without any integrity checks, it is vulnerable to “man in the middle” attacks, where an attacker poses as the server the firmware is being downloaded from and supplies different firmware from what the manufacturer intended. The device is also susceptible to firmware corrupted by transmission errors.
To mitigate this problem, we compute a hash of the firmware and encrypt that hash, possibly along with other metadata such as the version number, using the manufacturer’s private key (in effect, signing it). The device decrypts the metadata with the corresponding public key, computes the same hash over the firmware it received, and compares the two. If the hashes do not match, the firmware is considered invalid and the upgrade does not proceed.
This gives us three advantages:
- Firmware whose metadata was not encrypted with the proper private key is rejected by the device, because the metadata cannot be decrypted with the public key. This ensures that only firmware from the correct manufacturer is installed on the device.
- Any change to the firmware not made by the manufacturer, whether from transmission problems or a “man in the middle” attack, is detected by the hash comparison, so the firmware is deemed invalid and is not installed.
- The bootloader can use the same hash before starting the firmware, to confirm that it has not been altered locally by corrupt memory or a physical attack on the device.
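The hash comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: it assumes the manifest's signature has already been verified with the manufacturer's public key (Python's standard library has no asymmetric signing, so that step is not shown), and the names `build_manifest` and `firmware_matches` are hypothetical.

```python
import hashlib
import hmac


def build_manifest(firmware: bytes, version: str) -> dict:
    """Manufacturer side: record the firmware's SHA-256 digest and version."""
    return {"version": version, "sha256": hashlib.sha256(firmware).hexdigest()}


def firmware_matches(firmware: bytes, manifest: dict) -> bool:
    """Device side: re-hash the downloaded image and compare with the manifest."""
    actual = hashlib.sha256(firmware).hexdigest()
    # compare_digest runs in constant time, so it does not leak how much matched.
    return hmac.compare_digest(actual, manifest["sha256"])


# Stand-in firmware image and the manifest the manufacturer would publish.
image = b"\x7fFIRMWARE" + bytes(range(256))
manifest = build_manifest(image, "1.4.2")

# Flip a single bit, as a transmission error or an attacker might.
tampered = bytearray(image)
tampered[10] ^= 0x01

ok_result = firmware_matches(image, manifest)
tampered_result = firmware_matches(bytes(tampered), manifest)
```

The untouched image passes the check and the image with one flipped bit fails it. The same function could be called by a bootloader before every start, which is the third advantage listed above.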
3. Keep the OTA server secure
Reverse engineering is the simplest way of finding security vulnerabilities in IoT firmware. Making device firmware publicly accessible invites such attempts, which can cause problems not only for a particular device but can also give an attacker a foothold in the entire IoT system, as well as giving away company IP.
Where possible, the server that hosts the firmware should be accessible only to the devices it is intended for. When a device is told to upgrade its firmware, it should be provided with the relevant security credentials (a username and password, or a signed URL) that allow access to the appropriate firmware.
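As one illustration of a signed URL, the sketch below appends an expiry time and an HMAC signature to the firmware URL, and the server recomputes the signature before serving the file. Cloud providers have their own signed URL schemes, so treat this as a sketch of the idea rather than any provider's format. The secret, the host `ota.example.com`, the device ID and the query parameter names are all assumptions made for the example.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical server-side secret; a real one would come from a secrets store.
SECRET = b"demo-secret-not-for-production"


def _signature(base_url: str, device_id: str, expires_at: int) -> str:
    payload = f"{base_url}|{device_id}|{expires_at}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def sign_url(base_url: str, device_id: str, expires_at: int) -> str:
    """Issue a URL that only this device can use, and only until expires_at."""
    query = urlencode({
        "device": device_id,
        "expires": expires_at,
        "sig": _signature(base_url, device_id, expires_at),
    })
    return f"{base_url}?{query}"


def verify_url(url: str, now: int) -> bool:
    """Server side: reject expired, malformed or tampered URLs."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        device_id = params["device"][0]
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    return hmac.compare_digest(sig, _signature(base_url, device_id, expires_at))


issued_at = 1_700_000_000  # fixed timestamp so the example is repeatable
url = sign_url("https://ota.example.com/firmware/v1.4.2.bin", "sensor-0042", issued_at + 300)

valid_now = verify_url(url, issued_at + 60)
valid_later = verify_url(url, issued_at + 600)
valid_other_device = verify_url(url.replace("sensor-0042", "sensor-9999"), issued_at + 60)
```

The URL is accepted within its five-minute window, rejected once it has expired, and rejected if the device ID is altered, because the signature no longer matches.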
4. Ensure secure communications when downloading firmware
There are two typical ways in which a device retrieves the appropriate firmware download from the system:
- HTTP: When a device is told to update its firmware, it is given the URL from which to retrieve it. Typically this is a signed URL that grants the device permission to download, and the connection is established over TLS using an https URL. The firmware is therefore encrypted in transit, which prevents sniffing of the packets being downloaded.
- MQTT: When a device requests firmware, the server splits it into smaller chunks and publishes those fragments on an agreed MQTT topic. The device subscribes to this topic and reassembles the firmware as the packets come in. As the MQTT connection should already be running over TLS, the transmission of this data is already secure.
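The fragment and reassemble steps of the MQTT approach can be sketched without a broker. In the example below the MQTT transport is simulated by a shuffled list of messages; the chunk size, the `(index, total, payload)` message layout and the `Reassembler` class are assumptions for illustration, not part of any MQTT library.

```python
import hashlib
import random

CHUNK_SIZE = 64  # assumed chunk size; real systems tune this to the device


def fragment(firmware: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Server side: split the image into (index, total, payload) messages."""
    total = (len(firmware) + chunk_size - 1) // chunk_size
    return [
        (i, total, firmware[i * chunk_size:(i + 1) * chunk_size])
        for i in range(total)
    ]


class Reassembler:
    """Device side: collect chunks in any order and rebuild the image."""

    def __init__(self) -> None:
        self.total = None
        self.chunks = {}

    def on_message(self, index: int, total: int, payload: bytes) -> None:
        self.total = total
        self.chunks[index] = payload  # a duplicate simply overwrites itself

    def missing(self) -> list:
        if self.total is None:
            return []
        return [i for i in range(self.total) if i not in self.chunks]

    def assemble(self) -> bytes:
        if self.total is None or self.missing():
            raise ValueError("firmware is incomplete")
        return b"".join(self.chunks[i] for i in range(self.total))


firmware = bytes(range(256)) * 3 + b"tail"
messages = fragment(firmware)
random.Random(7).shuffle(messages)  # simulate out-of-order delivery
held_back = messages.pop()          # simulate one chunk arriving late

rx = Reassembler()
for msg in messages:
    rx.on_message(*msg)
rx.on_message(*messages[0])         # simulate a duplicate delivery

missing_before = rx.missing()       # the device could re-request these chunks
rx.on_message(*held_back)
rebuilt = rx.assemble()
digests_match = hashlib.sha256(rebuilt).digest() == hashlib.sha256(firmware).digest()
```

Tracking which chunk indices are still missing lets the device ask for only those fragments again, and hashing the rebuilt image ties this step back to the integrity check in section 2.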
5. Automatic recovery of failed updates
Despite all the in-lab testing that may occur, updated firmware is sometimes unsuitable for the device, fails to run, or causes the device to restart continually. Without some sort of recovery mechanism, the device is unusable until the appropriate firmware is physically reinstalled.
Bricking a device due to a failure in the upgraded software can be avoided by having a rollback mechanism on the device itself. Typically the device would have two or three flash areas that are dedicated to the firmware. One would be for the baseline firmware installed in the factory, one for the currently running firmware and one for the newly upgraded firmware.
Typically the two areas of flash used for the running and upgrade firmware would alternate every time the firmware is upgraded. If the firmware is running in area A then the upgrade will be installed in area B and the bootloader will restart the firmware from there. Upon the next upgrade, area B now has the running firmware and the upgraded firmware would be installed in area A.
Part of the bootloader’s function is to ensure that the running firmware is stable and, if it is not, to fall back to the flash area that holds the previous firmware version. If there is no previous version, the bootloader returns to the factory-installed firmware, if available.
By doing this, the device will always have some means to continue functioning at least to the point where we are able to remotely configure and provide future upgrades.
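The bootloader's decision can be modelled as below. A real bootloader would be written in C and keep these flags in flash, so this Python model only illustrates the logic. It assumes that new firmware marks itself as confirmed once it judges itself healthy, and that the bootloader allows a limited number of boot attempts before rolling back. `Slot`, `select_slot` and `MAX_BOOT_ATTEMPTS` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

MAX_BOOT_ATTEMPTS = 3  # assumed limit before the bootloader gives up on a slot


@dataclass
class Slot:
    name: str
    version: Optional[str] = None  # None means the slot is empty
    boot_attempts: int = 0         # boots tried since this image was installed
    confirmed: bool = False        # set by the firmware once it runs stably


def select_slot(active: Slot, previous: Slot, factory: Slot) -> Slot:
    """Pick the slot to boot: active if healthy, else previous, else factory."""
    if active.version is not None:
        if active.confirmed:
            return active
        if active.boot_attempts < MAX_BOOT_ATTEMPTS:
            active.boot_attempts += 1  # count this attempt before jumping to it
            return active
    if previous.version is not None and previous.confirmed:
        return previous
    return factory


factory = Slot("factory", "0.9.0", confirmed=True)
slot_a = Slot("A", "1.0.0", confirmed=True)  # previously running firmware
slot_b = Slot("B", "1.1.0")                  # new upgrade that never confirms

# Four consecutive boots in which the new firmware keeps crashing.
boot_log = [select_slot(slot_b, slot_a, factory).name for _ in range(4)]

# With no previous version installed, the factory image is the last resort.
no_previous = select_slot(
    Slot("B", "1.1.0", boot_attempts=MAX_BOOT_ATTEMPTS), Slot("A"), factory
).name
```

In this run the bootloader tries slot B three times, then falls back to slot A on the fourth boot. When slot A is empty, it boots the factory image instead.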
6. Segregate devices and roll the upgrades
In-house testing will usually uncover issues with newly developed firmware before it’s deployed to devices in the field. However, experience has shown that environmental conditions or unforeseen device configurations may cause errors in a percentage of upgraded devices. These errors may not trigger a rollback, but they can stop the device functioning as intended and may require manual intervention to put it back into the right state.
With a few devices this can be handled without much damage to the product as a whole. However, when there are thousands of devices, or they are geographically dispersed, such errors can cause considerable damage to the IoT system as a whole, and repairing the problem becomes far harder and more costly to manage.
A better practice is to divide the devices into subsets which may be based on geographical locations, operating functions or other business logic. By upgrading each subset one at a time and ensuring that the system is still functioning normally during each upgrade we reduce the blast radius caused by any possible errors in the firmware.
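A rolling upgrade of this kind can be sketched as a loop that stops as soon as one subset shows too many failures. The cohort names, the 10% failure threshold and the `fake_upgrade` callback are assumptions for illustration. In a real system the health signal would come from monitoring the devices over time rather than from a single return value.

```python
from typing import Callable, Dict, List


def staged_rollout(
    cohorts: Dict[str, List[str]],
    upgrade: Callable[[str], bool],
    max_failure_rate: float = 0.1,
) -> List[str]:
    """Upgrade one cohort at a time; halt when a cohort exceeds the threshold.

    Returns the names of the cohorts that completed successfully.
    """
    completed = []
    for name, devices in cohorts.items():
        failures = sum(1 for device in devices if not upgrade(device))
        if devices and failures / len(devices) > max_failure_rate:
            break  # stop here so later cohorts are never touched
        completed.append(name)
    return completed


cohorts = {
    "canary": [f"dev-{i}" for i in range(5)],
    "region-a": [f"dev-{i}" for i in range(5, 15)],
    "region-b": [f"dev-{i}" for i in range(15, 25)],
}
faulty = {"dev-6", "dev-9", "dev-12"}  # devices where the new firmware misbehaves
attempted = []


def fake_upgrade(device_id: str) -> bool:
    """Stand-in for the real upgrade call; reports whether the device is healthy."""
    attempted.append(device_id)
    return device_id not in faulty


completed = staged_rollout(cohorts, fake_upgrade)
```

Here the canary cohort succeeds, region-a fails on three of its ten devices, and the rollout halts before any device in region-b is upgraded. The damage is limited to one subset.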
7. Minimise functional impact during the upgrade procedure
For most devices, an OTA upgrade is an autonomous procedure that can happen at any time. For devices that periodically report sensor data this may be acceptable. But consider cases such as manufacturing lines, where autonomous machines are operating or data is being recorded to track production levels during a job run: it’s just not feasible for the device to shut down and restart into new firmware at some random time. Before allowing OTA to occur, consider how the device is used, what human interactions need to take place, and what state the device is currently in (processing or idle).
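One simple way to express such a rule is a gate that the device checks before it installs a downloaded update. The sketch below assumes two conditions: the device must be idle, and the local time must fall inside a maintenance window. The window of 01:00 to 04:00 and the names `DeviceState` and `may_install` are illustrative assumptions. A real deployment might also wait for operator approval.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class DeviceState:
    job_running: bool  # True while the device is processing a job run


def may_install(
    state: DeviceState,
    now: time,
    window_start: time = time(1, 0),
    window_end: time = time(4, 0),
) -> bool:
    """Allow the restart only when idle and inside the maintenance window."""
    in_window = window_start <= now < window_end
    return in_window and not state.job_running


idle_at_night = may_install(DeviceState(job_running=False), time(2, 30))
busy_at_night = may_install(DeviceState(job_running=True), time(2, 30))
idle_at_noon = may_install(DeviceState(job_running=False), time(12, 0))
```

Only the idle device inside the window is allowed to restart. The firmware can still be downloaded and verified at any time; the gate defers only the disruptive step of restarting.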
In summary
OTA is an important part of the IoT ecosystem. Without it there is no feasible means to upgrade firmware when security flaws or bugs are discovered in devices already in the field. But without proper consideration of security and robustness, the advantages of this feature can become a considerable disadvantage, sometimes causing a great amount of damage to the system, the company brand, or the budget. Here we have outlined a few considerations that contribute to a secure and robust OTA system. How this is achieved depends on many factors, including the device’s capability and use as well as the cloud or backend infrastructure available.
DiUS can help you design and develop the best solution based on the company’s needs and capabilities. Get in touch and we’d be happy to help.