Skip to content

[WIP] DFUService Implementation #13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed

[WIP] DFUService Implementation #13

wants to merge 7 commits into from

Conversation

AGlass0fMilk
Copy link
Member

@AGlass0fMilk AGlass0fMilk commented Nov 16, 2020

BLE Standard Service:

  • Yes
  • No

Proposed UUIDs:

Service Name: DFUService
UUID: 53880000-65fd-4651-ba8e-91527f06c887

Characteristic Name: Update Slot Characteristic
UUID: 53880001-65fd-4651-ba8e-91527f06c887
Mandatory?

  • Yes
  • No
    This characteristic may be omitted if the application only supports a single update candidate slot.
    Mandatory Properties: Read, Write w/ response
    Optional Properties: None
    Security Requirements: Determined by application

Characteristic Name: Offset Pointer Characteristic
UUID: 53880002-65fd-4651-ba8e-91527f06c887
Mandatory?

  • Yes
  • No
    Mandatory Properties: Read, Write w/ response
    Optional Properties: None
    Security Requirements: Determined by application

Characteristic Name: Binary Data Stream Characteristic
UUID: 53880003-65fd-4651-ba8e-91527f06c887
Mandatory?

  • Yes
  • No
    Mandatory Properties: Write w/ response (TBD: for writes that are synchronized with underlying flash programming?)
    Optional Properties: Write w/o response (for fast, buffered operation if device has enough resources/RAM to support)
    Security Requirements: Determined by application

Characteristic Name: DFU Control Characteristic
UUID: 53880004-65fd-4651-ba8e-91527f06c887
Mandatory?

  • Yes
  • No
    Mandatory Properties: Read, Write w/ response, Notify, Indicate
    Optional Properties: None
    Security Requirements: Determined by application

Characteristic Name: DFU Status Characteristic
UUID: 53880005-65fd-4651-ba8e-91527f06c887
Mandatory?

  • Yes
  • No
    Mandatory Properties: Read, Notify, Indicate
    Optional Properties: None
    Security Requirements: Determined by application

Other proposed constants:

Application Layer ATT Error Response Codes:

/**
* As per Bluetooth Core specification V5.2, Vol 3, Part F, Table 3.4 (Error Codes)
* ATT Error Codes between 0x80 and 0x9F are reserved for use by the application
*
* These error codes are valid for the DFUService application layer in addition to those
* defined in the GattAuthCallbackReply_t enum.
*/
enum ApplicationError_t {
AUTH_CALLBACK_REPLY_ATTERR_APP_BUSY = 0x019E, /** DFUService is busy (eg: flush in progress) */
AUTH_CALLBACK_REPLY_ATTERR_APP_INVALID_SLOT_NUM = 0x019F, /** Client requested invalid slot index */
};

Overview:

The DFU Service implements a flexible API enabling in-field updates to be transferred to a GattServer over BLE.

The primary goal of the DFU service is to enable fast, error-free binary transfers. The processes to perform validation and installation of the update binaries are not specified by the DFUService. However, the DFUService does implement features that facilitate communication of update candidate validation and installation results. This allows the DFUService to be used in a wider variety of DFU applications, both secure and non-secure.

TODO - more details... I'm getting tired of typing at the moment...

Detailed Description

Update Slot Characteristic

The Update Slot Characteristic is an optional characteristic of the DFUService. It allows the peer to select from an array of valid update slots offered by the GattServer.

An update slot is a contiguous memory address space where an update candidate binary can be written for validation and installation. The method of validation and installation is outside the scope of this service specification.

Typically, an update slot will be a contiguous region of non-volatile memory. This is not a requirement of the service, but the underlying memory must be mapped such that the update slot address range is contiguous (eg: using a ChainingBlockDevice to join disjoint regions of underlying flash memory together).

The number of slots supported is determined by the application but is limited to between 1 and 255 by the width of the slot characteristic data (uint8_t). A write request attempting to write an invalid slot index (ie: one that does not have a registered/valid backing BlockDevice) will be rejected with the application-defined response code: AUTH_CALLBACK_REPLY_ATTERR_APP_INVALID_SLOT_NUM

The Update Slot Characteristic may be omitted from the DFUService if the application only supports one update slot.

Attempting to write the Update Slot Characteristic while the application is busy processing buffered binary stream data may result in the write being rejected with the application-defined response code: AUTH_CALLBACK_REPLY_ATTERR_APP_BUSY

Implementation Considerations:

  • Upon a successful write to the Update Slot Characteristic, the currently-selected slot BlockDevice is deinitialized and the newly-selected slot BlockDevice is initialized.
  • A write to this characteristic while the binary stream buffer is not empty will flag the buffer to be flushed to the underlying BlockDevice. Writes to this characteristic will be rejected until the buffer is emptied and the flush flag is reset.
  • In delta mode, selecting a new slot WILL NOT result in the remaining data in the slot being written with copied application data. To accomplish this, the peer should write the offset characteristic to the point where data should be copied before changing slots.
  • Valid slots are intended to be empirically determined by the peer (as necessary) through write attempts. In most cases, valid slots are predetermined and known by both the client and server.

Offset Pointer Characteristic

The Offset Pointer Characteristic exposes the write offset pointer of the DFUService. The write offset pointer is the offset address of the underlying slot memory that serial binary data will be sequentially written to. It is incremented automatically when data is written to the Binary Stream Characteristic. The offset may be read by the client to validate the expected number of bytes have been processed by the DFUService. The offset may be written by the client to skip a region of memory or revisit a previously-written region of memory.

Similar to the Update Slot Characteristic, attempting to write the Offset Pointer Characteristic while the application is busy processing buffered binary stream data may result in the write being rejected with the application-defined response code: AUTH_CALLBACK_REPLY_ATTERR_APP_BUSY

Implementation Considerations:

  • A rejected write will initiate flushing the buffer to the selected slot block device.
  • Subsequent writes will be rejected until flushing is complete.
  • Any writes to the binary data stream characteristic while the buffer is being flushed will be ignored
  • If the delta bit is enabled, any memory sections skipped will be written with bytes copied from the primary application. This is not applicable for offset writes that result in the offset pointer moving backwards.

Binary Stream Characteristic

The Binary Stream Characteristic is the characteristic the peer will write to transfer update candidate binary data to the DFUService. Data written to the Binary Stream Characteristic will be subsequently written sequentially to the underlying slot BlockDevice. For each byte written to the Binary Stream Characteristic, the Offset Pointer Characteristic will be incremented.

If the peer has previously initiated a flush of the binary stream buffer, writes to this characteristic will be ignored until the flushing operation is complete. This is to prevent unintended writing after the peer has requested an offset change of slot change.

TODO - When are flush operations initiated? Offset write w/ buffered data, slot write w/ buffered data, DFU commit bit set in DFU ctrl.

Similarly, writes will be ignored while the flow control bit is set (indicating data stream should pause) in the DFU Control Characteristic.

Implementation Considerations:

  • The Binary Stream Characteristic uses write w/o response to improve speed of data transfer. Should we consider using write w/ response to synchronize with latency incurred with flash writes? Perhaps write w/o response could be optional depending on if the underlying DFUService buffers binary stream data or not.

Control Characteristic

The Control Characteristic is a set of bitflags, each with a specified meaning:

Bit 0: DFU_ENABLE_BIT. Setting this bit to 1 enables DFU mode. The application may reject a write attempting to set this bit if a DFU operation is currently disallowed. Upon a successful write, the application may use the change in value of this bit to trigger application-specific actions. Eg: perform various preparations such as erasing a flash region or indicate DFU mode to the device user in some way. Manually clearing the DFU_ENABLE_BIT may be interpreted by the application as a "DFU abort" event.

BIT 1: DFU_COMMIT_BIT. Setting this bit to 1 triggers a commit of the DFU operation. The actions taken by the application upon a DFU_COMMIT event are application-defined but may include (but not be limited to): Validating the update candidate binary, installing the update candidate binary, issuing a device reset to validate and install the update candidate. Upon writing the DFU_COMMIT_BIT to 1, the both the DFU_COMMIT and DFU_ENABLE bits will be cleared. This effectively ends the DFU session.

Bit 2: DELTA_MODE_EN_BIT. Setting this bit to 1 enables the delta mode feature of the DFU service. In some cases, an update candidate binary may have regions where the binary data is the same as the currently installed application. In this case, the duplicate program data does not need to be transferred to the DFUService. If this bit is set when a forward-moving offset write occurs, program data between the old offset and the new offset will be copied to the selected update candidate slot. The use of this feature requires prior knowledge of the installed application binary and the update candidate to be installed. Therefore, the GattClient application bears the burden of using the delta mode feature properly.

Bit 7: FLOW_CONTROL_PAUSE_BIT. This bit is read-only. If set, the client should pause writes to the Binary Stream Characteristic. Any writes to the Binary Stream Characteristic while this bit is set will be ignored.

TBD: other assigned bits. Perhaps this should be a 32-bit field so future additions are less limited?

The Control Characteristic may be written by the client or the server. The client is encouraged to subscribe to notifications/indications from this characteristic if write w/o response (ie: buffered mode) will be used. This is so the client can be notified when the flow control pause bit is set or cleared by the GattServer.

The application can reject requests to write this control characteristic as needed (eg: if DFU mode is not allowed at this time, the write to set DFU_ENABLE_BIT to 1 will be rejected). A write attempting to change any read only bits will be rejected by the DFUService automatically.

TODO define read only bits, define application ATT error for read only bits and DFU unavailable errors.

TBD use std::bitset?

Status Characteristic

The Status Characteristic is intended to report the status of the DFU operation. The DFUService does not require the DFU operation to be completed during the BLE connection facilitating the DFU transfer. ie: Validating and/or installing the transferred update candidate binary may or may not be performed during the server's current power cycle. It is application-defined if the update candidate binary is validated immediately after a DFU commit has been performed or if validation and installation is deferred to another software component (eg: secure bootloader).

In the latter case (deferred validation/installation), the success of the DFU operation may be obtained by reading the Status Characteristic during the next connection after the update has been applied. It may also be inferred if the GattServer implements the DeviceInformationService and exposes the current installed application version(s).

TBD: Status codes

Reviewers

@pan- @paul-szczepanek-arm

This is a WIP PR to open up the DFUService design process for comment. My goal is to have a somewhat generic update service that leaves most decisions up to the application.
Decisions such as:

  • Security requirements (ability to enable encryption, etc on characteristics to prevent interception of application binary over the air)
  • Update validation methods, if any
  • Update installation methods (delta updates, etc)

This is PR constitutes a draft of my first cut at designing such a DFU API.

I would appreciate your comments and feedback!

@AGlass0fMilk AGlass0fMilk added the WIP Work in progress label Nov 16, 2020
@AGlass0fMilk AGlass0fMilk changed the title Hashing out API of DFUService [WIP] DFUService Implementation Nov 16, 2020
@AGlass0fMilk
Copy link
Member Author

Need to define what happens when client attempts to write beyond the size of the underlying slot.

Perhaps this can be communicated using the status field?

In reality, this should never occur... the tools used to generate the binary will probably give errors long before any attempt to use the binary as an update.

Exceeding the address space of the slot can be considered a critical failure and cause the DFU procedure to be aborted. A new binary will have to be written in this case anyways.

@AGlass0fMilk
Copy link
Member Author

A buffer flushing operation is an internal implementation detail. To prevent the client from writing any binary stream data during a buffer flush operation, the server will set the flow control bit in the control characteristic.

@pan-
Copy link
Member

pan- commented Nov 17, 2020

@AGlass0fMilk That is excellent, thank you for starting the discussion on this service 👍 .

Version information

From a very high level view I was seeing the DFU process as follow:

DFU-high-level

It mostly matches what you are proposing at one exception, it is not possible to get information about the installed firmware. These informations are used by the client to decide if the device requires an update or not.
One option could be to use the firmware version from the Device Information Service but it doesn't cover all the cases: For example, in some application it is desirable to update the firmware of a peripheral. A clean solution (in my opinion) could be to have several DFU services one for the peripheral firmware and another for the device itself. The client should be able to access information about the firmware; this can be an optional characteristic, if not available, the client should read the current firmware version in the DIS.

Binary transfer

This could be a service of its own 😁 . I believe the size of the Binary Stream Characteristic should be equal to the size of the MTU to maximize speed. A flow control bit raised when the local device is busy or not available seems appropriate. Why did you choose to add it into the control characteristic rather than the status ?

How does the presented system deals with loss of write w/o response ? It seems that the Offset Pointer Characteristic can be used to that effect but how does it works in practice ? Is it up to the application ?

Control Characteristic

What is the benefit of having a bit set rather than specific commands such as:

  • Enable the update
  • Apply the image
  • Copy memory region (and move write pointer)

Ideally this should be extendable to support application specific control commands, selecting an update slot might be considered as such (but it is so common that it could be defined directly).

Implementation

I don't think the DFU service should be tied to Block device. I'd rather see the DFU service as forwarding events to a companion class that actually does the update. In events, I can think of:

  • Start the update requested
  • Data received
  • Cancel update requested
  • Select slot
  • Copy existing image region

Of course the class realizing the DFU should report things back into the DFU service for example report failures, report state changes, block transfer until buffer are free, etc...

@AGlass0fMilk
Copy link
Member Author

It mostly matches what you are proposing at one exception, it is not possible to get information about the installed firmware. These informations are used by the client to decide if the device requires an update or not.
One option could be to use the firmware version from the Device Information Service but it doesn't cover all the cases: For example, in some application it is desirable to update the firmware of a peripheral. A clean solution (in my opinion) could be to have several DFU services one for the peripheral firmware and another for the device itself. The client should be able to access information about the firmware; this can be an optional characteristic, if not available, the client should read the current firmware version in the DIS.

I see what you're saying. That could work. We could have an optional Firmware Revision String (0x2A26) characteristic in the DFU service. It is required if the information is not available through an existing Device Information Service elsewhere in the GATT Profile.

One unknown now:
How does the GATT Client identify which DFUService goes to which firmware? Should we have some sort of optional Firmware Description characteristic that is required if the above Firmware Revision String characteristic is available in the DFUService? What kind of value should this description store? Should it be a string? Or should we define an enumeration of firmware types (eg: "printer firmware", "mouse firmware", "accelerometer firmware", ...). I think I like the string idea better as it is more readable and flexible.

Binary transfer

This could be a service of its own 😁 .

I suppose it could be, how would you split this off from DFU? We would then be kind of specifying a partial "DFU Profile". We could potentially make the Binary Transfer Characteristic a reusable class in some way. I have a few other ideas along this line. The underlying code is very similar to my UARTService implementation and I feel there is enough commonality that the implementations could be refined into a single reusable class.

I believe the size of the Binary Stream Characteristic should be equal to the size of the MTU to maximize speed.

Agreed. That is currently what the service defaults to. It allows space for the maximum configured MTU:

/**
* Maximum length of data (in bytes) that the DFU service
* can receive at one time.
*
* Typically MTU - 3 bytes for overhead
*/
#ifndef MBED_CONF_CORDIO_DESIRED_ATT_MTU
#define BLE_DFU_SERVICE_MAX_DATA_LEN MBED_CONF_BLE_DFU_SERVICE_BUFFER_SIZE
#else
#define BLE_DFU_SERVICE_MAX_DATA_LEN (MBED_CONF_CORDIO_DESIRED_ATT_MTU - 3)
#endif

A flow control bit raised when the local device is busy or not available seems appropriate. Why did you choose to add it into the control characteristic rather than the status ?

So the control characteristic I was thinking of more like a register.

Originally, the flow control bit was its own characteristic. However, I like to try and keep the number of separate characteristics as low as reasonably possible. I've encountered issues in the past with having too many custom UUID characteristics (Old Nordic softdevice only allowed a certain number) and there is a noticeable delay incurred during service discovery as the number of characteristics goes up... of course this can be solved by using a faster connection interval.

My thoughts were:

The status characteristic is used for major events like a failure or success. There will usually only be one status update per DFU session (at the end). My thought was that this characteristic could be read by the client after a reboot to check the result of the update.

Somewhat of a tangent:

In my specific use case, I am using mcuboot to validate and install firmware updates. This allows me to reduce code size in my application by not adding the various crypto functions to perform digital signature verification and whatever else on the firmware update -- the bootloader takes care of it. This makes the status reporting more complicated because the BLE must be disconnected before the status will be available. I tried to capture this use case as well as the use case where the validation is performed by the application itself in this specification.

A failed update can be detected by checking the firmware version has been updated. If the update failed, the status characteristic will contain the most recent reason why.

How does the presented system deals with loss of write w/o response ? It seems that the Offset Pointer Characteristic can be used to that effect but how does it works in practice ? Is it up to the application ?

In my experience, write w/o response has significantly less latency when compared to a regular write (w/ response). This is mostly true for BLE stacks operating in a higher-level OS (Linux, Android, Mac, Windows, etc).

My thought is that for quick firmware updates, write w/o response can be sufficient as long as the application is okay with the occasional failed update. The application may or may not check the Offset Pointer Characteristic to ensure the data has been received and written properly.

I guess with a large enough MTU, using write w/ response can be fast as well. So perhaps we just get rid of the write w/o response option? What is your experience with this @pan- ?

Control Characteristic

What is the benefit of having a bit set rather than specific commands such as:

* Enable the update

* Apply the image

* Copy memory region (and move write pointer)

Ideally this should be extendable to support application specific control commands, selecting an update slot might be considered as such (but it is so common that it could be defined directly).

I like the idea of application-specific control commands.

Implementation

I don't think the DFU service should be tied to Block device.

Why not? I think it's a pretty beautiful and clean architecture. I'm a huge fan of the BlockDevice API.

I'd rather see the DFU service as forwarding events to a companion class that actually does the update. In events, I can think of:

* Start the update requested

* Data received

* Cancel update requested

* Select slot

* Copy existing image region

Of course the class realizing the DFU should report things back into the DFU service for example report failures, report state changes, block transfer until buffer are free, etc...

If you can expand on this use case it might help convince me not to tie this to BlockDevice 😄

@AGlass0fMilk
Copy link
Member Author

Regarding the block device conversation above: I thought about it a bit and I see why you would want to have some sort of DFU abstraction between the service and the memory which is operated on. It would be useful in the case of updating firmware on a remote device.

However, I also think this:

The BlockDevice API is well established and fairly clean. Perhaps it would be satisfactory to tie the DFUService implementation to BlockDevice. In the case where a DFU targets a remote device in the system, the developer must implement a 'RemoteBlockDevice' that forwards the calls to the remote device as if it were a local BlockDevice.

This prevents us from having to create another API for some sort of "DFU Manager" class.

Thoughts, @pan- ? Can you see a case where this is still insufficient?

@pan-
Copy link
Member

pan- commented Nov 17, 2020

Thanks for sharing your thoughts, that's it is useful for me to understand the reasoning behind design decisions!

How does the GATT Client identify which DFUService goes to which firmware?
Characteristic User Descriptors can be used to identify the firmware endpoint (local device, peripheral, ...). A string could be fine and we can let it to be defined by the application.

Here is how its done for Battery services (multiple battery service can be exposed by a single server):

When a device has more than one instance of the Battery service, each Battery Level characteristic shall include a Characteristic Presentation Format descriptor that has a namespace/description value that is unique for that instance of the Battery service.

There is not format for the namespace/description, it is up to the application. By default if a single service is on the server, the client should not care about it. (Not that even if I brought the subject, I don't think we should think too much about it as long as an option is available for improvement).

I guess with a large enough MTU, using write w/ response can be fast as well. So perhaps we just get rid of the write w/o response option? What is your experience with this @pan- ?

Regular write is not an option, the power of notification and write w/o response is that you can stack severals in a single connection event. In my latest experience, we used a rolling counter in the first byte to identify the fragment id. The server was sending a notification containing the expected fragment ID whenever an an unexpected fragment ID was received so the client can retry the transfer at the fragment indicated by the server. It worked well and the latency to detect an error was good (at most two connection events).

Why not? I think it's a pretty beautiful and clean architecture. I'm a huge fan of the BlockDevice API.

I'm thinking of testing and extendibility but at the same time, a block device implementation would solve 95% of the use cases out there 😅 . I'm probably overthinking it at this stage. If there's a need it should be easy to split BLE handling from the BD logic. There will probably be some DFU Manager like class from users if they integrate things such as image verification in the application but I really like the idea of the default use case where all of these operations are made in the bootloader and therefore not exposed in the base implementation of the DFU service!

@AGlass0fMilk
Copy link
Member Author

AGlass0fMilk commented Nov 18, 2020

Characteristic User Descriptors can be used to identify the firmware endpoint (local device, peripheral, ...). A string could be fine and we can let it to be defined by the application.

Here is how its done for Battery services (multiple battery service can be exposed by a single server):

When a device has more than one instance of the Battery service, each Battery Level characteristic shall include a Characteristic Presentation Format descriptor that has a namespace/description value that is unique for that instance of the Battery service.

There is not format for the namespace/description, it is up to the application. By default if a single service is on the server, the client should not care about it. (Not that even if I brought the subject, I don't think we should think too much about it as long as an option is available for improvement).

This seems like a good solution to me. I will add this to the final specification and add a string name parameter that must be provided when enabling the optional firmware version characteristic. I think having a unique Characteristic User Description Descriptor associated with the firmware version characteristic of a DFUService instance is sufficient. We do not need to complicate things by adding arbitrary namespace/description enums... unless you think this would be useful in some way.

I'm thinking of testing and extendibility but at the same time, a block device implementation would solve 95% of the use cases out there. I'm probably overthinking it at this stage. If there's a need it should be easy to split BLE handling from the BD logic. There will probably be some DFU Manager like class from users if they integrate things such as image verification in the application but I really like the idea of the default use case where all of these operations are made in the bootloader and therefore not exposed in the base implementation of the DFU service!

My goal of the DFUService is to simply act as a transport that covers the most use cases. Things like validation and installation can be triggered and monitored using the DFUService but need not be handled directly by the DFUService itself. After all, BLE is simply a transport for firmware updates. Adding anything more makes the service less reusable/extensible.

I think BlockDevice provides a suitable API for DFU. Most, if not all, DFU implementations aim to accomplish writing a new binary to memory as fast as possible. With the BlockDevice API, there are many possible configurations just with what Mbed OS provides, eg: writing to a RAM-based BlockDevice, joining multiple memory regions together, support for a number of underlying memory devices. If anything more advanced is needed, the user simply needs to implement the BlockDevice API for whatever device/transport they are using.

I think it's a nice abstraction for writing to memory -- which I think all DFU use cases require. @pan- I appreciate your feedback as I may not be able to see a use case where this implementation is not ideal (please let me know if you can think of them)!

If someone wants to implement a DFUService subclass that provides on-the-fly image verification, we could make the data handling function virtual so they can add whatever logic they want before calling the base class implementation (or not). eg: Override the original data handling function, add the incoming bytes to an ongoing SHA hash, then call the base class implementation to write the bytes to memory. Of course this may not be a great idea if packets are lost... SHA hashing should probably be done once the transfer is complete!

I have an immediate need for such a DFUService so I will be working on a first cut of this service as we develop the final specification for V1.

Let me know if you have any other thoughts!

@AGlass0fMilk
Copy link
Member Author

Forgot to respond to this above:

Regular write is not an option,

I figured it would be painfully slow.

the power of notification and write w/o response is that you can stack severals in a single connection event. In my latest experience, we used a rolling counter in the first byte to identify the fragment id. The server was sending a notification containing the expected fragment ID whenever an an unexpected fragment ID was received so the client can retry the transfer at the fragment indicated by the server. It worked well and the latency to detect an error was good (at most two connection events).

This is a clever idea. Maybe it would be possible to implement a similar "retry" mechanism using the offset pointer characteristic:

Perhaps we could make the offset pointer characteristic notifiable so that as the client writes data, it can be notified whenever data is successfully written. If they get out of sync, the client can return the pointer to a position that was known to be in-sync and continue writing from there... One caveat I see is that the write size of the underlying block device may not match (or be a multiple of) the MTU and so the client may not see 1-to-1 offset pointer advances each time it writes the binary stream characteristic...

I think for now I will implement your fragment ID method since it is simpler.

Side conversation @pan- :
I was going to make a separate issue for this but I figured I would just ask it here:

In my speed testing, I have been unable to successfully configure the MTU above 256 bytes. The build succeeds but all the characteristics become blank (corruption of the BLE memory?)

There is a comment in the cordio mbed_lib.json here:

        "rx-acl-buffer-size": {
            "help": "Size of the buffer holding the reassembled complete ACL packet. This will limit the effective ATT_MTU (to its value minus 4 bytes for the header). The size of the buffer must be small enough to be allocated from the existing cordio pool. If this value is increased you may need to adjust the memory pool.",
            "value": 70
        },

Highlighting this part due to extreme scrolling requirement: If this value is increased you may need to adjust the memory pool.

How does one go about adjusting the memory pool of the Cordio stack so the rx-acl-buffer-size can be increased to support an MTU of 512? This should be added to the configuration help string.

@AGlass0fMilk
Copy link
Member Author

So for our implementation here, I think we can use the existing status characteristic to accomplish the fragment ID error correction scheme you described.

If the highest bit of the 8-bit status characteristic value is set, the 7LSB will be the expected sequence ID that did not match what the client sent. The client should then restart transmission from that sequence ID. It rolls over at 127, which should be more than enough to detect errors and return the to appropriate spot in the binary.

@pan- What do you think?

@pan-
Copy link
Member

pan- commented Nov 19, 2020

Perhaps we could make the offset pointer characteristic notifiable so that as the client writes data, it can be notified whenever data is successfully written. If they get out of sync, the client can return the pointer to a position that was known to be in-sync and continue writing from there... One caveat I see is that the write size of the underlying block device may not match (or be a multiple of) the MTU and so the client may not see 1-to-1 offset pointer advances each time it writes the binary stream characteristic...

That's would be hard to get it right because the client does multiple write w/o request in a connection event and won't get an answer before the next connection event. It is much simpler to signal failures than acknowledge success.

If the highest bit of the 8-bit status characteristic value is set, the 7LSB will be the expected sequence ID that did not match what the client sent. The client should then restart transmission from that sequence ID. It rolls over at 127, which should be more than enough to detect errors and return the to appropriate spot in the binary.

Seems good.

How does one go about adjusting the memory pool of the Cordio stack so the rx-acl-buffer-size can be increased to support an MTU of 512? This should be added to the configuration help string.

Your right, the documentation should be amended, the adjustment depends on whether these buffers are rx or tx and how many do you want. The computation is here. Adjustment is (ACL config - default ACL config) * (rx count + tx count).

@AGlass0fMilk
Copy link
Member Author

I suggest we make it mandatory to have a Firmware Revision String Characteristic somewhere on the Gatt Server:

For the DFU Service, it is optional to have the FW Rev Characteristic if and only if the following are all true:

  1. There is only one instance of the DFU Service on the Gatt server and
  2. The appropriate Firmware Revision String Characteristic is available within an existing Device Information Service instance on the Gatt Server

If there is only one Firmware Revision String Characteristic available on the server, the Characteristic User Description descriptor becomes optional.

@pan-
Copy link
Member

pan- commented Nov 24, 2020

@AGlass0fMilk I must find time to review this service in depth but it looks very good. What I want to personally try is transfer performances and write on the internal flash to see how it affect BLE.

What do you think of the service controlling the connection parameters ? Reducing the connection interval should increase transfer speed while increasing slave latency could work around the problem that arise when the internal flash is written (block the MCU, a connection event might be missed).

The service could also negotiate the 2MPhy to be used as well as the MTU size. I see these QoL improvements being guard behind defines so it doesn't increase the footprint if it's not required.

What would be the starting point if we want to help you with this ? How do you test the service today ?


Side note: A service that allows a client to make the server send a connection parameter update request would be a useful addition, mobile platforms generally accepts new connection parameters but do not have options to tune it.

@AGlass0fMilk
Copy link
Member Author

@pan- Writing this as you commented 😁

I have created a very rough test script and Mbed application to test throughput. You can see the working repo here: https://github.com/AGlass0fMilk/mbed-ble-dfu-testing

Currently, the throughput isn't super great but it is acceptable given the poor BLE support of most desktop operating systems.

My laptop's chipset supports BLE 5 but the software apparently doesn't. So my testing is done with the default 1M Phy.

I am on Ubuntu 20.04 and using the python library bleak for cross-platform BLE. The BLE backend of bleak on Linux is through dbus, which I think is the bottleneck at this point. Inter-write delays have to be carefully tuned to avoid having dbus silently drop packets.

With my current setup, transferring 384 bytes at a time (MTU is set to 512 but I didn't feel like tuning the delays to work properly), I was able to reliably transfer a 512kB random binary (created with: dd if=/dev/urandom of=update.bin count=1024) to the QSPI on board an nRF52840DK in about 116 seconds. That's about 4.5 kB/s, which is pretty slow. Definitely can improve this number using a different client OS. Thankfully, 512kB is a reasonable update binary size for many targets.

It might even be worth it to make an Mbed DFU Client that simply copies a binary from on-board flash...

Ultimately it would be nice to make open source clients for the major platforms (Mac/Windows/Linux, iOS, Android)

Some timely reading that was just published by Punch Through: https://punchthrough.com/ble-throughput-part-4/

I read part 3 and was wondering how throughput could be improved by the 2M phy!

@AGlass0fMilk
Copy link
Member Author

@pan- Regarding how your team can help:

The DFUService code itself is okay, I haven't put much thought into mutexes and how to offload flash writes from the BLE process. Ideally it would be compatible with baremetal but it may complicate things if writing to flash can block BLE event processing for a long time.

That being said, I think the next steps would be:
1.) Create a test suite for the DFUService to test corner cases, out-of-order sequence numbers, flow control, etc.
2.) Create throughput testing
3.) Create clients for various operating systems (we need to prioritize which)

I think as part of the automated testing development (1) we can add a test case for throughput to accomplish (2) with minimal extra code.

(3) would come after the DFUService has been fully tested. The bleak client I have should work on Windows and Mac (probably will need a couple modifications) but I have not tested this.

The way I have been testing with the client and server applications mentioned above is:

  1. Build and flash the mbed-ble-dfu-testing app to an nRF52840DK
  2. Generate a random update binary of desired size: dd if=/dev/urandom of=update.bin count=1024 (size = 512 * count)
  3. Run the ble_dfu.py script
    1. I have been developing the script with python 3.9, so ideally you should install that and create a virtualenv
    2. Install bleak: pip install bleak
    3. Execute the script: python ble_dfu.py TestDFU update.bin -v
  4. Wait for binary to be transferred completely. Script will disconnect at the end of the process.
  5. Compare the transferred binary on the QSPI to the original
    • For this, I use nrfjprog to copy the QSPI to an intel hex file... unfortunately it copies the whole 8MB so it takes a while.
    • The entire command I use is: nrfjprog -f nrf52 --readqspi qspi2.hex && arm-none-eabi-objcopy -I ihex qspi2.hex -O binary qspi2.bin && dd if=qspi2.bin of=qspi2-512kb.bin count=1024 && cmp update.bin qspi2-512kb.bin
    • The above command copies the QSPI to an intel hex file, then converts it to a binary, then copies the first 512kB, then uses cmp to check it matches the original update.bin byte-for-byte.

@pan- Let me know what your thoughts are or if you need any more information about my testing process.

I have some projects that need BLE DFU now so I will be working with this "beta" version for now. I can help with the automated/throughput testing as well.

@pan-
Copy link
Member

pan- commented Nov 24, 2020

Thanks for sharing how you test the DFU service, that is really helpful for anyone that wants to be involved!

The BLE backend of bleak on Linux is through dbus, which I think is the bottleneck at this point. Inter-write delays have to be carefully tuned to avoid having dbus silently drop packets.

Performances also depends on the controller used, in our testing some had terrible performances outside of DBUS latency. In the end we were transferring block of data that to not roll over the fragment counter. Blocks were acknowledged with an indication upon full reception (there was some mechanism to set up the size of the transfer too).

I have some projects that need BLE DFU now so I will be working with this "beta" version for now. I can help with the automated/throughput testing as well.

I think it is the best to go with something and iterate but it would be better if services and characteristics are settled before you go in production 😅 .

1.) Create a test suite for the DFUService to test corner cases, out-of-order sequence numbers, flow control, etc.

Unit test usually shine in these area and it is easy to setup. Eventually a compilation of both unit test and integration test would be the best.

I will try for myself the code and test this week to provide more feedback.

Thanks George.


I realized that there is no way in mbed-os to enable the data length extension from the server. There is no calls to DmConnSetDataLen anywhere in the C++ API. That will be addressed by the connectivity team.

@AGlass0fMilk
Copy link
Member Author

My below response is related to your previous comments:

What do you think of the service controlling the connection parameters ? Reducing the connection interval should increase transfer speed while increasing slave latency could work around the problem that arise when the internal flash is written (block the MCU, a connection event might be missed).

The service could also negotiate the 2MPhy to be used as well as the MTU size. I see these QoL improvements being guard behind defines so it doesn't increase the footprint if it's not required.

I'm not sure if the service should control these things as it starts introducing many dependencies and centralizing a lot of functionality all in one place. Perhaps a better way to handle this would be through documentation and guides that detail best practices for DFU throughput and reliability. There must be an example application for this service once it is done and we should document any considerations when setting up connection parameters, etc.

Perhaps as an extension this could be added.

Side note: A service that allows a client to make the server send a connection parameter update request would be a useful addition, mobile platforms generally accepts new connection parameters but do not have options to tune it.

I agree, see below comments.

Performances also depends on the controller used, in our testing some had terrible performances outside of DBUS latency. In the end we were transferring block of data that to not roll over the fragment counter. Blocks were acknowledged with an indication upon full reception (there was some mechanism to set up the size of the transfer too).

I like to take advantage of Bluetooth's existing control mechanisms as much as possible. I think negotiating the MTU is acceptable for this.

However, there seem to be many customizations that are useful in some cases. I would like to design an extensible way to add these features in the future (or have users add their own) while still having some base compatibility. This would let us develop basic, maybe even universal, DFU clients. A client can try to enable an extension and the server can reject the write request if it does not support it.

@pan- What do you think the best way to design this into the DFUService would be? We could extend the control flags characteristic to 32-bits (or more) and reserve some for future use by the DFUService spec and some for use by application.

Incrementally we could add features like delta mode, fragment size negotiation, initiating a peripheral-negotiated connection parameter update requests, etc.

Perhaps we could instead have optional characteristics that provide these extensions and define these as we revise the DFUService specification? This would be the most "Bluetoothy" (if you will) way to do it in my opinion.

I think it is the best to go with something and iterate but it would be better if services and characteristics are settled before you go in production sweat_smile .

No problem, the projects I am talking about are prototypes/proof-of-concepts at this stage.

1.) Create a test suite for the DFUService to test corner cases, out-of-order sequence numbers, flow control, etc.

Unit test usually shine in these area and it is easy to setup. Eventually a compilation of both unit test and integration test would be the best.

@pan- Could you reply with an initial test plan that you're thinking of? I'd like some input and then I can add my thoughts and start the implementation.

I'm thinking we will go with what you suggested in another issue I raised -- use the BLE test suite by itself without using greentea and all that... ultimately simpler I suppose.

If you're feeling like it, you can start a boilerplate test app 😄

I will probably start working on this after the long holiday weekend here in the US. 🦃

I realized that there is no way in mbed-os to enable the data length extension from the server. There is no calls to DmConnSetDataLen anywhere in the C++ API. That will be addressed by the connectivity team.

I'm sure we will uncover more functions that are currently unavailable in the C++ API while going through this! It will be a good way to test (and subsequently expand) the capabilities of the current Cordio BLE API 😄

@AGlass0fMilk
Copy link
Member Author

I have been working with this a bit more this week and have performed updates using it with mcuboot. A few thoughts:

  1. Application extensions are not specified -- the user may create a non-standard DFU Service implementation by adding additional custom characteristics to the DFU Service.
  2. Delta mode should be removed from the standard control bits.
  3. Instead, we can implement our own "official" application extension characteristic. This can be used to specify Mbed-OS-specific DFU features such as delta mode, requesting a connection reconfiguration for throughput optimization, etc.
  4. @pan-, I see what you mean now about de-coupling the BlockDevice from this implementation. This would eliminate the problem of separating BLE event processing from processing queued binary data. Perhaps we can create some sort of DFUHandler class? This would let us use some design patterns (such as a decorator maybe) to encapsulate DFU extensions such as delta mode, etc. What do you think?
  5. I'm thinking we rename this to FOTAService to avoid confusion with Nordic's implementation 😄 (plus it's more fun to say)

@AGlass0fMilk
Copy link
Member Author

I also think we should introduce a read only bit in the control characteristic that tells the client whether a reboot (and thus a disconnection) is required to perform a DFU commit (ie: must reboot so bootloader can install/validate the update candidate).

* Maximum number of slots available
*/
#ifndef MBED_CONF_BLE_DFU_SERVICE_MAX_SLOTS
#define MBED_CONF_BLE_DFU_SERVICE_MAX_SLOTS 3

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config like that could be part of the mbed_lib.json which would provide the default values


protected:

DFUService& _dfu_svc;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't appear that storing this is needed for anything

* @note If the application does not explicitly reject the control request,
* the request will be accepted by default.
*/
void on_dfu_control_request(mbed::Callback<GattAuthCallbackReply_t(ControlChange&)> cb) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're trying to standardise on one callback model in BLE - the EventHandler model, I think it would work just as well in this case and have the benefit of uniformity with others.

/**
* Schedule a serialized call to process_buffer on the event queue.
*
* @note this function has no effect if a write has already been scheduled

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it should return an error then, otherwise it's a programming error and should be at least an assert

_dfu_control = 0;
_status = 0;
_flush_bin_buf = false;
_scheduled_write = 0;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's not 0 you should cancel the event

_scheduled_write = 0;
return;
}
uint8_t *temp_buf = new uint8_t[write_size];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the solution is not to use a circular buffer and just use a normal one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of making a sort of BlockDevice decorator class with a built-in buffer (BufferedBlockDevice is already taken 😖). This way, the data can be moved from the BLE stack buffers into the BlockDevice buffer and we can eliminate the need to allocate a separate buffer when writing the data.

@pan- The major decision I'm facing with this is: writing to flash can take a long time -- possibly long enough to cause timeouts for BLE. I want to come up with a way that allows the user to decide whether to process the buffered flash writes in separate thread or in the same context as the BLE (for baremetal users).

In my testing, I tried to process flash writes in the BLE thread but it seemed to cause disconnections/timeouts. Though maybe this was related to other issues, as realistically flash writes shouldn't take that long, probably less than the number of milliseconds required to even miss a connection event.

My idea for enabling both the multi-threaded and baremetal use cases is (assuming event queues are available in baremetal) to let the user provide the event queue on which to process flash writes. This way, they can pass in the existing event queue used for BLE events as well or a separate EventQueue that is dispatched on another thread.

Perhaps we also eliminate writing multiples of the BlockDevice's program size and simply queue writes of the program size. The drawback here is it will take more calls to empty the flash buffer (ie: longer overall write times). However, each write event will be shorter and prevent the flash writes from blocking the BLE stack for too long.

We could potentially make this configurable as well if there's any real performance difference between the two methods.

tr_err("programming memory error: %d", result);
set_status(DFU_STATE_FLASH_ERROR);
}
_current_offset += write_size;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if it failed to program?

}

_scheduled_write = 0;
schedule_write();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? you just drained it

int result = slot->program(temp_buf, _current_offset, write_size);
if(result) {
tr_err("programming memory error: %d", result);
set_status(DFU_STATE_FLASH_ERROR);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we continue even with this error?


/**
* TODO is there a better way to do this? Without dynamic allocation, we would
* have to program byte-by-byte. This would likely be a significant hit in speed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can just do it in a loop with the write size set to program size to avoid code duplication

@paul-szczepanek-arm
Copy link
Member

I have some security concerns here - I think you need to put checks on every data written over the air that allows writing to arbitrary memory locations

@AGlass0fMilk
Copy link
Member Author

I have some security concerns here - I think you need to put checks on every data written over the air that allows writing to arbitrary memory locations

The intent is that the application can define the security requirements. The application is responsible for defining the BlockDevice and thus the memory region that is allowed to be written over the air. We can also make a subclass that enables BLE security requirements like encryption, MITM, and signing.

I didn't want to encumber the base implementation of the DFUService with these things because in some cases it may only be used for development.

Putting the onus of security on the application makes this service more modular. Our (future) example projects can mention security best practices and show how to implement them.

Your other review comments are valid though -- there are definitely pointer bugs as it stands now. The actual memory-writing code needs a major rewrite.

This commit introduces a subclass of GattAttribute that implements the Bluetooth SIG defined descriptor, Characteristic User Description Descriptor defined in Bluetooth Core Specification 5.2, Volume 3, Part G, Section 3.3.3.2

It allows the user to provide a string that will be used as the descriptor's value.
@AGlass0fMilk
Copy link
Member Author

Closing as this has been reimplemented as the FOTAService. Work will be continued there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WIP Work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants