Skip to content

BLE API devirtualization #9727

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Mar 1, 2019
Merged

BLE API devirtualization #9727

merged 26 commits into from
Mar 1, 2019

Conversation

pan-
Copy link
Member

@pan- pan- commented Feb 14, 2019

Description

This submission replace runtime polymorphism by compile time polymorphism in BLE main APIs, generic and pal layer.

Removing virtual functions allows the linker to remove functions that are not used by application code. Here are some numbers for the BLE layer on NRF52840 compiled with GCC in release mode.

Application Original Devirtualized Gain
BLE_BatteryLevel 106447 78307 28140
BLE_Beacon 105975 75369 30606
BLE_Button 106443 78303 28140
BLE_GAP 107112 79456 27656
BLE_GAPButton 106673 76067 30606
BLE_GattClient 106523 81433 25090
BLE_GattServer 106089 77433 28656
BLE_HeartRate 106477 78335 28142
BLE_LED 106443 77491 28952
BLE_LEDBlinker 105800 81524 24276
BLE_PeriodicAdvertising 106794 81774 25020
BLE_Privacy 106162 83000 23162
BLE_SM 106202 83364 22838
BLE_Thermometer 106513 78373 28140
TestApplication 109151 101335 7816

Even in our test application that uses all functions defined by the BLE API, we save around 7% of code.

More importantly this construct is the foundation for efficient compile time configuration (With both optimisation, the beacon example shrink to ~30K of code).

Principle

Compile time polymorphism is achieved with the Curiously Recurring Template Pattern (CRTP). In a nutshell, every interface definition is now a class template that accepts its implementation as parameter. Interface APIs call the (hidden) implementation defined in the derived class. If the function is not defined in the derived class then it fallbacks to a stub defined in the interface.

template <class Impl>
struct Interface {
    void foo() {
        // call to the implementation of foo
        static_cast<Impl*>(this)->foo_();
    }

protected:
    // convention is to use the suffix _ to name an implementation 
    void foo_() {
        printf("foo() default implementation");
    }
};

class Implementation : public Interface<Implementation> {
protected:
    void foo_() {
        printf("foo() derived implementation");
    }
};


class Stub : public Interface<Stub> { };

Implementation imp; 
Stub stub; 

imp.foo(); // call Implementation::foo_
stub.foo(); // call Interface::foo_

Backward compatibility

User facing API prototypes are now templates defined in the namespace ble::interface:

  • ::Gap -> ble::interface::LegacyGap
  • ::ble::Gap -> ble::interface::Gap
  • ::GattClient -> ble::interface::GattClient
  • ::GattServer -> ble::interface::GattServer
  • ::SecurityManager -> ble::interface::SecurityManager

It is expected that an implementation of ble forward the final implementation types in a header named BleImplementationForward.h and in the namespace ble::impl:

  • ble::impl::LegacyGap -> ble::interface::LegacyGap<LegacyGapImpl>
  • ble::impl::Gap -> ble::interface::Gap<GapImpl>
  • ble::impl::GattClient -> ble::interface::GattClient<GattClientImpl>
  • ble::impl::GattServer -> ble::interface::GattServer<GattServerImpl>
  • ble::impl::SecurityManager -> ble::interface::SecurityManager<SecurityManagerImpl>

These final types are then imported into the global namespace to expose types similar to the one we've been using for years. Note that to not leak information from the implementation class it is important to export the interface and not the implementation:

namespace ble {
namespace impl {

// Correct
typedef ble::interface::SecurityManager<SecurityManagerImpl> SecurityManager;

// Incorrect
//typedef SecurityManagerImpl SecurityManager;

} // impl  
} // ble 
Instantiation

To prevent expensive instantiation of template types (and horrible inclusion pattern for the generic layer ... ), function body of the template interfaces are not defined in headers files. Instead these are defined in .tpp files. It is expected from an implementation to instantiate the types it export in a cpp file:

#include "ble/GattClient.h"
#include "source/GattClient.tpp"
#include "GattClientImplementation.h" // include vendor implementation. 

// instantiate GattClient 
template class ble::interface::GattClient<GattClientImplementation>;
Out of scope

The following items would improve this submission but are out of scope for now:

  • Devirtualization of ::BLEInstanceBase.
  • Extract non dependent types in interfaces.
TODO
  • Squash history

Pull request type

[ ] Fix
[X] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@kjbracey-arm @paul-szczepanek-arm @donatieng

@pan-
Copy link
Member Author

pan- commented Feb 14, 2019

@mattbrown015 You might be interested by this.

@ciarmcom
Copy link
Member

@pan-, thank you for your changes.
@donatieng @kjbracey-arm @paul-szczepanek-arm @ARMmbed/mbed-os-pan @ARMmbed/mbed-os-tools @ARMmbed/mbed-os-maintainers please review.

Copy link
Contributor

@bridadan bridadan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boy, that was an easy tools review!

Be sure to add this to the list of supported file types in the docs (you can follow my similar pr here: ARMmbed/mbed-os-5-docs#952).

@@ -413,6 +413,7 @@ def add_directory(
".h": FileType.HEADER,
".hh": FileType.HEADER,
".hpp": FileType.HEADER,
".tpp": FileType.HEADER,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember a day when header files ended with a .h, and we liked it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you ? Headers of the C++ library, a 35 years old language, don't have extensions. In C, X macros and friends have been traditionally put in files that don't end with .h too.
If the .h extension as not been standardised at the language level it is because other extensions predate standardisation and these use cases are legitimate.

Copy link
Contributor

@kjbracey kjbracey Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counter-pedant alert, and primed for counter-counter-pedantry.

Technically, the C++ standard doesn't say anything about file extensions. The standard headers are included via #include <atomic> or whatever in the C++ source, but they could perfectly well be stored on disc as atomic.hh or iso_atomic.hpp or just built in to the compiler. Mechanism could be totally separate to whatever the #include "xxx" does for user headers. (Similarly #include "myheader.h" might not actually find the file myheader.h. On RISC OS it would find h.myheader, with . being the directory separator).

I actually find it a bit annoying that vendors chose to drop suffixes from the C++ std filenames. Having no extension at all is quite unhelpful for most other tools.

Anyway, no strong feelings about the separate prefix here. I also prefer to have special prefixes for things that you only want #included with proper care, but it's up to tools folks to object. This one's a bit borderline, because nothing would actually break if randomly included, right? Would just slow down the build.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pan- There's PR to add .inc files, wouldnt this extension cover the use case for .tpp ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one's a bit borderline, because nothing would actually break if randomly included, right?

@kjbracey-arm There's no header guards .tpp files at the moment. With header guards nothing would break.

@0xc0170 I choose tpp as an extension because it is more descriptive than h or hpp (and avoid clashes with actual header files). Any extension can be use. What is expected in inc files ?

Copy link
Contributor

@0xc0170 0xc0170 Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we documented this - only extensions and purpose, without "what should be there".

inc are used currently for some autogenerated headers in TFM code.

It might be better if this extension addition is sent via separate PR where we can discuss this and alos update docs (see #9715)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a tools perspective, I have no preference on the extensions. We can adapt to what you need.

The .inc extension is being added because the TF-M work is importing an external project into Mbed OS and the external project use .inc files. So we're simplifying the import process for that team.

Copy link
Contributor

@sg- sg- Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does *.tpp work with other builders ie: MDK, DS-5, IAR, etc... or will this break our exporters? How is wrapping this in pragma once or ifdef... not good enough to keep the code base consistent? Apologies if this was already considered, the thread is quite long and I didn't read it all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tried for all exporters (but I can!). In this case, *.tpp files embed template class implementations and these are instantiated only once in a cpp file dedicated to the instantiation of the implementation type that contains all the required implementations.

These files can't be named .cpp as these are not compilation unit and cannot used the sole suffix .h as there already is a header containing the declaration so we would end up with two headers with the same name.

Suffix for template implementation really is just a convention as long as it respects the two requirements mentioned above. I started out with the suffix .impl.h and am happy to change the suffix to whatever is decided.


Note that it would be possible to include everything in header files (I tried) at the expense of weird include patterns, longer compilation time and random breakage requiring reordering if things get moved around. So instead I tried to keep template implementation files consistent with the way we write source implementation files (.cpp) beside the name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pan- it would be good to confirm the exporters are working as well as extend the tests for exporters to exercise the file type support to match what the os is capable of building. @bridadan

typedef generic::GenericSecurityManager<
pal::vendor::cordio::CordioSecurityManager,
vendor::cordio::SigningEventMonitor
> GenericSecurityManagerImpl;
Copy link
Member

@paul-szczepanek-arm paul-szczepanek-arm Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why both namespace "impl" and "Impl" suffix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some information in the description; basically there is implementation classes and implementation interfaces. In that case GenericSecurityManagerImpl is the implementation class. The interface imported in the global namespace will be ble::interface::SecurityManager<GenericSecurityManagerImpl>. We don't want to import GenericSecurityManagerImpl as it requires visibility of GenericSecurityManagerImpl and may export thing differently than ble::interface::SecurityManager (visibility of function can change as an example).

@paul-szczepanek-arm
Copy link
Member

paul-szczepanek-arm commented Feb 15, 2019

I can tell that the next request will be for release notes and SPDX identifiers

SecurityManager &sm = createBLEInstance()->getSecurityManager();
ble_error_t status = sm.getLinkEncryption(connection_handle, &encryption);
if (status == BLE_ERROR_NONE &&
(encryption == link_encryption_t::ENCRYPTED ||
encryption == link_encryption_t::ENCRYPTED_WITH_MITM ||
encryption == link_encryption_t::ENCRYPTED_WITH_SC_AND_MITM)
) {
cmd = GattClient::GATT_OP_WRITE_CMD;
cmd = Base::GATT_OP_WRITE_CMD;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

someone still hasn't configured their new IDE ;)

bool bondable,
bool mitm,
SecurityIOCapabilities_t iocaps,
const Passkey_t passkey,
const uint8_t* passkey,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve of the general concept. Makes sense as an optimisation on the assumption that you want to optimise for the "1 BLE implementation in the system" case.

Would presumably be a pessimisation with two implementations, but anyone putting two implementations in one image in these sorts of systems probably needs to rethink their ROM-saving approach. (Or am I wrong?)

Here are some numbers for the BLE layer on NRF52840 compiled with GCC in release mode.

This is a great chance to do some performance comparison - I'd love to see the same numbers for all the other toolchains. I believe GCC (and ARMC6?) stand to gain the most by this, as they can't eliminate unused virtuals. Presumably IAR and ARMC5 gain much less, as they should be eliminating unused virtuals.

I believe the same CRTP approach may be worthwhile on a smaller scale for a bunch of our HAL driver classes, with their virtual void lock() functions. Probably not many images are trying to simultaneously use SPI and UnlockedSPI : public SPI, and they certainly don't need run-time polymorphism.

(Although some key things do need it, like FileHandle or Socket, so we should still be looking at how to get compilers to optimise virtuals when actually required - eg spotting that UARTSerial is the only FileHandle in the build as LTO?).

@pan-
Copy link
Member Author

pan- commented Feb 15, 2019

Would presumably be a pessimisation with two implementations, but anyone putting two implementations in one image in these sorts of systems probably needs to rethink their ROM-saving approach. (Or am I wrong?)

If the two ports are not using the generic layer (and pal layer) then there is virtually no pessimisation as BLE interfaces are (almost) pure interfaces. If the two ports use the pal layer then it is possible to reuse the generic layer with a virtual pal without changing existing code:

// note: simplified for demonstration purposes 
template<class Pal>
struct GenericGattClient : ble::interface::GattClient<GenericGattClient<Pal> > { 
    GenericGattClient(Pal* pal)  { }
};  

template<class Impl> 
struct PalGattClient { 
    void foo() { 
         static_cast<Impl*>(this)->foo_();
    }
};

struct PalGattClientFoo : PalGattClient<PalGattClientFoo> { 
    void foo_() { }
};

struct PalGattClientBar : PalGattClient<PalGattClientBar> { 
    void foo_() { }
};

struct RuntimePalGattClient : PalGattClient<RuntimePalGattClient> { 
    virtual void foo_() = 0;
};

template<class Impl>
struct RuntimePalClientBridge : RuntimePalGattClient {
    RuntimePalClientBridge(Impl* impl) : _impl(impl) { }

    virtual void foo_() { 
        _impl->foo_();
    }

private:
    Impl* _impl;
};

namespace ble { 
namespace impl { 

typedef GenericGattClient<RuntimePalGattClient> GenericGattClientImpl;
typedef ble::interface::GattClient<GenericGattClientImpl> GattClient;

}
}

typedef ble::impl::GattClient GattClient;


class BleInstanceBaseFoo { 

GattClient &client() { 
    static PalGattClientFoo pal;
    static RuntimePalClientBridge<PalGattClientFoo> bridge(&pal);
    static ble::impl::GenericGattClientImpl generic(&bridge);
    return generic;
}

};

class BleInstanceBaseBar { 

GattClient &client() { 
    static PalGattClientBar pal;
    static RuntimePalClientBridge<PalGattClientBar> bridge(&pal);
    static ble::impl::GenericGattClientImpl generic(&bridge);
    return generic;
}

};

Note that in practice, two BLE radio are faaaar from being common (Unlike WiFi, I never seen any in real world product for BLE) and if you can afford two identical radio you can usually afford a device with a beefier flash.

Presumably IAR and ARMC5 gain much less, as they should be eliminating unused virtuals.

I haven't tried yet (but I will shortly 😄 ), however in my first try the implementation wasn't instantiated separately and the average gain from full inlining was ~4K. Effect of LTO where also reduced to noise (~1% gain).

@pan-
Copy link
Member Author

pan- commented Feb 15, 2019

@kjbracey-arm So I made some benchmark with the various compiler we use. Note that unlike the previous numbers exposed I removed the cordio stack as it is a fixed cost and unrelated to this PR.

Compiler Original (no stack) Devirtualized (no stack) Gain
GCC 55447 27307 28140
ARMC5 28926 25931 2995
IAR 28814 26347 2467
ARMC6 57897 28332 29565
IAR8 27510 25099 2411

There's several things to notice in this number:

  • Gains with ARMC5 and IAR are small but still about 10%. ARMC5 and IAR linker are able to rewrite vtables and remove unused virtual functions.
  • ARMC6 and GCC benefits the most from this optimisation: indeed their linker is not able to rewrite vtable and remove unused virtual functions.
  • The optimisation stabilise code size across compilers.

@0xc0170
Copy link
Contributor

0xc0170 commented Feb 18, 2019

@pan- This PR is targeting 5.12?

@pan-
Copy link
Member Author

pan- commented Feb 18, 2019

@0xc0170 I think that would be the best.

@0xc0170
Copy link
Contributor

0xc0170 commented Feb 20, 2019

This needs a rebase now

@pan-
Copy link
Member Author

pan- commented Feb 20, 2019

@0xc0170 I believe it will get (partially) squashed as well.

@pan-
Copy link
Member Author

pan- commented Feb 25, 2019

@0xc0170 I made a rebase and cleaned up the history as well.

@bulislaw
Copy link
Member

This PR is at risk of missing 5.12 release as it's marked as "needs: work". Code freeze is coming! On Friday 1st. Please made necessary updates ASAP and make sure the reviewers are aligned for prompt code inspection.

@cmonr
Copy link
Contributor

cmonr commented Feb 26, 2019

@0xc0170 I fixed the failing test. Would it be possible to relaunch tests ?

Not until IAR8 is in.

@cmonr
Copy link
Contributor

cmonr commented Feb 26, 2019

CI started

@mbed-ci
Copy link

mbed-ci commented Feb 27, 2019

Test run: SUCCESS

Summary: 12 of 12 test jobs passed
Build number : 2
Build artifacts

@cmonr
Copy link
Contributor

cmonr commented Feb 27, 2019

CI passed.

Reviewers, please review:
@0xc0170 @donatieng @kjbracey-arm @paul-szczepanek-arm

@0xc0170 0xc0170 requested a review from bulislaw February 27, 2019 09:52
Copy link
Contributor

@donatieng donatieng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good - very excited about seeing this one coming in. I think we should discuss how we could apply the same solution to other areas of the OS.

@paul-szczepanek-arm paul-szczepanek-arm mentioned this pull request Feb 27, 2019
@0xc0170
Copy link
Contributor

0xc0170 commented Feb 28, 2019

CI started

@mbed-ci
Copy link

mbed-ci commented Feb 28, 2019

Test run: FAILED

Summary: 1 of 13 test jobs failed
Build number : 3
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_exporter

@alekla01
Copy link
Contributor

Ci restarted (iar8 exporter issue we will resolve separately).

@cmonr cmonr merged commit aaf3ce4 into ARMmbed:master Mar 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.