Split type objects #4903

jepler · 2021-06-19T20:32:00Z

The principle is that the "less frequently used" fields of a type object are placed in a flexible array member; this flexible array either has 0 (hopefully the usual case) or 1 member. The new flag indicates whether the flexible portion of the structure exists.

Fields that reside there (plus 2 that I decided NOT to move after starting) are now hidden behind accessors in most cases; where the verification had already been performed somewhere else (you could tell because there was no NULL-pointer checking), it is accessed directly via the flexible member ext[0].

I selected an arbitrary split based on estimates of how frequently each field was used... [I had a grep, but I lost it]

This gives back 1784 bytes of firmware size on trinket_m0.

The unix port also builds and passes basic, but many other builds will have problems because their type objects need to be converted.

We would need a plan for how ulab could be compatible with both CP and MP since it both needs to declare types with the extended fields, and also query an optional field.

I know @dhalbert has suggested a change like this, but I couldn't find the issue to tie it to. Credit goes to him for the concept.

jepler · 2021-06-20T01:29:42Z

Here's how I estimated which fields were used most. The sed takes the ranges of lines within an mp_obj_type_t definition, and the awk prints the first field of each. sort | uniq -c counts unique instances of each one, then sort -n puts them in numeric order.

Here I've manually removed the lines which are just noise, so your local results will differ.

$ git grep -l 'mp_obj_type_t .*= {'  | xargs sed -n '/mp_obj_type_t.* = {/,/;/p' | awk '{print $1}' | sort | uniq -c | sort -n
      4 .base
      7 .parent
     11 .buffer_p
     17 .call
     17 .flags
     23 .binary_op
     24 .attr
     24 .subscr
     32 .protocol
     34 .iternext
     49 .unary_op
     54 .getiter
     79 .print
    147 .make_new
    170 .locals_dict
    227 .name

This also doesn't count exception types or other types declared via macros; dan had noted that a LOT of types are exceptions.

It's possible that setting the split differently could improve space a little bit. I left attr in the basic struct so that the attribute reference path was not complicated by additional checks; I think my rationale around .parent was similar, though .parent is probably not as performance-critical.

jepler · 2021-06-20T15:42:17Z

OK, here's one way to work towards compatibility:

Change the declaration of an extended slot to look like

 const mp_obj_type_t mp_type_bytes = {
     { &mp_type_type },
     .flags = MP_TYPE_FLAG_FULL,
     .name = MP_QSTR_bytes,
     .print = str_print,
     .make_new = bytes_make_new,
     .locals_dict = (mp_obj_dict_t *)&str8_locals_dict,
     .ext[0].binary_op = mp_obj_str_binary_op,
     .ext[0].subscr = bytes_subscr,
     .ext[0].getiter = mp_obj_new_bytes_iterator,
     .ext[0].buffer_p = { .get_buffer = mp_obj_str_get_buffer },
 };

which is just the same to the compiler

Now, introduce a macro for filling the slot:
```
#define BINARY_OP_SLOT(x) .ext[0].binary_op = x
```
An implementation that doesn't want/have split types can just #define BINARY_OP_SLOT(x) binary_op = x instead
MP_TYPE_FLAG_FULL of course would be defined to expand to 0 when the implementation doesn't support split types
Of course, in this case the accessors would be provided as inlines or macros that just return the field unconditionally

In this case, the namedtuple way of defining the extended fields would still be different ... So the other alternative would be to provide a new type mp_obj_fulltype_t which has all the fields directly in them, and when there is no split they would simply be the same types via typedef. Maybe that's preferable. (you still need the flag bit of course, to get RTTI-lite). Maybe mp_obj_shorttype_t and MP_TYPE_FLAG_SHORT would be better for allowing piecemeal migration.

I also think we probably don't want to carry this as a delta from micropython if we can avoid it, so working on adding this upstream first might be a better idea. This code can still serve as a good proof of concept, though.

dhalbert · 2021-06-21T12:23:56Z

I know @dhalbert has suggested a change like this, but I couldn't find the issue to tie it to. Credit goes to him for the concept.

@tannewt has been talking for years about shrinking these type objects. I'm not sure I actually ever mentioned the technique you are using. In days of old I saw complicated union-like types when passing arguments with varying parameters, and I mentioned that (lotsa flag bits to indicate what was being passed) but not the split technique with pointers you have adopted.

I misunderstood, it's not pointers, just an optional part. I think I did suggest that orally at one point. 😄

tannewt · 2021-06-21T23:16:57Z

@dpgeorge @jimmo any thoughts?

dpgeorge · 2021-06-21T23:30:20Z

Yeah, nice! I had this idea for a long time as well, and is part of the reason for adding the .flags member to mp_obj_type_t.

I wanted to do a full analysis of how frequently certain slots were used, and then write some code to search for the optimal separation into "standard" and "full" types. I also thought about having multiple flags, or even a count, to say how many extended slots there were. Eg sort the slots on how frequently they are used and then the count says how many are supported, roughly something like this:

#define BINARY_OP_THRESHOLD (4)
#define mp_type_get_binary_op(t) ((t->flags & 0xff) > BINARY_OP_THRESHOLD ? t->ext->binary_op : NULL)

jepler · 2021-06-23T02:39:10Z

To support fully variable (not just 2 sizes), we can set aside some bits in type->flags as suggested. It would be nice if we could

allow builds that always make all type objects full size (binary size vs performance trade-off)
have some freedom to re-order fields, or even move them between the mandatory and optional parts of the type object (with optional→mandatory being easy and →optional requiring more work)

Anyhow, I think the below design does this.

One critical part of this is a good way to find the right thing to put in flags. If there were a "compile-time variadic max macro", that would be great, you'd just write .flags = MAGIC_MAX(MP_TYPE_FIELD1, MP_TYPE_FIELD5, MP_TYPE_FIELD2). Or we could provide MAX2 .. MAXN macros where N was the total number of variadic fields.

The next best thing I found was to use __builtin_clz() which seems to be a compile-time constant all the way back to the oldest gcc on godbolt.org, 4.1.2. So then you just let .flags = 32-__builtin_clz(MP_BIT_FIELD1 | MP_BIT_FIELD5 | MP_BIT_FIELD2) (and .flags gets 5 since the first (highest) set bit is 5). There's a slight complication when the bits are all 0, but this can be hidden in a macro such as MP_TYPE_FLAG_EXT. This also breaks down when you pass 32 optional fields, but that's nonsense.

The end of mp_obj_type_t becomes a flexible array void *ext[];.

Which, I think, gets you to a declaration that looks like this, assuming the last 4 fields are among the optional ones, and the rest are always present:

 const mp_obj_type_t mp_type_bytes = {
     { &mp_type_type },
     .flags = MP_TYPE_FLAG_EXT(MP_BIT_BINARY_OP | MP_BIT_SUBSCR | MP_BIT_GETITER | MP_BIT_BUFFER_P),
     .name = MP_QSTR_bytes,
     .print = str_print,
     .make_new = bytes_make_new,
     .locals_dict = (mp_obj_dict_t *)&str8_locals_dict,
     MP_FIELD_BINARY_OP = mp_obj_str_binary_op,
     MP_FIELD_SUBSCR = bytes_subscr,
     MP_FIELD_GETITER = mp_obj_new_bytes_iterator,
     MP_FIELD_GETBUFFER = mp_obj_str_get_buffer,
 };

One thing that is unfortunately missing is automatic checking between the MP_FIELD and MP_BITs. I haven't run across any cunning ways to check that. (something that goes around parsing elf files? ugh, well, you could probably do that)

I think there are probably better names for things but I wanted to get these ideas written down without worrying too much about the perfect names.

jimmo · 2021-06-23T03:22:48Z

Very cool, and good to know that clz can be resolved at compile time on const inputs. There's a significant win here for ROM so keen to investigate, and would be great to ensure that CPy and MPy take the same approach.

Not sure if useful, but another approach I had considered for fully variable is to add an indirection via an index table (i.e. all types would pay for a fixed N-byte table, but then only for the slots they use, where N ~= 12). Sketch below:

#define SLOT_MAKE_NEW (0)
...
#define SLOT_BINARY_OP (4)
...
#define SLOT_MAX_SLOT   (8)

typedef struct {
  // the regular stuff (base, flags, name, etc)
  uint8_t slot_idx[MAX_SLOT];
  void* slots[];
} obj_type_t;

#define TYPE_MAKE_NEW(t) ((mp_make_new_fun_t)t->slots[0])
#define TYPE_BINARY_OP(t) ((mp_binary_op_fun_t)(t->slot_idx[SLOT_BINARY_OP] == 0 ? NULL : t->slots[t->slot_idx[SLOT_BINARY_OP]]))

const obj_type_t my_obj = {
  .name = ... etc,
  .slot_idx = { [SLOT_MAKE_NEW] = 0, [SLOT_BINARY_OP] = 1},
  .slots = { &my_type_make_new, &my_type_binary_op }
};

There's a bit of a subtle trick there where slot zero must be make_new in order to make the comparison to zero work (but almost all types have a make_new). Alternatively you could always require that slot zero is NULL, in which case no comparison is required and all the accessors become simply:

#define TYPE_FOO_OP(t) ((mp_foo_op_fun_t)(t->slots[t->slot_idx[SLOT_FOO_OP]]))

I don't have a good feel for how this will compare performance-wise with the bitwise ops to extract the relevant info out of flags. My suspicion is that just finding a good cutoff point between "short" and "full" is probably sufficient.

dpgeorge · 2021-06-23T09:23:14Z

It could be easiest to decide on a small (say 5-10) set of allowed variations of mp_obj_type_t and then provide macros to initialise types. Eg:

// Only with print, make_new, and a locals dict.
MP_DEFINE_TYPE_V1(
    mp_type_simple,
    {& my_type_type },
    MP_QSTR_simple,
    simple_print,
    simple_make_new,
    (mp_obj_dict_t *)&simple_locals_dict,
);

// Lots but not all slots filled.
MP_DEFINE_TYPE_V4(
    mp_type_bytes,
    { &mp_type_type },
    MP_QSTR_bytes,
    str_print,
    bytes_make_new,
    (mp_obj_dict_t *)&str8_locals_dict,
    NULL, // unary_op not implemented for this type
    mp_obj_str_binary_op,
    bytes_subscr,
    mp_obj_new_bytes_iterator,
    mp_obj_str_get_buffer
);

tannewt · 2021-06-23T17:13:33Z

My preference would be to have as few options as possible to reduce complexity. clz is a neat trick but I'd like it to be simple to describe how it works. Simple is also easier to implement.

dhalbert · 2021-07-02T12:07:23Z

@jepler How are you feeling about the imminence of some version of this? Even a simple version frees up a lot of flash we could use at the moment.

jepler · 2021-07-03T14:58:32Z

This (& my calculator project) are my main targets for next week!

jepler · 2021-07-07T00:16:10Z

There's a difference in the level of support for static initialization of flexible array members in gcc vs clang.

typedef struct flex {
   int i;
   struct {
       int a, b, c;
   } j[];
} flex_t;

flex_t f1 = {.i = 3};
flex_t f2 = {.i = 4, .j[0].a = 1, };
flex_t f3 = {.i = 4, .j = {{.a = 1,}}, };

Styles f1 and f3 are supported in gcc and clang; style f2 is supported in gcc but not in clang. Therefore, we must select style f3 when extended fields are being initialized.

dpgeorge · 2021-07-07T01:44:18Z

There's a difference in the level of support for static initialization of flexible array members in gcc vs clang.

Will this work on older compilers (eg older gcc, v4.8)? If it doesn't then we'll need a compile-time option to disable this feature.

jepler · 2021-07-07T13:19:27Z

https://godbolt.org/z/h8coM6MPv

This is set up to test it on current gcc and clang, as well as gcc 4.1.2, the oldest available. Not in that link but I also pushed clang back to 3.0.0, the oldest available, and it accepted the trick too.

The "f3" style being the only one supported means I only see how to enable a single static split, none of the more sophisticated things we conjectured about.

By comparing the address of the initial 'name' field instead of the addresses of the objects themselves, a small amount of type safety is added back, vs just casting to void. In the event that some other kind of object is passed in as 't', which happens to have a 'name' field of the right type, the construct would be (undesirably) accepted but it would almost certainly evaluate to false at runtime.

This appears to work with clang versions at least since 3.0.

.. These files can be created if the developer runs "./build.sh" or "./build-cp.sh" inside extmod/ulab.

v923z · 2021-07-07T19:01:12Z

We would need a plan for how ulab could be compatible with both CP and MP since it both needs to declare types with the extended fields, and also query an optional field.

I think it would be really great, if this proposal ended up in micropython proper. However, I just want to re-iterate my position with regards to ulab: if micropython needs more time, this fact shouldn't block development on the circuitpython side, and I am more than happy to accommodate temporary patches, if by doing so we can guarantee compatibility over the whole spectrum.

jepler · 2021-07-07T20:33:41Z

Our ultimate plan will be to offer this upstream to micropython!

jepler · 2021-07-12T12:15:35Z

Is there an intention to add a compile-time config macro for this feature, so that ports that have the space can opt out of this to improve performance?

In CircuitPython, I think we decided to always use this size savings. However, the preprocessor CAN disable this feature, with macros something like

#if !defined(MP_TYPE_FLAG_EXTENDED)
#define MP_TYPE_CALL call
#define mp_type_get_call_slot(t) t->call
#define MP_TYPE_FLAG_EXTENDED (0)
#define MP_TYPE_EXTENDED_FIELDS(...) __VA_ARGS__
#endif

which will be used in ulab for compatibility with old circuitpython, micropython, and circuitpython with this change.

jepler · 2021-07-12T12:25:15Z

Size savings by board, sorted by savings:

The smallest savings is 1340 bytes, on fomu. The biggest change of 3472 bytes was on electronut_labs_blip. The average is 2554 and the median is 2792.

jepler · 2021-07-12T13:44:46Z

Here's my own stab at a script to analyze slot usage: https://gist.github.com/jepler/5cb392d43c16845645b49296c27e0850
.. it works on an elf file

The resulting spreadsheet for trinket_m0:
https://docs.google.com/spreadsheets/d/1zcEAkoOfHaMzsCcB9Z1x8MyClU_jt_xDHfFMPpE-rbs/edit?usp=sharing

A further iteration tried all the combinations and found that my split was best, again for trinket_m0:
https://gist.github.com/d480e10a509c0af8de69bdb247df9f4a

Furthermore, the data savings was estimated at 1952 bytes, vs a measured flash usage change of 1784 bytes so code growth must have been on the order of 168 bytes.

dpgeorge · 2021-07-12T13:53:38Z

Using @jimmo 's spreadsheet I found a slightly better split (based on PYBv1.1) that puts the protocol slot in the base set. That saves an extra 92 bytes (on PYBv1.1 at least).

Using a 3-way split (base + ext1 + ext2) of the following form:

base = (print, make_new, attr, parent, locals_dict)
ext1 = (call, unary_op, getiter, iternext, protocol)
ext2 = (binary_op, subscr, buffer_p)

saves an extra 672 bytes compared to @jepler 's scheme (on PYBv1.1).

A 4-way split can do a little better again, but it's diminishing returns (saves 108 bytes more).

jepler · 2021-07-12T14:36:34Z

To re-cap some estimated size savings, now shifting to pybv11 (I think all these figures were from pyb types):

est. savings, exclusive of code size increase:

scheme	savings
perfect	5216
jimmo-index-table	3284
damien-4tier	3284
damien-3tier	3176
damien-2tier	2596
jepler-2tier	2504

jimmo · 2021-07-12T14:49:33Z

I did some experiments on the generated code for the different approaches.

I have three branches that add an extra slot to mp_obj_type_t in three different ways:

https://github.com/jimmo/micropython/tree/obj-slots-original -- this just adds a regular slot (baseline)
https://github.com/jimmo/micropython/tree/obj-slots-split -- adds a split type (but the extended type only has this new slot)
https://github.com/jimmo/micropython/tree/obj-slots-index -- index

They all add a void mp_do_extra(mp_obj_t t); to runtime.h that invokes the function in this slot, and defines a function in this slot on list.

Here's the baseline disassembly:

0802c748 <mp_do_extra>:
 802c748:       b510            push    {r4, lr}
 802c74a:       f000 fd5d       bl      802d208 <mp_obj_get_type>
 802c74e:       6bc3            ldr     r3, [r0, #60]   ; 0x3c
 802c750:       b113            cbz     r3, 802c758 <mp_do_extra+0x10>
 802c752:       e8bd 4010       ldmia.w sp!, {r4, lr}
 802c756:       4718            bx      r3
 802c758:       bd10            pop     {r4, pc}

And for the split (note extra function to extract the field)

0802c748 <mp_do_extra>:
 802c748:       b510            push    {r4, lr}
 802c74a:       f000 fd5d       bl      802d208 <mp_obj_get_type>
 802c74e:       f001 f876       bl      802d83e <mp_type_getextra>
 802c752:       b110            cbz     r0, 802c75a <mp_do_extra+0x12>
 802c754:       e8bd 4010       ldmia.w sp!, {r4, lr}
 802c758:       4700            bx      r0
 802c75a:       bd10            pop     {r4, pc}

0802d83e <mp_type_getextra>:
 802d83e:       8883            ldrh    r3, [r0, #4]
 802d840:       f013 0380       ands.w  r3, r3, #128    ; 0x80
 802d844:       bf14            ite     ne
 802d846:       6bc0            ldrne   r0, [r0, #60]   ; 0x3c
 802d848:       4618            moveq   r0, r3
 802d84a:       4770            bx      lr

And for the index:

0802c748 <mp_do_extra>:
 802c748:       b510            push    {r4, lr}
 802c74a:       f000 fd61       bl      802d210 <mp_obj_get_type>
 802c74e:       f890 303d       ldrb.w  r3, [r0, #61]   ; 0x3d
 802c752:       b12b            cbz     r3, 802c760 <mp_do_extra+0x18>
 802c754:       3310            adds    r3, #16
 802c756:       e8bd 4010       ldmia.w sp!, {r4, lr}
 802c75a:       f850 3023       ldr.w   r3, [r0, r3, lsl #2]
 802c75e:       4718            bx      r3
 802c760:       bd10            pop     {r4, pc}

The compiler does a very nice job of that! It also gets an added benefit for the fast path that it doesn't have to do the second check (if the slot index is zero, it doesn't have to check if the slot itself is NULL). (Edit: updated the disassembly, there was an error in mp_do_extra).

jepler · 2021-07-12T14:55:20Z

As long as there are at most 15 optional slots, can you use bitfields:4 instead of uint8_t for the slot indices?

jimmo · 2021-07-12T14:58:20Z

As long as there are at most 15 optional slots, can you use bitfields:4 instead of uint8_t for the slot indices?

I think so! I do wonder if you end up doing a lot more work to extract the bits out for a relatively small saving, but I will test it out.

jepler · 2021-07-12T14:59:35Z

Using @jimmo 's spreadsheet I found a slightly better split (based on PYBv1.1) that puts the protocol slot in the base set. That saves an extra 92 bytes (on PYBv1.1 at least).

This is different than trinket_m0; I think we use protocols less heavily than you.

Is there a better way for "3 tier" to look than

STATIC const mp_obj_type_t uctypes_struct_type = {
    { &mp_type_type },
    .flags = MP_TYPE_FLAG_EXTENDED2 /* (implies EXTENDED1) */ ,
    .name = MP_QSTR_struct,
    .print = uctypes_struct_print,
    .make_new = uctypes_struct_make_new,
    .attr = uctypes_struct_attr,
    MP_TYPE_EXTENDED_FIELDS1(
        .subscr = uctypes_struct_subscr,
        .unary_op = uctypes_struct_unary_op,
        MP_TYPE_EXTENDED_FIELDS2(
            .buffer_p = { .get_buffer = uctypes_get_buffer },
        ))
};

jimmo · 2021-07-12T15:09:50Z

As long as there are at most 15 optional slots, can you use bitfields:4 instead of uint8_t for the slot indices?

I think so! I do wonder if you end up doing a lot more work to extract the bits out for a relatively small saving, but I will test it out.

Here's the disassembly for this approach (see code at https://github.com/jimmo/micropython/tree/obj-slots-index-bits)

0802c748 <mp_do_extra>:
 802c748:       b510            push    {r4, lr}
 802c74a:       f000 fd65       bl      802d218 <mp_obj_get_type>
 802c74e:       f890 303c       ldrb.w  r3, [r0, #60]   ; 0x3c
 802c752:       f013 0ff0       tst.w   r3, #240        ; 0xf0
 802c756:       d008            beq.n   802c76a <mp_do_extra+0x22>
 802c758:       f343 1303       sbfx    r3, r3, #4, #4
 802c75c:       b25b            sxtb    r3, r3
 802c75e:       3310            adds    r3, #16
 802c760:       e8bd 4010       ldmia.w sp!, {r4, lr}
 802c764:       f850 3023       ldr.w   r3, [r0, r3, lsl #2]
 802c768:       4718            bx      r3
 802c76a:       bd10            pop     {r4, pc}

Note there was an error in the disassembly I posted for the index approach in the earlier comment. I have updated that comment and the branch.

jimmo · 2021-07-12T15:22:35Z

Here's the corresponding disassembly for M0 (rp2040)

Baseline:

100178e4 <mp_do_extra>:
100178e4:       b510            push    {r4, lr}
100178e6:       f7f5 f99f       bl      1000cc28 <mp_obj_get_type>
100178ea:       6bc3            ldr     r3, [r0, #60]   ; 0x3c
100178ec:       2b00            cmp     r3, #0
100178ee:       d000            beq.n   100178f2 <mp_do_extra+0xe>
100178f0:       4798            blx     r3
100178f2:       bd10            pop     {r4, pc}

Split

100178f4 <mp_do_extra>:
100178f4:       b510            push    {r4, lr}
100178f6:       f7f5 f997       bl      1000cc28 <mp_obj_get_type>
100178fa:       f7f5 fcb9       bl      1000d270 <mp_type_getextra>
100178fe:       2800            cmp     r0, #0
10017900:       d000            beq.n   10017904 <mp_do_extra+0x10>
10017902:       4780            blx     r0
10017904:       bd10            pop     {r4, pc}

1000d270 <mp_type_getextra>:
1000d270:       8881            ldrh    r1, [r0, #4]
1000d272:       0003            movs    r3, r0
1000d274:       2280            movs    r2, #128        ; 0x80
1000d276:       0008            movs    r0, r1
1000d278:       4010            ands    r0, r2
1000d27a:       4211            tst     r1, r2
1000d27c:       d000            beq.n   1000d280 <mp_type_getextra+0x10>
1000d27e:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
1000d280:       4770            bx      lr

Index

100178e4 <mp_do_extra>:
100178e4:       b510            push    {r4, lr}
100178e6:       f7f5 f99f       bl      1000cc28 <mp_obj_get_type>
100178ea:       0003            movs    r3, r0
100178ec:       333d            adds    r3, #61 ; 0x3d
100178ee:       781b            ldrb    r3, [r3, #0]
100178f0:       2b00            cmp     r3, #0
100178f2:       d003            beq.n   100178fc <mp_do_extra+0x18>
100178f4:       3310            adds    r3, #16
100178f6:       009b            lsls    r3, r3, #2
100178f8:       581b            ldr     r3, [r3, r0]
100178fa:       4798            blx     r3
100178fc:       bd10            pop     {r4, pc}

Index-with-bitfield

100178ec <mp_do_extra>:
100178ec:       b510            push    {r4, lr}
100178ee:       f7f5 f99b       bl      1000cc28 <mp_obj_get_type>
100178f2:       0003            movs    r3, r0
100178f4:       333c            adds    r3, #60 ; 0x3c
100178f6:       781b            ldrb    r3, [r3, #0]
100178f8:       220f            movs    r2, #15
100178fa:       0019            movs    r1, r3
100178fc:       4391            bics    r1, r2
100178fe:       d005            beq.n   1001790c <mp_do_extra+0x20>
10017900:       b25b            sxtb    r3, r3
10017902:       111b            asrs    r3, r3, #4
10017904:       3310            adds    r3, #16
10017906:       009b            lsls    r3, r3, #2
10017908:       581b            ldr     r3, [r3, r0]
1001790a:       4798            blx     r3
1001790c:       bd10            pop     {r4, pc}

jepler · 2021-07-12T15:31:54Z

Code size, per access of type object field (@jimmo's code, counted manually by me)

style	m4	m0
baseline	18	16
2-tier	34	36
uint8_t index table	26	26
bitfield index table	36	34

jepler · 2021-07-12T16:27:39Z

Looking again at 3-tier, based on trinket_m0 types, I found the best saved 2152 bytes, or an additional 200 bytes compared to 1952 with the two-tier system.

My optimal 3-tier was:

tier	fields
base	(base flags name) print make_new attr parent
first	getiter iternext protocol locals_dict
second	call unary_op binary_op subscr buffer_p

code (runs for quite awhile) & output (just the few lines at top and bottom) in gist: https://gist.github.com/254a06c86fa956bcc3cd87fb0940ce97

The code's a bit of a mess so I am not super-confident in the results.

Another iteration of the script computed these statistics & estimates:

# types 108
# filled slots 364
Slot print#0 filled 63 times
Slot make_new#1 filled 72 times
Slot call#2 filled 10 times
Slot unary_op#3 filled 25 times
Slot binary_op#4 filled 14 times
Slot attr#5 filled 37 times
Slot subscr#6 filled 11 times
Slot getiter#7 filled 30 times
Slot iternext#8 filled 18 times
Slot bufer_p#9 filled 5 times
Slot protocol#10 filled 8 times
Slot parent#11 filled 30 times
Slot locals_dict#12 filled 41 times
optimum 2320
slotted 2752

considering that the size of the best 2-tier system was 4156 and the original is 6480, jimmo's slotting idea looks real tasty for our most constrained boards.

jepler · 2021-07-12T19:56:58Z

@tannewt @dhalbert I did the suggested re-namings. As discussed in the meetings I'd like to merge this now into CP knowing we may have to re-do it if upstream does even more/better than we did here, but it needs a fresh review.

tannewt

Thank you for doing this! I hope we don't fill the space up too quickly. :-)

jepler · 2021-07-13T00:06:01Z

@dpgeorge @jimmo really appreciate your feedback on this, let me know if I can pitch in on the implementation in micropython.

jimmo · 2021-07-13T01:51:51Z

FWIW here's some (rough) timing data on the different approaches. This is milliseconds, for 1000000 iterations of calling mp_do_extra with the code in those branches (but with the print removed from list_extra). Calling it for three different types (testing the "extended, has the slot" (list), "extended, doesn't have the slot" (dict) and "not-extended" (str) paths).

(str was a bit of a bad choice in hindsight, because it uses a different path in mp_obj_get_type).

Baseline:

str: 214
dict: 203
list: 202

Split

str: 262
dict: 257
list: 256

Index:

str: 214
dict: 208
list: 226

Index (bits)

str: 214
dict: 208
list: 250

At some stage I will implement the index approach for MicroPython and run the full performance test suite.

jepler added 5 commits July 6, 2021 09:15

Introduce, use getters for fields which will be optional in types

cd119ca

WIP

df56ba2

milestone: coverage make test_full passes

0d75606

milestone: a selection of builds succeed

d37f8a1

incorporate ulab changes

f287832

jepler mentioned this pull request Jul 7, 2021

support circuitpython-style split type objects v923z/micropython-ulab#418

Merged

jepler added 7 commits July 7, 2021 08:31

Fix EXTENDED_FIELDS macro to work with clang

44a3da5

This appears to work with clang versions at least since 3.0.

Move & comment the field access macros

26ae1c7

Fix type checking against struct time

21e2dfa

Fix esp32 build errors

3174837

Exclude files inside ulab from translation

66155ff

.. These files can be created if the developer runs "./build.sh" or "./build-cp.sh" inside extmod/ulab.

make translate

a19a48f

jepler force-pushed the split-type-objects branch from 51781b8 to a19a48f Compare July 7, 2021 20:50

jepler added 5 commits July 12, 2021 07:03

Rename mp_type_subscr -> mp_type_get_subscr_slot

0efdf4a

Rename mp_type_getiter -> mp_type_get_getiter_slot

9f6132b

Rename mp_type_iternext -> mp_type_get_iternext_slot

cdd879a

Rename mp_type_getbuffer -> mp_type_get_getbuffer_slot

1b2da0c

Rename mp_type_protocol -> mp_type_get_protocol_slot

75e995f

jepler added 2 commits July 12, 2021 07:30

Rename mp_type_parent -> mp_type_get_parent_slot

88434c5

Rename mp_type_attr -> mp_type_get_attr_slot

93c6d16

update ulab

6d0ed1a

tannewt approved these changes Jul 12, 2021

View reviewed changes

tannewt merged commit 9fdecac into adafruit:main Jul 12, 2021

jepler mentioned this pull request Jul 14, 2021

Flatiter v923z/micropython-ulab#422

Merged

jimmo mentioned this pull request Jul 15, 2021

py/obj.h: Add optimised mp_obj_type_t representations. micropython/micropython#7542

Closed

jepler deleted the split-type-objects branch November 3, 2021 21:10

jimmo mentioned this pull request Jun 24, 2022

py/obj.h: Optimise mp_obj_type_t storage (v2) micropython/micropython#8813

Closed

Split type objects #4903

Split type objects #4903

Uh oh!

Conversation

jepler commented Jun 19, 2021

Uh oh!

jepler commented Jun 20, 2021

Uh oh!

jepler commented Jun 20, 2021

Uh oh!

dhalbert commented Jun 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tannewt commented Jun 21, 2021

Uh oh!

dpgeorge commented Jun 21, 2021

Uh oh!

jepler commented Jun 23, 2021 • edited by tannewt Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jimmo commented Jun 23, 2021

Uh oh!

dpgeorge commented Jun 23, 2021

Uh oh!

tannewt commented Jun 23, 2021

Uh oh!

dhalbert commented Jul 2, 2021

Uh oh!

jepler commented Jul 3, 2021

Uh oh!

jepler commented Jul 7, 2021

Uh oh!

dpgeorge commented Jul 7, 2021

Uh oh!

jepler commented Jul 7, 2021

Uh oh!

v923z commented Jul 7, 2021

Uh oh!

jepler commented Jul 7, 2021

Uh oh!

jepler commented Jul 12, 2021

Uh oh!

jepler commented Jul 12, 2021

Uh oh!

jepler commented Jul 12, 2021

Uh oh!

dpgeorge commented Jul 12, 2021

Uh oh!

jepler commented Jul 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jimmo commented Jul 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jepler commented Jul 12, 2021

Uh oh!

jimmo commented Jul 12, 2021

Uh oh!

jepler commented Jul 12, 2021

Uh oh!

jimmo commented Jul 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jimmo commented Jul 12, 2021

Uh oh!

jepler commented Jul 12, 2021

Uh oh!

jepler commented Jul 12, 2021

Uh oh!

jepler commented Jul 12, 2021

Uh oh!

tannewt left a comment

Choose a reason for hiding this comment

Uh oh!

jepler commented Jul 13, 2021

Uh oh!

jimmo commented Jul 13, 2021

Uh oh!

Uh oh!

dhalbert commented Jun 21, 2021 •

edited

Loading

jepler commented Jun 23, 2021 •

edited by tannewt

Loading

jepler commented Jul 12, 2021 •

edited

Loading

jimmo commented Jul 12, 2021 •

edited

Loading

jimmo commented Jul 12, 2021 •

edited

Loading