yfakariya edited this page Aug 9, 2015 · 2 revisions

As of 0.6.0, MessagePack for CLI supports polymorphism for object members and collection items. There are two kinds of polymorphism: "known subtypes based polymorphism" and "runtime type based polymorphism."

Use Cases

There are several use cases for polymorphism.

  1. Serialize a polymorphic collection (issue #58). Sometimes you want to serialize the items of a heterogeneous collection.
  2. Serialize a 'rich' domain model that has its own data and logic (issue #47). You can deserialize objects and invoke their virtual methods.

Choosing a Kind of Polymorphism

Known Subtypes Based Polymorphism

  • Pros
      • Easy to interoperate. Known subtypes based polymorphism uses a simple format, so you can easily implement a counterpart system.
      • Secure by design. You control the possible instance types through custom attributes, so there is little chance of malicious code injection unless you also load untrusted assemblies.
  • Cons
      • You must continuously maintain the known subtype list(s).
      • All types must be known at compile time.

Runtime Type Based Polymorphism

  • Pros
      • Easy to use and maintain. You just put a custom attribute on the member, and it keeps working as new subtypes are added.
      • You don't have to know the possible subtypes at compile time.
  • Cons
      • It uses a format based on native .NET type identifiers, so it is hard to keep interoperability: other systems must interpret the type information and translate it to their own type system.
      • You cannot control the possible subtypes, which might hurt the stability of your application.
      • If the binary to be deserialized comes from an external source, it may contain malicious type information. Attackers can specify types whose default constructors cause significant side effects, such as file or registry manipulation.

Usage

You can mark members (fields or properties) as polymorphic with custom attributes, as follows:

// Known subtypes based polymorphism.
[MessagePackKnownType( "0", typeof( FileInfo ) )]
[MessagePackKnownType( "1", typeof( DirectoryInfo ) )]
public FileSystemInfo Info { get; set; }

// Runtime type based polymorphism.
[MessagePackRuntimeType]
public object Data { get; set; }

As you might expect, you cannot mix different kinds of polymorphism attributes on the same member.
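To illustrate the "rich domain model" use case, here is a minimal round-trip sketch. The types Shape, Circle, Square, and Holder are hypothetical and exist only for this example. The type codes are written as strings on the assumption that the attribute constructor takes a string type code, as the "type code string" in the format section suggests. The sketch also assumes the MessagePackSerializer.Get<T>(), PackSingleObject, and UnpackSingleObject members of the MsgPack.Cli package.

```csharp
using System;
using MsgPack.Serialization;

// Hypothetical domain types; they carry both data and logic.
public abstract class Shape
{
    public abstract double Area();
}

public class Circle : Shape
{
    public double Radius { get; set; }
    public override double Area() { return Math.PI * Radius * Radius; }
}

public class Square : Shape
{
    public double Side { get; set; }
    public override double Area() { return Side * Side; }
}

public class Holder
{
    // Known subtypes based polymorphism: only Circle and Square are allowed.
    [MessagePackKnownType( "0", typeof( Circle ) )]
    [MessagePackKnownType( "1", typeof( Square ) )]
    public Shape Value { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var serializer = MessagePackSerializer.Get<Holder>();
        byte[] bytes = serializer.PackSingleObject( new Holder { Value = new Square { Side = 3 } } );

        // The concrete subtype is restored, so virtual methods dispatch as usual.
        Holder restored = serializer.UnpackSingleObject( bytes );
        Console.WriteLine( restored.Value.GetType().Name );
        Console.WriteLine( restored.Value.Area() );
    }
}
```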

You can also apply polymorphism to collections themselves, to collection items, to dictionary keys and values, and to Tuple items. This table shows the valid combinations and the meaning of each attribute:

|Kind|Attribute|Target|Note|
|---|---|:-:|---|
|Known Subtypes Based|MessagePackKnownTypeAttribute|Non-collection objects or collections themselves||
||MessagePackKnownCollectionItemTypeAttribute|Collection items or dictionary values|For example, items of a List<object> typed property value.|
||MessagePackKnownDictionaryKeyTypeAttribute|Dictionary keys|For example, keys of a Dictionary<object, object> typed property value.|
||MessagePackKnownTupleItemTypeAttribute|An item of a tuple|The 1st argument specifies the 1-based item number (n of the ItemN property).|
|Runtime Type Based|MessagePackRuntimeTypeAttribute|Non-collection objects or collections themselves||
||MessagePackRuntimeCollectionItemTypeAttribute|Collection items or dictionary values|For example, items of a List<object> typed property value.|
||MessagePackRuntimeDictionaryKeyTypeAttribute|Dictionary keys|For example, keys of a Dictionary<object, object> typed property value.|
||MessagePackRuntimeTupleItemTypeAttribute|An item of a tuple|The 1st argument specifies the 1-based item number (n of the ItemN property).|

As shown, for collection typed members (that is, types that implement IEnumerable but not IDictionary and are not sealed) and dictionary typed members, you can specify that the collection itself is polymorphic and also that its keys or items are polymorphic. In addition, you can mark tuple items as polymorphic. Note that the System.Tuple types are sealed, so a Tuple typed member itself cannot be polymorphic.
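The following declarations sketch how the item, key, and tuple attributes from the table might be combined. Drawing, Shape, Circle, and Square are hypothetical types. The constructor shapes are assumptions: a string type code plus a type for the known-type attributes, with a leading 1-based item number for the tuple variant, as the table describes.

```csharp
using System;
using System.Collections.Generic;
using MsgPack.Serialization;

// Hypothetical domain types used only for illustration.
public abstract class Shape { public string Name { get; set; } }
public class Circle : Shape { public double Radius { get; set; } }
public class Square : Shape { public double Side { get; set; } }

public class Drawing
{
    // The list itself is not polymorphic; each item is a Circle or a Square.
    [MessagePackKnownCollectionItemType( "0", typeof( Circle ) )]
    [MessagePackKnownCollectionItemType( "1", typeof( Square ) )]
    public List<Shape> Shapes { get; set; }

    // Keys and values are both resolved from embedded runtime type information.
    [MessagePackRuntimeDictionaryKeyType]
    [MessagePackRuntimeCollectionItemType]
    public Dictionary<object, object> Annotations { get; set; }

    // Item1 stays a plain string; Item2 (1-based item number 2) is polymorphic.
    [MessagePackKnownTupleItemType( 2, "0", typeof( Circle ) )]
    [MessagePackKnownTupleItemType( 2, "1", typeof( Square ) )]
    public Tuple<string, Shape> Highlight { get; set; }
}
```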

Note the default behavior of collection typed and System.Object typed members:

  • A System.Object typed member holds a boxed MessagePackObject when it is not marked with any of the polymorphism attributes above.
  • The concrete type of a deserialized abstract collection typed member is determined by the SerializationContext.DefaultCollectionTypes registration. The defaults are List<T> and Dictionary<TKey, TValue>.
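As a sketch of the second point, a different concrete type could be registered for an abstract collection type. The Register( abstractType, concreteType ) call shown here is an assumption about the DefaultCollectionTypes API, so check it against the version you use. CollectionDefaults is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using MsgPack.Serialization;

public static class CollectionDefaults
{
    public static SerializationContext CreateContext()
    {
        var context = new SerializationContext();

        // Assumed API: deserialize IList<T> typed members as Collection<T>
        // instead of the default List<T>.
        context.DefaultCollectionTypes.Register( typeof( IList<> ), typeof( Collection<> ) );
        return context;
    }
}
```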

Polymorphism Internals

This section describes the type information format so that you can develop interoperable implementations.

Basic Design

  • Objects' type information will be serialized together with their values.
  • The type information and values are serialized within single array.
  • The type information consists of their data.
  • The type information itself will be encoded in an array.

Known Subtypes Based Polymorphism Type Information Format

It is encoded as a simple 2-element array.

[<StringTypeCode>, <Data>]

In the figure above, "StringTypeCode" is the type code string specified in the custom attributes. It is encoded in MessagePack str (raw) format, as compactly as possible. "Data" is the serialized object value, and its form is an array or a map.
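For implementers of a counterpart system, the envelope can be sketched by hand without any MessagePack library. KnownTypeEnvelope and its payload are hypothetical. Only the fixstr and str 8 headers are handled, which covers type codes shorter than 256 bytes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class KnownTypeEnvelope
{
    // Encodes a string with the most compact MessagePack str header that fits.
    static byte[] PackString( string value )
    {
        byte[] utf8 = Encoding.UTF8.GetBytes( value );
        var buffer = new List<byte>();
        if ( utf8.Length < 32 )
        {
            buffer.Add( ( byte )( 0xA0 | utf8.Length ) ); // fixstr
        }
        else if ( utf8.Length < 256 )
        {
            buffer.Add( 0xD9 );                           // str 8
            buffer.Add( ( byte )utf8.Length );
        }
        else
        {
            throw new NotSupportedException( "Longer strings are out of scope for this sketch." );
        }

        buffer.AddRange( utf8 );
        return buffer.ToArray();
    }

    // Builds [<StringTypeCode>, <Data>] around an already serialized value.
    public static byte[] Wrap( string typeCode, byte[] serializedData )
    {
        var buffer = new List<byte> { 0x92 };             // fixarray with 2 elements
        buffer.AddRange( PackString( typeCode ) );
        buffer.AddRange( serializedData );
        return buffer.ToArray();
    }

    static void Main()
    {
        // Hypothetical payload: a one-element array holding the string "a.txt".
        byte[] data = { 0x91, 0xA5, 0x61, 0x2E, 0x74, 0x78, 0x74 };
        Console.WriteLine( BitConverter.ToString( Wrap( "0", data ) ) );
        // prints 92-A1-30-91-A5-61-2E-74-78-74
    }
}
```

A reader on the other side does the reverse: read a 2-element array, look up the type code in its own table of known subtypes, and then deserialize the second element as that subtype.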

Runtime Type Based Polymorphism Type Information Format

It is encoded as a 2-element array.

[<EncodedNETType>, <Data>]

In the figure above, "Data" is the serialized object value, and its form is an array or a map. "EncodedNETType" is structured data formatted as a 6-element array, and it is equivalent to an assembly-qualified .NET type name. This table shows the contents of the structured type information, that is, the mapping between the assembly-qualified type name and the structured data:

|Index|Type|Content|
|:-:|:-:|---|
|0|integer|Format ID. Only 1 is valid. Discussed later.|
|1|str(raw)|Compressed type full name. Discussed later.|
|2|str(raw)|Assembly's simple name.|
|3|array|Assembly's version as a 4-element int array.|
|4|str(raw)|Assembly's culture name. nil for a neutral assembly.|
|5|bin(raw)|Assembly's public key token. nil for null.|

Format ID 1 means that the "Compressed Format" is used, which compresses the type name. Because many type names start with their namespace, and that prefix often matches the simple name of the declaring assembly, space can be saved by omitting the duplicated substring. The format replaces such a prefix with '.'. For example, for a type whose assembly-qualified name is "TheCompany.TheProduct.TheComponent.TheLayer.TheType, TheCompany.TheProduct.TheComponent, Version=1.2.3.4, Culture=neutral, PublicKeyToken=null", the resulting type information is logically the following:

[1, ".TheLayer.TheType", "TheCompany.TheProduct.TheComponent", [1, 2, 3, 4], nil, nil]
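The prefix replacement can be sketched as a pair of helper functions. TypeNameCompression is a hypothetical helper, and it assumes the rule is exactly "strip the assembly simple name when the type full name starts with it, keeping the following '.'". The library's precise matching rules are not spelled out here.

```csharp
using System;

static class TypeNameCompression
{
    // Drops a leading assembly simple name and keeps the '.' that followed it.
    public static string Compress( string typeFullName, string assemblySimpleName )
    {
        string prefix = assemblySimpleName + ".";
        return typeFullName.StartsWith( prefix, StringComparison.Ordinal )
            ? typeFullName.Substring( assemblySimpleName.Length )
            : typeFullName;
    }

    // Restores the full name: a leading '.' stands for the assembly simple name.
    public static string Decompress( string compressed, string assemblySimpleName )
    {
        return compressed.StartsWith( ".", StringComparison.Ordinal )
            ? assemblySimpleName + compressed
            : compressed;
    }

    static void Main()
    {
        const string assembly = "TheCompany.TheProduct.TheComponent";
        string compressed = Compress( "TheCompany.TheProduct.TheComponent.TheLayer.TheType", assembly );
        Console.WriteLine( compressed );                         // .TheLayer.TheType
        Console.WriteLine( Decompress( compressed, assembly ) ); // TheCompany.TheProduct.TheComponent.TheLayer.TheType
    }
}
```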

The physical format looks like the following:

0x96 0x01 0xB12E5468654C617965722E54686554797065 0xD922546865436F6D70616E792E54686550726F647563742E546865436F6D706F6E656E74 0x94 0x01 0x02 0x03 0x04 0xC0 0xC0

This is a 63-byte binary instead of a 142-byte UTF-8 encoded string.
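The size comparison can be reproduced with a small hand-written encoder. RuntimeTypeInfoEncoding is a hypothetical helper that covers only what this example needs: short strings, version parts that fit in a positive fixint, and nil for both the culture and the public key token.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class RuntimeTypeInfoEncoding
{
    // Appends a string using the most compact str header that fits.
    static void PackString( List<byte> buffer, string value )
    {
        byte[] utf8 = Encoding.UTF8.GetBytes( value );
        if ( utf8.Length < 32 )
        {
            buffer.Add( ( byte )( 0xA0 | utf8.Length ) ); // fixstr
        }
        else if ( utf8.Length < 256 )
        {
            buffer.Add( 0xD9 );                           // str 8
            buffer.Add( ( byte )utf8.Length );
        }
        else
        {
            throw new NotSupportedException( "Longer strings are out of scope for this sketch." );
        }

        buffer.AddRange( utf8 );
    }

    // Encodes [1, typeName, assemblyName, [major, minor, build, revision], nil, nil].
    public static byte[] Encode( string compressedTypeName, string assemblyName, Version version )
    {
        var buffer = new List<byte> { 0x96, 0x01 };       // fixarray(6), format ID 1
        PackString( buffer, compressedTypeName );
        PackString( buffer, assemblyName );
        buffer.Add( 0x94 );                               // fixarray(4) for the version
        foreach ( int part in new[] { version.Major, version.Minor, version.Build, version.Revision } )
        {
            if ( part < 0 || part > 127 )
            {
                throw new NotSupportedException( "Only positive fixint version parts are handled." );
            }

            buffer.Add( ( byte )part );                   // positive fixint
        }

        buffer.Add( 0xC0 );                               // culture: nil (neutral)
        buffer.Add( 0xC0 );                               // public key token: nil
        return buffer.ToArray();
    }

    static void Main()
    {
        byte[] encoded = Encode( ".TheLayer.TheType", "TheCompany.TheProduct.TheComponent", new Version( 1, 2, 3, 4 ) );
        string qualifiedName =
            "TheCompany.TheProduct.TheComponent.TheLayer.TheType, TheCompany.TheProduct.TheComponent, " +
            "Version=1.2.3.4, Culture=neutral, PublicKeyToken=null";
        Console.WriteLine( encoded.Length );                             // 63
        Console.WriteLine( Encoding.UTF8.GetByteCount( qualifiedName ) ); // 142
    }
}
```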
