This repository was archived by the owner on Feb 20, 2019. It is now read-only.

[Design] New Pickling Generators

Josh Suereth edited this page Jul 28, 2015 · 2 revisions

Currently, the algorithm that generates picklers is optimised for very efficient runtime pickling. However, we've discovered some unsafe interactions between the current IR classes and the Scala compiler. Specifically, there are times when the Scala compiler elides symbol information that WOULD be available via runtime reflection. In these instances, incorrect picklers are generated.

We'd like to enhance the algorithms used to generate picklers with a more nuanced approach, with the following goals:

  1. Correctness - Any algorithm should be accurate. In this vein, we should be able to correctly utilize existing conventions, such as JPA annotations, java.io.Serializable, case classes, etc.
  2. Efficiency - Any algorithm should retain as much efficiency as possible, ideally operating as fast as hand-rolled code would. The preference for statically generated code, rather than runtime reflection, should account for most of the speed-up.
  3. Completeness - Pickling should be a drop-in replacement for serialization, similar to java.io.Serializable or Kryo. This means pickling should attempt to handle ALL possible scenarios of objects needing to be serialized, and only require hand-rolled code when the user wants it. This does not mean we discourage hand-written picklers, only that we'd like the "out of the box" picklers to be solid and something you can rely on.
  4. Runtime configuration - Some mechanisms may not be available on all platforms, so the choice of mechanism should be configurable at runtime. For example, sun.misc.Unsafe may be the most efficient means of interacting with a class, but it is not available in all environments.

New Approach

The new approach will divide the Pickler/Unpickler generation into several layers/algorithms. Some of these algorithms may only be used at runtime, while others can only be used for static compilation. Additionally, we will split the generated picklers into several encodings for how to actually serialize/reify classes at runtime.

Types of serializable objects

Case classes.

These are a Scala language construct. Most case classes should be serializable as is.

Serializable classes.

These are a Java language construct. Serializable denotes that certain mechanisms are available to serialize a class safely. For pickling, we can additionally make the following assumptions:

  1. If there is no writeObject/readObject method, we can assume "simple" serialization, and mimic it in our generated picklers.
  2. If we cannot unify the constructor with all serializable fields, we should be able to shove the field values into the object via reflection/Unsafe.
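
As a rough illustration of assumption 1, a runtime check for "simple" serialization could look for the hook methods via reflection. This is only a sketch: `SerializableCheck` and its method names are invented for illustration and are not part of the pickling library, and the check ignores other customization points such as writeReplace/readResolve and java.io.Externalizable.

```scala
import java.io.{ObjectInputStream, ObjectOutputStream, Serializable}

// Hypothetical helper, not part of scala-pickling.
object SerializableCheck {
  // Does this exact class declare writeObject(ObjectOutputStream)
  // or readObject(ObjectInputStream)?
  private def declaresHook(cls: Class[_]): Boolean = {
    def declares(name: String, param: Class[_]): Boolean =
      try { cls.getDeclaredMethod(name, param); true }
      catch { case _: NoSuchMethodException => false }
    declares("writeObject", classOf[ObjectOutputStream]) ||
      declares("readObject", classOf[ObjectInputStream])
  }

  // Java serialization consults the hooks of every class in the hierarchy,
  // so the superclass chain is walked as well.
  def hasCustomHooks(cls: Class[_]): Boolean =
    Iterator.iterate[Class[_]](cls)(_.getSuperclass)
      .takeWhile(_ != null)
      .exists(declaresHook)

  // "Simple" here means: Serializable, and no custom hooks anywhere in the chain.
  def isSimpleSerializable(cls: Class[_]): Boolean =
    classOf[Serializable].isAssignableFrom(cls) && !hasCustomHooks(cls)
}
```

For instance, java.util.ArrayList declares both hooks and would be classified as "complicated", while java.lang.Integer would count as simple.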

JPA annotations

These are a Java extension. These annotations describe how to take a class and serialize it into a database. However, we should also be able to use these same annotations to serialize classes out to a pickler format.

Singletons

These are a Java idiom. We can try to infer the existence of a static singleton instance and create a pickler which simply grabs the singleton instance (not serializing anything).
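
One way the inference could work at runtime is to look for a public static field whose type is the class itself. The sketch below assumes that convention; `SingletonLookup` is a made-up name, not part of the pickling library. Scala `object`s compile to a class with such a field (`MODULE$`), and many Java singletons use a similarly shaped `INSTANCE` field. A class with several static instances of its own type (an enum-like class) would be misdetected by this heuristic.

```scala
import java.lang.reflect.Modifier

// Hypothetical helper, not part of scala-pickling.
object SingletonLookup {
  // Returns the value of the first public static field whose declared type
  // is the class itself, if there is one and it is non-null.
  def staticInstance(cls: Class[_]): Option[AnyRef] =
    cls.getDeclaredFields
      .find { f =>
        Modifier.isPublic(f.getModifiers) &&
        Modifier.isStatic(f.getModifiers) &&
        f.getType == cls
      }
      .flatMap(f => Option(f.get(null)))
}
```

An unpickler built on this would ignore the pickled payload entirely and return the looked-up instance.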

Unknown Objects

Most runtime serialization frameworks also take a crack at serializing unknown (or unannotated) objects. These are objects that don't fall into any special class and don't carry user serialization annotations. These objects CAN be serialized, but it can also be unsafe to do so. Right now, the existing pickling implementation handles these quite well at runtime, and not so well at compile time.

Pickler implementations

Here is the set of pickler implementations we'd like to use, in order of preference. These represent what the code in a Pickler/Unpickler would look like.

Direct constructor method calls.

This implementation involves trying to directly call the constructor of the class.

Applicable

  • Scala Case classes
  • JPA annotated classes (if methods are not private)
  • Classes fully defined by constructor (if we can detect it)

sun.misc.Unsafe

This implementation involves the use of sun.misc.Unsafe to look up field layouts for classes. When pickling, we rip the underlying primitives directly out of the class. When instantiating, we construct instances and shove data into memory locations directly.

Applicable

  • Not running in Android (no Unsafe)
  • java.io.Serializable (without fancy writeObject/readObject)
  • Can use on any object, but not 100% safe.
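
The read and write paths described above can be sketched as follows. This is an illustration only, under the assumption of an OpenJDK-style JVM where the `theUnsafe` field exists; `UnsafeSketch`, its `Foo` class, and the helper names are invented here. Recent JDKs deprecate the memory-access methods of sun.misc.Unsafe, so this may emit warnings or stop working on newer runtimes.

```scala
import sun.misc.Unsafe

// Hypothetical sketch, not the code scala-pickling generates.
object UnsafeSketch {
  // Obtain the Unsafe singleton reflectively. The field name `theUnsafe`
  // is JVM-specific; this lookup fails on platforms without it.
  private val unsafe: Unsafe = {
    val f = classOf[Unsafe].getDeclaredField("theUnsafe")
    f.setAccessible(true)
    f.get(null).asInstanceOf[Unsafe]
  }

  // Illustrative class with a private field and no usable setter.
  final class Foo(private var bar: Int) { def value: Int = bar }

  private def offsetOf(cls: Class[_], field: String): Long =
    unsafe.objectFieldOffset(cls.getDeclaredField(field))

  // Pickling side: read the primitive straight out of the object.
  def readInt(target: AnyRef, field: String): Int =
    unsafe.getInt(target, offsetOf(target.getClass, field))

  // Unpickling side: allocate without running any constructor,
  // then write the field value directly into the object.
  def instantiate[A](cls: Class[A], field: String, value: Int): A = {
    val obj = unsafe.allocateInstance(cls)
    unsafe.putInt(obj, offsetOf(cls, field), value)
    cls.cast(obj)
  }
}
```

Because no constructor runs, any invariants the constructor would establish are skipped, which is one reason this route is "not 100% safe".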

Java reflection

This implementation involves calling the constructor of a class via Java reflection, and then setting field values reflectively.

Applicable

  • JPA classes (Where not all fields/methods are public or constructor cannot be unified)
  • java.io.Serializable (without fancy writeObject/readObject)
  • Requires the ability to make private members accessible (i.e. does not work with some security managers).
  • Can use on any object, but not 100% safe.
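
A minimal sketch of the reflective route, assuming the class has a no-argument constructor and that `setAccessible` is permitted. `ReflectionSketch` and its `Foo` class are invented for illustration, and only fields declared directly on the class are handled (inherited fields are ignored).

```scala
import java.lang.reflect.Modifier

// Hypothetical sketch, not the code scala-pickling generates.
object ReflectionSketch {
  // Illustrative class with a no-argument constructor and a private field.
  final class Foo { private var bar: Int = 0; def value: Int = bar }

  // Pickling side: read every declared instance field, private ones included.
  // Primitive values come back boxed.
  def fieldValues(target: AnyRef): Map[String, AnyRef] =
    target.getClass.getDeclaredFields
      .filterNot(f => Modifier.isStatic(f.getModifiers))
      .map { f => f.setAccessible(true); f.getName -> f.get(target) }
      .toMap

  // Unpickling side: call the no-argument constructor, then set each field.
  def instantiate[A](cls: Class[A], values: Map[String, AnyRef]): A = {
    val ctor = cls.getDeclaredConstructor()
    ctor.setAccessible(true)
    val obj = ctor.newInstance()
    values.foreach { case (name, v) =>
      val f = cls.getDeclaredField(name)
      f.setAccessible(true)
      f.set(obj, v) // boxed values are unwrapped for primitive fields
    }
    obj
  }
}
```

Unlike the Unsafe route, this one does run a constructor, so it needs one that can be called without knowing the field values up front.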

Caveats / TODOs

  • Unsafe vs. Java reflection should actually be a RUNTIME decision (if possible)
  • We should outline what "constructor unification" is, in my vernacular.
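
To make the first caveat concrete, the runtime decision could be as simple as probing for Unsafe once and falling back to reflection. The sketch below is an assumption about how that might look; `AccessStrategy`, `UseUnsafe`, `UseReflection`, and the `preferUnsafe` flag are hypothetical names, not existing pickling configuration.

```scala
// Hypothetical sketch of a runtime choice between the two mechanisms.
object AccessStrategy {
  sealed trait Strategy
  case object UseUnsafe extends Strategy
  case object UseReflection extends Strategy

  // True when sun.misc.Unsafe can be loaded and its singleton field read.
  // Any failure (class missing, access denied) counts as "not available".
  def unsafeAvailable: Boolean =
    try {
      val f = Class.forName("sun.misc.Unsafe").getDeclaredField("theUnsafe")
      f.setAccessible(true)
      f.get(null) != null
    } catch {
      case _: Exception => false
    }

  // Prefer Unsafe when the caller allows it and the platform provides it.
  def choose(preferUnsafe: Boolean = true): Strategy =
    if (preferUnsafe && unsafeAvailable) UseUnsafe else UseReflection
}
```

A generated pickler would then only need to record that it requires field-level access, and leave the choice of mechanism to whatever `choose` returns on the running platform.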

Examples

Here is a set of examples, and how pickling should operate on each:

Example 1 - Case class

case class Foo(x: Int)
  • Pickling should always be able to generate a Foo pickler/unpickler if there is an implicit Int pickler available.
  • The pickler/unpickler should be generated with the direct method (i.e. just calls constructor directly)
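
For a sense of what "the direct method" could boil down to, here is a hand-written approximation. The `Writer`/`Reader` traits and `MapFormat` are hypothetical stand-ins, not the pickling library's API, and the sketch calls the Int read/write operations directly rather than resolving an implicit Int pickler.

```scala
import scala.collection.mutable

// Hypothetical sketch; names are illustrative, not the scala-pickling API.
object DirectSketch {
  // Minimal stand-ins for a pickle writer and reader.
  trait Writer { def putInt(name: String, value: Int): Unit }
  trait Reader { def getInt(name: String): Int }

  final case class Foo(x: Int)

  // A "direct" pickler: plain accessor calls on the way out,
  // a plain constructor call on the way in. No reflection involved.
  object FooPickler {
    def pickle(foo: Foo, out: Writer): Unit = out.putInt("x", foo.x)
    def unpickle(in: Reader): Foo = Foo(in.getInt("x"))
  }

  // Toy in-memory format so the sketch can be exercised.
  final class MapFormat extends Writer with Reader {
    private val ints = mutable.Map.empty[String, Int]
    def putInt(name: String, value: Int): Unit = ints.update(name, value)
    def getInt(name: String): Int = ints(name)
  }
}
```

Round-tripping `Foo(3)` through a `MapFormat` yields an equal `Foo(3)`, and everything in the pickler is checked by the compiler.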

Example 2 - Singleton object

object Bar { .. }
  • Pickling should issue a warning that singletons are not serialized across JVMs, but instead just referenced.
  • Pickling should always be able to generate a Bar pickler
  • The pickler/unpickler should be generated with the direct method.

Example 3 - Simple Serializable Object

class Foo(private var bar: Int) extends java.io.Serializable
  • Pickling should issue a debug/warning message that it needs to use Reflection to implement this pickler.
  • Pickling should always be able to generate the Foo pickler IF an Int pickler is available implicitly
  • Pickling should encode a pickling/unpickling strategy that can be customized at runtime to either use reflection or unsafe algorithms.

Example 4 - Complicated Serializable Object

class Foo(private var bar: Int) extends java.io.Serializable {
  private def writeObject(o: java.io.ObjectOutputStream): Unit = {
    ...
  }
}

Note: "Complicated" means that either writeObject or readObject is implemented.

  • Pickling should issue a warning that this serializable class has custom write/read methods. This can be configured to always be an error, via import pickling.static.simpleSerializableOnly.
  • Short Term - Pickling can fall back to runtime pickler generation. If import pickling.static.staticOnly is in scope, we issue an error instead.
  • Long Term - Read the bytecode of read/write object and generate the pickler to mimic the behavior, only interacting with a PReader/PWriter instead of an ObjectStream.

Example 5 - Java JPA classes

@Entity
class Foo {
   public Foo() {}
   @Id     private int id;
   @Column private String name;
}
  • Pickling should issue a debug/warning message that it needs to use Reflection to implement this pickler.
  • Pickling should always be able to generate the Foo pickler IF an Int pickler and a String pickler are available implicitly
  • Pickling should encode a pickling/unpickling strategy that can be customized at runtime to either use reflection or unsafe algorithms.