Home
Welcome to the tarantool-avro wiki!
What’s the fastest way to validate MsgPack data against a schema? Most schema languages have at least this feature in common: for each known key in an object, the schema prescribes the value’s type. A straightforward implementation visits every key/value pair in the input. For each key it looks up the corresponding schema node and then acts depending on the node’s type.
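A minimal sketch of that straightforward approach, written in Python purely for illustration (the project itself targets Lua). The toy `SCHEMA` format and the `validate_interpreted` name are hypothetical and much simpler than a real Avro schema:

```python
# Hypothetical toy schema: maps each known key to the expected Python type.
SCHEMA = {"id": int, "name": str, "score": float}

def validate_interpreted(schema, record):
    """Walk every key/value pair and consult the schema at run time."""
    for key, value in record.items():
        expected = schema.get(key)        # schema lookup for this key
        if expected is None:
            return False                  # key is not known to the schema
        if type(value) is not expected:   # act on the schema node's type
            return False
    # All record keys are known; equal sizes means no schema key is missing.
    return len(record) == len(schema)
```

Note that the schema is consulted again for every key of every record, which is the overhead the rest of this page is concerned with.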
This approach resembles an interpreter. With dynamic languages one can easily generate code at runtime. What if we generate code that validates input against one particular schema? We get rid of the schema interpreter’s overhead, and we are more likely to benefit from JIT compilation, if the language has it.
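The idea can be sketched as follows. This is an illustration in Python, not the Lua code the project generates; the toy schema format (key mapped to a built-in type) and the `compile_validator` name are assumptions made for the example:

```python
def compile_validator(schema):
    """Generate source code for a validator specialised to one schema,
    then compile it with exec(). `schema` maps key -> built-in type."""
    lines = [
        "def validate(record):",
        f"    if len(record) != {len(schema)}:",
        "        return False",
    ]
    for key, expected in schema.items():
        # One hard-coded check per key: no schema lookup at validation time.
        lines.append(
            f"    if type(record.get({key!r})) is not {expected.__name__}:"
        )
        lines.append("        return False")
    lines.append("    return True")

    namespace = {}
    exec("\n".join(lines), namespace)   # compile the generated source
    return namespace["validate"]

# Usage: compile once per schema, then call many times.
validate_user = compile_validator({"id": int, "name": str})
```

The generated function contains only straight-line type checks, so the schema itself is never touched while validating. Whether this pays off depends on the runtime; the sketch makes no performance claim for Python.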
In Tarantool, a NoSQL database with LuaJIT built in, we have been exploring this route for the past three months. We validate JSON data against Apache Avro™ schemas. Actually, our needs go beyond validation. We need transparent data conversion from one schema revision to another. And to reduce the storage footprint, we drop some bits that are recoverable using a schema, like key names in objects (aka flatten). Switching from a schema interpreter carefully written in C/C++ to generated Lua code resulted in an impressive 4x speedup.
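To illustrate the flatten step mentioned above: if the schema fixes the order of the fields, a record can be stored as a plain array of values and the key names restored later. The sketch below is in Python with a hypothetical `FIELDS` list standing in for a schema; it does not reproduce the actual tarantool-avro API:

```python
# Hypothetical schema: the field order is fixed by the schema itself.
FIELDS = ["id", "name", "score"]

def flatten(record):
    """Drop the key names and keep the values in schema order."""
    return [record[name] for name in FIELDS]

def unflatten(values):
    """Recover the key names from the schema."""
    return dict(zip(FIELDS, values))

row = flatten({"name": "alice", "id": 7, "score": 9.5})
```

Here `row` holds only the three values, and `unflatten(row)` rebuilds the original object.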