Presentation: Jackson Performance
Although Jackson JSON Processor is fast out-of-the-box, with default settings and common usage patterns, there are ways to make it process things even faster.
This presentation looks at a couple of things you can use that can make a big difference in performance, for cases where every last drop of CPU power matters.
(Note: this section is inspired by the "Jackson Performance: Best Practices" page on the FasterXML Jackson wiki.)
There are some basic ground rules to follow to ensure that Jackson processes things at an optimal level. These are things you should "do anyway", even if you do not have actual performance problems: think of them as an interpretation of the "Boy Scout Rule" ("Always leave the campground cleaner than you found it"). Note that the guidelines are shown in loosely decreasing order of importance.
- Reuse heavy-weight objects: `ObjectMapper` (data-binding) and `JsonFactory` (streaming API)
  - To a lesser degree, you may also want to reuse `ObjectReader` and `ObjectWriter` instances -- this is just icing on the cake, but they are fully thread-safe and reusable
- Close things that need to be closed: `JsonParser`, `JsonGenerator`
  - This helps reuse underlying resources such as symbol tables and input/output buffers
  - There is nothing to close for `ObjectMapper`
- Use "unrefined" (least processed) forms of input: that is, do not decorate input sources and output targets:
  - Input: `byte[]` is best if you have it; `InputStream` next best; then `Reader` -- and in every case, do NOT read the input into a String first!
  - Output: `OutputStream` is best; `Writer` second best; calling `writeValueAsString()` is the least efficient (why construct an intermediate String?)
  - Rationale: Jackson is very good at finding the most efficient (sometimes zero-copy) way to consume/produce JSON-encoded data -- let it do its magic
- If you need to re-process, replay -- don't re-parse
  - Sometimes you need to process things in multiple phases; for example, you may need to parse part of the JSON to figure out further processing or data-binding rules, and/or modify an intermediate representation for further processing
  - Instead of writing intermediate forms back out as JSON (which incurs both JSON writing and reading overhead), it is better to use a more efficient intermediate form
  - The most efficient intermediate form is `TokenBuffer` (a flat sequence of JSON tokens), followed by the JSON tree model (`JsonNode`)
  - You may also want to use `ObjectMapper.convertValue()` to convert between Object types
- Use the `ObjectReader` method `readValues()` for reading sequences of the same POJO type
  - Functionally equivalent to calling `readValue()` multiple times, but both more convenient AND (slightly) more efficient
- Prefer `ObjectReader`/`ObjectWriter` over `ObjectMapper`
  - `ObjectReader` and `ObjectWriter` are safer to use -- they are fully immutable and freely shareable between threads -- but they can also be a bit more efficient, since they can avoid some of the lookups that `ObjectMapper` has to do
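The reuse guidelines above can be sketched as follows. This is a minimal example, assuming a modern Jackson 2.x (`readerFor()`/`writerFor()` were added in 2.6; older versions use `reader()`/`writerWithType()`), with a hypothetical `Point` class standing in for your own types:

```java
import java.io.IOException;
import java.io.InputStream;

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.databind.ObjectWriter;

class Point { // hypothetical value class for illustration
    public int x, y;
}

public class Json {
    // Heavy-weight: construct once, reuse for the lifetime of the application
    static final ObjectMapper MAPPER = new ObjectMapper();

    // Fully immutable, freely shareable between threads
    static final ObjectReader POINT_READER = MAPPER.readerFor(Point.class);
    static final ObjectWriter POINT_WRITER = MAPPER.writerFor(Point.class);

    // readValues(): reads a sequence of same-typed values, slightly more
    // efficiently than calling readValue() once per document
    static void readPoints(InputStream in) throws IOException {
        try (MappingIterator<Point> it = POINT_READER.readValues(in)) {
            while (it.hasNextValue()) {
                Point p = it.nextValue();
                // ... process p ...
            }
        }
    }
}
```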
Once you have reviewed "the basics" discussed above, you may want to consider other tasks specifically aimed at further improving performance.
There are two main criteria that differentiate approaches listed below:
- Ease -- how much work is involved in making the change
- Compatibility -- is the resulting system interoperable with "Plain Old JSON" usage?
The big benefit of Jackson Databind API is the ease of use: with just a line or two of code you can convert between POJOs and JSON. But this convenience is not completely free: there is overhead involved in some of the automated processing, such as that of handling POJO property values using Java Reflection API (compared to explicit calls to getters and setters).
So one straightforward (if laborious) possibility is to rewrite data conversion to use the Jackson Streaming API.
With the Streaming API one constructs `JsonParser`s and `JsonGenerator`s, and uses low-level calls to read and write JSON as tokens.
If you explicitly rewrite all the conversions to use the Streaming API instead of data binding, you may be able to increase throughput by 30-40%, without any changes to the actual JSON produced. But writing and maintaining the low-level code takes time and effort, so whether you want to do this depends on how much you are willing to invest for a moderate speedup.
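As an illustration, a hand-written conversion for the hypothetical `Point` class used later in this presentation might look like this (a sketch, assuming Jackson 2.x streaming API):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

class Point { // hypothetical value class for illustration
    public int x, y;
}

public class PointStreaming {
    // Reused factory, per the ground rules above
    static final JsonFactory FACTORY = new JsonFactory();

    static void write(OutputStream out, Point p) throws IOException {
        try (JsonGenerator g = FACTORY.createGenerator(out)) {
            g.writeStartObject();
            g.writeNumberField("x", p.x);
            g.writeNumberField("y", p.y);
            g.writeEndObject();
        }
    }

    static Point read(InputStream in) throws IOException {
        Point p = new Point();
        try (JsonParser jp = FACTORY.createParser(in)) {
            if (jp.nextToken() != JsonToken.START_OBJECT) {
                throw new IOException("Expected START_OBJECT");
            }
            while (jp.nextToken() == JsonToken.FIELD_NAME) {
                String name = jp.getCurrentName();
                jp.nextToken(); // advance to the value token
                if ("x".equals(name)) {
                    p.x = jp.getIntValue();
                } else if ("y".equals(name)) {
                    p.y = jp.getIntValue();
                } else {
                    jp.skipChildren(); // ignore unknown properties
                }
            }
        }
        return p;
    }
}
```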
One possible trade-off is to rewrite only parts of the process; specifically, optimizing the most commonly used conversions: these are usually leaf-level classes (classes that have only primitive- or String-valued properties). You can achieve this by writing `JsonSerializer`s and `JsonDeserializer`s for a small number of types; Jackson can happily combine its own default POJO serializers and deserializers with custom overrides for specific types.
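A sketch of such a partial rewrite, again using a hypothetical leaf-level `Point` class; the custom serializer is registered via a `SimpleModule`, and all other types keep Jackson's default handling:

```java
import java.io.IOException;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.module.SimpleModule;

class Point { // hypothetical value class for illustration
    public int x, y;
}

// Hand-written serializer for one commonly used leaf-level type
class PointSerializer extends JsonSerializer<Point> {
    @Override
    public void serialize(Point p, JsonGenerator g, SerializerProvider provider)
            throws IOException {
        g.writeStartObject();
        g.writeNumberField("x", p.x);
        g.writeNumberField("y", p.y);
        g.writeEndObject();
    }
}

public class Setup {
    static ObjectMapper createMapper() {
        ObjectMapper mapper = new ObjectMapper();
        SimpleModule module = new SimpleModule();
        module.addSerializer(Point.class, new PointSerializer());
        mapper.registerModule(module);
        // Point now uses the custom serializer; everything else still uses
        // Jackson's default POJO handling
        return mapper;
    }
}
```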
Another kind of trade-off is to consider Smile binary format, which was developed as part of Jackson 1.6.
Smile is a binary format that is 100% compatible with the logical JSON data model, similar to how "binary XML" (like Fast Infoset) relates to standard textual XML. This means that conversion between JSON and Smile can be done efficiently and without loss of information. It also means that the API for working with Smile-encoded data is 100% regular Jackson API: the only difference is that the underlying factory is of type `SmileFactory` instead of `JsonFactory`. This factory is provided by the Jackson Smile Module.
Converting a service (or client) to use Smile is very easy: just create an `ObjectMapper` that uses a `SmileFactory`. The potential challenge is that such a change is visible to clients; this may or may not be a problem (depending on whether the content format can be auto-negotiated, as is done with JAX-RS). But it is a visible change either way.
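The conversion really is just one constructor argument; a minimal sketch (assumes the `jackson-dataformat-smile` module is on the classpath, and uses a hypothetical `Point` class):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;

class Point { // hypothetical value class for illustration
    public int x, y;
}

public class SmileExample {
    public static void main(String[] args) throws Exception {
        // Same ObjectMapper API as always; only the underlying factory changes
        ObjectMapper smileMapper = new ObjectMapper(new SmileFactory());

        Point p = new Point();
        p.x = 27;
        p.y = 15;

        byte[] encoded = smileMapper.writeValueAsBytes(p); // binary, not text
        Point back = smileMapper.readValue(encoded, Point.class);
    }
}
```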
Use of a binary format may be problematic more generally, as well; dealing with binary formats is very difficult from Javascript (and this is true for ALL binary formats, including `protobuf` and `thrift`) -- and for Javascript specifically, it is SLOWER than handling JSON -- but it may also be problematic from languages that do not yet have a Smile codec available. Currently Smile support is provided by the `libsmile` library written in C (in addition to the standard Java implementation).
Finally, debugging binary formats is more difficult than debugging textual data formats, as some kind of reader will be needed.
Performance improvements from using Smile are similar to using Streaming API (30 - 50% improvement), but an additional bonus is that size of the data will decrease as well; typically by similar amount (30-50%).
Note that performance improvements are more significant with redundant data like streams of similar Objects ("big data", such as Map/Reduce data streams); this is because Smile can use back-references to all but eliminate repeating property names and short String values (like enumerated values).
Finally, note that as with JSON, you can also choose between Streaming API and databinding when using Smile as the underlying format. Doing this will combine performance benefits.
As a very new option (to be included in the soon-to-be-released Jackson 2.1), it will be possible to change the actual JSON structure used for serializing Java Objects.
For example, consider the case of a hypothetical `Point` class:
```java
public class Point {
    public int x, y;
}
```
which would typically be serialized as:
{"x":27, "y":15}
However, if one declares it as:
```java
@JsonFormat(shape=JsonFormat.Shape.ARRAY)
@JsonPropertyOrder(alphabetic=true)
public class Point {
    public int x, y;
}
```
we would instead get:
[27,15]
which basically just eliminates property names by using positional values for indicating which property value is stored where. This can lead to significant compaction of the serialized JSON content; and this translates quite directly to performance. It is also worth noting that this works equally well for "simple" non-repeating data (like request/response messages), as property names are simply eliminated.
As with Smile, this change is directly visible to the client, and either requires that the client use Jackson, or that it implement similar functionality. Nonetheless, this format is slightly easier to read (or at least debug) and to process with scripting languages.
Since this feature is brand new, it has not been extensively performance tested, but initial results suggest that it can achieve improvements similar to use of Smile or hand-written Streaming API based converters. And this feature can be combined with the Smile format as well.
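Putting the annotations together, a round trip might look like this (a sketch, assuming Jackson 2.1+; with alphabetic ordering, position 0 is `x` and position 1 is `y`):

```java
import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.ObjectMapper;

@JsonFormat(shape = JsonFormat.Shape.ARRAY)
@JsonPropertyOrder(alphabetic = true)
class Point {
    public int x, y;
}

public class PojoAsArray {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        Point p = new Point();
        p.x = 27;
        p.y = 15;

        // Serializes as a positional array instead of an object
        String json = mapper.writeValueAsString(p); // "[27,15]"

        // Deserialization uses the same positional convention
        Point back = mapper.readValue("[27,15]", Point.class);
    }
}
```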
After going through a couple of compromises (easy OR compatible), there is one approach that is both (yay!): the Jackson Afterburner Module.
What the Afterburner module does is optimize the underlying serializers and deserializers by:
- Using byte code generation to replace Java Reflection calls (used for field and method access and constructor calls) with actual byte code -- similar to how one would write explicit field accesses and method calls in Java code
- Inlining the handling of a small set of basic types (`String`, `int`, `long` -- possibly more in future), so that if the default serializer/deserializer is used, calls are replaced by equivalent standard handling (which eliminates a couple of method calls, and possible argument/return value boxing)
- Speculative "match/parse" of ordered property names, using special matching calls in `JsonParser` -- this can eliminate symbol table lookups if field names are serialized in the order Jackson expects (which may be indicated by use of `@JsonPropertyOrder`)
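Enabling Afterburner takes a single module registration (assumes the `jackson-module-afterburner` dependency is on the classpath):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.module.afterburner.AfterburnerModule;

public class Mappers {
    static ObjectMapper createMapper() {
        ObjectMapper mapper = new ObjectMapper();
        // Replaces Reflection-based property access with generated byte code;
        // no other code changes, and the JSON produced is unchanged
        mapper.registerModule(new AfterburnerModule());
        return mapper;
    }
}
```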
Since these optimizations more or less mimic the more efficient patterns used by "hand-written" converters (i.e. our first option, use of the Streaming API), performance improvements could theoretically reach the level of such converters. In practice we have observed improvements in the 60-70% range of this maximum (that is, Afterburner can eliminate about 2/3 of the overhead that standard databinding has over hand-written alternatives).
Approaches discussed so far have different levels of maturity, and this may affect your choices:
- Streaming API - based converters ("hand-written"): Streaming API has been available since the first Jackson release
- Smile format: First introduced in Jackson 1.6, very stable, both format and parser/generator implementations
- Significant amount of real heavy production use by projects like Elastic Search
- Afterburner: Has been available since Jackson 1.8 -- not experimental, but has not been used as heavily as Smile.
- POJO-as-array: Experimental, included in Jackson 2.1; work in progress
But do you need to choose just one approach? Absolutely not!
In fact, bundling can save you big here: you can combine most of the approaches. Specifically:
- Choice of Smile over JSON is compatible with all the other choices and can vary independently.
- Choices of "POJO-as-array" and Afterburner are compatible with all choices other than hand-written Streaming API converters
So, you could consider combinations like:
- Use the Smile format, but write your code using the Streaming API: this is what some frameworks (like Elastic Search) do
- Use Afterburner with "POJO-as-Array", with either regular JSON or Smile, if you are after maximal performance.
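The second combination is just the two earlier snippets composed (assumes both the Smile and Afterburner modules are on the classpath; "POJO-as-Array" is added per class via `@JsonFormat`):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;
import com.fasterxml.jackson.module.afterburner.AfterburnerModule;

public class FastMapper {
    static ObjectMapper create() {
        // Smile for a compact binary encoding...
        ObjectMapper mapper = new ObjectMapper(new SmileFactory());
        // ...plus Afterburner for byte-code-generated (de)serializers.
        // Value classes can additionally be annotated with
        // @JsonFormat(shape = JsonFormat.Shape.ARRAY) for POJO-as-Array.
        mapper.registerModule(new AfterburnerModule());
        return mapper;
    }
}
```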
With "extreme" combinations such as those listed above, use of plain old JSON can meet or exceed the performance of fast binary formats such as `protobuf`, `thrift` or `avro`.
And with Smile, both processing speed and data sizes can exceed those alternatives (as small, even faster!).
Although not all combinations discussed above are included, the JVM Serializers benchmark can give some idea of the improvements, as it includes results for JSON/Smile and Streaming-API/databind/Afterburner combinations.