Skip to content

Include ChaCha pseudorandom generator #17387

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 3, 2014
Merged

Include ChaCha pseudorandom generator #17387

merged 1 commit into from
Oct 3, 2014

Conversation

sneves
Copy link
Contributor

@sneves sneves commented Sep 19, 2014

This commit introduces a ChaCha-based pseudorandom number generator to librand. Here's why I prefer it to the default, ISAAC:

  • While ISAAC is meant to be cryptographically secure, it stands on relatively shaky ground. Its design is inspired by the much-maligned RC4, and the existing analysis reveals some suboptimal properties, although not particularly devastating. ISAAC64, the default on 64-bit architectures, doesn't seem to have been analyzed at all.
  • ISAAC has large storage requirements. An ISAAC64 state consumes 4128 bytes, which is ~12.5% of a typical CPU's L1 cache. This ChaCha implementation consumes 136 bytes, and can be further reduced to 105, at the cost of slightly higher implementation complexity. The performance impact of cache hoarding is hard to demonstrate in synthetic benchmarks.
  • ISAAC is inherently sequential, and its speed is somewhat limited by data-dependent memory accesses, not unlike RC4. ChaCha, on the other hand, is not and can be vectorized easily for significant speedups, at the cost of larger state.

In my tests, an SSE2 (using only what #[simd] types allow at the moment) ChaCha20 roughly matches ISAAC in speed, and an AVX2 implementation matches ISAAC64. Better SIMD support (particularly shuffles) would improve these results further.

I'm currently using the 20-round ChaCha, as originally specified. However, 12 and 8-round variants are also possible (and believed to be secure), and those would provide an additional speedup factor of roughly 20/{12,8}. I have not included SIMD implementations in this commit to avoid the complexity of figuring out which instruction set to use.

@rust-highfive
Copy link
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon.

@huonw
Copy link
Member

huonw commented Sep 19, 2014

cc me

@thestinger
Copy link
Contributor

It would be great to completely replace ISAAC with a more modern and thoroughly reviewed algorithm like ChaCha20. The reduced round versions sound very tempting because they would be faster, lighter and provide better security properties. How does the performance of the scalar implementation compare to ISAAC/ISAAC64?

@thestinger
Copy link
Contributor

cc #10047


#[test]
fn test_rng_true_values() {
// Test vector from http://tools.ietf.org/html/draft-agl-tls-chacha20poly1305-04
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this test vector is too short to be hitting the update codepath, i.e. it's only testing that it is initialised correctly; could you switch to

   KEY:       000102030405060708090a0b0c0d0e0f101112131415161718191a1b
              1c1d1e1f
   NONCE:     0001020304050607
   KEYSTREAM: f798a189f195e66982105ffb640bb7757f579da31602fc93ec01ac56
              f85ac3c134a4547b733b46413042c9440049176905d3be59ea1c53f1
              5916155c2be8241a38008b9a26bc35941e2444177c8ade6689de9526
              4986d95889fb60e84629c9bd9a5acb1cc118be563eb9b3a4a472f82e
              09a7e778492b562ef7130e88dfe031c79db9d4f7c7a899151b9a4750
              32b63fc385245fe054e3dd5a97a5f576fe064025d3ce042c566ab2c5
              07b138db853e3d6959660996546cc9c4a6eafdc777c040d70eaf46f7
              6dad3979e5c5360c3317166a1c894c94a371876a94df7628fe4eaaf2
              ccb27d5aaae0ad7ad0f9d4b6ad3b54098746d4524d38407a6deb3ab7
              8fab78c9

(I think I would prefer to just render the output string to a hexidecimal string, e.g. with the {:08x} format specifier, rather than modify the keystream text.)

Alternatively, you could also do something similar to the ISAAC tests, which check the prefix and then check a 'random' subsequence much later (I generated those using a known-good implementation of ISAAC, presumably the same would be done here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the vector is too short. But for what it's worth, init simply puts the state words in place, and the first next_u32 call forces update to be called, since self.index is initialized to STATE_WORDS.

I'll generate some test vectors using the reference implementation. This will be needed anyway, in case we choose to go with a lower number of rounds.

@huonw
Copy link
Member

huonw commented Sep 19, 2014

This is nice!

I'm in favour of replacing ISAAC. (I'll review in more detail soon.)

@sneves
Copy link
Contributor Author

sneves commented Sep 19, 2014

@thestinger Scalar performance oscillates depending on the CPU. On an Intel Sandy bridge, I get an approximate 2.7x slowdown compared to ISAAC. On Haswell it is a 2.0x slowdown, with ChaCha8 achieving performance parity with it. I don't have other CPU models to try this out on.

@huonw
Copy link
Member

huonw commented Sep 19, 2014

What is the performance if you disable the assert inside next_u32?

@sneves
Copy link
Contributor Author

sneves commented Sep 19, 2014

@huonw A little better, ~3-5%.

@nikomatsakis
Copy link
Contributor

@huonw shall I delegate reviewing responsibility to you then?

@huonw
Copy link
Member

huonw commented Sep 19, 2014

Sounds good.

};


macro_rules!quarter_round(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conventionally these macros would be formatted something like

macro_rules! quarter_round{
    ($a: expr, $b: expr, $c: expr, $d: expr) =>  {{
        // code here
    }}
}

macro_rules! double_round {
    ($x: expr) => {{
        // ...
    }}
}

@huonw
Copy link
Member

huonw commented Sep 19, 2014

General note (not at all required in this PR), it seems like two sensible extensions would be:

  • randomly jumping around in the sequence (e.g. have a set_counter(&mut self, n: u64) method that sets words 12 and 13 of the state appropriately)
  • allowing for non-zero nonces

Also, I'm nervous to suggest this, but do we really need to guarantee that the stream doesn't cycle after 270 bytes (that is, can we remove the assertion)?

@sneves
Copy link
Contributor Author

sneves commented Sep 19, 2014

Jumping ahead does indeed sound like a useful feature. Perhaps it should have its own trait (XorShift can also jump arbitrarily ahead via matrix exponentiation)?

I can increase the counter to 96 or 128 bits by using the nonce words, removing any period issues and the need for assertions. This does not affect security, since those words are already assumed to be public and attacker-controlled.

@huonw
Copy link
Member

huonw commented Sep 19, 2014

I'd be OK with just keeping any jumping as specific to certain types, we can look at generalisation later (e.g. this is constant time and simple, whlle the XorShift jumping is logarithmic and relatively complicated).

Hm, increasing the size of the counter seems slightly peculiar, I'm not immediately against it, but it would be sweeping any 'errors' under the rug. (Someone wanting a particular nonce would then just fast-forward by a very large number to get to the 128-bit counter that corresponds to that nonce.)

I say 'error' because I'm not actually sure that this would be a problem at all.

@klutzy
Copy link
Contributor

klutzy commented Sep 20, 2014

Increasing counter size means using nonce as "external" counter per 2^70 bytes. Since nonce is usually used for counter of data block, so I think it's not problematic.

@huonw
Copy link
Member

huonw commented Sep 23, 2014

@klutzy fwiw, I'm not sure where that is stated in that draft, I can see:

The first counter input word is set to one and
the plaintext is encrypted by XORing it with the output of
invocations of the ChaCha20 function as needed, incrementing the
first counter word after each block and overflowing into the second.

But I think that is just referring to the fact that the counter is stored in two separate u32 words.


/// Sets the internal 128-bit ChaCha counter to
/// a user-provided value. This permits jumping
/// arbitrarily ahead (or backwards) in the pseudorandom stream.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a paragraph here describing the interaction with the nonce, e.g. something like:

The counter_high value is in place of the nonce, and so users wishing to obtain the conventional ChaCha pseudorandom stream associated with a particular nonce can call this function with arguments 0, desired_nonce.

@huonw
Copy link
Member

huonw commented Sep 23, 2014

Just a few small things and I think this should be good to go!

/// A random number generator that uses the ChaCha20 algorithm [1].
///
/// The ChaCha algorithm is widely accepted as suitable for
/// cryptographic purposes, but this implementation has not be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should say "has not been" instead of "has not be"

@klutzy
Copy link
Contributor

klutzy commented Sep 24, 2014

@huonw I meant this part: "When used in TLS, ... the nonce is the sequence number for the record, as an 8-byte, big-endian number." (nonce counter for each record, usual counter for each data block of record)

@vks
Copy link
Contributor

vks commented Sep 25, 2014

@thestinger

The reduced round versions sound very tempting because they would be faster, lighter and provide better security properties.

How does a reduced round version provide better security properties?

@sneves
Copy link
Contributor Author

sneves commented Sep 26, 2014

@vks I assume he meant better security relatively to ISAAC.

@huonw Is there anything still holding this PR back?

@huonw
Copy link
Member

huonw commented Sep 28, 2014

Looks good to me.

r=me after squashing the commits. (Be sure to comment when you update, github doesn't provide notifications.)

@sneves
Copy link
Contributor Author

sneves commented Sep 29, 2014

OK, done.

r=huonw

@sneves
Copy link
Contributor Author

sneves commented Sep 30, 2014

Oops, forgot to export ChaChaRng. Fixed now.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Oct 3, 2014
@bors bors merged commit 5615ace into rust-lang:master Oct 3, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants