Skip to content

Deal with problematic characters in comments #96

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 15, 2015
Merged

Deal with problematic characters in comments #96

merged 1 commit into from
Jun 15, 2015

Conversation

marcusklaas
Copy link
Contributor

Resolves several issues with comments containing problematic characters.

@@ -11,6 +11,7 @@
#![feature(rustc_private)]
#![feature(collections)]
#![feature(str_char)]
#![feature(core)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

@nrc
Copy link
Member

nrc commented Jun 7, 2015

Thanks for doing this, looks good. With the minor comments addressed, I'd be happy to land this as is.

However, I think the algorithm you're using could be better - it is O(n^2), and the constant multiplier is high too. I think this can be solved in O(n) by walking the string once and doing the matching yourself (this at the expense of some flexibility, because you probably want to just match against another string, rather than some Pattern). I think that in practice this won't matter too much because n will always be fairly small, but I imagine in the future we might want to run such a function over a whole block or something, in which case it could make a noticeably impact on performance.

So, I'm happy to land as is, but add a comment, or you can try to come up with an O(n) algorithm.

@marcusklaas
Copy link
Contributor Author

I honestly didn't think it would really matter that much. I was very wrong. I ran on the following test string:

// Ick, just needed to get a &'static to handle_result.
static HANDLE_RESULT: &'static Fn(HashMap<String, String>) = &handle_result;

pub fn idempotent_check(filename: String) -> Result<(), HashMap<String, String>> {
    let args = vec!["rustfmt".to_owned(), filename];
    let mut def_config_file = fs::File::open("default.toml").unwrap();
    let mut def_config = String::new();
    def_config_file.read_to_string(&mut def_config).unwrap();
    // this thread is not used for concurrency, but rather to workaround the issue that the passed
    // function handle needs to have static lifetime. Instead of using a global RefCell, we use
    // panic to return a result in case of failure. This has the advantage of smoothing the road to
    // multithreaded rustfmt
    thread::catch_panic(move || {
        run(args, WriteMode::Return(HANDLE_RESULT), &def_config);
    }).map_err(|any|
        *any.downcast().unwrap()
    )
}

These were the results:

running 3 tests
test utils::bench_find_uncommented_new ... bench:      1802 ns/iter (+/- 125)
test utils::bench_find_uncommented_old ... bench:    139293 ns/iter (+/- 13399)

The new implementation isn't as elegant, but you can't argue with those measurements. I'll push the code in a bit.

@marcusklaas
Copy link
Contributor Author

Merge conflict resolved!

let mut match_count: usize = 0;
let mut possible_comment = false;

for (i, b) in self.as_bytes().iter().cloned().enumerate() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to clone here? Feels like it shouldn't be necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can inspect a u8 instead of an &u8. The alternative is to pattern match: for (i, &b) in ..., which is arguably nicer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter does seem nicer

@nrc
Copy link
Member

nrc commented Jun 12, 2015

The new approach looks great! I think the only thing to do is to iterate over chars rather than bytes.

@marcusklaas
Copy link
Contributor Author

You have convinced me. It now iterates over chars. If we'd ever want the last drop of performance, we should probably use something like jetscii anyways.

nrc added a commit that referenced this pull request Jun 15, 2015
Deal with problematic characters in comments
@nrc nrc merged commit 129e2f1 into rust-lang:master Jun 15, 2015
@marcusklaas marcusklaas deleted the find-uncommented branch September 11, 2015 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants