Skip to content

Improve method name suggestions #95119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 4, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
73 changes: 71 additions & 2 deletions compiler/rustc_span/src/lev_distance.rs
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,62 @@ pub fn lev_distance(a: &str, b: &str, limit: usize) -> Option<usize> {
(dcol[m] <= limit).then_some(dcol[m])
}

/// Provides a word similarity score between two words that accounts for substrings being more
/// meaningful than a typical Levenshtein distance. The lower the score, the closer the match.
/// 0 is an identical match.
///
/// Uses the Levenshtein distance between the two strings and removes the cost of the length
/// difference. If this is 0 then it is either a substring match or a full word match, in the
/// substring match case we detect this and return `1`. To prevent finding meaningless substrings,
/// eg. "in" in "shrink", we only perform this subtraction of length difference if one of the words
/// is not greater than twice the length of the other. For cases where the words are close in size
/// but not an exact substring then the cost of the length difference is discounted by half.
///
/// Returns `None` if the distance exceeds the limit.
pub fn lev_distance_with_substrings(a: &str, b: &str, limit: usize) -> Option<usize> {
let n = a.chars().count();
let m = b.chars().count();

// Check one isn't less than half the length of the other. If this is true then there is a
// big difference in length.
let big_len_diff = (n * 2) < m || (m * 2) < n;
let len_diff = if n < m { m - n } else { n - m };
let lev = lev_distance(a, b, limit + len_diff)?;

// This is the crux, subtracting length difference means exact substring matches will now be 0
let score = lev - len_diff;

// If the score is 0 but the words have different lengths then it's a substring match not a full
// word match
let score = if score == 0 && len_diff > 0 && !big_len_diff {
1 // Exact substring match, but not a total word match so return non-zero
} else if !big_len_diff {
// Not a big difference in length, discount cost of length difference
score + (len_diff + 1) / 2
} else {
// A big difference in length, add back the difference in length to the score
score + len_diff
};

(score <= limit).then_some(score)
}

/// Finds the best match for given word in the given iterator where substrings are meaningful.
///
/// A version of [`find_best_match_for_name`] that uses [`lev_distance_with_substrings`] as the score
/// for word similarity. This takes an optional distance limit which defaults to one-third of the
/// given word.
///
/// Besides the modified Levenshtein, we use case insensitive comparison to improve accuracy
/// on an edge case with a lower(upper)case letters mismatch.
pub fn find_best_match_for_name_with_substrings(
candidates: &[Symbol],
lookup: Symbol,
dist: Option<usize>,
) -> Option<Symbol> {
find_best_match_for_name_impl(true, candidates, lookup, dist)
}

/// Finds the best match for a given word in the given iterator.
///
/// As a loose rule to avoid the obviously incorrect suggestions, it takes
Expand All @@ -54,11 +110,20 @@ pub fn lev_distance(a: &str, b: &str, limit: usize) -> Option<usize> {
///
/// Besides Levenshtein, we use case insensitive comparison to improve accuracy
/// on an edge case with a lower(upper)case letters mismatch.
#[cold]
pub fn find_best_match_for_name(
candidates: &[Symbol],
lookup: Symbol,
dist: Option<usize>,
) -> Option<Symbol> {
find_best_match_for_name_impl(false, candidates, lookup, dist)
}

#[cold]
fn find_best_match_for_name_impl(
use_substring_score: bool,
candidates: &[Symbol],
lookup: Symbol,
dist: Option<usize>,
) -> Option<Symbol> {
let lookup = lookup.as_str();
let lookup_uppercase = lookup.to_uppercase();
Expand All @@ -74,7 +139,11 @@ pub fn find_best_match_for_name(
let mut dist = dist.unwrap_or_else(|| cmp::max(lookup.len(), 3) / 3);
let mut best = None;
for c in candidates {
match lev_distance(lookup, c.as_str(), dist) {
match if use_substring_score {
lev_distance_with_substrings(lookup, c.as_str(), dist)
} else {
lev_distance(lookup, c.as_str(), dist)
} {
Some(0) => return Some(*c),
Some(d) => {
dist = d - 1;
Expand Down
11 changes: 11 additions & 0 deletions compiler/rustc_span/src/lev_distance/tests.rs
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,17 @@ fn test_lev_distance_limit() {
assert_eq!(lev_distance("abc", "xyz", 2), None);
}

#[test]
fn test_method_name_similarity_score() {
assert_eq!(lev_distance_with_substrings("empty", "is_empty", 1), Some(1));
assert_eq!(lev_distance_with_substrings("shrunk", "rchunks", 2), None);
assert_eq!(lev_distance_with_substrings("abc", "abcd", 1), Some(1));
assert_eq!(lev_distance_with_substrings("a", "abcd", 1), None);
assert_eq!(lev_distance_with_substrings("edf", "eq", 1), None);
assert_eq!(lev_distance_with_substrings("abc", "xyz", 3), Some(3));
assert_eq!(lev_distance_with_substrings("abcdef", "abcdef", 2), Some(0));
}

#[test]
fn test_find_best_match_for_name() {
use crate::create_default_session_globals_then;
Expand Down
13 changes: 10 additions & 3 deletions compiler/rustc_typeck/src/check/method/probe.rs
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,9 @@ use rustc_middle::ty::GenericParamDefKind;
use rustc_middle::ty::{self, ParamEnvAnd, ToPredicate, Ty, TyCtxt, TypeFoldable};
use rustc_session::lint;
use rustc_span::def_id::LocalDefId;
use rustc_span::lev_distance::{find_best_match_for_name, lev_distance};
use rustc_span::lev_distance::{
find_best_match_for_name_with_substrings, lev_distance_with_substrings,
};
use rustc_span::{symbol::Ident, Span, Symbol, DUMMY_SP};
use rustc_trait_selection::autoderef::{self, Autoderef};
use rustc_trait_selection::traits::query::evaluate_obligation::InferCtxtExt;
Expand Down Expand Up @@ -1699,7 +1701,11 @@ impl<'a, 'tcx> ProbeContext<'a, 'tcx> {
.iter()
.map(|cand| cand.name)
.collect::<Vec<Symbol>>();
find_best_match_for_name(&names, self.method_name.unwrap().name, None)
find_best_match_for_name_with_substrings(
&names,
self.method_name.unwrap().name,
None,
)
}
.unwrap();
Ok(applicable_close_candidates.into_iter().find(|method| method.name == best_name))
Expand Down Expand Up @@ -1856,7 +1862,8 @@ impl<'a, 'tcx> ProbeContext<'a, 'tcx> {
if x.kind.namespace() != Namespace::ValueNS {
return false;
}
match lev_distance(name.as_str(), x.name.as_str(), max_dist) {
match lev_distance_with_substrings(name.as_str(), x.name.as_str(), max_dist)
{
Some(d) => d > 0,
None => false,
}
Expand Down
5 changes: 4 additions & 1 deletion src/test/ui/associated-item/associated-item-enum.stderr
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,10 @@ LL | enum Enum { Variant }
| --------- variant or associated item `mispellable_trait` not found here
...
LL | Enum::mispellable_trait();
| ^^^^^^^^^^^^^^^^^ variant or associated item not found in `Enum`
| ^^^^^^^^^^^^^^^^^
| |
| variant or associated item not found in `Enum`
| help: there is an associated function with a similar name: `misspellable`

error[E0599]: no variant or associated item named `MISPELLABLE` found for enum `Enum` in the current scope
--> $DIR/associated-item-enum.rs:19:11
Expand Down
2 changes: 1 addition & 1 deletion src/test/ui/derives/issue-91550.stderr
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ LL | struct Value(u32);
| doesn't satisfy `Value: Hash`
...
LL | hs.insert(Value(0));
| ^^^^^^ method cannot be called on `HashSet<Value>` due to unsatisfied trait bounds
| ^^^^^^
|
= note: the following trait bounds were not satisfied:
`Value: Eq`
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ error[E0599]: the method `as_deref_mut` exists for enum `Option<{integer}>`, but
--> $DIR/option-as_deref_mut.rs:2:33
|
LL | let _result = &mut Some(42).as_deref_mut();
| ^^^^^^^^^^^^ method cannot be called on `Option<{integer}>` due to unsatisfied trait bounds
| ^^^^^^^^^^^^
|
= note: the following trait bounds were not satisfied:
`{integer}: Deref`
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ error[E0599]: the method `as_deref_mut` exists for enum `Result<{integer}, _>`,
--> $DIR/result-as_deref_mut.rs:2:31
|
LL | let _result = &mut Ok(42).as_deref_mut();
| ^^^^^^^^^^^^ method cannot be called on `Result<{integer}, _>` due to unsatisfied trait bounds
| ^^^^^^^^^^^^
|
= note: the following trait bounds were not satisfied:
`{integer}: Deref`
Expand Down
4 changes: 4 additions & 0 deletions src/test/ui/rust-2018/trait-import-suggestions.stderr
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,10 @@ help: the following trait is implemented but not in scope; perhaps add a `use` f
|
LL | use std::str::FromStr;
|
help: there is an associated function with a similar name
|
LL | let y = u32::from_str_radix("33");
| ~~~~~~~~~~~~~~

error: aborting due to 4 previous errors

Expand Down
2 changes: 1 addition & 1 deletion src/test/ui/suggestions/suggest-methods.stderr
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ error[E0599]: no method named `count_o` found for type `u32` in the current scop
--> $DIR/suggest-methods.rs:28:19
|
LL | let _ = 63u32.count_o();
| ^^^^^^^ method not found in `u32`
| ^^^^^^^ help: there is an associated function with a similar name: `count_ones`

error: aborting due to 4 previous errors

Expand Down