Over sampling
OverSampler is an object that over-samples the minority class at random with replacement.
Parameters:
- ratio : Controls the number of new samples to draw. The number of new samples is given by int(ratio * num_minority_samples).
- random_state : Seed for random number generation.
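A minimal pure-Python sketch of what random over-sampling with replacement amounts to, assuming samples are given as lists of feature values. `random_oversample` is a hypothetical helper written for illustration, not the `OverSampler` API; it returns only the new minority samples, which would then be appended to the original data.

```python
import random


def random_oversample(minority, ratio, random_state=None):
    """Hypothetical helper: draw int(ratio * len(minority)) samples with replacement."""
    rng = random.Random(random_state)  # seeded generator plays the role of random_state
    n_new = int(ratio * len(minority))  # int() truncates, as in the formula above
    # random.Random.choices samples with replacement
    return rng.choices(minority, k=n_new)


minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
new = random_oversample(minority, ratio=2.0, random_state=0)
print(len(new))  # 6
```

Every new sample is an exact copy of an existing minority sample, so this adds no new information about the minority region; it only reweights it.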
SMOTE is an object that generates synthetic samples by applying the SMOTE algorithm. New minority samples are generated along the line segments connecting each minority sample to its nearest minority neighbours.
Parameters:
- k : Number of nearest neighbours to use when generating synthetic samples.
- ratio : Controls the number of synthetic samples to generate. The number of new samples is given by int(ratio * num_minority_samples).
- random_state : Seed for random number generation.
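A minimal sketch of SMOTE's generation step, assuming plain lists of features and Euclidean distance. `k_nearest` and `smote` are hypothetical helpers, not this library's API: each synthetic sample is placed at a random point on the segment between a randomly chosen minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i (assumes k < len(points))."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(minority, k, ratio, random_state=None):
    """Hypothetical helper returning int(ratio * len(minority)) synthetic samples."""
    rng = random.Random(random_state)
    n_new = int(ratio * len(minority))
    # Precompute each minority sample's k nearest minority neighbours.
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # random minority sample
        j = rng.choice(neighbours[i])     # one of its k nearest minority neighbours
        gap = rng.random()                # position along the segment, in [0, 1)
        x, nn = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


square = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new = smote(square, k=2, ratio=1.5, random_state=42)
print(len(new))  # 6
```

Because every new point is a convex combination of two minority samples, it lies inside the region spanned by the minority class instead of duplicating an existing point.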
bSMOTE1 is an object that generates synthetic samples by applying the SMOTE algorithm, but only to samples that are near the border between different classes.
Initially, the m nearest neighbours (NNs) of every sample in the minority class are found among all training samples. Minority samples that are completely surrounded by majority samples, i.e. all m nearest neighbours belong to the majority class, are considered to be noise and left out of the process. Samples with at most m/2 NNs from the majority class are considered to be safe, and are also left out of the process.
Samples for which the number of NNs from the majority class is greater than m/2 but less than m are considered in danger (near the borderline) and used to generate synthetic samples. New minority samples are generated along the line segments connecting each danger sample to its nearest minority neighbours.
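The noise/safe/danger rule can be sketched as follows, assuming a binary problem where every non-minority sample counts as majority, Euclidean distance, and these thresholds: all m neighbours from the majority class means noise, more than m/2 but fewer than m means danger, anything else is safe. `classify_minority` and the toy data are hypothetical, for illustration only.

```python
import math


def classify_minority(X, y, minority_label, m):
    """Label each minority sample 'noise', 'safe' or 'danger' (hypothetical helper)."""
    labels = {}
    for i, xi in enumerate(X):
        if y[i] != minority_label:
            continue
        # m nearest neighbours of xi among all samples, excluding xi itself
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: math.dist(xi, X[j]))[:m]
        # any sample not from the minority class counts as majority (binary case)
        n_maj = sum(1 for j in others if y[j] != minority_label)
        if n_maj == m:
            labels[i] = "noise"    # completely surrounded: left out
        elif n_maj > m / 2:
            labels[i] = "danger"   # near the borderline: used for generation
        else:
            labels[i] = "safe"     # at most m/2 majority NNs: left out
    return labels


# Hypothetical toy data: label 1 is the minority class.
X = [[0, 0], [0, 1], [1, 0], [1, 1],      # a tight minority cluster
     [3, 0],                              # minority sample next to majority points
     [10.5, 0.5],                         # minority sample inside a majority cluster
     [3.5, 0], [3, 0.6],                  # majority points near [3, 0]
     [10, 0], [10, 1], [11, 0], [11, 1]]  # majority cluster
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(classify_minority(X, y, minority_label=1, m=3))
# {0: 'safe', 1: 'safe', 2: 'safe', 3: 'safe', 4: 'danger', 5: 'noise'}
```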
Parameters:
- m : Number of nearest neighbours to use when deciding if a sample is in danger.
- k : Number of nearest neighbours to use when generating synthetic samples.
- ratio : Controls the number of synthetic samples to generate. The number of new samples is given by int(ratio * num_minority_samples).
- random_state : Seed for random number generation.
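A minimal sketch of bSMOTE1-style generation under the same assumptions (binary labels, Euclidean distance, hypothetical helper names and toy data). Only danger samples are used as starting points, but their k neighbours are searched among all minority samples. Picking a danger sample uniformly at random for each new sample is a simplifying assumption, not necessarily how this library spreads the budget.

```python
import math
import random


def nearest(X, i, candidates, n):
    """Indices of the n candidates closest to X[i], excluding i itself."""
    pool = [j for j in candidates if j != i]
    return sorted(pool, key=lambda j: math.dist(X[i], X[j]))[:n]


def borderline_smote1(X, y, minority_label, m, k, ratio, random_state=None):
    """Hypothetical helper: SMOTE restricted to minority samples in danger."""
    rng = random.Random(random_state)
    everyone = range(len(X))
    minority = [i for i in everyone if y[i] == minority_label]
    # Danger: more than m/2, but not all m, of the m NNs are from the majority class.
    danger = []
    for i in minority:
        n_maj = sum(y[j] != minority_label for j in nearest(X, i, everyone, m))
        if m / 2 < n_maj < m:
            danger.append(i)
    if not danger:
        return []  # nothing near the border, so nothing is generated
    synthetic = []
    for _ in range(int(ratio * len(minority))):
        i = rng.choice(danger)
        j = rng.choice(nearest(X, i, minority, k))  # a nearest minority neighbour
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


X = [[0, 0], [0, 1], [1, 0], [1, 1], [3, 0], [10.5, 0.5],
     [3.5, 0], [3, 0.6], [10, 0], [10, 1], [11, 0], [11, 1]]
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
out = borderline_smote1(X, y, minority_label=1, m=3, k=2, ratio=1.0, random_state=0)
print(len(out))  # 6
```

In this toy data only [3, 0] is in danger, so every synthetic sample lies between it and its two nearest minority neighbours, [1, 0] and [1, 1].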
bSMOTE2 is an object that generates synthetic samples by applying the SMOTE algorithm, but only to samples that are near the border between different classes.
As in bSMOTE1, the m nearest neighbours (NNs) of every sample in the minority class are found first, among all training samples. Minority samples that are completely surrounded by majority samples, i.e. all m nearest neighbours belong to the majority class, are considered to be noise and left out of the process. Samples with at most m/2 NNs from the majority class are considered to be safe, and are also left out of the process.
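Samples for which the number of NNs from the majority class is greater than m/2 but less than m are considered in danger (near the borderline) and used to generate synthetic samples. As in bSMOTE1, new minority samples are generated along the line segments connecting each danger sample to its nearest minority neighbours. What distinguishes bSMOTE2 from bSMOTE1 is that synthetic samples are also generated along the segments connecting danger samples to their nearest majority neighbours, with the random step limited to at most half the distance so that the new samples stay closer to the minority sample.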
Parameters:
- m : Number of nearest neighbours to use when deciding if a sample is in danger.
- k : Number of nearest neighbours to use when generating synthetic samples.
- ratio : Controls the number of synthetic samples to generate. The number of new samples is given by int(ratio * num_minority_samples).
- random_state : Seed for random number generation.
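A sketch of the bSMOTE2 idea under the same assumptions (binary labels, Euclidean distance, hypothetical helper names and toy data). In Borderline-SMOTE2, danger samples are also interpolated towards their nearest majority neighbours, but only up to halfway (a random step in [0, 0.5)), so the synthetic sample stays on the minority side. Splitting the budget between the two kinds of neighbour with a fair coin, and using k majority neighbours, are assumptions of this sketch; `borderline_smote2` is not this library's API.

```python
import math
import random


def nearest(X, i, candidates, n):
    """Indices of the n candidates closest to X[i], excluding i itself."""
    pool = [j for j in candidates if j != i]
    return sorted(pool, key=lambda j: math.dist(X[i], X[j]))[:n]


def borderline_smote2(X, y, minority_label, m, k, ratio, random_state=None):
    """Hypothetical helper: bSMOTE1 plus shortened steps towards majority NNs."""
    rng = random.Random(random_state)
    everyone = range(len(X))
    minority = [i for i in everyone if y[i] == minority_label]
    majority = [i for i in everyone if y[i] != minority_label]
    # Danger: more than m/2, but not all m, of the m NNs are from the majority class.
    danger = [i for i in minority
              if m / 2 < sum(y[j] != minority_label
                             for j in nearest(X, i, everyone, m)) < m]
    if not danger:
        return []
    synthetic = []
    for _ in range(int(ratio * len(minority))):
        i = rng.choice(danger)
        if rng.random() < 0.5:  # assumption: fair coin between the two cases
            # As in bSMOTE1: anywhere on the segment to a minority neighbour.
            j = rng.choice(nearest(X, i, minority, k))
            gap = rng.random()
        else:
            # bSMOTE2 addition: towards a majority neighbour, at most halfway.
            j = rng.choice(nearest(X, i, majority, k))
            gap = 0.5 * rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


X = [[0, 0], [0, 1], [1, 0], [1, 1], [3, 0], [10.5, 0.5],
     [3.5, 0], [3, 0.6], [10, 0], [10, 1], [11, 0], [11, 1]]
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
out = borderline_smote2(X, y, minority_label=1, m=3, k=2, ratio=1.0, random_state=0)
print(len(out))  # 6
```

The shortened step is the design choice that matters: interpolating all the way to a majority point would place synthetic minority samples inside majority territory.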