Commit 5036cd7

Merge pull request #77 from awslabs/zdwolfe-docbash_xgboost_direct_marketing_grammar_2
xgboost_direct_marketing: Fix a grammar error
2 parents 176578f + 2a098c6 commit 5036cd7

1 file changed: 1 addition (+1), 1 deletion (-1)

introduction_to_applying_machine_learning/xgboost_direct_marketing/xgboost_direct_marketing_sagemaker.ipynb

Lines changed: 1 addition & 1 deletion
@@ -286,7 +286,7 @@
 "\n",
 "* Handling missing values: Some machine learning algorithms are capable of handling missing values, but most would rather not. Options include:\n",
 " * Removing observations with missing values: This works well if only a very small fraction of observations have incomplete information.\n",
-" * Remove features with missing values: This works well if there are a small number of features which have a large number of missing values.\n",
+" * Removing features with missing values: This works well if there are a small number of features which have a large number of missing values.\n",
 " * Imputing missing values: Entire [books](https://www.amazon.com/Flexible-Imputation-Missing-Interdisciplinary-Statistics/dp/1439868247) have been written on this topic, but common choices are replacing the missing value with the mode or mean of that column's non-missing values.\n",
 "* Converting categorical to numeric: The most common method is one hot encoding, which for each feature maps every distinct value of that column to its own feature which takes a value of 1 when the categorical feature is equal to that value, and 0 otherwise.\n",
 "* Oddly distributed data: Although for non-linear models like Gradient Boosted Trees, this has very limited implications, parametric models like regression can produce wildly inaccurate estimates when fed highly skewed data. In some cases, simply taking the natural log of the features is sufficient to produce more normally distributed data. In others, bucketing values into discrete ranges is helpful. These buckets can then be treated as categorical variables and included in the model when one hot encoded.\n",
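The preprocessing options listed in the notebook text above (dropping incomplete observations, dropping mostly-missing features, mean/mode imputation, one hot encoding, log transforms, and bucketing) can be sketched in plain Python. This is a minimal, dependency-free illustration, assuming each observation is a dict and `None` marks a missing value. The helper names, the `max_missing_frac` threshold, the bucket edges, and the toy rows are all hypothetical and are not taken from the notebook. In practice, pandas offers equivalents such as `DataFrame.dropna`, `DataFrame.fillna`, and `pd.get_dummies`.

```python
import bisect
import math
import statistics
from typing import Any, Dict, List, Sequence

# Hypothetical helpers for illustration; not the notebook's own code.
Row = Dict[str, Any]  # one observation: feature name -> value (None = missing)


def drop_incomplete_rows(rows: List[Row]) -> List[Row]:
    """Remove observations that have any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_sparse_features(rows: List[Row], max_missing_frac: float) -> List[Row]:
    """Remove features whose share of missing values exceeds max_missing_frac."""
    if not rows:
        return []
    keep = [k for k in rows[0]
            if sum(r[k] is None for r in rows) / len(rows) <= max_missing_frac]
    return [{k: r[k] for k in keep} for r in rows]


def impute(rows: List[Row], column: str, strategy: str = "mean") -> List[Row]:
    """Fill missing values with the mean or mode of the column's non-missing values."""
    present = [r[column] for r in rows if r[column] is not None]
    fill = statistics.mean(present) if strategy == "mean" else statistics.mode(present)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows: List[Row], column: str) -> List[Row]:
    """Replace a categorical column with one 0/1 feature per distinct value."""
    levels = sorted({r[column] for r in rows})
    out = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for level in levels:
            new[f"{column}_{level}"] = int(r[column] == level)
        out.append(new)
    return out


def log_transform(rows: List[Row], column: str) -> List[Row]:
    """Take the natural log of a strictly positive, right-skewed column."""
    return [{**r, column: math.log(r[column])} for r in rows]


def bucketize(rows: List[Row], column: str, edges: Sequence[float]) -> List[Row]:
    """Replace a numeric value with the index of the range it falls in (edges sorted)."""
    return [{**r, column: bisect.bisect_right(edges, r[column])} for r in rows]


# Hypothetical toy data, not drawn from the notebook's dataset.
rows = [
    {"age": 30, "job": "admin", "balance": 100.0, "notes": None},
    {"age": None, "job": "services", "balance": 2500.0, "notes": None},
    {"age": 45, "job": "admin", "balance": 10.0, "notes": "x"},
]
rows = drop_sparse_features(rows, max_missing_frac=0.5)  # drops "notes" (2 of 3 missing)
rows = impute(rows, "age", strategy="mean")              # missing age -> mean of 30 and 45
rows = log_transform(rows, "balance")
rows = bucketize(rows, "age", edges=[35, 60])            # 0: <35, 1: 35 to <60, 2: >=60
rows = one_hot(rows, "age")                              # bucket index treated as a category
rows = one_hot(rows, "job")
print(rows[0])
```

Bucketing before one hot encoding mirrors the last bullet: the bucket index becomes a categorical value that is then expanded into indicator features. The log transform here assumes strictly positive values; a column containing zeros would need a shift, for example `math.log1p`.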
