Commit bc70715

min_df was larger than max_df and outside of the acceptable range of 0.0-1.0 (#1601)
* min_df was larger than max_df and outside of the acceptable range of 0.0 to 1.0. This gave me an error, but changing min_df to 0.2 or 0.02 resolved it. It is unclear whether the author intended min_df to be 0.2 or 0.02.
* Update ntm_20newsgroups_topic_model.ipynb: removed output and changed min_df to a likely better default of 0.2.

Co-authored-by: Aaron Markham <[email protected]>
1 parent d003ef0 commit bc70715
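For context: the scikit-learn documentation for CountVectorizer defines min_df and max_df the same way. A float in [0.0, 1.0] is a proportion of documents, and an int is an absolute document count, so min_df=2 means "in at least two documents" and max_df=0.95 means "in at most 95% of documents". The two bounds are only compared after both are converted to document counts. The sketch below applies that rule; doc_count_bounds is a hypothetical helper written for illustration, not a scikit-learn function, and its validation and error messages are not the library's own.

```python
from numbers import Integral


def doc_count_bounds(n_docs, min_df, max_df):
    """Convert min_df/max_df into (low, high) document-count bounds.

    Hypothetical helper (not part of scikit-learn): an int is treated as an
    absolute document count, a float must lie in [0.0, 1.0] and is treated
    as a proportion of the corpus.
    """
    for name, value in (("min_df", min_df), ("max_df", max_df)):
        if isinstance(value, Integral):
            if value < 0:
                raise ValueError(f"{name} as an int must be >= 0, got {value}")
        elif not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} as a float must be in [0.0, 1.0], got {value}")

    # Convert proportions to document counts; ints are already counts.
    low = min_df if isinstance(min_df, Integral) else min_df * n_docs
    high = max_df if isinstance(max_df, Integral) else max_df * n_docs
    if high < low:
        # The bounds only conflict once both are expressed as document counts.
        raise ValueError("max_df corresponds to fewer documents than min_df")
    return low, high


print(doc_count_bounds(1000, 2, 0.95))    # int min_df: at least 2 documents
print(doc_count_bounds(1000, 0.2, 0.95))  # float min_df: at least 20% of documents
```

Under this rule, doc_count_bounds(1, 2, 0.95) raises ValueError because 95% of a one-document corpus is fewer than two documents, and a float such as min_df=2.0 is rejected as out of range, while the int min_df=2 is valid on any corpus of at least a few documents.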

1 file changed (+1, -1 lines)


introduction_to_applying_machine_learning/ntm_20newsgroups_topic_modeling/ntm_20newsgroups_topic_model.ipynb

Lines changed: 1 addition & 1 deletion
@@ -279,7 +279,7 @@
 "print('Tokenizing and counting, this may take a few minutes...')\n",
 "start_time = time.time()\n",
 "vectorizer = CountVectorizer(input='content', analyzer='word', stop_words='english',\n",
-" tokenizer=LemmaTokenizer(), max_features=vocab_size, max_df=0.95, min_df=2)\n",
+" tokenizer=LemmaTokenizer(), max_features=vocab_size, max_df=0.95, min_df=0.2)\n",
 "vectors = vectorizer.fit_transform(data)\n",
 "vocab_list = vectorizer.get_feature_names()\n",
 "print('vocab size:', len(vocab_list))\n",

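In practice, this change switches min_df from an absolute count to a proportion. For any corpus of more than 10 documents, min_df=0.2 (a term must appear in at least 20% of documents) is stricter than min_df=2 (at least two documents), so it prunes the vocabulary much more aggressively. The sketch below illustrates the difference on a made-up toy corpus. prune_vocabulary is a hypothetical, simplified re-implementation of document-frequency pruning, not scikit-learn's code: it tokenizes on whitespace only, ignores the notebook's LemmaTokenizer and English stop words, and breaks max_features ties alphabetically, which is an assumption rather than the library's behavior.

```python
from collections import Counter
from numbers import Integral


def prune_vocabulary(docs, min_df, max_df, max_features=None):
    """Hypothetical sketch of document-frequency pruning (not scikit-learn's code).

    Whitespace tokenization only; no stop words or lemmatization.
    """
    n_docs = len(docs)
    # int -> absolute document count, float -> proportion of the corpus
    low = min_df if isinstance(min_df, Integral) else min_df * n_docs
    high = max_df if isinstance(max_df, Integral) else max_df * n_docs
    if high < low:
        raise ValueError("max_df corresponds to fewer documents than min_df")

    doc_freq = Counter()   # number of documents containing each term
    term_freq = Counter()  # total occurrences of each term across the corpus
    for doc in docs:
        tokens = doc.lower().split()
        term_freq.update(tokens)
        doc_freq.update(set(tokens))

    # Keep terms whose document frequency lies within [low, high].
    kept = [term for term, df in doc_freq.items() if low <= df <= high]
    if max_features is not None:
        # Keep the most frequent terms; alphabetical tie-breaking is an assumption.
        kept = sorted(kept, key=lambda t: (-term_freq[t], t))[:max_features]
    return sorted(kept)


# Made-up 20-document toy corpus.
docs = (
    ["space shuttle launch"] * 12
    + ["hockey game"] * 3
    + ["hockey goal"] * 3
    + ["graphics card", "graphics driver"]
)
print(prune_vocabulary(docs, min_df=2, max_df=0.95))
# ['game', 'goal', 'graphics', 'hockey', 'launch', 'shuttle', 'space']
print(prune_vocabulary(docs, min_df=0.2, max_df=0.95))
# ['hockey', 'launch', 'shuttle', 'space']
```

With 20 documents, min_df=0.2 becomes a threshold of 4 documents, so "game", "goal" (3 documents each) and "graphics" (2 documents) are dropped, while min_df=2 keeps them. On a corpus the size of 20 Newsgroups the gap is far larger, which is worth checking against the printed vocab size.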