
Commit 0d92612

Pushing the docs to dev/ for branch: main, commit 610d4f7ae65fe8aa0bd923e8f8b00a0eb9600594
1 parent 75f330d commit 0d92612

File tree

1,261 files changed: +6214 / -6151 lines


dev/_downloads/21a6ff17ef2837fe1cd49e63223a368d/plot_unveil_tree_structure.py

Lines changed: 25 additions & 10 deletions
@@ -68,7 +68,8 @@
 # - ``weighted_n_node_samples[i]``: the weighted number of training samples
 #   reaching node ``i``
 # - ``value[i, j, k]``: the summary of the training samples that reached node i for
-#   output j and class k (for regression tree, class is set to 1).
+#   output j and class k (for regression tree, class is set to 1). See below
+#   for more information about ``value``.
 #
 # Using the arrays, we can traverse the tree structure to compute various
 # properties. Below, we will compute the depth of each node and whether or not
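For readers following along, a minimal sketch of inspecting these parallel arrays on a fitted tree. It assumes the iris setup this example uses in its unchanged earlier lines (`max_leaf_nodes=3, random_state=0`), which is not shown in the hunk above:

# Minimal sketch (assumes the example's own setup, not shown in this hunk):
# fit a small tree on iris and inspect the parallel arrays of `tree_`.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0).fit(X_train, y_train)

tree_ = clf.tree_
print(tree_.node_count)                      # total number of nodes
print(tree_.children_left)                   # -1 marks a leaf node
print(tree_.children_right)
print(tree_.feature[0], tree_.threshold[0])  # split feature and threshold at the root
print(tree_.value[0])                        # summary of training samples reaching the root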
@@ -108,7 +109,7 @@
     if is_leaves[i]:
         print(
             "{space}node={node} is a leaf node with value={value}.".format(
-                space=node_depth[i] * "\t", node=i, value=values[i]
+                space=node_depth[i] * "\t", node=i, value=np.around(values[i], 3)
             )
         )
     else:
@@ -122,24 +123,36 @@
                 feature=feature[i],
                 threshold=threshold[i],
                 right=children_right[i],
-                value=values[i],
+                value=np.around(values[i], 3),
             )
         )

 # %%
 # What is the values array used here?
 # -----------------------------------
 # The `tree_.value` array is a 3D array of shape
-# [``n_nodes``, ``n_classes``, ``n_outputs``] which provides the count of samples
-# reaching a node for each class and for each output. Each node has a ``value``
-# array which is the number of weighted samples reaching this
-# node for each output and class.
+# [``n_nodes``, ``n_classes``, ``n_outputs``] which provides the proportion of samples
+# reaching a node for each class and for each output.
+# Each node has a ``value`` array which is the proportion of weighted samples reaching
+# this node for each output and class with respect to the parent node.
+#
+# One could convert this to the absolute weighted number of samples reaching a node,
+# by multiplying this number by `tree_.weighted_n_node_samples[node_idx]` for the
+# given node. Note sample weights are not used in this example, so the weighted
+# number of samples is the number of samples reaching the node because each sample
+# has a weight of 1 by default.
 #
 # For example, in the above tree built on the iris dataset, the root node has
-# ``value = [37, 34, 41]``, indicating there are 37 samples
+# ``value = [0.33, 0.304, 0.366]`` indicating there are 33% of class 0 samples,
+# 30.4% of class 1 samples, and 36.6% of class 2 samples at the root node. One can
+# convert this to the absolute number of samples by multiplying by the number of
+# samples reaching the root node, which is `tree_.weighted_n_node_samples[0]`.
+# Then the root node has ``value = [37, 34, 41]``, indicating there are 37 samples
 # of class 0, 34 samples of class 1, and 41 samples of class 2 at the root node.
+#
 # Traversing the tree, the samples are split and as a result, the ``value`` array
-# reaching each node changes. The left child of the root node has ``value = [37, 0, 0]``
+# reaching each node changes. The left child of the root node has ``value = [1., 0, 0]``
+# (or ``value = [37, 0, 0]`` when converted to the absolute number of samples)
 # because all 37 samples in the left child node are from class 0.
 #
 # Note: In this example, `n_outputs=1`, but the tree classifier can also handle
@@ -148,8 +161,10 @@

 ##############################################################################
 # We can compare the above output to the plot of the decision tree.
+# Here, we show the proportions of samples of each class that reach each
+# node corresponding to the actual elements of `tree_.value` array.

-tree.plot_tree(clf)
+tree.plot_tree(clf, proportion=True)
 plt.show()

 ##############################################################################
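The proportion-to-count conversion described in the hunks above can be checked directly. A minimal sketch, assuming the example's iris setup and a scikit-learn version where `tree_.value` stores class proportions (the behavior this commit documents):

# Minimal sketch: recover absolute sample counts from the proportions in
# `tree_.value` (assumes a scikit-learn version where `value` holds
# per-class proportions, as documented by this commit).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0).fit(X_train, y_train)

proportions = clf.tree_.value[0]                # class proportions at the root
n_root = clf.tree_.weighted_n_node_samples[0]   # weighted samples reaching the root
print(np.around(proportions, 3))                # e.g. [[0.33  0.304 0.366]]
print(np.around(proportions * n_root, 0))       # absolute counts, e.g. [[37. 34. 41.]]

These proportions are also what `tree.plot_tree(clf, proportion=True)` renders in each node box, which is why the example switches that parameter on.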

dev/_downloads/f7a387851c5762610f4e8197e52bbbca/plot_unveil_tree_structure.ipynb

Lines changed: 5 additions & 5 deletions
@@ -40,7 +40,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Tree structure\n\nThe decision classifier has an attribute called ``tree_`` which allows access\nto low level attributes such as ``node_count``, the total number of nodes,\nand ``max_depth``, the maximal depth of the tree. The\n``tree_.compute_node_depths()`` method computes the depth of each node in the\ntree. `tree_` also stores the entire binary tree structure, represented as a\nnumber of parallel arrays. The i-th element of each array holds information\nabout the node ``i``. Node 0 is the tree's root. Some of the arrays only\napply to either leaves or split nodes. In this case the values of the nodes\nof the other type is arbitrary. For example, the arrays ``feature`` and\n``threshold`` only apply to split nodes. The values for leaf nodes in these\narrays are therefore arbitrary.\n\nAmong these arrays, we have:\n\n - ``children_left[i]``: id of the left child of node ``i`` or -1 if leaf\n node\n - ``children_right[i]``: id of the right child of node ``i`` or -1 if leaf\n node\n - ``feature[i]``: feature used for splitting node ``i``\n - ``threshold[i]``: threshold value at node ``i``\n - ``n_node_samples[i]``: the number of training samples reaching node\n ``i``\n - ``impurity[i]``: the impurity at node ``i``\n - ``weighted_n_node_samples[i]``: the weighted number of training samples\n reaching node ``i``\n - ``value[i, j, k]``: the summary of the training samples that reached node i for\n output j and class k (for regression tree, class is set to 1).\n\nUsing the arrays, we can traverse the tree structure to compute various\nproperties. Below, we will compute the depth of each node and whether or not\nit is a leaf.\n\n"
+    "## Tree structure\n\nThe decision classifier has an attribute called ``tree_`` which allows access\nto low level attributes such as ``node_count``, the total number of nodes,\nand ``max_depth``, the maximal depth of the tree. The\n``tree_.compute_node_depths()`` method computes the depth of each node in the\ntree. `tree_` also stores the entire binary tree structure, represented as a\nnumber of parallel arrays. The i-th element of each array holds information\nabout the node ``i``. Node 0 is the tree's root. Some of the arrays only\napply to either leaves or split nodes. In this case the values of the nodes\nof the other type is arbitrary. For example, the arrays ``feature`` and\n``threshold`` only apply to split nodes. The values for leaf nodes in these\narrays are therefore arbitrary.\n\nAmong these arrays, we have:\n\n - ``children_left[i]``: id of the left child of node ``i`` or -1 if leaf\n node\n - ``children_right[i]``: id of the right child of node ``i`` or -1 if leaf\n node\n - ``feature[i]``: feature used for splitting node ``i``\n - ``threshold[i]``: threshold value at node ``i``\n - ``n_node_samples[i]``: the number of training samples reaching node\n ``i``\n - ``impurity[i]``: the impurity at node ``i``\n - ``weighted_n_node_samples[i]``: the weighted number of training samples\n reaching node ``i``\n - ``value[i, j, k]``: the summary of the training samples that reached node i for\n output j and class k (for regression tree, class is set to 1). See below\n for more information about ``value``.\n\nUsing the arrays, we can traverse the tree structure to compute various\nproperties. Below, we will compute the depth of each node and whether or not\nit is a leaf.\n\n"
    ]
   },
   {
@@ -51,21 +51,21 @@
   },
    "outputs": [],
    "source": [
-    "n_nodes = clf.tree_.node_count\nchildren_left = clf.tree_.children_left\nchildren_right = clf.tree_.children_right\nfeature = clf.tree_.feature\nthreshold = clf.tree_.threshold\nvalues = clf.tree_.value\n\nnode_depth = np.zeros(shape=n_nodes, dtype=np.int64)\nis_leaves = np.zeros(shape=n_nodes, dtype=bool)\nstack = [(0, 0)] # start with the root node id (0) and its depth (0)\nwhile len(stack) > 0:\n # `pop` ensures each node is only visited once\n node_id, depth = stack.pop()\n node_depth[node_id] = depth\n\n # If the left and right child of a node is not the same we have a split\n # node\n is_split_node = children_left[node_id] != children_right[node_id]\n # If a split node, append left and right children and depth to `stack`\n # so we can loop through them\n if is_split_node:\n stack.append((children_left[node_id], depth + 1))\n stack.append((children_right[node_id], depth + 1))\n else:\n is_leaves[node_id] = True\n\nprint(\n \"The binary tree structure has {n} nodes and has \"\n \"the following tree structure:\\n\".format(n=n_nodes)\n)\nfor i in range(n_nodes):\n if is_leaves[i]:\n print(\n \"{space}node={node} is a leaf node with value={value}.\".format(\n space=node_depth[i] * \"\\t\", node=i, value=values[i]\n )\n )\n else:\n print(\n \"{space}node={node} is a split node with value={value}: \"\n \"go to node {left} if X[:, {feature}] <= {threshold} \"\n \"else to node {right}.\".format(\n space=node_depth[i] * \"\\t\",\n node=i,\n left=children_left[i],\n feature=feature[i],\n threshold=threshold[i],\n right=children_right[i],\n value=values[i],\n )\n )"
+    "n_nodes = clf.tree_.node_count\nchildren_left = clf.tree_.children_left\nchildren_right = clf.tree_.children_right\nfeature = clf.tree_.feature\nthreshold = clf.tree_.threshold\nvalues = clf.tree_.value\n\nnode_depth = np.zeros(shape=n_nodes, dtype=np.int64)\nis_leaves = np.zeros(shape=n_nodes, dtype=bool)\nstack = [(0, 0)] # start with the root node id (0) and its depth (0)\nwhile len(stack) > 0:\n # `pop` ensures each node is only visited once\n node_id, depth = stack.pop()\n node_depth[node_id] = depth\n\n # If the left and right child of a node is not the same we have a split\n # node\n is_split_node = children_left[node_id] != children_right[node_id]\n # If a split node, append left and right children and depth to `stack`\n # so we can loop through them\n if is_split_node:\n stack.append((children_left[node_id], depth + 1))\n stack.append((children_right[node_id], depth + 1))\n else:\n is_leaves[node_id] = True\n\nprint(\n \"The binary tree structure has {n} nodes and has \"\n \"the following tree structure:\\n\".format(n=n_nodes)\n)\nfor i in range(n_nodes):\n if is_leaves[i]:\n print(\n \"{space}node={node} is a leaf node with value={value}.\".format(\n space=node_depth[i] * \"\\t\", node=i, value=np.around(values[i], 3)\n )\n )\n else:\n print(\n \"{space}node={node} is a split node with value={value}: \"\n \"go to node {left} if X[:, {feature}] <= {threshold} \"\n \"else to node {right}.\".format(\n space=node_depth[i] * \"\\t\",\n node=i,\n left=children_left[i],\n feature=feature[i],\n threshold=threshold[i],\n right=children_right[i],\n value=np.around(values[i], 3),\n )\n )"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## What is the values array used here?\nThe `tree_.value` array is a 3D array of shape\n[``n_nodes``, ``n_classes``, ``n_outputs``] which provides the count of samples\nreaching a node for each class and for each output. Each node has a ``value``\narray which is the number of weighted samples reaching this\nnode for each output and class.\n\nFor example, in the above tree built on the iris dataset, the root node has\n``value = [37, 34, 41]``, indicating there are 37 samples\nof class 0, 34 samples of class 1, and 41 samples of class 2 at the root node.\nTraversing the tree, the samples are split and as a result, the ``value`` array\nreaching each node changes. The left child of the root node has ``value = [37, 0, 0]``\nbecause all 37 samples in the left child node are from class 0.\n\nNote: In this example, `n_outputs=1`, but the tree classifier can also handle\nmulti-output problems. The `value` array at each node would just be a 2D\narray instead.\n\n"
+    "## What is the values array used here?\nThe `tree_.value` array is a 3D array of shape\n[``n_nodes``, ``n_classes``, ``n_outputs``] which provides the proportion of samples\nreaching a node for each class and for each output.\nEach node has a ``value`` array which is the proportion of weighted samples reaching\nthis node for each output and class with respect to the parent node.\n\nOne could convert this to the absolute weighted number of samples reaching a node,\nby multiplying this number by `tree_.weighted_n_node_samples[node_idx]` for the\ngiven node. Note sample weights are not used in this example, so the weighted\nnumber of samples is the number of samples reaching the node because each sample\nhas a weight of 1 by default.\n\nFor example, in the above tree built on the iris dataset, the root node has\n``value = [0.33, 0.304, 0.366]`` indicating there are 33% of class 0 samples,\n30.4% of class 1 samples, and 36.6% of class 2 samples at the root node. One can\nconvert this to the absolute number of samples by multiplying by the number of\nsamples reaching the root node, which is `tree_.weighted_n_node_samples[0]`.\nThen the root node has ``value = [37, 34, 41]``, indicating there are 37 samples\nof class 0, 34 samples of class 1, and 41 samples of class 2 at the root node.\n\nTraversing the tree, the samples are split and as a result, the ``value`` array\nreaching each node changes. The left child of the root node has ``value = [1., 0, 0]``\n(or ``value = [37, 0, 0]`` when converted to the absolute number of samples)\nbecause all 37 samples in the left child node are from class 0.\n\nNote: In this example, `n_outputs=1`, but the tree classifier can also handle\nmulti-output problems. The `value` array at each node would just be a 2D\narray instead.\n\n"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We can compare the above output to the plot of the decision tree.\n\n"
+    "We can compare the above output to the plot of the decision tree.\nHere, we show the proportions of samples of each class that reach each\nnode corresponding to the actual elements of `tree_.value` array.\n\n"
    ]
   },
   {
@@ -76,7 +76,7 @@
   },
    "outputs": [],
    "source": [
-    "tree.plot_tree(clf)\nplt.show()"
+    "tree.plot_tree(clf, proportion=True)\nplt.show()"
    ]
   },
   {

dev/_downloads/scikit-learn-docs.zip

Binary file not shown (size change: 2.56 KB).

dev/_sources/auto_examples/applications/plot_cyclical_feature_engineering.rst.txt

Lines changed: 1 addition & 1 deletion

dev/_sources/auto_examples/applications/plot_digits_denoising.rst.txt

Lines changed: 1 addition & 1 deletion

dev/_sources/auto_examples/applications/plot_face_recognition.rst.txt

Lines changed: 5 additions & 5 deletions

dev/_sources/auto_examples/applications/plot_model_complexity_influence.rst.txt

Lines changed: 15 additions & 15 deletions
