Commit f46d224

Pushing the docs to dev/ for branch: main, commit ee4e1637cede6d6d3176233e8e6f134081d4a638
1 parent 1c563a7 commit f46d224

1,592 files changed: +8397 -10034 lines changed

dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: a1dc9aa6e6536d644aee02956e53fe9a
+config: 275383862e516939c5e61555821f4dee
 tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file not shown.

dev/_downloads/09c15b8ca914c1951a06a9ce3431460f/plot_ols_ridge_variance.ipynb

Lines changed: 0 additions & 43 deletions
This file was deleted.

dev/_downloads/5ff7ffaf8076af51ffb8c5732f697c8e/plot_ols.py

Lines changed: 0 additions & 97 deletions
This file was deleted.

dev/_downloads/9d9fb0a68272db3e5ae8ad0da4cbcd69/plot_ols_ridge_variance.py

Lines changed: 0 additions & 62 deletions
This file was deleted.
Binary file not shown.
Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
+"""
+===========================================
+Ordinary Least Squares and Ridge Regression
+===========================================
+
+1. Ordinary Least Squares:
+   We illustrate how to use the ordinary least squares (OLS) model,
+   :class:`~sklearn.linear_model.LinearRegression`, on a single feature of
+   the diabetes dataset. We train on a subset of the data, evaluate on a
+   test set, and visualize the predictions.
+
+2. Ordinary Least Squares and Ridge Regression Variance:
+   We then show how OLS can have high variance when the data is sparse or
+   noisy, by fitting on a very small synthetic sample repeatedly. Ridge
+   regression, :class:`~sklearn.linear_model.Ridge`, reduces this variance
+   by penalizing (shrinking) the coefficients, leading to more stable
+   predictions.
+
+"""
+
+# Authors: The scikit-learn developers
+# SPDX-License-Identifier: BSD-3-Clause
+
+# %%
+# Data Loading and Preparation
+# ----------------------------
+#
+# Load the diabetes dataset. For simplicity, we only keep a single feature in the data.
+# Then, we split the data and target into training and test sets.
+from sklearn.datasets import load_diabetes
+from sklearn.model_selection import train_test_split
+
+X, y = load_diabetes(return_X_y=True)
+X = X[:, [2]]  # Use only one feature
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=20, shuffle=False)
+
+# %%
+# Linear regression model
+# -----------------------
+#
+# We create a linear regression model and fit it on the training data. Note that by
+# default, an intercept is added to the model. We can control this behavior by setting
+# the `fit_intercept` parameter.
+from sklearn.linear_model import LinearRegression
+
+regressor = LinearRegression().fit(X_train, y_train)
+
+# %%
+# Model evaluation
+# ----------------
+#
+# We evaluate the model's performance on the test set using the mean squared error
+# and the coefficient of determination.
+from sklearn.metrics import mean_squared_error, r2_score
+
+y_pred = regressor.predict(X_test)
+
+print(f"Mean squared error: {mean_squared_error(y_test, y_pred):.2f}")
+print(f"Coefficient of determination: {r2_score(y_test, y_pred):.2f}")
+
+# %%
+# Plotting the results
+# --------------------
+#
+# Finally, we visualize the results on the train and test data.
+import matplotlib.pyplot as plt
+
+fig, ax = plt.subplots(ncols=2, figsize=(10, 5), sharex=True, sharey=True)
+
+ax[0].scatter(X_train, y_train, label="Train data points")
+ax[0].plot(
+    X_train,
+    regressor.predict(X_train),
+    linewidth=3,
+    color="tab:orange",
+    label="Model predictions",
+)
+ax[0].set(xlabel="Feature", ylabel="Target", title="Train set")
+ax[0].legend()
+
+ax[1].scatter(X_test, y_test, label="Test data points")
+ax[1].plot(X_test, y_pred, linewidth=3, color="tab:orange", label="Model predictions")
+ax[1].set(xlabel="Feature", ylabel="Target", title="Test set")
+ax[1].legend()
+
+fig.suptitle("Linear Regression")
+
+plt.show()
+
+# %%
+#
+# OLS on this single-feature subset learns a linear function that minimizes
+# the mean squared error on the training data. We can see how well (or poorly)
+# it generalizes by looking at the R^2 score and mean squared error on the
+# test set. In higher dimensions, pure OLS often overfits, especially if the
+# data is noisy. Regularization techniques (like Ridge or Lasso) can help
+# reduce that.
+
+# %%
+# Ordinary Least Squares and Ridge Regression Variance
+# ----------------------------------------------------------
+#
+# Next, we illustrate the problem of high variance more clearly by using
+# a tiny synthetic dataset. We sample only two data points, then repeatedly
+# add small Gaussian noise to them and refit both OLS and Ridge. We plot
+# each new line to see how much OLS can jump around, whereas Ridge remains
+# more stable thanks to its penalty term.
+
+
+import matplotlib.pyplot as plt
+import numpy as np
+
+from sklearn import linear_model
+
+X_train = np.c_[0.5, 1].T
+y_train = [0.5, 1]
+X_test = np.c_[0, 2].T
+
+np.random.seed(0)
+
+classifiers = dict(
+    ols=linear_model.LinearRegression(), ridge=linear_model.Ridge(alpha=0.1)
+)
+
+for name, clf in classifiers.items():
+    fig, ax = plt.subplots(figsize=(4, 3))
+
+    for _ in range(6):
+        this_X = 0.1 * np.random.normal(size=(2, 1)) + X_train
+        clf.fit(this_X, y_train)
+
+        ax.plot(X_test, clf.predict(X_test), color="gray")
+        ax.scatter(this_X, y_train, s=3, c="gray", marker="o", zorder=10)
+
+    clf.fit(X_train, y_train)
+    ax.plot(X_test, clf.predict(X_test), linewidth=2, color="blue")
+    ax.scatter(X_train, y_train, s=30, c="red", marker="+", zorder=10)
+
+    ax.set_title(name)
+    ax.set_xlim(0, 2)
+    ax.set_ylim((0, 1.6))
+    ax.set_xlabel("X")
+    ax.set_ylabel("y")
+
+    fig.tight_layout()
+
+plt.show()
+
+
+# %%
+# Conclusion
+# ----------
+#
+# - In the first example, we applied OLS to a real dataset, showing
+#   how a plain linear model can fit the data by minimizing the squared error
+#   on the training set.
+#
+# - In the second example, OLS lines varied drastically each time noise
+#   was added, reflecting its high variance when data is sparse or noisy. By
+#   contrast, **Ridge** regression introduces a regularization term that shrinks
+#   the coefficients, stabilizing predictions.
+#
+# Techniques like :class:`~sklearn.linear_model.Ridge` or
+# :class:`~sklearn.linear_model.Lasso` (which applies an L1 penalty) are both
+# common ways to improve generalization and reduce overfitting. A well-tuned
+# Ridge or Lasso often outperforms pure OLS when features are correlated, data
+# is noisy, or sample size is small.
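
The variance claim in the added example can also be checked numerically rather than by eye. The following is a minimal sketch and not part of the commit: it reuses the two training points, the noise scale of 0.1 and `alpha=0.1` from the diff, but the number of repeats (`n_repeats`), the `slopes` and `slope_std` names, and the use of a local `np.random.default_rng` generator are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Same two-point training set as in the committed example.
X_train = np.array([[0.5], [1.0]])
y_train = np.array([0.5, 1.0])

# Assumption: a local generator and 200 repeats (the example plots only six fits).
rng = np.random.default_rng(0)
n_repeats = 200

slopes = {"ols": [], "ridge": []}
for _ in range(n_repeats):
    # Perturb the inputs with Gaussian noise of scale 0.1, as in the example.
    this_X = X_train + 0.1 * rng.normal(size=(2, 1))
    slopes["ols"].append(LinearRegression().fit(this_X, y_train).coef_[0])
    slopes["ridge"].append(Ridge(alpha=0.1).fit(this_X, y_train).coef_[0])

# Spread of the fitted slope across the noisy refits, per model.
slope_std = {name: float(np.std(values)) for name, values in slopes.items()}
for name, std in slope_std.items():
    print(f"{name}: standard deviation of fitted slope = {std:.3f}")
```

With only two points, the OLS slope is the ratio of the target gap to the perturbed input gap, so it swings widely whenever the two inputs move closer together. The Ridge penalty keeps the slope bounded, which is why its spread should come out smaller here.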