Commit 4396440

Merge tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:

"This time (again) cpufreq gets the majority of changes which mostly are driver updates (including a major consolidation of intel_pstate), some schedutil governor modifications and core cleanups.

There also are some changes in the system suspend area, mostly related to diagnostics and debug messages plus some renames of things related to suspend-to-idle. One major change here is that suspend-to-idle is now going to be preferred over S3 on systems where the ACPI tables indicate to do so and provide requisite support (the Low Power Idle S0 _DSM in particular). The system sleep documentation and the tools related to it are updated too.

The rest is a few cpuidle changes (nothing major), devfreq updates, generic power domains (genpd) framework updates and a few assorted modifications elsewhere.

Specifics:

 - Drop the P-state selection algorithm based on a PID controller from intel_pstate and make it use the same P-state selection method (based on the CPU load) for all types of systems in the active mode (Rafael Wysocki, Srinivas Pandruvada).

 - Rework the cpufreq core and governors to make it possible to take cross-CPU utilization updates into account and modify the schedutil governor to actually do so (Viresh Kumar).

 - Clean up the handling of transition latency information in the cpufreq core and untangle it from the information on which drivers cannot do dynamic frequency switching (Viresh Kumar).

 - Add support for new SoCs (MT2701/MT7623 and MT7622) to the mediatek cpufreq driver and update its DT bindings (Sean Wang).

 - Modify the cpufreq dt-platdev driver to automatically create cpufreq devices for the new (v2) Operating Performance Points (OPP) DT bindings and update its whitelist of supported systems (Viresh Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen, Finley Xiao).

 - Add support for Ux500 to the cpufreq-dt driver and drop the obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann).

 - Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem Nguyen).

 - Fix and clean up assorted issues in the cpufreq drivers and core (Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva, Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla).

 - Update the IO-wait boost handling in the schedutil governor to make it less aggressive (Joel Fernandes).

 - Rework system suspend diagnostics to make it print fewer messages to the kernel log by default, add a sysfs knob to allow more suspend-related messages to be printed and add Low Power S0 Idle constraints checks to the ACPI suspend-to-idle code (Rafael Wysocki, Srinivas Pandruvada).

 - Prefer suspend-to-idle over S3 on ACPI-based systems with the ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM interface present in the ACPI tables (Rafael Wysocki).

 - Update documentation related to system sleep and rename a number of items in the code to make it clearer that they are related to suspend-to-idle (Rafael Wysocki).

 - Export a variable allowing device drivers to check the target system sleep state from the core system suspend code (Florian Fainelli).

 - Clean up the cpuidle subsystem to handle the polling state on x86 in a more straightforward way and to use %pOF instead of full_name (Rafael Wysocki, Rob Herring).

 - Update the devfreq framework to fix and clean up a few minor issues (Chanwoo Choi, Rob Herring).

 - Extend diagnostics in the generic power domains (genpd) framework and clean it up slightly (Thara Gopinath, Rob Herring).

 - Fix and clean up a couple of issues in the operating performance points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz).

 - Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling (AVS) driver (David Wu).

 - Fix the usage of notifiers in CPU power management on some platforms (Alex Shi).

 - Update the pm-graph system suspend/hibernation and boot profiling utility (Todd Brandt).

 - Make it possible to run the cpupower utility without CPU0 (Prarit Bhargava)"

* tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (87 commits)
  cpuidle: Make drivers initialize polling state
  cpuidle: Move polling state initialization code to separate file
  cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol
  cpufreq: imx6q: Fix imx6sx low frequency support
  cpufreq: speedstep-lib: make several arrays static, makes code smaller
  PM: docs: Delete the obsolete states.txt document
  PM: docs: Describe high-level PM strategies and sleep states
  PM / devfreq: Fix memory leak when fail to register device
  PM / devfreq: Add dependency on PM_OPP
  PM / devfreq: Move private devfreq_update_stats() into devfreq
  PM / devfreq: Convert to using %pOF instead of full_name
  PM / AVS: rockchip-io: add io selectors and supplies for RV1108
  cpufreq: ti: Fix 'of_node_put' being called twice in error handling path
  cpufreq: dt-platdev: Drop few entries from whitelist
  cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2
  ARM: ux500: don't select CPUFREQ_DT
  cpuidle: Convert to using %pOF instead of full_name
  cpufreq: Convert to using %pOF instead of full_name
  PM / Domains: Convert to using %pOF instead of full_name
  cpufreq: Cap the default transition delay value to 10 ms
  ...
2 parents b42a362 + d97561f commit 4396440

101 files changed, 2835 insertions(+), 1746 deletions(-)

Documentation/ABI/testing/sysfs-power

Lines changed: 12 additions & 0 deletions
@@ -273,3 +273,15 @@ Description:
273273

274274
This output is useful for system wakeup diagnostics of spurious
275275
wakeup interrupts.
276+
277+
What: /sys/power/pm_debug_messages
278+
Date: July 2017
279+
Contact: Rafael J. Wysocki <[email protected]>
280+
Description:
281+
The /sys/power/pm_debug_messages file controls the printing
282+
of debug messages from the system suspend/hibernation
283+
infrastructure to the kernel log.
284+
285+
Writing a "1" to this file enables the debug messages and
286+
writing a "0" (default) to it disables them. Reads from
287+
this file return the current value.
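
The ABI entry above can be exercised from user space along the following lines. This is a minimal sketch, not part of the commit: the helper names are hypothetical, and writing to the real file is assumed to require root.

```python
from pathlib import Path

# Path taken from the ABI entry above. It is a parameter of both helpers so
# that they can also be pointed at an ordinary file.
PM_DEBUG_MESSAGES = Path("/sys/power/pm_debug_messages")


def pm_debug_messages_enabled(path: Path = PM_DEBUG_MESSAGES) -> bool:
    """Return True if the file currently reads back as "1"."""
    return path.read_text().strip() == "1"


def set_pm_debug_messages(enabled: bool, path: Path = PM_DEBUG_MESSAGES) -> None:
    """Write "1" to enable the debug messages or "0" to disable them."""
    path.write_text("1" if enabled else "0")
```

A typical use would be to call ``set_pm_debug_messages(True)`` before a suspend test and ``set_pm_debug_messages(False)`` afterwards.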

Documentation/admin-guide/pm/cpufreq.rst

Lines changed: 0 additions & 8 deletions
@@ -479,14 +479,6 @@ This governor exposes the following tunables:
479479

480480
# echo $(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate
481481

482-
483-
``min_sampling_rate``
484-
The minimum value of ``sampling_rate``.
485-
486-
Equal to 10000 (10 ms) if :c:macro:`CONFIG_NO_HZ_COMMON` and
487-
:c:data:`tick_nohz_active` are both set or to 20 times the value of
488-
:c:data:`jiffies` in microseconds otherwise.
489-
490482
``up_threshold``
491483
If the estimated CPU load is above this value (in percent), the governor
492484
will set the frequency to the maximum value allowed for the policy.
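
The arithmetic in the ``sampling_rate`` shell command shown as context in this hunk can be written out as a small function. This is a sketch under assumptions: the hunk does not state units, so the function assumes ``cpuinfo_transition_latency`` is in nanoseconds and ``sampling_rate`` in microseconds, and its name and parameters are hypothetical.

```python
def ondemand_sampling_rate_us(transition_latency_ns: int, multiplier: int = 750) -> int:
    """Mirror the shell expression: latency * 750 / 1000, integer division.

    Assumption: the latency is given in nanoseconds and the result is in
    microseconds, so the result is `multiplier` times the latency.
    """
    return transition_latency_ns * multiplier // 1000
```

For a transition latency of 20000 this yields 15000, the same value the shell arithmetic produces.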

Documentation/admin-guide/pm/index.rst

Lines changed: 3 additions & 9 deletions
@@ -5,12 +5,6 @@ Power Management
55
.. toctree::
66
:maxdepth: 2
77

8-
cpufreq
9-
intel_pstate
10-
11-
.. only:: subproject and html
12-
13-
Indices
14-
=======
15-
16-
* :ref:`genindex`
8+
strategies
9+
system-wide
10+
working-state

Documentation/admin-guide/pm/intel_pstate.rst

Lines changed: 8 additions & 53 deletions
@@ -167,35 +167,17 @@ is set.
167167
``powersave``
168168
.............
169169

170-
Without HWP, this P-state selection algorithm generally depends on the
171-
processor model and/or the system profile setting in the ACPI tables and there
172-
are two variants of it.
173-
174-
One of them is used with processors from the Atom line and (regardless of the
175-
processor model) on platforms with the system profile in the ACPI tables set to
176-
"mobile" (laptops mostly), "tablet", "appliance PC", "desktop", or
177-
"workstation". It is also used with processors supporting the HWP feature if
178-
that feature has not been enabled (that is, with the ``intel_pstate=no_hwp``
179-
argument in the kernel command line). It is similar to the algorithm
170+
Without HWP, this P-state selection algorithm is similar to the algorithm
180171
implemented by the generic ``schedutil`` scaling governor except that the
181172
utilization metric used by it is based on numbers coming from feedback
182173
registers of the CPU. It generally selects P-states proportional to the
183-
current CPU utilization, so it is referred to as the "proportional" algorithm.
184-
185-
The second variant of the ``powersave`` P-state selection algorithm, used in all
186-
of the other cases (generally, on processors from the Core line, so it is
187-
referred to as the "Core" algorithm), is based on the values read from the APERF
188-
and MPERF feedback registers and the previously requested target P-state.
189-
It does not really take CPU utilization into account explicitly, but as a rule
190-
it causes the CPU P-state to ramp up very quickly in response to increased
191-
utilization which is generally desirable in server environments.
192-
193-
Regardless of the variant, this algorithm is run by the driver's utilization
194-
update callback for the given CPU when it is invoked by the CPU scheduler, but
195-
not more often than every 10 ms (that can be tweaked via ``debugfs`` in `this
196-
particular case <Tuning Interface in debugfs_>`_). Like in the ``performance``
197-
case, the hardware configuration is not touched if the new P-state turns out to
198-
be the same as the current one.
174+
current CPU utilization.
175+
176+
This algorithm is run by the driver's utilization update callback for the
177+
given CPU when it is invoked by the CPU scheduler, but not more often than
178+
every 10 ms. Like in the ``performance`` case, the hardware configuration
179+
is not touched if the new P-state turns out to be the same as the current
180+
one.
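
The behaviour described in the added paragraph (proportional selection, a 10 ms rate limit, and no hardware access when the P-state is unchanged) can be illustrated with a toy model. This is a hedged sketch and not the driver's code: ``CpuState``, ``utilization_update`` and the linear mapping are hypothetical, and the real driver derives its utilization metric from CPU feedback registers rather than taking it as an argument.

```python
from dataclasses import dataclass

MIN_INTERVAL_NS = 10_000_000  # 10 ms, the limit quoted in the text above


@dataclass
class CpuState:
    min_pstate: int
    max_pstate: int
    current_pstate: int
    last_update_ns: int = 0
    hw_writes: int = 0  # counts simulated hardware reconfigurations


def utilization_update(cpu: CpuState, now_ns: int, utilization: float) -> bool:
    """Return True if the (simulated) hardware was reprogrammed."""
    # Rate limit: run the selection no more often than every 10 ms.
    if now_ns - cpu.last_update_ns < MIN_INTERVAL_NS:
        return False
    cpu.last_update_ns = now_ns

    # Proportional selection (assumed linear mapping): scale a utilization
    # in the range 0.0..1.0 onto the available P-state range.
    utilization = min(max(utilization, 0.0), 1.0)
    span = cpu.max_pstate - cpu.min_pstate
    target = cpu.min_pstate + round(span * utilization)

    # Leave the hardware configuration alone if nothing would change.
    if target == cpu.current_pstate:
        return False
    cpu.current_pstate = target
    cpu.hw_writes += 1
    return True
```

Calling ``utilization_update`` twice within 10 ms leaves the state untouched on the second call, which is the rate-limiting property the paragraph describes.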
199181

200182
This is the default P-state selection algorithm if the
201183
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
@@ -720,34 +702,7 @@ P-state is called, the ``ftrace`` filter can be set to to
720702
gnome-shell-3409 [001] ..s. 2537.650850: intel_pstate_set_pstate <-intel_pstate_timer_func
721703
<idle>-0 [000] ..s. 2537.654843: intel_pstate_set_pstate <-intel_pstate_timer_func
722704

723-
Tuning Interface in ``debugfs``
724-
-------------------------------
725-
726-
The ``powersave`` algorithm provided by ``intel_pstate`` for `the Core line of
727-
processors in the active mode <powersave_>`_ is based on a `PID controller`_
728-
whose parameters were chosen to address a number of different use cases at the
729-
same time. However, it still is possible to fine-tune it to a specific workload
730-
and the ``debugfs`` interface under ``/sys/kernel/debug/pstate_snb/`` is
731-
provided for this purpose. [Note that the ``pstate_snb`` directory will be
732-
present only if the specific P-state selection algorithm matching the interface
733-
in it actually is in use.]
734-
735-
The following files present in that directory can be used to modify the PID
736-
controller parameters at run time:
737-
738-
| ``deadband``
739-
| ``d_gain_pct``
740-
| ``i_gain_pct``
741-
| ``p_gain_pct``
742-
| ``sample_rate_ms``
743-
| ``setpoint``
744-
745-
Note, however, that achieving desirable results this way generally requires
746-
expert-level understanding of the power vs performance tradeoff, so extra care
747-
is recommended when attempting to do that.
748-
749705

750706
.. _LCEU2015: http://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf
751707
.. _SDM: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html
752708
.. _ACPI specification: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
753-
.. _PID controller: https://en.wikipedia.org/wiki/PID_controller
Lines changed: 245 additions & 0 deletions
@@ -0,0 +1,245 @@
1+
===================
2+
System Sleep States
3+
===================
4+
5+
::
6+
7+
Copyright (c) 2017 Intel Corp., Rafael J. Wysocki <[email protected]>
8+
9+
Sleep states are global low-power states of the entire system in which user
10+
space code cannot be executed and the overall system activity is significantly
11+
reduced.
12+
13+
14+
Sleep States That Can Be Supported
15+
==================================
16+
17+
Depending on its configuration and the capabilities of the platform it runs on,
18+
the Linux kernel can support up to four system sleep states, including
19+
hibernation and up to three variants of system suspend. The sleep states that
20+
can be supported by the kernel are listed below.
21+
22+
.. _s2idle:
23+
24+
Suspend-to-Idle
25+
---------------
26+
27+
This is a generic, pure software, light-weight variant of system suspend (also
28+
referred to as S2I or S2Idle). It allows more energy to be saved relative to
29+
runtime idle by freezing user space, suspending the timekeeping and putting all
30+
I/O devices into low-power states (possibly lower-power than available in the
31+
working state), such that the processors can spend time in their deepest idle
32+
states while the system is suspended.
33+
34+
The system is woken up from this state by in-band interrupts, so theoretically
35+
any devices that can cause interrupts to be generated in the working state can
36+
also be set up as wakeup devices for S2Idle.
37+
38+
This state can be used on platforms without support for :ref:`standby <standby>`
39+
or :ref:`suspend-to-RAM <s2ram>`, or it can be used in addition to any of the
40+
deeper system suspend variants to provide reduced resume latency. It is always
41+
supported if the :c:macro:`CONFIG_SUSPEND` kernel configuration option is set.
42+
43+
.. _standby:
44+
45+
Standby
46+
-------
47+
48+
This state, if supported, offers moderate, but real, energy savings, while
49+
providing a relatively straightforward transition back to the working state. No
50+
operating state is lost (the system core logic retains power), so the system can
51+
go back to where it left off easily enough.
52+
53+
In addition to freezing user space, suspending the timekeeping and putting all
54+
I/O devices into low-power states, which is done for :ref:`suspend-to-idle
55+
<s2idle>` too, nonboot CPUs are taken offline and all low-level system functions
56+
are suspended during transitions into this state. For this reason, it should
57+
allow more energy to be saved relative to :ref:`suspend-to-idle <s2idle>`, but
58+
the resume latency will generally be greater than for that state.
59+
60+
The set of devices that can wake up the system from this state usually is
61+
reduced relative to :ref:`suspend-to-idle <s2idle>` and it may be necessary to
62+
rely on the platform for setting up the wakeup functionality as appropriate.
63+
64+
This state is supported if the :c:macro:`CONFIG_SUSPEND` kernel configuration
65+
option is set and the support for it is registered by the platform with the
66+
core system suspend subsystem. On ACPI-based systems this state is mapped to
67+
the S1 system state defined by ACPI.
68+
69+
.. _s2ram:
70+
71+
Suspend-to-RAM
72+
--------------
73+
74+
This state (also referred to as STR or S2RAM), if supported, offers significant
75+
energy savings as everything in the system is put into a low-power state, except
76+
for memory, which should be placed into the self-refresh mode to retain its
77+
contents. All of the steps carried out when entering :ref:`standby <standby>`
78+
are also carried out during transitions to S2RAM. Additional operations may
79+
take place depending on the platform capabilities. In particular, on ACPI-based
80+
systems the kernel passes control to the platform firmware (BIOS) as the last
81+
step during S2RAM transitions and that usually results in powering down some
82+
more low-level components that are not directly controlled by the kernel.
83+
84+
The state of devices and CPUs is saved and held in memory. All devices are
85+
suspended and put into low-power states. In many cases, all peripheral buses
86+
lose power when entering S2RAM, so devices must be able to handle the transition
87+
back to the "on" state.
88+
89+
On ACPI-based systems S2RAM requires some minimal boot-strapping code in the
90+
platform firmware to resume the system from it. This may be the case on other
91+
platforms too.
92+
93+
The set of devices that can wake up the system from S2RAM usually is reduced
94+
relative to :ref:`suspend-to-idle <s2idle>` and :ref:`standby <standby>` and it
95+
may be necessary to rely on the platform for setting up the wakeup functionality
96+
as appropriate.
97+
98+
S2RAM is supported if the :c:macro:`CONFIG_SUSPEND` kernel configuration option
99+
is set and the support for it is registered by the platform with the core system
100+
suspend subsystem. On ACPI-based systems it is mapped to the S3 system state
101+
defined by ACPI.
102+
103+
.. _hibernation:
104+
105+
Hibernation
106+
-----------
107+
108+
This state (also referred to as Suspend-to-Disk or STD) offers the greatest
109+
energy savings and can be used even in the absence of low-level platform support
110+
for system suspend. However, it requires some low-level code for resuming the
111+
system to be present for the underlying CPU architecture.
112+
113+
Hibernation is significantly different from any of the system suspend variants.
114+
It takes three system state changes to put the system into hibernation and two system
115+
state changes to resume it.
116+
117+
First, when hibernation is triggered, the kernel stops all system activity and
118+
creates a snapshot image of memory to be written into persistent storage. Next,
119+
the system goes into a state in which the snapshot image can be saved, the image
120+
is written out and finally the system goes into the target low-power state in
121+
which power is cut from almost all of its hardware components, including memory,
122+
except for a limited set of wakeup devices.
123+
124+
Once the snapshot image has been written out, the system may either enter a
125+
special low-power state (like ACPI S4), or it may simply power itself down.
126+
Powering down means minimum power draw and it allows this mechanism to work on
127+
any system. However, entering a special low-power state may allow additional
128+
means of system wakeup to be used (e.g. pressing a key on the keyboard or
129+
opening a laptop lid).
130+
131+
After wakeup, control goes to the platform firmware that runs a boot loader
132+
which boots a fresh instance of the kernel (control may also go directly to
133+
the boot loader, depending on the system configuration, but anyway it causes
134+
a fresh instance of the kernel to be booted). That new instance of the kernel
135+
(referred to as the ``restore kernel``) looks for a hibernation image in
136+
persistent storage and if one is found, it is loaded into memory. Next, all
137+
activity in the system is stopped and the restore kernel overwrites itself with
138+
the image contents and jumps into a special trampoline area in the original
139+
kernel stored in the image (referred to as the ``image kernel``), which is where
140+
the special architecture-specific low-level code is needed. Finally, the
141+
image kernel restores the system to the pre-hibernation state and allows user
142+
space to run again.
143+
144+
Hibernation is supported if the :c:macro:`CONFIG_HIBERNATION` kernel
145+
configuration option is set. However, this option can only be set if support
146+
for the given CPU architecture includes the low-level code for system resume.
147+
148+
149+
Basic ``sysfs`` Interfaces for System Suspend and Hibernation
150+
=============================================================
151+
152+
The following files located in the :file:`/sys/power/` directory can be used by
153+
user space for sleep states control.
154+
155+
``state``
156+
This file contains a list of strings representing sleep states supported
157+
by the kernel. Writing one of these strings into it causes the kernel
158+
to start a transition of the system into the sleep state represented by
159+
that string.
160+
161+
In particular, the strings "disk", "freeze" and "standby" represent the
162+
:ref:`hibernation <hibernation>`, :ref:`suspend-to-idle <s2idle>` and
163+
:ref:`standby <standby>` sleep states, respectively. The string "mem"
164+
is interpreted in accordance with the contents of the ``mem_sleep`` file
165+
described below.
166+
167+
If the kernel does not support any system sleep states, this file is
168+
not present.
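
Reading the ``state`` file as described above might look like the following sketch. The function name and the ``STATE_MEANING`` table are hypothetical; the string-to-state mapping is taken from the text, and "mem" is left out because its meaning depends on ``mem_sleep``.

```python
from pathlib import Path
from typing import List

# Mapping given in the description above ("mem" depends on mem_sleep).
STATE_MEANING = {
    "disk": "hibernation",
    "freeze": "suspend-to-idle",
    "standby": "standby",
}


def supported_sleep_states(state_file: Path = Path("/sys/power/state")) -> List[str]:
    """Return the strings listed in the state file.

    An empty list is returned when the file is absent, which the text
    above says happens when no sleep states are supported.
    """
    try:
        return state_file.read_text().split()
    except FileNotFoundError:
        return []
```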
169+
170+
``mem_sleep``
171+
This file contains a list of strings representing supported system
172+
suspend variants and allows user space to select the variant to be
173+
associated with the "mem" string in the ``state`` file described above.
174+
175+
The strings that may be present in this file are "s2idle", "shallow"
176+
and "deep". The string "s2idle" always represents :ref:`suspend-to-idle
177+
<s2idle>` and, by convention, "shallow" and "deep" represent
178+
:ref:`standby <standby>` and :ref:`suspend-to-RAM <s2ram>`,
179+
respectively.
180+
181+
Writing one of the listed strings into this file causes the system
182+
suspend variant represented by it to be associated with the "mem" string
183+
in the ``state`` file. The string representing the suspend variant
184+
currently associated with the "mem" string in the ``state`` file
185+
is listed in square brackets.
186+
187+
If the kernel does not support system suspend, this file is not present.
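
The square-bracket convention described above can be parsed as sketched below. ``parse_bracketed_list`` and ``MEM_SLEEP_MEANING`` are hypothetical names, and the parser assumes the strings are separated by whitespace.

```python
from typing import List, Optional, Tuple

# Meaning of each string, as given in the description above.
MEM_SLEEP_MEANING = {
    "s2idle": "suspend-to-idle",
    "shallow": "standby",
    "deep": "suspend-to-RAM",
}


def parse_bracketed_list(text: str) -> Tuple[List[str], Optional[str]]:
    """Split a sysfs option list into (all options, selected option).

    The selected option is the one enclosed in square brackets; None is
    returned for it if no entry is bracketed.
    """
    options: List[str] = []
    selected: Optional[str] = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        options.append(token)
    return options, selected
```

For example, the content ``s2idle [deep]`` parses to the options ``s2idle`` and ``deep`` with ``deep`` selected.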
188+
189+
``disk``
190+
This file contains a list of strings representing different operations
191+
that can be carried out after the hibernation image has been saved. The
192+
possible options are as follows:
193+
194+
``platform``
195+
Put the system into a special low-power state (e.g. ACPI S4) to
196+
make additional wakeup options available and possibly allow the
197+
platform firmware to take a simplified initialization path after
198+
wakeup.
199+
200+
``shutdown``
201+
Power off the system.
202+
203+
``reboot``
204+
Reboot the system (useful for diagnostics mostly).
205+
206+
``suspend``
207+
Hybrid system suspend. Put the system into the suspend sleep
208+
state selected through the ``mem_sleep`` file described above.
209+
If the system is successfully woken up from that state, discard
210+
the hibernation image and continue. Otherwise, use the image
211+
to restore the previous state of the system.
212+
213+
``test_resume``
214+
Diagnostic operation. Load the image as though the system had
215+
just woken up from hibernation and the currently running kernel
216+
instance was a restore kernel and follow up with full system
217+
resume.
218+
219+
Writing one of the listed strings into this file causes the option
220+
represented by it to be selected.
221+
222+
The currently selected option is shown in square brackets which means
223+
that the operation represented by it will be carried out after creating
224+
and saving the image next time hibernation is triggered by writing
225+
``disk`` to :file:`/sys/power/state`.
226+
227+
If the kernel does not support hibernation, this file is not present.
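
Selecting a post-image operation and then triggering hibernation, as described for the ``disk`` file above, can be sketched as follows. ``request_hibernation`` is a hypothetical helper; on a real system the final write starts hibernation immediately and is assumed to require root, so the directory is a parameter to allow dry runs against ordinary files.

```python
from pathlib import Path


def request_hibernation(option: str, power_dir: Path = Path("/sys/power")) -> None:
    """Select `option` in the disk file, then write "disk" to the state file."""
    disk = power_dir / "disk"
    # The currently selected option is shown in square brackets; strip them
    # so that it compares equal to the plain option name.
    listed = [token.strip("[]") for token in disk.read_text().split()]
    if option not in listed:
        raise ValueError(f"{option!r} is not offered by {disk}: {listed}")
    disk.write_text(option)
    # On real sysfs this write triggers the transition.
    (power_dir / "state").write_text("disk")
```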
228+
229+
According to the above, there are two ways to make the system go into the
230+
:ref:`suspend-to-idle <s2idle>` state. The first one is to write "freeze"
231+
directly to :file:`/sys/power/state`. The second one is to write "s2idle" to
232+
:file:`/sys/power/mem_sleep` and then to write "mem" to
233+
:file:`/sys/power/state`. Likewise, there are two ways to make the system go
234+
into the :ref:`standby <standby>` state (the strings to write to the control
235+
files in that case are "standby" or "shallow" and "mem", respectively) if that
236+
state is supported by the platform. However, there is only one way to make the
237+
system go into the :ref:`suspend-to-RAM <s2ram>` state (write "deep" into
238+
:file:`/sys/power/mem_sleep` and "mem" into :file:`/sys/power/state`).
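
The alternatives listed in the paragraph above can be summarized as data. The table below is only a restatement of that paragraph; the names ``SLEEP_STATE_WRITES`` and ``ways_to_enter`` are hypothetical.

```python
from typing import Dict, List, Tuple

# One write is (file under /sys/power, string to write into it).
Write = Tuple[str, str]

# Each state maps to its alternative write sequences, per the text above.
SLEEP_STATE_WRITES: Dict[str, List[List[Write]]] = {
    "suspend-to-idle": [
        [("state", "freeze")],
        [("mem_sleep", "s2idle"), ("state", "mem")],
    ],
    "standby": [
        [("state", "standby")],
        [("mem_sleep", "shallow"), ("state", "mem")],
    ],
    "suspend-to-RAM": [
        [("mem_sleep", "deep"), ("state", "mem")],
    ],
}


def ways_to_enter(target: str) -> List[List[Write]]:
    """Return every write sequence that leads to the given sleep state."""
    return SLEEP_STATE_WRITES[target]
```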
239+
240+
The default suspend variant (i.e. the one to be used without writing anything
241+
into :file:`/sys/power/mem_sleep`) is either "deep" (on the majority of systems
242+
supporting :ref:`suspend-to-RAM <s2ram>`) or "s2idle", but it can be overridden
243+
by the value of the "mem_sleep_default" parameter in the kernel command line.
244+
On some ACPI-based systems, depending on the information in the ACPI tables, the
245+
default may be "s2idle" even if :ref:`suspend-to-RAM <s2ram>` is supported.
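
Whether the default has been overridden on the kernel command line can be checked with a sketch like the one below. The function name is hypothetical, and treating the last occurrence of the parameter as the effective one is an assumption, not something stated in the text.

```python
from typing import Optional


def mem_sleep_default_override(cmdline: str) -> Optional[str]:
    """Return the value of mem_sleep_default from a kernel command line.

    None is returned when the parameter is absent. Assumption: if the
    parameter is given more than once, the last occurrence wins.
    """
    value: Optional[str] = None
    for arg in cmdline.split():
        key, sep, val = arg.partition("=")
        if key == "mem_sleep_default" and sep:
            value = val
    return value
```

The command line of the running kernel would typically be read from ``/proc/cmdline`` and passed in as a string.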
