|
| 1 | +.. SPDX-License-Identifier: GPL-2.0 |
| 2 | +.. include:: <isonum.txt> |
| 3 | + |
| 4 | +========================= |
| 5 | +System Suspend Code Flows |
| 6 | +========================= |
| 7 | + |
| 8 | +:Copyright: |copy| 2020 Intel Corporation |
| 9 | + |
| 10 | +:Author: Rafael J. Wysocki < [email protected]> |
| 11 | + |
| 12 | +At least one global system-wide transition needs to be carried out for the |
| 13 | +system to get from the working state into one of the supported |
| 14 | +:doc:`sleep states <sleep-states>`. Hibernation requires more than one |
| 15 | +transition to occur for this purpose, but the other sleep states, commonly |
| 16 | +referred to as *system-wide suspend* (or simply *system suspend*) states, need |
| 17 | +only one. |
| 18 | + |
| 19 | +For those sleep states, the transition from the working state of the system into |
| 20 | +the target sleep state is referred to as *system suspend* too (in the majority |
| 21 | +of cases, whether this means a transition or a sleep state of the system should |
| 22 | +be clear from the context) and the transition back from the sleep state into the |
| 23 | +working state is referred to as *system resume*. |
| 24 | + |
| 25 | +The kernel code flows associated with the suspend and resume transitions for |
| 26 | +different sleep states of the system are quite similar, but there are some |
| 27 | +significant differences between the :ref:`suspend-to-idle <s2idle>` code flows |
| 28 | +and the code flows related to the :ref:`suspend-to-RAM <s2ram>` and |
| 29 | +:ref:`standby <standby>` sleep states. |
| 30 | + |
| 31 | +The :ref:`suspend-to-RAM <s2ram>` and :ref:`standby <standby>` sleep states |
| 32 | +cannot be implemented without platform support and the difference between them |
| 33 | +boils down to the platform-specific actions carried out by the suspend and |
| 34 | +resume hooks that need to be provided by the platform driver to make them |
| 35 | +available. Apart from that, the suspend and resume code flows for these sleep |
| 36 | +states are mostly identical, so both will be referred to collectively as |
| 37 | +*platform-dependent suspend* states in what follows. |
| 38 | + |
| 39 | + |
| 40 | +.. _s2idle_suspend: |
| 41 | + |
| 42 | +Suspend-to-idle Suspend Code Flow |
| 43 | +================================= |
| 44 | + |
| 45 | +The following steps are taken in order to transition the system from the working |
| 46 | +state to the :ref:`suspend-to-idle <s2idle>` sleep state: |
| 47 | + |
| 48 | + 1. Invoking system-wide suspend notifiers. |
| 49 | + |
| 50 | + Kernel subsystems can register callbacks to be invoked when the suspend |
| 51 | + transition is about to occur and when the resume transition has finished. |
| 52 | + |
| 53 | + That allows them to prepare for the change of the system state and to clean |
| 54 | + up after getting back to the working state. |
| 55 | + |
| 56 | + 2. Freezing tasks. |
| 57 | + |
| 58 | + Tasks are frozen primarily in order to avoid unchecked hardware accesses |
| 59 | + from user space through MMIO regions or I/O registers exposed directly to |
| 60 | + it and to prevent user space from entering the kernel while the next step |
| 61 | + of the transition is in progress (which could be problematic for |
| 62 | + various reasons). |
| 63 | + |
| 64 | + All user space tasks are intercepted as though they were sent a signal and |
| 65 | + put into uninterruptible sleep until the end of the subsequent system resume |
| 66 | + transition. |
| 67 | + |
| 68 | + The kernel threads that choose to be frozen during system suspend for |
| 69 | + specific reasons are frozen subsequently, but they are not intercepted. |
| 70 | + Instead, they are expected to periodically check whether or not they need |
| 71 | + to be frozen and to put themselves into uninterruptible sleep if so. [Note, |
| 72 | + however, that kernel threads can use locking and other concurrency controls |
| 73 | + available in kernel space to synchronize themselves with system suspend and |
| 74 | + resume, which can be much more precise than the freezing, so the latter is |
| 75 | + not a recommended option for kernel threads.] |
| 76 | + |
| 77 | + 3. Suspending devices and reconfiguring IRQs. |
| 78 | + |
| 79 | + Devices are suspended in four phases called *prepare*, *suspend*, |
| 80 | + *late suspend* and *noirq suspend* (see :ref:`driverapi_pm_devices` for more |
| 81 | + information on what exactly happens in each phase). |
| 82 | + |
| 83 | + Every device is visited in each phase, but typically it is not physically |
| 84 | + accessed in more than two of them. |
| 85 | + |
| 86 | + The runtime PM API is disabled for every device during the *late* suspend |
| 87 | + phase and high-level ("action") interrupt handlers are prevented from being |
| 88 | + invoked before the *noirq* suspend phase. |
| 89 | + |
| 90 | + Interrupts are still handled after that, but they are only acknowledged to |
| 91 | + interrupt controllers without performing any device-specific actions that |
| 92 | + would be triggered in the working state of the system (those actions are |
| 93 | + deferred till the subsequent system resume transition as described |
| 94 | + `below <s2idle_resume_>`_). |
| 95 | + |
| 96 | + IRQs associated with system wakeup devices are "armed" so that the resume |
| 97 | + transition of the system is started when one of them signals an event. |
| 98 | + |
| 99 | + 4. Freezing the scheduler tick and suspending timekeeping. |
| 100 | + |
| 101 | + When all devices have been suspended, CPUs enter the idle loop and are put |
| 102 | + into the deepest available idle state. While doing that, each of them |
| 103 | + "freezes" its own scheduler tick so that the timer events associated with |
| 104 | + the tick do not occur until the CPU is woken up by another interrupt source. |
| 105 | + |
| 106 | + The last CPU to enter the idle state also stops the timekeeping which |
| 107 | + (among other things) prevents high resolution timers from triggering going |
| 108 | + forward until the first CPU that is woken up restarts the timekeeping. |
| 109 | + That allows the CPUs to stay in the deep idle state relatively long in one |
| 110 | + go. |
| 111 | + |
| 112 | + From this point on, the CPUs can only be woken up by non-timer hardware |
| 113 | + interrupts. If that happens, they go back to the idle state unless the |
| 114 | + interrupt that woke up one of them comes from an IRQ that has been armed for |
| 115 | + system wakeup, in which case the system resume transition is started. |
| 116 | + |
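The wakeup behavior described in step 4 above can be illustrated with a toy user-space model. This is only a sketch, not kernel code: the function ``s2idle_loop`` and its parameters are hypothetical names introduced here, and IRQs are represented as plain integers.

```python
def s2idle_loop(interrupts, armed_irqs):
    """Toy model of the suspend-to-idle loop (hypothetical, not a kernel API).

    `interrupts` is the sequence of non-timer hardware IRQs that fire while
    the system is suspended; `armed_irqs` is the set of IRQs armed for system
    wakeup.  Returns (spurious_wakeups, wakeup_irq).
    """
    spurious = 0
    for irq in interrupts:
        if irq in armed_irqs:
            # An armed IRQ fired: the system resume transition starts.
            return spurious, irq
        # Any other IRQ wakes a CPU, which then goes back to the idle state.
        spurious += 1
    # No armed IRQ has fired yet, so the system stays suspended.
    return spurious, None


spurious, wake_irq = s2idle_loop([7, 12, 7, 9, 3], armed_irqs={9})
```

In this model, the first three interrupts only cause CPUs to re-enter idle, and IRQ 9 is the one that starts the resume transition; the trailing IRQ 3 is never examined because the system is already resuming.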
| 117 | + |
| 118 | +.. _s2idle_resume: |
| 119 | + |
| 120 | +Suspend-to-idle Resume Code Flow |
| 121 | +================================ |
| 122 | + |
| 123 | +The following steps are taken in order to transition the system from the |
| 124 | +:ref:`suspend-to-idle <s2idle>` sleep state into the working state: |
| 125 | + |
| 126 | + 1. Resuming timekeeping and unfreezing the scheduler tick. |
| 127 | + |
| 128 | + When one of the CPUs is woken up (by a non-timer hardware interrupt), it |
| 129 | + leaves the idle state entered in the last step of the preceding suspend |
| 130 | + transition, restarts the timekeeping (unless it has been restarted already |
| 131 | + by another CPU that woke up earlier) and the scheduler tick on that CPU is |
| 132 | + unfrozen. |
| 133 | + |
| 134 | + If the interrupt that has woken up the CPU was armed for system wakeup, |
| 135 | + the system resume transition begins. |
| 136 | + |
| 137 | + 2. Resuming devices and restoring the working-state configuration of IRQs. |
| 138 | + |
| 139 | + Devices are resumed in four phases called *noirq resume*, *early resume*, |
| 140 | + *resume* and *complete* (see :ref:`driverapi_pm_devices` for more |
| 141 | + information on what exactly happens in each phase). |
| 142 | + |
| 143 | + Every device is visited in each phase, but typically it is not physically |
| 144 | + accessed in more than two of them. |
| 145 | + |
| 146 | + The working-state configuration of IRQs is restored after the *noirq* resume |
| 147 | + phase and the runtime PM API is re-enabled for every device whose driver |
| 148 | + supports it during the *early* resume phase. |
| 149 | + |
| 150 | + 3. Thawing tasks. |
| 151 | + |
| 152 | + Tasks frozen in step 2 of the preceding `suspend <s2idle_suspend_>`_ |
| 153 | + transition are "thawed", which means that they are woken up from the |
| 154 | + uninterruptible sleep that they went into at that time and user space tasks |
| 155 | + are allowed to exit the kernel. |
| 156 | + |
| 157 | + 4. Invoking system-wide resume notifiers. |
| 158 | + |
| 159 | + This is analogous to step 1 of the `suspend <s2idle_suspend_>`_ transition |
| 160 | + and the same set of callbacks is invoked at this point, but a different |
| 161 | + "notification type" parameter value is passed to them. |
| 162 | + |
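The notifier mechanism used in step 1 of the suspend transition and step 4 of the resume transition can be sketched as follows. This is a toy user-space model under stated assumptions: the ``NotifierChain`` class and ``subsystem_notifier`` callback are hypothetical, and the two string constants merely borrow the names of the kernel's ``PM_SUSPEND_PREPARE`` and ``PM_POST_SUSPEND`` notification types for illustration.

```python
# Stand-ins named after the kernel's notification types (strings here).
PM_SUSPEND_PREPARE = "PM_SUSPEND_PREPARE"
PM_POST_SUSPEND = "PM_POST_SUSPEND"


class NotifierChain:
    """Hypothetical minimal notifier chain: callbacks run in registration order."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def call(self, event):
        # The same set of callbacks is invoked for every event; only the
        # "notification type" argument differs.
        for callback in self._callbacks:
            callback(event)


log = []


def subsystem_notifier(event):
    """Hypothetical subsystem callback reacting to both notification types."""
    if event == PM_SUSPEND_PREPARE:
        log.append("prepare for suspend")
    elif event == PM_POST_SUSPEND:
        log.append("clean up after resume")


chain = NotifierChain()
chain.register(subsystem_notifier)
chain.call(PM_SUSPEND_PREPARE)   # step 1 of the suspend transition
chain.call(PM_POST_SUSPEND)      # step 4 of the resume transition
```

The point of the sketch is that a subsystem registers one callback and distinguishes the two transitions by the argument it receives.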
| 163 | + |
| 164 | +Platform-dependent Suspend Code Flow |
| 165 | +==================================== |
| 166 | + |
| 167 | +The following steps are taken in order to transition the system from the working |
| 168 | +state to a platform-dependent suspend state: |
| 169 | + |
| 170 | + 1. Invoking system-wide suspend notifiers. |
| 171 | + |
| 172 | + This step is the same as step 1 of the suspend-to-idle suspend transition |
| 173 | + described `above <s2idle_suspend_>`_. |
| 174 | + |
| 175 | + 2. Freezing tasks. |
| 176 | + |
| 177 | + This step is the same as step 2 of the suspend-to-idle suspend transition |
| 178 | + described `above <s2idle_suspend_>`_. |
| 179 | + |
| 180 | + 3. Suspending devices and reconfiguring IRQs. |
| 181 | + |
| 182 | + This step is analogous to step 3 of the suspend-to-idle suspend transition |
| 183 | + described `above <s2idle_suspend_>`_, but the arming of IRQs for system |
| 184 | + wakeup generally does not have any effect on the platform. |
| 185 | + |
| 186 | + There are platforms that can go into a very deep low-power state internally |
| 187 | + when all CPUs in them are in sufficiently deep idle states and all I/O |
| 188 | + devices have been put into low-power states. On those platforms, |
| 189 | + suspend-to-idle can reduce system power very effectively. |
| 190 | + |
| 191 | + On the other platforms, however, low-level components (like interrupt |
| 192 | + controllers) need to be turned off in a platform-specific way (implemented |
| 193 | + in the hooks provided by the platform driver) to achieve comparable power |
| 194 | + reduction. |
| 195 | + |
| 196 | + That usually prevents in-band hardware interrupts from waking up the system, |
| 197 | + so wakeup must be signaled in a special platform-dependent way. Then, the |
| 198 | + configuration of system wakeup sources usually starts when system wakeup |
| 199 | + devices are suspended and is finalized by the platform suspend hooks later |
| 200 | + on. |
| 201 | + |
| 202 | + 4. Disabling non-boot CPUs. |
| 203 | + |
| 204 | + On some platforms the suspend hooks mentioned above must run in a one-CPU |
| 205 | + configuration of the system (in particular, the hardware cannot be accessed |
| 206 | + by any code running in parallel with the platform suspend hooks that may, |
| 207 | + and often do, trap into the platform firmware in order to finalize the |
| 208 | + suspend transition). |
| 209 | + |
| 210 | + For this reason, the CPU offline/online (CPU hotplug) framework is used |
| 211 | + to take all of the CPUs in the system, except for one (the boot CPU), |
| 212 | + offline (typically, the CPUs that have been taken offline go into deep idle |
| 213 | + states). |
| 214 | + |
| 215 | + This means that all tasks are migrated away from those CPUs and all IRQs are |
| 216 | + rerouted to the only CPU that remains online. |
| 217 | + |
| 218 | + 5. Suspending core system components. |
| 219 | + |
| 220 | + This prepares the core system components for (possibly) losing power going |
| 221 | + forward and suspends the timekeeping. |
| 222 | + |
| 223 | + 6. Platform-specific power removal. |
| 224 | + |
| 225 | + This is expected to remove power from all of the system components except |
| 226 | + for the memory controller and RAM (in order to preserve the contents of the |
| 227 | + latter) and some devices designated for system wakeup. |
| 228 | + |
| 229 | + In many cases control is passed to the platform firmware which is expected |
| 230 | + to finalize the suspend transition as needed. |
| 231 | + |
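The effect of step 4 above (taking all CPUs except the boot CPU offline) on tasks and IRQs can be illustrated with a toy user-space model. This is a sketch only: ``offline_nonboot_cpus`` and the dictionary layout are hypothetical and do not correspond to the CPU hotplug framework's actual interfaces.

```python
def offline_nonboot_cpus(tasks_by_cpu, irqs_by_cpu, boot_cpu=0):
    """Toy model (hypothetical): move all tasks and IRQs to the boot CPU.

    Both arguments map a CPU number to a list and are modified in place.
    Returns the set of CPUs that were taken offline.
    """
    offlined = set()
    # Iterate over a sorted copy of the CPU numbers so the dictionaries can
    # be modified safely inside the loop.
    for cpu in sorted(set(tasks_by_cpu) | set(irqs_by_cpu)):
        if cpu == boot_cpu:
            continue
        # Tasks are migrated away from the CPU going offline ...
        tasks_by_cpu.setdefault(boot_cpu, []).extend(tasks_by_cpu.pop(cpu, []))
        # ... and its IRQs are rerouted to the only CPU that stays online.
        irqs_by_cpu.setdefault(boot_cpu, []).extend(irqs_by_cpu.pop(cpu, []))
        offlined.add(cpu)
    return offlined


tasks = {0: ["init"], 1: ["kworker"], 2: ["bash", "sshd"]}
irqs = {0: [1], 1: [16], 3: [24]}
offlined = offline_nonboot_cpus(tasks, irqs)
```

After the call, only the boot CPU has tasks and IRQs assigned to it, which matches the one-CPU configuration that the platform suspend hooks may require.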
| 232 | + |
| 233 | +Platform-dependent Resume Code Flow |
| 234 | +=================================== |
| 235 | + |
| 236 | +The following steps are taken in order to transition the system from a |
| 237 | +platform-dependent suspend state into the working state: |
| 238 | + |
| 239 | + 1. Platform-specific system wakeup. |
| 240 | + |
| 241 | + The platform is woken up by a signal from one of the designated system |
| 242 | + wakeup devices (which need not be an in-band hardware interrupt) and |
| 243 | + control is passed back to the kernel (the working configuration of the |
| 244 | + platform may need to be restored by the platform firmware before the |
| 245 | + kernel gets control again). |
| 246 | + |
| 247 | + 2. Resuming core system components. |
| 248 | + |
| 249 | + The suspend-time configuration of the core system components is restored and |
| 250 | + the timekeeping is resumed. |
| 251 | + |
| 252 | + 3. Re-enabling non-boot CPUs. |
| 253 | + |
| 254 | + The CPUs disabled in step 4 of the preceding suspend transition are taken |
| 255 | + back online and their suspend-time configuration is restored. |
| 256 | + |
| 257 | + 4. Resuming devices and restoring the working-state configuration of IRQs. |
| 258 | + |
| 259 | + This step is the same as step 2 of the suspend-to-idle resume transition |
| 260 | + described `above <s2idle_resume_>`_. |
| 261 | + |
| 262 | + 5. Thawing tasks. |
| 263 | + |
| 264 | + This step is the same as step 3 of the suspend-to-idle resume transition |
| 265 | + described `above <s2idle_resume_>`_. |
| 266 | + |
| 267 | + 6. Invoking system-wide resume notifiers. |
| 268 | + |
| 269 | + This step is the same as step 4 of the suspend-to-idle resume transition |
| 270 | + described `above <s2idle_resume_>`_. |
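The platform-dependent suspend and resume flows mirror each other: each resume step undoes the suspend step at the opposite end of the sequence. The following toy sketch makes that pairing explicit. The step labels paraphrase the headings used in this document, and ``undo_pairs`` is a hypothetical helper, not a kernel function.

```python
SUSPEND_STEPS = [
    "invoke suspend notifiers",
    "freeze tasks",
    "suspend devices and reconfigure IRQs",
    "disable non-boot CPUs",
    "suspend core system components",
    "platform-specific power removal",
]

RESUME_STEPS = [
    "platform-specific system wakeup",
    "resume core system components",
    "re-enable non-boot CPUs",
    "resume devices and restore IRQ configuration",
    "thaw tasks",
    "invoke resume notifiers",
]


def undo_pairs(suspend_steps, resume_steps):
    """Pair each suspend step with the resume step that reverses it."""
    return list(zip(suspend_steps, reversed(resume_steps)))


pairs = undo_pairs(SUSPEND_STEPS, RESUME_STEPS)
for suspend_step, resume_step in pairs:
    print(f"{suspend_step:40} <-> {resume_step}")
```

Reading the printed pairs top to bottom follows the suspend order, while reading them bottom to top follows the resume order.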