
Commit 43033bc

nathanlynch authored and mpe committed
powerpc/pseries: add RTAS work area allocator
Various pseries-specific RTAS functions take a temporary "work area" parameter - a buffer in memory accessible to RTAS. Typically such functions are passed the statically allocated rtas_data_buf buffer as the argument. This buffer is protected by a global spinlock, so users of rtas_data_buf cannot perform sleeping operations while accessing the buffer.

Most RTAS functions that have a work area parameter can return a status (-2/990x) that indicates that the caller should retry. Before retrying, the caller may need to reschedule or sleep (see rtas_busy_delay() for details). This combination of factors leads to uncomfortable constructions like this:

    do {
        spin_lock(&rtas_data_buf_lock);
        rc = rtas_call(token, __pa(rtas_data_buf), ...);
        if (rc == 0) {
            /* parse or copy out rtas_data_buf contents */
        }
        spin_unlock(&rtas_data_buf_lock);
    } while (rtas_busy_delay(rc));

Another unfortunately common way of handling this is for callers to blithely ignore the possibility of a -2/990x status and hope for the best.

If users were allowed to perform blocking operations while owning a work area, the programming model would become less tedious and error-prone. Users could schedule away, sleep, or perform other blocking operations without having to release and re-acquire resources.

We could continue to use a single work area buffer and convert rtas_data_buf_lock to a mutex, but that would impose an unnecessarily coarse serialization on all users. As awkward as the current design is, it prevents longer-running operations that need to repeatedly use rtas_data_buf from blocking the progress of others.

There are more considerations. One is that while 4KB is fine for all current in-kernel uses, some RTAS calls can take much smaller buffers, and some (VPD, platform dumps) would likely benefit from larger ones. Another is that at least one RTAS function (ibm,get-vpd) has *two* work area parameters. And finally, we should expect the number of work area users in the kernel to increase over time as we introduce lockdown-compatible ABIs to replace less safe use cases based on sys_rtas/librtas.

So a special-purpose allocator for RTAS work area buffers seems worth trying. Properties:

* The backing memory for the allocator is reserved early in boot in order to satisfy RTAS addressing requirements, and then managed with genalloc.
* Allocations can block, but they never fail (mempool-like).
* Prioritizes first-come, first-served fairness over throughput.
* Early boot allocations before the allocator has been initialized are served via an internal static buffer.

Intended to replace rtas_data_buf. New code that needs RTAS work area buffers should prefer this API.

Signed-off-by: Nathan Lynch <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
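For contrast with the spinlocked pattern above, here is a sketch (not part of the commit) of the same retry loop on top of the new API; the choice of RTAS function, the 'param' input, and the output handling are illustrative assumptions:

    /*
     * Illustrative sketch only. The caller can sleep in
     * rtas_busy_delay() while still holding the work area; there is
     * no release/re-acquire across retries, and no global spinlock
     * held over the parse step.
     */
    struct rtas_work_area *work_area;
    int token = rtas_token("ibm,get-system-parameter");
    int rc;

    work_area = rtas_work_area_alloc(SZ_4K);    /* may block; never fails */

    do {
        rc = rtas_call(token, 3, 1, NULL, param,    /* 'param' is hypothetical */
                       rtas_work_area_phys(work_area),
                       rtas_work_area_size(work_area));
    } while (rtas_busy_delay(rc));

    if (rc == 0) {
        /* parse or copy out the work area contents */
    }

    rtas_work_area_free(work_area);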
1 parent 24098f5 commit 43033bc

File tree

4 files changed: +309 -1 lines changed

  arch/powerpc/include/asm/rtas-work-area.h
  arch/powerpc/kernel/rtas.c
  arch/powerpc/platforms/pseries/Makefile
  arch/powerpc/platforms/pseries/rtas-work-area.c
arch/powerpc/include/asm/rtas-work-area.h (new file)

Lines changed: 96 additions & 0 deletions
/* SPDX-License-Identifier: GPL-2.0-only */
#ifndef _ASM_POWERPC_RTAS_WORK_AREA_H
#define _ASM_POWERPC_RTAS_WORK_AREA_H

#include <linux/build_bug.h>
#include <linux/sizes.h>
#include <linux/types.h>

#include <asm/page.h>

/**
 * struct rtas_work_area - RTAS work area descriptor.
 *
 * Descriptor for a "work area" in PAPR terminology that satisfies
 * RTAS addressing requirements.
 */
struct rtas_work_area {
	/* private: Use the APIs provided below. */
	char *buf;
	size_t size;
};

enum {
	/* Maximum allocation size, enforced at build time. */
	RTAS_WORK_AREA_MAX_ALLOC_SZ = SZ_128K,
};

/**
 * rtas_work_area_alloc() - Acquire a work area of the requested size.
 * @size_: Allocation size. Must be compile-time constant and not more
 *         than %RTAS_WORK_AREA_MAX_ALLOC_SZ.
 *
 * Allocate a buffer suitable for passing to RTAS functions that have
 * a memory address parameter, often (but not always) referred to as a
 * "work area" in PAPR. Although callers are allowed to block while
 * holding a work area, the amount of memory reserved for this purpose
 * is limited, and allocations should be short-lived. A good guideline
 * is to release any allocated work area before returning from a
 * system call.
 *
 * This function does not fail. It blocks until the allocation
 * succeeds. To prevent deadlocks, callers are discouraged from
 * allocating more than one work area simultaneously in a single task
 * context.
 *
 * Context: This function may sleep.
 * Return: A &struct rtas_work_area descriptor for the allocated work area.
 */
#define rtas_work_area_alloc(size_) ({				\
	static_assert(__builtin_constant_p(size_));		\
	static_assert((size_) > 0);				\
	static_assert((size_) <= RTAS_WORK_AREA_MAX_ALLOC_SZ);	\
	__rtas_work_area_alloc(size_);				\
})

/*
 * Do not call __rtas_work_area_alloc() directly. Use
 * rtas_work_area_alloc().
 */
struct rtas_work_area *__rtas_work_area_alloc(size_t size);

/**
 * rtas_work_area_free() - Release a work area.
 * @area: Work area descriptor as returned from rtas_work_area_alloc().
 *
 * Return a work area buffer to the pool.
 */
void rtas_work_area_free(struct rtas_work_area *area);

static inline char *rtas_work_area_raw_buf(const struct rtas_work_area *area)
{
	return area->buf;
}

static inline size_t rtas_work_area_size(const struct rtas_work_area *area)
{
	return area->size;
}

static inline phys_addr_t rtas_work_area_phys(const struct rtas_work_area *area)
{
	return __pa(area->buf);
}

/*
 * Early setup for the work area allocator. Call from
 * rtas_initialize() only.
 */

#ifdef CONFIG_PPC_PSERIES
void rtas_work_area_reserve_arena(phys_addr_t limit);
#else /* CONFIG_PPC_PSERIES */
static inline void rtas_work_area_reserve_arena(phys_addr_t limit) {}
#endif /* CONFIG_PPC_PSERIES */

#endif /* _ASM_POWERPC_RTAS_WORK_AREA_H */
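The static_asserts in the macro make the size contract a build-time matter. A brief sketch (not from the patch) of what does and does not compile:

    struct rtas_work_area *wa;

    wa = rtas_work_area_alloc(SZ_64K);	/* OK: constant, 0 < size <= SZ_128K */

    /*
     * Each of these would fail to compile:
     *
     *   rtas_work_area_alloc(runtime_len);	-- size must be a compile-time constant
     *   rtas_work_area_alloc(SZ_256K);	-- exceeds RTAS_WORK_AREA_MAX_ALLOC_SZ
     */

    rtas_work_area_free(wa);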

arch/powerpc/kernel/rtas.c

Lines changed: 3 additions & 0 deletions
@@ -36,6 +36,7 @@
 #include <asm/machdep.h>
 #include <asm/mmu.h>
 #include <asm/page.h>
+#include <asm/rtas-work-area.h>
 #include <asm/rtas.h>
 #include <asm/time.h>
 #include <asm/trace.h>
@@ -1939,6 +1940,8 @@ void __init rtas_initialize(void)
 #endif
 	ibm_open_errinjct_token = rtas_token("ibm,open-errinjct");
 	ibm_errinjct_token = rtas_token("ibm,errinjct");
+
+	rtas_work_area_reserve_arena(rtas_region);
 }
 
 int __init early_init_dt_scan_rtas(unsigned long node,

arch/powerpc/platforms/pseries/Makefile

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC)
 ccflags-$(CONFIG_PPC_PSERIES_DEBUG) += -DDEBUG
 
 obj-y := lpar.o hvCall.o nvram.o reconfig.o \
-	 of_helpers.o \
+	 of_helpers.o rtas-work-area.o \
	 setup.o iommu.o event_sources.o ras.o \
	 firmware.o power.o dlpar.o mobility.o rng.o \
	 pci.o pci_dlpar.o eeh_pseries.o msi.o \
arch/powerpc/platforms/pseries/rtas-work-area.c (new file)

Lines changed: 209 additions & 0 deletions
// SPDX-License-Identifier: GPL-2.0-only

#define pr_fmt(fmt)	"rtas-work-area: " fmt

#include <linux/genalloc.h>
#include <linux/log2.h>
#include <linux/kernel.h>
#include <linux/memblock.h>
#include <linux/mempool.h>
#include <linux/minmax.h>
#include <linux/mutex.h>
#include <linux/numa.h>
#include <linux/sizes.h>
#include <linux/wait.h>

#include <asm/machdep.h>
#include <asm/rtas-work-area.h>
#include <asm/rtas.h>

enum {
	/*
	 * Ensure the pool is page-aligned.
	 */
	RTAS_WORK_AREA_ARENA_ALIGN = PAGE_SIZE,
	/*
	 * Don't let a single allocation claim the whole arena.
	 */
	RTAS_WORK_AREA_ARENA_SZ = RTAS_WORK_AREA_MAX_ALLOC_SZ * 2,
	/*
	 * The smallest known work area size is for ibm,get-vpd's
	 * location code argument, which is limited to 79 characters
	 * plus 1 nul terminator.
	 *
	 * PAPR+ 7.3.20 ibm,get-vpd RTAS Call
	 * PAPR+ 12.3.2.4 Converged Location Code Rules - Length Restrictions
	 */
	RTAS_WORK_AREA_MIN_ALLOC_SZ = roundup_pow_of_two(80),
};

static struct {
	struct gen_pool *gen_pool;
	char *arena;
	struct mutex mutex; /* serializes allocations */
	struct wait_queue_head wqh;
	mempool_t descriptor_pool;
	bool available;
} rwa_state = {
	.mutex = __MUTEX_INITIALIZER(rwa_state.mutex),
	.wqh = __WAIT_QUEUE_HEAD_INITIALIZER(rwa_state.wqh),
};

/*
 * A single work area buffer and descriptor to serve requests early
 * in boot before the allocator is fully initialized. We know 4KB is
 * the most any boot time user needs (they all call
 * ibm,get-system-parameter).
 */
static bool early_work_area_in_use __initdata;
static char early_work_area_buf[SZ_4K] __initdata __aligned(SZ_4K);
static struct rtas_work_area early_work_area __initdata = {
	.buf = early_work_area_buf,
	.size = sizeof(early_work_area_buf),
};

static struct rtas_work_area * __init rtas_work_area_alloc_early(size_t size)
{
	WARN_ON(size > early_work_area.size);
	WARN_ON(early_work_area_in_use);
	early_work_area_in_use = true;
	memset(early_work_area.buf, 0, early_work_area.size);
	return &early_work_area;
}

static void __init rtas_work_area_free_early(struct rtas_work_area *work_area)
{
	WARN_ON(work_area != &early_work_area);
	WARN_ON(!early_work_area_in_use);
	early_work_area_in_use = false;
}

struct rtas_work_area * __ref __rtas_work_area_alloc(size_t size)
{
	struct rtas_work_area *area;
	unsigned long addr;

	might_sleep();

	/*
	 * The rtas_work_area_alloc() wrapper enforces this at build
	 * time. Requests that exceed the arena size will block
	 * indefinitely.
	 */
	WARN_ON(size > RTAS_WORK_AREA_MAX_ALLOC_SZ);

	if (!rwa_state.available)
		return rtas_work_area_alloc_early(size);
	/*
	 * To ensure FCFS behavior and prevent a high rate of smaller
	 * requests from starving larger ones, use the mutex to queue
	 * allocations.
	 */
	mutex_lock(&rwa_state.mutex);
	wait_event(rwa_state.wqh,
		   (addr = gen_pool_alloc(rwa_state.gen_pool, size)) != 0);
	mutex_unlock(&rwa_state.mutex);

	area = mempool_alloc(&rwa_state.descriptor_pool, GFP_KERNEL);
	area->buf = (char *)addr;
	area->size = size;

	return area;
}

void __ref rtas_work_area_free(struct rtas_work_area *area)
{
	if (!rwa_state.available) {
		rtas_work_area_free_early(area);
		return;
	}

	gen_pool_free(rwa_state.gen_pool, (unsigned long)area->buf, area->size);
	mempool_free(area, &rwa_state.descriptor_pool);
	wake_up(&rwa_state.wqh);
}

/*
 * Initialization of the work area allocator happens in two parts. To
 * reliably reserve an arena that satisfies RTAS addressing
 * requirements, we must perform a memblock allocation early,
 * immediately after RTAS instantiation. Then we have to wait until
 * the slab allocator is up before setting up the descriptor mempool
 * and adding the arena to a gen_pool.
 */
static __init int rtas_work_area_allocator_init(void)
{
	const unsigned int order = ilog2(RTAS_WORK_AREA_MIN_ALLOC_SZ);
	const phys_addr_t pa_start = __pa(rwa_state.arena);
	const phys_addr_t pa_end = pa_start + RTAS_WORK_AREA_ARENA_SZ - 1;
	struct gen_pool *pool;
	const int nid = NUMA_NO_NODE;
	int err;

	err = -ENOMEM;
	if (!rwa_state.arena)
		goto err_out;

	pool = gen_pool_create(order, nid);
	if (!pool)
		goto err_out;
	/*
	 * All RTAS functions that consume work areas are OK with
	 * natural alignment, when they have alignment requirements at
	 * all.
	 */
	gen_pool_set_algo(pool, gen_pool_first_fit_order_align, NULL);

	err = gen_pool_add(pool, (unsigned long)rwa_state.arena,
			   RTAS_WORK_AREA_ARENA_SZ, nid);
	if (err)
		goto err_destroy;

	err = mempool_init_kmalloc_pool(&rwa_state.descriptor_pool, 1,
					sizeof(struct rtas_work_area));
	if (err)
		goto err_destroy;

	rwa_state.gen_pool = pool;
	rwa_state.available = true;

	pr_debug("arena [%pa-%pa] (%uK), min/max alloc sizes %u/%u\n",
		 &pa_start, &pa_end,
		 RTAS_WORK_AREA_ARENA_SZ / SZ_1K,
		 RTAS_WORK_AREA_MIN_ALLOC_SZ,
		 RTAS_WORK_AREA_MAX_ALLOC_SZ);

	return 0;

err_destroy:
	gen_pool_destroy(pool);
err_out:
	return err;
}
machine_arch_initcall(pseries, rtas_work_area_allocator_init);

/**
 * rtas_work_area_reserve_arena() - Reserve memory suitable for RTAS work areas.
 */
void __init rtas_work_area_reserve_arena(const phys_addr_t limit)
{
	const phys_addr_t align = RTAS_WORK_AREA_ARENA_ALIGN;
	const phys_addr_t size = RTAS_WORK_AREA_ARENA_SZ;
	const phys_addr_t min = MEMBLOCK_LOW_LIMIT;
	const int nid = NUMA_NO_NODE;

	/*
	 * Too early for a machine_is(pseries) check. But PAPR
	 * effectively mandates that ibm,get-system-parameter is
	 * present:
	 *
	 *     R1–7.3.16–1. All platforms must support the System
	 *     Parameters option.
	 *
	 * So set up the arena if we find that, with a fallback to
	 * ibm,configure-connector, just in case.
	 */
	if (rtas_service_present("ibm,get-system-parameter") ||
	    rtas_service_present("ibm,configure-connector"))
		rwa_state.arena = memblock_alloc_try_nid(size, align, min, limit, nid);
}
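One design consequence worth illustrating with a sketch (the helper below is hypothetical, not from the commit): because __rtas_work_area_alloc() routes through rtas_work_area_alloc_early() until the arch initcall sets rwa_state.available, boot-time callers are served from the static 4KB buffer and later callers from the gen_pool arena, with no changes at the call site:

    /*
     * Hypothetical __init caller. Before rtas_work_area_allocator_init()
     * runs, the allocation below is satisfied by early_work_area_buf;
     * afterwards it comes from the memblock-reserved arena. Either way
     * the API and the free path are identical.
     */
    static void __init example_early_sysparm_fetch(void)
    {
    	struct rtas_work_area *wa = rtas_work_area_alloc(SZ_4K);

    	/* ... call ibm,get-system-parameter into the work area ... */

    	rtas_work_area_free(wa);
    }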
