Commit 6a4d4b3

Merge tag 'riscv-for-linus-4.18-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V updates from Palmer Dabbelt:
 "This contains some small RISC-V updates I'd like to target for 4.18.
  They are all fairly small this time. Here's a short summary, there's
  more info in the commits/merges:

   - a fix to __clear_user to respect the passed arguments.

   - enough support for the perf subsystem to work with RISC-V's ISA
     defined performance counters.

   - support for sparse and cleanups suggested by it.

   - support for R_RISCV_32 (a relocation, not the 32-bit ISA).

   - some MAINTAINERS cleanups.

   - the addition of CONFIG_HVC_RISCV_SBI to our defconfig, as it's
     always present.

  I've given these a simple build+boot test"

* tag 'riscv-for-linus-4.18-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  RISC-V: Add CONFIG_HVC_RISCV_SBI=y to defconfig
  RISC-V: Handle R_RISCV_32 in modules
  riscv/ftrace: Export _mcount when DYNAMIC_FTRACE isn't set
  riscv: add riscv-specific predefines to CHECKFLAGS
  riscv: split the declaration of __copy_user
  riscv: no __user for probe_kernel_address()
  riscv: use NULL instead of a plain 0
  perf: riscv: Add Document for Future Porting Guide
  perf: riscv: preliminary RISC-V support
  MAINTAINERS: Update Albert's email, he's back at Berkeley
  MAINTAINERS: Add myself as a maintainer for SiFive's drivers
  riscv: Fix the bug in memory access fixup code
2 parents 8949170 + 24a130c commit 6a4d4b3

17 files changed

+884
-15
lines changed


Documentation/riscv/pmu.txt

Lines changed: 249 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,249 @@
1+
Supporting PMUs on RISC-V platforms
2+
==========================================
3+
Alan Kao <[email protected]>, Mar 2018
4+
5+
Introduction
6+
------------
7+
8+
As of this writing, perf_event-related features mentioned in The RISC-V ISA
9+
Privileged Version 1.10 are as follows:
10+
(please check the manual for more details)
11+
12+
* [m|s]counteren
13+
* mcycle[h], cycle[h]
14+
* minstret[h], instret[h]
15+
* mhpeventx, mhpcounterx[h]
16+
17+
With only this set of features, porting perf would require a lot of work, due to
18+
the lack of the following general architectural performance monitoring features:
19+
20+
* Enabling/Disabling counters
21+
Counters are just free-running all the time in our case.
22+
* Interrupt caused by counter overflow
23+
No such feature in the spec.
24+
* Interrupt indicator
25+
It is not possible to have many interrupt ports for all counters, so an
26+
interrupt indicator is required for software to tell which counter has
27+
just overflowed.
28+
* Writing to counters
29+
There will be an SBI to support this since the kernel cannot modify the
30+
counters [1]. Alternatively, some vendors are considering implementing a
31+
hardware extension for M-S-U model machines to write counters directly.
32+
33+
This document aims to provide developers a quick guide on supporting their
34+
PMUs in the kernel. The following sections briefly explain perf's mechanism
35+
and todos.
36+
37+
You may check previous discussions here [1][2]. Also, it might be helpful
38+
to check the appendix for related kernel structures.
39+
40+
41+
1. Initialization
42+
-----------------
43+
44+
*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains
45+
various methods according to perf's internal convention and PMU-specific
46+
parameters. One should declare such an instance to represent the PMU. By default,
47+
*riscv_pmu* points to a constant structure *riscv_base_pmu*, which has very
48+
basic support for a baseline QEMU model.
49+
50+
Then he/she can either assign the instance's pointer to *riscv_pmu* so that
51+
the minimal and already-implemented logic can be leveraged, or invent his/her
52+
own *riscv_init_platform_pmu* implementation.
53+
54+
In other words, existing sources of *riscv_base_pmu* merely provide a
55+
reference implementation. Developers can flexibly decide how many parts they
56+
can leverage, and in the most extreme case, they can customize every function
57+
according to their needs.
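
The two options above can be pictured with a small userspace sketch. It is not the kernel code: *struct riscv_pmu_sketch*, *base_pmu_sketch*, *vendor_pmu_sketch*, *active_pmu* and *init_platform_pmu_sketch* are hypothetical names invented for illustration, and the structure is reduced to two fields. It only shows the pattern of a global pointer that defaults to a constant base instance and can be redirected to a vendor instance.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct riscv_pmu. */
struct riscv_pmu_sketch {
	const char *name;
	int num_counters;	/* a PMU-specific parameter */
};

/* Plays the role of riscv_base_pmu: the constant fallback. */
static const struct riscv_pmu_sketch base_pmu_sketch = {
	.name = "base",
	.num_counters = 2,
};

/* A vendor's own instance (all values are invented). */
static const struct riscv_pmu_sketch vendor_pmu_sketch = {
	.name = "vendor",
	.num_counters = 4,
};

/* Plays the role of the global riscv_pmu pointer. */
static const struct riscv_pmu_sketch *active_pmu = &base_pmu_sketch;

/* Hypothetical platform hook: passing NULL keeps the fallback. */
static void init_platform_pmu_sketch(const struct riscv_pmu_sketch *pmu)
{
	if (pmu != NULL)
		active_pmu = pmu;
}
```

Calling init_platform_pmu_sketch(&vendor_pmu_sketch) corresponds to the first option (reuse the existing logic with a different instance); replacing the hook itself corresponds to the second.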
58+
59+
60+
2. Event Initialization
61+
-----------------------
62+
63+
When a user launches a perf command to monitor some events, it is first
64+
interpreted by the userspace perf tool into multiple *perf_event_open*
65+
system calls, each of which then calls the *event_init*
66+
member function that was assigned in the previous step. In *riscv_base_pmu*'s
67+
case, it is *riscv_event_init*.
68+
69+
The main purpose of this function is to translate the event provided by the user
70+
into a bitmap, so that HW-related control registers or counters can directly be
71+
manipulated. The translation is based on the mappings and methods provided in
72+
*riscv_pmu*.
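
As a rough illustration of such a translation, the sketch below maps a generic event id to a hardware code through a lookup table. Everything prefixed with SK_ or suffixed with _sketch is hypothetical; the generic ids 0, 1 and 2 mirror PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS and PERF_COUNT_HW_CACHE_REFERENCES from the perf ABI, while the hardware codes and the choice of error values are assumptions made for this example only.

```c
#include <errno.h>

/* Generic event ids as seen by the user (values mirror the perf ABI). */
enum {
	SK_HW_CPU_CYCLES = 0,
	SK_HW_INSTRUCTIONS = 1,
	SK_HW_CACHE_REFERENCES = 2,
	SK_HW_MAX
};

/* Hypothetical hardware codes for the two ISA-defined counters. */
enum {
	SK_UNSUPPORTED = -1,
	SK_CTR_CYCLE = 0,
	SK_CTR_INSTRET = 2
};

/* Mapping table: generic id -> hardware code. */
static const int hw_event_map_sketch[SK_HW_MAX] = {
	[SK_HW_CPU_CYCLES] = SK_CTR_CYCLE,
	[SK_HW_INSTRUCTIONS] = SK_CTR_INSTRET,
	[SK_HW_CACHE_REFERENCES] = SK_UNSUPPORTED,
};

/* Returns the hardware code, or a negative errno value on failure. */
static int map_hw_event_sketch(unsigned long config)
{
	int code;

	if (config >= SK_HW_MAX)
		return -ENOENT;		/* unknown generic event */

	code = hw_event_map_sketch[config];
	if (code == SK_UNSUPPORTED)
		return -EOPNOTSUPP;	/* known, but no counter for it */

	return code;
}
```

An *event_init* implementation would store the returned code in the event's hardware state and reject the event when the result is negative.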
73+
74+
Note that some features can be done in this stage as well:
75+
76+
(1) interrupt setting, which is stated in the next section;
77+
(2) privilege level setting (user space only, kernel space only, both);
78+
(3) destructor setting. Normally it is sufficient to apply *riscv_destroy_event*;
79+
(4) tweaks for non-sampling events, which will be utilized by functions such as
80+
*perf_adjust_period*, usually something like the following:
81+
82+
if (!is_sampling_event(event)) {
83+
hwc->sample_period = x86_pmu.max_period;
84+
hwc->last_period = hwc->sample_period;
85+
local64_set(&hwc->period_left, hwc->sample_period);
86+
}
87+
88+
In the case of *riscv_base_pmu*, only (3) is provided for now.
89+
90+
91+
3. Interrupt
92+
------------
93+
94+
3.1. Interrupt Initialization
95+
96+
This often occurs at the beginning of the *event_init* method. In common
97+
practice, this should be a code segment like
98+
99+
int x86_reserve_hardware(void)
100+
{
101+
int err = 0;
102+
103+
if (!atomic_inc_not_zero(&pmc_refcount)) {
104+
mutex_lock(&pmc_reserve_mutex);
105+
if (atomic_read(&pmc_refcount) == 0) {
106+
if (!reserve_pmc_hardware())
107+
err = -EBUSY;
108+
else
109+
reserve_ds_buffers();
110+
}
111+
if (!err)
112+
atomic_inc(&pmc_refcount);
113+
mutex_unlock(&pmc_reserve_mutex);
114+
}
115+
116+
return err;
117+
}
118+
119+
And the magic is in *reserve_pmc_hardware*, which usually does atomic
120+
operations to make implemented IRQ accessible from some global function pointer.
121+
*release_pmc_hardware* serves the opposite purpose, and it is used in event
122+
destructors mentioned in the previous section.
123+
124+
(Note: From the implementations in all the architectures, the *reserve/release*
125+
pair are always IRQ settings, so the name *pmc_hardware* seems somewhat misleading.
126+
It does NOT deal with the binding between an event and a physical counter,
127+
which will be introduced in the next section.)
128+
129+
3.2. IRQ Structure
130+
131+
Basically, an IRQ handler runs the following pseudocode:
132+
133+
for each hardware counter that triggered this overflow
134+
135+
get the event of this counter
136+
137+
// following two steps are defined as *read()*,
138+
// check the section Reading/Writing Counters for details.
139+
count the delta value since previous interrupt
140+
update the event->count (# event occurs) by adding delta, and
141+
event->hw.period_left by subtracting delta
142+
143+
if the event overflows
144+
sample data
145+
set the counter appropriately for the next overflow
146+
147+
if the event overflows again
148+
too frequently, throttle this event
149+
fi
150+
fi
151+
152+
end for
153+
154+
However, as of this writing, none of the RISC-V implementations have designed an
155+
interrupt for perf, so the details are to be completed in the future.
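
For readers who prefer C to pseudocode, the loop above could look roughly like the following userspace sketch. It is only a model under stated assumptions: *overflow_mask* stands for the "interrupt indicator" that no RISC-V implementation provides yet, the raw counter values are passed in as an array instead of being read from CSRs, sampling is reduced to incrementing a field, and throttling is left out. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_NUM_COUNTERS 4

/* Hypothetical per-event bookkeeping. */
struct irq_event_sketch {
	uint64_t prev_raw;	/* raw counter value at the last update */
	uint64_t count;		/* stands in for event->count */
	int64_t period_left;	/* stands in for event->hw.period_left */
	uint64_t sample_period;	/* distance between two samples */
	unsigned int samples;	/* "sample data" is modelled as a tally */
};

/* Returns how many events took a sample during this interrupt. */
static unsigned int
handle_overflow_irq_sketch(uint32_t overflow_mask,
			   struct irq_event_sketch *events[SK_NUM_COUNTERS],
			   const uint64_t raw[SK_NUM_COUNTERS])
{
	unsigned int sampled = 0;

	for (int idx = 0; idx < SK_NUM_COUNTERS; idx++) {
		struct irq_event_sketch *ev = events[idx];
		uint64_t delta;

		/* Skip counters that did not trigger or have no event. */
		if (!(overflow_mask & (UINT32_C(1) << idx)) || ev == NULL)
			continue;

		/* The "read" step: fold the delta into both fields. */
		delta = raw[idx] - ev->prev_raw;
		ev->prev_raw = raw[idx];
		ev->count += delta;
		ev->period_left -= (int64_t)delta;

		/* The event overflows once its period is used up. */
		if (ev->period_left <= 0) {
			ev->samples++;
			sampled++;
			/* Re-arm for the next overflow. */
			ev->period_left += (int64_t)ev->sample_period;
			if (ev->period_left <= 0)
				ev->period_left = (int64_t)ev->sample_period;
		}
	}

	return sampled;
}
```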
156+
157+
4. Reading/Writing Counters
158+
---------------------------
159+
160+
They seem symmetric but perf treats them quite differently. For reading, there
161+
is a *read* interface in *struct pmu*, but it serves more than just reading.
162+
According to the context, the *read* function not only reads the content of the
163+
counter (event->count), but also updates the left period to the next interrupt
164+
(event->hw.period_left).
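
A minimal sketch of this double duty, assuming a free-running counter of a given bit width: the function below computes the delta since the previous snapshot (tolerating one wrap-around) and applies it to both fields. *struct read_event_sketch* and *event_update_sketch* are hypothetical names, and the raw value is passed in rather than read from a CSR.

```c
#include <stdint.h>

/* Hypothetical per-event bookkeeping. */
struct read_event_sketch {
	uint64_t prev_raw;	/* raw counter value at the last update */
	uint64_t count;		/* stands in for event->count */
	int64_t period_left;	/* stands in for event->hw.period_left */
};

/*
 * Fold the events counted since the last call into the software state.
 * width is the number of implemented counter bits (1..64).
 */
static uint64_t event_update_sketch(struct read_event_sketch *ev,
				    uint64_t raw_now, unsigned int width)
{
	uint64_t mask = (width >= 64) ? UINT64_MAX
				      : ((UINT64_C(1) << width) - 1);
	/* Modular subtraction keeps the delta correct across a wrap. */
	uint64_t delta = (raw_now - ev->prev_raw) & mask;

	ev->prev_raw = raw_now;
	ev->count += delta;
	ev->period_left -= (int64_t)delta;

	return delta;
}
```

Because the counters are free-running, the previous snapshot is the only reference point; the counter itself is never reset here.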
165+
166+
But the core of perf does not need direct write to counters. Writing counters
167+
is hidden behind the abstraction of 1) *pmu->start*, literally start counting so one
168+
has to set the counter to a good value for the next interrupt; 2) inside the IRQ
169+
it should set the counter to the same reasonable value.
170+
171+
Reading is not a problem in RISC-V but writing would need some effort, since
172+
counters are not allowed to be written by S-mode.
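
What a "good value for the next interrupt" means can be sketched as follows, assuming a counter that raises its overflow when it wraps from all-ones to zero: program it so that exactly *period_left* more events cause the wrap. *counter_start_value_sketch* is a hypothetical helper; how the value actually reaches the hardware (an SBI call or a vendor extension, as discussed in the introduction) is outside this sketch.

```c
#include <stdint.h>

/*
 * Value to program into a width-bit counter so that it wraps to zero
 * after period_left more events. width must be in the range 1..64.
 */
static uint64_t counter_start_value_sketch(uint64_t period_left,
					   unsigned int width)
{
	uint64_t mask = (width >= 64) ? UINT64_MAX
				      : ((UINT64_C(1) << width) - 1);

	/* Two's-complement negation, truncated to the counter width. */
	return (UINT64_C(0) - period_left) & mask;
}
```

For a 32-bit counter and 1000 events left, the start value is 2^32 - 1000.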
173+
174+
175+
5. add()/del()/start()/stop()
176+
-----------------------------
177+
178+
Basic idea: add()/del() adds/deletes events to/from a PMU, and start()/stop()
179+
starts/stops the counter of some event in the PMU. All of them take the same
180+
arguments: *struct perf_event *event* and *int flags*.
181+
182+
If you consider perf as a state machine, you will find that these functions serve
183+
as the transitions between its states.
184+
Three states (event->hw.state) are defined:
185+
186+
* PERF_HES_STOPPED: the counter is stopped
187+
* PERF_HES_UPTODATE: the event->count is up-to-date
188+
* PERF_HES_ARCH: arch-dependent usage ... we don't need this for now
189+
190+
A normal flow of these state transitions is as follows:
191+
192+
* A user launches a perf event, resulting in a call to *event_init*.
193+
* When being context-switched in, *add* is called by the perf core, with a flag
194+
PERF_EF_START, which means that the event should be started after it is added.
195+
At this stage, a general event is bound to a physical counter, if any.
196+
The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because it is now
197+
stopped, and the (software) event count does not need updating.
198+
** *start* is then called, and the counter is enabled.
199+
With flag PERF_EF_RELOAD, it writes an appropriate value to the counter (check
200+
previous section for detail).
201+
Nothing is written if the flag does not contain PERF_EF_RELOAD.
202+
The state now is reset to none, because it is neither stopped nor updated
203+
(the counting already started)
204+
* When being context-switched out, *del* is called. It then checks out all the
205+
events in the PMU and calls *stop* to update their counts.
206+
** *stop* is called by *del*
207+
and the perf core with flag PERF_EF_UPDATE, and it often shares the same
208+
subroutine as *read* with the same logic.
209+
The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again.
210+
211+
** Life cycle of these two pairs: *add* and *del* are called repeatedly as
212+
tasks switch in-and-out; *start* and *stop* are also called when the perf core
213+
needs a quick stop-and-start, for instance, when the interrupt period is being
214+
adjusted.
215+
216+
The current implementation is sufficient for now and can easily be extended
217+
with more features in the future.
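
The state transitions described in this section can be condensed into a small userspace model. The flag values mirror those in include/linux/perf_event.h; *struct hw_state_sketch* and the four *sketch_* functions are hypothetical, and counter programming is reduced to a boolean.

```c
/* Values mirror include/linux/perf_event.h. */
#define PERF_EF_START		0x01	/* start the counter when adding */
#define PERF_EF_RELOAD		0x02	/* reload the counter when starting */
#define PERF_EF_UPDATE		0x04	/* update the counter when stopping */

#define PERF_HES_STOPPED	0x01	/* the counter is stopped */
#define PERF_HES_UPTODATE	0x02	/* event->count is up-to-date */

/* Hypothetical stand-in for the relevant part of event->hw. */
struct hw_state_sketch {
	int state;
	int counting;	/* 1 while the (modelled) counter runs */
};

static void sketch_start(struct hw_state_sketch *hw, int flags)
{
	/* With PERF_EF_RELOAD a real PMU would program the counter here. */
	(void)flags;
	hw->state = 0;		/* neither stopped nor up-to-date */
	hw->counting = 1;
}

static void sketch_stop(struct hw_state_sketch *hw, int flags)
{
	hw->counting = 0;
	hw->state |= PERF_HES_STOPPED;
	/* With PERF_EF_UPDATE the count is refreshed, as in read(). */
	if (flags & PERF_EF_UPDATE)
		hw->state |= PERF_HES_UPTODATE;
}

static void sketch_add(struct hw_state_sketch *hw, int flags)
{
	/* Bound to a counter, but not running yet. */
	hw->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
	if (flags & PERF_EF_START)
		sketch_start(hw, PERF_EF_RELOAD);
}

static void sketch_del(struct hw_state_sketch *hw)
{
	sketch_stop(hw, PERF_EF_UPDATE);
}
```

A context switch in followed by a switch out then corresponds to sketch_add(&hw, PERF_EF_START) followed by sketch_del(&hw), which leaves the state at PERF_HES_STOPPED | PERF_HES_UPTODATE.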
218+
219+
A. Related Structures
220+
---------------------
221+
222+
* struct pmu: include/linux/perf_event.h
223+
* struct riscv_pmu: arch/riscv/include/asm/perf_event.h
224+
225+
Both structures are designed to be read-only.
226+
227+
*struct pmu* defines some function pointer interfaces, and most of them take
228+
*struct perf_event* as a main argument, dealing with perf events according to
229+
perf's internal state machine (check kernel/events/core.c for details).
230+
231+
*struct riscv_pmu* defines PMU-specific parameters. The naming follows the
232+
convention of all other architectures.
233+
234+
* struct perf_event: include/linux/perf_event.h
235+
* struct hw_perf_event
236+
237+
The generic structure that represents perf events, and the hardware-related
238+
details.
239+
240+
* struct riscv_hw_events: arch/riscv/include/asm/perf_event.h
241+
242+
The structure that holds the status of events has two fixed members:
243+
the number of events and the array of the events.
244+
245+
References
246+
----------
247+
248+
[1] https://github.com/riscv/riscv-linux/pull/124
249+
[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/f19TmCNP6yA

MAINTAINERS

Lines changed: 9 additions & 1 deletion
@@ -12179,7 +12179,7 @@ F: drivers/mtd/nand/raw/r852.h
1217912179

1218012180
RISC-V ARCHITECTURE
1218112181
M: Palmer Dabbelt <[email protected]>
12182-
M: Albert Ou <[email protected]>
12182+
M: Albert Ou <[email protected]>
1218312183
1218412184
T: git git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux.git
1218512185
S: Supported
@@ -12939,6 +12939,14 @@ F: drivers/media/usb/siano/
1293912939
F: drivers/media/usb/siano/
1294012940
F: drivers/media/mmc/siano/
1294112941

12942+
SIFIVE DRIVERS
12943+
M: Palmer Dabbelt <[email protected]>
12944+
12945+
T: git git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux.git
12946+
S: Supported
12947+
K: sifive
12948+
N: sifive
12949+
1294212950
SILEAD TOUCHSCREEN DRIVER
1294312951
M: Hans de Goede <[email protected]>
1294412952

arch/riscv/Kconfig

Lines changed: 14 additions & 0 deletions
@@ -32,6 +32,7 @@ config RISCV
3232
select HAVE_MEMBLOCK_NODE_MAP
3333
select HAVE_DMA_CONTIGUOUS
3434
select HAVE_GENERIC_DMA_COHERENT
35+
select HAVE_PERF_EVENTS
3536
select IRQ_DOMAIN
3637
select NO_BOOTMEM
3738
select RISCV_ISA_A if SMP
@@ -193,6 +194,19 @@ config RISCV_ISA_C
193194
config RISCV_ISA_A
194195
def_bool y
195196

197+
menu "supported PMU type"
198+
depends on PERF_EVENTS
199+
200+
config RISCV_BASE_PMU
201+
bool "Base Performance Monitoring Unit"
202+
def_bool y
203+
help
204+
A base PMU that serves as a reference implementation and supports a limited
205+
set of perf features. It can run on any RISC-V machine, so it serves as the
206+
fallback, but this option can also be disabled to reduce kernel size.
207+
208+
endmenu
209+
196210
endmenu
197211

198212
menu "Kernel type"

arch/riscv/Makefile

Lines changed: 3 additions & 0 deletions
@@ -71,6 +71,9 @@ KBUILD_CFLAGS_MODULE += $(call cc-option,-mno-relax)
7171
# architectures. It's faster to have GCC emit only aligned accesses.
7272
KBUILD_CFLAGS += $(call cc-option,-mstrict-align)
7373

74+
# arch specific predefines for sparse
75+
CHECKFLAGS += -D__riscv -D__riscv_xlen=$(BITS)
76+
7477
head-y := arch/riscv/kernel/head.o
7578

7679
core-y += arch/riscv/kernel/ arch/riscv/mm/

arch/riscv/configs/defconfig

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ CONFIG_INPUT_MOUSEDEV=y
4444
CONFIG_SERIAL_8250=y
4545
CONFIG_SERIAL_8250_CONSOLE=y
4646
CONFIG_SERIAL_OF_PLATFORM=y
47+
CONFIG_HVC_RISCV_SBI=y
4748
# CONFIG_PTP_1588_CLOCK is not set
4849
CONFIG_DRM=y
4950
CONFIG_DRM_RADEON=y

arch/riscv/include/asm/Kbuild

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ generic-y += kdebug.h
2525
generic-y += kmap_types.h
2626
generic-y += kvm_para.h
2727
generic-y += local.h
28+
generic-y += local64.h
2829
generic-y += mm-arch-hooks.h
2930
generic-y += mman.h
3031
generic-y += module.h

arch/riscv/include/asm/cacheflush.h

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ static inline void flush_dcache_page(struct page *page)
4747

4848
#else /* CONFIG_SMP */
4949

50-
#define flush_icache_all() sbi_remote_fence_i(0)
50+
#define flush_icache_all() sbi_remote_fence_i(NULL)
5151
void flush_icache_mm(struct mm_struct *mm, bool local);
5252

5353
#endif /* CONFIG_SMP */
