Documentation / admin-guide / pstore-blk.rst


Based on kernel version 6.8. Page generated on 2024-03-11 21:26 EST.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
.. SPDX-License-Identifier: GPL-2.0

pstore block oops/panic logger
==============================

Introduction
------------

pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
block device and non-block device before the system crashes. You can get
these log files by mounting pstore filesystem like::

    mount -t pstore pstore /sys/fs/pstore


pstore block concepts
---------------------

pstore/blk provides efficient configuration method for pstore/blk, which
divides all configurations into two parts, configurations for user and
configurations for driver.

Configurations for user determine how pstore/blk works, such as pmsg_size,
kmsg_size and so on. All of them support both Kconfig and module parameters,
but module parameters have priority over Kconfig.

Configurations for driver are all about block device and non-block device,
such as total_size of block device and read/write operations.

Configurations for user
-----------------------

All of these configurations support both Kconfig and module parameters, but
module parameters have priority over Kconfig.

Here is an example for module parameters::

        pstore_blk.blkdev=/dev/mmcblk0p7 pstore_blk.kmsg_size=64 best_effort=y

The detail of each configurations may be of interest to you.

blkdev
~~~~~~

The block device to use. Most of the time, it is a partition of block device.
It's required for pstore/blk. It is also used for MTD device.

When pstore/blk is built as a module, "blkdev" accepts the following variants:

1. /dev/<disk_name> represents the device number of disk
#. /dev/<disk_name><decimal> represents the device number of partition - device
   number of disk plus the partition number
#. /dev/<disk_name>p<decimal> - same as the above; this form is used when disk
   name of partitioned disk ends with a digit.

When pstore/blk is built into the kernel, "blkdev" accepts the following variants:

#. <hex_major><hex_minor> device number in hexadecimal representation,
   with no leading 0x, for example b302.
#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF represents the unique id of
   a partition if the partition table provides it. The UUID may be either an
   EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP,
   where SSSSSSSS is a zero-filled hex representation of the 32-bit
   "NT disk signature", and PP is a zero-filled hex representation of the
   1-based partition number.
#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a
   partition with a known unique id.
#. <major>:<minor> major and minor number of the device separated by a colon.

It accepts the following variants for MTD device:

1. <device name> MTD device name. "pstore" is recommended.
#. <device number> MTD device number.

kmsg_size
~~~~~~~~~

The chunk size in KB for oops/panic front-end. It **MUST** be a multiple of 4.
It's optional if you do not care about the oops/panic log.

There are multiple chunks for oops/panic front-end depending on the remaining
space except other pstore front-ends.

pstore/blk will log to oops/panic chunks one by one, and always overwrite the
oldest chunk if there is no more free chunk.

pmsg_size
~~~~~~~~~

The chunk size in KB for pmsg front-end. It **MUST** be a multiple of 4.
It's optional if you do not care about the pmsg log.

Unlike oops/panic front-end, there is only one chunk for pmsg front-end.

Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
appended to the chunk. On reboot the contents are available in
*/sys/fs/pstore/pmsg-pstore-blk-0*.

console_size
~~~~~~~~~~~~

The chunk size in KB for console front-end.  It **MUST** be a multiple of 4.
It's optional if you do not care about the console log.

Similar to pmsg front-end, there is only one chunk for console front-end.

All log of console will be appended to the chunk. On reboot the contents are
available in */sys/fs/pstore/console-pstore-blk-0*.

ftrace_size
~~~~~~~~~~~

The chunk size in KB for ftrace front-end. It **MUST** be a multiple of 4.
It's optional if you do not care about the ftrace log.

Similar to oops front-end, there are multiple chunks for ftrace front-end
depending on the count of cpu processors. Each chunk size is equal to
ftrace_size / processors_count.

All log of ftrace will be appended to the chunk. On reboot the contents are
combined and available in */sys/fs/pstore/ftrace-pstore-blk-0*.

Persistent function tracing might be useful for debugging software or hardware
related hangs. Here is an example of usage::

 # mount -t pstore pstore /sys/fs/pstore
 # mount -t debugfs debugfs /sys/kernel/debug/
 # echo 1 > /sys/kernel/debug/pstore/record_ftrace
 # reboot -f
 [...]
 # mount -t pstore pstore /sys/fs/pstore
 # tail /sys/fs/pstore/ftrace-pstore-blk-0
 CPU:0 ts:5914676 c0063828  c0063b94  call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
 CPU:0 ts:5914678 c039ecdc  c006385c  cpuidle_enter_state <- call_cpuidle+0x44/0x48
 CPU:0 ts:5914680 c039e9a0  c039ecf0  cpuidle_enter_freeze <- cpuidle_enter_state+0x304/0x314
 CPU:0 ts:5914681 c0063870  c039ea30  sched_idle_set_state <- cpuidle_enter_state+0x44/0x314
 CPU:1 ts:5916720 c0160f59  c015ee04  kernfs_unmap_bin_file <- __kernfs_remove+0x140/0x204
 CPU:1 ts:5916721 c05ca625  c015ee0c  __mutex_lock_slowpath <- __kernfs_remove+0x148/0x204
 CPU:1 ts:5916723 c05c813d  c05ca630  yield_to <- __mutex_lock_slowpath+0x314/0x358
 CPU:1 ts:5916724 c05ca2d1  c05ca638  __ww_mutex_lock <- __mutex_lock_slowpath+0x31c/0x358

max_reason
~~~~~~~~~~

Limiting which kinds of kmsg dumps are stored can be controlled via
the ``max_reason`` value, as defined in include/linux/kmsg_dump.h's
``enum kmsg_dump_reason``. For example, to store both Oopses and Panics,
``max_reason`` should be set to 2 (KMSG_DUMP_OOPS), to store only Panics
``max_reason`` should be set to 1 (KMSG_DUMP_PANIC). Setting this to 0
(KMSG_DUMP_UNDEF), means the reason filtering will be controlled by the
``printk.always_kmsg_dump`` boot param: if unset, it'll be KMSG_DUMP_OOPS,
otherwise KMSG_DUMP_MAX.

Configurations for driver
-------------------------

A device driver uses ``register_pstore_device`` with
``struct pstore_device_info`` to register to pstore/blk.

.. kernel-doc:: fs/pstore/blk.c
   :export:

Compression and header
----------------------

Block device is large enough for uncompressed oops data. Actually we do not
recommend data compression because pstore/blk will insert some information into
the first line of oops/panic data. For example::

        Panic: Total 16 times

It means that it's OOPS|Panic for the 16th time since the first booting.
Sometimes the number of occurrences of oops|panic since the first booting is
important to judge whether the system is stable.

The following line is inserted by pstore filesystem. For example::

        Oops#2 Part1

It means that it's OOPS for the 2nd time on the last boot.

Reading the data
----------------

The dump data can be read from the pstore filesystem. The format for these
files is ``dmesg-pstore-blk-[N]`` for oops/panic front-end,
``pmsg-pstore-blk-0`` for pmsg front-end and so on.  The timestamp of the
dump file records the trigger time. To delete a stored record from block
device, simply unlink the respective pstore file.

Attentions in panic read/write APIs
-----------------------------------

If on panic, the kernel is not going to run for much longer, the tasks will not
be scheduled and most kernel resources will be out of service. It
looks like a single-threaded program running on a single-core computer.

The following points require special attention for panic read/write APIs:

1. Can **NOT** allocate any memory.
   If you need memory, just allocate while the block driver is initializing
   rather than waiting until the panic.
#. Must be polled, **NOT** interrupt driven.
   No task schedule any more. The block driver should delay to ensure the write
   succeeds, but NOT sleep.
#. Can **NOT** take any lock.
   There is no other task, nor any shared resource; you are safe to break all
   locks.
#. Just use CPU to transfer.
   Do not use DMA to transfer unless you are sure that DMA will not keep lock.
#. Control registers directly.
   Please control registers directly rather than use Linux kernel resources.
   Do I/O map while initializing rather than wait until a panic occurs.
#. Reset your block device and controller if necessary.
   If you are not sure of the state of your block device and controller when
   a panic occurs, you are safe to stop and reset them.

pstore/blk supports psblk_blkdev_info(), which is defined in
*linux/pstore_blk.h*, to get information of using block device, such as the
device number, sector count and start sector of the whole disk.

pstore block internals
----------------------

For developer reference, here are all the important structures and APIs:

.. kernel-doc:: fs/pstore/zone.c
   :internal:

.. kernel-doc:: include/linux/pstore_zone.h
   :internal:

.. kernel-doc:: include/linux/pstore_blk.h
   :internal: