Documentation / watchdog / watchdog-api.txt


Based on kernel version 4.16.1. Page generated on 2018-04-09 11:53 EST.

Last reviewed: 10/05/2007


The Linux Watchdog driver API.

Copyright 2002 Christer Weingel <wingel@nano-system.com>

Some parts of this document are copied verbatim from the sbc60xxwdt
driver which is (c) Copyright 2000 Jakob Oestergaard <jakob@ostenfeld.dk>

This document describes the state of the Linux 2.4.18 kernel.

Introduction:

A Watchdog Timer (WDT) is a hardware circuit that can reset the
computer system in case of a software fault.  You probably knew that
already.

Usually a userspace daemon will notify the kernel watchdog driver via the
/dev/watchdog special device file that userspace is still alive, at
regular intervals.  When such a notification occurs, the driver will
usually tell the hardware watchdog that everything is in order, and
that the watchdog should wait for yet another little while before
resetting the system.  If userspace fails (RAM error, kernel bug,
whatever), the notifications cease to occur, and the hardware watchdog
will reset the system (causing a reboot) after the timeout occurs.

The Linux watchdog API is a rather ad-hoc construction and different
drivers implement different, and sometimes incompatible, parts of it.
This file is an attempt to document the existing usage and allow
future driver writers to use it as a reference.

The simplest API:

All drivers support the basic mode of operation, where the watchdog
activates as soon as /dev/watchdog is opened and will reboot unless
the watchdog is pinged within a certain time.  This time is called the
timeout or margin.  The simplest way to ping the watchdog is to write
some data to the device.  For a very simple watchdog daemon, see the
source file samples/watchdog/watchdog-simple.c.

A more advanced daemon could, for example, check that an HTTP server is
still responding before doing the write call to ping the watchdog.

When the device is closed, the watchdog is disabled, unless the "Magic
Close" feature is supported (see below).  This is not always such a
good idea, since if there is a bug in the watchdog daemon and it
crashes, the system will not reboot.  Because of this, some of the
drivers support the configuration option "Disable watchdog shutdown on
close", CONFIG_WATCHDOG_NOWAYOUT.  If it is set to Y when compiling
the kernel, there is no way of disabling the watchdog once it has been
started.  So, if the watchdog daemon crashes, the system will reboot
after the timeout has passed.  Watchdog devices also usually support
the nowayout module parameter so that this option can be controlled at
runtime.

Magic Close feature:

If a driver supports "Magic Close", the driver will not disable the
watchdog unless a specific magic character 'V' has been sent to
/dev/watchdog just before closing the file.  If the userspace daemon
closes the file without sending this special character, the driver
will assume that the daemon (and userspace in general) died, and will
stop pinging the watchdog without disabling it first.  This will then
cause a reboot if the watchdog is not re-opened in sufficient time.
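
A clean shutdown of the daemon under Magic Close could be sketched as
follows.  The helper name wd_close_magic is hypothetical.  It works on
any file descriptor, which makes it easy to try out on a pipe.

```c
#include <unistd.h>

/* Hypothetical helper: send the magic character 'V' and then close fd.
 * Returns 0 if both the write and the close succeeded, -1 otherwise.
 * The close is attempted in either case. */
int wd_close_magic(int fd)
{
	int ok = (write(fd, "V", 1) == 1);

	if (close(fd) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

If the write fails, the file is still closed but without the magic
character, so a driver with Magic Close would leave the watchdog
running, as described above.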

The ioctl API:

All conforming drivers also support an ioctl API.

Pinging the watchdog using an ioctl:

All drivers that have an ioctl interface support at least one ioctl,
KEEPALIVE.  This ioctl does exactly the same thing as a write to the
watchdog device, so the main loop in the simple daemon above could be
replaced with:

	while (1) {
		ioctl(fd, WDIOC_KEEPALIVE, 0);
		sleep(10);
	}

The argument to the ioctl is ignored.

Setting and getting the timeout:

For some drivers it is possible to modify the watchdog timeout on the
fly with the SETTIMEOUT ioctl.  Those drivers have the WDIOF_SETTIMEOUT
flag set in their option field.  The argument is an integer
representing the timeout in seconds.  The driver returns the real
timeout used in the same variable, and this timeout might differ from
the requested one due to limitations of the hardware.

    int timeout = 45;
    ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
    printf("The timeout was set to %d seconds\n", timeout);

This example might actually print "The timeout was set to 60 seconds"
if the device has a granularity of minutes for its timeout.
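
Because the driver may adjust the value, it can help to keep the
requested and the applied timeout apart and to check the ioctl result.
A minimal sketch, with the hypothetical helper name wd_set_timeout:

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: request a timeout in seconds.
 * On success, stores the timeout the driver actually applied in
 * *actual and returns 0.  On failure, returns -1 and leaves *actual
 * untouched. */
int wd_set_timeout(int fd, int requested, int *actual)
{
	int t = requested;

	if (ioctl(fd, WDIOC_SETTIMEOUT, &t) != 0)
		return -1;

	*actual = t;	/* may differ from requested */
	return 0;
}
```

A caller can compare *actual with the requested value afterwards and
adjust its ping interval if the driver rounded the timeout.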

Starting with the Linux 2.4.18 kernel, it is possible to query the
current timeout using the GETTIMEOUT ioctl.

    ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
    printf("The timeout is %d seconds\n", timeout);

Pretimeouts:

Some watchdog timers can be set to have a trigger go off before the
actual time they will reset the system.  This can be done with an NMI,
interrupt, or other mechanism.  This allows Linux to record useful
information (like panic information and kernel coredumps) before it
resets.

    int pretimeout = 10;
    ioctl(fd, WDIOC_SETPRETIMEOUT, &pretimeout);

Note that the pretimeout is the number of seconds before the time
when the timeout will go off.  It is not the number of seconds until
the pretimeout.  So, for instance, if you set the timeout to 60 seconds
and the pretimeout to 10 seconds, the pretimeout will go off in 50
seconds.  Setting a pretimeout to zero disables it.
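
The arithmetic in the example above can be written down as a small
function.  The name wd_pretimeout_delay is hypothetical, and rejecting a
pretimeout that is not smaller than the timeout is an assumption of this
sketch.

```c
/* Hypothetical helper: number of seconds after which the pretimeout
 * fires, given the timeout and the pretimeout in seconds.
 * Returns -1 when the pretimeout is disabled (zero) or, by assumption,
 * not smaller than the timeout. */
int wd_pretimeout_delay(int timeout, int pretimeout)
{
	if (pretimeout <= 0 || pretimeout >= timeout)
		return -1;

	return timeout - pretimeout;
}
```

With a timeout of 60 and a pretimeout of 10 this returns 50, matching
the example in the text.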

There is also a get function for getting the pretimeout:

    ioctl(fd, WDIOC_GETPRETIMEOUT, &pretimeout);
    printf("The pretimeout is %d seconds\n", pretimeout);

Not all watchdog drivers will support a pretimeout.

Get the number of seconds before reboot:

Some watchdog drivers have the ability to report the remaining time
before the system will reboot.  WDIOC_GETTIMELEFT is the ioctl
that returns the number of seconds before reboot.

    int timeleft;
    ioctl(fd, WDIOC_GETTIMELEFT, &timeleft);
    printf("The time left is %d seconds\n", timeleft);

Environmental monitoring:

All watchdog drivers are required to return more information about the
system.  Some do temperature, fan and power level monitoring, and some
can tell you the reason for the last reboot of the system.  The
GETSUPPORT ioctl is available to ask what the device can do:

	struct watchdog_info ident;
	ioctl(fd, WDIOC_GETSUPPORT, &ident);

The fields returned in the ident struct are:

	identity		a string identifying the watchdog driver
	firmware_version	the firmware version of the card if available
	options			flags describing what the device supports

The options field can have the following bits set, and describes what
kind of information the GETSTATUS and GETBOOTSTATUS ioctls can
return.   [FIXME -- Is this correct?]

	WDIOF_OVERHEAT		Reset due to CPU overheat

The machine was last rebooted by the watchdog because the thermal limit was
exceeded.

	WDIOF_FANFAULT		Fan failed

A system fan monitored by the watchdog card has failed.

	WDIOF_EXTERN1		External relay 1

External monitoring relay/source 1 was triggered.  Controllers intended for
real world applications include external monitoring pins that will trigger
a reset.

	WDIOF_EXTERN2		External relay 2

External monitoring relay/source 2 was triggered.

	WDIOF_POWERUNDER	Power bad/power fault

The machine is showing an undervoltage status.

	WDIOF_CARDRESET		Card previously reset the CPU

The last reboot was caused by the watchdog card.

	WDIOF_POWEROVER		Power over voltage

The machine is showing an overvoltage status.  Note that if one level is
under and one over, both bits will be set.  This may seem odd but makes
sense.

	WDIOF_KEEPALIVEPING	Keep alive ping reply

The watchdog saw a keepalive ping since it was last queried.

	WDIOF_SETTIMEOUT	Can set/get the timeout

The watchdog can change its timeout with the SETTIMEOUT ioctl.

	WDIOF_PRETIMEOUT	Pretimeout (in seconds), get/set

The watchdog can do pretimeouts.
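
To test for one of these bits before relying on a feature, a daemon can
query GETSUPPORT and mask the options field.  A minimal sketch, with the
hypothetical helper name wd_supports:

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: check whether the driver advertises every bit
 * in mask (for example WDIOF_SETTIMEOUT) in its options field.
 * Returns 1 if all bits are set, 0 if not, and -1 if the
 * WDIOC_GETSUPPORT ioctl itself failed. */
int wd_supports(int fd, unsigned int mask)
{
	struct watchdog_info ident;

	if (ioctl(fd, WDIOC_GETSUPPORT, &ident) != 0)
		return -1;

	return (ident.options & mask) == mask ? 1 : 0;
}
```

For example, wd_supports(fd, WDIOF_PRETIMEOUT) could be checked before
calling the SETPRETIMEOUT ioctl.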

For those drivers that return any bits set in the option field, the
GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
status, and the status at the last reboot, respectively.

    int flags;
    ioctl(fd, WDIOC_GETSTATUS, &flags);

    or

    ioctl(fd, WDIOC_GETBOOTSTATUS, &flags);

Note that not all devices support these two calls, and some only
support the GETBOOTSTATUS call.
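
The flags value returned by either call is a bit mask, so it usually
needs decoding before it is logged.  The sketch below turns the status
bits listed earlier into a readable string.  The table and the function
name wd_describe_flags are hypothetical, and the short names are chosen
for this example only.

```c
#include <stddef.h>
#include <stdio.h>
#include <linux/watchdog.h>

/* Status bits from the list above, paired with short names. */
static const struct {
	int bit;
	const char *name;
} wd_status_names[] = {
	{ WDIOF_OVERHEAT,      "OVERHEAT" },
	{ WDIOF_FANFAULT,      "FANFAULT" },
	{ WDIOF_EXTERN1,       "EXTERN1" },
	{ WDIOF_EXTERN2,       "EXTERN2" },
	{ WDIOF_POWERUNDER,    "POWERUNDER" },
	{ WDIOF_CARDRESET,     "CARDRESET" },
	{ WDIOF_POWEROVER,     "POWEROVER" },
	{ WDIOF_KEEPALIVEPING, "KEEPALIVEPING" },
};

/* Write the names of all bits set in flags into buf, separated by
 * spaces.  Returns the number of names written.  buf is always
 * NUL-terminated when len is nonzero. */
int wd_describe_flags(int flags, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;
	int count = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < sizeof(wd_status_names) / sizeof(wd_status_names[0]); i++) {
		int n;

		if (!(flags & wd_status_names[i].bit))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     count ? " " : "", wd_status_names[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)n;
		count++;
	}

	return count;
}
```

A daemon could call this on the result of GETBOOTSTATUS at startup to
log why the machine was last reset.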

Some drivers can measure the temperature using the GETTEMP ioctl.  The
returned value is the temperature in degrees Fahrenheit.

    int temperature;
    ioctl(fd, WDIOC_GETTEMP, &temperature);
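
Since the value is in Fahrenheit, a caller that works in Celsius has to
convert it.  A minimal integer-only sketch, with the hypothetical helper
name wd_fahrenheit_to_celsius:

```c
/* Hypothetical helper: convert whole degrees Fahrenheit to whole
 * degrees Celsius, rounded to the nearest integer. */
int wd_fahrenheit_to_celsius(int f)
{
	int scaled = (f - 32) * 5;	/* Celsius times 9 */

	/* Adding or subtracting 4 before dividing by 9 rounds to the
	 * nearest integer for both positive and negative values. */
	return (scaled >= 0 ? scaled + 4 : scaled - 4) / 9;
}
```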

Finally, the SETOPTIONS ioctl can be used to control some aspects of
the card's operation.

    int options = 0;
    ioctl(fd, WDIOC_SETOPTIONS, &options);

The following options are available:

	WDIOS_DISABLECARD	Turn off the watchdog timer
	WDIOS_ENABLECARD	Turn on the watchdog timer
	WDIOS_TEMPPANIC		Kernel panic on temperature trip
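
As an illustration of the first two options, the sketch below turns the
timer on or off through SETOPTIONS.  The helper name wd_set_enabled is
hypothetical, and whether a given driver honours these options is not
something this sketch can guarantee.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical helper: turn the watchdog timer on (enable != 0) or
 * off (enable == 0) using the SETOPTIONS ioctl.
 * Returns 0 on success, -1 on failure. */
int wd_set_enabled(int fd, int enable)
{
	int options = enable ? WDIOS_ENABLECARD : WDIOS_DISABLECARD;

	return ioctl(fd, WDIOC_SETOPTIONS, &options) == 0 ? 0 : -1;
}
```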

[FIXME -- better explanations]