Devuan bug report logs - #227
nbd-client: At shutdown nbd-client disabled before file-systems could be cleanly unmounted

version graph

Package: sysvinit-utils; Maintainer for sysvinit-utils is Devuan Developers <devuan-dev@lists.dyne.org>; Source for sysvinit-utils is src:sysvinit.

Affects: nbd-client

Reported by: David Kuehling <dvdkhlng@posteo.de>

Date: Mon, 16 Jul 2018 00:03:01 UTC

Severity: normal

Tags: upstream

Fixed in version 3.07-1

Done: Mark Hindley <mark@hindley.org.uk>

Reply or subscribe to this bug.

Toggle useless messages

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to devuan-bugs@lists.dyne.org, David Kuehling <dvdkhlng@posteo.de>, owner@bugs.devuan.org:
bug#227; Package nbd-client. (full text, mbox, link).


Acknowledgement sent to David Kuehling <dvdkhlng@posteo.de>:
New bug report received and forwarded. Copy sent to David Kuehling <dvdkhlng@posteo.de>, owner@bugs.devuan.org. (full text, mbox, link).


Message #5 received at submit@bugs.devuan.org (full text, mbox, reply):

From: David Kuehling <dvdkhlng@posteo.de>
To: Devuan Bug Tracking System <submit@bugs.devuan.org>
Subject: nbd-client: At shutdown nbd-client disabled before file-systems could be cleanly unmounted
Date: Mon, 16 Jul 2018 01:52:51 +0200
Package: nbd-client
Version: 1:3.15.2-3
Severity: normal

Dear Maintainer,

I'm currently running a Devuan desktop without local storage using
pxelinux and root located on a network block device provided via
nbd-client (kernel command line "root=/dev/mapper/vg-root
nbdroot=192.168.0.123,devicename,nbd0").

Unfortunately the sysvinit tools and init scripts and the nbd-client
binary do not properly work together causing nbd-client to disconnect at
shutdown before the filesystems can be properly unmounted.

This seems to be a very old problem documented e.g. here [1] and here
[2] and it seems that systemd already got specific patches to prevent
that problem.  With only sysvinit left to exhibit the problem, this sort
of makes the bug devuan-specific (or should I still bother to report it
to debian?).

Short summary of the problem: during shutdown /etc/init.d/sendsigs calls
killall5 binary from sysvinit-utils, killing almost all running
processes.

Of course it never should kill nbd-client, so the /etc/init.d/nbd-client
script is smart enough to register its PID to be exempt from sendsig's
action: by recording it in the /run/sendsigs.omit.d/nbd-cient file.

These PIDs are then collected by /etc/init.d/sendsig and passed as "-o
NNN" options to killall5 which spares those processes from premature
termination.

However, before killall5 goes on to kill all the other processes, it
does a:

	/* Now stop all processes. */
	kill(-1, SIGSTOP);

And when it's done, it does:

	/* And let them continue. */
	kill(-1, SIGCONT);

These SIGSTOP, SIGCONT signals are passed to all processes, including
nbd-client.  Unfortunately nbd-client is written in a way that makes it
unable to handle any signals delivered while it is inside an ioctl call,
and it looses its server connection on SIGSTOP, totally breaking the
block devices it provides.

What would be the right way to prevent this problem?  Fix sysvinit?  Fix
nbd-client?  Note that for nbd-root installations nbd-client is run via
(a copy of) /usr/share/initramfs-tools/scripts/local-top/nbd and not via
/etc/init.d/nbd-client.

Other than that I'm very happy with the stability and maturity of
Devuan.  Running the installer on a diskless system, installing into a
network block device (with cryptsetup and LVM on top), then still being
able to boot the system without problems was sort of surprising :)

cheers,

David

[1] https://github.com/NetworkBlockDevice/nbd/issues/51
[2] https://sourceforge.net/p/nbd/mailman/message/27368126/

-- System Information:
Distributor ID:	Devuan
Description:	Devuan GNU/Linux 2.0 (ascii)
Release:	2.0
Codename:	ascii

Architecture: x86_64

Kernel: Linux 4.9.0-6-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages nbd-client depends on:
ii  debconf [debconf-2.0]  1.5.61
ii  libc6                  2.24-11+deb9u3
ii  libgnutls30            3.5.8-5+deb9u3

nbd-client recommends no packages.

nbd-client suggests no packages.

-- debconf information:
  nbd-client/no-auto-config:
  nbd-client/killall_set:


Information forwarded to devuan-bugs@lists.dyne.org, devuan-dev@lists.dyne.org:
bug#227; Package nbd-client. (Thu, 16 Feb 2023 15:26:02 GMT) (full text, mbox, link).


Message #8 received at 227@bugs.devuan.org (full text, mbox, reply):

From: Mark Hindley <mark@hindley.org.uk>
To: David Kuehling <dvdkhlng@posteo.de>, 227@bugs.devuan.org
Cc: jsmith@resonatingmedia.com
Subject: Re: bug#227: nbd-client: At shutdown nbd-client disabled before file-systems could be cleanly unmounted
Date: Thu, 16 Feb 2023 15:22:44 +0000
Control: tags -1 upstream

David,

I know it has been a long time, but thanks for this.

On Mon, Jul 16, 2018 at 01:52:51AM +0200, David Kuehling wrote:
> Short summary of the problem: during shutdown /etc/init.d/sendsigs calls
> killall5 binary from sysvinit-utils, killing almost all running
> processes.
> 
> Of course it never should kill nbd-client, so the /etc/init.d/nbd-client
> script is smart enough to register its PID to be exempt from sendsig's
> action: by recording it in the /run/sendsigs.omit.d/nbd-cient file.
> 
> These PIDs are then collected by /etc/init.d/sendsig and passed as "-o
> NNN" options to killall5 which spares those processes from premature
> termination.
> 
> However, before killall5 goes on to kill all the other processes, it
> does a:
> 
> 	/* Now stop all processes. */
> 	kill(-1, SIGSTOP);
> 
> And when it's done, it does:
> 
> 	/* And let them continue. */
> 	kill(-1, SIGCONT);
> 
> These SIGSTOP, SIGCONT signals are passed to all processes, including
> nbd-client.  Unfortunately nbd-client is written in a way that makes it
> unable to handle any signals delivered while it is inside an ioctl call,
> and it looses its server connection on SIGSTOP, totally breaking the
> block devices it provides.
> 
> What would be the right way to prevent this problem?  Fix sysvinit?

My inclination here is that killall5 shouldn't send any signals (including a
STOP CONT pair) to processes that have registered to be omitted.

Copying Jesse Smith the sysvinit maintainer.

Jesse what do you think? Is it possible to avoid that?

Thanks

Mark

Added tag(s) upstream. Request was from Mark Hindley <mark@hindley.org.uk> to 227-submit@bugs.devuan.org. (Thu, 16 Feb 2023 15:26:04 GMT) (full text, mbox, link).


Information forwarded to devuan-bugs@lists.dyne.org, devuan-dev@lists.dyne.org:
bug#227; Package nbd-client. (Thu, 16 Feb 2023 16:10:02 GMT) (full text, mbox, link).


Acknowledgement sent to Jesse Smith <jessefrgsmith@yahoo.ca>:
Extra info received and forwarded to list. Copy sent to devuan-dev@lists.dyne.org. (Thu, 16 Feb 2023 16:10:09 GMT) (full text, mbox, link).


Message #15 received at 227@bugs.devuan.org (full text, mbox, reply):

From: Jesse Smith <jessefrgsmith@yahoo.ca>
To: Mark Hindley <mark@hindley.org.uk>, David Kuehling <dvdkhlng@posteo.de>, 227@bugs.devuan.org
Subject: Re: bug#227: nbd-client: At shutdown nbd-client disabled before file-systems could be cleanly unmounted
Date: Thu, 16 Feb 2023 12:08:34 -0400
On 2023-02-16 11:22 a.m., Mark Hindley wrote:
> Control: tags -1 upstream
>
> David,
>
> I know it has been a long time, but thanks for this.
>
> On Mon, Jul 16, 2018 at 01:52:51AM +0200, David Kuehling wrote:
>> Short summary of the problem: during shutdown /etc/init.d/sendsigs calls
>> killall5 binary from sysvinit-utils, killing almost all running
>> processes.
>>
>> Of course it never should kill nbd-client, so the /etc/init.d/nbd-client
>> script is smart enough to register its PID to be exempt from sendsig's
>> action: by recording it in the /run/sendsigs.omit.d/nbd-cient file.
>>
>> These PIDs are then collected by /etc/init.d/sendsig and passed as "-o
>> NNN" options to killall5 which spares those processes from premature
>> termination.
>>
>> However, before killall5 goes on to kill all the other processes, it
>> does a:
>>
>> 	/* Now stop all processes. */
>> 	kill(-1, SIGSTOP);
>>
>> And when it's done, it does:
>>
>> 	/* And let them continue. */
>> 	kill(-1, SIGCONT);
>>
>> These SIGSTOP, SIGCONT signals are passed to all processes, including
>> nbd-client.  Unfortunately nbd-client is written in a way that makes it
>> unable to handle any signals delivered while it is inside an ioctl call,
>> and it looses its server connection on SIGSTOP, totally breaking the
>> block devices it provides.
>>
>> What would be the right way to prevent this problem?  Fix sysvinit?
> My inclination here is that killall5 shouldn't send any signals (including a
> STOP CONT pair) to processes that have registered to be omitted.
>
> Copying Jesse Smith the sysvinit maintainer.
>
> Jesse what do you think? Is it possible to avoid that?
>
>


Thanks for the detailed bug report and explanation.

The way I see it this probably is a bug in the way killall5 operates. If
we've been explicitly told not to send signals to a process, then we
shouldn't send any signals to that process. It shouldn't matter if the
signal is STOP, CONT, or TERM.

I see two possible ways we could fix this:

1. Create a command line flag which disables the SIGSTOP and SIGCONT
signals being sent. This is an easy fix, quick and dirty. The potential
downside is if someone disables the STOP signal then maybe processes
terminate, move groups, or are replaced before we get around to sending
them the KILL signal. This probably won't happen, but it means killall5
is working with a "moving target".

2. We can run SIGSTOP on all processes _except_ those in the omit list.
This will be a lot slower than the existing "kill(-1, SIGSTOP)" call we
currently make. But I think it's more correct.

Basically the new work flow would look like this:

1. Send all processes except those omitted the SIGSTOP command.
2. Send all processes except those omitted the SIGKILL command.
3. Send all processes except those omitted the SIGCONT command.

Option #2 is slow and ugly, but seems "correct" from a behaviour point
of view.

I'm open to comments before I patch this.

- Jesse


Information forwarded to devuan-bugs@lists.dyne.org, devuan-dev@lists.dyne.org:
bug#227; Package nbd-client. (Thu, 16 Feb 2023 16:20:01 GMT) (full text, mbox, link).


Acknowledgement sent to Mark Hindley <mark@hindley.org.uk>:
Extra info received and forwarded to list. Copy sent to devuan-dev@lists.dyne.org. (Thu, 16 Feb 2023 16:20:03 GMT) (full text, mbox, link).


Message #20 received at 227@bugs.devuan.org (full text, mbox, reply):

From: Mark Hindley <mark@hindley.org.uk>
To: Jesse Smith <jsmith@resonatingmedia.com>
Cc: David Kuehling <dvdkhlng@posteo.de>, 227@bugs.devuan.org
Subject: Re: bug#227: nbd-client: At shutdown nbd-client disabled before file-systems could be cleanly unmounted
Date: Thu, 16 Feb 2023 16:17:52 +0000
Control: reassign -1 sysvinit-utils
Control: affects -1 nbd-client

Jesse,

Thanks for you quick reply.

On Thu, Feb 16, 2023 at 11:54:32AM -0400, Jesse Smith wrote:
> I see two possible ways we could fix this:
> 
> 1. Create a command line flag which disables the SIGSTOP and SIGCONT
> signals being sent. This is an easy fix, quick and dirty. The potential
> downside is if someone disables the STOP signal then maybe processes
> terminate, move groups, or are replaced before we get around to sending
> them the KILL signal. This probably won't happen, but it means killall5
> is working with a "moving target".
> 
> 2. We can run SIGSTOP on all processes _except_ those in the omit list.
> This will be a lot slower than the existing "kill(-1, SIGSTOP)" call we
> currently make. But I think it's more correct.
> 
> Basically the new work flow would look like this:
> 
> 1. Send all processes except those omitted the SIGSTOP command.
> 2. Send all processes except those omitted the SIGKILL command.
> 3. Send all processes except those omitted the SIGCONT command.
> 
> Option #2 is slow and ugly, but seems "correct" from a behaviour point
> of view.

I don't really like #1: it will mean patching callers to get the (more?) correct
behaviour. So #2 seems better. Will it be significantly slower?. You could only
use it when omitpid is specified? If there are no omitpids, the current
behaviour seems fine.

Thanks

Mark

bug reassigned from package 'nbd-client' to 'sysvinit-utils'. Request was from Mark Hindley <mark@hindley.org.uk> to 227-submit@bugs.devuan.org. (Thu, 16 Feb 2023 16:20:04 GMT) (full text, mbox, link).


Added indication that 227 affects nbd-client Request was from Mark Hindley <mark@hindley.org.uk> to 227-submit@bugs.devuan.org. (Thu, 16 Feb 2023 16:20:04 GMT) (full text, mbox, link).


Information forwarded to devuan-bugs@lists.dyne.org, Devuan Developers <devuan-dev@lists.dyne.org>:
bug#227; Package sysvinit-utils. (Thu, 16 Feb 2023 16:30:02 GMT) (full text, mbox, link).


Acknowledgement sent to Jesse Smith <jessefrgsmith@yahoo.ca>:
Extra info received and forwarded to list. Copy sent to Devuan Developers <devuan-dev@lists.dyne.org>. (Thu, 16 Feb 2023 16:30:03 GMT) (full text, mbox, link).


Message #29 received at 227@bugs.devuan.org (full text, mbox, reply):

From: Jesse Smith <jessefrgsmith@yahoo.ca>
To: Mark Hindley <mark@hindley.org.uk>
Cc: David Kuehling <dvdkhlng@posteo.de>, 227@bugs.devuan.org
Subject: Re: bug#227: nbd-client: At shutdown nbd-client disabled before file-systems could be cleanly unmounted
Date: Thu, 16 Feb 2023 12:26:59 -0400
On 2023-02-16 12:17 p.m., Mark Hindley wrote:
> Control: reassign -1 sysvinit-utils
> Control: affects -1 nbd-client
>
> Jesse,
>
> Thanks for you quick reply.
>
> On Thu, Feb 16, 2023 at 11:54:32AM -0400, Jesse Smith wrote:
>> I see two possible ways we could fix this:
>>
>> 1. Create a command line flag which disables the SIGSTOP and SIGCONT
>> signals being sent. This is an easy fix, quick and dirty. The potential
>> downside is if someone disables the STOP signal then maybe processes
>> terminate, move groups, or are replaced before we get around to sending
>> them the KILL signal. This probably won't happen, but it means killall5
>> is working with a "moving target".
>>
>> 2. We can run SIGSTOP on all processes _except_ those in the omit list.
>> This will be a lot slower than the existing "kill(-1, SIGSTOP)" call we
>> currently make. But I think it's more correct.
>>
>> Basically the new work flow would look like this:
>>
>> 1. Send all processes except those omitted the SIGSTOP command.
>> 2. Send all processes except those omitted the SIGKILL command.
>> 3. Send all processes except those omitted the SIGCONT command.
>>
>> Option #2 is slow and ugly, but seems "correct" from a behaviour point
>> of view.
> I don't really like #1: it will mean patching callers to get the (more?) correct
> behaviour. So #2 seems better. Will it be significantly slower?. You could only
> use it when omitpid is specified? If there are no omitpids, the current
> behaviour seems fine.
>
>
if I had to guess, I'd assume the new approach would be about 3x slower.
But we're probably talking about loops that will finish in a thousandths
of a second. From a practical point of view, I don't think anyone will
see the difference. And, as you pointed out, if there is no list of
omitted PIDs then we can revert to the old way of doing things.

I'll look at fixing this and testing it this weekend. See if I can fix
this without breaking anything else.

- Jesse


Information forwarded to devuan-bugs@lists.dyne.org, Devuan Developers <devuan-dev@lists.dyne.org>:
bug#227; Package sysvinit-utils. (Sun, 19 Feb 2023 05:06:01 GMT) (full text, mbox, link).


Acknowledgement sent to Jesse Smith <jessefrgsmith@yahoo.ca>:
Extra info received and forwarded to list. Copy sent to Devuan Developers <devuan-dev@lists.dyne.org>. (Sun, 19 Feb 2023 05:06:08 GMT) (full text, mbox, link).


Message #34 received at 227@bugs.devuan.org (full text, mbox, reply):

From: Jesse Smith <jessefrgsmith@yahoo.ca>
To: Mark Hindley <mark@hindley.org.uk>
Cc: David Kuehling <dvdkhlng@posteo.de>, 227@bugs.devuan.org
Subject: Re: bug#227: nbd-client: At shutdown nbd-client disabled before file-systems could be cleanly unmounted
Date: Sun, 19 Feb 2023 01:03:16 -0400
[Message part 1 (text/plain, inline)]
On 2023-02-16 12:17 p.m., Mark Hindley wrote:
> Control: reassign -1 sysvinit-utils
> Control: affects -1 nbd-client
>
> Jesse,
>
> Thanks for you quick reply.
>
> On Thu, Feb 16, 2023 at 11:54:32AM -0400, Jesse Smith wrote:
>> I see two possible ways we could fix this:
>>
>> 1. Create a command line flag which disables the SIGSTOP and SIGCONT
>> signals being sent. This is an easy fix, quick and dirty. The potential
>> downside is if someone disables the STOP signal then maybe processes
>> terminate, move groups, or are replaced before we get around to sending
>> them the KILL signal. This probably won't happen, but it means killall5
>> is working with a "moving target".
>>
>> 2. We can run SIGSTOP on all processes _except_ those in the omit list.
>> This will be a lot slower than the existing "kill(-1, SIGSTOP)" call we
>> currently make. But I think it's more correct.
>>
>> Basically the new work flow would look like this:
>>
>> 1. Send all processes except those omitted the SIGSTOP command.
>> 2. Send all processes except those omitted the SIGKILL command.
>> 3. Send all processes except those omitted the SIGCONT command.
>>
>> Option #2 is slow and ugly, but seems "correct" from a behaviour point
>> of view.
> I don't really like #1: it will mean patching callers to get the (more?) correct
> behaviour. So #2 seems better. Will it be significantly slower?. You could only
> use it when omitpid is specified? If there are no omitpids, the current
> behaviour seems fine.
>

I looked at this issue some more and it turns out option #2 doesn't
really makes sense with the way killall5 works. The SIGSTOP signal is
sent out to basically pause everything on the system so we can sort
through what is available to be terminated and what should be spared. If
we want to go through the whole process tree to see what should or
should not be stopped first, then it removes the point of sending
SIGSTOP to give us a fixed collection of processes.

In other words, sending SIGSTOP only makes sense if we later want to
freeze everything and then sort processes into what we will terminate
and what will be omitted. If we are going to go through all the
processes and send some the SIGSTOP signal and not others, then we might
as well just skip the STOP and move straight on to terminating
appropriate processes.

I've tried to work in a reasonable middle ground. Now, if no processes
are in the "omit" list, we stick to the original behaviour. We stop
everything, sort through which processes we will terminate, then send
everything the SIGCONT signal. Just like before.

However, if there are any processes in the "omit" list, then we err on
the side of caution. Nothing is sent STOP or CONT, we just sort through
the processes we know about quickly and try to terminate everything
we're supposed to kill. And ignore anything in the process list.

It's not perfect, we could conceivably miss a process that was created
while we were sorting through the list after the point we'd normally
send SIGSTOP. However, this approach makes certain we do not damage any
process that can't handle receiving SIGSTOP.

The downside is that a process could get created after killall5 is run
and before it is terminated. However, this is always going to be the
case when we allow some programs to be in the "omit" list. Any of the
omitted programs could spawn a new process, so that risk already
existed. This doesn't really make the situation (much) worse.

I've tested this change and it seems to work. I'm attaching a new copy
of killall5 for testing. It's also available on GitHub in the 3.07
branch of sysvinit.

Also, as a bonus, a bunch of the killall5 code didn't initialize
variables, seeming to assume a specific code path or that the compiler
would initialize variables to zero/NULL. Since this isn't always true,
I've initialized variables we might now use in unexpected order or which
should be NULL/false at the start of the program.

- Jesse

[killall5.c (text/x-csrc, attachment)]

Information forwarded to devuan-bugs@lists.dyne.org, Devuan Developers <devuan-dev@lists.dyne.org>:
bug#227; Package sysvinit-utils. (Mon, 20 Feb 2023 11:26:01 GMT) (full text, mbox, link).


Acknowledgement sent to Mark Hindley <mark@hindley.org.uk>:
Extra info received and forwarded to list. Copy sent to Devuan Developers <devuan-dev@lists.dyne.org>. (Mon, 20 Feb 2023 11:26:09 GMT) (full text, mbox, link).


Message #39 received at 227@bugs.devuan.org (full text, mbox, reply):

From: Mark Hindley <mark@hindley.org.uk>
To: Jesse Smith <jessefrgsmith@yahoo.ca>
Cc: David Kuehling <dvdkhlng@posteo.de>, 227@bugs.devuan.org
Subject: Re: bug#227: nbd-client: At shutdown nbd-client disabled before file-systems could be cleanly unmounted
Date: Mon, 20 Feb 2023 11:24:42 +0000
Control: tags -1 fixed-upstream

Jesse,

Thanks for you quick work on this.

On Sun, Feb 19, 2023 at 01:03:16AM -0400, Jesse Smith wrote:
> I looked at this issue some more and it turns out option #2 doesn't
> really makes sense with the way killall5 works.


Yes, I see that

> I've tested this change and it seems to work. I'm attaching a new copy
> of killall5 for testing. It's also available on GitHub in the 3.07
> branch of sysvinit.

Since Debian is frozen at the moment, I will do a build for experimental in due
course.

Thanks

Mark

Reply sent to Mark Hindley <mark@hindley.org.uk>:
You have taken responsibility. (Sat, 01 Jul 2023 15:12:01 GMT) (full text, mbox, link).


Notification sent to David Kuehling <dvdkhlng@posteo.de>:
bug acknowledged by developer. (Sat, 01 Jul 2023 15:12:04 GMT) (full text, mbox, link).


Message #44 received at 227-done@bugs.devuan.org (full text, mbox, reply):

From: Mark Hindley <mark@hindley.org.uk>
To: David Kuehling <dvdkhlng@posteo.de>
Cc: 227-done@bugs.devuan.org
Subject: Re: bug#227: nbd-client: At shutdown nbd-client disabled before file-systems could be cleanly unmounted
Date: Sat, 1 Jul 2023 16:09:29 +0100
Version: 3.07-1

David,

The upstream fix of killall5 to properly observe /run/sendsigs.omit.d/ is
included in this version.

Closing as fixed

Mark
-- 
Mark Hindley
GPG: 506C 15A4 2B0A F5A0 A854  23EE D28A 45BF 3287 D649

Send a report that this bug log contains spam.


Devuan BTS -- Powered by Debian bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson,
2005-2017 Don Armstrong, and many other contributors.

Devuan Bugs Owner <owner@bugs.devuan.org>.
Last modified: Fri Apr 19 06:24:36 2024;