From: Horst Schirmeier
To: 626@bugs.x2go.org
Date: Fri, 3 Oct 2014 21:47:21 +0200
Subject: Re: NX agent dies not reliably remove socket files under /tmp/.X11-unix
Message-ID: <20141003194721.GN10719@quickstop.soohrt.org>

OK, to summarize the scenario:

- A normal user connects, gets the first display :50, and runs his
  session. /tmp/.X50-lock and /tmp/.X11-unix/X50 are owned by this user.

- The user decides to killall -u username -9. (Don't ask. The actual
  story was that the logout process didn't complete, and he tried to
  clean up after himself, having become accustomed to using kill -9 all
  the time.)

- This kicks the user's nxagent out of business (the same would happen
  if it simply crashed), which prevents it from cleaning up the sockets
  in /tmp.

- Another user connects, and also gets assigned the first display
  (because, for some reason, x2go is convinced it's free again). His
  x2go processes are not permitted to remove and recreate
  /tmp/.X50-lock and /tmp/.X11-unix/X50, which still belong to the
  first user, and the window manager dies immediately. All users
  besides the initial user are locked out of x2go from now on.
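
The leftover files from the scenario above could be detected roughly as follows. This is only a sketch, and it assumes that nxagent, like the stock X server, writes its PID as a decimal number into /tmp/.X<N>-lock; that has not been verified here. The function names are made up for illustration.

```python
import os
from pathlib import Path


def lock_owner_pid(lock_path):
    """Return the PID recorded in an X lock file, or None if unusable."""
    try:
        pid = int(Path(lock_path).read_text().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def pid_alive(pid):
    """Probe a PID with signal 0, which only checks for existence."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stale_displays(tmp_dir="/tmp"):
    """Yield (display, lock_path, socket_path) for locks whose owner is gone.

    Assumption: the lock file holds the agent's PID. Locks that cannot be
    parsed are skipped rather than reported.
    """
    tmp = Path(tmp_dir)
    for lock in sorted(tmp.glob(".X*-lock")):
        display = lock.name[len(".X"):-len("-lock")]
        if not display.isdigit():
            continue
        pid = lock_owner_pid(lock)
        if pid is not None and not pid_alive(pid):
            yield display, lock, tmp / ".X11-unix" / f"X{display}"
```

Iterating over stale_displays() only reports candidates. Actually removing another user's files from /tmp would have to be done as root or as the owning user.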

Manual workaround: Remove stale /tmp/.X??-lock and /tmp/.X11-unix/X??.
Apply LART to users of kill -9 against nxagent.

Automatic workaround: x2gocleansessions should probably take care of
the /tmp file removal. LART will still need to be applied manually.

A "real" fix would move the usual, immediate cleanup step out of the
user's control. This could, for example, be done by a daemon running
as root that spawns an nxagent at a user's request, under the user's
UID. Once the nxagent dies (from whatever cause), the daemon's SIGCHLD
handler does the cleanup. This would also remove the race condition
introduced by the aforementioned "automatic workaround" (up to 2s
delay before the sockets are cleaned up, during which new users may be
unable to connect).

Another, much simpler possibility would be to use randomized/unique
socket names instead of the fixed /tmp/.X${DISPLAYNUM}-lock /
/tmp/.X11-unix/X${DISPLAYNUM} scheme. But I don't know enough about X11
to judge whether this could work.

--
Horst

--
PGP-Key 0xD40E0E7A
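
For illustration, the root-daemon idea above might look roughly like this. It is a sketch under assumptions, not x2go code: it uses a blocking wait on the child instead of a real SIGCHLD handler, the nxagent command line is left to the caller, and run_agent_and_cleanup and display_paths are hypothetical names.

```python
import os
import subprocess


def display_paths(display, tmp_dir="/tmp"):
    """Return the lock file and socket path used for a display number."""
    return (
        os.path.join(tmp_dir, f".X{display}-lock"),
        os.path.join(tmp_dir, ".X11-unix", f"X{display}"),
    )


def run_agent_and_cleanup(cmd, display, uid=None, gid=None, tmp_dir="/tmp"):
    """Run cmd under uid/gid, wait for it to exit, then remove its X files.

    The caller (hypothetically a daemon running as root) does the cleanup,
    so it still happens when the agent is killed with SIGKILL.
    """
    # user=/group= require Python 3.9+; None keeps the caller's credentials.
    # A real daemon would also have to handle supplementary groups.
    proc = subprocess.Popen(cmd, user=uid, group=gid)
    returncode = proc.wait()  # blocks until the agent exits or is killed
    for path in display_paths(display, tmp_dir):
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass  # the agent already cleaned up after itself
    return returncode
```

Because the parent only removes the files after wait() returns, the cleanup does not depend on how the child died, which is the point of moving it out of the user's control.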