X2Go Bug report logs - #1313
there is still a problem in getting a correct value for loadavgXX with loadchecker


Package: x2gobroker-agent; Maintainer for x2gobroker-agent is X2Go Developers <x2go-dev@lists.x2go.org>; Source for x2gobroker-agent is src:x2gobroker.

Reported by: Walid MOGHRABI <w.moghrabi@servicemagic.eu>

Date: Mon, 13 Aug 2018 10:45:02 UTC

Severity: normal

Found in version 0.0.4.0-0~1038~ubuntu16.04.1

Done: Mike Gabriel <mike.gabriel@das-netzwerkteam.de>

Bug is archived. No further changes may be made.

Full log



Subject: Bug#1313: [X2Go-Dev] Bug#1313: Bug#1313: there is still a problem in getting a correct value for loadavgXX with loadchecker
Date: Fri, 14 Dec 2018 15:02:52 +0000
Message-ID: <20181214150252.Horde.byNG7EXbaadzROu-QLl9djD@mail.das-netzwerkteam.de>
From: Mike Gabriel <mike.gabriel@das-netzwerkteam.de>
To: 1313@bugs.x2go.org
Cc: Walid MOGHRABI <w.moghrabi@servicemagic.eu>
Control: close -1

On  Do 13 Sep 2018 15:19:19 CEST, Mike Gabriel wrote:

> Hi Walid,
>
> On  Mo 13 Aug 2018 12:34:00 CEST, Walid MOGHRABI wrote:
>
>> package: x2gobroker-agent
>> version: 0.0.4.0-0~1038~ubuntu16.04.1
>> priority: bug
>>
>> I don't have a "0" value anymore since the latest fixes, so the
>> loadchecker process doesn't crash anymore, but there is still
>> something strange.
>> Here is a fragment of my loadchecker logs from this morning.
>> Just to give you the context, I have 22 servers which are all
>> automatically started at 6 AM (wake on LAN) and they are absolutely
>> the same (blade servers with same CPU, memory amount, BIOS version,
>> ...).
>> I checked our monitoring to see if users were correctly distributed
>> over the farm and at 7:30 AM, I had about 7 or 8 users connected, but
>> 4 of them were on tce-server-21, where I should have had 1 user on
>> each of 8 servers.
>
> Have you seen this issue more often? Does it hop from one server to
> another or occur on more than one server at a time?
>
>> Here is the loadchecker log fragment :
>>
>> root@tce-manager-01 [~] # grep -B 1 'loadavgXX:1;'  
>> /var/log/x2gobroker/loadchecker.log
>> ...
>> 2018-07-24 07:15:01,200 - loadchecker - INFO - Executing agent  
>> command on remote host tce-server-21 (10.50.0.221): sh -c  
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:15:01,622 - loadchecker - INFO - Broker agent  
>> answered: OK; loadavgXX:1; memAvail:23684; myMemAvail:23810;  
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:17:50,354 - loadchecker - INFO - Executing agent  
>> command on remote host tce-server-21 (10.50.0.221): sh -c  
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:17:50,779 - loadchecker - INFO - Broker agent  
>> answered: OK; loadavgXX:1; memAvail:23686; myMemAvail:23812;  
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:20:32,550 - loadchecker - INFO - Executing agent  
>> command on remote host tce-server-21 (10.50.0.221): sh -c  
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:20:32,964 - loadchecker - INFO - Broker agent  
>> answered: OK; loadavgXX:1; memAvail:23683; myMemAvail:23809;  
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:23:21,610 - loadchecker - INFO - Executing agent  
>> command on remote host tce-server-21 (10.50.0.221): sh -c  
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:23:22,034 - loadchecker - INFO - Broker agent  
>> answered: OK; loadavgXX:1; memAvail:23685; myMemAvail:23811;  
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:26:03,872 - loadchecker - INFO - Executing agent  
>> command on remote host tce-server-21 (10.50.0.221): sh -c  
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:26:04,286 - loadchecker - INFO - Broker agent  
>> answered: OK; loadavgXX:1; memAvail:23684; myMemAvail:23809;  
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:28:52,917 - loadchecker - INFO - Executing agent  
>> command on remote host tce-server-21 (10.50.0.221): sh -c  
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:28:53,338 - loadchecker - INFO - Broker agent  
>> answered: OK; loadavgXX:1; memAvail:23684; myMemAvail:23809;  
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:31:35,252 - loadchecker - INFO - Executing agent  
>> command on remote host tce-server-21 (10.50.0.221): sh -c  
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:31:35,670 - loadchecker - INFO - Broker agent  
>> answered: OK; loadavgXX:1; memAvail:23685; myMemAvail:23811;  
>> numCPU:16; typeCPU:2400;
>> --
>> 2018-07-24 07:34:24,424 - loadchecker - INFO - Executing agent  
>> command on remote host tce-server-21 (10.50.0.221): sh -c  
>> '/usr/lib/x2go/x2gobroker-agent foo checkload'
>> 2018-07-24 07:34:24,842 - loadchecker - INFO - Broker agent  
>> answered: OK; loadavgXX:1; memAvail:23683; myMemAvail:23809;  
>> numCPU:16; typeCPU:2400;
>
> The log message "Broker agent answered:" comes directly from X2Go  
> Broker Agent. It is basically its raw output.
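
For reference, the raw answer string shown in the log can be split into its fields for easier comparison. This is a minimal sketch that assumes the answer is always a status word followed by semicolon-separated key:value pairs with integer values, as in the fragment above. The function name parse_agent_answer is hypothetical and is not part of x2gobroker.

```python
def parse_agent_answer(answer: str) -> dict:
    """Split a raw broker agent answer into a status and integer fields.

    Assumption: the format is '<status>; key:value; key:value; ...' with
    integer values, as seen in the quoted loadchecker log.
    """
    status, _, rest = answer.partition(";")
    fields = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = item.partition(":")
        fields[key.strip()] = int(value)
    return {"status": status.strip(), "fields": fields}


sample = ("OK; loadavgXX:1; memAvail:23684; myMemAvail:23810; "
          "numCPU:16; typeCPU:2400;")
parsed = parse_agent_answer(sample)
print(parsed["status"], parsed["fields"]["loadavgXX"])
```
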
>
> This means that the flaw must be in x2gobroker-agent.pl on the
> remote X2Go Server, or that the loadchecker stops querying the
> broker agent and re-uses old data.
>
> Looking at x2gobroker-agent.pl: If we focus on the loadavgXX for
> now, we come to the conclusion that the load was really "0" or it
> was negative (both give us a loadavgXX value of "1"). The value
> should normally be greater (a system load of 1.0 yields a loadavgXX
> of 100).
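
The scaling described above can be sketched as follows. This is not the code from x2gobroker-agent.pl (which is Perl). It only reproduces the two stated properties: a load of 1.0 maps to 100, and a zero or negative load maps to 1. The function name loadavg_xx and the use of truncation are assumptions.

```python
def loadavg_xx(load: float) -> int:
    """Scale a system load average to the integer loadavgXX form.

    Assumptions (not taken from x2gobroker-agent.pl itself): the load is
    multiplied by 100 and truncated; any result of zero or below is
    reported as 1.
    """
    scaled = int(load * 100)
    return scaled if scaled > 0 else 1


for load in (1.0, 0.25, 0.0, -0.5):
    print(load, loadavg_xx(load))
```

Under these assumptions a reported loadavgXX of 1 cannot be told apart from a truly idle server, which is why the raw /proc/loadavg value is worth checking.
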
>
> Looking at x2gobroker.agent.py: As the values always change  
> slightly, we can't say that Python provides us the same return  
> result string all the time. The query to the broker agent must have  
> happened.
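
The "values change slightly" argument can be checked mechanically: if every answer string were byte-for-byte identical, cached data would be a plausible explanation; if they differ, a fresh query must have happened. A minimal sketch, using three answers copied from the log fragment above; the helper name answers_vary is hypothetical.

```python
def answers_vary(answers):
    """Return True if at least two of the raw answer strings differ."""
    return len({a.strip() for a in answers}) > 1


# Three consecutive answers from tce-server-21, taken from the quoted log.
samples = [
    "OK; loadavgXX:1; memAvail:23684; myMemAvail:23810; numCPU:16; typeCPU:2400;",
    "OK; loadavgXX:1; memAvail:23686; myMemAvail:23812; numCPU:16; typeCPU:2400;",
    "OK; loadavgXX:1; memAvail:23683; myMemAvail:23809; numCPU:16; typeCPU:2400;",
]
print(answers_vary(samples))
```
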
>
> We need to do more debugging if this issue reoccurs:
>
>   * run '/usr/lib/x2go/x2gobroker-agent foo checkload' on the  
> affected X2Go Server
>     and see if the reported values match with what the load checker sees.
>
>   * check if it is recurring on the same X2Go Server
>
>   * if /usr/lib/x2go/x2gobroker-agent returns a load of zero,
>     look at /proc/loadavg
>
>   * and /proc/sys/vm/min_free_kbytes,
>     /proc/meminfo
>     /proc/cpuinfo
>
> ... and report all back here...
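
The /proc checks listed above could be collected in one go with something like the sketch below, to be run on the affected X2Go Server. It assumes a Linux host with the usual /proc layout. The function names are hypothetical and none of this is part of x2gobroker.

```python
from pathlib import Path


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def parse_meminfo(text):
    """Return /proc/meminfo entries as a dict of integers (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])
    return values


def count_cpus(text):
    """Count 'processor' entries in /proc/cpuinfo text."""
    return sum(1 for line in text.splitlines() if line.startswith("processor"))


def snapshot():
    """Read the files named in the list above (Linux only)."""
    return {
        "loadavg": parse_loadavg(Path("/proc/loadavg").read_text()),
        "min_free_kbytes": int(Path("/proc/sys/vm/min_free_kbytes").read_text()),
        "meminfo": parse_meminfo(Path("/proc/meminfo").read_text()),
        "cpus": count_cpus(Path("/proc/cpuinfo").read_text()),
    }
```

Calling snapshot() right after running '/usr/lib/x2go/x2gobroker-agent foo checkload' gives values that can be compared side by side with the agent's answer.
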
>
>> As you can see, there is only 1 server with a loadavgXX = 1 (which  
>> means that in fact, we got a zero value from the broker agent).
>> This is not normal, at 7:34, there were 4 users already connected  
>> to this server and most of my other servers were empty.
>
>> Restarting x2gobroker-loadchecker service fixed the issue.
>
> Considering the above analysis that the issue must come from
> x2gobroker-agent.pl, a restart of the loadchecker cannot, in
> theory, solve such an issue.
>
> Can you see the x2gobroker-agent.pl process appear and disappear in  
> the process list on the remote X2Go Server? Or does it stay open,  
> even zombied?
>
>> I think there is a problem in retrieving this information ... even
>> memAvail seems strange on this server to me ... with 4 connected
>> users, it should have been lower than that.
>
> Hmmm... Ok... Maybe the wrong server got tested? Two identical IPs  
> on the subnet?
>
>> I also think the number of connected users should be taken into  
>> account when calculating the load factor (maybe this is already the  
>> case, not sure about that).
>
> Yes, we take the number of sessions into account. But that is not  
> provided by the broker agent, but is available in the X2Go Server  
> database and queried from there.
>
> Mike

Request from Walid on IRC. Not an issue anymore.

Thus, closing...

Mike
-- 

DAS-NETZWERKTEAM
mike gabriel, herweg 7, 24357 fleckeby
mobile: +49 (1520) 1976 148
landline: +49 (4354) 8390 139

GnuPG Fingerprint: 9BFB AEE8 6C0A A5FF BF22  0782 9AF4 6B30 2577 1B31
mail: mike.gabriel@das-netzwerkteam.de, http://das-netzwerkteam.de




X2Go Developers <owner@bugs.x2go.org>. Last modified: Thu Nov 21 20:38:50 2024; Machine Name: ymir.das-netzwerkteam.de
