From unknown Fri Mar 29 07:08:32 2024 X-Loop: owner@bugs.x2go.org Subject: Bug#1315: loadchecker bad auto sleep time leads to faulty service Reply-To: Walid MOGHRABI , 1315@bugs.x2go.org Resent-From: Walid MOGHRABI Resent-To: x2go-dev@lists.x2go.org Resent-CC: owner@bugs.x2go.org X-Loop: owner@bugs.x2go.org Resent-Date: Mon, 13 Aug 2018 10:45:03 +0000 Resent-Message-ID: Resent-Sender: owner@bugs.x2go.org X-X2Go-PR-Message: report 1315 X-X2Go-PR-Package: x2gobroker-loadchecker X-X2Go-PR-Keywords: Received: via spool by submit@bugs.x2go.org id=B.1534157037404 (code B); Mon, 13 Aug 2018 10:45:03 +0000 Received: (at submit) by bugs.x2go.org; 13 Aug 2018 10:43:57 +0000 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ymir.das-netzwerkteam.de X-Spam-Level: X-Spam-Status: No, score=0.8 required=3.0 tests=BAYES_50 autolearn=ham autolearn_force=no version=3.4.1 Received: from localhost (localhost [127.0.0.1]) by ymir.das-netzwerkteam.de (Postfix) with ESMTP id C38DA5DAEA for ; Mon, 13 Aug 2018 12:43:54 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at ymir.das-netzwerkteam.de Received: from ymir.das-netzwerkteam.de ([127.0.0.1]) by localhost (ymir.das-netzwerkteam.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id lr5CAo2DXQp6 for ; Mon, 13 Aug 2018 12:43:46 +0200 (CEST) Received: from zm-01.servicemagic.eu (zm-01.servicemagic.eu [176.31.236.17]) by ymir.das-netzwerkteam.de (Postfix) with ESMTPS id C944B5DAE9 for ; Mon, 13 Aug 2018 12:43:46 +0200 (CEST) Received: from localhost (localhost.localdomain [127.0.0.1]) by zm-01.servicemagic.eu (Postfix) with ESMTP id 9673180A9DF3B for ; Mon, 13 Aug 2018 12:43:46 +0200 (CEST) X-Amavis-Modified: Mail body modified (using disclaimer) - zm-01.servicemagic.eu X-Virus-Scanned: amavisd-new at servicemagic.eu Received: from zm-01.servicemagic.eu ([127.0.0.1]) by localhost (zm-01.servicemagic.eu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 9H78cMCfj7aN for ; Mon, 13 Aug 2018 12:43:45 +0200 (CEST) Received: from zm-01.servicemagic.eu (localhost.localdomain [127.0.0.1]) by zm-01.servicemagic.eu (Postfix) with ESMTP id 9586080A6CB54 for ; Mon, 13 Aug 2018 12:43:45 +0200 (CEST) Date: Mon, 13 Aug 2018 12:43:45 +0200 (CEST) From: Walid MOGHRABI To: submit@bugs.x2go.org Message-ID: <97977576.4001790.1534157025573.JavaMail.root@servicemagic.eu> In-Reply-To: <2053729299.4001285.1534156689574.JavaMail.root@servicemagic.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [195.200.167.70] X-Mailer: Zimbra 7.2.0_GA_2669 (ZimbraWebClient - GC68 (Linux)/7.2.0_GA_2669) package: x2gobroker-loadchecker version: 0.0.4.0-0~1038~ubuntu16.04.1 priority: bug I had a problem with broker loadbalancing not working anymore even though the loadchecker service seemed to work well. Looking at the logs, I found an unusual "sleep time" for the loadchecker service then, after that, it seemed to never re-caclulate load on remote servers. --- 2018-08-08 08:43:40,379 - loadchecker - INFO - Executing agent command on remote host tce-server-14 (10.50.0.214): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload' 2018-08-08 08:43:40,812 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:49; memAvail:21904; myMemAvail:22030; numCPU:16; typeCPU:2400; 2018-08-08 08:43:40,812 - loadchecker - INFO - LoadChecker.loadchecker(): contacted remote broker agent for backend=inifile, profile_id=WIN-TRAVAUX, hostname=tce-server-14, new load factor is: 1716558 2018-08-08 08:43:40,813 - loadchecker - DEBUG - LoadChecker.loadchecker(): sleeping for 6.976744186046512secs before querying next server 2018-08-08 08:43:48,051 - loadchecker - INFO - Executing agent command on remote host tce-server-13 (10.50.0.213): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload' 2018-08-08 08:43:48,497 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:126; memAvail:21117; myMemAvail:21242; numCPU:16; typeCPU:2400; 2018-08-08 08:43:48,497 - loadchecker - INFO - LoadChecker.loadchecker(): contacted remote broker agent for backend=inifile, profile_id=WIN-TRAVAUX, hostname=tce-server-13, new load factor is: 643565 2018-08-08 08:43:48,498 - loadchecker - DEBUG - LoadChecker.loadchecker(): sleeping for 6.976744186046512secs before querying next server 2018-08-08 08:43:55,734 - loadchecker - INFO - Executing agent command on remote host tce-server-12 (10.50.0.212): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload' 2018-08-08 08:43:56,162 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:52; memAvail:22260; myMemAvail:22386; numCPU:16; typeCPU:2400; 2018-08-08 08:43:56,162 - loadchecker - INFO - LoadChecker.loadchecker(): contacted remote broker agent for backend=inifile, profile_id=WIN-TRAVAUX, hostname=tce-server-12, new load factor is: 1643815 2018-08-08 08:43:56,163 - loadchecker - DEBUG - LoadChecker.loadchecker(): sleeping for 6.976744186046512secs before querying next server 2018-08-08 08:44:03,404 - loadchecker - INFO - Executing agent command on remote host tce-server-11 (10.50.0.211): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload' 2018-08-08 08:44:03,850 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:123; memAvail:21398; myMemAvail:21524; numCPU:16; typeCPU:2400; 2018-08-08 08:44:03,851 - loadchecker - INFO - LoadChecker.loadchecker(): contacted remote broker agent for backend=inifile, profile_id=WIN-TRAVAUX, hostname=tce-server-11, new load factor is: 668035 2018-08-08 08:44:03,851 - loadchecker - DEBUG - LoadChecker.loadchecker(): sleeping for 6.976744186046512secs before querying next server 2018-08-08 08:44:11,093 - loadchecker - INFO - Executing agent command on remote host tce-server-10 (10.50.0.210): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload' 2018-08-08 08:44:11,532 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:84; memAvail:22133; myMemAvail:22259; numCPU:16; typeCPU:2400; 2018-08-08 08:44:11,532 - loadchecker - INFO - LoadChecker.loadchecker(): contacted remote broker agent for backend=inifile, profile_id=WIN-TRAVAUX, hostname=tce-server-10, new load factor is: 1011794 2018-08-08 08:44:11,533 - loadchecker - DEBUG - LoadChecker.loadchecker(): sleeping for 6.976744186046512secs before querying next server 2018-08-08 08:44:18,774 - loadchecker - INFO - Executing agent command on remote host tce-server-09 (10.50.0.209): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload' 2018-08-08 08:44:19,206 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:93; memAvail:21758; myMemAvail:21883; numCPU:16; typeCPU:2400; 2018-08-08 08:44:19,206 - loadchecker - INFO - LoadChecker.loadchecker(): contacted remote broker agent for backend=inifile, profile_id=WIN-TRAVAUX, hostname=tce-server-09, new load factor is: 898394 2018-08-08 08:44:19,206 - loadchecker - DEBUG - LoadChecker.loadchecker(): sleeping for 6.976744186046512secs before querying next server 2018-08-08 08:44:26,189 - loadchecker - DEBUG - LoadChecker.loadchecker(): performed 44 queries (failures: 0), sleeping for -6.976744186046517secs before starting next query cycle 2018-08-08 08:46:47,377 - loadchecker - DEBUG - LoadCheckServiceHandler.handle_read(): received load check query: backend=inifile, profile_id=TRAVAUX, hostname= 2018-08-08 08:46:47,377 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-11: 763244 2018-08-08 08:46:47,377 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-23: 1742811 2018-08-08 08:46:47,377 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-26: 507295 2018-08-08 08:46:47,378 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-18: 899328 2018-08-08 08:46:47,378 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-19: 932160 2018-08-08 08:46:47,378 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-09: 879966 2018-08-08 08:46:47,378 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-28: 834700 2018-08-08 08:46:47,378 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-17: 721955 2018-08-08 08:46:47,378 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-25: 715089 2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-29: 1195074 2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-20: 1431168 2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-27: 1196586 2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-31: 1972650 2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-13: 735827 2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-24: 780717 2018-08-08 08:46:47,379 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-10: 932487 2018-08-08 08:46:47,380 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-12: 1237871 2018-08-08 08:46:47,380 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-30: 1371001 2018-08-08 08:46:47,380 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-21: 587846 2018-08-08 08:46:47,380 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-14: 1616787 2018-08-08 08:46:47,380 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-15: 741288 2018-08-08 08:46:47,380 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-22: 924074 2018-08-08 08:46:54,245 - loadchecker - DEBUG - LoadCheckServiceHandler.handle_read(): received load check query: backend=inifile, profile_id=TRAVAUX, hostname= 2018-08-08 08:46:54,245 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-11: 763244 2018-08-08 08:46:54,245 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-23: 1742811 2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-26: 507295 2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-18: 899328 2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-19: 932160 2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-09: 879966 2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-28: 834700 2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-17: 721955 2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-25: 715089 2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-29: 1195074 2018-08-08 08:46:54,246 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-20: 1431168 2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-27: 1196586 2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-31: 1972650 2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-13: 735827 2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-24: 780717 2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-10: 932487 2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-12: 1237871 2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-30: 1371001 2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-21: 587846 2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-14: 1616787 2018-08-08 08:46:54,247 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-15: 741288 2018-08-08 08:46:54,248 - loadchecker - INFO - LoadCheckServiceHandler.handle_read(): load check result for backend=inifile, profile_id=TRAVAUX, hostname=tce-server-22: 924074 2018-08-08 08:47:21,453 - loadchecker - DEBUG - LoadCheckServiceHandler.handle_read(): received load check query: backend=inifile, profile_id=TRAVAUX, hostname= --- As you can see, there is a strange line with a '-6.976744186046517secs' value for sleep time which seems to be interpreted as an infinite sleep time by the process and then, load is never recalculated : --- 2018-08-08 08:44:26,189 - loadchecker - DEBUG - LoadChecker.loadchecker(): performed 44 queries (failures: 0), sleeping for -6.976744186046517secs before starting next query cycle --- I don't see any reason why this happens and I was unable to reproduce it manually, it is a rare case but it already happened more than once. --- DISCLAIMER: This e-mail is private and confidential and may contain proprietary or legally privileged information. It is for the intended recipient only. If you have received this email in error, please notify the author by replying to it and then destroy it. If you are not the intended recipient you must not use, disclose, distribute, copy, print or rely on this e-mail or any attachment. Thank you