From unknown Sun Apr 12 06:26:30 2026
X-Loop: owner@bugs.x2go.org
Subject: Bug#1229: [X2Go-Dev] x2go rejects usernames starting with digits, incorrectly
Reply-To: "Norman Gray" <gray@nxg.name>, 1229@bugs.x2go.org
Resent-From: "Norman Gray" <gray@nxg.name>
Resent-To: x2go-dev@lists.x2go.org
Resent-CC: X2Go Developers <x2go-dev@lists.x2go.org>
X-Loop: owner@bugs.x2go.org
Resent-Date: Sun, 29 Oct 2017 12:25:01 +0000
Resent-Message-ID: <handler.1229.B1229.15092797207008@bugs.x2go.org>
Resent-Sender: owner@bugs.x2go.org
X-X2Go-PR-Message: followup 1229
X-X2Go-PR-Package: x2goserver
X-X2Go-PR-Keywords: 
Received: via spool by 1229-submit@bugs.x2go.org id=B1229.15092797207008
          (code B ref 1229); Sun, 29 Oct 2017 12:25:01 +0000
Received: (at 1229) by bugs.x2go.org; 29 Oct 2017 12:22:00 +0000
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on
	ymir.das-netzwerkteam.de
X-Spam-Level: 
X-Spam-Status: No, score=-4.7 required=3.0 tests=BAYES_00,HTML_MESSAGE,
	RCVD_IN_MSPIKE_H2,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.1
Received: from localhost (localhost [127.0.0.1])
	by ymir.das-netzwerkteam.de (Postfix) with ESMTP id A41EC5DAD1
	for <1229@bugs.x2go.org>; Sun, 29 Oct 2017 13:21:51 +0100 (CET)
X-Virus-Scanned: Debian amavisd-new at ymir.das-netzwerkteam.de
Received: from ymir.das-netzwerkteam.de ([127.0.0.1])
	by localhost (ymir.das-netzwerkteam.de [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id d1tX1EwUkmRF for <1229@bugs.x2go.org>;
	Sun, 29 Oct 2017 13:21:43 +0100 (CET)
X-Greylist: delayed 534 seconds by postgrey-1.35 at ymir.das-netzwerkteam.de; Sun, 29 Oct 2017 13:21:43 CET
Received: from smtp103.iad3a.emailsrvr.com (smtp103.iad3a.emailsrvr.com [173.203.187.103])
	by ymir.das-netzwerkteam.de (Postfix) with ESMTPS id 636C75DA81
	for <1229@bugs.x2go.org>; Sun, 29 Oct 2017 13:21:43 +0100 (CET)
Received: from smtp21.relay.iad3a.emailsrvr.com (localhost [127.0.0.1])
	by smtp21.relay.iad3a.emailsrvr.com (SMTP Server) with ESMTP id EC9B6250EE;
	Sun, 29 Oct 2017 08:12:47 -0400 (EDT)
X-Auth-ID: gray@nxg.name
Received: by smtp21.relay.iad3a.emailsrvr.com (Authenticated sender: gray-AT-nxg.name) with ESMTPSA id 52C5E24FAF;
	Sun, 29 Oct 2017 08:12:47 -0400 (EDT)
X-Sender-Id: gray@nxg.name
Received: from [192.168.1.50] (164.70.2.81.in-addr.arpa [81.2.70.164])
	(using TLSv1.2 with cipher AES256-GCM-SHA384)
	by 0.0.0.0:25 (trex/5.7.12);
	Sun, 29 Oct 2017 08:12:47 -0400
From: "Norman Gray" <gray@nxg.name>
To: "Mihai Moldovan" <ionic@ionic.de>
Cc: 1229@bugs.x2go.org
Date: Sun, 29 Oct 2017 12:11:51 +0000
Message-ID: <77C784F6-998D-4DC3-AF1F-E44DB2159CFE@nxg.name>
In-Reply-To: <5ef891d1-4119-59e0-b503-ee301f539ba5@ionic.de>
References: <2862B49A-8FA8-4EF0-AB61-AC0B863EB3ED@nxg.name>
 <5ef891d1-4119-59e0-b503-ee301f539ba5@ionic.de>
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="=_MailMate_38ED0D30-D097-4E3E-BEDE-8C2D9EB7419F_="
Content-Transfer-Encoding: 8bit
X-Mailer: MailMate (1.9.7r5425)


--=_MailMate_38ED0D30-D097-4E3E-BEDE-8C2D9EB7419F_=
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit


Mihai, hello.

On 28 Oct 2017, at 9:09, Mihai Moldovan wrote:

> On 10/27/2017 06:51 PM, Norman Gray wrote:
>> At present, x2goserver sanitises usernames with a regexp in 
>> x2goutils.pm
>> and in x2gosqlitewrapper.pl (same in both places). [...]
>
> Just to make it clear - we're not really "validating user names". I 
> couldn't
> care less about the user name as such - it's the system's 
> responsibility to deal
> with user names and if users managed to login, we can assume the user 
> name is
> valid, like you've already written on the -dev mailing list and here.
[...]
> What we really do in this part of code is validating a session ID, 
> which happens
> to contain a user name. Sadly, as such, what we see as a user name 
> must be
> correctly represented in order to check the session ID.
>
> Generally, a session ID should look like that:
>   <username>-<DISPLAY number>-<UNIX timestamp denoting session 
> creation
> time>_st<string representation of session type>_dp<DISPLAY bit depth>

Righto -- that makes perfect sense: there did look like there was more 
going on there than mere validation.

Parenthetically (because it would imply changes well beyond the scope of 
this bug report), something like u<uid>-... would be easy to assemble at 
this point, and be totally bomb-proof (but obviously doesn't help if 
it's the user _name_ you need later on).

But returning to your points...

>> Note that the test may in fact be redundant, since if this code is 
>> being
>> run, then the corresponding user has already logged on to the system, 
>> so
>> that the username has already been verified as valid and existing.
>
> In theory it's redundant. But there is a possibility that we are 
> reading garbage
> data, where ever that might come from. Any bug (including our scripts 
> messing up
> splitting up something, or inserting something invalid into the 
> database and
> reading it again later) could trigger such a situation, so IMHO 
> validation of
> input strings is really not redundant.

Sanity checks are good!

>>    * POSIX/Single Unix says of the username simply "To be portable
>> across systems conforming to POSIX.1-2008, the value is composed of
>> characters from the portable filename character set. The 
>> <hyphen-minus>
>> character should not be used as the first character of a portable 
>> user
>> name." (see <http://pubs.opengroup.org/onlinepubs/9699919799/>,
>> paragraph 3.437)
>
> So, hyphen is prohibited as the first character. Also, SUS recommends 
> (but does
> not enforce) using the portable filename character set[0] only for 
> portability,
> which is restricted to [A-Za-z0-9._-]. Specifically, this does not 
> include any
> special characters like umlauts, accented characters or generally any 
> other
> Unicode character.
>
>>    * The Debian useradd(8) page recommends something matching
>> /^[a-z_][a-z0-9_-]*$/, but goes on to say "On Debian, the only
>> constraints are that usernames must neither start with a dash ('-') 
>> nor
>> contain a colon (':') or a whitespace (space: ' ', end of line: '\n',
>> tabulation: '\t', etc.). Note that using a slash ('/') may break the
>> default algorithm for the definition of the user's home directory." 
>> (see
>> eg <https://www.unix.com/man-page/linux/8/useradd/>)
>
> This is a bit stricter than the SUS definition (ignoring the 
> portability
> recommendation). If taken at face value, Debian allows any Unicode 
> character but
> the restricted ones. Interestingly, the recommended matching regexp 
> doesn't
> include uppercase characters and, arguably more interestingly, doesn't 
> allow a
> user name to start with a digit (which would be problematic for you).
>
>
>>    * The corresponding RedHat/CentOS manpage doesn't even include 
>> that,
>> and instead says only "Usernames may only be up to 32 characters 
>> long."
>> FreeBSD is similarly laid-back about the username.
>
> This is interesting as well, since it's the first document that 
> mentions a
> maximum length. If interpreted directly, the previous documents did 
> not restrict
> the length (unless you haven't pasted some information relating to the 
> string
> length).


I wouldn't be at all astonished to see unicode usernames before long.  
It's the sort of thing Apple or RedHat would do, and which it appears 
Debian might already do in principle (if not much in practice); and 
since IRIs, for example, can now at least indirectly support all of 
Unicode in the DNS, the idea of a non-ASCII üsernamé is less 
outlandish than it might once have been.  It would go beyond SUS, and so 
would be a big deal, but it might be worth having x2go aim for a 
solution which is robust against that, and so which would solve the 
issue once and for all.

Hmm: one possibility would be to put the uid in the session string 
(though I appreciate, as above, that may not work for x2go for other 
reasons).

Another would be to run the username through a punycode converter 
<https://en.wikipedia.org/wiki/Punycode> as with IRIs: any characters in 
[a-zA-Z0-9-] would come through that unchanged, but others would be 
normalised.  This would be an invisible change for most usernames.  As a 
normalisation, it also has the advantage that it's reversible if need 
be.  I can't remember -- and that Wikipedia page doesn't refresh my 
memory -- exactly what subset of characters comes through a punycode 
conversion unchanged, so this would require a little further thought.

But again, these go beyond the immediate scope of this present issue.

> The @ character indeed has been added to allow email-address-like user 
> names as
> part of https://bugs.x2go.org/cgi-bin/bugreport.cgi?bug=573
>
>
> Allowing $ as trailing characters has been part from the start, though 
> I
> honestly don't understand why.

I share your puzzlement with that one.  Is it possible this was the 
end-of-string pattern in the regexp, which got in to the allowable 
trailing characters by mistake?

> In theory, /^[A-Za-z0-9._][A-Za-z0-9._-@]*/ should be more liberal, 
> not restrict
> the length, allow portable user names and expand on that by allowing
> domain-based user names as well. I'd drop the trailing $.

That looks very plausible to me.

Best wishes,

Norman


-- 
Norman Gray  :  https://nxg.me.uk
--=_MailMate_38ED0D30-D097-4E3E-BEDE-8C2D9EB7419F_=
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv=3D"Content-Type" content=3D"text/xhtml; charset=3Dutf-8"=
>
<style>
div.markdown { white-space: normal; }
div.plaintext { white-space: normal; }
body { font-family: sans-serif; }
h1 { font-size: 1.4em; }
h2 { font-size: 1.2em; }
h3 { font-size: 1.1em; }
blockquote { margin: 0 0 5px; padding-left: 5px; border-left: 2px solid #=
777777; color: #777777; }
blockquote blockquote { border-left-color: #999999; color: #999999; }
blockquote blockquote blockquote { border-left-color: #BBBBBB; color: #BB=
BBBB; }
a { color: #3983C4 }
blockquote a { color: #777777; }
blockquote blockquote a { color: #999999; }
blockquote blockquote blockquote a { color: #BBBBBB; }
math[display=3D"inline"] > mrow { padding:5px; }
div.footnotes li p { margin: 0.2em 0; }
</style>
</head>
<body>
<div class=3D"markdown">
<p dir=3D"auto">Mihai, hello.</p>

<p dir=3D"auto">On 28 Oct 2017, at 9:09, Mihai Moldovan wrote:</p>

</div>
<div class=3D"plaintext"><blockquote><p dir=3D"auto">On 10/27/2017 06:51 =
PM, Norman Gray wrote:</p>
<blockquote><p dir=3D"auto">At present, x2goserver sanitises usernames wi=
th a regexp in x2goutils.pm<br>
and in x2gosqlitewrapper.pl (same in both places). [...]</p>
</blockquote><p dir=3D"auto">Just to make it clear - we&#39;re not really=
 &quot;validating user names&quot;. I couldn&#39;t<br>
care less about the user name as such - it&#39;s the system&#39;s respons=
ibility to deal<br>
with user names and if users managed to login, we can assume the user nam=
e is<br>
valid, like you&#39;ve already written on the -dev mailing list and here.=
</p>
</blockquote></div>
<div class=3D"markdown">

<p dir=3D"auto">[...]</p>

</div>
<div class=3D"plaintext"><blockquote><p dir=3D"auto">What we really do in=
 this part of code is validating a session ID, which happens<br>
to contain a user name. Sadly, as such, what we see as a user name must b=
e<br>
correctly represented in order to check the session ID.<br>
<br>
Generally, a session ID should look like that:<br>
  &lt;username&gt;-&lt;DISPLAY number&gt;-&lt;UNIX timestamp denoting ses=
sion creation<br>
time&gt;_st&lt;string representation of session type&gt;_dp&lt;DISPLAY bi=
t depth&gt;</p>
</blockquote></div>
<div class=3D"markdown">

<p dir=3D"auto">Righto -- that makes perfect sense: there did look like t=
here was more going on there than mere validation.</p>

<p dir=3D"auto">Parenthetically (because it would imply changes well beyo=
nd the scope of this bug report), something like u&lt;uid&gt;-... would b=
e easy to assemble at this point, and be totally bomb-proof (but obviousl=
y doesn't help if it's the user <em>name</em> you need later on).</p>

<p dir=3D"auto">But returning to your points...</p>

</div>
<div class=3D"plaintext"><blockquote><blockquote><p dir=3D"auto">Note tha=
t the test may in fact be redundant, since if this code is being<br>
run, then the corresponding user has already logged on to the system, so<=
br>
that the username has already been verified as valid and existing.</p>
</blockquote><p dir=3D"auto">In theory it&#39;s redundant. But there is a=
 possibility that we are reading garbage<br>
data, where ever that might come from. Any bug (including our scripts mes=
sing up<br>
splitting up something, or inserting something invalid into the database =
and<br>
reading it again later) could trigger such a situation, so IMHO validatio=
n of<br>
input strings is really not redundant.</p>
</blockquote></div>
<div class=3D"markdown">

<p dir=3D"auto">Sanity checks are good!</p>

</div>
<div class=3D"plaintext"><blockquote><blockquote><p dir=3D"auto">   * POS=
IX/Single Unix says of the username simply &quot;To be portable<br>
across systems conforming to POSIX.1-2008, the value is composed of<br>
characters from the portable filename character set. The &lt;hyphen-minus=
&gt;<br>
character should not be used as the first character of a portable user<br=
>
name.&quot; (see &lt;<a href=3D"http://pubs.opengroup.org/onlinepubs/9699=
919799/">http://pubs.opengroup.org/onlinepubs/9699919799/</a>&gt;,<br>
paragraph 3.437)</p>
</blockquote><p dir=3D"auto">So, hyphen is prohibited as the first charac=
ter. Also, SUS recommends (but does<br>
not enforce) using the portable filename character set[0] only for portab=
ility,<br>
which is restricted to [A-Za-z0-9._-]. Specifically, this does not includ=
e any<br>
special characters like umlauts, accented characters or generally any oth=
er<br>
Unicode character.<br>
</p>
<blockquote><p dir=3D"auto">   * The Debian useradd(8) page recommends so=
mething matching<br>
/^[a-z_][a-z0-9_-]*$/, but goes on to say &quot;On Debian, the only<br>
constraints are that usernames must neither start with a dash (&#39;-&#39=
;) nor<br>
contain a colon (&#39;:&#39;) or a whitespace (space: &#39; &#39;, end of=
 line: &#39;\n&#39;,<br>
tabulation: &#39;\t&#39;, etc.). Note that using a slash (&#39;/&#39;) ma=
y break the<br>
default algorithm for the definition of the user&#39;s home directory.&qu=
ot; (see<br>
eg &lt;<a href=3D"https://www.unix.com/man-page/linux/8/useradd/">https:/=
/www.unix.com/man-page/linux/8/useradd/</a>&gt;)</p>
</blockquote><p dir=3D"auto">This is a bit stricter than the SUS definiti=
on (ignoring the portability<br>
recommendation). If taken at face value, Debian allows any Unicode charac=
ter but<br>
the restricted ones. Interestingly, the recommended matching regexp doesn=
&#39;t<br>
include uppercase characters and, arguably more interestingly, doesn&#39;=
t allow a<br>
user name to start with a digit (which would be problematic for you).<br>=

<br>
</p>
<blockquote><p dir=3D"auto">   * The corresponding RedHat/CentOS manpage =
doesn&#39;t even include that,<br>
and instead says only &quot;Usernames may only be up to 32 characters lon=
g.&quot;<br>
FreeBSD is similarly laid-back about the username.</p>
</blockquote><p dir=3D"auto">This is interesting as well, since it&#39;s =
the first document that mentions a<br>
maximum length. If interpreted directly, the previous documents did not r=
estrict<br>
the length (unless you haven&#39;t pasted some information relating to th=
e string<br>
length).</p>
</blockquote></div>
<div class=3D"markdown">

<p dir=3D"auto">I wouldn't be at all astonished to see unicode usernames =
before long.  It's the sort of thing Apple or RedHat would do, and which =
it appears Debian might already do in principle (if not much in practice)=
; and since IRIs, for example, can now at least indirectly support all of=
 Unicode in the DNS, the idea of a non-ASCII =C3=BCsernam=C3=A9 is less o=
utlandish than it might once have been.  It would go beyond SUS, and so w=
ould be a big deal, but it might be worth having x2go aim for a solution =
which is robust against that, and so which would solve the issue once and=
 for all.</p>

<p dir=3D"auto">Hmm: one possibility would be to put the uid in the sessi=
on string (though I appreciate, as above, that may not work for x2go for =
other reasons).  </p>

<p dir=3D"auto">Another would be to run the username through a punycode c=
onverter <a href=3D"https://en.wikipedia.org/wiki/Punycode">https://en.wi=
kipedia.org/wiki/Punycode</a> as with IRIs: any characters in [a-zA-Z0-9-=
] would come through that unchanged, but others would be normalised.  Thi=
s would be an invisible change for most usernames.  As a normalisation, i=
t also has the advantage that it's reversible if need be.  I can't rememb=
er -- and that Wikipedia page doesn't refresh my memory -- exactly what s=
ubset of characters comes through a punycode conversion unchanged, so thi=
s would require a little further thought.</p>

<p dir=3D"auto">But again, these go beyond the immediate scope of this pr=
esent issue.</p>

</div>
<div class=3D"plaintext"><blockquote><p dir=3D"auto">The @ character inde=
ed has been added to allow email-address-like user names as<br>
part of <a href=3D"https://bugs.x2go.org/cgi-bin/bugreport.cgi?bug=3D573"=
>https://bugs.x2go.org/cgi-bin/bugreport.cgi?bug=3D573</a><br>
<br>
<br>
Allowing $ as trailing characters has been part from the start, though I<=
br>
honestly don&#39;t understand why.</p>
</blockquote></div>
<div class=3D"markdown">

<p dir=3D"auto">I share your puzzlement with that one.  Is it possible th=
is was the end-of-string pattern in the regexp, which got in to the allow=
able trailing characters by mistake?</p>

</div>
<div class=3D"plaintext"><blockquote><p dir=3D"auto">In theory, /^[A-Za-z=
0-9._][A-Za-z0-9._-@]*/ should be more liberal, not restrict<br>
the length, allow portable user names and expand on that by allowing<br>
domain-based user names as well. I&#39;d drop the trailing $.</p>
</blockquote></div>
<div class=3D"markdown">

<p dir=3D"auto">That looks very plausible to me.</p>

<p dir=3D"auto">Best wishes,</p>

<p dir=3D"auto">Norman</p>

<p dir=3D"auto">-- <br>
Norman Gray  :  <a href=3D"https://nxg.me.uk">https://nxg.me.uk</a></p>
</div>

</body>
</html>

--=_MailMate_38ED0D30-D097-4E3E-BEDE-8C2D9EB7419F_=--
