Mihai, hello.

On 28 Oct 2017, at 9:09, Mihai Moldovan wrote:

On 10/27/2017 06:51 PM, Norman Gray wrote:

At present, x2goserver sanitises usernames with a regexp in x2goutils.pm
and in x2gosqlitewrapper.pl (same in both places). [...]

Just to make it clear - we're not really "validating user names". I couldn't
care less about the user name as such - it's the system's responsibility to deal
with user names and if users managed to login, we can assume the user name is
valid, like you've already written on the -dev mailing list and here.

[...]

What we really do in this part of code is validating a session ID, which happens
to contain a user name. Sadly, as such, what we see as a user name must be
correctly represented in order to check the session ID.

Generally, a session ID should look like that:
<username>-<DISPLAY number>-<UNIX timestamp denoting session creation
time>_st<string representation of session type>_dp<DISPLAY bit depth>
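
As a rough illustration of that layout, here is a minimal Python sketch that splits a session ID into its parts. The names `SessionId` and `parse_session_id` are hypothetical and not taken from x2goserver, the sample session type `DXFCE` is made up for the example, and the sketch assumes that the display number, timestamp and bit depth are plain decimal digits. Because the user name may itself contain hyphens, the pattern lets the user name absorb everything up to the last `-<digits>-<digits>_st` run.

```python
import re
from typing import NamedTuple, Optional


class SessionId(NamedTuple):
    username: str
    display: int
    created: int        # UNIX timestamp of session creation
    session_type: str
    depth: int          # DISPLAY bit depth


# The greedy ".+" for the user name backtracks until the numeric fields fit,
# so hyphens inside the user name stay with the user name.
# "." does not match a newline, so multi-line garbage is rejected.
_SESSION_RE = re.compile(
    r"(?P<username>.+)-(?P<display>[0-9]+)-(?P<created>[0-9]+)"
    r"_st(?P<session_type>.+)_dp(?P<depth>[0-9]+)"
)


def parse_session_id(text: str) -> Optional[SessionId]:
    """Return the parsed fields, or None if the string does not fit the layout."""
    m = _SESSION_RE.fullmatch(text)
    if m is None:
        return None
    return SessionId(
        username=m.group("username"),
        display=int(m.group("display")),
        created=int(m.group("created")),
        session_type=m.group("session_type"),
        depth=int(m.group("depth")),
    )
```

Note that this deliberately puts no constraint on the user name itself; it only checks that the rest of the string has the expected shape.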

Righto -- that makes perfect sense: it did look as though there was more going on there than mere validation.

Parenthetically (because it would imply changes well beyond the scope of this bug report), something like u<uid>-... would be easy to assemble at this point, and be totally bomb-proof (but obviously doesn't help if it's the user name you need later on).

But returning to your points...

Note that the test may in fact be redundant, since if this code is being
run, then the corresponding user has already logged on to the system, so
that the username has already been verified as valid and existing.

In theory it's redundant. But there is a possibility that we are reading garbage
data, wherever that might come from. Any bug (including our scripts messing up
splitting up something, or inserting something invalid into the database and
reading it again later) could trigger such a situation, so IMHO validation of
input strings is really not redundant.

Sanity checks are good!

* POSIX/Single Unix says of the username simply "To be portable
across systems conforming to POSIX.1-2008, the value is composed of
characters from the portable filename character set. The <hyphen-minus>
character should not be used as the first character of a portable user name." (see <http://pubs.opengroup.org/onlinepubs/9699919799/>,
paragraph 3.437)

So, hyphen is prohibited as the first character. Also, SUS recommends (but does
not enforce) using the portable filename character set[0] only for portability,
which is restricted to [A-Za-z0-9._-]. Specifically, this does not include any
special characters like umlauts, accented characters or generally any other
Unicode character.
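
To make the SUS rule concrete, a minimal Python sketch of a portable-user-name check follows. The function name `is_portable_username` is hypothetical, and rejecting the empty string is an assumption, since the quoted text does not address it.

```python
import re

# Portable filename character set as quoted above: [A-Za-z0-9._-].
# The first character may be anything from that set except the hyphen.
_PORTABLE_RE = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")


def is_portable_username(name: str) -> bool:
    """True if the whole name fits the portable user name recommendation."""
    return _PORTABLE_RE.fullmatch(name) is not None
```

Under this check a name that starts with a digit is fine, while a leading hyphen, an `@` or any non-ASCII character is not.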

* The Debian useradd(8) page recommends something matching
/^[a-z_][a-z0-9_-]*$/, but goes on to say "On Debian, the only
constraints are that usernames must neither start with a dash ('-') nor
contain a colon (':') or a whitespace (space: ' ', end of line: '\n',
tabulation: '\t', etc.). Note that using a slash ('/') may break the
default algorithm for the definition of the user's home directory." (see
eg <https://www.unix.com/man-page/linux/8/useradd/>)

This is a bit stricter than the SUS definition (ignoring the portability
recommendation). If taken at face value, Debian allows any Unicode character but
the restricted ones. Interestingly, the recommended matching regexp doesn't
include uppercase characters and, arguably more interestingly, doesn't allow a
user name to start with a digit (which would be problematic for you).
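
The gap between the recommended pattern and the stated Debian constraints can be sketched in Python as below. Both function names are hypothetical, and treating the empty string as invalid is an assumption that the quoted man page text does not spell out.

```python
import re

# The pattern recommended in the quoted useradd(8) text.
_RECOMMENDED_RE = re.compile(r"[a-z_][a-z0-9_-]*")


def matches_recommendation(name: str) -> bool:
    """True if the whole name matches the recommended pattern."""
    return _RECOMMENDED_RE.fullmatch(name) is not None


def meets_debian_constraints(name: str) -> bool:
    """True if the name satisfies only the constraints stated for Debian:
    no leading dash, no colon, no whitespace."""
    if not name or name.startswith("-"):
        return False
    return ":" not in name and not any(ch.isspace() for ch in name)
```

A name such as `9norman` or `Norman` fails the recommendation but meets the stated constraints, as does a non-ASCII name.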

* The corresponding RedHat/CentOS manpage doesn't even include that,
and instead says only "Usernames may only be up to 32 characters long."
FreeBSD is similarly laid-back about the username.

This is interesting as well, since it's the first document that mentions a
maximum length. If interpreted directly, the previous documents did not restrict
the length (unless you haven't pasted some information relating to the string
length).

I wouldn't be at all astonished to see unicode usernames before long. It's the sort of thing Apple or RedHat would do, and which it appears Debian might already do in principle (if not much in practice); and since IRIs, for example, can now at least indirectly support all of Unicode in the DNS, the idea of a non-ASCII üsernamé is less outlandish than it might once have been. It would go beyond SUS, and so would be a big deal, but it might be worth having x2go aim for a solution which is robust against that, and so which would solve the issue once and for all.

Hmm: one possibility would be to put the uid in the session string (though I appreciate, as above, that may not work for x2go for other reasons).

Another would be to run the username through a punycode converter <https://en.wikipedia.org/wiki/Punycode> as with IRIs: any characters in [a-zA-Z0-9-] would come through that unchanged, but others would be normalised. This would be an invisible change for most usernames. As a normalisation, it also has the advantage that it's reversible if need be. I can't remember -- and that Wikipedia page doesn't refresh my memory -- exactly what subset of characters comes through a punycode conversion unchanged, so this would require a little further thought.
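
A minimal Python sketch of the round trip, using the `punycode` codec from Python's standard library, might look like this. The helper names `to_ascii` and `from_ascii` are hypothetical. One caveat on "unchanged": raw Punycode (RFC 3492) copies the ASCII characters through literally but then appends a `-` delimiter, so an all-ASCII name picks up a trailing hyphen. It is the IDNA layer on top that leaves all-ASCII labels alone, so a real implementation would probably want the same "only encode if non-ASCII" step.

```python
def to_ascii(name: str) -> bytes:
    """Encode a user name with raw Punycode; the result is ASCII-only."""
    return name.encode("punycode")


def from_ascii(data: bytes) -> str:
    """Reverse to_ascii()."""
    return data.decode("punycode")


if __name__ == "__main__":
    for name in ("norman", "üsernamé"):
        encoded = to_ascii(name)
        # Show that the encoding is reversible.
        print(name, encoded, from_ascii(encoded) == name)
```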

But again, these go beyond the immediate scope of this present issue.

The @ character indeed has been added to allow email-address-like user names as
part of https://bugs.x2go.org/cgi-bin/bugreport.cgi?bug=573

Allowing $ as trailing characters has been part from the start, though I
honestly don't understand why.

I share your puzzlement with that one. Is it possible this was the end-of-string pattern in the regexp, which got in to the allowable trailing characters by mistake?

In theory, /^[A-Za-z0-9._][A-Za-z0-9._-@]*/ should be more liberal, not restrict
the length, allow portable user names and expand on that by allowing
domain-based user names as well. I'd drop the trailing $.

That looks very plausible to me.
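
One detail worth checking before the pattern is used exactly as typed: inside the second character class, `_-@` reads as a range from `_` to `@`, and since `_` sorts after `@` in ASCII the range is reversed. Python's `re` refuses to compile it, and Perl is expected to reject it too (an "Invalid [] range" error), though that should be confirmed against the actual scripts. The Python sketch below demonstrates the problem and a variant with the hyphen moved to the end of the class, where it is literal. The names `compiles`, `USERNAME_RE` and `is_acceptable_username` are hypothetical, and matching the whole string rather than just a prefix is an assumption about the intent.

```python
import re

# The pattern as typed in the message above.
AS_TYPED = r"^[A-Za-z0-9._][A-Za-z0-9._-@]*"


def compiles(pattern: str) -> bool:
    """True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Same character set, hyphen last so that it cannot form a range.
USERNAME_RE = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._@-]*")


def is_acceptable_username(name: str) -> bool:
    """True if the whole name fits the proposed, more liberal pattern."""
    return USERNAME_RE.fullmatch(name) is not None
```

With this variant, `norman@example.org` and `9norman` are accepted, while `-norman` and a name with a trailing `$` are not.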

Best wishes,

Norman

--
Norman Gray : https://nxg.me.uk
