On 10/27/2017 06:51 PM, Norman Gray wrote:

At present, x2goserver sanitises usernames with a regexp in x2goutils.pm
and in x2gosqlitewrapper.pl (same in both places). [...]

Just to make it clear - we're not really "validating user names". I couldn't
care less about the user name as such - it's the system's responsibility to deal
with user names, and if users managed to log in, we can assume the user name is
valid, like you've already written on the -dev mailing list and here.

What this part of the code really does is validate a session ID, which happens
to contain a user name. Sadly, that means whatever we see as a user name must be
represented correctly in order to check the session ID.

Generally, a session ID should look like this:
<username>-<DISPLAY number>-<UNIX timestamp denoting session creation time>_st<string representation of session type>_dp<DISPLAY bit depth>
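
As an illustration of the layout just described, here is a minimal sketch of how such a session ID could be split into its fields. It is written in Python rather than the Perl used by x2goserver, and it is not the project's actual code. The names `SESSION_ID_RE` and `parse_session_id` are hypothetical, and the pattern assumes that the session type contains no underscore, which the thread does not state.

```python
import re

# Hypothetical pattern for:
#   <username>-<DISPLAY>-<timestamp>_st<session type>_dp<bit depth>
# Assumption: the session type is a run of characters without underscores.
SESSION_ID_RE = re.compile(
    r"(?P<user>.+)"           # greedy, so hyphens inside the name stay in it
    r"-(?P<display>[0-9]+)"   # DISPLAY number
    r"-(?P<created>[0-9]+)"   # UNIX timestamp of session creation
    r"_st(?P<stype>[^_]+)"    # session type string
    r"_dp(?P<depth>[0-9]+)"   # DISPLAY bit depth
)


def parse_session_id(session_id):
    """Return the fields as a dict of strings, or None if the layout is wrong."""
    match = SESSION_ID_RE.fullmatch(session_id)
    return match.groupdict() if match else None
```

Because the numeric and suffix fields are anchored at the right-hand end of the string, the user name part can contain hyphens, digits or non-ASCII characters without confusing the split. Whether that is acceptable for the later uses of the name is a separate question.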

Righto -- that makes perfect sense: it did look as though there was more going on there than mere validation.

Parenthetically (because it would imply changes well beyond the scope of this bug report), something like u<uid>-... would be easy to assemble at this point, and be totally bomb-proof (but obviously doesn't help if it's the user name you need later on).

But returning to your points...

Note that the test may in fact be redundant, since if this code is being
run, then the corresponding user has already logged on to the system, so
that the username has already been verified as valid and existing.

In theory it's redundant. But there is a possibility that we are reading garbage
data, wherever that might come from. Any bug (including our scripts messing up
splitting up something, or inserting something invalid into the database and
reading it again later) could trigger such a situation, so IMHO validation of
input strings is really not redundant.

* POSIX/Single Unix says of the username simply "To be portable
across systems conforming to POSIX.1-2008, the value is composed of
characters from the portable filename character set. The <hyphen-minus>
character should not be used as the first character of a portable user
name." (see <http://pubs.opengroup.org/onlinepubs/9699919799/>,
paragraph 3.437)

So, hyphen is prohibited as the first character. Also, SUS recommends (but does
not enforce) using the portable filename character set[0] only for portability,
which is restricted to [A-Za-z0-9._-]. Specifically, this does not include any
special characters like umlauts, accented characters or generally any other
Unicode character.
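
A minimal sketch of the SUS portability rule as described above: only characters from the portable filename character set, and no hyphen in first position. This is an illustration in Python, not the check x2goserver performs, and the names `PORTABLE_NAME_RE` and `is_portable_username` are hypothetical.

```python
import re

# Portable filename character set: A-Z a-z 0-9 . _ -
# The first character may be any of those except the hyphen.
PORTABLE_NAME_RE = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")


def is_portable_username(name):
    """True if the name follows the SUS portability recommendation."""
    # fullmatch avoids the "$ also matches before a trailing newline" pitfall.
    return PORTABLE_NAME_RE.fullmatch(name) is not None
```

Note that a purely numeric name passes this check, while any name containing an umlaut or accented character does not.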

* The Debian useradd(8) page recommends something matching
/^[a-z_][a-z0-9_-]*$/, but goes on to say "On Debian, the only
constraints are that usernames must neither start with a dash ('-') nor
contain a colon (':') or a whitespace (space: ' ', end of line: '\n',
tabulation: '\t', etc.). Note that using a slash ('/') may break the
default algorithm for the definition of the user's home directory." (see
eg <https://www.unix.com/man-page/linux/8/useradd/>)

This is a bit stricter than the SUS definition (ignoring the portability
recommendation). If taken at face value, Debian allows any Unicode character but
the restricted ones. Interestingly, the recommended matching regexp doesn't
include uppercase characters and, arguably more interestingly, doesn't allow a
user name to start with a digit (which would be problematic for you).
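
To make the difference between the two Debian rules concrete, here is a small Python sketch that implements the recommended pattern and the looser stated constraints side by side. The function names are hypothetical, and rejecting the empty string in the second check is an assumption that the man page text quoted above does not spell out.

```python
import re

# The pattern recommended in the Debian useradd(8) page, without the anchors
# because fullmatch is used below.
DEBIAN_RECOMMENDED_RE = re.compile(r"[a-z_][a-z0-9_-]*")


def matches_debian_recommendation(name):
    """True if the name matches the recommended pattern."""
    return DEBIAN_RECOMMENDED_RE.fullmatch(name) is not None


def passes_debian_constraints(name):
    """True if the name only satisfies the looser stated constraints:
    no leading dash, no colon, no whitespace.
    Assumption: an empty name is rejected as well."""
    if not name or name.startswith("-"):
        return False
    return not any(ch == ":" or ch.isspace() for ch in name)
```

A name such as `1norman` or `üsernamé` fails the recommended pattern but passes the looser constraints, which is exactly the gap discussed above.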

* The corresponding RedHat/CentOS manpage doesn't even include that,
and instead says only "Usernames may only be up to 32 characters long."
FreeBSD is similarly laid-back about the username.

This is interesting as well, since it's the first document that mentions a
maximum length. Taken literally, the previous documents do not restrict the
length (unless you left out some information relating to the string length).

I wouldn't be at all astonished to see Unicode usernames before long. It's the sort of thing Apple or RedHat would do, and which it appears Debian might already do in principle (if not much in practice); and since IRIs, for example, can now at least indirectly support all of Unicode in the DNS, the idea of a non-ASCII üsernamé is less outlandish than it might once have been. It would go beyond SUS, and so would be a big deal, but it might be worth having x2go aim for a solution which is robust against that, and so which would solve the issue once and for all.

Hmm: one possibility would be to put the uid in the session string (though I appreciate, as above, that may not work for x2go for other reasons).

Another would be to run the username through a Punycode converter (see <https://en.wikipedia.org/wiki/Punycode>), as with IRIs: any characters in [a-zA-Z0-9-] would come through that unchanged, but others would be normalised. This would be an invisible change for most usernames. As a normalisation, it also has the advantage that it's reversible if need be. I can't remember -- and that Wikipedia page doesn't refresh my memory -- exactly what subset of characters comes through a Punycode conversion unchanged, so this would require a little further thought.
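
A rough sketch of the Punycode idea, using the `punycode` codec from Python's standard library. The helper names are hypothetical. Passing pure-ASCII names through untouched and marking encoded names with an `xn--` prefix is borrowed from the convention used for internationalised domain names; it is an assumption made here for illustration, not something x2go does.

```python
def encode_username(name):
    """Map a user name to an ASCII-only token.

    Assumption: ASCII names are returned unchanged, and anything else is
    Punycode-encoded and marked with the "xn--" prefix.
    """
    if name.isascii():
        return name
    return "xn--" + name.encode("punycode").decode("ascii")


def decode_username(token):
    """Reverse encode_username()."""
    if token.startswith("xn--"):
        return token[4:].encode("ascii").decode("punycode")
    return token
```

Two caveats would need the "further thought" mentioned above. The encoded form contains hyphens, so anything that splits a session ID on hyphens has to cope with that. And a genuine ASCII user name that happens to start with `xn--` would be decoded incorrectly by this sketch.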

But again, these go beyond the immediate scope of this present issue.