Saying NO to messy user agents

I’ll ask you what should be a simple question and let’s see if you know the answer. What is the user agent of the browser you’re currently using? Well… granted, it’s not something people try to remember, but it doesn’t help that almost all user agents start with “Mozilla/x.0 (…”.

Here are my current user agents:
Firefox: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
IE: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
Sure, you can tell them apart, but what’s with all the junk? Why do they both start with Mozilla, what’s the junk at the end of IE’s UA, and why does Firefox have Windows in the string twice? In case you’re interested, the reason “Mozilla” exists in just about every UA string around is mentioned here

Why not start afresh and do it properly this time around? What works well? What should the new UA look like? Well… think about it for a minute: the UA string usually holds information such as browser name, version, OS name and OS version. In essence it’s attribute, value, attribute, value. Why not make it like an XML tag? You could immediately scrap the < and > as they’d be pointless, but otherwise taking this approach should work.

Unlike XML, the attributes should be fixed. Not having them fixed means we’ll end up with the same mess we have at the moment; on the other hand, it means we have to put great thought into what the attributes are. Firstly, the most important one would be UAtype, which could equal browser, spider, download manager or other. The last option would include such things as PHP, applets loading a page and people’s own programs which download a file (unless they are download managers, of course). It could also be used to inform the server that this could be a braille browser or a screen reader, though maybe they should come under another optional attribute.

The other attributes could be Organisation, which would be the organisation which made the agent; UAname, which would be the name of the software; UAversion, which would be 1.5 in the case of Firefox and 6.0 in the case of IE; OSname; and OSversion. This would make my Firefox UA string:
UAtype="browser" Organisation="Mozilla" UAname="Firefox" UAversion="1.5" OSname="Windows" OSversion="XP Pro SP2"
This is much easier to read and understand. Obviously each attribute should be optional but you shouldn’t be able to add your own. Well… you could add your own but don’t expect it to make a difference.
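
To show how little work a server-side script would need to do with a string like this, here is a minimal Python sketch. It assumes the values are wrapped in straight double quotes and the pairs are separated by whitespace; the function name parse_structured_ua and the KNOWN_ATTRIBUTES set are made up for illustration, and the attribute list is simply the six suggested above.

```python
import re

# The fixed attribute set proposed in this post. Anything else is ignored,
# which matches "you could add your own but don't expect it to make a difference".
KNOWN_ATTRIBUTES = {
    "UAtype", "Organisation", "UAname", "UAversion", "OSname", "OSversion",
}

# Matches name="value" pairs (assumption: straight double quotes, no escaping).
_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_structured_ua(ua_string):
    """Return a dict holding only the recognised attributes in ua_string."""
    return {
        name: value
        for name, value in _PAIR.findall(ua_string)
        if name in KNOWN_ATTRIBUTES
    }


example = (
    'UAtype="browser" Organisation="Mozilla" UAname="Firefox" '
    'UAversion="1.5" OSname="Windows" OSversion="XP Pro SP2"'
)
parsed = parse_structured_ua(example)
print(parsed["UAname"], parsed["UAversion"])
```

Because every attribute is optional, a caller would look values up with parsed.get("OSname") rather than assume a key is present.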

Why do I want all this done? Imagine the difficulties that software developers have in creating the software that logs who visits your sites. With this it would suddenly be so very easy. If nothing else this would promote competition in that area, which is always a good thing. It would also help the web developers who write JavaScript which is browser dependent and have to parse the UA string for various bits and bobs.

The change would not have to happen overnight; it could be a gradual process of acceptance (or rejection, depending on how you look at it). New browsers could send their regular user agent and a second header called XMLUA which follows the pattern described here. Obviously the existing browser detection functions in PHP or JavaScript would know nothing about this new XMLUA header until support is built into the languages. One possibility is to wait for that support to be created; alternatively, web servers could check whether an XMLUA header exists and, if so, replace the regular user agent with the new one. That would in fact break the existing browser detection functions in every language, so waiting for the languages to upgrade would be the ideal solution.
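
During the transition a script could look at both headers side by side, which avoids overwriting the regular user agent. The following is only a sketch: XMLUA is the hypothetical header name from this post, pick_user_agent is an invented helper, and headers is assumed to be a plain dict of request header names to values.

```python
def pick_user_agent(headers):
    """Return (source_header, value), preferring XMLUA when it is present."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    normalised = {name.lower(): value for name, value in headers.items()}
    if "xmlua" in normalised:
        return ("XMLUA", normalised["xmlua"])
    # Fall back to the regular string so old clients keep working.
    return ("User-Agent", normalised.get("user-agent", ""))


new_browser = {
    "User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5",
    "XMLUA": 'UAtype="browser" UAname="Firefox" UAversion="1.5"',
}
old_browser = {
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
}

print(pick_user_agent(new_browser)[0])
print(pick_user_agent(old_browser)[0])
```

Since the regular header is left untouched, existing detection code keeps seeing exactly what it sees today.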

Eventually the regular UA string could be dropped.

Maybe you feel I missed out a crucial attribute, maybe you feel the current system works fine (why fix what ain’t broke?). Let me know.

  • ido

    I’m thinking of Microsoft here, which has a history of “breaking the rules”. I can’t see how a new standard will be implemented without something that makes companies such as Microsoft use the standard “AS-IS”, without adding to it or modifying it.

    IMHO you should start by getting Microsoft and other companies to adopt standards first; then you can go and do something like changing the ID of a user agent.

  • Ravi Char

    Nice idea Sid.

  • Jamie Anderson

    The biggest problem here is backwards compatibility.

    The “Mozilla/” part of the user-agent is quite historical. Back in the dark ages of the net, some web pages would break quite severely in different browsers. One browser would work fine, others would show a mess. To fix this, the web server would serve up different versions of the page to different browsers.

    You then get the problem of what happens when a new browser comes out that can render the page correctly. The web server probably doesn’t know about the browser, and will dish out the “compatibility” version of the page. So, to get the “new” version of the page, it has to pretend to be a known “non-broken” browser.

    Of course, web sites expect this now, so it’s hard to get out of the practice.

    As for the name-value pairs, you haven’t quite got it right. Looking at the IE user agent, you have:

    “Mozilla/4.0” = I’m pretending to be Mozilla
    “(compatible;” = I’m just a compatible clone
    “MSIE 6.0;” = I’m actually IE 6.0
    “Windows NT 5.1;” = I’m running on Win XP
    “SV1;” = “Security Version 1” (XP SP2)
    “.NET CLR 1.1.4322)” = The .NET 1.1 framework is available

  • John Lascurettes


    If there are still sites created in the “dark ages” of the Web that still rely on the “Mozilla/” string, then they aren’t even worth visiting. Get real. It’s time for sites that do that sort of thing to be left by the wayside. And it’s not like one cannot simply spoof the UA string with a modern browser (including IE7) if need be to fool such old stupid sites.

  • Juha-Matti

    An interesting comment.
    “Last modified” date is August 13, 2005, in turn.

  • Sid

    Thanks for that link… certainly a good read.

  • Joe

    What a User-Agent string should look like is already spelled out in RFC 1945 and RFC 2068.
    It should look like one or more of: a product token (no white space) followed by a slash (“/”) followed by the product-version token optionally followed by a comment in parentheses (“(“,”)”). Tokens consist of alpha-numerics and some other printable characters (not “tspecials”), no white space or controls.
    So for example:
    Mozilla/4.0 (this is a comment by the Mozilla product)

    BarneysBestBrowser/SecondVer (my favorite color is green with yellow stripes) WebRenderLibrary/6.2-3 ffCrypto/81 (no comment for web render but this comment for ffcrypto (comments can contain nested comments) but are otherwise not allowed to contain ( and ) independently)

    Products are supposed to be ordered in the significance of identifying the application.
    So for example if Microsoft could read the RFCs:
    MSIE/6.0 (.NET CLR 1; what ever comments that they feel are useful) Windows/XPsp2

    So, I would contend that, the User-Agent string if used as intended is not really that messy.

    Now yes, I realize that the vendors (Microsoft) have done their own thing and that some servers act on the User-Agent string, such as historically only sending “advanced content (e.g. frames)” to Mozilla/4.0 browsers. But the servers should not be looking at anything in the comments, and any format or meaning of the comment is relative to the product/version token pair before it. And the servers should be spelling out that they are sending different content with the Vary response header, including User-Agent if that is one of the things that they vary over (e.g. “Vary: User-Agent, Accept-Language”).

    So unfortunately, while your article recognizes problems with the User-Agent string, the vendors can not even get the current specifications correct – they would probably find some way to “bollux-up” your proposal as well.