Web Analytics Wednesday
Web Analytics Wednesday - August 2009
Last night's Web Analytics Wednesday event went down well - at least sixty-five people attended (I'm sure some got past without being checked off the list).

There are some pics here
Twitter is showing some nice activity via #wawlondon and the Web Analytics Wednesday London LinkedIn group is doing nicely.
Thank you to all who attended - hope to see you again in October - registrations are now open : http://www.sclanalytics.com/waw
Comments via here, email, twitter, linkedin or flickr.
Sets
Online set or list operations tool
This is an online, browser-based tool that implements set operations such as union, intersection and difference on lists of things. It is implemented in JavaScript.
Scratch an itch. Right?
I wanted an easy way to take a list of email addresses, de-duplicate them and remove any other names that appear on another list.
This could have been done in several tools. I used a combination of Excel and the GNU utility comm (although it was probably possible to do it all in the spreadsheet), but I wanted a single-purpose tool that I could pass to other people, that was simple and quick enough, and that I could make changes to.
It could probably be used for other things, but I can't think of any right now.
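As a rough illustration of the email job described above, here is a minimal Python sketch of the same set operations. It is not the tool's code (the tool itself is written in JavaScript); the function names, the sample addresses and the choice to compare addresses case-insensitively are all assumptions made for the example.

```python
def dedupe(items):
    """Return items with duplicates removed, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        # Assumption: entries compare case-insensitively, ignoring outer whitespace.
        key = item.strip().lower()
        if key and key not in seen:
            seen.add(key)
            result.append(key)
    return result


def union(a, b):
    """Everything that appears on either list, without duplicates."""
    return dedupe(list(a) + list(b))


def intersection(a, b):
    """Entries that appear on both lists."""
    in_b = set(dedupe(b))
    return [item for item in dedupe(a) if item in in_b]


def difference(keep, remove):
    """De-duplicated entries of `keep` that do not appear on `remove`."""
    blocked = set(dedupe(remove))
    return [item for item in dedupe(keep) if item not in blocked]


# Hypothetical sample data.
mailing_list = ["ann@example.com", "Bob@example.com", "ann@example.com", "cy@example.com"]
other_list = ["bob@example.com"]

remaining = difference(mailing_list, other_list)
print(remaining)
```

Keeping the first-seen order, rather than returning a bare set, makes the output easier to compare against the original list by eye.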
Web Analytics Wednesday
WAW London - June 17th
We have a Web Analytics Wednesday coming up on the 17th of June in London.

- Presenter - check. (Aurélie Pols)
- Venue - check. (bluu - Moorgate)
- Guests - check. (about eighty and running)
- Drinks - check. (sponsored by Mtracking)
- Shameless promotion - check. :-p
(although there must be things that I've forgotten)
If you want to know more and to register go to http://www.sclanalytics.com/waw
Selective Logging
Selective logging in IIS for fun and profit
A recent question from a client related to why data for part of their site wasn't appearing in their web analytics data (they happen to be using NetInsight).
For reasons that aren't important, this customer isn't using page tags, instead using the raw IIS server logfiles.
They had inadvertently deselected the 'log visits' checkbox for that portion of the site. This was duly re-enabled and the logging then continued and everyone was happy.
Of course logging is generally configured at the IIS site level, but there is an opportunity here to do something a bit smart...
WAW - October
October 2008 - Web Analytics Wednesday - London
Last night was the 10th Web Analytics Wednesday (WAW) event to be managed and sponsored by SCL Analytics. The event was co-sponsored by Coremetrics who also provided the use of their user conference venue.
Over 250 people registered and 180 people turned up on the night.

Keyword frequency
long tail of keywords
A long time ago, in a land far, far away, my friend and, at the time, colleague Matt created a little Flash tool to split text or a list of keyphrases into individual keywords.
I needed a little something that did the same thing for keyphrases, but more hackable and much faster. In hindsight JavaScript was an odd choice, but it seems to work okay, even on stupid inputs.
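The splitting and counting idea can be sketched in a few lines of Python. This is only an illustration, not the tool's JavaScript: the function name, the sample phrases and the rule for what counts as a keyword are assumptions.

```python
import re
from collections import Counter


def keyword_counts(keyphrases):
    """Split each keyphrase into lower-cased words and count them."""
    counts = Counter()
    for phrase in keyphrases:
        # Assumption: a keyword is a run of letters, digits or apostrophes.
        counts.update(re.findall(r"[a-z0-9']+", phrase.lower()))
    return counts


# Hypothetical input, including an empty line to stand in for a "stupid input".
phrases = ["web analytics", "Web Analytics Wednesday", "analytics tools", ""]

for word, count in keyword_counts(phrases).most_common():
    print(word, count)
```

Sorting by count, as `most_common()` does, is what makes the long tail visible: a few words with high counts followed by many that appear once.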
Axis change
Simple data visualisation
There is a very simple trick that is obvious once you know it, but may not be obvious at first.
Many reports, not just from web analytics products, look like this:
In this case we are looking at a 'Most Frequent Referrers' report, with a classic bar chart. Don't worry about the data (it happens to be from this site); what I want you to pay attention to is the chart and how pointless it is.
AVG Response
Initial thoughts and response to AVG linkscanner #wa
AVG is doing some interesting things. I think that my own perceptions of what they are up to are biased by my own interest in web analytics - after all, regular users of the software really don't care what their AV is doing.
Useragent filtering
The web analytics platform that I am most used to is Unica Affinium NetInsight, both in the on-premise (and possibly logfile-based) and on-demand (hosted, and thus most likely to be JavaScript pagetag-based) versions.
Due to the logfile-based nature (at least historically) of NetInsight it has always had the need to filter out Robots/Spiders, monitoring agents and all sorts of other garbage that litters the data. As such it's trivial to segment away or exclude the current AVG useragent, either in your own installation or, with a brief request to the on-demand team, from your hosted install.
Of course, this is already broken - AVG already seem to be altering the useragent string to something that looks completely real, and thus impossible to block all by itself.
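To make the useragent approach concrete, here is a minimal Python sketch of filtering hits by useragent pattern. It is not how NetInsight implements its filters, and the patterns and sample hits are hypothetical; in particular `ExampleLinkScanner` is a placeholder, not the real AVG string.

```python
import re

# Hypothetical patterns; a real list would be built from your own log data.
EXCLUDED_AGENTS = [
    re.compile(r"bot|spider|crawler", re.IGNORECASE),
    re.compile(r"ExampleLinkScanner"),  # placeholder, not the real AVG useragent
]


def is_excluded(user_agent):
    """True when the useragent matches any exclusion pattern."""
    return any(pattern.search(user_agent) for pattern in EXCLUDED_AGENTS)


# Hypothetical hits, as they might be parsed from a logfile.
hits = [
    {"url": "/", "agent": "Mozilla/5.0 (Windows NT 5.1) Firefox/3.0"},
    {"url": "/", "agent": "ExampleLinkScanner/1.0"},
    {"url": "/about", "agent": "SomeSpider/2.1"},
]

kept = [hit for hit in hits if not is_excluded(hit["agent"])]
print(len(kept), "of", len(hits), "hits kept")
```

A filter like this only works while the scanner identifies itself; once the useragent looks like a real browser, no pattern can separate the two on this field alone.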
Understanding AVG
The main thing that I would like to know about right now is the sort of environment that AVG presents to JavaScript - what sort of screen resolution, locale, plugin list, cookies etc.
If the above presents a recognisable fingerprint it would then be possible to filter based on these multiple criteria.
Of course, it may be the case that it presents the actual environment of the host, which would make things much harder to work with, although I don't think that this is likely to be the case.
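If such a fingerprint does exist, filtering on it could look something like the Python sketch below. Everything here is speculative: the attribute names, the fingerprint values and the threshold idea are invented for illustration, and nothing is known about what AVG actually presents.

```python
# Hypothetical fingerprint; these attribute names and values are invented.
FINGERPRINT = {
    "screen": "1024x768",
    "locale": "en-us",
    "plugins": 0,
    "cookies": False,
}


def matches_fingerprint(hit, fingerprint=FINGERPRINT, threshold=None):
    """True when at least `threshold` fingerprint attributes match the hit.

    With no threshold given, every attribute must match.
    """
    if threshold is None:
        threshold = len(fingerprint)
    matched = sum(1 for key, value in fingerprint.items() if hit.get(key) == value)
    return matched >= threshold


# Hypothetical hits as collected by a pagetag.
scanner_like = {"screen": "1024x768", "locale": "en-us", "plugins": 0, "cookies": False}
real_visitor = {"screen": "1024x768", "locale": "en-gb", "plugins": 7, "cookies": True}

print(matches_fingerprint(scanner_like))
print(matches_fingerprint(real_visitor))
```

Requiring several criteria to match at once is the point: any single attribute, such as a common screen resolution, will also be shared by plenty of real visitors.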
How AVG executes
JavaScript pagetags typically create the URL that they are going to request from a complex block of code. I propose (and I stand to be corrected, as AV isn't my thing) there are four main options for how AVG can function.
- Static analysis of the JavaScript
- Sandboxed execution of JavaScript
- Sandboxed execution of JavaScript that allows the tag to 'fire' to the outside world
- Actual execution of JavaScript
Now - I don't *think* it's doing static analysis, although I have colleagues that know about such things - I'll have a word on Monday.
I hope (for the sake of AVG) that it isn't executing the code for real - that would open up the opportunity for malicious exploitation - although we may be able to exploit it ourselves. :-)
Which leaves some form of sandbox. This should be easy enough to implement as JavaScript runs in one anyway. AVG would just need a separate instance. The real question is what does the sandbox provide for an environment and how is it allowed to interact with the rest of the world - at least we know that it allows extra requests to be made.
References - further reading
http://www.grisoft.com/ww.72
http://www.grisoft.com/ww.faq.num-1066#faq_1066
http://www.grisoft.com/ww.faq.num-1188#faq_1188
Disclaimer
All this is pure speculation, but it almost makes me want to sign-up to see what it does.
All for now. Comments/thoughts via usual channels
Link Visualisation
Thoughts on a link visualisation tool
I have been having a thought - something to do with visualising the relationships between sites/blogs/posts/pages.
Clearly others have gone before me. I rather like:
http://www.touchgraph.com/TGGoogleBrowser.html, http://www.aharef.info/static/htmlgraph/ and http://home.snafu.de/tilman/xenulink.html for various reasons.
None of these quite do the job that I need - so if I'm going to create something myself I need some:
- network visualisation, including some de-cluttering algorithms
- site indexer (perhaps using web analytics data)
- source of link information for links going in the other direction
To be fancy this could all be done in 3D, but I'm not sure it would be any more useful than something in 2D.
And then I'll become fabulously rich.
Web Analytics Reading List
Web Analytics - blogs and books reading list
Blogs
This is a quick list of worthwhile blogs to help in getting and keeping up-to-date in the world of web analytics. (Checked and revised December 2009)
Occam’s Razor by Avinash Kaushik
Lies, Damned Lies... (Ian Thomas)
Multichannel Marketing Metrics with Akin
Official Google Analytics Blog
June Dershewitz on Web Analytics
And not forgetting:
Forums
WAA Web Analytics Forum (Yahoo! groups)
Books
If you want something that you can read on the train or hold in your hand then these may be of interest.
(Yes - I'll make a small affiliate fee if you buy via one of these links)
WAW - March
March 2008 - Web Analytics Wednesday - London
This is not really a review of Web Analytics Wednesday (WAW) that was held in London on Monday 31st March 2008. The date was revised to accommodate our special guest speaker, Mr Eric T. Peterson.
Yes, this is stupidly late, but for completeness I still feel compelled to post. Besides, these images have been sitting on my desktop for the past weeks and I need to do something with them.
Mr Peterson spoke on the subject of the 'Future of Web Analytics'. His presentation was both entertaining and insightful.
Here is the 'official' March WAW round-up post. Unfortunately there was one picture missing:

Here is Mr Wayne Byrne on the left with his eyes almost shut, Dr Alan Hall (with his eyes shut) in the middle and myself, sporting open eyes and my stupid attempt at a beard.
The beard competition that I was sort-of competing in was started by the web team of a customer of ours - they will be playing this until the end of May, but I had to quit early.
The next London WAW will be on the 20th of May (a Tuesday - designed to interact with e-metrics).
Web Analytics Lecture
After delivering a lecture for the Consumer Metrics course at the Uni of Southampton
It all went rather well, despite the 0500h start and the nightmare journey down to Southampton that meant that I only arrived just before nine o'clock.
I spent two hours talking through a first introduction to Web Analytics - A 'Web Analytics 101' if you will.
Here is the presentation that I used - it has mostly made it through the conversion to Flash in one piece.
References used in the presentation :
Glossary of WA terms : http://www.sclanalytics.com/resources/glossary
Web Analytics on Wikipedia : http://en.wikipedia.org/wiki/Web_analytics (not everything is correct, but it's a reasonable read)
Web Analytics the Nokia Way : http://tinyurl.com/224szr (a guide to the use of KPIs within a large organization)
Web Analytics Princess : http://www.marianina.com (a blog, not just WA, but many insightful things.)
Avinash Kaushik : http://www.kaushik.net (another blog from the respected WA evangelist.)
Back To School
Preparation for delivering a lecture for the Consumer Metrics course at the Uni of Southampton
It looks like I'm going back to school, except this time I'll be the chap at the front of the room waving his hands around.
I have been asked to present the Web Analytics section of the 'Consumer Metrics' module that is part of a couple of the University of Southampton's School of Management MSc programmes.
More information, links, comments and stuff will follow - but at some point I need to settle down and put the slides together for the session.
The department is launching a blog :
http://thirstforknowledge.wordpress.com
Also, references :
http://www.management.soton.ac.uk/StudyOpportunities/pg-ft/marketing-analytics.php
http://www.soton.ac.uk/postgraduate/pgstudy/programmes/2007/management/msc_marketing_man.html
August WAW Review
August 2007 - Web Analytics Wednesday - London
The August Web Analytics Wednesday in London seemed to be a success - although we don't have all the feedback yet to make objective measurements.
I had missed the July session, having been in Iceland (/travel/reykjavik), so this was my first time at the venue. (A big thank you to the Crown and Anchor, who provided us with our own bar. Fools!)
I have been asked to publish the presentation that I used for the pre-networking session. While there isn't a lot of context on the slides, it may give you a little flavour of what happened.
You should be able to click through the slides below :
We didn't manage to cover all of the points, but here was the gist of the discussion :
- Not everyone agreed that 'mobile content' / 'mobile sites' were worth doing at all.
- Effectively measuring mobile sites is non-trivial, although it should be possible to get something of use (even if it's not 100 percent good, not that anything is).
- Some people are waiting on standards support from operators and manufacturers before attempting anything.
- I figure (maybe someone agrees) that we may need to remember what the web was like 10 - 15 years ago and just get on with it and code defensively around lack of standards / support.
- There is a greater requirement to support the mobile multi-channel mix, but having %somewhere% for an online 'campaign'/message to go back to would be a good idea.
Also - BlackBerry quirk
I have an interesting trick (noooo, not %that% one, the other one!). If you want to track the network that a mobile device belongs to then you can simply use the IP information and look it up in a sensible GeoIP database... BUT if you try to do this with enterprise BlackBerrys then it will tell you the organisation that they are attached to (useful in its own right, but it still doesn't tell you the network). So, IF you get in touch with me (email address on /bob ) then maybe I'll tell you how you can add the network operator for the BlackBerry into the mix.
References :
November WAW : http://www.sclanalytics.com/resources/events/waw_november2007
The WAA : http://www.webanalyticsassociation.org/
Tags and Logs
Logfiles can be your friend
Sorry it's been a while - it's been a crazy couple of months.

I had started this entry whilst I was working in Reykjavik. I think I'm now allowed to let you know (#1) that I was spending a week with the lovely people at Landsbankinn. (#2)
While I was there we had a problem that we have addressed a number of times in slightly different ways.... "If I have a website with regular pages as well as 'resources' (pdf files, spreadsheets, whatever) how do I track the usage of these if I am using page tags for my data collection?"
Throughout this I will refer interchangeably to resources, files or downloads.
There are two solutions that I can think of right now :
1. Track the links leading to the file in question.
2. Identify the downloads based on web (or proxy) log files.
Now, tracking the links does work. You can do this the hard way (by hand) or you can use a pagetag that auto-instruments the links in question (like ours). The problem that I see with this approach is that resources like PDF files are highly rated by search engines (#3) and some visitors are going to land directly on your site on a resource, without the chance to trigger a pagetag. This sucks, especially if you're in an SEO mood.
The log-based method works fine, but there is a problem. If you *just* use logs (does anyone still do that?) then you miss out on all the benefits that tagging gets you.
So, there is another way (two actually) (I wouldn't be writing this otherwise).
Real solution number 1. You can use Unica NetTracker or NetInsight in its hybrid mode, where it manages tags and logs, but that can sometimes be more trouble than it's worth: it's very easy to end up double counting requests for regular pages.
Real solution number 2.
- Identify a parameter that can *only* be obtained from a page-tag based request. I extracted our 'lc' parameter.
- Identify a parameter that can *only* be obtained on the 'resources' (files, downloads, whatever) that you need to extract from just the logs. Using a regular expression on the page, I extracted anything that ended with .pdf, plus another untaggable page (could be an RSS feed, anything really).
- Create a third (are you keeping up?) parameter that joins these two together (what we call a meta-parameter)
- Finally, specify a filter, so that only when this third parameter has a value, do we bother loading the line into the database.
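The four steps above can be sketched outside the product in Python. This is only an illustration of the logic, not NetTracker or NetInsight configuration: the URL shapes, the feed path and the function names are assumptions, and 'joins' is interpreted here as taking whichever of the two values is present.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumption: resources are PDFs plus one untaggable page (a hypothetical feed path).
RESOURCE_PATTERN = re.compile(r"(\.pdf|^/feed\.rss)$", re.IGNORECASE)


def tag_parameter(url):
    """Value of the 'lc' query parameter, which only page-tag requests carry."""
    values = parse_qs(urlsplit(url).query).get("lc")
    return values[0] if values else ""


def resource_parameter(url):
    """The request path when it is a resource that cannot be tagged, else empty."""
    path = urlsplit(url).path
    return path if RESOURCE_PATTERN.search(path) else ""


def meta_parameter(url):
    """Third parameter: has a value for page-tag requests and for resources."""
    return tag_parameter(url) or resource_parameter(url)


def should_load(url):
    """Filter: only load the log line when the meta-parameter has a value."""
    return bool(meta_parameter(url))


# Hypothetical request URLs as they might appear in a combined logfile.
requests = [
    "/tag.gif?lc=http%3A%2F%2Fexample.com%2Fhome",  # page-tag request
    "/docs/report.pdf",                             # resource, logs only
    "/feed.rss",                                    # untaggable page
    "/index.html",                                  # regular page, already covered by its tag
]

loaded = [url for url in requests if should_load(url)]
print(loaded)
```

Dropping the plain `/index.html` line is what avoids the double counting mentioned for hybrid mode: the page view is already represented by its page-tag request.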
This may all seem like a lot of work and the product really ought to do some of these things automatically, but it isn't too much effort, is nicely maintainable and produces a lovely clean profile containing mostly pagetag-based information, with some extra requests from things that just can't be tagged.
References :
#1 http://www.sclanalytics.com/resources/news/landsbanki-announcement
#3 http://www.google.com/search?q=landsbanki.com (to see some PDF results in a search)
#4 http://members.mrtc.com/anvk/fielddaycart04/fielddaycart04.html (for the picture of the man with the beard and the log and the tags)
May WAW Review
May 2007 - Web Analytics Wednesday - London
And so another Web Analytics Wednesday passes. This is the second such event that I have assisted with and I think that it was even more successful than the first.
The most obvious change was the new venue - from some deep-underground basement bar, that anybody could wander through we have moved to the rather more up-market Royale room in the RubyBlue bar located off Leicester Square. Plenty of light and even some fresh air from the balcony overlooking the square itself.
The dedicated room made it much easier to mingle, as there was much less chance of wandering up to somebody at random and launching into some conversation about long tails before realising that they'd just come in for a drink.
The 'Networking' aspect was also helped by the lovely name badges that we managed to hand-out to just about everyone ... no more guessing that you already know someone and really ought to recognise them by now.
The main session (1800h onwards) was prefaced by an open discussion about the use of Web Analytics tools for SEO tasks (part led by myself and part by m'colleague Matt). This was the first time that either of us had done anything like this and I think we have learned the following lessons :
- Make sure that people are expecting to contribute with an opinion or questions.
- With the above point in mind, pre-announce the full agenda.
- Less, but better (perhaps more inflammatory) points. :-)
- Make sure that everyone can hear (duh!) and that there is a real 'circle' effect in the seating.
- Try and avoid a focus in the circle (although with a projector this can be difficult)
- ... any other ideas?
This time we (the remaining SCL mob) decanted ourselves into a nearby restaurant where we mostly had really manky salmon fishcakes. I wish I knew the name of the place so I could suggest that you avoid it.
This time we all made it home without drunkenly disgracing ourselves.
Next event : http://www.sclanalytics.com/resources/events/waw_july2007
Pretty Dashboards
How to make pretty AND useful dashboards in NetTracker and NetInsight
One of the things that I do at work is support the Unica NetTracker and Affinium NetInsight products. In the course of my work I sometimes find nice things that it would be good to share with a wider audience.
The products have had two sorts of dashboards for some time - the pretty graphical dashboard with nothing but pictures and the informative but ugly 'Executive' dashboard with nothing but numbers.


Wouldn't it be nice to combine the two? (can you tell where I'm going with this yet?)
Disclaimer :
While I have tested this myself in a few different environments and it all seems okay, neither I, my employer (SCL) nor Unica can be held responsible for any loss or damage resulting from you trying out any of this. This article is written by me (Bob Mitchell) and is not produced or endorsed in any way by either my employer (SCL) or Unica.
These instructions apply to version 7.1 of NetTracker and NetInsight. I would imagine that there will be a neater, GUI-driven way of doing this in future versions of the product.
(edit: possible in NetInsight 7.4 and better)
1. Create a graphical dashboard containing the graphical elements that you want. Save it as a custom report.
2. Take a look at the reportxxx.xml file (where xxx is the number of the report) in inst_dir/data/profilename.

3. Now take a look at the file 'execdash.xml' (it's for the Executive Dashboard - the one with all the numbers). Look familiar? You should notice that the 'section' is of type 'executive', but otherwise it looks a bit like the graphical dashboard.
4. Transplant a section from execdash.xml into your reportxxx.xml

5. Force NetTracker to regenerate the report - perhaps just click on a single day and it will regenerate the report from scratch.
6. Observe the results :

Now, I think you'll agree that this looks nice while also presenting 'real' numbers.
Further options :
1. Change the 'link' attribute - this will alter, or remove, the report that you get when you click on the item to drill down.
2. Change the 'label' attribute to rename an item
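If you would rather script the transplant than edit the files by hand, the idea can be sketched in Python. The XML below is an invented stand-in, not the real NetTracker file format: only the 'section' element, its 'executive' type and the 'link' and 'label' attributes come from the description above, while the 'report' and 'item' elements and all attribute values are assumptions. Work on copies of your files.

```python
import xml.etree.ElementTree as ET

# Invented stand-ins for reportxxx.xml and execdash.xml.
REPORT_XML = """<report>
  <section type="graphical"><item label="Visits by day" link="visits"/></section>
</report>"""

EXECDASH_XML = """<report>
  <section type="executive"><item label="Visits" link="visits_summary"/></section>
</report>"""


def transplant_executive_sections(report_xml, execdash_xml):
    """Append every 'executive' section from execdash into the custom report."""
    report = ET.fromstring(report_xml)
    execdash = ET.fromstring(execdash_xml)
    for section in execdash.findall("section"):
        if section.get("type") == "executive":
            report.append(section)
    return report


combined = transplant_executive_sections(REPORT_XML, EXECDASH_XML)

# Optional: rename an item by changing its 'label' attribute.
for item in combined.iter("item"):
    if item.get("label") == "Visits":
        item.set("label", "Total visits")

print(ET.tostring(combined, encoding="unicode"))
```

After writing the result back to the report file, the report still has to be regenerated as in step 5 before the change shows up.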
Postscript
Following the release of NetInsight 7.4 all this is now built-in to the GUI (but there still isn't a GUI for it in NetTracker).

