bob-o-rama.com

The native home of anything sufficiently 'Bob'.

Tags and Logs

Logfiles can be your friend

13th July 2007 - 18:42 - bob

Sorry it's been a while - it's been a crazy couple of months.

[Picture: the man with the beard, the log and the tags (#4)]

I had started this entry whilst I was working in Reykjavik. I think I'm now allowed to let you know (#1) that I was spending a week with the lovely people at Landsbankinn (#2).

While I was there, we had a problem that we have addressed a number of times in slightly different ways: "If I have a website with regular pages as well as 'resources' (PDF files, spreadsheets, whatever), how do I track the usage of these if I am using page tags for my data collection?"

Throughout this post I will refer interchangeably to resources, files or downloads.

There are two solutions that I can think of right now:

1. Track the links leading to the file in question.

2. Identify the downloads based on web (or proxy) log files.

Now, tracking the links does work. You can do this the hard way (by hand), or you can use a page tag that auto-instruments the links in question (like ours does). The problem I see with this approach is that search engines rate resources like PDF files highly (#3), so some visitors are going to land directly on a resource on your site without ever loading a page that could trigger a page tag. This sucks, especially if you're in an SEO mood.
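An auto-instrumenting page tag does its work in JavaScript in the browser; purely to illustrate the link-detection half of that idea, here is a minimal Python sketch that scans a page's HTML for links whose path ends in a resource extension, i.e. the links such a tag would need to hook. The extension list and the sample page are assumptions, not taken from any real product.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: the file types we treat as "resources"; adjust for your own site.
RESOURCE_EXTENSIONS = (".pdf", ".xls", ".doc", ".zip")


class ResourceLinkFinder(HTMLParser):
    """Collects the href of every <a> tag that points at a downloadable resource."""

    def __init__(self):
        super().__init__()
        self.resource_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Check only the path, so a query string can't hide the extension.
        path = urlparse(href).path.lower()
        if path.endswith(RESOURCE_EXTENSIONS):
            self.resource_links.append(href)


def find_resource_links(html):
    """Return the resource links found in an HTML string, in document order."""
    finder = ResourceLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.resource_links


# Hypothetical sample page.
page = """
<p>Read the <a href="/reports/annual-2006.pdf">annual report</a>,
grab the <a href="/rates.XLS?v=2">rate sheet</a>
or find out <a href="/about.html">about us</a>.</p>
"""
print(find_resource_links(page))  # ['/reports/annual-2006.pdf', '/rates.XLS?v=2']
```

It also shows the weakness described above: a visitor who arrives at the PDF straight from a search engine never loads this page, so none of these links is ever clicked.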

The log-based method works fine, but there is a problem. If you *just* use logs (does anyone still do that?), then you miss out on all the benefits that tagging gets you.
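For comparison, solution 2 on its own amounts to little more than pattern-matching the request lines in the log. This Python sketch assumes Apache-style Common/Combined Log Format lines and treats any successful (HTTP 200) request for a path ending in .pdf as a download; both choices are assumptions for illustration, and the log excerpt is made up.

```python
import re
from collections import Counter

# Request and status fields of a Common/Combined Log Format line, e.g.
# 10.0.0.1 - - [13/Jul/2007:18:42:00 +0000] "GET /files/report.pdf HTTP/1.1" 200 5120
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

# Assumption: a "download" is any request whose path ends in .pdf.
DOWNLOAD_RE = re.compile(r"\.pdf$", re.IGNORECASE)


def count_downloads(lines):
    """Count successful (HTTP 200) PDF requests per path.

    Partial-content (206) range requests are ignored here, which is a simplification.
    """
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match or match.group("status") != "200":
            continue
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if DOWNLOAD_RE.search(path):
            counts[path] += 1
    return counts


# Hypothetical log excerpt.
log = [
    '10.0.0.1 - - [13/Jul/2007:18:42:00 +0000] "GET /files/report.pdf HTTP/1.1" 200 5120',
    '10.0.0.2 - - [13/Jul/2007:18:43:10 +0000] "GET /files/report.pdf HTTP/1.1" 200 5120',
    '10.0.0.2 - - [13/Jul/2007:18:43:12 +0000] "GET /index.html HTTP/1.1" 200 2048',
    '10.0.0.3 - - [13/Jul/2007:18:44:00 +0000] "GET /files/missing.pdf HTTP/1.1" 404 0',
]
print(dict(count_downloads(log)))  # {'/files/report.pdf': 2}
```

Nothing here knows about visits, campaigns or anything else a page tag would have collected, which is exactly the drawback of relying on logs alone.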

So, there is another way (two, actually; I wouldn't be writing this otherwise).

Real solution number 1. You can use Unica NetTracker or NetInsight in hybrid mode, where it handles both tags and logs. That can sometimes be more trouble than it's worth, though: it's very easy to end up double counting requests for regular pages, which can show up both as an ordinary log entry and as a page-tag request.

Real solution number 2.

  • Identify a parameter that can *only* be obtained from a page-tag-based request. I extracted our 'lc' parameter.
  • Identify a parameter that can *only* be obtained from requests for the 'resources' (files, downloads, whatever) that you need to pull from the logs alone. Using a regular expression on the requested page, I extracted anything ending in .pdf, plus one other untaggable page (it could be an RSS feed, anything really).
  • Create a third (are you keeping up?) parameter that joins these two together (what we call a meta-parameter).
  • Finally, specify a filter so that a line is only loaded into the database when this third parameter has a value.
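The four steps above are product configuration rather than code, but the logic behind them can be sketched in a few lines of Python. This is a minimal sketch, not how NetTracker or NetInsight do it internally, assuming that page-tag requests and resource requests end up in one Common/Combined Log Format file, that tag requests go to a hypothetical /tag.gif carrying lc in the query string, and that the extra untaggable page is a hypothetical /news.rss feed. Because the two parameters never appear on the same line, "joining" them here simply means taking whichever one is set.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Request field of a Common/Combined Log Format line.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (?P<target>\S+) HTTP/[\d.]+"')

# Hypothetical untaggable resources: any PDF, plus one made-up RSS feed path.
RESOURCE_RE = re.compile(r"\.pdf$|^/news\.rss$", re.IGNORECASE)


def extract_parameters(line):
    """Return (lc, resource, meta) for one log line; absent values are None."""
    match = REQUEST_RE.search(line)
    if not match:
        return None, None, None
    target = urlsplit(match.group("target"))
    # Parameter 1: only page-tag requests carry lc (assumed to be in the query string).
    lc = parse_qs(target.query).get("lc", [None])[0]
    # Parameter 2: only requests for untaggable resources match this pattern.
    resource = target.path if RESOURCE_RE.search(target.path) else None
    # Parameter 3, the meta-parameter: the two never co-occur, so take whichever is set.
    meta = lc or resource
    return lc, resource, meta


def lines_to_load(lines):
    """The filter: keep a line only when its meta-parameter has a value."""
    return [line for line in lines if extract_parameters(line)[2]]


# Hypothetical log excerpt; /tag.gif and the lc value are made up.
log = [
    '10.0.0.1 - - [13/Jul/2007:18:42:00 +0000] "GET /index.html HTTP/1.1" 200 2048',
    '10.0.0.1 - - [13/Jul/2007:18:42:01 +0000] "GET /tag.gif?lc=home HTTP/1.1" 200 43',
    '10.0.0.2 - - [13/Jul/2007:18:43:00 +0000] "GET /files/report.pdf HTTP/1.1" 200 5120',
    '10.0.0.3 - - [13/Jul/2007:18:44:00 +0000] "GET /news.rss HTTP/1.1" 200 900',
]
for line in lines_to_load(log):
    print(extract_parameters(line)[2])
# home
# /files/report.pdf
# /news.rss
```

The plain /index.html hit is dropped because that page view already arrives via its tag request, which is how this approach sidesteps the double counting that hybrid mode can suffer from.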

This may all seem like a lot of work, and the product really ought to do some of these things automatically, but it isn't too much effort, it's nicely maintainable, and it produces a lovely clean profile containing mostly page-tag-based information, plus the extra requests from things that just can't be tagged.

References:

#1 http://www.sclanalytics.com/resources/news/landsbanki-announcement

#2 http://www.landsbanki.is/

#3 http://www.google.com/search?q=landsbanki.com (to see some PDF results in a search)

#4 http://members.mrtc.com/anvk/fielddaycart04/fielddaycart04.html (for the picture of the man with the beard and the log and the tags)