Tips and Tricks to Help Optimise Your Urchin 5 Software Installation
Below are some useful Urchin 5 tips for getting the most out of your Urchin software installation. In order to implement any of these techniques, the software should be installed and working on your server and you should have a reasonable understanding of server administration as well as how web analytics works.
All Urchin Software tips have been tested on our own servers. However, please note that GA-Experts.com accepts no responsibility for any issues arising from the use of the advice below.
Contents:
- Freeserve Correction
- Reporting on Pay-Per-Click campaigns - without the CTM
- How many people add my site to Favourites?
- Filters - some regex tips
- Auto email Urchin reports to clients
- Log file rotation - How to
- CTM correction for Google UK (cost data import)
- Reporting on internal search
- Recommended logformat for Apache
1. Freeserve Correction
Reported to Urchin [USCT-15041]
Freeserve uses a combination of variables in its query_stem that include qt, p and q. This confuses 'Referral Keywords Match'. Within Urchin reports, under Page Query Terms you will see the term 'b' or sometimes '_searchbox' for searches that have originated from Freeserve.
Apply the following Report Filter to correct for this:

In English, this reads:
'if &q= exists in the query_stem, replace any previous value of Page Query Terms with this one'.
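The rule can be sketched outside Urchin. The Python below is only an illustration of the logic described above, not Urchin's filter syntax; the function name corrected_query_term and the sample query stems are assumptions made for the example.

```python
import re

# Matches a q= variable at the start of the query stem or after an '&'.
# 'qt=' is deliberately not matched, because 'q' must be followed by '='.
Q_PARAM = re.compile(r"(?:^|&)q=([^&]*)")

def corrected_query_term(query_stem, current_term):
    """Return the value of q= if it exists, otherwise keep the current term."""
    match = Q_PARAM.search(query_stem)
    if match:
        return match.group(1)
    return current_term

# Made-up Freeserve-style query stems containing qt, p and q.
print(corrected_query_term("qt=b&p=_searchbox&q=web+analytics", "b"))
print(corrected_query_term("p=_searchbox", "_searchbox"))
```

The first call returns the real search phrase instead of 'b'; the second leaves the existing term alone because no q= variable is present.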
2. Reporting on Pay-Per-Click (PPC) campaigns - without the CTM
Using the base license of Urchin, you can gain an insight into the performance of PPC terms from different providers, without using the campaign tracking module. The set-up involves 'tagging' your links (also known as tracking urls) at Google Adwords, Overture, Mirago, Espotting etc.
e.g. In your adverts at your PPC provider, use:
http://www.omegadm.co.uk/?googleads=term1+term2+...
http://www.omegadm.co.uk/?overtureuk=term1+term2+...
http://www.omegadm.co.uk/?espotting=term1+term2+...
Where term1, term2 etc., are the words in the phrase you are paying for. The portion that reads 'googleads=' is known as the Page Query Term and is automatically reported on within Urchin under Referrals.
Using the above technique in your PPC set-up will produce the following Page Query Terms for immediate comparison:

Then click on the small blue arrow next to each PPC source to view the individual query terms that came from that PPC source:

Note, if you use different landing pages for different keyword combinations, you will have a list of page query terms for each landing page.
The above technique is a simple method for comparing your PPC campaigns. For a more detailed PPC analysis, purchase the Urchin Campaign Tracking Module or consider setting up a Google Analytics account.
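To see what the tagging gives you, the grouping can be reproduced by hand. This is a rough Python sketch, assuming you have a list of requested landing URLs; the function name group_ppc_terms and the sample phrases are made up, and Urchin does this for you in its own reports.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

# Tag names taken from the example URLs above; extend for other providers.
PPC_TAGS = {"googleads", "overtureuk", "espotting"}

def group_ppc_terms(urls):
    """Map each PPC tag to a count of the search phrases that arrived with it."""
    counts = defaultdict(lambda: defaultdict(int))
    for url in urls:
        for name, value in parse_qsl(urlsplit(url).query):
            if name in PPC_TAGS:
                counts[name][value] += 1  # parse_qsl decodes '+' to a space
    return {tag: dict(terms) for tag, terms in counts.items()}

# Made-up sample requests.
sample = [
    "http://www.omegadm.co.uk/?googleads=web+analytics",
    "http://www.omegadm.co.uk/?googleads=web+analytics",
    "http://www.omegadm.co.uk/?overtureuk=urchin+software",
]
print(group_ppc_terms(sample))
```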
3. How many people add my site to Favourites
In MS Internet Explorer v5 and above, when a visitor adds your site to their favourites, IE requests a small icon file. Tracking this download allows you to monitor this.
As Admin, within Reporting, append ico to the 'Downloads Match' field e.g.
pdf,zip,exe,sh,tar,gz,dmg,pkg,doc,xls,ppt,ico
Firefox Update (March 2005):
Firefox is gaining momentum as a new alternative to MS IE. Although Firefox usage is still low (~5% of all visitors), an issue has come to our attention in the way it requests the favicon.ico file. Firefox will load the favicon.ico file for every page viewed (subject to cache settings), even if not bookmarked. As you might guess, this can have a dramatic effect on the perceived number of people adding your site to favourites. It is thought other Mozilla type browsers (e.g. Netscape) behave in a similar way.
To differentiate IE from Firefox requests, you can make the following change in your apache.conf file (a similar hack can be used for IIS):
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} !.+MSIE.+
RewriteRule ^/favicon\.ico$ /favicon2.ico
In English this reads, "if the browser is not MS IE and requests the favicon.ico file, change the request to favicon2.ico".
Of course you will need to upload the favicon2.ico file into your web root. Note, this does not tell you how many Firefox users (or other Mozilla type browsers) are adding your site to their favourites, but it will keep the MS IE figure accurate.
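If you want to check which browsers the rule will affect before touching the server, the decision can be mimicked in a few lines. This is a sketch in Python, not Apache configuration; the function name rewrite_favicon and the user agent strings are illustrative assumptions.

```python
import re

# Same pattern as the RewriteCond above: 'MSIE' with at least one
# character on either side of it.
MSIE = re.compile(r".+MSIE.+")

def rewrite_favicon(path, user_agent):
    """Return the path the rule would map the request to."""
    if path == "/favicon.ico" and not MSIE.search(user_agent):
        return "/favicon2.ico"
    return path

# Sample user agent strings (illustrative only).
ie = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
firefox = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0"

print(rewrite_favicon("/favicon.ico", ie))
print(rewrite_favicon("/favicon.ico", firefox))
```

Only the non-IE request is switched to favicon2.ico; requests for any other path pass through unchanged.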
4. Filters - some regex tips
The following pdf file (1 page) is an excellent overview of using regular expressions when creating filters.
5. Auto email Urchin reports to clients
Tested on Linux, this Perl script is a customisation of the original Urchin-supplied u5data_extractor.pl script. It is set to run on the 1st of each month, emailing a custom-defined report to you and the client; in this example, the report is Search Engine visitors.
A cron job is set as follows:
/usr/local/urchin/util/u5data_extractor.pl --profile <<client profile>> |
mutt -x -s "Last month's SE visitors" -c <<your@email.addr>> <<client@email.addr>>
where <<client profile>> is the profile name you set in the Urchin Admin console.
Rename the existing u5data_extractor.pl script and install this modified one u5data_extractor.txt (save and rename to .pl). Ensure the permissions are set correctly to execute this:
chmod ugo+x u5data_extractor.pl
The file is fully commented and has two main differences from the original:
- Start and end time period is set to the 1st and last day of the previous month
- Table values are summed to give a report total - similar to web interface.
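The first difference, the reporting period, is simple date arithmetic. The modified script itself is Perl; the Python below is only a sketch of the same calculation, and the function name previous_month_range is an assumption.

```python
from datetime import date, timedelta

def previous_month_range(today):
    """Return (first day, last day) of the month before 'today'."""
    first_of_this_month = today.replace(day=1)
    # The day before the 1st of this month is the last day of last month.
    last_of_previous = first_of_this_month - timedelta(days=1)
    first_of_previous = last_of_previous.replace(day=1)
    return first_of_previous, last_of_previous

# Run on the 1st of March 2005, the report covers February 2005.
start, end = previous_month_range(date(2005, 3, 1))
print(start, end)
```

Stepping back one day from the 1st avoids having to know how many days each month has, including leap years.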
6. Log file rotation - How to
A "Howto" document for web server log file rotation with compression - Unix and Windows examples. Download as a log file rotation PDF.
Why Rotate?
Urchin stores aggregated logfile information in its own database, enabling the end user to build 'real-time' visitor reports. With its own 'Log Tracker' keeping track of how far into your server logfiles it has processed, you could say logfile rotation can be ignored. However, this isn't recommended for the following reasons:
- Disk Space - logfiles in use are uncompressed plain text and can consume large amounts of space. Typically compression will reduce file sizes by 20:1, so it makes sense to do this. However, a web server logfile cannot be compressed while in use (locked), so it must first be rotated out of use.
- Periodically, you will need to check logfiles for warnings, error messages (scripts not working), Search Engine detection etc, and this is best done using manageable file sizes.
- Opening, closing and manipulating data for very large file sizes consumes system resources and will therefore slow down your server. It is much more efficient from both a system and application standpoint to manage several smaller logs than one very large log.
- Smaller files are much easier to back up and restore in the event of system failure.
Logfile rotation is achieved quite simply on unix machines using crontab and logrotate. We describe a separate method below for Windows.
System
This example was developed and tested on:
Unix: RedHat 6.x and 7.1/2/3 using Apache v1.3.9-29.
Windows: NT4/SP6, Windows 2000 SP1.
Schematic Unix Example
- Each night (or any set time period), Urchin runs on the selected log file*
- Urchin Log Tracker keeps an internal marker as to how far into the file it has analysed.
- Each month (or any set time period), rotate and compress logfiles
- Keep rotated logfiles for 12 months (or any number) - in case you need to re-analyse!
If you wish to rotate/compress files each and every time Urchin is scheduled to run, you can do this within Urchin's Log Manager. However, this can create a large number of logfiles, especially if you have multiple logfiles/hosts.
Our logfile rotation results in the following monthly files being created:
- httpd_log.1.gz, httpd_log.2.gz, httpd_log.3.gz etc...
In this example, apache is using the 'combined' log format described in httpd.conf e.g.
# The following directives define some format nicknames for use
# with a CustomLog directive (see below).
#
LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\"" combined
...
CustomLog /home/httpd/logs_dir/httpd_log combined
Note the LogFormat directive may be slightly different from what you see in some installations of Apache. Ours is recommended as it allows you to simply use the default Urchin format: 'Log Format = auto'.
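When checking logfiles by hand, it helps to know which field is which. The following Python sketch splits one line written in the format above into named fields. The sample log line, IP address and hostname are invented, and the pattern assumes no field contains an escaped double quote.

```python
import re

# One group per item in the LogFormat directive above:
# %h %v %u %t "%r" %>s %b "Referer" "User-Agent" "Cookie"
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<vhost>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<cookie>[^"]*)"'
)

def parse_line(line):
    """Return a dict of the named fields, or None if the line does not fit."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

# Invented example line.
sample = (
    '192.0.2.10 www.example.com - [01/Mar/2005:00:01:02 +0000] '
    '"GET /index.html?googleads=term1+term2 HTTP/1.1" 200 5120 '
    '"http://www.google.co.uk/search?q=urchin" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" "-"'
)
fields = parse_line(sample)
print(fields["vhost"], fields["status"], fields["request"])
```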
Unix Method
Each minute, the system crontab checks what jobs require scheduling. Scheduling is set in the /etc/crontab file.
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/
# column headings - thanks Toby
# mins, hr, date, month, day, command
# run-parts
# Min Hr Date Month Day Owner Command File
01 * * * * root run-parts /etc/cron.hourly
02 1 * * * root run-parts /etc/cron.daily
50 23 * * 0 root run-parts /etc/cron.weekly
01 00 1 * * root run-parts /etc/cron.monthly
The jobs to be run are held in the corresponding directory, for example /etc/cron.monthly. In the above example, the directory /etc/cron.monthly is checked at 00:01 on the first day of every month.
My /etc/cron.monthly directory contains a file logrotate, contents of which are:
#!/bin/sh
/usr/sbin/logrotate /etc/logrotate.conf -f
The first line is required and simply informs the operating system to use the system shell to run the next line (command). Line 2 does the rotation, using the program located at /usr/sbin/logrotate and the configuration file /etc/logrotate.conf . My logrotate.conf contains:
# system-specific logs may be configured here
#########################################################
# #
# MONTHLY rotations #
# #
#########################################################
# rotate apache log files:
/home/httpd/logs/*_log {
ifempty
copytruncate
rotate 12
monthly
compress
}
Note that the first part of this file (up to # system-specific logs may be configured here) contains the default parameters, which are omitted here for clarity. Below this comment, parameters over-ride the defaults.
[One caveat of the default parameters is:
# send errors to root
errors your@emailaddress
This does not work as /etc/crontab has: MAILTO=root which over-rides any set in logrotate.conf.]
The part that does the rotating/compressing (you can even e-mail the rotated file) follows the comment:
# system-specific logs may be configured here
/home/httpd/logs/*_log {
defines which files are to be rotated; the block must end with a single closing brace '}'.
ifempty
defines that rotation will continue even if the file is empty.
copytruncate
defines a copy of the logfile is created first and then its contents are removed (instead of simply creating a new one). This is required by apache as it can not be told to close a logfile (release) without stopping the service. By this method the apache server does not have to be restarted.
rotate 12
Over-rides the default (4) by keeping 12 previous files.
monthly
Over-rides the default (weekly) by performing rotations monthly.
compress
Compresses the logfile, usually by as much as 20:1. Auto-compression is probably logrotate's most powerful feature - something Windows struggles with! See below.
Read man logrotate for details concerning what other options may be useful to you.
[A caveat: the man logrotate page appears to indicate that the order of the directives is unimportant. For instance, in its /var/log/news/* example, nocompress appears after endscript. However, changing this to compress will not work; it must come above postrotate. In other words, nocompress in that position is actually ignored, and the files stay uncompressed only because that is the default action.]
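To make the combined effect of copytruncate, rotate and compress concrete, here is a simplified Python imitation of one rotation. It is a sketch of the behaviour described above, not how logrotate is implemented; the function name rotate_copytruncate is an assumption, and it ignores the short window in which lines written between the copy and the truncate could be lost.

```python
import gzip
import os
import shutil
import tempfile

def rotate_copytruncate(log_path, keep=12):
    """Shift log.N.gz up by one, gzip a copy of the live log, then empty it."""
    # rotate 12: drop the oldest archive, then shift the rest up by one.
    oldest = f"{log_path}.{keep}.gz"
    if os.path.exists(oldest):
        os.remove(oldest)
    for n in range(keep - 1, 0, -1):
        src = f"{log_path}.{n}.gz"
        if os.path.exists(src):
            os.rename(src, f"{log_path}.{n + 1}.gz")
    # compress: write the current contents to log.1.gz.
    with open(log_path, "rb") as live, gzip.open(f"{log_path}.1.gz", "wb") as out:
        shutil.copyfileobj(live, out)
    # copytruncate: empty the original in place, so the web server can keep
    # writing to the same file without being restarted.
    with open(log_path, "r+b") as live:
        live.truncate(0)

# Two simulated monthly rotations in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "httpd_log")
    for month in ("jan", "feb"):
        with open(log, "a") as handle:
            handle.write(month + " hits\n")
        rotate_copytruncate(log)
    print(sorted(os.listdir(tmp)))
```

After the two runs the directory holds the emptied live log plus httpd_log.1.gz (the most recent month) and httpd_log.2.gz (the month before).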
Windows Schematic Method
This is very similar to the Unix method above. Setup is much easier than for unix (simply select a radio button in the IIS control panel). However there is little flexibility with this and compression is more complicated to achieve. The differences between the Windows and unix schematics are:
- Windows log rotation is system wide - all virtual websites on the same server must therefore have the same log rotation settings (for unix, this can be controlled on a per website basis).
- Windows rotation time periods are set to midnight only, on a daily, weekly or monthly basis (for unix, any time period can be specified).
- Weekly rotation takes place on Sunday (not Monday), the first day of the week in the US.
- For compression, additional software is required e.g. winzip, and a separate batch (script) file to run the compression with command line parameters.
Windows Method
The following discusses how you can add compression functionality and assumes you have already selected the required logfile rotation frequency in the IIS control panel and have installed winzip and the winzip commandline add-on on your Windows machine in their respective default directories.
Create a file ziplogs.bat with your text editor containing the following:
ren "c:\Inetpub\weblog\w3svc1\monthly.zip" "monthly-old.zip"
"c:\program files\winzip\wzzip" -exomT "c:\Inetpub\weblog\w3svc1\monthly.zip" "c:\weblog\w3svc1\*.log"
The first line renames the previous zip file. The second line calls the wzzip program (actually the winzip commandline add-on) with options = exomT, and compresses all logfiles from the default IIS logfile location into monthly.zip.
[Note, *.log is used here because Windows names its log files using a unique timestamp. However, following the initial run of the script, there should in fact only be one logfile available - the last one just rotated.]
Options:
ex = Set the compression level to maximum
o = Change the zip file's date to the same as the newest file in the Zip file
m = Move files into the zip file
T = Include files older than the current date (if no date specified)
The most important option is 'T' (case sensitive). 'T' ensures only the last logfile is included in the compression, not the newly created one. Use the Windows scheduler to run your ziplogs.bat batch file at 01:00 on the day in question i.e. just after the rotation.
As you will have noticed, the Windows method only gives you two backups of your logfiles (monthly.zip and monthly-old.zip). A more advanced batch file is required if you wish to keep further (numbered) copies.
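One way to keep numbered copies is to shift the existing archives before creating a new one. The sketch below does this in Python rather than as a batch file, so it assumes a Python interpreter is available on the server; the function name zip_old_logs and the archive names monthly.1.zip, monthly.2.zip and so on are assumptions. Like the 'T' and 'm' options above, it only takes logfiles last modified before today and removes them once archived.

```python
import datetime
import glob
import os
import zipfile

def zip_old_logs(log_dir, keep=12, today=None):
    """Keep numbered archives and move logs older than today into monthly.zip."""
    today = today or datetime.date.today()
    base = os.path.join(log_dir, "monthly")
    # monthly.zip -> monthly.1.zip -> monthly.2.zip ... (the oldest is dropped).
    names = [base + ".zip"] + [f"{base}.{n}.zip" for n in range(1, keep)]
    if os.path.exists(names[-1]):
        os.remove(names[-1])
    for i in range(len(names) - 2, -1, -1):
        if os.path.exists(names[i]):
            os.rename(names[i], names[i + 1])
    # Archive only logs last modified before today, then delete the originals.
    moved = []
    with zipfile.ZipFile(names[0], "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(glob.glob(os.path.join(log_dir, "*.log"))):
            modified = datetime.date.fromtimestamp(os.path.getmtime(path))
            if modified < today:
                archive.write(path, os.path.basename(path))
                moved.append(path)
    for path in moved:
        os.remove(path)
    return [os.path.basename(path) for path in moved]
```

Calling zip_old_logs on the IIS logfile directory from the Windows scheduler, just after the rotation, would replace the two lines of ziplogs.bat; with the default keep=12 it retains twelve archives in total.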
7. CTM correction for Google UK (cost data import)
This correction is for the cost data import of PPC campaigns from Google UK. Because the log format of Google US is different than Google UK, a correction is required when specifying UK adwords import (this may also be applicable for other country specific adwords accounts).
Copy:
[path-to-urchin]/lib/reporting/logformats/google.lf to google-uk.lf
Edit google-uk.lf by changing the following line from:
CustomDateFormat: "%m/%d/%y"
to:
CustomDateFormat: "%d/%m/%Y"
For the full procedure to import cost data, see http://www.google.com/support/urchin45/bin/topic.py?topic=7964
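The effect of the two date formats can be seen with a quick test. This Python sketch assumes Urchin's format codes behave like the standard strftime-style codes of the same name (%d day, %m month, %y two-digit year, %Y four-digit year); the helper name try_parse and the sample date are made up.

```python
from datetime import datetime

def try_parse(text, fmt):
    """Return the parsed date, or None if the text does not fit the format."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None

uk_value = "25/12/2005"  # made-up UK-style date: day/month/four-digit year
print(try_parse(uk_value, "%m/%d/%y"))  # US pattern from google.lf
print(try_parse(uk_value, "%d/%m/%Y"))  # UK pattern for google-uk.lf
```

The US pattern cannot read the UK value at all, while the UK pattern reads it as 25 December 2005.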
8. Reporting on internal search
An internal search engine on your website helps visitors find information quickly, and it is also a powerful way for you to understand what your visitors are looking for. Studies have shown that, no matter how good your navigation scheme is, visitors to large sites will always prefer to search than to drill down. Analysing visitor searches is therefore valuable.
Such a feature is standard in Urchin software - see the Pages & Files/Page Query Terms section. However, what if you have two search options? For example, a real estate agent may have separate search boxes for users to specify property type and location:

To combine these two variables into a single search term requires an advanced filter. Below is the example, assuming the names of the form fields above are 'area' and 'type' respectively.

Then, within the Page Query Terms report, you will see a 'combinedQuery' entry - a report of how often each combined query was used.
Note, that implementing this filter will overwrite your entire query for any request that contains the variables 'area' or 'type' in the query stem. If there is other information in your query terms that you might want to save for later analysis, you cannot accomplish this while simultaneously combining your keywords. Also, ensure that there are no other pages that use the variables 'area' or 'type', because that page's entire query will be overwritten by this filter.
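The overwrite behaviour described in the note is easy to see in a small model. The Python below is an illustration only, not Urchin filter syntax; the function name combined_query, the '+' used to join the two values and the sample query stems are assumptions.

```python
from urllib.parse import parse_qs

def combined_query(query_stem):
    """Replace the whole query with combinedQuery=<area>+<type> when either is present."""
    params = parse_qs(query_stem)
    if "area" not in params and "type" not in params:
        return query_stem  # no search variables: leave the query untouched
    area = params.get("area", [""])[0]
    kind = params.get("type", [""])[0]
    return f"combinedQuery={area}+{kind}"

# Made-up query stems: the sessionid variable is lost in the first case.
print(combined_query("area=leeds&type=flat&sessionid=abc"))
print(combined_query("page=2"))
```

The first result shows why any other variable in the same query cannot be kept for later analysis: the entire query is replaced by the combined term.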
9. Recommended logformat for Apache
Ensure UTM is installed on all pages and use the full NCSA logformat in httpd.conf:
LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\"" combined
Note the directive uses double quotes throughout (the inner ones escaped with a backslash), so \"%{Cookie}i\" is followed by a second, closing double quote.
