The other day I turned on referer logging on my Apache instance. Almost immediately, I noticed Google searches that had led to my site. I thought to myself, "Wouldn't it be cool if I could display a sidebar box on my site containing the last n Googles that had led to my site?" "Yes," myself replied.
gemcast, the weblogging software I wrote for this site, has a feature that looks for '.box' files in its content directories. When it finds a '.box' file, it creates a sidebar box for the page it's building. Simple. So, about a half-hour later I had a shell script that generates a list of the last ten Google searches, along with a title and a timestamp so the output works as a .box entry. Then I created a cron job to run this script every 15 minutes and write its output to a '.box' file in my root gemcast content directory, so it would display on the "top" page.
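A crontab entry along those lines might look like the sketch below. The script path /usr/local/bin/recent-googles.sh, the content directory /var/gemcast/content, and the file name googles.box are all hypothetical. The minutes are spelled out as a list because the "*/15" step syntax isn't supported by every cron, and the output goes to a temporary name that doesn't end in .box first, then gets renamed with mv, so gemcast never picks up a half-written box.

```
# Hypothetical paths: run every 15 minutes, write to a temp file, then rename.
0,15,30,45 * * * * /usr/local/bin/recent-googles.sh > /var/gemcast/content/.googles.tmp && mv /var/gemcast/content/.googles.tmp /var/gemcast/content/googles.box
```

Because the temporary file sits in the same directory as the final one, the mv is a rename within one filesystem, so the box file is replaced in a single step.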
Here is the script:
1 echo "10 Recent Googles Leading to samoht.com"
2 grep '\.google.*search' /var/apache/logs/referer_log |
3 awk '{print $2}' |
4 sed 's/http.*q=/<li>/;s/%22/"/g;s/&.*$//' |
5 uniq |
6 tail -10
7 echo "Generated on `date`"
For those of you who aren't Unix script hackers, a line-by-line explanation is in order:
- Print the title of the box.
- Search the Apache referer log for lines containing Google search URLs.
- Use 'awk' to extract the 2nd field, which is the URL. Given that I'm just using awk to extract a field, I could have used 'cut' as well, but I'm much more familiar with awk's syntax.
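- Use 'sed' to replace the first part of the URL, up to and including the 'q=' query parameter, with HTML list item markup, replace each '%22' with a '"' character, and strip everything after the query parameter. The result is the still URL-encoded query, prefixed with an '<li>' HTML element, like so: "<li>this+is+the+query".
- Strip out duplicates. Note that 'uniq' only collapses repeated lines that are adjacent, so a query that recurs with other searches in between will show up more than once.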
- Only show the 10 most recent queries.
- Print the time that the script was run.
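To make steps 3 and 4 concrete, here is a small sketch that runs them on a single made-up log line. The line itself is hypothetical; all it assumes is what the script assumes, namely that the second whitespace-separated field is the referring Google URL. It also shows that 'cut' gives the same field here, with the caveat that cut splits on every single delimiter while awk treats a run of blanks as one separator.

```shell
#!/bin/sh
# Hypothetical sample line: field 2 is the referring Google search URL.
line='host.example.com http://www.google.com/search?hl=en&q=%22referer+logging%22&btnG=Google+Search'

# Step 3: extract field 2 with awk ...
url_awk=$(printf '%s\n' "$line" | awk '{print $2}')
# ... or with cut, which matches here only because the fields are
# separated by exactly one space (cut does not collapse runs of blanks).
url_cut=$(printf '%s\n' "$line" | cut -d' ' -f2)

# Step 4: the same three sed substitutions as the script
# ('&' needs no backslash in a sed pattern).
item=$(printf '%s\n' "$url_awk" | sed 's/http.*q=/<li>/;s/%22/"/g;s/&.*$//')

echo "$url_awk"
echo "$url_cut"
echo "$item"    # prints <li>"referer+logging"
```

The '+' characters are left alone, which is why the box shows queries like "this+is+the+query" rather than decoded text.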
Note that lines 2-6 constitute a pipeline: each command's output is fed into the next command as input. In reality I could have done the whole job in awk. However, that would have required me to write a much more sophisticated awk script. I'd rather string together Unix commands that each do a single job (or a few jobs) well. To me, that makes the script more obvious, and since I know the Unix commands pretty well, I was definitely done more quickly than if I'd had to hack out and debug an awk script. Also, if I were going to write an awk script that complicated, I might as well use Ruby.
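For comparison, a single-awk version of lines 2-6 might look like the sketch below. It is a hypothetical illustration of the trade-off, not the script that runs on the site; the function name recent_googles is made up, and it assumes a POSIX awk (one with sub and gsub). Even this small version has to emulate uniq by remembering the previous item and emulate tail with a ten-slot ring buffer.

```shell
#!/bin/sh
# recent_googles LOGFILE
# Hypothetical one-pass awk equivalent of the grep | awk | sed | uniq | tail
# pipeline: prints up to 10 <li> lines, oldest first.
recent_googles() {
    awk '
    /\.google.*search/ {                 # like grep
        url = $2                         # like awk {print $2}
        sub(/http.*q=/, "<li>", url)     # like sed s/http.*q=/<li>/
        gsub(/%22/, "\"", url)           # like sed s/%22/"/g
        sub(/&.*$/, "", url)             # like sed s/&.*$//
        if (n > 0 && url == last) next   # like uniq: skip adjacent repeats
        last = url
        buf[n % 10] = url                # ring buffer holds the newest 10
        n++
    }
    END {
        start = (n > 10) ? n - 10 : 0    # like tail -10
        for (i = start; i < n; i++)
            print buf[i % 10]
    }' "$1"
}
```

Called as `recent_googles /var/apache/logs/referer_log` between the two echo lines, it should produce the same list items as the pipeline, but each behavior that a single Unix command provides for free now has to be written and debugged by hand.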
Posted: Fri Apr 25 05:29:01 -0700 2003