Refer Madness: An Early History of Spamdexing

By James Lick

Sometime on or before September 5, 1995, I discovered what was later to be called Spamdexing. I am writing this to document some of the early history of Spamdexing. I believe I was either the first one to discover or the first one to implement Spamdexing. I cannot prove so conclusively, but have provided log files and third party historical documentation to support the claim. (This is an expanded and updated version of an earlier account of my discovery in an old version of my home page.)

Once upon a time I was looking through the logs for my home page and I noticed that some people were getting to my page due to searches in the various web search engines. A lot of them had to do with certain sexual subjects, mostly because of my last name and some other amusing false hits or double meanings.

Unsatisfied with the limitations of static HTML pages, I had come up with a cgi-bin style filter called 'pagefilt' which would interpret the server side file and send back a customized version to each client. Initially this was done so that I could do in-page visit counting (or page views). Later more features were added, such as being able to send different HTML code depending on whether Mosaic, Netscape 1.0, or Netscape 1.1 was being used, because their feature sets were different. (No, those version numbers are not typos.) You can think of pagefilt as a very crude precursor to PHP.

On July 31, 1995 I added per-page logging in addition to per-page visit counting. By this time my home page had received a whopping 679 visits since visit counting started, and was usually getting around 30 visits a day. One of the variables that the CGI interface supplies, HTTP_REFERER, holds the URL of the page the visitor came from to reach your page. At the time, the major web servers did not otherwise make this field visible and it did not show in the server logs.
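As a rough modern illustration of that kind of per-page logging, here is a minimal sketch in Python. It is not pagefilt: the function name `format_log_line` and the three-field layout are assumptions made up for this example. The only real name in it is `HTTP_REFERER`, the standard CGI environment variable that carries the referring URL.

```python
import os
import time


def format_log_line(visit_number, environ, now=None):
    """Build one tab-delimited log line: visit number, server date, referrer.

    The field layout is an assumption for illustration only.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    # A CGI server passes request headers to the script as environment
    # variables; HTTP_REFERER is missing when no Referer header was sent.
    referrer = environ.get("HTTP_REFERER", "-")
    return f"{visit_number}\t{stamp}\t{referrer}"


if __name__ == "__main__":
    # Under a real CGI server, os.environ would carry HTTP_REFERER.
    print(format_log_line(680, os.environ))
```

A script like this would append the returned line to a per-page log file on every request.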

If the user came to your page from a search engine, the referral URL would also include the search string the visitor used to look up your site. This URL was a bit difficult to read but with a bit of work with sed it was easy enough to pull out most of the search strings. This is when I discovered that some people were getting to my page because it just happened to have the search terms they were looking for, but in different parts of the page and in different contexts.
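The sed commands themselves are not reproduced here. As a hedged stand-in, this Python sketch pulls the search string out of a referral URL. The parameter names in `QUERY_PARAMS` and the example URL are assumptions; each search engine used its own parameter name, so a real list would have to be built by looking at the logs.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical query-string parameter names that might hold the search text.
QUERY_PARAMS = ("q", "query", "searchText", "p")


def search_string(referrer):
    """Return the decoded search string found in a referral URL, or None."""
    params = parse_qs(urlsplit(referrer).query)
    for name in QUERY_PARAMS:
        if name in params:
            # parse_qs has already decoded %XX escapes and turned '+' into a space.
            return params[name][0]
    return None
```

For example, `search_string("http://search.example.com/find?q=james+lick&max=25")` returns `"james lick"`, while a referral URL with no recognized parameter gives `None`.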

Once I noticed this, I decided it might be fun to try to attract even more users. Thus the list of naughty words was born. It started out with the words already used to get to my home page, plus some obvious ones. Some friends and I sat down one evening and started perusing reference materials, searching out and thinking up some good naughty words to put in to attract people. Our new naughty words list grew to about 700 words this way.

This tactic proved to be overwhelmingly successful. On September 5, 1995 at 10:28pm Pacific, the WebCrawler bot came to visit my new page as visit number 1,609, at a time when I was typically getting around 30 visits per day. On September 9, 1995 my home page topped 120 visits, which was a record at the time. The following day the floodgates opened and I suddenly had 2,920 visits in one day. After that the page was regularly getting 3,000-5,000 visits each day.

At this point I did another clever/stupid thing by taking the new query strings and adding them to the original list of naughty words. This was done by hand initially, and later I set up an automatic process to do it. After a while this grew to an unmanageable 4.8 megabytes of search terms so I scaled things back and replaced the original naughty words list with two lists, one with any complete search query which had been used at least 200 times and the other with individual words which had been used at least 50 times. This effectively set up a feedback loop.
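The automatic process is not described in detail, so the following is only a sketch of how the two thresholded lists could be built from logged search queries. The function name `build_lists` and the case and whitespace normalization are assumptions; the default thresholds of 200 and 50 come from the description above.

```python
from collections import Counter


def build_lists(queries, min_query_count=200, min_word_count=50):
    """Return (popular_queries, popular_words), each sorted alphabetically."""
    # Assumption: fold case and collapse whitespace so that trivially
    # different spellings of the same query are counted together.
    normalized = [" ".join(q.lower().split()) for q in queries]
    query_counts = Counter(q for q in normalized if q)
    word_counts = Counter(w for q in normalized for w in q.split())
    popular_queries = sorted(
        q for q, n in query_counts.items() if n >= min_query_count
    )
    popular_words = sorted(
        w for w, n in word_counts.items() if n >= min_word_count
    )
    return popular_queries, popular_words
```

Publishing both lists back on the page is what closes the loop: terms that already bring in visitors are indexed again and so bring in more visitors searching for the same terms.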

In October 1995 I moved my page to the pleasure.com domain. This is a domain I registered long ago and I figured that if the InterNIC was going to start charging me $50 a year to keep the domain name, I might as well start using it (it was quite a few years before $10/year registrations became possible). And besides, it seemed all the more appropriate taking into account the type of traffic this page was attracting.

The WebCrawler bot again returned to my web page on November 11, 1995 at 7:03pm Pacific and at this point it clocked in as visit number 251,625. My home page was getting around 4,000-5,000 visits per day with a peak of 5,542 visits in one day. By November 15, 1995 the visit count again started rising. Soon I was regularly seeing 25,000 visits per day and the highest visit count for 1995 was 28,228 visits in one day.

Interesting to me are the misspellings of the various forms of masturbation, masturbate, and masturbating. They are spelled correctly only about 6,200 times, and incorrectly over 10,000 times! Also interesting to note is who was popular on the web at the time, such as Alyssa Milano, who was the most common person in the search strings.

All this was giving one of my friends a lot of ideas, so he got me to let him put some more appropriate material up under the pleasure.com domain. He put up an Asian Babe of the Month page similar to the classic page, and got a sponsor for it selling picture CD-ROMs. Things went quite well, although the amount of traffic it handled was stunning by 1995 standards: the server literally destroyed two hard disks and overran both the available bandwidth and the available CPU power we were originally using. At the time I believe the server was a SPARCstation 2 and our Internet link was a 128k ISDN line. The server was later upgraded to a 120 MHz Pentium running on a dedicated T1 high-speed link to the Internet. (This all seems very quaint by 2007 standards.)

Because of the combined load of my home page and the newer material on pleasure.com, it was all too much for just one server and network link, so I moved my home page back to www.tcp.com at the end of November 1995. Later, in March 1996, I came up with a new domain name suited to my home page: drivel.com. It remained there for at least a decade before I started the transition to using jameslick.com. The name drivel.com came about because one of my visitors wrote me e-mail saying my page was "amusing drivel".

Someone else came up with an Asian Babe of the Week page. He wrote to me on February 4, 1996 asking for a link from my home page. With this kind of content I was only too happy to oblige. Given the tremendous load handled by the original Asian Babe of the Month page, I could only wonder how long his page could last before his web server collapsed under the load.

He did find out quite quickly that he was charged based on the amount of traffic generated, and the measly 2,500 visitors I sent his way in just two days would end up costing him about $10. He realized there was no way to afford even this amount of traffic over a whole month. He my pages were attracting by this point. Anyways, this was yet more

Eventually such things came to an end. Search engines at first banned pages using such techniques, and later got smarter by using more sophisticated algorithms to give users what they are looking for. This, combined with more competitors arriving on the scene, meant that the effect of such techniques diminished over time. In fact many web sites simply copied my search strings onto their own sites, and remnants of this exist to this day. (Try googling the unlikely search query '"james lick" masterbate' for examples.)

It was nonetheless a fun and interesting diversion to try on the web, and my original home page managed to get 6,019,803 visits before it was replaced. Others have over time refined these techniques into what is now called Search Engine Optimization (SEO). Legitimate practitioners of SEO work to reword the content of web sites so that it will be more easily found in search engines.

the first uuencoded picture to the USENET group alt.sex.bondage, which resulted in the creation of the group alt.sex.pictures, which was later renamed to alt.binaries.pictures.erotica and was possibly the genesis of the file sharing craze.

I used the term 'Refer Madness' to describe this technique because the term Spamdexing had not yet come about. It was a play on the fact that I got my query strings from the referral string, and on the classic anti-marijuana film 'Reefer Madness'.

As I stated at the beginning, I believe I was the first to discover or implement Spamdexing, and I provide here the evidence to support the claim. If nothing else, this is the earliest documented case of Spamdexing. If you have any information on earlier attempts, or better documentation of my own implementation, I would be happy to add them here.

My first piece of evidence is the earlier account of what I did. The Wayback Machine was a bit unreliable in updating in the early days, but it does clearly show an April 28, 1997 version of my home page which includes that account.

My second piece of evidence is the existence of other sites using my old search terms verbatim. You can Google the unlikely search query '"james lick" masterbate' for examples. These lists contain peculiar terms from my own list, which shows that early Spamdexers copied my efforts, and that such copying was widespread enough that hundreds of examples remain over a decade later.

You can also see a USENET post by John Cramer from May 1, 1996 referencing the fact that searching on naughty words would bring up my home page. There is a later post by Decklin Foster from September 24, 1996 which points to my home page as a reference for explanations of the HTTP_REFERER variable, which I used to get search strings. Then there is a post from TheChicagoKid from October 8, 1996 which includes a list of naughty words that is clearly copied from my own list. After that time there are numerous USENET posts which use my naughty words list.

Meanwhile, the earliest USENET message I can find that explicitly references a Spamdexing technique as opposed to normal keyword indexing was posted on October 15, 1995, over a month after I claim to have first used the technique. The post by Alexander Medwedew on USENET was in an August 12, 1996 post by Kuo-Sheng Chang.

Next I provide my pagefilt logs for my home page from their inception on July 31, 1995 through the end of 1995. This is a custom tab-delimited log file format whose fields begin, in order, with visit number and server date. You will have to take my word that they are genuine; however, I think it would be hard to fake this accurately. (All files are bzip compressed text files.)

July 1995 (1K)

August 1995 (18K)

September 1995 (1.5M)

October 1995 (1.7M)

November 1995 (3.1M)

December 1995 (12M)

I also provide a summary of the daily visits for 1995 showing the dramatic increases in early September and late November: daily-visits.txt

This snippet shows the only two entries of when the WebCrawler bot visited my home page, documenting my claim that the naughty word list was added at least by September 5, 1995: webcrawler.txt

And finally, here for your reference is the final version of the search terms and search words lists that appeared on my home page. This compilation covers visits #679 through #2,431,055, after which I stopped compiling new versions:

search-strings.txt (9.7K) All complete search queries used at least 200 times.

search-words.txt (14K) All individual words used at least 50 times.

Author: James Lick / E-mail: james.lick@gmail.com