Welcome to Serpia's blog!

Simple search engine in Python

A simple search engine in Python

I think a search engine for a site's content is a great improvement to any website. Instead of browsing through every page to find what you're looking for, you just type a keyword and let a script do the work. Anyway, I wrote a simple search engine, and I'll explain here how I did it. It is very basic, but easy to improve. For example, instead of indexing the local files of this site, you could use the urllib package to index any other site on the internet!
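
To index a remote site rather than local files, the indexer needs the page's HTML as a string. Below is a minimal sketch using urllib.request from the standard library; the helper name fetch_page and the UTF-8 fallback are assumptions for illustration, not part of the original scripts.

```python
# A minimal sketch: fetching page content with urllib instead of reading local files.
# fetch_page is a hypothetical helper name; the post only mentions the urllib package.
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Download a page and return its content as text."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8 (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string can be passed to the indexer exactly as if it had been read from a local file. Be polite when crawling other people's sites: respect robots.txt and don't hammer the server.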

The basic things you'll need to do

  • setting up a database table with all words and their number of occurrences (I'll use a MySQL database for this example)
  • a script that finds words and their occurrences and inserts them into the database
  • an HTML search form for the user
  • a script that looks up the user's search word in that database
  • a page that displays the results with links to the relevant pages

Setting up the database

Create a table in your MySQL database and name it search. We'll need five columns:
  • search_id, INTEGER, NOT NULL, AUTO_INCREMENT, UNSIGNED
  • word, VARCHAR(50), NOT NULL
  • occurrence, INTEGER, NOT NULL, UNSIGNED
  • url, VARCHAR(200), NOT NULL
  • link, VARCHAR(200), NOT NULL

and set search_id as the PRIMARY KEY.
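
For reference, here is a sketch of that table definition. MYSQL_DDL spells out the columns above as a MySQL CREATE TABLE statement but is not executed; to keep the example self-contained and runnable, it swaps in Python's built-in sqlite3 module, whose types differ slightly (there is no UNSIGNED, and INTEGER PRIMARY KEY AUTOINCREMENT plays the role of AUTO_INCREMENT). The function name create_search_table is hypothetical.

```python
# A sketch of the "search" table described above.
# MYSQL_DDL mirrors the column list for MySQL; it is not executed here.
# sqlite3 is a stand-in so the example runs without a MySQL server.
import sqlite3

MYSQL_DDL = """
CREATE TABLE search (
    search_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
    word       VARCHAR(50)  NOT NULL,
    occurrence INT UNSIGNED NOT NULL,
    url        VARCHAR(200) NOT NULL,
    link       VARCHAR(200) NOT NULL,
    PRIMARY KEY (search_id)
)
"""

# SQLite equivalent: a CHECK constraint approximates UNSIGNED.
SQLITE_DDL = """
CREATE TABLE IF NOT EXISTS search (
    search_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    word       VARCHAR(50)  NOT NULL,
    occurrence INTEGER      NOT NULL CHECK (occurrence >= 0),
    url        VARCHAR(200) NOT NULL,
    link       VARCHAR(200) NOT NULL
)
"""


def create_search_table(conn):
    """Create the search table in an open sqlite3 connection."""
    conn.execute(SQLITE_DDL)
    conn.commit()
```
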
If you feel uncomfortable setting up a MySQL database, fear not, there are lots of resources on the internet about this subject. Also check the tutorial on this website here. Be sure to put this database table on your webserver and pay attention to the security issues. More about this later...

The Python indexer

The following script crawls through the content of a page and counts every word and how often it occurs. It is based on code I found here. If you are the one who wrote this code and think that I should not publish it here, please contact me.
The Python code is:

After you have run this script (after replacing yourpage with your actual pages, of course), the table you created earlier is populated with every word on your pages and its number of occurrences. You only need to rerun the script when the content of your website has changed: probably once or twice a day, but it really depends on your situation.
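
As an illustration of the indexing step, here is a minimal sketch. It is not the script this post is based on. Assumptions: pages arrive as HTML strings, tags are stripped with a simple regex, words are lowercased ASCII, and the url column holds the page address while link holds the text shown for the link (the post does not say what each column contains). sqlite3 stands in for MySQL again; with a MySQL driver such as MySQLdb you would write %s instead of ? placeholders.

```python
# A minimal indexer sketch (assumed design, not the original script).
import re
import sqlite3
from collections import Counter
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")
# ASCII words, optionally with apostrophes inside (e.g. "don't").
WORD_RE = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def count_words(html):
    """Strip HTML tags and entities, then count how often each word occurs."""
    text = unescape(TAG_RE.sub(" ", html)).lower()
    # Skip words longer than the VARCHAR(50) word column allows.
    return Counter(w for w in WORD_RE.findall(text) if len(w) <= 50)


def index_page(conn, url, link, html):
    """Replace the stored word counts for one page and return them."""
    counts = count_words(html)
    # Delete old rows first so re-indexing a changed page does not duplicate them.
    conn.execute("DELETE FROM search WHERE url = ?", (url,))
    conn.executemany(
        "INSERT INTO search (word, occurrence, url, link) VALUES (?, ?, ?, ?)",
        [(word, n, url, link) for word, n in counts.items()],
    )
    conn.commit()
    return counts
```

Deleting a page's rows before inserting makes it safe to rerun the indexer whenever your content changes.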

The html search form

A search box is often placed at the top of a page or in a sidebar on the left or right. Put this HTML code somewhere on your page:

For more information on how to create HTML forms, look here. The key element of this form is the cgi-bin/search.py script; the next section delves into this script.
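
A sketch of such a form, generated from Python so it can be reused in page templates. The script path cgi-bin/search.py and the field name "word" come from this post; the leading slash (assuming cgi-bin sits at the site root), the GET method, and the button label are assumptions. The function name search_form is hypothetical.

```python
# A sketch of the search form, rendered from Python.
# Field name "word" and cgi-bin/search.py are from the post; the rest is assumed.
from html import escape


def search_form(action="/cgi-bin/search.py", value=""):
    """Return the HTML for a one-field search form."""
    return (
        f'<form action="{escape(action)}" method="get">\n'
        # maxlength matches the VARCHAR(50) word column.
        f'  <input type="text" name="word" value="{escape(value)}" maxlength="50">\n'
        '  <input type="submit" value="Search">\n'
        "</form>"
    )
```

Passing the previous search as value lets the results page show the form again with the word filled in; escaping it prevents a crafted search term from injecting HTML.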

The actual search script

The next script, conveniently called search.py, takes the input named "word" from the form and looks it up in the database:
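
Here is a minimal sketch of what search.py could do, under stated assumptions: the form submits via GET, sqlite3 stands in for MySQL, the database file name search.db is hypothetical, and results are sorted by occurrence. Python's cgi module is deprecated and was removed in Python 3.13, so the sketch reads the CGI QUERY_STRING environment variable with urllib.parse.parse_qs instead. The parameterized query addresses the security issues mentioned earlier: the user's input never becomes part of the SQL text.

```python
# A sketch of search.py, not the original script.
import os
import sqlite3
from urllib.parse import parse_qs


def open_db(path="search.db"):
    """Open the database; "search.db" is a hypothetical file name."""
    return sqlite3.connect(path)


def get_search_word(query_string=None):
    """Read the "word" field from the CGI query string."""
    if query_string is None:
        query_string = os.environ.get("QUERY_STRING", "")
    values = parse_qs(query_string).get("word", [""])
    # The indexer stores lowercase words, so normalize the same way.
    return values[0].strip().lower()


def search(conn, word):
    """Return (url, link, occurrence) rows for a word, most frequent first."""
    # A parameterized query keeps user input out of the SQL (no SQL injection).
    cur = conn.execute(
        "SELECT url, link, occurrence FROM search "
        "WHERE word = ? ORDER BY occurrence DESC",
        (word,),
    )
    return cur.fetchall()
```
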

The result page

Finally, we display the results on a page that shows the searched word, its number of occurrences, and a link to each matching page.
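
A sketch of the result page. It expects rows shaped like (url, link, occurrence), as a query over the search table would return them; treating link as the visible link text is an assumption, and render_results is a hypothetical name. Everything is HTML-escaped because the search word comes straight from the user.

```python
# A sketch of the result page for rows shaped like (url, link, occurrence).
from html import escape


def render_results(word, rows):
    """Build an HTML fragment listing pages that contain the word."""
    lines = [f"<h2>Results for &quot;{escape(word)}&quot;</h2>"]
    if not rows:
        lines.append("<p>No pages found.</p>")
        return "\n".join(lines)
    lines.append("<ul>")
    for url, link, occurrence in rows:
        plural = "s" if occurrence != 1 else ""
        lines.append(
            f'  <li><a href="{escape(url)}">{escape(link)}</a> '
            f"({occurrence} occurrence{plural})</li>"
        )
    lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    # A CGI script must print a header and a blank line before the body.
    print("Content-Type: text/html; charset=utf-8")
    print()
    print(render_results("python", [("/b.html", "Page B", 5)]))
```
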

For comments on this tutorial, please use the comment textbox below. Feedback is very welcome!

Posted on December 21, 2007

Comments

#1   peter, August 25, 2008 at 5:52 p.m.:

You could also use MySQL fulltext search.
