How Burf.com works: part 1
The new search engine (beta) currently uses the MS Indexing Server (the previous one used a big database I wrote, but that did not work too well). The indexing service acts as a web service (.NET) for my main program (the URL recorder).
The main program has a database of URLs, which it receives in two ways: either submitted or found by crawling other pages. These URLs are crawled (to find more URLs), and if a page is usable it is stored in the indexing service (I know I will probably hit file system problems soon).
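To make the URL recorder idea concrete, here is a rough sketch in Python (not the .NET the real program is written in). It assumes an in-memory store in place of the real database, and the class name, method names and the "submitted"/"crawled" labels are all made up for illustration.

```python
from collections import deque


class UrlRecorder:
    """Hypothetical in-memory stand-in for the URL database."""

    def __init__(self):
        self._seen = {}          # url -> how it arrived ("submitted" or "crawled")
        self._pending = deque()  # URLs still waiting to be crawled

    def add(self, url, source):
        """Record a URL once; return False if it is empty or already known."""
        url = url.strip()
        if not url or url in self._seen:
            return False
        self._seen[url] = source
        self._pending.append(url)
        return True

    def next_batch(self, size):
        """Hand out up to `size` pending URLs, oldest first."""
        batch = []
        while self._pending and len(batch) < size:
            batch.append(self._pending.popleft())
        return batch
```

Both intake paths go through the same add() call, so a URL that is submitted and later found by crawling is only queued once.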
My spider is written in .NET and is basically a small multithreaded program that takes a list of X URLs and returns the page data, from which the main program then extracts the URLs.
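The split described above (spider fetches pages in parallel, main program pulls the links out) could look roughly like this. It is a Python sketch using only the standard library, not the actual .NET spider. The function names, the worker count and the 10 second timeout are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Download one page; return None on any failure so a bad URL does not stop the batch."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except Exception:
        return None


def crawl_batch(urls, fetcher=fetch, workers=8):
    """Spider side: fetch a batch of URLs in parallel and return {url: page data or None}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(pool.map(fetcher, urls))
    return dict(zip(urls, pages))


def extract_urls(base_url, html):
    """Main-program side: return the absolute http(s) links found in the page data."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        if absolute.startswith(("http://", "https://")) and absolute not in links:
            links.append(absolute)
    return links
```

The fetcher is passed in as a parameter so the batch logic can be tried out without touching the network, for example by handing it a dictionary lookup instead of fetch.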
Because it's a new engine, it only has about 100,000 pages indexed, but I hope to hit 1 million by the end of the week.
It currently sits (due for a change) on an AMD 64 3000 with 1 GB of RAM and 160 GB SATA RAID drives (striped).
Last night I received over 60,000 submitted URLs and not many searches (sad). If this new engine seems popular then I hope to get a faster connection and machine soon.