An old live search post of mine drew a comment the other day about the feasibility of searching all of your records on each keystroke the user makes. This is one area of the post that I admittedly gave little (or no) attention to, partly because it was out of the scope of the post (the usual excuse), and partly because I didn't have any good answers to the problem (the real excuse).
The first thing that needs to be decided is whether there really is a problem. I say this in both the performance sense (is your live search noticeably slow?) and the usability sense (does it make sense for your 'live' search to wait until you're done typing before it submits?). I personally feel there should be only a very short delay between the time the user starts typing and the time they see some results; otherwise, we might as well stick a submit button on there and be done with it. However, there is a point to be made that each keystroke could fire off a rather expensive HTTP request, which in turn will search through a potentially large number of records and return a potentially large response to the user.
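To make that trade-off concrete, here is a minimal sketch of the keystroke delay in TypeScript: the input is debounced so the request only fires once the user pauses typing. The element ids, the /search endpoint, and the 300 ms value are placeholders of my own, not anything from my original live search code.

```typescript
// Minimal debounced live search sketch (browser). All names are illustrative.
const input = document.getElementById('search') as HTMLInputElement;
const results = document.getElementById('results') as HTMLElement;

let timer: number | undefined;
const DELAY_MS = 300; // the "very short delay" discussed above

input.addEventListener('input', () => {
  // Restart the timer on every keystroke, so the request only fires
  // once the user has paused typing for DELAY_MS milliseconds.
  window.clearTimeout(timer);
  timer = window.setTimeout(async () => {
    const q = encodeURIComponent(input.value);
    const response = await fetch(`/search?q=${q}`);
    results.innerHTML = await response.text();
  }, DELAY_MS);
});
```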
I think it's safe to say that we have a problem here. We want results, we don't want to wait for them, and we want to include a lot of records in the search. I'm sorry to say it, but something has to give. On this puny little blog, it's no problem for my live search to perform an SQL LIKE search of all my blog posts and render the results on each keystroke with a very short delay. For some people (and possibly for me in the future), this will not be possible.
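For reference, the server side of that kind of search is nothing fancy; something along these lines would do, though the table, column names, and the `db` interface here are purely illustrative, not my actual code:

```typescript
// Sketch of a brute-force LIKE search over every post.
// 'Db' stands in for whatever database client you actually use.
interface Db {
  query(sql: string, params: string[]): Promise<{ title: string; body: string }[]>;
}

async function searchPosts(db: Db, term: string) {
  // Parameterized LIKE search over all records, i.e. the
  // "search everything on each keystroke" case the commenter asked about.
  return db.query(
    'SELECT title, body FROM posts WHERE title LIKE ? OR body LIKE ?',
    [`%${term}%`, `%${term}%`]
  );
}
```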
There is actually an interesting phenomenon going on here. If the response to one keystroke is not finished being prepared before the next request comes in, requests pile up and we place an extreme load on the server. By increasing the delay, we might see no performance change from the user's perspective while considerably reducing the strain on the server. It seems there would be some kind of 'sweet spot' for the value of this delay. Calculating that value may be harder than recognizing that it exists, but I think the following would need to be considered:
- Average number of simultaneous requests on your web server
- The number of total records to be searched
- The average length of each record
- The performance of the search algorithm (which may be hard to quantify for our purposes here)
- Network traffic on the internet
Any number that could be calculated would certainly be based on heuristics, since things like network traffic on the internet are extremely hard to model.
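For what it's worth, here is one crude heuristic of that kind expressed as code: let the measured response time of the previous request set a floor for the delay, so a new search is never fired while the last one is still being prepared. The numbers and the /search endpoint are illustrative only, not a recommendation.

```typescript
// Adapt the debounce delay to the observed round-trip time of the last request.
const MIN_DELAY_MS = 200;   // floor so fast servers still get a short delay
const MAX_DELAY_MS = 1500;  // ceiling so the search still feels 'live'

let adaptiveDelay = MIN_DELAY_MS;

async function timedSearch(query: string): Promise<string> {
  const start = performance.now();
  const response = await fetch(`/search?q=${encodeURIComponent(query)}`);
  const body = await response.text();

  // Nudge the delay toward the measured response time, clamped to a
  // range that keeps the interface responsive.
  const elapsed = performance.now() - start;
  adaptiveDelay = Math.min(MAX_DELAY_MS, Math.max(MIN_DELAY_MS, elapsed));

  return body;
}
```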
After all this babbling, I have to wonder if any of this is even worth the effort. If your live search is too slow, you probably need to rethink what your search is doing. Maybe you need to index your entries with search keywords. Maybe you just have too many entries for a live search. I admit I don't have a lot of motivation here, since at the moment my blog consists of only 40 entries. However, if the concept of 'live search' really catches on, then I believe research on this topic will be inevitable as clients like The New York Times want to add a live search to their home page (available only to the registered users of their site, of course).