Greg Linden points to a new paper out of Yahoo Research, talking about the challenges ahead for distributed web search [PDF]. If you are interested in the various pieces which need to scale as you build up any sort of search infrastructure, read the paper.
A couple of bits that caught my eye were the mention of P2P search and of doing more work on the client side. I'm curious how P2P could help with distributing data when every machine acts as both client and server. I would think you wouldn't use outside, untrusted machines for this, and would instead create an internal, trusted P2P network. Could this help? It would be fun to find out.
On the client side, what about using clients as caches? I realize a client cache would drift out of sync with the main index, but I would think it could still be useful in some cases, such as repeated queries where slightly stale results are acceptable.
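To make the client cache idea a bit more concrete, here is a minimal sketch of what I have in mind. Nothing here comes from the paper: the `ClientResultCache` name, the `fetch` callback standing in for a call to the real search backend, and the five minute default lifetime are all my own assumptions. The point is only that a time limit on each entry bounds how far the cache can drift from the main index.

```python
import time
from typing import Callable, Dict, List, Tuple


class ClientResultCache:
    """Hypothetical client-side cache of search results with a time-to-live."""

    def __init__(
        self,
        fetch: Callable[[str], List[str]],  # assumed stand-in for the real search call
        ttl_seconds: float = 300.0,         # assumed staleness budget
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # normalized query -> (time stored, results)
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def search(self, query: str) -> List[str]:
        # Normalize case and whitespace so trivially different queries share an entry.
        key = " ".join(query.lower().split())
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            # Fresh enough: answer locally without touching the main index.
            return entry[1]
        # Missing or expired: ask the backend and remember the answer.
        results = self._fetch(key)
        self._entries[key] = (now, results)
        return results
```

A shorter lifetime keeps results closer to the main index at the cost of more round trips, so the right value would depend on how quickly the index changes for a given kind of query.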
At any rate, scaling search is going to become more and more necessary over the next few years as search moves into additional aspects of our lives. Companies that can scale are going to be out in front of those that can't.
As an aside, one way I like to gauge research papers is by the other papers they cite. If you were to read each paper in this one's bibliography, you would get a foundational introduction to all aspects of search as well as distributed systems.