Building a Xapian index of Wikipedia in less time than this talk takes
Time: 13:30 - 14:15
Day: Thursday 21 January 2010
Location: Renouf 1 (MFC)
Project: The Xapian Search Engine Library
Xapian is a fast, flexible, and scalable search engine library, currently used by Debian, Gmane, One Laptop per Child, and many other projects. It's written in C++ with bindings for C#, Java, Perl, PHP, Python, Ruby, and Tcl. Amongst its many features are Unicode support, spelling correction, probabilistic ranking, and a full set of boolean operators.
Wikipedia is a free encyclopedia with about three million English articles.
In this presentation, you'll see how to use a number of Xapian's many features, plus a few tricks of the trade and some low cunning, to index more than 1,000 articles per second and so build a working free-text search for the English version of Wikipedia in just 45 minutes.
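As a rough sanity check on those figures, the short Python sketch below works out the average indexing rate needed to get through the whole corpus inside the talk slot. It assumes exactly 3,000,000 articles (the abstract only says "about three million") and the full 45 minutes; the function name `required_rate` is purely illustrative and is not part of Xapian.

```python
# Back-of-the-envelope check of the indexing rate the abstract implies.
# Assumptions: a round 3,000,000 articles and the full 45-minute slot.
ARTICLES = 3_000_000
TALK_MINUTES = 45


def required_rate(articles: int, minutes: float) -> float:
    """Average articles per second needed to index `articles` in `minutes`."""
    return articles / (minutes * 60)


rate = required_rate(ARTICLES, TALK_MINUTES)
print(f"{rate:.0f} articles/second")  # prints "1111 articles/second"
```

Under these assumptions the required average is a little over 1,100 articles per second, which is consistent with the "more than 1,000 articles per second" claim.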
Olly is the lead developer of the Xapian search engine library. He's spent 13 years working in the field of information retrieval, including running the EuroFerret website, which was the most comprehensive index of European web pages in its day.
He's been working on Xapian for 10 years, and makes a living as a freelance developer and consultant on Xapian-related projects.
Olly is originally from the UK where he studied mathematics and computer science at Cambridge University, but now lives near Wellington, New Zealand. He once broke a toe falling off a cliff in Majorca.