Friday, May 4, 2007

Indexing with Xapian and Omega

Continuing my quest to index my file system and build a web-based application to search it, I came across Xapian. Xapian is an open source search engine library, released under the GPL. It is a highly adaptable toolkit that lets developers add advanced indexing and search facilities to their own applications. It supports the probabilistic information retrieval model as well as a rich set of boolean query operators.

If you are looking for a simpler way to use Xapian, you could use Omega, an application built on top of Xapian.

Xapian's versatility allows you to extend Omega to meet your needs as they grow. You can download all of them here.

The installation is very easy and is documented here. A complete tutorial on using Omega is available here. The only thing you need to worry about is having enough space on your hard disk to store the indexes.

The relevance ranking is good, and Omega's default CGI application is efficient enough for a basic search appliance. Make sure that the files to be indexed are under the HTTP path (use virtual hosts) so that you can follow the search results when you click on them.
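
To see why the files need to sit under the HTTP path, here is a minimal Python sketch of the mapping a search result link depends on. It is not part of Omega; the function name `file_to_url`, the document root `/var/www/docs` and the host `docs.local` are all made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def file_to_url(file_path, doc_root, base_url):
    """Map an indexed file to the URL the web server would serve it at.

    All names and paths here are hypothetical; adjust them to your own
    virtual host setup.
    """
    # relative_to raises ValueError when the file lies outside the document
    # root, which is exactly the case where a result link would be dead.
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(doc_root))
    # Percent-encode spaces and other unsafe characters, but keep the slashes.
    return base_url.rstrip("/") + "/" + quote(relative.as_posix())


link = file_to_url(
    "/var/www/docs/howto/my file.html", "/var/www/docs", "http://docs.local/"
)
```

A file outside the document root has no URL at all, so a search hit for it cannot be opened from the browser.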

Powered by ScribeFire.

Thursday, April 26, 2007

File system indexing

I was looking for ways to index an entire array of websites to create a repository. The idea is that people can download and dump their favourite websites, like tldp.org (The Linux Documentation Project), Wikipedia etc., and create an index out of them that makes these repositories searchable.

wget is a small Linux utility that lets you recursively dump websites onto your hard drive. For indexing, we can use Zebra, which is a high-performance, general-purpose structured text indexing and retrieval engine.
Zebra also supports large databases (more than ten gigabytes of data, tens of millions of records). You can download Zebra at http://ftp.indexdata.dk/pub/zebra/idzebra-2.0.12.tar.gz and install it after installing dependencies such as YAZ.
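
As a rough sketch of the dumping step, the Python snippet below builds a recursive wget command and runs it. The helper names and the destination directory are my own choices, and the set of options is just one reasonable combination, not the only way to mirror a site.

```python
import subprocess


def build_wget_command(url, dest_dir):
    """Return the argument list for a recursive wget download."""
    return [
        "wget",
        "--recursive",        # follow links and fetch the whole site
        "--no-parent",        # never climb above the starting directory
        "--convert-links",    # rewrite links so the local copy is browsable
        "--page-requisites",  # also fetch images and stylesheets
        "--directory-prefix", dest_dir,  # where the dump is stored
        url,
    ]


def mirror_site(url, dest_dir):
    """Run wget; raises CalledProcessError if wget exits with an error."""
    subprocess.run(build_wget_command(url, dest_dir), check=True)
```

Calling `mirror_site("http://tldp.org/", "/srv/mirror")` would start the download; `/srv/mirror` is a hypothetical path, and a full site dump can take a long time and a lot of disk space.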

Zebra documentation is available at http://www.indexdata.dk/zebra/doc/; read the sections on administering Zebra in particular. You can do different types of indexing (see the zebra.cfg options). For indexing your dumped websites, you should use indexing with File Record IDs, which also supports incremental updates to your repository.
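
The idea behind using the file path as the record ID can be illustrated with a toy index in Python. This is only a sketch of the concept and not how Zebra is implemented; the class `PathKeyedIndex` and its methods are invented for this example.

```python
import re
from collections import defaultdict


class PathKeyedIndex:
    """Toy inverted index whose record ID is the file path."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of file paths
        self.terms_by_path = {}           # file path -> terms last indexed

    def update(self, path, text):
        """Index a new file, or re-index a changed one under the same ID."""
        new_terms = set(re.findall(r"[a-z0-9]+", text.lower()))
        old_terms = self.terms_by_path.get(path, set())
        # Drop only the terms that disappeared from this file.
        for term in old_terms - new_terms:
            self.postings[term].discard(path)
            if not self.postings[term]:
                del self.postings[term]
        # Add only the terms that are new for this file.
        for term in new_terms - old_terms:
            self.postings[term].add(path)
        self.terms_by_path[path] = new_terms

    def delete(self, path):
        """Remove a file that no longer exists in the repository."""
        for term in self.terms_by_path.pop(path, set()):
            self.postings[term].discard(path)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, term):
        """Return the sorted paths of all files containing the term."""
        return sorted(self.postings.get(term.lower(), set()))
```

Because each record is identified by its path, re-dumping a site only needs to touch the entries for files that changed or vanished, instead of rebuilding the whole index.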

You can access data stored in Zebra using a variety of Index Data tools (e.g. YAZ and PHP/YAZ) as well as commercial and freeware Z39.50 clients and toolkits.


Wednesday, April 25, 2007

Khana Khazana review

I dined yesterday at the Khana Khazana restaurant in Kilpauk with my cousins. It's a fairly good restaurant. The ambience was good, and the staff all looked typically Chinese, though they could actually have been north-easterners. Anyway, the food was good; the Mughlai dishes and the handi murg in particular were mouth-watering. Plus, the menu had loads of Chinese stuff.

I recommend this restaurant to people who like Mughlai and Chinese food and also have lots of time and money to spare.


Tuesday, April 24, 2007

ScribeFire use

I used ScribeFire to post this message.

It's a cool tool to manage my blog. It has lots of options that allow me to post entries to multiple blogs. You can also sort your blogs by category and stuff. There are also tools for bookmarking and editing, plus integration with Technorati, del.icio.us etc.

You can also store your posts as notes to continue later.

It's available as a Firefox plugin at http://addons.mozilla.org/en-US/firefox/addon/1730


Wednesday, April 18, 2007

Melvisharam community

We have forged a community for the GenNext of our town.

This group is for Melvisharam, by Melvisharam, to Melvisharam.

The whole intent of this grouping is to be enablers of change and
help the new generation of students realize their potential and climb the
ladder of knowledge and success.

Dream about the world, right in your town!

The group is hosted on Google Groups as the Melvisharam (e)lite Community.