Implementation of a Mini Search Engine

The main objective of this project is to develop a mini search engine for searching through a collection of documents, and in doing so to give an overview of how a search engine works. It is intended to be easy to use and efficient. The project uses three data structures: hash tables for indexing the documents, files for storing them, and trees for searching them. Given a query and a set of text files, the search engine locates all the documents that contain the keywords in that query. The main purpose of the project is to gain hands-on experience with hash tables, files and trees.

Indexing

The documents are stored as files. Each file is indexed by its words (tokens) using a hash function, so that a hash table can associate a token with the files that contain it. With this index, the documents matching a keyword can be retrieved without scanning every file.
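
The indexing step above can be illustrated with a minimal sketch. It is not the project's actual code: the function names tokenize and build_index are hypothetical, the documents are passed in as an in-memory mapping from file name to text rather than read from disk, and Python's built-in dict stands in for the hash table.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the set of document names that contain it.

    `documents` maps a (hypothetical) file name to its text. The dict
    returned here is a hash table keyed by token.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, text in documents.items():
        for token in tokenize(text):
            index[token].add(name)
    return index
```

Looking up a token in the resulting table takes constant time on average, which is what makes retrieval by keyword cheap once the index has been built.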

Searching

Searching is done with trees. For efficiency, we use balanced binary search trees such as AVL trees, which keep the cost of a lookup logarithmic in the number of stored keys. Queries may contain simple Boolean operators such as AND and OR, which behave like the corresponding logical operators. For each query, the documents that satisfy it are displayed.

For instance, a query:

Keyword1 AND Keyword2: should retrieve all documents that contain both of these keywords.

Keyword1 OR Keyword2: should instead retrieve all documents that contain at least one of the two keywords.
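
The search step and the two operators can be sketched as follows. This is one possible arrangement under stated assumptions, not the project's implementation: an AVL tree keyed by keyword, where each node holds the set of documents containing that keyword. All names (Node, insert, search, build_tree, evaluate) are hypothetical, queries are assumed to have exactly the form "keyword OPERATOR keyword", and documents are given as an in-memory mapping from name to text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Node:
    key: str                                     # keyword
    docs: Set[str] = field(default_factory=set)  # documents containing it
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1


def height(node: Optional[Node]) -> int:
    return node.height if node else 0


def update(node: Node) -> None:
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y: Node) -> Node:
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def insert(node: Optional[Node], key: str, doc: str) -> Node:
    """Insert (key, doc) and rebalance; returns the new subtree root."""
    if node is None:
        return Node(key, {doc})
    if key == node.key:
        node.docs.add(doc)
        return node
    if key < node.key:
        node.left = insert(node.left, key, doc)
    else:
        node.right = insert(node.right, key, doc)
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:  # left-heavy
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)  # left-right case
        return rotate_right(node)
    if balance < -1:  # right-heavy
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)  # right-left case
        return rotate_left(node)
    return node


def search(node: Optional[Node], key: str) -> Set[str]:
    """Return the documents stored under key, or an empty set."""
    while node is not None:
        if key == node.key:
            return node.docs
        node = node.left if key < node.key else node.right
    return set()


def build_tree(documents: Dict[str, str]) -> Optional[Node]:
    """Build the tree from a mapping of document name to text."""
    root = None
    for name, text in documents.items():
        for word in text.lower().split():
            root = insert(root, word, name)
    return root


def evaluate(root: Optional[Node], query: str) -> Set[str]:
    """Evaluate a query of the assumed form 'keyword1 AND|OR keyword2'."""
    first, op, second = query.split()
    a, b = search(root, first.lower()), search(root, second.lower())
    if op == "AND":
        return a & b  # documents containing both keywords
    if op == "OR":
        return a | b  # documents containing at least one keyword
    raise ValueError(f"unsupported operator: {op}")
```

In this arrangement AND becomes a set intersection and OR a set union over the document sets found for each keyword, so the operators mirror their logical counterparts directly.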