
[Solved] Java Programming Project Design Simple Yet Effective Document Retrieval System Respond Sing Q37079948

Java programming!!!

In this project, you will design a very simple yet effective document retrieval system that responds to single-word queries. The system will allow users to enter a single word and will then return the list of documents containing that word.

Consider a scenario where there are approximately 10,000 text files, and you would like to develop a program that enables users to search for a specific keyword among all documents. Users are interested in the documents containing this keyword.

A simple first attempt at solving this problem would be a program that first reads a keyword from the user and then scans all files, listing the files that contain the keyword. This is the most expensive approach in terms of time, because every query rescans the entire collection.
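The scan-everything approach described above could look roughly like the following. This is only a minimal sketch: the class name `NaiveSearch`, the method `search`, and the idea that all documents sit directly inside one directory are assumptions made for illustration, not part of the assignment. It requires Java 11 or later for `Files.readString`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NaiveSearch {
    // Returns the names of the files in dir that contain keyword as a whole word.
    // Every call rereads every file, which is why this approach is slow.
    static List<String> search(Path dir, String keyword) throws IOException {
        String target = keyword.toLowerCase(Locale.ROOT);
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            String text = Files.readString(file).toLowerCase(Locale.ROOT);
            // Split on anything that is not a lowercase letter.
            for (String word : text.split("[^a-z]+")) {
                if (word.equals(target)) {
                    hits.add(file.getFileName().toString());
                    break; // one match is enough for this file
                }
            }
        }
        return hits;
    }
}
```

With 10,000 files, each query costs a full pass over the collection, which is the cost that indexing is meant to avoid.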

Instead, a more efficient method of document retrieval based on the user’s single-word query is achieved by a technique called indexing. Document indexing in its simplest form refers to a means of organizing and storing documents for later retrieval based on the words they contain.

Task: Design and implement a system that will index the documents by their content words, using a linked list as the data structure.
While reading text documents:

Get all words, where a word is a string of alpha characters terminated by a non-alpha character (white space is not alpha). The alpha characters are defined to be [a-z]. Therefore, the character sequence “apple+78&’^+orange” yields the words ‘apple’ and ‘orange’.

Lowercase all words,

Filter out all the words that are in the stop words list, such as ‘a’, ‘an’, ‘the’. (“Stop words” usually refers to the most common words in a language.) Why do you think this filtering is done? (Read: https://nlp.stanford.edu/IR-book/html/htmledition/dropping-common-terms-stop-words-1.html) The list of stop words is provided on webonline.
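The three reading steps above (extract words, lowercase, drop stop words) can be sketched as a single helper. The names `Tokenizer` and `tokenize` are hypothetical. The sketch also assumes that lowercasing happens before splitting, so that a capitalized word such as ‘Pease’ is not broken at its first letter; the assignment lists the steps without fixing their order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class Tokenizer {
    // Returns the lowercase [a-z] words of text that are not stop words.
    static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> words = new ArrayList<>();
        // Locale.ROOT keeps the result independent of the machine's language settings.
        String lower = text.toLowerCase(Locale.ROOT);
        // Any run of non [a-z] characters (digits, punctuation, white space) ends a word.
        for (String word : lower.split("[^a-z]+")) {
            // split yields an empty first element when the text starts with a separator.
            if (!word.isEmpty() && !stopWords.contains(word)) {
                words.add(word);
            }
        }
        return words;
    }
}
```

For example, `Tokenizer.tokenize("apple+78&'^+orange", Set.of())` returns the list `[apple, orange]`, and `Tokenizer.tokenize("Pease porridge in the pot,", Set.of("in", "the"))` returns `[pease, porridge, pot]`.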

Limitations and Assumptions

1. The collection of documents is closed (the content and number of documents are fixed and will never change).

2. Each document is stored in a single text file. Hence, if there are 10,000 documents, then there are 10,000 text files. (The collection of documents is provided on webonline.)

Figure 1 and Figure 2 summarize the aim of this project.

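Document  Text
1         Pease porridge hot, pease porridge cold,
2         Pease porridge in the pot,
3         Nine days old.
4         Some like it hot, some like it cold,
5         Some like it in the pot,
6         Nine days old.

Figure 1: Example text; each line is one document.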

[Figure 2 image: the example text next to a table with the columns Term and Documents, which lists for each term the numbers of the documents that contain it, e.g. days: 3, 6.]

Figure 2: Documents are Indexed by their word contents.

Since we may not easily estimate the number of words among all documents, one possible solution would be to use linked lists to maintain the list of words. Also, for each word a list of files needs to be kept. Because the number of documents (or files) for a word cannot be estimated in advance either, the use of a linked list is again suggested.
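One way to realize this list-of-lists idea is sketched below: a singly linked list of word nodes, each of which owns a singly linked list of document numbers. All names (`LinkedIndex`, `WordNode`, `DocNode`, `add`, `lookup`) are hypothetical. Keeping the word list in alphabetical order and assuming that documents are indexed in increasing document number are design choices of this sketch, not requirements stated in the assignment.

```java
import java.util.ArrayList;
import java.util.List;

class LinkedIndex {
    // One entry in a word's list of documents.
    private static final class DocNode {
        final int docId;
        DocNode next;
        DocNode(int docId) { this.docId = docId; }
    }

    // One entry in the list of words; it owns its own document list.
    private static final class WordNode {
        final String word;
        DocNode firstDoc;
        DocNode lastDoc;
        WordNode next;
        WordNode(String word) { this.word = word; }
    }

    private WordNode head;

    // Records that word occurs in document docId.
    // Assumes calls arrive in increasing docId order, so duplicates are always adjacent.
    void add(String word, int docId) {
        WordNode prev = null;
        WordNode cur = head;
        while (cur != null && cur.word.compareTo(word) < 0) {
            prev = cur;
            cur = cur.next;
        }
        if (cur == null || !cur.word.equals(word)) {
            // Word not present yet: splice a new node in at its sorted position.
            WordNode node = new WordNode(word);
            node.next = cur;
            if (prev == null) head = node; else prev.next = node;
            cur = node;
        }
        if (cur.lastDoc == null) {
            cur.firstDoc = cur.lastDoc = new DocNode(docId);
        } else if (cur.lastDoc.docId != docId) {
            cur.lastDoc.next = new DocNode(docId);
            cur.lastDoc = cur.lastDoc.next;
        }
    }

    // Returns the document numbers containing word, or an empty list if there are none.
    List<Integer> lookup(String word) {
        List<Integer> ids = new ArrayList<>();
        for (WordNode cur = head; cur != null; cur = cur.next) {
            int cmp = cur.word.compareTo(word);
            if (cmp > 0) break; // sorted list: the word cannot appear later
            if (cmp == 0) {
                for (DocNode d = cur.firstDoc; d != null; d = d.next) ids.add(d.docId);
                break;
            }
        }
        return ids;
    }
}
```

If the six lines of Figure 1 are added as documents 1 to 6, `lookup("days")` returns `[3, 6]` and `lookup("porridge")` returns `[1, 2]`. A lookup walks the word list once instead of rereading the files, although it is still linear in the number of distinct words.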

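[Figure image: a linked list of word nodes reached from Head (for example cold, days, hot, like, porridge); each word node points to its own linked list of document numbers.]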

stopwords File

a, about, above, after, again, against, all, am, an, and, any, are, aren’t, as, at, be, because, been, before, being, below, between, both, but, by, can’t, cannot, could, couldn’t, did, didn’t, do, does, doesn’t, doing, don’t, down, during, each, few, for, from, further, had, hadn’t, has, hasn’t, have, haven’t, having, he, he’d, he’ll, he’s, her, here, here’s, hers, herself, him, himself, his, how, how’s, i, i’d, i’ll, i’m, i’ve, if, in, into, is, isn’t, it, it’s, its, itself, let’s, me, more, most, mustn’t, my, myself, no, nor, not, of, off, on, once, only, or, other, ought, our, ours, ourselves, out, over, own, same, shan’t, she, she’d, she’ll, she’s, should, shouldn’t, so, some, such, than, that, that’s, the, their, theirs, them, themselves, then, there, there’s, these, they, they’d, they’ll, they’re, they’ve, this, those, through, to, too, under, until, up, very, was, wasn’t, we, we’d, we’ll, we’re, we’ve, were, weren’t, what, what’s, when, when’s, where, where’s, which, while, who, who’s, whom, why, why’s, with, won’t, would, wouldn’t, you, you’d, you’ll, you’re, you’ve, your, yours, yourself, yourselves
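Loading this list into memory might look like the sketch below. It assumes the provided file holds one stop word per line, which the page does not confirm. The class `StopWords`, its methods, and any file name passed to `load` are hypothetical. A `HashSet` is used here only for fast membership tests; a linked list would also work if the assignment requires linked lists throughout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWords {
    // Turns the lines of a stop-word file into a lookup set (assumed: one word per line).
    static Set<String> parse(List<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                words.add(word); // blank lines are skipped
            }
        }
        return words;
    }

    // Reads the stop-word file at path (UTF-8) and parses it.
    static Set<String> load(Path path) throws IOException {
        return parse(Files.readAllLines(path));
    }
}
```

Note that under the [a-z] word rule a text word such as “don’t” is split into ‘don’ and ‘t’, so entries containing an apostrophe would never match a token unless the list is tokenized with the same rule.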

_________________

-Docs-


