[Solved]1 Consider Ranking Documents Using Binary Independence Model Bim Shown Rsva 2 Log O P Prob Q37185550

1. Consider ranking of documents using the Binary Independence Model (BIM) as shown below: RSVa-2 log where O p is the probability of a term t appearing in a document relevant to the query O (1-p is the probability of a term t not appearing in a document relevant to the query O (1-u) is the probability of a term t not appearing in a document non-relevant to the Shown in Table 1 below are documents, and occurrence of terms (melodic, musical, compo- O ut is the probability of a term t appearing in a document non-relevant to the query query sitions) in the documents. Also shown (in the last row) are relevance judgments (N-non relevant, R relevant) for a query q. Rank the documents for the query q melodic 1 1 0 000 1 1 musical 001 1 0 0 10 1 compositions 0 0 riq,d) RN NRRRR NNR Table 1: Data for ranking by BIM 2. Consider ranking of documents using the Best Match 25 (BM25) algorithm as shown below: RSVd tfi.4 + k2 ti.q tfi.ą +k2 (rt +0.5)/(R – ri0.5) (dfi -ri 0.5)/(D -R dfi r0.5) log tfidki -b)+b idi-a-1 avgidl where O ki, k2, and b are empirically-set parameters. Typical values are: ki 리2,0 k2 s1000, and b-0.75. For this problem, use k2-100. o the number of relevant documents (in the collection) in which the term i occurs O dfi is the document frequency of the teri. That is, the number of documents (in the collection) in which the term i occurs O D is the total number of documents in the collection O R is the number of documents relevant to the query O tfia is the number of times the term i occurs in document d O tfi.4 is the number of times the term i occurs in the query q O dl is document length O avg(dl) average document length, which is the average length across all documents in the collection When no relevance information is given, R- 0. This is the case for this problem. Assume D- 50,000, the term information occurs in 40,000 documents in the collection, and the term retrieval occurs in 30,000 documents in the collection. Consider a document di in which the term information occurs 40 times, and the term retrieval occurs 30 times. Assume the ratio gidli 0.9). Consider the query of document length to average document length is 0.9 (i.e., avg(dl) 4-information retrieval. Calculate the RSV for (di)pair using the BM25 algorithm. 3. Consider ranking of documents using a Language Modeling-based IR algorithm. distinct term t in q where O tr.4 is the term frequency -number of occurrences of t in query a We estimate the parameters PitMa) using Maximum Likelihood Estimate (MLE) as: f,d where O Idl is the length of document d O tfr,d is the term frequency -number of occurrences of t in document d To avoid problem with zero probabilities, we smooth the estimates. First, we define where O Mc is the collection model O cfr is the number of occurrences of t in the collection O T = Σ1cfr is the total number of tokens in the collection We use Pt Mc) to smooth PtId) using Jelinek-Mercer smoothing as Let di The woman who opened fire at YouTube headquarters in Northern California may have been a disgruntled user of the video-sharing site, dz-Authorities are investigating a website that appears to show the same woman accusing YouTube of restricting her videos, and q YouTube fire. Rank di and d with respect to q using Jelinek-Mercer smoothing. Use λ-0.5 Clearly indicate any text normalization procedures performed on documents d1 and d2. Show transcribed image text 1. Consider ranking of documents using the Binary Independence Model (BIM) as shown below: RSVa-2 log where O p is the probability of a term t appearing in a document relevant to the query O (1-p is the probability of a term t not appearing in a document relevant to the query O (1-u) is the probability of a term t not appearing in a document non-relevant to the Shown in Table 1 below are documents, and occurrence of terms (melodic, musical, compo- O ut is the probability of a term t appearing in a document non-relevant to the query query sitions) in the documents. Also shown (in the last row) are relevance judgments (N-non relevant, R relevant) for a query q. Rank the documents for the query q melodic 1 1 0 000 1 1 musical 001 1 0 0 10 1 compositions 0 0 riq,d) RN NRRRR NNR Table 1: Data for ranking by BIM
2. Consider ranking of documents using the Best Match 25 (BM25) algorithm as shown below: RSVd tfi.4 + k2 ti.q tfi.ą +k2 (rt +0.5)/(R – ri0.5) (dfi -ri 0.5)/(D -R dfi r0.5) log tfidki -b)+b idi-a-1 avgidl where O ki, k2, and b are empirically-set parameters. Typical values are: ki 리2,0 k2 s1000, and b-0.75. For this problem, use k2-100. o the number of relevant documents (in the collection) in which the term i occurs O dfi is the document frequency of the teri. That is, the number of documents (in the collection) in which the term i occurs O D is the total number of documents in the collection O R is the number of documents relevant to the query O tfia is the number of times the term i occurs in document d O tfi.4 is the number of times the term i occurs in the query q O dl is document length O avg(dl) average document length, which is the average length across all documents in the collection When no relevance information is given, R- 0. This is the case for this problem. Assume D- 50,000, the term information occurs in 40,000 documents in the collection, and the term retrieval occurs in 30,000 documents in the collection. Consider a document di in which the term information occurs 40 times, and the term retrieval occurs 30 times. Assume the ratio gidli 0.9). Consider the query of document length to average document length is 0.9 (i.e., avg(dl) 4-information retrieval. Calculate the RSV for (di)pair using the BM25 algorithm.
3. Consider ranking of documents using a Language Modeling-based IR algorithm. distinct term t in q where O tr.4 is the term frequency -number of occurrences of t in query a We estimate the parameters PitMa) using Maximum Likelihood Estimate (MLE) as: f,d where O Idl is the length of document d O tfr,d is the term frequency -number of occurrences of t in document d To avoid problem with zero probabilities, we smooth the estimates. First, we define where O Mc is the collection model O cfr is the number of occurrences of t in the collection O T = Σ1cfr is the total number of tokens in the collection We use Pt Mc) to smooth PtId) using Jelinek-Mercer smoothing as Let di The woman who opened fire at YouTube headquarters in Northern California may have been a disgruntled user of the video-sharing site, dz-Authorities are investigating a website that appears to show the same woman accusing YouTube of restricting her videos, and q YouTube fire. Rank di and d with respect to q using Jelinek-Mercer smoothing. Use λ-0.5 Clearly indicate any text normalization procedures performed on documents d1 and d2.

Expert Answer

Answer to 1. Consider ranking of documents using the Binary Independence Model (BIM) as shown below: RSVa-2 log where O p is the p… . . .

[Solved]1 Consider Ranking Documents Using Binary Independence Model Bim Shown Rsva 2 Log O P Prob Q37185550

Expert Answer

Related

Leave a Reply Cancel reply

Recent Assignments

See what happens after you have accessed solutions.