Coursera course Text Retrieval and Search Engines: Week 2 Overview
Published: 2019-06-19

Week 2 Overview

Instructional Activities

Below is a list of the activities and assignments available to you this week. See the  page to know which assignments pertain to the badge or badges you are pursuing. Click on the name of each activity for more detailed instructions.

Due Date* | Estimated Time Required
Sunday, April 5 (Suggested) | 3 hours
Sunday, April 5 | 2-3 hours
Sunday, April 19 | ~0.5 hours

* All deadlines are at 11:55 PM Central Time unless otherwise noted.

Time

This module will last 7 days and should take approximately 6 hours of dedicated time to complete, including its readings and assignments.

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

  • Explain what an inverted index is and how to construct it for a large set of text documents that does not fit in memory.
  • Explain how variable-length encoding can be used to compress integers and how unary coding and gamma-coding work.
  • Explain how scoring of documents in response to a query can be done quickly by using an inverted index.
  • Explain what Zipf’s law is. 
  • Explain what the Cranfield evaluation methodology is and how it works for evaluating a text retrieval system.
  • Explain how to evaluate a set of retrieved documents and how to compute precision, recall, and F1.
  • Explain how to evaluate a ranked list of documents.
  • Explain how to compute and plot a precision-recall curve.
  • Explain how to compute average precision and mean average precision (MAP).
  • Explain how to evaluate a ranked list with multi-level relevance judgments.
  • Explain how to compute normalized discounted cumulative gain.
  • Explain why it is important to perform a statistical significance test.
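
To make the inverted-index and fast-scoring objectives concrete, here is a minimal in-memory Python sketch, assuming whitespace tokenization and 1-based document IDs. The names `build_inverted_index` and `score` are hypothetical, and the tf * idf weighting is an illustrative choice, not a ranking function prescribed by the course. It does not implement the disk-based construction needed when the collection does not fit in memory.

```python
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency), sorted by doc_id.

    In-memory sketch only: when documents do not fit in memory, postings are
    usually written to disk as sorted runs and merged afterwards.
    """
    postings = defaultdict(dict)
    for doc_id, text in enumerate(docs, start=1):  # doc IDs start at 1
        for term in text.lower().split():         # naive whitespace tokenizer
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
    return {term: sorted(tfs.items()) for term, tfs in postings.items()}


def score(index, num_docs, query):
    """Term-at-a-time scoring with one accumulator per candidate document.

    Only documents that occur in some query term's postings list are touched,
    which is why an inverted index makes scoring fast. The weight
    tf * log((N + 1) / df) is an illustrative assumption.
    """
    accumulators = defaultdict(float)
    for term in query.lower().split():
        plist = index.get(term, [])
        if not plist:
            continue                               # term absent from the collection
        idf = math.log((num_docs + 1) / len(plist))
        for doc_id, tf in plist:
            accumulators[doc_id] += tf * idf
    # Highest score first; ties broken by the smaller doc_id
    return sorted(accumulators.items(), key=lambda item: (-item[1], item[0]))


docs = [
    "news about presidential campaign",
    "news about organic food campaign",
    "news of presidential campaign presidential candidate",
]
index = build_inverted_index(docs)
ranking = score(index, len(docs), "presidential campaign")
print([doc_id for doc_id, _ in ranking])   # [3, 1, 2]
```

Document 2 is scored even though it lacks "presidential", because it appears in the postings list for "campaign"; documents matching no query term never get an accumulator.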

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

  • Inverted index; postings
  • Binary coding; unary coding; gamma-coding; d-gap
  • Zipf’s law
  • Cranfield evaluation methodology
  • Precision; recall
  • Average precision; mean average precision (MAP); geometric mean average precision (gMAP)
  • Reciprocal rank; mean reciprocal rank
  • F-measure
  • Normalized discounted cumulative gain (nDCG)
  • Statistical significance test
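
The coding terms above (unary coding, gamma coding, d-gap) can be illustrated with a small Python sketch that stores a sorted postings list as d-gaps and gamma-codes each gap into a bit string. Unary codes x >= 1 as x - 1 one-bits followed by a 0; gamma codes x as the unary code of 1 + floor(log2 x) followed by x - 2^floor(log2 x) in floor(log2 x) bits. The sketch assumes doc IDs start at 1 so that every gap is at least 1 (neither code represents 0); the function names are hypothetical, and a real index would pack bits into bytes rather than build Python strings.

```python
def unary(x):
    """Unary code for x >= 1: (x - 1) one-bits followed by a single 0, e.g. 3 -> '110'."""
    if x < 1:
        raise ValueError("unary coding needs x >= 1")
    return "1" * (x - 1) + "0"


def gamma(x):
    """Gamma code: unary(1 + floor(log2 x)) followed by the floor(log2 x)
    low-order bits of x, e.g. 3 -> '101', 5 -> '11001'."""
    if x < 1:
        raise ValueError("gamma coding needs x >= 1")
    k = x.bit_length() - 1             # floor(log2 x)
    return unary(k + 1) + bin(x)[3:]   # bin(x) is '0b1...'; drop '0b' and the leading 1


def gamma_decode(bits):
    """Decode a concatenation of gamma codes back into integers."""
    values, i = [], 0
    while i < len(bits):
        k = 0
        while bits[i] == "1":          # the unary prefix has k one-bits
            k += 1
            i += 1
        i += 1                         # skip the terminating 0
        low = int(bits[i:i + k], 2) if k else 0
        values.append((1 << k) + low)  # restore the implicit leading 1
        i += k
    return values


def encode_postings(doc_ids):
    """Replace sorted doc IDs with d-gaps (difference from the previous ID), then gamma-code them."""
    gaps = [cur - prev for prev, cur in zip([0] + doc_ids[:-1], doc_ids)]
    return "".join(gamma(g) for g in gaps)


def decode_postings(bits):
    """Running sum of the decoded gaps restores the original doc IDs."""
    doc_ids, current = [], 0
    for gap in gamma_decode(bits):
        current += gap
        doc_ids.append(current)
    return doc_ids


doc_ids = [3, 8, 9, 20]            # hypothetical postings list
bits = encode_postings(doc_ids)    # gaps 3, 5, 1, 11
print(bits)                        # 1011100101110011
print(decode_postings(bits))       # [3, 8, 9, 20]
```

Gaps between neighbouring IDs in long postings lists are mostly small, so a code that spends few bits on small integers, such as gamma, tends to shrink the index.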

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

  • What is the typical architecture of a text retrieval system?
  • What is an inverted index?
  • Why is it desirable to compress an inverted index?
  • How can we create an inverted index when the collection of documents does not fit in memory?
  • How can we leverage an inverted index to score documents quickly?
  • Why is evaluation so critical for research and application development in text retrieval?
  • How does Cranfield evaluation methodology work?
  • How do we evaluate a set of retrieved documents?
  • How do you compute precision, recall, and F1?
  • How do we evaluate a ranked list of search results?
  • How do you compute average precision? How do you compute mean average precision (MAP) and geometric mean average precision (gMAP)?
  • What is mean reciprocal rank?
  • Why is MAP more appropriate than precision at k documents when comparing two retrieval methods?
  • Why is precision at k documents more meaningful than average precision from a user’s perspective?
  • How can we evaluate a ranked list of search results using multi-level relevance judgments?
  • How do you compute normalized discounted cumulative gain (nDCG)?
  • Why is normalization necessary in nDCG? Does MAP need a similar normalization?
  • Why is it important to perform a statistical significance test when we compare the retrieval accuracies of two search engine systems?
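
Several of these questions ask how the evaluation measures are computed. Below is a minimal Python sketch for one query, using hypothetical doc IDs and relevance judgments; the function names are illustrative. Average precision divides by the total number of relevant documents, so relevant documents that were never retrieved count as zero. The nDCG variant shown discounts the gain at rank i (for i >= 2) by log2(i); other variants, such as dividing by log2(i + 1), are also in common use.

```python
import math


def precision_recall_f1(retrieved, relevant):
    """Set-based measures; retrieved and relevant are collections of doc IDs."""
    hits = len(set(retrieved) & set(relevant))
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def average_precision(ranking, relevant):
    """Sum of precision@k at each rank k holding a relevant doc, over the number of relevant docs."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(aps):
    """Arithmetic mean of per-query AP values."""
    return sum(aps) / len(aps)


def geometric_map(aps):
    """Geometric mean of per-query AP values; a single AP of 0 makes gMAP 0."""
    return math.prod(aps) ** (1 / len(aps))


def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant doc, or 0 if none was retrieved."""
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1 / k
    return 0.0


def ndcg_at_k(ranking, gains, k):
    """gains maps doc ID -> graded relevance (unjudged docs get 0).

    DCG@k = g1 + sum over i = 2..k of g_i / log2(i), normalized by the DCG of
    the ideal ordering of the judged gains.
    """
    def dcg(values):
        return sum(g if i == 1 else g / math.log2(i) for i, g in enumerate(values, start=1))

    actual = dcg([gains.get(doc, 0) for doc in ranking[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal else 0.0


ranking = [1, 2, 3, 4, 5]   # hypothetical system output for one query
relevant = {1, 3, 7}        # hypothetical judgments; doc 7 was never retrieved
print(precision_recall_f1(ranking, relevant))
print(average_precision(ranking, relevant))    # (1/1 + 2/3) / 3
print(reciprocal_rank(ranking, relevant))
print(ndcg_at_k(ranking, {1: 3, 3: 2, 7: 1}, k=5))
```

MAP averages AP arithmetically across queries, whereas gMAP's geometric mean is pulled down much more by queries with low AP, so it emphasizes the hard queries.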

Readings and Resources

The following readings are optional:

  • Mark Sanderson. "Test Collection Based Evaluation of Information Retrieval Systems." Foundations and Trends in Information Retrieval 4(4): 247-375 (2010).
  • Ian H. Witten, Alistair Moffat, and Timothy C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images, Second Edition. Morgan Kaufmann, 1999.

Video Lectures

Video Lecture (duration) | Video Download (file size)
00:21:27 | 28.3 MB
00:18:21 | 24.4 MB
00:17:11 | 23.0 MB
00:10:10 | 14.1 MB
00:12:54 | 17.3 MB
00:15:51 | 20.5 MB
00:10:01 | 13.8 MB
00:10:48 | 14.3 MB
00:15:14 | 20.8 MB

Tips for Success

To do well this week, I recommend that you do the following:

  • Review the video lectures a number of times to gain a solid understanding of the key questions and concepts introduced this week.
  • When possible, provide tips and suggestions to your peers in this class. As a learning community, we can help each other learn and grow. One way of doing this is by helping to address the questions that your peers pose. By engaging with each other, we’ll all learn better.
  • It’s always a good idea to draw on this week's video lectures and readings and to cite them in your responses. When appropriate, critique the information presented.
  • Take notes while you read the materials and watch the lectures for this week. By taking notes, you are interacting with the material and will find that it is easier to remember and to understand. With your notes, you’ll also find that it’s easier to complete your assignments. So, go ahead, do yourself a favor; take some notes!

Getting and Giving Help

You can get/give help via the following means:

  • Use the Learner Help Center to find information regarding specific technical problems. For example, technical problems would include error messages, difficulty submitting assignments, or problems with video playback. You can access the Help Center by clicking on the Help Center link at the top right of any course page. If you cannot find an answer in the documentation, you can also report your problem to the Coursera staff by clicking on the Contact Us! link available on each topic's page within the Learner Help Center.
  • Use the  forum to report errors in lecture video content, assignment questions and answers, assignment grading, text and links on course pages, or the content of other course materials. University of Illinois staff and Community TAs will monitor this forum and respond to issues.

As a reminder, the instructor is not able to answer emails sent directly to his account. Rather, all questions should be reported as described above.

 

from: https://class.coursera.org/textretrieval-001/wiki/Week2Overview

Reposted from: http://jhbox.baihongyu.com/
