Wednesday, November 17, 2010

11/22 Reading Responses

Web Search Engines: Part 1 and Part 2
This article discussed information that was new and informative to me, because I was not familiar with the complex processes behind web crawling.  I thought that the first article efficiently described the basic aspects of web crawling and crawling algorithms, although a few more diagrams or tables would have been helpful in my opinion.  Considering the arduous processes that search engines use to crawl and index data, one may wonder how much more efficient search engines will become within the next decade.  I found the second article more interesting because of its discussions of data compression and phrasing.  Both articles used many terms with which I was not familiar, but the sections on anchor text and query processing algorithms in the second article were the most interesting to me: they discussed two complex and important processes in searching in a way that was fairly easy to understand.  I was not aware of the importance of either anchor text or query processing algorithms before reading the second article.
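To make the query processing discussion a little more concrete, here is a minimal sketch in Python of one idea the article covers: answering an AND query by intersecting posting lists from an inverted index. The documents and terms below are made up for illustration and are not from the article; real engines use far more elaborate compressed index structures.

```python
# Minimal sketch of conjunctive (AND) query processing over an inverted index.
# The example documents are illustrative, not taken from the article.
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def and_query(index, terms):
    """Intersect posting lists, smallest list first, as real engines do."""
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    if not postings:
        return set()
    result = postings[0]
    for p in postings[1:]:
        result = result & p
    return result

docs = {
    1: "web crawlers fetch pages",
    2: "query processing uses an inverted index",
    3: "anchor text describes the target page",
}
index = build_index(docs)
print(and_query(index, ["inverted", "index"]))  # {2}
```

Processing the shortest posting list first keeps the intermediate intersections small, which is one reason query evaluation can stay fast even over enormous indexes.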

Current developments and future trends for the OAI protocol for metadata harvesting
I think that this article conveyed just how vast and important the OAI's aims are.  It described the future direction of the OAI well, and it also demonstrated how important the development of information repositories is for the future of metadata.  One section out of many that I found interesting and important was the one about ERRoLs.  The process of resolving oai-identifiers into ERRoL service URLs seems like a good example of simplified searching, even though the results of this process show that there is still much work to be done to make it more effective.  After seeing the various kinds of metadata discussed in the article, I am curious about what the OAI will implement regarding structured rights statements and controlled vocabularies.
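Since the article centers on the OAI protocol for metadata harvesting, a small sketch of what a harvesting request looks like may help. OAI-PMH requests are ordinary HTTP GETs with a `verb` parameter; the repository base URL below is a placeholder I made up, while the verb and parameter names come from the OAI-PMH specification.

```python
# Sketch of building an OAI-PMH ListRecords harvesting request URL.
# "http://example.org/oai" is a hypothetical repository endpoint;
# verb, metadataPrefix, from, and resumptionToken are OAI-PMH parameters.
from urllib.parse import urlencode

def build_listrecords_url(base_url, metadata_prefix="oai_dc",
                          from_date=None, resumption_token=None):
    """Construct a ListRecords request URL for an OAI-PMH repository."""
    if resumption_token:
        # Per the spec, a resumptionToken is sent alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective harvesting by date
    return base_url + "?" + urlencode(params)

url = build_listrecords_url("http://example.org/oai", from_date="2010-01-01")
print(url)
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2010-01-01
```

A harvester would fetch this URL, parse the XML response, and follow any `resumptionToken` it contains to page through the rest of the repository's records.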

The Deep Web: Surfacing Hidden Value
I think that this article described the complexity of the Web better than the Hawking articles, because it provided many more diagrams and tables to demonstrate the vastness of the Surface Web and the Deep Web.  Based on the estimates of subject coverage in the Deep Web, I found it interesting that the majority of the Deep Web's observed content is associated with media and the humanities.  I think that Table 7 in the article effectively demonstrated the difference in query yield between the Surface Web and the Deep Web.  Given that the query yield in the Deep Web for specific subjects is more than double that of the Surface Web, one may wonder how many of the Deep Web's results are accurate and how many are not.  I also found it interesting to discover that the amount of information in the Deep Web has grown so rapidly since 1997 that original Deep Web content is now nearly double the amount of all global printed content.

1 comment:

  1. I think it's important to note too that the White Paper by Bergman is 9 years old- I think some of the things he writes about have changed- even if you look at the 60 largest deep websites, at least some of those have been searched by Google... they're not all hidden any more.

    ReplyDelete