<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>I have something to say about that... &#187; search</title>
	<atom:link href="http://hadleybeeman.net/tag/search/feed/" rel="self" type="application/rss+xml" />
	<link>http://hadleybeeman.net</link>
	<description>Contributions to the conversation from Hadley Beeman</description>
	<lastBuildDate>Thu, 18 Feb 2010 14:11:11 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Questions on social searching</title>
		<link>http://hadleybeeman.net/2006/10/06/questions-on-social-searching/</link>
		<comments>http://hadleybeeman.net/2006/10/06/questions-on-social-searching/#comments</comments>
		<pubDate>Fri, 06 Oct 2006 11:15:44 +0000</pubDate>
		<dc:creator>Hadley Beeman</dc:creator>
				<category><![CDATA[social tools]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://collaborator.wordpress.com/2006/10/06/questions-on-social-searching/</guid>
		<description><![CDATA[Is there any documentation on the value of saving searches?  It seems to me that once you&#8217;ve looked for something and found it, you&#8217;ll either not need to find it again (you&#8217;ll have your question answered) or you&#8217;ll remember what path got you to the pot of gold in the first place.  Does [...]]]></description>
			<content:encoded><![CDATA[<p>Is there any documentation on the value of saving searches?  It seems to me that once you&#8217;ve looked for something and found it, you&#8217;ll either not need to find it again (you&#8217;ll have your question answered) or you&#8217;ll remember what path got you to the pot of gold in the first place.  Does the rest of the world not work like this?</p>
<p>Similarly, is there any value in sharing searches?  Does anybody else care what terms I used to find site x, and if so is there any meaningful way to convey this beyond pages and pages of &#8220;Query: x y z/Results: www., www., www.&#8221; that someone else would have to sort through?</p>
<p>It has come to my attention that these might be useful approaches, but I think I might be missing the boat.</p>
]]></content:encoded>
			<wfw:commentRss>http://hadleybeeman.net/2006/10/06/questions-on-social-searching/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Searching: then and now</title>
		<link>http://hadleybeeman.net/2006/10/05/3/</link>
		<comments>http://hadleybeeman.net/2006/10/05/3/#comments</comments>
		<pubDate>Thu, 05 Oct 2006 13:22:43 +0000</pubDate>
		<dc:creator>Hadley Beeman</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://collaborator.wordpress.com/2006/10/05/3/</guid>
		<description><![CDATA[Let&#8217;s talk about information retrieval algorithms.
I&#8217;ve been comparing search engines, looking for something suitable for a large amount of unstructured data from a lot of repositories. Now, option 1 says it operates on the probabilistic model of information retrieval (a description of this model is in this paper: part 1 and part 2), though the [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s talk about information retrieval algorithms.</p>
<p>I&#8217;ve been comparing search engines, looking for something suitable for a large amount of unstructured data from a lot of repositories. Now, option 1 says it operates on the probabilistic model of information retrieval (a description of this model is in this paper: <a href="http://www.soi.city.ac.uk/~ser/blockbuster/pmir-pt1-reprint.pdf">part 1</a> and <a href="http://www.soi.city.ac.uk/~ser/blockbuster/pmir-pt2-reprint.pdf">part 2</a>), though the implementers are extremely vague on exactly how they&#8217;re using it.</p>
<p>As far as I can tell, the probabilistic model creates a score for each document based on the probabilities of each of your search terms being in that document.  Probability of term 1 being there + probability of term 2 being there (etc) = matching score, which you can then use to rank this document against other documents.</p>
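<p>To make the additive idea concrete, here is a minimal sketch. It assumes the probability of a term being in a document is estimated naively as the share of the document's words that match the term. That estimate, the function name <code>matching_score</code> and the sample documents are my own illustration, not anything the implementers have confirmed.</p>

```python
def matching_score(query_terms, document_words):
    """Sum a naive per-term estimate: the share of the document's words equal to the term."""
    total = len(document_words)
    if total == 0:
        return 0.0
    return sum(document_words.count(term) / total for term in query_terms)


# Two made-up documents, already split into lower-case words.
docs = {
    "a": "smith filed the litigation papers".split(),
    "b": "the litigation team met about litigation costs".split(),
}
query = ["smith", "litigation"]

# Rank document names by their matching score, best first.
ranked = sorted(docs, key=lambda name: matching_score(query, docs[name]), reverse=True)
```

<p>Here document "a" matches both terms and ranks ahead of document "b", which only mentions "litigation".</p>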
<p>In this implementation, they then give more weight to the search terms that are rarest across all the documents (so that if you&#8217;re in a law firm and search for &#8220;Smith litigation&#8221;, &#8220;Smith&#8221; will be more important than &#8220;litigation&#8221;. Your firm will probably have a lot of documents using the term &#8220;litigation&#8221;, so it won&#8217;t be as useful for picking out the docs you need). It then normalises for document length, balances repeated terms (so that searching for &#8216;smith smith litigation&#8217; doesn&#8217;t mean it looks for documents with &#8220;smith&#8221; twice as often) and trims words to their stems using something like <a href="http://www.tartarus.org/martin/PorterStemmer/">the Porter Stemming algorithm</a>.</p>
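<p>The weighting steps described above can be sketched roughly as follows. The rarity formula is a smoothed inverse document frequency chosen for illustration, and the names <code>rarity_weight</code> and <code>weighted_score</code> are hypothetical. The vendor has not published its exact formula, and stemming is left out to keep the sketch short.</p>

```python
import math


def rarity_weight(term, documents):
    """Weight a term more heavily the fewer documents contain it."""
    containing = sum(1 for words in documents if term in words)
    # Smoothed inverse document frequency (an assumption, picked for illustration).
    return math.log((len(documents) + 1) / (containing + 0.5))


def weighted_score(query_terms, words, documents):
    """Score one document: rarity weight times length-normalised term frequency."""
    if not words:
        return 0.0
    # Count each query term once, so "smith smith litigation"
    # scores the same as "smith litigation".
    unique_terms = set(query_terms)
    return sum(
        rarity_weight(term, documents) * words.count(term) / len(words)
        for term in unique_terms
    )


# Made-up law-firm documents: every one mentions "litigation", only one mentions "smith".
documents = [
    "smith litigation bundle".split(),
    "litigation costs review".split(),
    "litigation hearing dates".split(),
]
smith = rarity_weight("smith", documents)
litigation = rarity_weight("litigation", documents)
```

<p>With these sample documents "smith" receives a larger weight than "litigation", which matches the law firm example: the common term does less to separate one document from another.</p>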
<p>Okay, now I&#8217;ll admit I&#8217;m learning. But this algorithm isn&#8217;t new: this &#8216;City model&#8217; of probabilistic retrieval was initially proposed by Robertson and Sparck Jones in 1976 (&#8216;Relevance weighting of search terms&#8217;. <em>Journal of the American Society for Information Science</em>, 27, 129-146). Is this still as good as we can get?</p>
<p>[Tune in next time: we'll weigh this up against <a href="http://www.cs.brown.edu/~th/papers/Hofmann-UAI99.pdf">probabilistic latent semantic analysis</a> and I can finally get around to asking my question!]</p>
]]></content:encoded>
			<wfw:commentRss>http://hadleybeeman.net/2006/10/05/3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
