<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>it’s all semantics &#187; Jake Zarnegar</title>
	<atom:link href="http://blog.silverchair.com/author/jakezarnegar/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.silverchair.com</link>
	<description>Semantic Strategy Insights for Publishers</description>
	<lastBuildDate>Fri, 19 Mar 2010 00:30:24 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='blog.silverchair.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/c65cb61b7068f8507109857024ad8976?s=96&#038;d=http://s2.wp.com/i/buttonw-com.png</url>
		<title>it’s all semantics &#187; Jake Zarnegar</title>
		<link>http://blog.silverchair.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.silverchair.com/osd.xml" title="it’s all semantics" />
	<atom:link rel='hub' href='http://blog.silverchair.com/?pushpress=hub'/>
		<item>
		<title>Evaluation of Automated Tagging Solutions</title>
		<link>http://blog.silverchair.com/2010/02/04/evaluation-of-automated-tagging-solutions/</link>
		<comments>http://blog.silverchair.com/2010/02/04/evaluation-of-automated-tagging-solutions/#comments</comments>
		<pubDate>Thu, 04 Feb 2010 14:43:29 +0000</pubDate>
		<dc:creator>Jake Zarnegar</dc:creator>
				<category><![CDATA[classification/tagging]]></category>
		<category><![CDATA[semantic enrichment]]></category>
		<category><![CDATA[automated tagging]]></category>
		<category><![CDATA[Cortex]]></category>
		<category><![CDATA[semantic tagging]]></category>
		<category><![CDATA[Tagmaster]]></category>

		<guid isPermaLink="false">http://blog.silverchair.com/?p=360</guid>
		<description><![CDATA[ 
As we at Silverchair and Semedica see more and more interest in automated tagging solutions (such as our Tagmaster system), we are more frequently encountering questions about how to evaluate their results. Here are a few ideas on the subject:
Evaluation: Humans Required!
It is hard to get around the fact that you will need human editors (or professional indexers) and your [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align:center;"> </p>
<p>As we at Silverchair and Semedica see more and more interest in automated tagging solutions (such as our <a href="http://www.semedica.com/tagmaster.aspx" target="_blank">Tagmaster</a> system), we are more frequently encountering questions about how to evaluate their results. Here are a few ideas on the subject:</p>
<h1>Evaluation: Humans Required!</h1>
<p>It is hard to get around the fact that you will need human editors (or professional indexers) and your human technology team (who will use the tags to create interesting new features) to verify that an automated system is working correctly and that the tagging is accurate and useful. </p>
<p>Recently, someone asked our CEO Thane Kerner if we had an automated system to verify the accuracy of our automated tagging. Thane replied (rather cheekily, I must say): “If we had an automated review system that could measure tagging accuracy more precisely than the current tagging system, we wouldn’t use it to verify tags, we’d use it to tag the content to begin with!” The lesson: Once you’ve deployed your best automated system to do the tagging, humans are the next logical reviewers. </p>
<p>Here are four factors your humans should consider in their review:</p>
<div id="attachment_372" class="wp-caption alignright" style="width: 310px"><a href="http://semedica.files.wordpress.com/2010/02/tagmaster_content_page1.gif"><img class="size-medium wp-image-372" title="View inside Semedica's Tagmaster" src="http://semedica.files.wordpress.com/2010/02/tagmaster_content_page1.gif?w=300&#038;h=192" alt="View inside Semedica's Tagmaster" width="300" height="192" /></a><p class="wp-caption-text">View inside Semedica&#39;s Tagmaster, showing tags automatically inserted at the paragraph level</p></div>
<h2>1.  Expert/Editorial Accuracy Confidence</h2>
<p>One key target for evaluation is to assess how much confidence your key stakeholders (journal boards, editors, etc.) express in the output of the system. But confidence is not a linear equation. I posit the following values:</p>
<ul>
<li>Impeccable tag placement: +1</li>
<li>Debatable tag placement: −1</li>
<li>Debatable tag omission: −1</li>
<li>Obvious tag omission: −10</li>
<li>Obvious irrelevant tag placement: −50</li>
</ul>
<p>The first thing you’ll notice is how lopsided the negative weights are relative to the positive one. In high-stakes fields (including science and medicine), humans are naturally biased to weigh negative experiences more heavily than positive ones. (Of course, this has served us well in survival: “Don’t eat that type of berry again, it made you sick last time!”) In terms of confidence, this means stakeholders will need a <em>disproportionate amount</em> of positive reassurance to get over negative outcomes. And the impact of a particularly egregious negative outcome (resulting from a particularly poorly placed tag) can be devastating to your stakeholders’ impression of a tagging system. (This is why Silverchair’s system defaults to conservative methods with very little “guessing,” to avoid obvious irrelevant tag placement.)</p>
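<p>To make the asymmetry concrete, here is a minimal Python sketch that totals a review using the weights listed above. The outcome labels, the function name, and the sample counts are hypothetical and chosen only for illustration; this is not how any Silverchair system computes confidence.</p>

```python
# Hypothetical outcome labels mapped to the weights proposed in the list above.
WEIGHTS = {
    "impeccable_placement": 1,
    "debatable_placement": -1,
    "debatable_omission": -1,
    "obvious_omission": -10,
    "obvious_irrelevant_placement": -50,
}


def confidence_score(counts: dict[str, int]) -> int:
    """Sum the reviewer-assigned outcomes, each multiplied by its weight."""
    return sum(WEIGHTS[outcome] * n for outcome, n in counts.items())


# Illustrative review of one tagged chapter (made-up counts, not real data).
review = {
    "impeccable_placement": 60,
    "debatable_placement": 5,
    "obvious_irrelevant_placement": 1,
}
print(confidence_score(review))  # 60 - 5 - 50 = 5
```

<p>In this made-up review, a single obviously irrelevant tag cancels out fifty impeccable ones, which is the point of the weighting.</p>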
<h2>2.  Usefulness!</h2>
<p>The next key target, for both editorial and technical stakeholders, is the <em>usefulness</em> of the tagging applied. Tags should be highly relevant in a domain-specific context, and they should drive better discoverability and linking. Primary care, genetics, surgery, and emergency care all take very different approaches to the same topics, and their tagging should reflect their uses. </p>
<p>The tagging system you are evaluating may have added tagged concepts that are tangential or irrelevant to the use model of the content, and such tags would not be capable of driving innovative site features (in many cases, tangential tagging actually <em>inhibits</em> the ability of new systems to work effectively). For example, it is a nice-to-have if your tagging system can recognize place names and person names, but if it misses or miscategorizes important topics like clinical trial names, it doesn’t matter how many people or places it can tag. (Clinical trial acronyms can be particularly tricky to tag; <a href="http://blog.silverchair.com/2010/01/26/searches-for-clinical-trials-we-can-do-better/" target="_blank">see our post</a> about them.)</p>
<h2>3.  Granularity</h2>
<p>Does the system still work with “documents” or can it identify topics down to the section/paragraph/figure/table/equation level? At Silverchair we work with many dense medical chapters that may cover more than 200 distinct topics, so we see it as a necessity for our tagging system to break those documents down into smaller parts in order to deliver precise packets of highly relevant information to our users.</p>
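<p>As a rough illustration of paragraph-level granularity, the Python sketch below tags each paragraph of a document separately instead of tagging the document as a whole. The topic vocabulary, the blank-line paragraph splitting, and the simple substring matching are all assumptions made for this example; a production tagger would be far more sophisticated.</p>

```python
# Hypothetical topic vocabulary: tag name -> trigger phrases (lowercase).
TOPICS = {
    "hypoxemia": ["hypoxemia", "hypoxaemia"],
    "hypercapnia": ["hypercapnia", "hypercarbia"],
}


def tag_paragraphs(document: str) -> list[tuple[int, set[str]]]:
    """Split on blank lines and return (paragraph index, tags) pairs."""
    results = []
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    for i, paragraph in enumerate(paragraphs):
        lowered = paragraph.lower()
        # A paragraph gets a tag if any of that tag's phrases occurs in it.
        tags = {tag for tag, phrases in TOPICS.items()
                if any(phrase in lowered for phrase in phrases)}
        results.append((i, tags))
    return results
```

<p>Because each paragraph carries its own tags, a site feature can link a reader to the one paragraph about a topic rather than to a whole chapter that mentions it somewhere.</p>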
<h2>4.  Control and Ongoing Improvement</h2>
<p>No system you select is going to be extremely accurate “out-of-the-box.” (I write that as a realist, not as a pessimist!) So during evaluation you must ask, “How easy is it to make impactful positive changes to the system?” This can take a variety of forms: some systems suggest manually selecting training documents for each topic or category (which can get onerous when you have 20,000 topics), some systems allow your software developers to go in and tinker with the code (you have data classification expert software developers, right?!?), and some systems allow you to load and use a taxonomy or thesaurus to aid in topic identification and tagging (which assumes a taxonomy/thesaurus exists or can be created for your domain).</p>
<p>At Silverchair, we work primarily in medicine, which is a taxonomy-rich domain with an ever-growing list of topics. For that reason, we’ve chosen the last method as our control and improvement strategy. Our editors update our <a href="http://semedica.com/cortex.aspx" target="_blank">Cortex</a> medical taxonomy and its related thesaurus every day to keep pace with the topics being written about and searched for. </p>
<h1>Summary</h1>
<p>If you choose a system that 1) is accurate enough to instill confidence in your editorial team, 2) is useful enough to drive meaningful new features and improvements, 3) classifies your data at a granular level, and 4) is flexible enough to allow explicit control and ongoing improvements―you’ve made a wise purchase!</p>
<br />Filed under: <a href='http://blog.silverchair.com/category/classificationtagging/'>classification/tagging</a>, <a href='http://blog.silverchair.com/category/semantic-enrichment/'>semantic enrichment</a> Tagged: <a href='http://blog.silverchair.com/tag/automated-tagging/'>automated tagging</a>, <a href='http://blog.silverchair.com/tag/classificationtagging/'>classification/tagging</a>, <a href='http://blog.silverchair.com/tag/cortex/'>Cortex</a>, <a href='http://blog.silverchair.com/tag/semantic-tagging/'>semantic tagging</a>, <a href='http://blog.silverchair.com/tag/tagmaster/'>Tagmaster</a>]]></content:encoded>
			<wfw:commentRss>http://blog.silverchair.com/2010/02/04/evaluation-of-automated-tagging-solutions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f98c3087939c2c744ccaa4a42b38d3e9?s=96&#38;d=http%3A%2F%2Fa.wordpress.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">Jake Zarnegar</media:title>
		</media:content>

		<media:content url="http://semedica.files.wordpress.com/2010/02/tagmaster_content_page1.gif?w=300" medium="image">
			<media:title type="html">View inside Semedica's Tagmaster</media:title>
		</media:content>
	</item>
		<item>
		<title>Internal Memory vs. External Memory</title>
		<link>http://blog.silverchair.com/2009/12/07/internal-memory-vs-external-memory/</link>
		<comments>http://blog.silverchair.com/2009/12/07/internal-memory-vs-external-memory/#comments</comments>
		<pubDate>Mon, 07 Dec 2009 19:39:09 +0000</pubDate>
		<dc:creator>Jake Zarnegar</dc:creator>
				<category><![CDATA[classification/tagging]]></category>
		<category><![CDATA[semantic enrichment]]></category>
		<category><![CDATA[computer memory]]></category>
		<category><![CDATA[memory]]></category>

		<guid isPermaLink="false">http://blog.silverchair.com/?p=330</guid>
		<description><![CDATA[As we were setting up a new external SAN (storage area network) on the Silverchair production web farm recently, the network engineer said something that caught my attention: “The web servers will be able to use the external SAN drives faster than their own internal memory.” At first that defied my expectations of “internal vs. [...]]]></description>
			<content:encoded><![CDATA[<p>As we were setting up a new external SAN (storage area network) on the Silverchair production web farm recently, the network engineer said something that caught my attention: “The web servers will be able to use the external SAN drives <strong><em>faster</em></strong> than their own internal memory.” At first that defied my expectations of “internal vs. external,” but when I thought about it more, it made perfect sense.</p>
<p>The web servers are designed to execute application logic, store session tracking data, handle user interaction input, and synthesize, parse, and display data from a variety of sources—they are logic processing engines that handle data storage only when necessary. On the other hand, the SAN has one purpose—to store a large amount of data and enable a super-efficient data delivery channel that rapidly responds to content requests from the web servers.</p>
<p>The more I thought about it, the more I realized it was a fitting metaphor for how humans work. We are fantastic logic processing engines. We parse, synthesize, analyze, and use data input from a variety of sources to perform creative problem solving. And most importantly to this metaphor, we only store data internally when absolutely necessary. In the present day, the comprehensiveness and ubiquity of the Internet have allowed us to store an unprecedented amount of collective memory in external sources and access it from wherever we may be.</p>
<p>To be clear, human use of external memory did not arrive with the Internet—it has been around since the beginning of civilization. We are used to storing memory in external sources and freeing up our internal resources. Papyrus eliminated the need to memorize long epic poems. Abaci eliminated the need to memorize multiplication tables. (<em>NB</em>: Don’t try telling that to a 2nd grade teacher.) In modern medicine, drug handbooks store dosage and safety information that is too complex for doctors to memorize <em>in</em> <em>toto</em>. Phone numbers stored in our mobile phones eliminate the need to memorize the phone numbers of friends. We even store memories in our friends and family—I recently asked my wife, “What was the name of that hotel we liked in Chicago?” She knew, and voila, I had accessed my external memory successfully.</p>
<p>Alas, my comparison of human activity to Silverchair’s web farm breaks down at a key point. In many cases, accessing our external memory is <em>not</em> fast and efficient. Currently the external memory sources of humans are not deployed as efficiently as a SAN. Internet content sources can be hard to access, store content in highly variable forms, require a special vocabulary or technique to query, and return data in a way that does not suit our purpose.</p>
<p>This is the fundamental problem that Silverchair’s Semedica division addresses with semantic enrichment of data sources. We’re organizing a specific external memory category (in our case, online medical and health care information) in a way that allows it to be accessed more quickly and to return data in the right form for efficient use by clinicians and researchers. The less data that health care workers need to store internally, the more of their “processing time” can be used toward envisioning creative solutions for preventing and curing diseases. That is something that the Internet cannot do. (Yet.)</p>
<br />Posted in classification/tagging, semantic enrichment Tagged: classification/tagging, computer memory, memory]]></content:encoded>
			<wfw:commentRss>http://blog.silverchair.com/2009/12/07/internal-memory-vs-external-memory/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f98c3087939c2c744ccaa4a42b38d3e9?s=96&#38;d=http%3A%2F%2Fa.wordpress.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">Jake Zarnegar</media:title>
		</media:content>

		<media:content url="http://img.zemanta.com/reblog_e.png?x-id=f9513dac-6816-4e9e-8fdf-f32ea02d43aa" medium="image">
			<media:title type="html">Reblog this post [with Zemanta]</media:title>
		</media:content>
	</item>
		<item>
		<title>NIH Makes Big Strides Toward Funding Clarity, But Still Could Be Better!</title>
		<link>http://blog.silverchair.com/2009/11/06/nih-makes-big-strides-toward-funding-clarity-but-still-could-be-better/</link>
		<comments>http://blog.silverchair.com/2009/11/06/nih-makes-big-strides-toward-funding-clarity-but-still-could-be-better/#comments</comments>
		<pubDate>Fri, 06 Nov 2009 15:33:18 +0000</pubDate>
		<dc:creator>Jake Zarnegar</dc:creator>
				<category><![CDATA[classification/tagging]]></category>
		<category><![CDATA[semantic enrichment]]></category>
		<category><![CDATA[taxonomy]]></category>
		<category><![CDATA[Agency for Healthcare Research and Quality (AHRQ)]]></category>
		<category><![CDATA[Grant funding]]></category>
		<category><![CDATA[National Institutes of Health (NIH)]]></category>
		<category><![CDATA[RePORT (Research Portfolio Online Reporting Tool)]]></category>

		<guid isPermaLink="false">http://blog.silverchair.com/?p=253</guid>
		<description><![CDATA[The NIH has rolled out their new RePORT (Research Portfolio Online Reporting Tool) web site for information on funding, grants, and NIH research. As someone who works on government grants and contracts, I’m happy with this new level of transparency and clarity as to what topics (and who!) are being funded. It is a big [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_257" class="wp-caption alignright" style="width: 190px"><a href="http://en.wikipedia.org/wiki/Apples_and_oranges"><img class="size-full wp-image-257" title="Apples_to_Oranges" src="http://semedica.files.wordpress.com/2009/11/apples_to_oranges.jpg?w=180&#038;h=124" alt="Apples to oranges comparison" width="180" height="124" /></a><p class="wp-caption-text">Image via Wikipedia</p></div>
<p>The NIH has rolled out their new <a href="http://report.nih.gov/" target="_blank">RePORT (Research Portfolio Online Reporting Tool) web site</a> for information on funding, grants, and NIH research. As someone who works on government grants and contracts, I’m happy with this new level of transparency and clarity as to what topics (and who!) are being funded. It is a big upgrade from the incumbent system, which was hard to navigate and understand.</p>
<p>The most useful area of the site to me is the <a href="http://report.nih.gov/rcdc/categories/" target="_blank">categorical spending section</a>. It really gives you an idea of NIH’s funding priorities—it offers over 200 categories of funding.</p>
<p>However, it still has ample room for improvement. Currently it is an alphabetical list that contains items that are hard to compare. Here are some example categories that are not equivalent in scope:</p>
<ul>
<li>Allergic Rhinitis (Hay Fever)</li>
<li>American Indians / Alaska Natives</li>
<li>Burden of Illness</li>
<li>Cancer</li>
<li>Cardiovascular</li>
<li>Clinical Trials</li>
<li>Conditions Affecting Unborn Children</li>
<li>Gene Therapy</li>
<li>Gene Therapy Clinical Trials</li>
<li>Genetic Testing</li>
<li>Genetics</li>
</ul>
<p>Some are very specific (hay fever), some are broad (cancer), some are ambiguous (cardiovascular), some take a completely different approach than the dominant disease/condition approach (American Indians/Alaska Natives), and some seem to be repetitive.</p>
<p>With a bit of work, this information could be turned from its current flat list into a multilevel taxonomy that allows users to slice it up in the ways that appeal to them (conditions or target populations, for example). Silverchair does this for the Agency for Healthcare Research and Quality on their <a href="http://psnet.ahrq.gov/" target="_blank">PSNet</a> patient safety clearinghouse. A small amount of classification work can go a long way in creating valuable new features. NIH has proven that with their RePORT upgrade, but I’d like to see them go further.</p>
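<p>As a rough sketch of the idea, the Python below regroups a few of the flat category names from the list above into a two-level structure. The facet names and the placement of each category are my own hypothetical choices for illustration; they are not NIH’s scheme or the PSNet taxonomy.</p>

```python
from collections import defaultdict

# Hypothetical facet assignments for a few categories from the list above.
# The facet names and placements are illustrative, not NIH's own scheme.
FACETS = {
    "Allergic Rhinitis (Hay Fever)": ("Conditions", "Immune/Allergic"),
    "Cancer": ("Conditions", "Oncology"),
    "American Indians / Alaska Natives": ("Target populations", "Ethnic groups"),
    "Clinical Trials": ("Research methods", "Trials"),
    "Gene Therapy Clinical Trials": ("Research methods", "Trials"),
    "Genetic Testing": ("Research methods", "Genetics"),
}


def build_taxonomy(facets):
    """Group the flat category names into facet -> subgroup -> categories."""
    tree = defaultdict(lambda: defaultdict(list))
    for category, (facet, subgroup) in sorted(facets.items()):
        tree[facet][subgroup].append(category)
    return tree


taxonomy = build_taxonomy(FACETS)
for facet, subgroups in taxonomy.items():
    print(facet)
    for subgroup, categories in subgroups.items():
        print("  " + subgroup + ": " + ", ".join(categories))
```

<p>Once the categories sit in a structure like this, a user can browse by condition or by target population, and near-duplicates such as the two clinical trial entries end up side by side where they are easy to compare.</p>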
<p>I’d be happy to help out with the NIH site, but I’m not sure what category that would be funded under…</p>
<br />Posted in classification/tagging, semantic enrichment, taxonomy Tagged: Agency for Healthcare Research and Quality (AHRQ), classification/tagging, Grant funding, National Institutes of Health (NIH), RePORT (Research Portfolio Online Reporting Tool), taxonomy]]></content:encoded>
			<wfw:commentRss>http://blog.silverchair.com/2009/11/06/nih-makes-big-strides-toward-funding-clarity-but-still-could-be-better/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f98c3087939c2c744ccaa4a42b38d3e9?s=96&#38;d=http%3A%2F%2Fa.wordpress.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">Jake Zarnegar</media:title>
		</media:content>

		<media:content url="http://semedica.files.wordpress.com/2009/11/apples_to_oranges.jpg" medium="image">
			<media:title type="html">Apples_to_Oranges</media:title>
		</media:content>

		<media:content url="http://img.zemanta.com/reblog_e.png?x-id=11b89d62-1cff-4672-917b-e96703a67171" medium="image">
			<media:title type="html">Reblog this post [with Zemanta]</media:title>
		</media:content>
	</item>
		<item>
		<title>The “Simple” Payoff</title>
		<link>http://blog.silverchair.com/2009/09/18/the-%e2%80%9csimple%e2%80%9d-payoff/</link>
		<comments>http://blog.silverchair.com/2009/09/18/the-%e2%80%9csimple%e2%80%9d-payoff/#comments</comments>
		<pubDate>Fri, 18 Sep 2009 16:33:41 +0000</pubDate>
		<dc:creator>Jake Zarnegar</dc:creator>
				<category><![CDATA[semantic enrichment]]></category>
		<category><![CDATA[taxonomy]]></category>
		<category><![CDATA[Amazon.com]]></category>
		<category><![CDATA[medical terminology]]></category>
		<category><![CDATA[simplicity]]></category>
		<category><![CDATA[UMLS]]></category>

		<guid isPermaLink="false">http://blog.silverchair.com/?p=175</guid>
		<description><![CDATA[Top-selling books for the search phrase “medical terminology” on Amazon:

Medical Terminology: A Short Course
Quick Medical Terminology
Medical Terminology: The Basics
Medical Terminology Simplified
Medical Terminology for Dummies

Anyone else sensing a theme? Considering that the Unified Medical Language System (UMLS) has more than 2,000,000 terms, I’m not surprised simplicity is in demand.
For publishers, taking measures to make medical terminology [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.amazon.com/Medical-Terminology-Dummies-Health-Fitness/dp/0470279656/ref=sr_1_8?ie=UTF8&amp;s=books&amp;qid=1253290685&amp;sr=1-8" target="_blank"><img class="size-full wp-image-176 alignright" title="Medical Terminology for Dummies" src="http://semedica.files.wordpress.com/2009/09/medicalterminologyfordummies.jpg?w=192&#038;h=192" alt="Medical Terminology for Dummies" width="192" height="192" /></a>Top-selling books for the search phrase “medical terminology” on Amazon:</p>
<ul>
<li>Medical Terminology: A Short Course</li>
<li>Quick Medical Terminology</li>
<li>Medical Terminology: The Basics</li>
<li>Medical Terminology Simplified</li>
<li>Medical Terminology for Dummies</li>
</ul>
<p>Anyone else sensing a theme? Considering that the Unified Medical Language System (UMLS) has more than 2,000,000 terms, I’m not surprised simplicity is in demand.</p>
<p>For publishers, taking measures to make medical terminology (and life) easier for health care professionals has a direct payback [re-read list, above]. Consider it mission critical!</p>
<br />Posted in semantic enrichment, taxonomy Tagged: Amazon.com, medical terminology, simplicity, UMLS]]></content:encoded>
			<wfw:commentRss>http://blog.silverchair.com/2009/09/18/the-%e2%80%9csimple%e2%80%9d-payoff/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f98c3087939c2c744ccaa4a42b38d3e9?s=96&#38;d=http%3A%2F%2Fa.wordpress.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">Jake Zarnegar</media:title>
		</media:content>

		<media:content url="http://semedica.files.wordpress.com/2009/09/medicalterminologyfordummies.jpg" medium="image">
			<media:title type="html">Medical Terminology for Dummies</media:title>
		</media:content>
	</item>
		<item>
		<title>Finding Hidden Text With a Specialized Thesaurus</title>
		<link>http://blog.silverchair.com/2009/09/14/finding-hidden-text-with-a-specialized-thesaurus/</link>
		<comments>http://blog.silverchair.com/2009/09/14/finding-hidden-text-with-a-specialized-thesaurus/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 13:26:10 +0000</pubDate>
		<dc:creator>Jake Zarnegar</dc:creator>
				<category><![CDATA[linking]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[taxonomy]]></category>
		<category><![CDATA[equivalents]]></category>
		<category><![CDATA[thesaurus]]></category>

		<guid isPermaLink="false">http://blog.silverchair.com/?p=151</guid>
		<description><![CDATA[When good authors write, they choose the terminology they want to describe the topics they are addressing and use that terminology consistently throughout the text. This, of course, is good for readers in terms of internal clarity and consistency.
But this authoring strategy is distinctly disadvantageous to discovery (search) and integration (linking) in modern web applications. [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_155" class="wp-caption alignright" style="width: 231px"><a href="http://en.wikipedia.org/wiki/File:Treasure-Island-map.jpg" target="_blank"><img class="size-full wp-image-155" title="Map created by Robert Louis Stevenson in Treasure Island" src="http://semedica.files.wordpress.com/2009/09/treasure-island-map.jpg?w=221&#038;h=359" alt="Map created by Robert Louis Stevenson in Treasure Island (image from Wikipedia)" width="221" height="359" /></a><p class="wp-caption-text">Map created by Robert Louis Stevenson in Treasure Island (image from Wikipedia)</p></div>
<p>When good authors write, they choose the terminology they want to describe the topics they are addressing and use that terminology consistently throughout the text. This, of course, is good for readers in terms of internal clarity and consistency.</p>
<p>But this authoring strategy is distinctly disadvantageous to discovery (search) and integration (linking) in modern web applications. Why? Because every time an author makes a terminology choice, they EXCLUDE other equivalent options. These excluded options could include terminology that other authors have chosen or that potential readers prefer. I’m not blaming the authors, of course; their writing would be nonsense if they included all equivalent choices in their text.</p>
<p>So how do you deal with these missing options? Thesauri to the rescue! Every web search, linking, and categorization system should employ some form of thesaurus behind the scenes. And in specialized areas like medicine, you’ll need a specialized thesaurus rather than a basic broad one. This thesaurus should include synonyms, acronyms, abbreviations, and jargon, and should be based on real-world authoring and searching behavior (rather than academic nit-picking).</p>
<p>In essence, a thesaurus expands the author’s original text into much richer data for automated searching and linking algorithms. Let’s look at an example:</p>
<p style="padding-left:30px;"><strong>ACTUAL TEXT:</strong> Chemoreceptors in the carotid bodies and medulla are activated by hypoxemia, acute hypercapnia, and acidemia.</p>
<p style="padding-left:30px;"><strong>EXPANDED TEXT<sup>1</sup>:</strong> Chemoreceptors in the carotid bodies <span style="color:#ff0000;">(carotid glomus, glomera carotica, glomus caroticum, glomus caroticus)</span> and medulla <span style="color:#ff0000;">(adrenal medulla, medulla oblongata, glandula suprarenalis, suprarenal medulla, adm, metepencephalon, medullary, myelencephalon)</span> are activated by hypoxemia <span style="color:#ff0000;">(hypoxaemia, arterial hypoxemia)</span>, acute hypercapnia <span style="color:#ff0000;">(blood carbon dioxide increased, blood co2 increased, carbon dioxide retention, carbon dioxide, increased level, hypercapnemia, hypercapnaemia, hypercarbia, pco2 increased on arterial blood gas, elevated pco2, retention carbon dioxide, serum carbon dioxide increased)</span>, and acidemia <span style="color:#ff0000;">(acidaemia).</span></p>
<p>The actual text as written was 15 words. The expanded text was 71 words, or approximately 4.7 times as long. Humans read the first sentence, and machines read the second.</p>
<p>With a good thesaurus, a search will match this text no matter which equivalent term the user chooses (“hypercapnia” vs. “hypercarbia,” for example).</p>
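<p>To make the mechanics concrete, here is a minimal Python sketch of thesaurus-backed matching. It is only an illustration, not how Cortex or any particular search engine works: the tiny <code>SYNONYM_SETS</code> list and the helper names (<code>build_lookup</code>, <code>expand_text</code>, <code>matches</code>) are hypothetical, and the synonyms are copied from the expanded text above.</p>

```python
import re

# Hypothetical mini-thesaurus: each set holds terms treated as equivalent.
# The entries are a small subset of the expanded-text example in this post.
SYNONYM_SETS = [
    {"hypercapnia", "hypercarbia", "hypercapnemia", "carbon dioxide retention"},
    {"hypoxemia", "hypoxaemia", "arterial hypoxemia"},
    {"acidemia", "acidaemia"},
    {"carotid bodies", "carotid glomus", "glomus caroticum"},
]


def build_lookup(synonym_sets):
    """Map every term to the full set of terms equivalent to it."""
    lookup = {}
    for group in synonym_sets:
        for term in group:
            lookup[term.lower()] = group
    return lookup


def contains_term(text, term):
    """True if term occurs in text as a whole word or phrase, ignoring case."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def expand_text(text, lookup):
    """Return every thesaurus term the text covers once equivalents are added."""
    expanded = set()
    for term, group in lookup.items():
        if contains_term(text, term):
            expanded.update(group)
    return expanded


def matches(query, text, lookup):
    """True if the query, or an equivalent of it, is covered by the text."""
    query = query.strip().lower()
    if contains_term(text, query):
        return True
    return query in expand_text(text, lookup)


TEXT = ("Chemoreceptors in the carotid bodies and medulla are activated "
        "by hypoxemia, acute hypercapnia, and acidemia.")
LOOKUP = build_lookup(SYNONYM_SETS)

print(matches("hypercapnia", TEXT, LOOKUP))  # the author's own term
print(matches("hypercarbia", TEXT, LOOKUP))  # an equivalent the author excluded
```

<p>A production system would more likely expand terms once at indexing time and store them alongside the text, rather than rescanning the text for every query as this sketch does.</p>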
<p>Are readers finding what they want on your web site this easily?</p>
<p><em><sup>1</sup></em><em>Thesaurus Source: Silverchair’s Cortex taxonomy—with references to SNOMED, Read Codes, MeSH, Digital Anatomist, NCI Thesaurus, NeuroNames Brain Hierarchy, MedDRA, WHO Adverse Reaction Terminology, OMIM, DXplain, CRISP Thesaurus, Clinical Problem Statements, and COSTART.</em></p>
<br />Posted in linking, search, taxonomy Tagged: equivalents, linking, search, taxonomy, thesaurus]]></content:encoded>
			<wfw:commentRss>http://blog.silverchair.com/2009/09/14/finding-hidden-text-with-a-specialized-thesaurus/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f98c3087939c2c744ccaa4a42b38d3e9?s=96&#38;d=http%3A%2F%2Fa.wordpress.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">Jake Zarnegar</media:title>
		</media:content>

		<media:content url="http://semedica.files.wordpress.com/2009/09/treasure-island-map.jpg" medium="image">
			<media:title type="html">Map created by Robert Louis Stevenson in Treasure Island</media:title>
		</media:content>
	</item>
	</channel>
</rss>