
Archive for February 2009

How to describe HTC?


Here is what I am currently saying:

HighTechCville is a website that attempts to aggregate, in one place, information about people and organizations related to “high tech” industries in the Charlottesville, Virginia area. HighTechCville attempts to facilitate connections between people and organizations to support the growth of technology-related businesses and, therefore, the quality of life in Charlottesville.

HighTechCville is also a research project focused on enabling Web 3.0. Web 1.0 was a read-only environment, with people visiting sites for information. Web 2.0 was the Read/Write web, with people blogging, using instant messaging, and joining Facebook. HighTechCville attempts to investigate whether Web 3.0 is the Read/Write/Share web, where data from multiple sources is easily combined and reused in new, unforeseen ways.

So what do you think? Help me improve my elevator pitch!


Written by Eric

February 26, 2009 at 10:20 am

Posted in Uncategorized

HTC moving servers



Just a heads-up that we are moving http://www.hightechcville.com from one server to another and revamping some of the underpinnings. Since this is just a side project, the site may be down for most of the week!


Written by Eric

February 25, 2009 at 4:43 pm

Posted in Uncategorized

LinkedIn terrified of OpenCalais?


Based on some basic testing, it seems that LinkedIn blocks all traffic from SemanticProxy.com. I wanted to see what my profile page http://www.linkedin.com/in/epugh would return when fed through the OpenCalais entity extraction engine, but instead I got back connection errors.

This actually makes a lot of sense. LinkedIn’s public profile pages exist to be indexed by search engines and to drive more traffic to the site, but also to convince users to join its walled network. So LinkedIn puts some information out there. But to make sense of the data, you need to join the network so you can run queries and see the underlying meaning behind the text.

But as tools like OpenCalais proliferate, other folks can add meaning to these profile pages themselves, which reduces the need to join the walled garden.

For my part, with a couple of lines of Ruby code, this is what I extracted (type, value, relevancy):

Organization: Apache Software Foundation, 0.268
City: Charlottesville, 0.286
Technology: Information Technology, 0.344
Position: Services Consultant, 0.302
Technology: Apache, 0.268
Person: Eric Pugh, 0.845
Company: LinkedIn Corporation, 0.724
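The results above are (type, value, relevancy) triples. The original Ruby code isn't shown, so here is a minimal sketch of one way to work with output like this. It assumes the results have already been flattened into lines of the form `Type: Value, score`. It parses those lines and keeps only the entities above a relevancy cutoff. The `Entity` struct, the helper names, and the 0.3 cutoff are all hypothetical choices for illustration.

```ruby
# One extracted entity: its type, its value, and the relevancy score it was assigned.
Entity = Struct.new(:type, :value, :relevancy)

# Parse lines like "City: Charlottesville, 0.286" into Entity structs.
# Splits on the first ": " and the last ", " so values that contain commas survive.
def parse_entities(text)
  text.each_line.filter_map do |line|
    type, rest = line.strip.split(": ", 2)
    next unless rest # skip blank or malformed lines

    value, _sep, score = rest.rpartition(", ")
    Entity.new(type, value.strip, Float(score))
  end
end

# Keep entities at or above a (hypothetical) relevancy cutoff, most relevant first.
def relevant(entities, cutoff: 0.3)
  entities.select { |e| e.relevancy >= cutoff }.sort_by { |e| -e.relevancy }
end

RESULTS = <<~TXT
  Organization: Apache Software Foundation, 0.268
  City: Charlottesville, 0.286
  Technology: Information Technology, 0.344
  Position: Services Consultant, 0.302
  Technology: Apache, 0.268
  Person: Eric Pugh, 0.845
  Company: LinkedIn Corporation, 0.724
TXT

p relevant(parse_entities(RESULTS)).map(&:value)
# => ["Eric Pugh", "LinkedIn Corporation", "Information Technology", "Services Consultant"]
```

With a 0.3 cutoff, Charlottesville (0.286) and both Apache entries (0.268) drop out. That shows how low some of the scores are for things that are plainly on the page.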

What OpenCalais missed, and what I would have liked to see, was the My Interests section. That section might have returned industry terms such as agile practices, Ruby on Rails, open source, unit testing, Scrum, and Selenium, as well as the websites listed.

Written by Eric

February 25, 2009 at 10:11 am

Entity extraction of URLs made easy… partly.


Thanks to Ed Summers at the Library of Congress for his post on SemanticProxy. SemanticProxy offers a dead-simple API for feeding URLs to the OpenCalais entity extraction engine.

For those of you not familiar with OpenCalais, it is a “rules”-based entity extraction engine. It knows how to find certain bits of information, like the name of a person, in free-form text. OpenCalais is sponsored by Thomson Reuters, so most of the rules are based around the kind of text you would find in a newspaper. For example, “GM is in talks with Chrysler for a merger” would give you the companies GM and Chrysler, as well as a relationship called “merger_talks” between the two.
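To make the idea of a “rule” concrete, here is a toy illustration. OpenCalais's real rules are not shown here and are far richer than this. In the toy, a single hand-written regular expression pulls both companies and a `merger_talks` relationship out of the example sentence. The pattern, the `extract` method, and the `Extraction` struct are all hypothetical.

```ruby
# A capitalized run of words, e.g. "GM" or "General Motors" (a crude stand-in for a company name).
NAME = /[A-Z][\w&.]*(?:\s+[A-Z][\w&.]*)*/

# Toy rule: "<Company> is in talks with <Company> for a merger".
MERGER_TALKS = /\b(?<a>#{NAME})\s+is\s+in\s+talks\s+with\s+(?<b>#{NAME})\s+for\s+a\s+merger\b/

Extraction = Struct.new(:companies, :relations)

# Apply the single rule to free-form text and collect entities plus relationships.
def extract(text)
  companies = []
  relations = []
  text.scan(MERGER_TALKS) do
    m = Regexp.last_match # scan sets the match data for each hit
    companies.push(m[:a], m[:b])
    relations << [:merger_talks, m[:a], m[:b]]
  end
  Extraction.new(companies.uniq, relations)
end

result = extract("GM is in talks with Chrysler for a merger")
p result.companies # => ["GM", "Chrysler"]
p result.relations # => [[:merger_talks, "GM", "Chrysler"]]
```

A rule like this only fires on phrasing it anticipates. That is why an engine tuned for newswire text can miss the place and time details in event announcements.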

I was hoping to use OpenCalais to extract place, time, and subject information from all those free-form event announcements. Unfortunately, OpenCalais doesn’t have the rules to pull out those types of entities. It does find the people listed in an event, but that’s it.

However, I did find it very useful for finding more people information for HTC. A new group in Charlottesville called FirstWednesdays has started, and the comments on their “Find Me” page hold a wealth of data. I am just splitting the page’s DOM into individual comments, feeding each one to OpenCalais, and getting back a person and a couple of relevant links. It’s working great.
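Splitting a page into per-comment chunks might look something like the sketch below. It makes several assumptions. The input is well-formed XHTML, and each comment sits in a `<div class="comment">` wrapper; that wrapper is hypothetical, since the real FirstWednesdays markup isn't shown. The sketch also relies on Ruby's bundled REXML parser. The OpenCalais call itself is left out: each returned string is simply the chunk of text that would be sent.

```ruby
require 'rexml/document'

# Recursively collect the text under a node, ignoring comments and other non-text nodes.
def all_text(node)
  node.children.map do |child|
    case child
    when REXML::Text    then child.value
    when REXML::Element then all_text(child)
    else ""
    end
  end.join(" ")
end

# Return one whitespace-normalized string per comment.
# The "comment" class name is an assumption about the page's markup.
def comment_texts(xhtml)
  doc = REXML::Document.new(xhtml)
  REXML::XPath.match(doc, "//div[@class='comment']").map do |div|
    all_text(div).split.join(" ")
  end
end

# Illustrative placeholder markup, not the real page.
PAGE = <<~HTML
  <html><body>
    <div class="comment"><p>Jane Doe, <a href="http://example.com/">example.com</a></p></div>
    <div class="comment"><p>John Roe, Charlottesville</p></div>
  </body></html>
HTML

comment_texts(PAGE).each do |chunk|
  # Each chunk would be sent to OpenCalais here (that call is not shown in the post).
  puts chunk
end
```

Sending one comment at a time keeps each extraction small. It also means any person OpenCalais finds can be tied back to the comment, and the links, it came from.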

So, between SemanticProxy for whole pages of raw content and OpenCalais for specific chunks of text, I expect to simplify the process of adding new data to HighTechCville.

Written by Eric

February 24, 2009 at 5:48 pm

Posted in Uncategorized

Add HTC to your Firefox list of search engines


Find yourself searching HighTechCville often? Now you can add HTC to your list of search engines in Firefox through the magic of the OpenSearch.org API. Just browse to http://www.hightechcville.com using Firefox, click the small blue arrow by your search bar, and choose “Add HTCFind”.


Add a shortcut tag like “htc” and then type into your location bar “htc: Eric Pugh” to find me!
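Firefox offers that “Add” entry because the page advertises an OpenSearch description document through a `<link rel="search">` tag. Below is a minimal sketch of generating both pieces in Ruby. `ShortName` matches the “HTCFind” name above. The search URL template and the `/opensearch.xml` path are hypothetical, since the post doesn't show the real ones.

```ruby
require 'rexml/document'

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Build a minimal OpenSearch 1.1 description document (ShortName, Description, Url).
def opensearch_description(short_name:, description:, template:)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new("1.0", "UTF-8")
  root = doc.add_element("OpenSearchDescription", "xmlns" => OPENSEARCH_NS)
  root.add_element("ShortName").text = short_name
  root.add_element("Description").text = description
  # {searchTerms} is the OpenSearch placeholder the browser replaces with the query.
  root.add_element("Url", "type" => "text/html", "template" => template)
  out = String.new
  doc.write(out)
  out
end

# Autodiscovery tag for the site's <head>; this is what lets Firefox offer "Add HTCFind".
def autodiscovery_link(title, href)
  %(<link rel="search" type="application/opensearchdescription+xml" title="#{title}" href="#{href}">)
end

DESCRIPTION_XML = opensearch_description(
  short_name: "HTCFind",
  description: "Search HighTechCville",
  template: "http://www.hightechcville.com/search?q={searchTerms}" # hypothetical search URL
)

puts DESCRIPTION_XML
puts autodiscovery_link("HTCFind", "/opensearch.xml") # hypothetical path
```

The description document would be served at the linked path. Once a user adds it, the browser fills `{searchTerms}` with whatever they type after the keyword.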

Written by Eric

February 4, 2009 at 8:07 pm

Posted in Uncategorized