Archive for the ‘Uncategorized’ Category
Google now supports microformats and RDFa
You know you are getting traction when the Big Guy joins in… I’ll be keeping an eye on when Google indexes HTC again to see if the results start picking up the hCard data listed for each person and organization.
Eric
Where in the World is HighTechCville?
How to describe HTC?
Here is what I am currently saying:
HighTechCville is a website that attempts to aggregate, in a single place, information about people and organizations related to “high tech” industries in the Charlottesville, Virginia geographical area. HighTechCville attempts to facilitate connections between people and organizations to support the growth of technology-related businesses, and therefore the quality of life, in Charlottesville.
HighTechCville is also a research project focusing on enabling Web 3.0. Where Web 1.0 was a read-only environment with people visiting sites for information, and Web 2.0 was the Read/Write web, with people blogging, instant messaging, and using Facebook, HighTechCville attempts to investigate whether Web 3.0 is the Read/Write/Share web, where data from multiple sources is easily combined and reused in new, unforeseen ways.
So what do you think? Help me improve my elevator pitch!
HTC moving Servers
Folks,
Just a heads up that we are moving www.hightechcville.com from one server to another, and revamping some underpinnings, so the site may be down for most of the week since this is just a side project!
Eric
LinkedIn terrified of OpenCalais?
Based on some basic testing it seems like LinkedIn blocks any traffic from the SemanticProxy.com site. I wanted to see what my page http://www.linkedin.com/in/epugh would render when fed through the OpenCalais entity extraction engine, and instead I get back connection errors.
This actually makes a lot of sense. LinkedIn’s public profile pages are there to be indexed by search engines, to drive more traffic to their site, and to convince users to join their walled network. So they put some information out there. But to make sense of the data, you need to join the network so you can run queries and see the underlying meaning behind the text.
But with tools like OpenCalais proliferating, other folks can add meaning to these profile pages, which reduces the need to join the walled garden.
For my part, a couple lines of Ruby code and this is what I extracted (type, value, relevancy):
Organization: Apache Software Foundation, 0.268
City: Charlottesville, 0.286
Technology: Information Technology, 0.344
Position: Services Consultant, 0.302
Technology: Apache, 0.268
Person: Eric Pugh, 0.845
Company: LinkedIn Corporation, 0.724
What OpenCalais missed that I would have liked was the My Interests section, which might have returned some industry terms such as agile practices, ruby on rails, open source, unit testing, scrum, selenium, and the websites listed.
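The (type, value, relevancy) lines above are easy to post-process. Here is a minimal sketch, not the original extraction code (which is not shown in this post), that parses lines in that shape and keeps only the stronger matches. The `Entity` struct, the `parse_entities` helper, and the 0.3 cutoff are all assumptions made for illustration.

```ruby
# One extracted entity: its type, its text, and the relevancy score.
Entity = Struct.new(:type, :value, :relevance)

# The listing from the post, copied as plain text.
RAW = <<~LIST
  Organization: Apache Software Foundation, 0.268
  City: Charlottesville, 0.286
  Technology: Information Technology, 0.344
  Position: Services Consultant , 0.302
  Technology: Apache, 0.268
  Person: Eric Pugh, 0.845
  Company: LinkedIn Corporation, 0.724
LIST

# Turns "Type: Value, 0.123" lines into Entity objects, skipping anything
# that does not match that shape.
def parse_entities(text)
  pattern = /\A(?<type>[^:]+):\s*(?<value>.+?)\s*,\s*(?<relevance>\d+\.\d+)\z/
  text.each_line.map do |line|
    match = line.strip.match(pattern)
    next unless match

    Entity.new(match[:type], match[:value], match[:relevance].to_f)
  end.compact
end

entities = parse_entities(RAW)

# Hypothetical cutoff: keep entities scoring at least 0.3, best first.
strong = entities.select { |e| e.relevance >= 0.3 }
                 .sort_by { |e| -e.relevance }

strong.each do |e|
  puts format('%-14s %-28s %.3f', e.type, e.value, e.relevance)
end
```

A cutoff like this is a blunt tool: it would drop Charlottesville (0.286), which is clearly correct, so the right threshold probably depends on the entity type.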
Entity Extraction of URLs made easy…. Partly.
Thanks to Ed Summers at the Library of Congress for his post on SemanticProxy. SemanticProxy offers a dead simple API for feeding URLs to the OpenCalais entity extraction engine.

For those of you not familiar with OpenCalais, it is a “rules” based Entity Extraction engine that knows how to find certain bits of information, like the name of a person, in free form text. OpenCalais is sponsored by Thomson Reuters, so most of the rules are based around the text you would find in a newspaper. For example, “GM is in talks with Chrysler for a merger” would give you the companies GM and Chrysler, as well as a relationship called “merger_talks” between the two.
I was hoping to use OpenCalais to extract place, time, and subject information for all those free form event announcements. Unfortunately OpenCalais doesn’t have the rules to pull that type of entity out. It does find the people listed in an event, but that’s it.
However, I did find it very useful for gathering more people information for HTC. A new group in Charlottesville called FirstWednesdays has started, and the comments on their “Find Me” page hold a wealth of data. I am just splitting up the DOM on each comment, feeding each one to OpenCalais, and getting back a person and a couple of relevant links. It’s working great.
So between SemanticProxy for pages of raw content and using OpenCalais for specific chunks of text I expect to simplify the process of adding new data to HighTechCville.
Add HTC to your Firefox list of search engines
Find yourself searching HighTechCville often? Now you can add HTC to your list of search engines in Firefox through the magic of the OpenSearch.org API. Just browse to http://www.hightechcville.com using Firefox, click the small blue arrow by your search bar, and choose “Add HTCFind”:

Add a shortcut tag like “htc” and then type into your location bar “htc: Eric Pugh” to find me!
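For anyone wanting to do the same on their own site, the piece Firefox reads is an OpenSearch description document. Below is a rough sketch of generating one in Ruby. Only the “HTCFind” name comes from this post; the `opensearch_description` helper, the description text, and especially the `/search?q=` URL template are assumptions, since the real query path isn’t given here.

```ruby
# Builds an OpenSearch 1.1 description document as a string.
# String#encode(xml: :text) escapes &, < and > for element content;
# String#encode(xml: :attr) escapes and wraps the value in double quotes.
def opensearch_description(short_name:, description:, template:)
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
      <ShortName>#{short_name.encode(xml: :text)}</ShortName>
      <Description>#{description.encode(xml: :text)}</Description>
      <Url type="text/html" template=#{template.encode(xml: :attr)}/>
    </OpenSearchDescription>
  XML
end

xml = opensearch_description(
  short_name: 'HTCFind',
  description: 'Search HighTechCville',
  # Hypothetical search URL: {searchTerms} is the OpenSearch placeholder
  # the browser replaces with whatever the user typed.
  template: 'http://www.hightechcville.com/search?q={searchTerms}'
)

puts xml
```

The browser finds this document through a `<link rel="search" type="application/opensearchdescription+xml">` tag in the page head, with its `href` pointing at wherever the XML is served.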
People and Org Listing pages should be faster..
I’ve gone down the slippery path of caching the listing pages for people and organizations, so they should render much faster (well, except for the first person to hit them).
I’ve also tuned a bit when we include the SIMILE Timeline JavaScript, as that always seemed to load slowly. It now only comes up on pages that need it, like the browse recent events page.
If you see out of date information, or any other oddness, please drop me a line!
Now you can download contact information!
I’m rolling out the ability to download contact information for people and organizations into your address book, whether that is AddressBook.app on the Mac or Outlook on the PC. Through the magic of a really neat Technorati service, the hCard information that is microformatted for a person or an organization is converted to the standard vCard format used by most address book applications.
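To give a feel for what the conversion produces, here is a minimal sketch of writing a vCard 3.0 record by hand in Ruby. It is not how the Technorati service works internally (I haven’t seen its code); the `to_vcard` and `escape_vcard` helpers and the hash keys are made up for illustration, and it assumes the fields have already been pulled out of the hCard markup.

```ruby
# Escapes the characters vCard treats specially in text values:
# backslash, semicolon and comma get a leading backslash, and a real
# newline becomes the two characters backslash + n.
def escape_vcard(value)
  value.to_s
       .gsub(/[\\;,]/) { |ch| "\\#{ch}" }
       .gsub("\n") { '\n' }
end

# Builds a vCard 3.0 string from a hash of already-extracted hCard fields.
# Only :family_name, :given_name and :formatted_name are required here;
# :org is optional.
def to_vcard(contact)
  family = escape_vcard(contact[:family_name])
  given  = escape_vcard(contact[:given_name])

  lines = [
    'BEGIN:VCARD',
    'VERSION:3.0',
    "N:#{family};#{given};;;",
    "FN:#{escape_vcard(contact[:formatted_name])}"
  ]
  lines << "ORG:#{escape_vcard(contact[:org])}" if contact[:org]
  lines << 'END:VCARD'

  # vCard lines are terminated with CRLF
  lines.join("\r\n") + "\r\n"
end

puts to_vcard({ given_name: 'Eric', family_name: 'Pugh',
                formatted_name: 'Eric Pugh' })
```

A real converter also has to handle addresses, phone numbers, and line folding, which is a good reason to lean on an existing service rather than roll your own.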
Also, I’ve taken advantage of the Yahoo Geocoder service so that when we look up a lat and lng for mapping an organization based on a free form address, we also now parse out the street, zip, state, and country and store them! Hopefully this will lead to cleaner information!
HTC, now in a faster version!
One of the bits of feedback I got doing the Neon Guild presentation a couple weeks ago is that the site was kinda slow, and the uptime rate was pretty bad! I’ll blame this on it being mostly a research project, but now that I’ve shared it with the Guild, I realized I better look into this.
A couple of changes, from big to little have been made:
- Reducing # of SQL queries to generate a page. It used to be that the common pages would require up to a couple hundred SQL queries to get all the data; now it’s a handful.
- Caching the Blog section on the homepage. The RSS feed for this blog was pulled into the homepage every time someone visited. This obviously was inefficient, and added to how slow the site was. I am now caching the content, and using the ETag header from the RSS feed to see if I need to update the content. By the way, a lot of credit for making this visible goes to New Relic’s Rails Performance Monitor tool.
- Background jobs are now more “backgroundy” and shouldn’t take up so many resources. They are also running more reliably, and I am able to monitor them through the Job Log interface. You too can monitor them if you join the site!
- And a little change: on an organization page we would query Yahoo for the GPS coordinates of Charlottesville every time. I realized that since Charlottesville isn’t likely to be moving, barring a major Ike or Katrina hurricane, I could probably hard code the coordinates to 38.032125, -78.477519.
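The ETag trick from the list above is worth a quick illustration. This is a minimal sketch of the general technique (a conditional GET with `If-None-Match`), not the code running on HTC; the `CachedFeed` class and its injectable `fetcher` are assumptions of mine, and the cache here lives in memory only.

```ruby
require 'net/http'
require 'uri'

# Keeps a copy of a feed and re-downloads it only when the server
# reports that the ETag no longer matches.
class CachedFeed
  attr_reader :body, :etag

  # fetcher is any callable taking (url, etag) and returning
  # [status_code, etag, body]. It defaults to a Net::HTTP request and
  # exists mainly so the class can be exercised without a network.
  def initialize(url, fetcher: nil)
    @url = url
    @fetcher = fetcher || method(:http_fetch)
    @etag = nil
    @body = nil
  end

  # Returns the current feed body, refreshing the cache on a 200.
  def refresh
    status, etag, body = @fetcher.call(@url, @etag)
    if status == 200
      @etag = etag
      @body = body
    end
    # On 304 Not Modified (or any other status) keep the cached copy.
    @body
  end

  private

  # Sends If-None-Match when we already hold an ETag, so an unchanged
  # feed comes back as a bodiless 304 instead of a full download.
  def http_fetch(url, etag)
    uri = URI.parse(url)
    request = Net::HTTP::Get.new(uri)
    request['If-None-Match'] = etag if etag
    response = Net::HTTP.start(uri.host, uri.port,
                               use_ssl: uri.scheme == 'https') do |http|
      http.request(request)
    end
    [response.code.to_i, response['ETag'], response.body]
  end
end
```

Calling `CachedFeed.new(feed_url).refresh` on each homepage render would still cost one HTTP round trip, so in practice you would probably pair this with a time-based check as well.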