HighTechCville

Just another WordPress.com weblog

I expected this… But just not this soon…

with one comment

So I’ve been expecting to run into the limitations of an RDBMS for a while. For example, in my models a Person and an Organization each have only one related image. But I’ve recently run into a person, Charles Knight, and an organization, Phthisis Diagnostics, which each turn out to have two accounts on the YouNoodle site with different logo images. As a result, the YouNoodle indexer was constantly reporting that a change had been made (“1 person updated. 1 organization updated.”) because the images had differing URLs! For Phthisis Diagnostics both logos were the same image, so I started putting in a detector to see whether the images actually differed, and then to use the differing image. But Charles actually has two different images! So now I am stuck with logic that says “only populate the image if it isn’t already set”, instead of grabbing that extra bit of data, because I only have a single column to put an image in!

What’s interesting is that I think what really happened is that Charles and Phthisis Diagnostics both ran into issues setting up their accounts with YouNoodle, and probably didn’t mean to have two accounts! HighTechCville as a data integrity checker!

I know I could have a link to a table of images, the same way that I do for tags and links. However, then soon every column in my RDBMS turns into its own table! So it may be time to start looking at some unstructured data stores… However, I like SQL… And hopefully I don’t lose all the power of SQL when I move to an unstructured datastore…
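A minimal sketch of the "more than one image" idea, in plain Ruby rather than ActiveRecord so it runs on its own. The `Profile` class, the `add_image` method, and the example URLs are all hypothetical and are not the actual HighTechCville models. The point is that once images are kept as a set, indexing a second account adds data once and then stops reporting a change on every run.

```ruby
require 'set'

# Hypothetical stand-in for a Person or Organization record.
# Plain Ruby, not ActiveRecord, so the sketch is self-contained.
class Profile
  attr_reader :name, :image_urls

  def initialize(name)
    @name = name
    @image_urls = Set.new
  end

  # Record an image URL seen by the indexer.
  # Returns true only when the URL is new, so a second account with a
  # different logo adds data once instead of flagging an update on every run.
  def add_image(url)
    !@image_urls.add?(url).nil?
  end
end

charles = Profile.new("Charles Knight")

# First indexing pass: both accounts contribute a new image (placeholder URLs).
first_run = [
  charles.add_image("http://example.com/account-a.png"),
  charles.add_image("http://example.com/account-b.png")
]

# Second pass over the same two accounts: nothing changes.
second_run = [
  charles.add_image("http://example.com/account-a.png"),
  charles.add_image("http://example.com/account-b.png")
]
```

In a relational schema the same shape would be a separate images table keyed by profile, which is exactly the "every column becomes its own table" trade-off described above.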

Any suggestions on Rails friendly NoSQL solutions?

Eric

Written by Eric

January 26, 2010 at 4:20 pm

Posted in Uncategorized

First iPhone User spotted!

leave a comment »

The good news is that the first iPhone user of HTC has been spotted. The bad news is that they stumbled quite quickly over a page that didn’t properly support the iPhone, so they got an error message while attempting to look up a profile. 😦

I’ve looked at the error messages and have fixed those pages, which mostly involved viewing tags, so they show up properly!

Written by Eric

January 7, 2010 at 3:40 pm

Posted in Uncategorized

Removing Profiles from HTC Fixed

leave a comment »

Hi all,

Just wanted to update folks that if you had previously asked for a profile to be removed, and it didn’t disappear, the caching bug was fixed. So those profiles are removed. Also, I’ve tested the removal process, and it is working smoothly now. Of course, I am always sorry to lose someone!

I am also starting to think about how to programmatically remove someone if they no longer live in Charlottesville. Any ideas? One thought was to check the data I index against the original sources: if it has changed and is no longer available, then remove it… Having to remove everything about someone would indicate that they no longer live here and should be removed from HTC…
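One way the source-checking idea could look, as a rough sketch only. The data shape (a profile with a list of source URLs), the `removal_candidates` helper, and the stubbed availability check are all assumptions made for illustration; a real check would have to re-fetch each source.

```ruby
# Hypothetical data shape: each profile lists the source URLs it was indexed from.
# still_available is any callable that reports whether a source still serves the data.
def removal_candidates(profiles, still_available)
  profiles.select do |profile|
    sources = profile[:sources]
    # Flag a profile only when it had sources and none of them respond anymore.
    !sources.empty? && sources.none? { |url| still_available.call(url) }
  end
end

profiles = [
  { name: "Person A", sources: ["http://example.com/a1", "http://example.com/a2"] },
  { name: "Person B", sources: ["http://example.com/b1"] }
]

# Stubbed check for illustration; only one of Person A's sources is still live.
live = ["http://example.com/a2"]
check = ->(url) { live.include?(url) }

candidates = removal_candidates(profiles, check).map { |p| p[:name] }
```

Flagging candidates for review, rather than deleting them outright, seems safer here, since a source can also disappear for reasons that have nothing to do with someone moving away.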

Written by Eric

January 6, 2010 at 12:27 pm

Posted in Uncategorized

Google now supports microformats and RDFa

leave a comment »

You know you are getting traction when the Big Guy joins in… I’ll be keeping an eye on when Google indexes HTC again and see if the results start picking up the hCard data listed for each person and organization.

Eric

Written by Eric

May 13, 2009 at 9:03 am

Posted in Uncategorized

Where in the World is HighTechCville?

leave a comment »

Written by Eric

March 4, 2009 at 1:17 pm

Posted in Uncategorized

How to describe HTC?

leave a comment »

Here is what I am currently saying:

HighTechCville is a website that attempts to aggregate, in a single place, information about people and organizations related to “high tech” industries in the Charlottesville, Virginia geographical area. HighTechCville attempts to facilitate the connections between people and organizations to support the growth of technology-related businesses and therefore the quality of life in Charlottesville.

HighTechCville is also a research project focusing on enabling Web 3.0. Where Web 1.0 was a read-only environment with people visiting sites for information, and Web 2.0 was the Read/Write web, with people blogging, instant messaging, and using Facebook, HighTechCville attempts to investigate whether Web 3.0 is the Read/Write/Share web, where data from multiple sources is easily combined and reused in new, unforeseen ways.

So what do you think? Help me improve my elevator pitch!

Written by Eric

February 26, 2009 at 10:20 am

Posted in Uncategorized

HTC moving Servers

leave a comment »

Folks,

Just a heads up that we are moving http://www.hightechcville.com from one server to another, and revamping some underpinnings, so the site may be down for most of the week since this is just a side project!

Eric

Written by Eric

February 25, 2009 at 4:43 pm

Posted in Uncategorized

LinkedIn terrified of OpenCalais?

with 4 comments

Based on some basic testing it seems like LinkedIn blocks any traffic from the SemanticProxy.com site. I wanted to see what my page http://www.linkedin.com/in/epugh would render when fed through the OpenCalais entity extraction engine, and instead I get back connection errors.

This actually makes a lot of sense. LinkedIn’s public profile pages are there to be indexed by search engines, to drive more traffic to their site, and to convince users to join their walled network. So they put some information out there. But, to make sense of the data, you need to join the network so you can do queries, and see the underlying meaning behind the text.

But, with tools like OpenCalais proliferating, this allows other folks to add meaning to these profile pages, and reduces the need to join the walled garden.

For my part, a couple lines of Ruby code and this is what I extracted (type, value, relevancy):

Organization: Apache Software Foundation, 0.268
City: Charlottesville, 0.286
Technology: Information Technology, 0.344
Position: Services Consultant , 0.302
Technology: Apache, 0.268
Person: Eric Pugh, 0.845
Company: LinkedIn Corporation, 0.724

What OpenCalais missed that I would have liked was the My Interests section, which might have returned some industry terms such as agile practices, ruby on rails, open source, unit testing, scrum, and selenium, as well as the websites listed.
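As a small illustration of working with the (type, value, relevancy) triples listed above, here is a sketch that parses those lines and keeps only the stronger matches. It does not call OpenCalais or SemanticProxy; the `parse_entities` helper and the 0.3 cutoff are assumptions chosen for the example.

```ruby
# The extracted lines from the post, copied as plain text.
raw = <<~TEXT
  Organization: Apache Software Foundation, 0.268
  City: Charlottesville, 0.286
  Technology: Information Technology, 0.344
  Position: Services Consultant , 0.302
  Technology: Apache, 0.268
  Person: Eric Pugh, 0.845
  Company: LinkedIn Corporation, 0.724
TEXT

# Turn each "Type: Value, score" line into a hash.
def parse_entities(text)
  text.each_line.map do |line|
    type, rest = line.strip.split(": ", 2)
    # Split on the last comma so values containing commas stay intact.
    value, _comma, score = rest.rpartition(",")
    { type: type, value: value.strip, relevance: score.to_f }
  end
end

entities = parse_entities(raw)

# Keep entities at or above an arbitrary cutoff (0.3 is an assumption),
# most relevant first.
strong = entities.select { |e| e[:relevance] >= 0.3 }
                 .sort_by { |e| -e[:relevance] }

strong.each { |e| puts "#{e[:type]}: #{e[:value]} (#{e[:relevance]})" }
```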

Written by Eric

February 25, 2009 at 10:11 am

Entity Extraction of URLs made easy… Partly.

leave a comment »

Thanks to Ed Summers at the Library of Congress for his post on SemanticProxy. SemanticProxy offers a dead simple API for feeding URLs to the OpenCalais entity extraction engine.

For those of you not familiar with OpenCalais, it is a “rules”-based entity extraction engine that knows how to find certain bits of information, like the name of a person, in free-form text. OpenCalais is sponsored by Thomson Reuters, so most of the rules are based around the text you would find in a newspaper. For example, “GM is in talks with Chrysler for a merger” would give you the companies GM and Chrysler, as well as a relationship called “merger_talks” between the two.

I was hoping to use OpenCalais to extract place, time, and subject information for all those free-form event announcements. Unfortunately OpenCalais doesn’t have the rules to pull that type of entity out. It does find the people listed in an event, but that’s it.

However, I did find it very useful for finding more people information for HTC. A new group in Charlottesville called FirstWednesdays has started, and their “Find Me” page has a wealth of data in the comments. I am just splitting up the DOM on each comment, feeding each one to OpenCalais, and getting back a person and a couple of relevant links. It’s working great.

So between SemanticProxy for pages of raw content and using OpenCalais for specific chunks of text I expect to simplify the process of adding new data to HighTechCville.

Written by Eric

February 24, 2009 at 5:48 pm

Posted in Uncategorized

Add HTC to your Firefox list of search engines

leave a comment »

Find yourself searching HighTechCville often? Now you can add HTC to your list of search engines in Firefox through the magic of the OpenSearch.org API. Just browse to http://www.hightechcville.com using Firefox and click the small blue arrow by your search bar and choose “Add HTCFind”:

[Screenshot: open_search]

Add a shortcut tag like “htc” and then type into your location bar “htc: Eric Pugh” to find me!
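For anyone curious what sits behind this, a site advertises a search engine to the browser with a small OpenSearch description document. The sketch below builds one as a string in Ruby. The `opensearch_description` helper, the description text, and the search URL template are assumptions and are not taken from the real HTC site; only the “HTCFind” name comes from the menu entry above.

```ruby
# Build an OpenSearch 1.1 description document as a string.
# Assumes the values contain no characters that need XML escaping.
def opensearch_description(short_name:, description:, search_template:)
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
      <ShortName>#{short_name}</ShortName>
      <Description>#{description}</Description>
      <Url type="text/html" template="#{search_template}"/>
    </OpenSearchDescription>
  XML
end

xml = opensearch_description(
  short_name: "HTCFind",
  description: "Search HighTechCville", # hypothetical wording
  # Hypothetical search path; {searchTerms} is the standard OpenSearch placeholder.
  search_template: "http://www.hightechcville.com/search?q={searchTerms}"
)

puts xml
```

The browser discovers the document through a `<link rel="search" type="application/opensearchdescription+xml" ...>` tag in the page head that points at wherever the XML is served.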

Written by Eric

February 4, 2009 at 8:07 pm

Posted in Uncategorized