I expected this… But just not this soon…
So I’ve been expecting to run into the limitations of an RDBMS for a while. For example, in my models a Person and an Organization each have only one related image. But I’ve recently run into a person, Charles Knight, and an organization, Phthisis Diagnostics, which each turn out to have two accounts on the YouNoodle site with different logo images. Therefore the YouNoodle indexer was constantly reporting that a change had been made, “1 person updated. 1 organization updated.”, because the images had differing URLs! For Phthisis Diagnostics both logos were the same image, so I started putting in a detector to see if there was a difference, and then to use the differing image. But Charles actually has two different images! So now I am stuck with logic that says “only populate the image if it isn’t already set”, instead of grabbing that extra bit of data, because I only have a single column to put an image in!
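To make the “only populate the image if it isn’t already set” rule concrete, here is a minimal sketch in plain Ruby. The `Profile` struct, the `populate_image` method, and the example URLs are all hypothetical stand-ins for illustration, not the actual HighTechCville models. The point is that the method reports an update only when the single image slot really changed, so a second account with a different logo no longer triggers an “updated” message on every run.

```ruby
# Hypothetical stand-in for the Person/Organization models:
# a profile with exactly one image column.
Profile = Struct.new(:name, :image_url)

# Fill the single image slot only when it is empty.
# Returns true only if the profile was actually changed, so the
# indexer can count real updates instead of reporting one every run.
def populate_image(profile, candidate_url)
  return false if candidate_url.nil? || candidate_url.empty?
  return false unless profile.image_url.nil? || profile.image_url.empty?

  profile.image_url = candidate_url
  true
end

# Two accounts for the same person, each with its own image (made-up URLs).
charles = Profile.new("Charles Knight", nil)
first_update  = populate_image(charles, "http://example.com/account-1.png")
second_update = populate_image(charles, "http://example.com/account-2.png")
```

The trade-off is exactly the one described above: the second image is silently dropped because there is nowhere to put it.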
What’s interesting is that I think what really has happened is that Charles and Phthisis Diagnostics both ran into issues setting up their accounts with YouNoodle, and probably didn’t mean to have two accounts! HighTechCville as a data integrity checker!
I know I could have a link to a table of images, the same way that I do for tags and links. However, then soon every column in my RDBMS turns into its own table! So it may be time to start looking at some unstructured data sources… However, I like SQL… And hopefully I won’t lose all the power of SQL when I move to an unstructured datastore…
Any suggestions on Rails friendly NoSQL solutions?
Eric
First iPhone User spotted!
The good news is that the first iPhone user of HTC has been spotted. The bad news is that they stumbled quite quickly over a page that didn’t properly support the iPhone, so they got an error message while attempting to look up a profile. 😦
I’ve looked at the error messages, and have fixed those pages, mostly involving looking at tags, to show up properly!
Removing Profiles from HTC Fixed
Hi all,
Just wanted to update folks that if you had previously asked for a profile to be removed, and it didn’t disappear, the caching bug was fixed. So those profiles are removed. Also, I’ve tested the removal process, and it is working smoothly now. Of course, I am always sorry to lose someone!
I am also starting to think about how to programmatically remove someone if they no longer live in Charlottesville. Any ideas? One thought was to check the data I index against the original sources, and if it has changed and is no longer available, then remove it… If everything about someone ends up removed, that would indicate they should be removed from HTC as no longer living here…
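As a rough sketch of that idea in plain Ruby: flag a profile for removal only when every source it was built from has disappeared. The `IndexedProfile` struct, the `removal_candidates` method, the names, and the URLs are hypothetical. The availability check is passed in as a callable so the rule can be tried out without hitting the network. A real check would have to fetch each source and decide what counts as “no longer available”.

```ruby
# Hypothetical: each profile remembers the source URLs it was built from.
IndexedProfile = Struct.new(:name, :source_urls)

# source_available is any callable that takes a URL and returns true/false.
# A profile is a removal candidate only if it has sources and none of them
# is still available. Profiles with no recorded sources are left alone.
def removal_candidates(profiles, source_available)
  profiles.select do |profile|
    profile.source_urls.any? &&
      profile.source_urls.none? { |url| source_available.call(url) }
  end
end

# Made-up data: pretend only one source page still exists.
still_online = ["http://example.com/alice"]
check = ->(url) { still_online.include?(url) }

profiles = [
  IndexedProfile.new("Alice", ["http://example.com/alice", "http://example.org/alice"]),
  IndexedProfile.new("Bob",   ["http://example.com/bob"]),
  IndexedProfile.new("Carol", [])
]

to_remove = removal_candidates(profiles, check).map(&:name)
```

Requiring all sources to be gone is deliberately conservative, since a single vanished page could just be a site redesign.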
Google now supports microformats and RDFa
You know you are getting traction when the Big Guy joins in… I’ll be keeping an eye on when Google indexes HTC again to see if the results start picking up the hCard data listed for each person and organization.
Eric
How to describe HTC?
Here is what I am currently saying:
HighTechCville is a website that attempts to aggregate, in a single place, information about people and organizations related to “high tech” industries in the Charlottesville, Virginia geographical area. HighTechCville attempts to facilitate the connections between people and organizations to support the growth of technology related businesses, and therefore the quality of life, in Charlottesville.
HighTechCville is also a research project focusing on enabling Web 3.0. Where Web 1.0 was a read only environment with people visiting sites for information, and Web 2.0 was the Read/Write web, with people blogging, instant messaging, and using Facebook, HighTechCville attempts to investigate if Web 3.0 is the Read/Write/Share web, where data from multiple sources is easily combined and reused in new, unforeseen ways.
So what do you think? Help me improve my elevator pitch!
HTC moving Servers
Folks,
Just a heads up that we are moving http://www.hightechcville.com from one server to another, and revamping some underpinnings, so the site may be down for most of the week since this is just a side project!
Eric
LinkedIn terrified of OpenCalais?
Based on some basic testing it seems like LinkedIn blocks any traffic from the SemanticProxy.com site. I wanted to see what my page http://www.linkedin.com/in/epugh would render when fed through the OpenCalais entity extraction engine, and instead I get back connection errors.
This actually makes a lot of sense. LinkedIn’s public profile pages are there to be indexed by search engines, to drive more traffic to their site, and to convince users to join their walled network. So they put some information out there. But to make sense of the data, you need to join the network so you can do queries and see the underlying meaning behind the text.
But, with tools like OpenCalais proliferating, this allows other folks to add meaning to these profile pages, and reduces the need to join the walled garden.
For my part, a couple lines of Ruby code and this is what I extracted (type, value, relevancy):
Organization: Apache Software Foundation, 0.268
City: Charlottesville, 0.286
Technology: Information Technology, 0.344
Position: Services Consultant , 0.302
Technology: Apache, 0.268
Person: Eric Pugh, 0.845
Company: LinkedIn Corporation, 0.724
What OpenCalais missed that I would have liked was the My Interests section, which might have returned some industry terms such as agile practices, ruby on rails, open source, unit testing, scrum, and selenium, as well as the websites listed.
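For what it’s worth, once you have (type, value, relevancy) triples like the ones above, ranking them takes only a few lines. This sketch does not show the extraction call itself. It just parses the text lines from this post and sorts them. The `Entity` struct, the `parse_entity` method, and the 0.3 cutoff are my own illustrative assumptions, not part of OpenCalais.

```ruby
# Hypothetical container for one extracted entity.
Entity = Struct.new(:type, :value, :relevance)

# Parse a line such as "City: Charlottesville, 0.286" into an Entity.
# Returns nil for lines that do not match the expected shape.
def parse_entity(line)
  match = line.strip.match(/\A(\w+):\s*(.+?)\s*,\s*([\d.]+)\z/)
  return nil unless match

  Entity.new(match[1], match[2], match[3].to_f)
end

lines = [
  "Organization: Apache Software Foundation, 0.268",
  "City: Charlottesville, 0.286",
  "Technology: Information Technology, 0.344",
  "Position: Services Consultant , 0.302",
  "Technology: Apache, 0.268",
  "Person: Eric Pugh, 0.845",
  "Company: LinkedIn Corporation, 0.724"
]

entities = lines.map { |line| parse_entity(line) }.compact

# Keep only the stronger matches (0.3 is an arbitrary cutoff), best first.
top = entities.select { |e| e.relevance >= 0.3 }
              .sort_by { |e| -e.relevance }
```

Sorting this way puts the person and the company at the top, which matches what you would expect from a profile page.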
Entity Extraction of URLs made easy… Partly.
Thanks to Ed Summers at the Library of Congress for his post on SemanticProxy. SemanticProxy offers a dead simple API for feeding URLs to the OpenCalais entity extraction engine.
For those of you not familiar with OpenCalais, it is a “rules” based Entity Extraction engine that knows how to find certain bits of information in free form text, like the name of a person. OpenCalais is sponsored by Thomson Reuters, so most of the rules are based around the text you would find in a newspaper. For example, “GM is in talks with Chrysler for a merger” would give you the companies GM and Chrysler, as well as a relationship called “merger_talks” between the two.
I was hoping to use OpenCalais to extract place, time, and subject information for all those free form event announcements. Unfortunately OpenCalais doesn’t have the rules to pull that type of entity out. It does find the people listed in an event, but that’s it.
However, I did find it very useful for finding more people information for HTC. A new group in Charlottesville called FirstWednesdays has started, and the comments on their “Find Me” page have a wealth of data. I am just splitting up the DOM on each comment, feeding each one to OpenCalais, and getting back a person and a couple of relevant links. It’s working great.
So between SemanticProxy for pages of raw content and using OpenCalais for specific chunks of text I expect to simplify the process of adding new data to HighTechCville.
Add HTC to your Firefox list of search engines
Find yourself searching HighTechCville often? Now you can add HTC to your list of search engines in Firefox through the magic of the OpenSearch.org API. Just browse to http://www.hightechcville.com using Firefox, click the small blue arrow by your search bar, and choose “Add HTCFind”.
Add a shortcut tag like “htc” and then type into your location bar “htc: Eric Pugh” to find me!