HighTechCville


LinkedIn terrified of OpenCalais?


Based on some basic testing, it seems like LinkedIn blocks all traffic from the SemanticProxy.com site. I wanted to see what my page http://www.linkedin.com/in/epugh would yield when fed through the OpenCalais entity extraction engine, and instead I got back connection errors.

This actually makes a lot of sense. LinkedIn’s public profile pages exist to be indexed by search engines and drive more traffic to the site, but ultimately to convince users to join their walled network. So they put some information out there. But to make sense of the data, you need to join the network, where you can run queries and see the underlying meaning behind the text.

But as tools like OpenCalais proliferate, other folks can add meaning to these profile pages themselves, which reduces the need to join the walled garden.

For my part, with a couple of lines of Ruby code, this is what I extracted (type, value, relevancy):

Organization: Apache Software Foundation, 0.268
City: Charlottesville, 0.286
Technology: Information Technology, 0.344
Position: Services Consultant, 0.302
Technology: Apache, 0.268
Person: Eric Pugh, 0.845
Company: LinkedIn Corporation, 0.724
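The post doesn't include the Ruby itself, and the sketch below deliberately does not guess at the OpenCalais or SemanticProxy API. It is a minimal sketch, assuming the response has already been parsed into (type, value, relevancy) triples like the ones above. The `Entity` struct, the `top_entities` and `by_type` helpers, and the 0.3 cutoff are hypothetical, chosen only for illustration.

```ruby
# Hypothetical sketch: the triples are copied from the results above.
# Entity, top_entities, by_type and the 0.3 cutoff are illustrative names,
# not part of any OpenCalais or SemanticProxy API.
Entity = Struct.new(:type, :value, :relevance)

ENTITIES = [
  Entity.new("Organization", "Apache Software Foundation", 0.268),
  Entity.new("City", "Charlottesville", 0.286),
  Entity.new("Technology", "Information Technology", 0.344),
  Entity.new("Position", "Services Consultant", 0.302),
  Entity.new("Technology", "Apache", 0.268),
  Entity.new("Person", "Eric Pugh", 0.845),
  Entity.new("Company", "LinkedIn Corporation", 0.724)
].freeze

# Keep entities at or above a relevance cutoff, most relevant first.
def top_entities(entities, min_relevance)
  entities.select { |e| e.relevance >= min_relevance }
          .sort_by { |e| -e.relevance }
end

# Group entity values by type, so repeated types (e.g. Technology) sit together.
def by_type(entities)
  entities.group_by(&:type).transform_values { |es| es.map(&:value) }
end

if __FILE__ == $PROGRAM_NAME
  top_entities(ENTITIES, 0.3).each do |e|
    puts format("%-12s %-28s %.3f", e.type, e.value, e.relevance)
  end
end
```

Run directly, it prints the four entities at or above 0.3, with Eric Pugh (0.845) first. Grouping by type is handy here because the extraction returned two separate Technology hits.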

What OpenCalais missed, and what I would have liked to see, was the My Interests section, which might have returned industry terms such as agile practices, Ruby on Rails, open source, unit testing, Scrum, and Selenium, as well as the websites listed.

Written by Eric

February 25, 2009 at 10:11 am

4 Responses


  1. Eric:

    Tom Tague from Calais / Semanticproxy.com here.

    It might be intentional blocking – or it might be that the page has some non-standard HTML that somehow deeply confuses semanticproxy.com. I noticed it handles the summary page fine – but not the full profile. Sometimes things confuse us – we’ll take a look.

    All technology aside – you do raise an interesting point. It will be very interesting to see how various walled gardens deal with the onset of tools like Semanticproxy that free the information inside them for wider consumption and sharing.

    We’re staying on the safe side of the equation and following robots.txt rules and all that. But we’re the good guys. There are plenty of people ready to harvest this type of information using any tools at hand.

    Maybe the gardens should open the gate just a bit?

    Thomas Tague

    February 25, 2009 at 4:22 pm

  2. Eric:

    Brief follow up.

    I processed the page with Gnosis (http://bit.ly/8ICuy), our Firefox plugin, and it did… not so great. Calais is really designed to deal with unstructured textual prose, and a formatted page like the LinkedIn profile doesn’t give us a lot to work with. That being said, teaching the system to understand the structural elements (à la Dapper) of the top 1,000 sites or so would not be that big a deal.

    Thomas Tague

    February 25, 2009 at 4:30 pm

  3. Tom,

    I agree with opening the gate up a bit: since they built the structure of the data, they will always have a better handle on it than a tool like Calais will.

    I do owe another post about LinkedIn. I noticed that they recently added hCard formatting to their profile pages, which makes reusing that content much simpler and is a nice sign that they are playing well with open standards.

    I think the big value in Calais is in handling unstructured data. I can write a simple parser for structured data, well, assuming we aren’t looking at thousands of sites. And Dapper is an interesting approach to the same issue.

    Calais has really enhanced the data in HTC, and the next version, which will be live later this week, will reflect that. Now if only Calais could pull out event information like “Bob Smith speaking Monday 2/25/09 on Semantic Web”!

    Eric

    February 25, 2009 at 4:43 pm

  4. Eric:

    You might want to experiment a bit with the GenericRelations extraction parameter, but be prepared for a flood of very general metadata. This exposes general relationships between a known entity type (for example, person) and … whatever.

    Regards,

    Thomas Tague

    February 25, 2009 at 5:53 pm

