2008-01-01 Happy New Year

If you signed any OpenPGP public keys late last night or early this morning… it may be time to revoke those signatures. Here is a small reminder, for GnuPG users, on how to revoke a signature. You cannot remove a signature you made, especially once the signed public key has been uploaded to a public key server. But you can revoke a signature by using the --edit-key option in GnuPG with the revsig command. GnuPG will prompt you for each signature generated by the private key(s) found in your local keyring. So even if you were drunk during the last key signing party, you still have some options.
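As a rough sketch of the steps described above, the small Python helper below only builds the command lines. It does not run them, because the edit session is interactive. The function name and the key ID 0x12345678 are placeholders of mine. The --send-keys step is my assumption about how you would publish the revocation afterwards.

```python
import shlex

def revsig_steps(key_id, keyserver=None):
    """Return the shell commands to revoke a signature made on key_id.

    The first command opens an interactive session: at the gpg> prompt,
    type `revsig`, answer the questions, then type `save`.
    The second command publishes the updated key (assumed follow-up step).
    """
    edit = ["gpg", "--edit-key", key_id]
    send = ["gpg"]
    if keyserver:
        # Optional: name an explicit key server instead of the default one
        send += ["--keyserver", keyserver]
    send += ["--send-keys", key_id]
    return [shlex.join(edit), shlex.join(send)]

# 0x12345678 is a hypothetical key ID, replace it with the key you signed
for command in revsig_steps("0x12345678"):
    print(command)
```

Nothing is revoked until you type revsig and save inside the GnuPG session yourself.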

By the way, Happy New Year.

Tags: openpgp fun gnupg xkcd

2008-01-03 Google Books And Public Domain

Following up on my blog entry from late 2006, Google Books Killing Public Domain, I found an interesting paper by Luc Vincent from Google, presented at ICDAR 2007, called Google Book Search: Document Understanding on a Massive Scale.

The paper covers the challenges encountered when doing OCR and how to analyse and understand the scanned documents. That's a very difficult topic, including small things like page ordering or chapter detection. The paper also introduces the ongoing work on Tesseract OCR, the OCR engine released as free software, and on the OCRopus framework, also available as free software. It's still very beta software, but it's nice to see Google releasing some parts of their software.

Besides all the positives, there is one small negative point:

We believe we can help by making some large chunks of our (out of copyright) data available to the Document Analysis research community.

This part reminds me of my old blog entry about the public domain books scanned by Google becoming proprietary work again… Why don't they release all the public domain datasets and make them available to everyone without the current restrictive license? That would be easier and could produce more interesting results (scientific or not), just like the datasets available from Wikipedia.

Tags: google publicdomain archiving copyright

2008-01-06 RDF and Free Tagging

RDF provides a method to make a specific statement about a web resource. To take a very simple "triple" example from my FOAF file:

<http://www.foo.be/foaf.rdf#me> <http://xmlns.com/foaf/0.1/nick> "adulau"

Meaning roughly:

<http://www.foo.be/foaf.rdf#me> has a foaf nickname with a value "adulau"

Here, the subject is "http://www.foo.be/foaf.rdf#me", the property (in RDF terminology, the predicate) is "foaf nickname" and the value (in RDF terminology, the object) is "adulau". Don't ask me why each standard so often uses a different term for similar things… but that's today's topic. In free tagging, that's often the issue, as people use different ways to tag something similar. If we take the previous example with a free tagging approach (as used in a social bookmarking system like del.icio.us), we could have the following:

http://www.foo.be/foaf.rdf#me adulau
http://www.foo.be/foaf.rdf#me alex
http://www.foo.be/foaf.rdf#me udalau
http://www.foo.be/foaf.rdf#me nickname:adulau
http://www.foo.be/foaf.rdf#me alexdulaunoy
http://www.foo.be/foaf.rdf#me foaf:nickname=adulau

First of all, I am assuming that everyone tagged the same subject/URI, but that's not always the case (just imagine a URI with the domain name only, or without the anchor). We can see that every user has his own way to classify the URI. Some just use the nickname as a tag, some use another nickname, some have misspelled it and some use a machine tag (sometimes called the poor man's RDF).
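To make the comparison a bit more concrete, here is a minimal sketch of how such tags could be mapped back to something triple-like. The Triple type, the two regular expressions and the tag_to_triple function are my own illustration, not part of any RDF or del.icio.us API. I assume a machine tag looks like namespace:predicate=value.

```python
import re
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    predicate: Optional[str]  # None when the tag carries no predicate
    obj: str

# Assumed machine tag form: namespace:predicate=value (foaf:nickname=adulau)
MACHINE_TAG = re.compile(
    r"^(?P<ns>[A-Za-z][\w-]*):(?P<pred>[A-Za-z][\w-]*)=(?P<value>.+)$"
)
# Looser form: predicate:value (nickname:adulau)
PREFIXED_TAG = re.compile(r"^(?P<pred>[A-Za-z][\w-]*):(?P<value>.+)$")

def tag_to_triple(uri: str, tag: str) -> Triple:
    """Turn one free tag on a URI into a (subject, predicate, object) triple."""
    m = MACHINE_TAG.match(tag)
    if m:
        return Triple(uri, f"{m['ns']}:{m['pred']}", m["value"])
    m = PREFIXED_TAG.match(tag)
    if m:
        return Triple(uri, m["pred"], m["value"])
    # A plain tag only gives us a value, the predicate stays unknown
    return Triple(uri, None, tag)

URI = "http://www.foo.be/foaf.rdf#me"
for tag in ["adulau", "nickname:adulau", "foaf:nickname=adulau"]:
    print(tag_to_triple(URI, tag))
```

Only the machine tag gives back a full triple. For a plain tag like "adulau" the predicate is simply lost, which is the price of the freedom.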

At first glance, RDF looks cleaner, so why is everyone using tagging and not RDF directly? I think it's because free tagging keeps the user free to choose his own classification and to express how he perceives the world around him. I know I could be stoned for saying that, as RDF can be used with any kind of namespace, but you must define it before using it… That's why free tagging is simple to use (you don't have to look up a convention before using it) and simple to implement (the parsing of tags is minimal compared to a well-formed XML document).

My view of the RDF world and the free tagging world is the following: those two worlds must live together and try to benefit from each other. I really think that RDF has an important role in exchanging descriptions of resources between machines (as long as the service providers offer open interfaces between their services). But free tagging helps to ease the interface between humans and machines. Another advantage of keeping the user free to use his own classification: we could discover more about ourselves (humans) with a free classification scheme than with a predefined, limited scheme. It just makes the system that analyzes the information a little more complex while keeping the interface very simple. That's just my Sunday point of view… and now feel free to stone me ;-)
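As a small illustration of that "a little more complex" analysis, the sketch below takes the six example tags listed earlier and looks for the value most users agree on. The helper names tag_value and consensus are hypothetical. Stripping everything before ":" or "=" is a simplifying assumption of mine, and it would need more care with real-world tags.

```python
from collections import Counter

def tag_value(tag: str) -> str:
    """Drop an optional 'predicate:' or 'namespace:predicate=' prefix."""
    if "=" in tag:
        return tag.split("=", 1)[1]
    if ":" in tag:
        return tag.split(":", 1)[1]
    return tag

def consensus(tags):
    """Return the most frequent tag value and how many times it was used."""
    counts = Counter(tag_value(t).lower() for t in tags)
    return counts.most_common(1)[0]

tags = [
    "adulau",
    "alex",
    "udalau",
    "nickname:adulau",
    "alexdulaunoy",
    "foaf:nickname=adulau",
]
print(consensus(tags))  # ('adulau', 3)
```

The misspelled "udalau" is not caught here. Handling that kind of noise is exactly where the analysis side gets more complex, while the tagging interface stays as simple as before.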

Tags: tagging rdf semweb semanticweb tag machinetag

geo: Les Bulles, Chiny

2008-01-12 Search Engine Startup in Europe

Surprised by the recent acquisition of FAST by Microsoft, I'm wondering what happened to the famous pan-European project called Quaero. After the project was initially launched by France, Germany left and created its own search engine project called Theseus. I agree with the comment from Marc Andreessen that this kind of "investing" is more like burning money in the existing large companies taking the lead in the project. Progress in search engines won't come from investing in existing large company structures… but from small structures that are not really able to participate in large EU-funded projects, yet could benefit from a small part of the investment.

Where is the solution to boost search engines in Europe? Instead of giving money to large groups, the EU could make a public call to invest the 100 million euros in 100 start-ups in Europe working on information retrieval, search and information classification. The public call should be "paper free" (a single electronic proposal submission) with one simple evaluation criterion (to avoid the burden of project management): a product released within the first 2 years. That would give more potential for "innovation" by distributing the risks and increasing investment in small structures, which are more likely to create something new in the search engine area. Hey EC, isn't it time to take risks?

Tags: innovation startup work searchengine start-up search google quaero theseus europe FP6

2008-01-20 Patent Auction and Semantic Web

We are living in a crazy world; at least I have one proof of it, collected from the semantic-web mailing list. Someone is auctioning some patents he filed around 1997 on some concepts of the semantic web. I don't want to dig into those patents and the potential prior art. I was just surprised that the business of patent trolling is going one step further by offering an auction platform to sell "intellectual capital" (as stated on the Ocean Tomo website) and by creating fear about a potential bidder having bad intentions. And if you want to join the auction, you have to pay a "small" fee to be a bidder. Maybe everything is just a bad joke (yes, the gala dinner will be on the 1st of April ;-) but it looks so real…

Tags: innovation patent patenttroll semweb
