2006-09-03 Extending GooDiff

Some months ago, we (Michael Noll and I) started GooDiff [1], a project to monitor the legal documents of service providers on the Internet. As more and more web services become available on the Internet, we (the users) are rarely (never?) warned about changes in the terms of contracts and agreements. GooDiff is a basic service that monitors specific documents (often legal or contract documents) from providers like Yahoo!, del.icio.us or Google. The service uses a customized version of Trac with a simple Subversion backend.
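To give an idea of what monitoring a document means in practice, here is a minimal sketch in Python. It is not the GooDiff code (which relies on Trac and Subversion); the function name `diff_versions` and the sample texts are only assumptions for illustration. It compares two revisions of a document with the standard `difflib` module and returns a unified diff.

```python
import difflib


def diff_versions(old_text, new_text, name="terms.txt"):
    """Return a unified diff between two revisions of a document."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=name + "@old",  # hypothetical labels for the two revisions
        tofile=name + "@new",
    )
    return "".join(diff)


if __name__ == "__main__":
    # Invented sample texts, not taken from any real provider.
    old = "We keep your data for 30 days.\nYou may cancel at any time.\n"
    new = "We keep your data for 90 days.\nYou may cancel at any time.\n"
    print(diff_versions(old, new))
```

An empty result means the two revisions are identical, so a monitor only has to report the non-empty diffs.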

We discussed possible extensions to GooDiff and found that making it more "social" would be an interesting feature. What do we mean by more "social"? We would like to build a kind of annotation or comment-based interface to the legal documents stored in GooDiff. The idea is to provide a basic unified interface using a classical wiki engine like MediaWiki. We don't want to reinvent the wheel; we just want to focus on the issues (already too many) of monitoring "unstable" documents. I just started goomirror to monitor, gather and store the raw files of the monitored services. The idea behind it is to use a basic gateway to publish the information to the wiki, where people would be able to comment on or annotate the documents. There are already some nifty Perl modules to access MediaWiki via a simple API. I already made a test (rssfromAPage) that generates an RSS feed from a list in a specific MediaWiki page for the hack.lu 2006 website. Using the same approach, it does not seem difficult to gateway the goomirror content back into a community wiki. MediaWiki is maybe not the best choice (its RSS support is too minimal compared to oddmuse) but we'll test and see what fits best.
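The "RSS from a list in a wiki page" idea can be sketched as follows. This is not the rssfromAPage script (which uses Perl modules); it is a small Python illustration assuming the page text has already been fetched and that each entry is a bullet holding an external link in MediaWiki syntax, `* [URL title]`. The function name `wiki_list_to_rss` and the sample page are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# A MediaWiki bullet holding an external link: "* [http://example.org/ Title]"
ITEM_RE = re.compile(r"^\*\s*\[(\S+)\s+([^\]]+)\]")


def wiki_list_to_rss(wikitext, feed_title, feed_link):
    """Build a minimal RSS 2.0 document from the bullet links of a wiki page."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Generated from a wiki list"
    for line in wikitext.splitlines():
        match = ITEM_RE.match(line.strip())
        if match is None:
            continue  # skip headings, plain text and empty lines
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = match.group(2).strip()
        ET.SubElement(item, "link").text = match.group(1)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Invented page content for the example.
    page = (
        "== Monitored documents ==\n"
        "* [http://example.org/tos Terms of Service]\n"
        "* [http://example.org/privacy Privacy Policy]\n"
        "Some free text that is not a list entry.\n"
    )
    print(wiki_list_to_rss(page, "Monitored documents", "http://example.org/"))
```

Only the title and the link of each entry are kept; dates or descriptions would need a richer convention in the wiki page itself.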

2006-09-06 Google Books Killing Public Domain

I was very happy to see Google Books offering a lot of scanned public domain books. For example, Marion De Lorme, a work by Victor Hugo, is available in Google Books. But I was surprised by the statement placed before the beginning of the public domain work:

This is a digital copy of a book that was preserved for generations on library shelves before it was carefully scanned by Google as part of a project
to make the world's books discoverable online.
It has survived long enough for the copyright to expire and the book to enter the public domain. A public domain book is one that was never subject
to copyright or whose legal copyright term has expired. Whether a book is in the public domain may vary country to country. Public domain books
are our gateways to the past, representing a wealth of history, culture and knowledge that's often difficult to discover.
Marks, notations and other marginalia present in the original volume will appear in this file - a reminder of this book's long journey from the
publisher to a library and finally to you.
Usage guidelines
Google is proud to partner with libraries to digitize public domain materials and make them widely accessible. Public domain books belong to the
public and we are merely their custodians. Nevertheless, this work is expensive, so in order to keep providing this resource, we have taken steps to
prevent abuse by commercial parties, including placing technical restrictions on automated querying.
We also ask that you:
+ Make non-commercial use of the files We designed Google Book Search for use by individuals, and we request that you use these files for
personal, non-commercial purposes.
+ Refrain from automated querying Do not send automated queries of any sort to Google's system: If you are conducting research on machine
translation, optical character recognition or other areas where access to a large amount of text is helpful, please contact us. We encourage the
use of public domain materials for these purposes and may be able to help.
+ Maintain attribution The Google "watermark" you see on each file is essential for informing people about this project and helping them find
additional materials through Google Book Search. Please do not remove it.
+ Keep it legal Whatever your use, remember that you are responsible for ensuring that what you are doing is legal. Do not assume that just
because we believe a book is in the public domain for users in the United States, that the work is also in the public domain for users in other
countries. Whether a book is still in copyright varies from country to country, and we can't offer guidance on whether any specific use of
any specific book is allowed. Please do not assume that a book's appearance in Google Book Search means it can be used in any manner
anywhere in the world. Copyright infringement liability can be quite severe.
About Google Book Search
Google's mission is to organize the world's information and to make it universally accessible and useful. Google Book Search helps readers
discover the world's books while helping authors and publishers reach new audiences. You can search through the full text of this book on the web
at http://books.google.com/

Google clearly relicensed the public domain work under a kind of ugly non-commercial license. Of course, they are allowed to do that, but it is a kind of nonsense. It's building a new era of proprietary information from works already paid for by the community (this was the purpose of author rights). It's clearly not for the benefit of the author (everybody knows that Victor Hugo is a young author looking for money to continue his work ;-) or of the community; it's clearly only for the benefit of the editor (Google only?). Google claims that they have to do that because the scanning is expensive… just like crawling the Internet is expensive? But in the end, won't they use the public domain works to show their advertisements? So the commercial restriction is clearly for their sole benefit. The only thing I hope is that Google is not burning the books after having scanned them. I'm sure that they are not doing that. They are just moving the public domain into a private collection where the benefit for the community is minimal. Citizens are already paying twice for author rights… (think about taxes on media and distribution) and now we are paying a third time.

Update 22 April 2007: I'm no longer alone, at least for governmental publications. There is an open letter asking to keep the works of the US government in the public domain when they are scanned for Google Books. I'm still very surprised that a lot of people believe that the act of scanning a public domain work turns it into a proprietary work. It looks like a kind of black magic.

Tags: google publicdomain archiving copyright

2006-09-24 CopiePresse Is Smoking Crack

Reading the Le Soir blog entry about the court case of CopiePresse against Google, I was very disappointed by the role of CopiePresse and by how the classical editors still don't understand the Internet. Any indexer (including Google, but others too) provides a service (I'm not discussing the various search engine functionalities here) that helps people search for information. It's a benefit for the authors of the content AND for the readers: it provides better access to the works made by the authors. Search engines provide a means to better search and, sometimes, classify information. This is the next step after the initial step of printing (from monks to Gutenberg to digital information to organized digital information). CopiePresse is stuck at digital information located on one personal computer without any network connectivity. I'm not very proud of being Belgian after seeing that (a part of?) the Belgian press has still not understood the Internet.

The approach taken by CopiePresse, fighting a legal battle instead of simply using the robots exclusion standard, is very dangerous. I don't think that fighting legal battles over digital information is a good idea. It will create more barriers to the distribution of information instead of promoting its distribution. So the editors are not playing their role as editors in this specific case.
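For readers who don't know the robots exclusion standard: a publisher only has to put a small robots.txt file at the root of the website to tell crawlers what they may fetch. Here is a minimal sketch using Python's standard `urllib.robotparser`; the rules, the `/archives/` path and the example.org URLs are invented for illustration and are not the configuration of any real newspaper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of the archives and
# leave everything open to all other crawlers.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archives/

User-agent: *
Disallow:
"""


def allowed(robots_txt, user_agent, url):
    """Tell whether user_agent may fetch url according to robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "http://example.org/archives/2006/article.html"
    front_page = "http://example.org/index.html"
    print(allowed(ROBOTS_TXT, "Googlebot", article))
    print(allowed(ROBOTS_TXT, "Googlebot", front_page))
    print(allowed(ROBOTS_TXT, "OtherBot", article))
```

The standard relies on crawlers cooperating, but the major search engines honour it, which is why a few lines of text do the job.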

So maybe it's time to build an RSS scraper to download the full daily articles from lesoir.be and store them on a publicly accessible (for educational purposes) server where any search engine (like Google, Google News, Yahoo!,…) could have access? That could be a nice demonstration that all the legal stuff done by CopiePresse is full of nonsense.

2006-09-24 Javascript Slideshow

I'm updating the website of the small village where I live. As I received a collection of pictures, I was wondering how to display them in a nice way. A kind of dynamic slideshow with crossfading could do the job. An update of the crossfade redux is available and works quite nicely without a ton of tweaking in the CSS. I made a test with the latest photos of the facade renovation of our house. Works well… I plan to use it for the new website of Les Bulles. The only thing I was wondering about is the license of the JavaScript code for the fading. By default, the exclusive rights of the author apply. So it's not free software…

Footnotes:

1. The name has nothing to do with Google. GooDiff is composed of two parts: Goo and Diff. Diff is easy to understand for the geeks: it comes from the diff utility, which shows the differences between files. Goo refers to a hypothetical end-of-the-world event in which uncontrolled molecular nanotechnology consumes all of the Earth's resources. The book Blood Music by Greg Bear is a nice novel about a grey goo hypothesis. That makes sense regarding the undefined legal blob ;-)
