As my blog is an inconsistent space, I need to do some work to keep it inconsistent. Today's topic is electronic music, with a review of the latest release from Dave Clarke. Dave Clarke Presents Remixes & Rarities 1992 - 2005 is released on the mythical label Music Man as two CDs (real CDs, without DRM…). It's a compilation of the major remixes made by Dave Clarke until today. When I saw it on the shelf of the music store, it reminded me of some electronic music parties where a DJ was always playing The Storm (one of the best tracks from Dave Clarke, part of the mythical red series) at a specific time. Going back to the compilation, it's an eclectic collection of remixes. It somewhat reflects the work of Dave Clarke, and so includes the different levels of quality found in his past and current work. I don't really want to talk about the tracks I don't like… but there are two tunes on the compilation that justify its purchase. The first is the remix of the EBM-style track by Douglas McCarthy and Terence Fixmer: You Want It. It's really a great remix, keeping the original atmosphere of the track with a nice touch of "Dave Clarke" rhythm. The other one is the remix of Lie To Me, a track originally made by Slam (founders of Soma Records). The original lyrics and vocals were already incredible, but the never-starting "bass" is very nice and gives a new smooth and deep atmosphere. The quality of the other remixes (or unreleased tracks) is quite variable… but the compilation is quite good and gives a perspective on the overall work of Dave Clarke. A nice New Year gift for electronic music fans.
During our spare time, we (Michael and I) are playing a little bit with the potential "social" web (in other words, we are just trying to extract some useful information from a bunch of bloody web pages). In that scope, we have to collect, mangle, analyze and evaluate a lot of web pages. During the evaluation, we may think of something new and realize that we forgot to collect important data when crawling the URLs.
We discussed the possibility of writing a small lightweight framework that could operate partially like the big processing frameworks (e.g. MapReduce or Hadoop). Those frameworks often operate in the same way (as a lot of operations can be expressed in that way): they split a large dataset into small datasets across multiple nodes. A map function is given to process the small datasets with user-specified operations. Afterwards, the partial results are reduced and compiled to provide a uniform result. Such a framework is composed of multiple elements, like a job scheduler (to dispatch the tasks across the nodes), a distributed file system (to efficiently distribute the datasets and also… write the results), a communication system between the nodes… So developing such a framework is very complex and could be time consuming.
In that scope, I was looking for a way to efficiently restart my crawling and processing using a very simple mechanism. I made a very quick-and-dirty(tm) prototype to do that job. It's a 2-hour experiment, but with the idea behind it of building a potential lightweight framework… Here are the basic steps:
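To make the split / map / reduce flow described above concrete, here is a minimal single-machine sketch. It is not how MapReduce or Hadoop are implemented; the function names and the "count URLs per host" operation are hypothetical, chosen only for illustration.

```python
from collections import Counter
from functools import reduce


def split(dataset, parts):
    """Split a list into at most `parts` chunks of near-equal size."""
    size = max(1, -(-len(dataset) // parts))  # ceiling division
    return [dataset[i:i + size] for i in range(0, len(dataset), size)]


def map_chunk(urls):
    """Map step: count the URLs of one chunk per host (hypothetical operation)."""
    return Counter(url.split("/")[2] for url in urls)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


urls = [
    "http://example.org/a",
    "http://example.org/b",
    "http://example.com/",
    "http://example.net/x",
    "http://example.com/y",
]
chunks = split(urls, 2)
partials = [map_chunk(chunk) for chunk in chunks]  # one call per "node"
total = reduce(reduce_counts, partials, Counter())
print(total)
```

In a real framework each `map_chunk` call would run on a different node; the scheduler, the distributed file system and the communication layer are exactly the parts this sketch leaves out.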
I made a basic interface around GNU screen (the excellent terminal multiplexer) to handle the -X option via a simple Python class(svn). That way, the job is managed inside a single screen session.
The second part is the webcrawler (the tasks) that collects the URLs. The webcrawler is a very simple HTTP fetcher, with the added ability to retrieve URLs from a URL list starting at a specific offset. The webcrawler(svn) can be called like this:
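As an illustration of what such a wrapper around the -X option might look like, here is a minimal sketch. It is not the class from the svn repository: the name `ScreenSession`, its methods and the `dry_run` flag are assumptions made for this example. It relies only on the standard screen options -d -m -S (start a detached, named session) and -S … -X (send a command to a running session).

```python
import subprocess


class ScreenSession:
    """Hypothetical minimal wrapper around a detached GNU screen session."""

    def __init__(self, name, dry_run=False):
        self.name = name
        self.dry_run = dry_run
        self.history = []  # every argv built so far, handy for debugging
        # -d -m starts the session detached, -S gives it a name.
        self._run(["screen", "-d", "-m", "-S", name])

    def _run(self, argv):
        # Record the command; only execute it when not in dry-run mode.
        self.history.append(argv)
        if not self.dry_run:
            subprocess.check_call(argv)
        return argv

    def add_window(self, title, cmdline):
        """Open a new window running `cmdline` (split on whitespace)."""
        # -X sends the screen command "screen -t <title> <cmd...>" to the session.
        argv = ["screen", "-S", self.name, "-X", "screen", "-t", title]
        return self._run(argv + cmdline.split())


# dry_run=True only builds the commands, so this runs without screen installed.
session = ScreenSession("crawlersample", dry_run=True)
print(session.add_window("task0", "./crawl.py -b 1 -e 4000"))
```

With `dry_run=False` the same calls would actually start the session and open the window.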
python crawl.py -s ./urlstore2/ -f all-url.txt.100000.sampled -b 300 -e 400
Where -b is the position of the first URL to fetch and -e is the position of the last URL to fetch from the URL file specified with -f.
The last part is the master managing the tasks (the multiple crawl.py instances to be called):
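The offset handling can be sketched as follows. This is not the actual crawl.py: the helper names are hypothetical, and treating -b and -e as 1-based, inclusive line positions is an assumption. The fetching itself is left out.

```python
import argparse
from itertools import islice


def select_urls(lines, begin, end):
    """Return the URLs from line `begin` to line `end` (1-based, inclusive)."""
    selected = islice(lines, begin - 1, end)
    return [line.strip() for line in selected if line.strip()]


def parse_args(argv):
    """Parse the same option letters as the command line shown above."""
    parser = argparse.ArgumentParser(description="fetch a slice of a URL list")
    parser.add_argument("-s", dest="store", help="directory for fetched pages")
    parser.add_argument("-f", dest="urlfile", help="file with one URL per line")
    parser.add_argument("-b", dest="begin", type=int, default=1)
    parser.add_argument("-e", dest="end", type=int, default=None)
    return parser.parse_args(argv)


args = parse_args(["-s", "./urlstore2/", "-f", "urls.txt", "-b", "2", "-e", "3"])
sample = [
    "http://a.example/\n",
    "http://b.example/\n",
    "http://c.example/\n",
    "http://d.example/\n",
]
# With a real file, pass the open file object instead of `sample`.
print(select_urls(sample, args.begin, args.end))
```

Because `islice` reads lazily, only the lines up to -e are consumed, which matters for a list of 100000 URLs.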
# TGS (the GNU screen wrapper) and numLine() (line counter) are assumed
# to be imported or defined earlier in master.py.
cmd = './crawl.py'
p_urllist = ' -f '
urllist = "all-url.txt.100000.sampled"
p_urlstore = ' -s '
p_start = ' -b '
p_end = ' -e '
p_dacid = ' -i '
urlstore = "./urlstore2"
tasknum = 25
jobname = 'crawlersample'
lineNum = numLine(urllist)
step = lineNum // tasknum  # URLs per task (integer division)
job = TGS.TermGnuScreen(jobname)
jobid = 0
for x in range(0, lineNum, step):
    # Last line of this slice, clamped to the end of the list.
    end = x + step
    dacid = jobname + ",slave1," + str(jobid)
    if end > lineNum:
        end = lineNum
    cmdline = (cmd + p_urllist + urllist + p_urlstore + urlstore +
               p_start + str(x + 1) + p_end + str(end) + p_dacid + dacid)
    # One screen window per task.
    job.addWindow("jobname" + str(jobid), cmdline)
    jobid = jobid + 1
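The range arithmetic of the master loop above can be isolated and checked on its own. The function name `task_ranges` is hypothetical; the logic is meant to mirror the loop (start at x+1, clamp the end to the number of lines), with a guard added for lists shorter than the number of tasks.

```python
def task_ranges(line_count, task_count):
    """Return (first, last) 1-based inclusive line ranges, one per task."""
    step = max(1, line_count // task_count)  # guard against a zero step
    ranges = []
    for offset in range(0, line_count, step):
        ranges.append((offset + 1, min(offset + step, line_count)))
    return ranges


ranges = task_ranges(100000, 25)
print(len(ranges), ranges[0], ranges[-1])
print(task_ranges(10, 3))
```

Note that when the number of lines is not a multiple of the number of tasks, the remainder ends up in one extra, smaller task: 10 lines over 3 tasks gives 4 ranges, the last one holding a single line.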
So if you run master.py(svn), the tasks will be launched inside a single screen session, and you can use screen to follow the progress of each task:
dacid:crawlersample,slave1,24 2956 of 3999
That means that task 24 of the job crawlersample has already retrieved 2956 URLs out of 3999. In this experiment, we are quite lucky as we are using a simple hash-based store for the crawled pages. As the URLs are unique per task, there is no issue in using the same local repository.
It's clear that this experiment could be written as a multi-process or multi-threaded program, but the purpose is to be able to manage the distribution of the tasks across different nodes. Another advantage of the multiple-task approach is that we can stop a specific task without impacting the rest of the job and the other tasks.
It was just an experiment… I don't know if we will end up with a more useful framework at the end of the exercise ;-) But it was useful: I discovered that GNU Screen has a hard-coded limit on the number of windows per session (maximum 40 windows). You can easily change that in config.h and recompile it.
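A hash-based store of this kind can be sketched in a few lines. The layout below (SHA-1 of the URL as file name, a two-character subdirectory) is an assumption for illustration, not necessarily the layout used by the crawler; it only shows why tasks working on distinct URLs can share one repository without coordination.

```python
import hashlib
import os
import tempfile


def store_path(root, url):
    """Map a URL to a file path derived from its SHA-1 digest."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    # A two-character subdirectory keeps each directory reasonably small.
    return os.path.join(root, digest[:2], digest)


def save_page(root, url, body):
    """Write the fetched body (bytes) under the path derived from the URL."""
    path = store_path(root, url)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(body)
    return path


root = tempfile.mkdtemp()
first = save_page(root, "http://example.org/a", b"page a")
second = save_page(root, "http://example.org/b", b"page b")
print(first != second)
```

Two different URLs map to two different files, so tasks that never share a URL never write to the same path.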
I remembered a discussion I had some years ago about the similarities between the free software movement and the electronic music movement. I was feeling quite alone at that time, as I was the only one thinking that the sharing principle behind electronic music and free software creation is quite the same (e.g. sharing samples, audio works). I kept the idea in the back of my mind, but without thinking about it too much…
While walking through a bookshop, I discovered the following book: Digital Magma : De l'utopie des rave parties à la génération iPod by Jean-Yves Leloup (sorry, in French). It's a very nice and concise book about electronic music and its evolution in society. My main surprise is that the book clearly explains the parallel between the two: the free software world and the electronic music world. There are some good references to other classic books about electronic music or free software (e.g. the famous book from Pekka Himanen). A nice, easy and pleasant read if you are interested in the subject. I'll try to track down the people from that past discussion and send them the reference of this book ;-)
Some days ago, RFC 4772 (Security Implications of Using the Data Encryption Standard) was published by the IETF. It covers the security implications of using DES and why you must avoid its use in the modern information society. The RFC is very complete and covers all the security aspects of DES, including the "new" method of making an exhaustive search using a botnet[1]. The RFC is a nice read and a good introduction to the issues around DES and (some) block ciphers. I still know a lot of companies and individuals relying on DES for legacy Virtual Private Networks, file system encryption or the like. They often keep using it only for backward compatibility with existing or deprecated software/hardware. I really like the conclusion of the RFC:
With respect to the third reason (ignorance), this note attempts to address this, and we should continue to make every effort to get the word out. DES is no longer secure for most uses, and it requires significant security expertise to evaluate those small number of cases in which it might be acceptable. Technologies exist that put DES-cracking capability within reach of a modestly financed or modestly skilled motivated attacker. There are stronger, cheaper, faster encryption algorithms available. It is time to move on.
So guys, it's really time to move on… if not, your attacker will buy a COPACOBANA system (the new customizable EFF-like hardware code breaker) or use his botnet infrastructure to discover your small symmetric key.
Today I replaced my last Antec power supply unit from the notorious SL350P series. I have now replaced all the old SL350P units with SP-350P units. I broke three SL350P in less than one year, mainly due to the thunderstorms that are quite frequent in the area where I live. The strange part is that the power supply was still giving the correct output voltage on VSB (+5V). VSB is the standby voltage that keeps a minimal voltage on the mainboard (e.g. wake-on-LAN requires it to power on the board). For the rest, the outputs were quite chaotic and not the standard outputs like +3.3V, +/-5V or +12V on the 24-pin connector. From my experience, it's not a good idea to try fixing power supplies… and it's often better (mainly for safety) to order a new one ;-)
The whole story reminds me of the paper from Google, High-efficiency power supplies for home computers and servers, about the current inefficiency of PSUs and the idea that the power supply should have a single 12V output voltage. I'll vote for it…
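To get a feeling for how small a 56-bit key is, here is a back-of-the-envelope calculation. The attacker figures (100,000 machines, one million keys per second each) are purely hypothetical assumptions, not numbers from the RFC; the only facts used are the 56-bit effective key length of DES and that an exhaustive search covers half the keyspace on average.

```python
KEY_BITS = 56  # effective DES key length
KEYSPACE = 2 ** KEY_BITS


def days_to_search(keys_per_second, fraction=0.5):
    """Days needed to try `fraction` of the DES keyspace at a given rate."""
    seconds = KEYSPACE * fraction / keys_per_second
    return seconds / 86400.0


# Hypothetical botnet: 100,000 machines trying one million keys per second each.
botnet_rate = 100000 * 10 ** 6
print(KEYSPACE)
print(round(days_to_search(botnet_rate), 1))
```

Under these assumed rates the average search takes a few days, which is the order of magnitude that makes the "time to move on" conclusion hard to argue with.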
Footnotes:
1. Using all the vulnerable information systems and resources to build a network of compromised systems that will be used for the sole purpose of the attacker. It costs (until now) less to build a software worm to infect a bunch of systems than to build dedicated hardware to crack a symmetric cryptosystem.