

Scope of the database

Malware definition: any program (source code or binary) that could be used or triggered for an unknown purpose, or for a known purpose that is often malicious.

When working with honeynets or compromised systems, you quickly build up a sizeable collection of malware, malicious scripts and rootkits. The bigger the collection, the harder it is to find a sample collected earlier. The idea is to solve this problem with a database that stores a fingerprint of every piece of malware along with the malware itself. This lets us correlate information about malware and malicious activity on a system. The proposal plans to go a little deeper than the database alone, for two reasons:

Database structure

v0.0.1 of the structure

The fingerprint used is the SHA-2 hash of each file.
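As an illustration, a fingerprint could be computed as sketched below. The text only says "SHA-2", so the choice of SHA-256 is an assumption, and the function name `fingerprint` is hypothetical.

```python
import hashlib


def fingerprint(path, chunk_size=65536):
    """Return the hex digest of the file at `path`.

    SHA-256 is assumed here; the text only specifies "SHA-2".
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large samples do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is what would be stored in the fingerprint column of the structure described here.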

CREATE TABLE malware (
    fingerprint   TEXT PRIMARY KEY,
    states        INTEGER,
    origfilename  TEXT,
    description   TEXT,
    filemagic     TEXT,
    mimemagic     TEXT,
    datesubmit    DATE,
    source        TEXT
);

    fingerprint   TEXT,
    tag           TEXT

Planned addition in another database:

CREATE TABLE container (
    sourcefinger  TEXT,
    containfinger TEXT,
    states        INTEGER
);

CREATE TABLE comment (
    commentid     INTEGER,
    datecreation  DATE,
    fingerprint   TEXT,
    comment       TEXT
);

The states column can record the state of the malware itself (existing, false positive) and, on the container side, the relationship between two files (part of, integrated in binary, …). The set of states should be defined as early as possible, since everything else builds on it.

Designing an efficient structure is far from easy. Assigning unique IDs at submission time limits the possibility of a decentralized database: merging databases, for example, is difficult because IDs are usually sequential within each database and would collide. We would rather use a hash of each binary as its ID. The drawback is that hash values are evenly distributed and effectively random, which can be an issue for some databases. On the other hand, the ID (hash) can be computed directly from a binary whenever it has to be looked up.
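The merge argument can be made concrete with a small sketch. It assumes SQLite through Python's `sqlite3` module (the text does not name the database engine) and reuses the malware table from v0.0.1; the helper names `merge` and `lookup` are hypothetical.

```python
import sqlite3

# Same columns as the v0.0.1 malware table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS malware (
    fingerprint   TEXT PRIMARY KEY,
    states        INTEGER,
    origfilename  TEXT,
    description   TEXT,
    filemagic     TEXT,
    mimemagic     TEXT,
    datesubmit    DATE,
    source        TEXT
);
"""


def merge(target, other):
    """Copy rows from `other` into `target`, skipping known fingerprints."""
    rows = other.execute("SELECT * FROM malware").fetchall()
    # INSERT OR IGNORE skips rows whose primary key already exists,
    # so the same sample submitted to both databases is kept once.
    target.executemany(
        "INSERT OR IGNORE INTO malware VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    target.commit()


def lookup(db, fingerprint):
    """Return the original file name for a fingerprint, or None."""
    row = db.execute(
        "SELECT origfilename FROM malware WHERE fingerprint = ?",
        (fingerprint,),
    ).fetchone()
    return row[0] if row else None
```

Because the key is derived from the file content, two independent databases agree on the ID of a shared sample without any coordination, which sequential IDs cannot offer.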

Tagging Approach

We use free-form tagging, but some tags are interpreted by default, such as CVE:NNNNN, RFCXXXXX and the like. An object can have no tags or several; there is no limit.
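One way the default interpretation could work is sketched below. The exact syntax the tool accepts is not given in the text, so the patterns, the `KNOWN_TAGS` table and the `interpret` function are all assumptions made for illustration.

```python
import re

# Patterns for the two tag families named in the text.
# What the real tool accepts after "CVE:" or "RFC" is an assumption.
KNOWN_TAGS = [
    ("cve", re.compile(r"^CVE:(.+)$", re.IGNORECASE)),
    ("rfc", re.compile(r"^RFC(\d+)$", re.IGNORECASE)),
]


def interpret(tag):
    """Return (kind, value) for a recognised tag, else ("free", tag)."""
    for kind, pattern in KNOWN_TAGS:
        match = pattern.match(tag)
        if match:
            return kind, match.group(1)
    # Anything else stays a free tag with no special meaning.
    return "free", tag
```

Tags that match no pattern are still stored unchanged, which keeps the tagging free-form.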



Config File

        db = /home/adulau/malware-db/db/malware
        db-files = /home/adulau/malware-db/db-files/
        rss-path = /usr/local/apache/csrrt/ml/rss/ 
        url_repo =
        url =
        version = 0.0.2
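A file in this shape can be read with a few lines of code. The sketch below assumes plain `key = value` lines with no section headers, as in the sample above; how the actual tool parses its configuration is not stated, and `parse_config` is a hypothetical name.

```python
def parse_config(text):
    """Parse `key = value` lines into a dict; empty values become ''."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and anything that is not an assignment.
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings
```

Keys left empty in the file, such as url_repo and url, come back as empty strings so the caller can decide how to treat them.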

DNS Query to the malware database

You can query the database to check whether a given piece of malware exists in it. The aim is to offer a common and easy way to get the information. DNS queries are widely used for this kind of lookup: RBLs work the same way to publish blacklists of spammers' IP addresses.

Many applications could benefit from checking a hash against a malware database. A hash that matches an entry in the database gives information about the security "state" of a file.

The malware DNS server imitates a DNS server but only answers TXT queries. If the record exists, the server replies with the "origfilename" as a TXT record and a NOERROR status. For all other queries, it returns the NXDOMAIN status code. In a DNS request, a label cannot exceed 63 bytes. To get around this limit (a SHA-2 hash in hex format is longer), the request has to be split across subdomain labels. The query can be split anywhere you want; the server rebuilds the full hash by default (see the example below). The server always sets the AA (Authoritative Answer) flag.

An example query using a SHA-2 hash :

dig -t TXT  @

; <<>> DiG 9.2.5 <<>> -t TXT @
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34721
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0


;; ANSWER SECTION:
3600 IN TXT "ffd05c84ee0803cc49423a2d45f3964d"

;; Query time: 2 msec
;; WHEN: Tue Feb 21 15:19:58 2006
;; MSG SIZE  rcvd: 150
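The label splitting and the server-side rebuild can be sketched as follows. The zone name `malware.example` used in the usage note, the default label size of 32 and both function names are hypothetical; only the 63-byte label limit and the "split anywhere, rejoin on the server" behaviour come from the text.

```python
MAX_LABEL = 63  # DNS limit on the length of a single label, in bytes


def to_query_name(hex_hash, zone, size=32):
    """Split `hex_hash` into labels of at most `size` chars under `zone`."""
    if not 0 < size <= MAX_LABEL:
        raise ValueError("label size must be between 1 and 63")
    labels = [hex_hash[i:i + size] for i in range(0, len(hex_hash), size)]
    return ".".join(labels + [zone])


def from_query_name(name, zone):
    """Rebuild the hash by joining every label that precedes `zone`."""
    suffix = "." + zone
    if not name.endswith(suffix):
        raise ValueError("query is outside the zone")
    # Where the client chose to split does not matter: dropping the
    # dots restores the original hex string.
    return name[:-len(suffix)].replace(".", "")
```

For a 64-character hex hash, `to_query_name(h, "malware.example")` yields two 32-character labels followed by the zone, and `from_query_name` returns the original hash whatever split size the client picked.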

Possible applications of the DNS malware interface

XML-RPC add malware


is a string containing a unique ID


is a string containing one tag