AdulauWikiDiary: Malware Database

Scope of the database

Malware definition: any program (source code or binary) that can be used or triggered for an unknown purpose, or for a known purpose that is often malicious.

When you work with honeynets or compromised systems, you quickly build up a sizeable collection of malware, malicious scripts and rootkits. The bigger the collection gets, the harder it is to find a sample seen in the past. The idea is to solve this with a database that stores a fingerprint of every piece of malware together with the malware itself. This makes it possible to correlate information about malware and malicious activity on a system. The proposal goes a little further than a plain database in two ways:

Using a simple free tagging approach (OK, implemented),
- with some known prefixes like CVEXXXX or RFCXXXX that can be interpreted when needed
Using an additional table to cross-reference parts of files (to be implemented).

http://en.wikipedia.org/wiki/Malware

Database structure

v0.0.1 of the structure

The fingerprint used is the SHA-2 (SHA-256) hash of the file.

CREATE TABLE malware (
    fingerprint   TEXT PRIMARY KEY,
    states        INTEGER,
    origfilename  TEXT,
    description   TEXT,
    filemagic     TEXT,
    mimemagic     TEXT,
    datesubmit    DATE,
    source        TEXT
);

CREATE TABLE tag (
    fingerprint   TEXT,
    tag           TEXT
);
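As a rough illustration of how the two tables above fit together, here is a minimal Python sketch using the standard sqlite3 module. The database engine, the helper names (fingerprint, add_sample, add_tag) and the value 0 for "states" are assumptions made for the example; the real add.pl and add-tag.pl scripts may work differently.

```python
import hashlib
import sqlite3
from datetime import date

# Schema copied from v0.0.1 of the structure.
SCHEMA = """
CREATE TABLE malware (
    fingerprint   TEXT PRIMARY KEY,
    states        INTEGER,
    origfilename  TEXT,
    description   TEXT,
    filemagic     TEXT,
    mimemagic     TEXT,
    datesubmit    DATE,
    source        TEXT
);
CREATE TABLE tag (
    fingerprint   TEXT,
    tag           TEXT
);
"""


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest used as the primary key."""
    return hashlib.sha256(content).hexdigest()


def add_sample(conn, content, origfilename, source="unknown"):
    """Hypothetical helper: record a sample, ignoring exact duplicates."""
    fp = fingerprint(content)
    conn.execute(
        "INSERT OR IGNORE INTO malware "
        "(fingerprint, states, origfilename, datesubmit, source) "
        "VALUES (?, ?, ?, ?, ?)",
        # states=0 is a placeholder: the state values are not defined yet.
        (fp, 0, origfilename, date.today().isoformat(), source),
    )
    return fp


def add_tag(conn, fp, tag):
    """Hypothetical helper: attach one free tag to a known fingerprint."""
    conn.execute("INSERT INTO tag (fingerprint, tag) VALUES (?, ?)", (fp, tag))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
fp = add_sample(conn, b"not really malware", "/tmp/sample.bin")
add_tag(conn, fp, "example")
rows = conn.execute(
    "SELECT m.origfilename, t.tag FROM malware m "
    "JOIN tag t ON t.fingerprint = m.fingerprint"
).fetchall()
print(rows)
```

Because the primary key is derived from the file content, submitting the same file twice leaves a single row in the malware table.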

Planned additions, in another database:

CREATE TABLE container (
    sourcefinger TEXT,
    containfinger TEXT,
    states INTEGER
);

CREATE TABLE comment (
    commentid INTEGER,
    datecreation DATE,
    fingerprint TEXT,
    comment TEXT
);

States can be used to record the state of the malware (existing, false positive) but also, on the container side, the relationship between files (part of, integrated in binary, …). A basic set of state values should be defined as soon as possible.

Designing an efficient structure is far from easy. Creating unique ids based on submission order would limit the possibility of a decentralized database (for example, merging databases can be difficult because ids are usually sequential within each database). We would rather use a hash of each binary as its id. The drawback is that hash values are well distributed and effectively random, which can be an issue for some databases. On the other hand, the id (hash) is easy to calculate whenever a specific binary has to be looked up.
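The merging argument above can be sketched as follows. This is only an illustration under assumptions: it uses Python with sqlite3, a reduced two-column version of the malware table, and hypothetical helper names (make_db, merge). With content hashes as keys, two independently filled databases can be merged without renumbering anything.

```python
import hashlib
import sqlite3


def make_db(samples):
    """Build a small in-memory database from (filename, content) pairs."""
    conn = sqlite3.connect(":memory:")
    # Reduced schema for the example; the real table has more columns.
    conn.execute(
        "CREATE TABLE malware (fingerprint TEXT PRIMARY KEY, origfilename TEXT)"
    )
    for name, content in samples:
        fp = hashlib.sha256(content).hexdigest()
        conn.execute("INSERT OR IGNORE INTO malware VALUES (?, ?)", (fp, name))
    return conn


def merge(target, source):
    """Copy rows from source into target; return how many were new."""
    count = "SELECT COUNT(*) FROM malware"
    before = target.execute(count).fetchone()[0]
    rows = source.execute("SELECT fingerprint, origfilename FROM malware").fetchall()
    # Identical binaries have identical keys, so duplicates are skipped.
    target.executemany("INSERT OR IGNORE INTO malware VALUES (?, ?)", rows)
    return target.execute(count).fetchone()[0] - before


site_a = make_db([("a.bin", b"sample one"), ("b.bin", b"sample two")])
site_b = make_db([("other-name.bin", b"sample two"), ("c.bin", b"sample three")])
added = merge(site_a, site_b)
total = site_a.execute("SELECT COUNT(*) FROM malware").fetchone()[0]
print(added, total)
```

The binary present on both sites under different filenames is stored only once after the merge; which filename is kept is a policy choice that this sketch does not address.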

Tagging Approach

We use a free tagging approach, but some tags are interpreted by default, such as CVE:NNNNN, RFCXXXXX and the like. An object can have no tag or multiple tags. There is no limit.
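One possible way to recognise the interpreted tags is sketched below in Python. The exact tag syntax is not fixed by this page, so the pattern (a CVE or RFC prefix, an optional ":" or "-" separator, then a value starting with a digit) and the function name interpret_tag are assumptions for illustration only.

```python
import re

# Assumed syntax: CVE or RFC, optional ":" or "-", then a value that
# starts with a digit. Anything else is treated as a free tag.
KNOWN_PREFIX = re.compile(r"^(CVE|RFC)[:\-]?(\d\S*)$", re.IGNORECASE)


def interpret_tag(tag):
    """Return (kind, value); kind is 'CVE', 'RFC' or 'free'."""
    cleaned = tag.strip()
    match = KNOWN_PREFIX.match(cleaned)
    if match is None:
        return ("free", cleaned)
    return (match.group(1).upper(), match.group(2))


for example in ("CVE:2006-0001", "RFC1035", "rootkit"):
    print(example, "->", interpret_tag(example))
```

Requiring a digit after the prefix keeps ordinary free tags that merely begin with the same letters from being misread as references.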

Software

(console) add.pl <filename> The interface to add malware to the database. Don't forget that the filename is recorded in the database (including the full path, which is sometimes needed for samples from compromised systems).
(console) add-tag.pl <fingerprint> <tag> Adds a free tag to a known fingerprint.
(web) index.pl The simple CGI interface to view the data of the malware database. There is a special "admin" interface where users can update information related to malware.
(web) genrss.pl The simple web application generating an RSS 1.0 (XML) feed of the latest malware added.

Demo

(v0.0.1) http://www.csrrt.org/maldb/index.pl
- http://www.csrrt.org/maldb/index.pl?action=info&id=50f3d00e4cd6992d1ca8e13ee8b428b11765fee875abc9960e95354ddc828f05

Config File

<global>
        db = /home/adulau/malware-db/db/malware
        db-files = /home/adulau/malware-db/db-files/
        rss-path = /usr/local/apache/csrrt/ml/rss/ 
        url_repo = http://www.csrrt.org/maldb-files/
        url = http://www.csrrt.org/maldb/index.pl
        version = 0.0.2
</global>
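For tools written outside the existing scripts, a block like the one above is simple to read. The following Python sketch is an assumption about the format (named sections containing "key = value" lines) and is not the parser the project itself uses; parse_config is a hypothetical name.

```python
import re


def parse_config(text):
    """Parse <section> ... </section> blocks of 'key = value' lines."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        opening = re.fullmatch(r"<([^/<>\s]+)>", line)
        closing = re.fullmatch(r"</([^<>\s]+)>", line)
        if opening:
            current = sections.setdefault(opening.group(1), {})
        elif closing:
            current = None
        elif "=" in line and current is not None:
            # Split on the first "=" only, so values may contain "=".
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


SAMPLE = """
<global>
        db = /home/adulau/malware-db/db/malware
        url = http://www.csrrt.org/maldb/index.pl
        version = 0.0.2
</global>
"""

config = parse_config(SAMPLE)
print(config["global"]["version"])
```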

DNS Query to the malware database

You can query the database to check whether a piece of malware is present in it. The purpose is to offer a common and easy way to get the information. DNS queries are a very common way to check or fetch information: RBLs work this way to publish blacklists of spammers' IP addresses.

Many applications could benefit from checking a hash against a malware database. If a hash matches something in the database, this gives information about the security "state" of a file.

The malware DNS server behaves like a regular DNS server but only answers TXT queries. If the record exists, the server replies with the "origfilename" as a TXT record and a NOERROR status. For all other queries, the NXDOMAIN status code is returned. In a DNS request, a label cannot exceed 63 bytes. To get around this limit (a SHA-2 hash in hex format is longer, 64 characters for SHA-256), the hash has to be split across several labels, i.e. subdomains. The query can be split anywhere you want; the server rebuilds the full hash by default (see the example below). The server always sets the AA (Authoritative Answer) flag.

An example query using a SHA-2 (SHA-256) hash:

dig -t TXT 3d5a9097cda0565ccc4a0e8aaa703b8543.18731eb80bce12e8d9958f115fa468.sha1.maldb.csrrt.org  @127.0.0.1

; <<>> DiG 9.2.5 <<>> -t TXT 3d5a9097cda0565ccc4a0e8aaa703b8543.18731eb80bce12e8d9958f115fa468.sha1.maldb.csrrt.org @127.0.0.1
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34721
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;3d5a9097cda0565ccc4a0e8aaa703b8543.18731eb80bce12e8d9958f115fa468.sha1.maldb.csrrt.org. IN TXT

;; ANSWER SECTION:
3d5a9097cda0565ccc4a0e8aaa703b8543.18731eb80bce12e8d9958f115fa468.sha1.maldb.csrrt.org. 3600 IN TXT "ffd05c84ee0803cc49423a2d45f3964d"

;; Query time: 2 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Feb 21 15:19:58 2006
;; MSG SIZE  rcvd: 150
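The label splitting used in the query above, and the reassembly done on the server side, can be sketched in Python as follows. The function names and the default label size of 32 characters are assumptions for the example; only the zone name and the 63-byte label limit come from this page and the DNS protocol.

```python
ZONE = "sha1.maldb.csrrt.org"
MAX_LABEL = 63  # maximum size of a single DNS label, in bytes


def build_query_name(hex_hash, zone=ZONE, label_size=32):
    """Split a hex hash into labels of at most label_size characters."""
    if not 0 < label_size <= MAX_LABEL:
        raise ValueError("label_size must be between 1 and 63")
    labels = [
        hex_hash[i:i + label_size] for i in range(0, len(hex_hash), label_size)
    ]
    return ".".join(labels + [zone])


def rebuild_hash(query_name, zone=ZONE):
    """Join the labels in front of the zone back into the full hash."""
    name = query_name.rstrip(".").lower()
    suffix = "." + zone
    if not name.endswith(suffix):
        raise ValueError("query is not inside the expected zone")
    return name[: -len(suffix)].replace(".", "")


example = (
    "3d5a9097cda0565ccc4a0e8aaa703b8543."
    "18731eb80bce12e8d9958f115fa468.sha1.maldb.csrrt.org"
)
full_hash = rebuild_hash(example)
print(full_hash, len(full_hash))
print(build_query_name(full_hash))
```

Since the labels are simply concatenated, the client is free to split the hash at any position, as long as each label stays within 63 bytes.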

Possible applications of the DNS malware interface

LIDS/Integrity file checker
SMTP/HTTP proxy
…

XML-RPC add malware

MalwareDB.tquery (tag) - returns an array of malware hashes (SHA-256)
MalwareDB.dquery (last, bdate, ldate) - returns an array of malware hashes (SHA-256)
MalwareDB.add (malkey, filename, content in base64) - returns the string '200' on success or the string '500' when an error occurred.
MalwareDB.get (malkey, hash) - returns the malware found, base64 encoded, or the string '404' when not found.
MalwareDB.tag (malkey, hash, tag) - returns the string '200' on success or the string '500' when an error occurred.

malkey

is a string containing a unique id

tag

is a string containing one tag
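To give an idea of what a client call to the add method could look like, the Python sketch below builds the XML-RPC request body with the standard xmlrpc.client module and decodes it again, without contacting any server. The malkey value and the helper name build_add_request are hypothetical, and the endpoint URL is not given on this page.

```python
import base64
import xmlrpc.client


def build_add_request(malkey, filename, content):
    """Serialize a MalwareDB.add call; the content is sent as base64 text."""
    encoded = base64.b64encode(content).decode("ascii")
    return xmlrpc.client.dumps(
        (malkey, filename, encoded), methodname="MalwareDB.add"
    )


# "hypothetical-key" stands in for a real malkey.
request_xml = build_add_request("hypothetical-key", "sample.bin", b"not really malware")

# Decode the request again to check what a server would receive.
params, method = xmlrpc.client.loads(request_xml)
decoded = base64.b64decode(params[2])
print(method, params[1], decoded)
```

Against a live service, the same call would typically go through xmlrpc.client.ServerProxy pointed at the service endpoint, and the returned string would be compared with '200'.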

History

26/02/2006

Fixed the NXDOMAIN error code issues in the RBL-like DNS server
maldb.csrrt.org is now delegated on the Internet - you can query it from anywhere
- zone available: sha1.maldb.csrrt.org - query by SHA-256 id
- other zones planned (like origfilename to get back the initial nepenthes id or any other filename)
Added a basic graphing/statistics page - http://www.csrrt.org/maldb/index.pl?action=stats
Fixed a bug in the SQL query for an 'IN' statement when querying a specific tag
Fixed the table rendering in the CSS

TODO

use File::HStore (DONE)
Basic RSS/RDF 1.0 export to get the latest entries (DONE) + FIX
Cleanup schema (DONE)
A list of predefined tags to be evaluated (IN PROGRESS)
DNS interface (a la RBL) to query the database (DONE as an ugly interface - TO BE CLEANED UP)
An official release of version 0.0.1
Make a PM (Perl module) of the functions
Add an authenticated mode for admin (adding tag, comment, description…) - (API key?)
Add an XML-RPC interface + client to add malware to the database
Ideas? Feel free.
