Malware definition: any program (source code or binary) that can be used or triggered for a known or unknown purpose, often a malicious one.
When working with honeynets or compromised systems, you quickly build up a nice collection of malware, malicious scripts and rootkits. The bigger the collection, the harder it is to find past samples. The proposed solution is a database containing a fingerprint of every malware sample along with the sample itself. Such a database lets us correlate information about malware with malicious activity on a system. The proposal aims to go a little further than a plain database, for two reasons:
v0.0.1 of the structure
The fingerprint used is the SHA-2 hash of each file.
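As a minimal sketch, the fingerprint could be computed with Python's standard `hashlib`. This assumes "SHA-2" means SHA-256, which is consistent with the 64-character hex hash in the DNS example later on. The helper names `fingerprint_file` and `fingerprint_bytes` are hypothetical.

```python
import hashlib


def fingerprint_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large samples never have to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fingerprint_bytes(data: bytes) -> str:
    """Same fingerprint for a sample that is already in memory."""
    return hashlib.sha256(data).hexdigest()
```

For example, `fingerprint_bytes(b"abc")` returns the well-known SHA-256 test vector `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`.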
CREATE TABLE malware (
    fingerprint TEXT PRIMARY KEY,
    states INTEGER,
    origfilename TEXT,
    description TEXT,
    filemagic TEXT,
    mimemagic TEXT,
    datesubmit DATE,
    source TEXT
);
CREATE TABLE tag (
    fingerprint TEXT,
    tag TEXT
);
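A small sketch of how a sample and its tags might be stored with this schema, using Python's built-in `sqlite3` module. SQLite is an assumption here, since the engine is not named. The helpers `open_db` and `submit` are hypothetical, and only some of the columns are filled in.

```python
import sqlite3
from datetime import date

SCHEMA = """
CREATE TABLE IF NOT EXISTS malware (
    fingerprint  TEXT PRIMARY KEY,  -- SHA-256 hex digest of the file
    states       INTEGER,
    origfilename TEXT,
    description  TEXT,
    filemagic    TEXT,
    mimemagic    TEXT,
    datesubmit   DATE,
    source       TEXT
);
CREATE TABLE IF NOT EXISTS tag (
    fingerprint TEXT,
    tag         TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def submit(conn, fingerprint, origfilename, source, tags=(), description=None):
    """Store one sample's metadata plus any number of tags (hypothetical helper)."""
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute(
            "INSERT OR IGNORE INTO malware "
            "(fingerprint, origfilename, description, datesubmit, source) "
            "VALUES (?, ?, ?, ?, ?)",
            (fingerprint, origfilename, description, date.today().isoformat(), source),
        )
        conn.executemany(
            "INSERT INTO tag (fingerprint, tag) VALUES (?, ?)",
            [(fingerprint, t) for t in tags],
        )
```

Because the primary key is the hash, submitting the same file twice does not create a second `malware` row.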
Planned additions in another database:
CREATE TABLE container (
    sourcefinger TEXT,
    containfinger TEXT,
    states INTEGER
);
CREATE TABLE comment (
    commentid INTEGER,
    datecreation DATE,
    fingerprint TEXT,
    comment TEXT
);
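To illustrate what the planned `container` table could hold, the sketch below fingerprints a ZIP archive and each file inside it, producing `(sourcefinger, containfinger)` pairs. ZIP is only an example container format. The function name `container_rows` is hypothetical, and the `states` column is left out because its values are not defined yet.

```python
import hashlib
import io
import zipfile


def container_rows(archive: bytes):
    """Return (sourcefinger, containfinger) pairs for each file in a ZIP archive."""
    source = hashlib.sha256(archive).hexdigest()
    rows = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories have no content to fingerprint
            member = zf.read(info)
            rows.append((source, hashlib.sha256(member).hexdigest()))
    return rows
```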
States can be used to record the state of a malware sample (existing, false positive, …) but also of a container relationship (part of, integrated in a binary, …). The state values should be defined as early as possible, since the rest of the structure builds on them.
Designing an efficient structure is far from easy. Assigning a unique ID at submission time would limit the possibility of a decentralized database: merging databases is difficult because IDs are often assigned sequentially, so two databases would use the same IDs for different samples. We prefer to use the hash of each binary as its ID. A hash value has the drawback of being uniformly distributed and random, which can be an issue for some databases. On the other hand, the ID (hash) can be computed directly whenever a specific binary has to be looked up.
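The advantage of hash IDs for merging can be shown with a small sketch. It uses in-memory SQLite databases with a reduced, hypothetical schema. Two sites that collected the same file independently arrive at the same key, so a merge reduces to `INSERT OR IGNORE`. With sequential IDs, both sites would have an ID 1 referring to different files. The helpers `make_db` and `merge_into` are hypothetical.

```python
import sqlite3

# Reduced schema for illustration only; the real table has more columns.
MERGE_SCHEMA = "CREATE TABLE malware (fingerprint TEXT PRIMARY KEY, origfilename TEXT, source TEXT)"


def make_db(rows):
    """Build an in-memory database holding the given (fingerprint, name, source) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(MERGE_SCHEMA)
    conn.executemany("INSERT INTO malware VALUES (?, ?, ?)", rows)
    return conn


def merge_into(target, other):
    """Copy every row of `other` into `target`; samples already known are skipped."""
    rows = other.execute("SELECT fingerprint, origfilename, source FROM malware").fetchall()
    with target:
        target.executemany("INSERT OR IGNORE INTO malware VALUES (?, ?, ?)", rows)
    return target
```

Running the merge twice gives the same result, because the hash alone decides whether a row is already present.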
We use a free-tagging approach, but some tags are interpreted by default, such as CVE:NNNNN, RFCXXXXX and the like. An object can have zero or more tags; there is no limit.
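A possible way to recognise the tags that get a default interpretation while keeping every other tag as free text is sketched below. The regular expressions are assumptions derived from the CVE:NNNNN and RFCXXXXX examples, not a documented format. `interpret_tag` is a hypothetical helper.

```python
import re

# Hypothetical patterns for tags with a default meaning; exact formats are assumptions.
KNOWN_TAGS = {
    "cve": re.compile(r"^CVE:(\d[\d-]*)$", re.IGNORECASE),
    "rfc": re.compile(r"^RFC(\d+)$", re.IGNORECASE),
}


def interpret_tag(tag: str):
    """Return (kind, identifier) for a recognised tag, or ("free", tag) otherwise."""
    for kind, pattern in KNOWN_TAGS.items():
        match = pattern.match(tag.strip())
        if match:
            return kind, match.group(1)
    return "free", tag
```

For instance, `interpret_tag("rfc1035")` gives `("rfc", "1035")`, while a tag such as `irc-bot` is kept as a free tag.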
<global>
db = /home/adulau/malware-db/db/malware
db-files = /home/adulau/malware-db/db-files/
rss-path = /usr/local/apache/csrrt/ml/rss/
url_repo = http://www.csrrt.org/maldb-files/
url = http://www.csrrt.org/maldb/index.pl
version = 0.0.2
</global>
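The configuration uses a `<global>` block with one `key = value` setting per line. A hand-rolled reader for that layout might look like the sketch below. It assumes there is no nesting, quoting or line continuation, and it is not the parser the tool itself uses. `parse_global_block` is a hypothetical name.

```python
def parse_global_block(text: str) -> dict:
    """Collect key = value lines found between <global> and </global>."""
    settings = {}
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower() == "<global>":
            inside = True
        elif line.lower() == "</global>":
            inside = False
        elif inside and "=" in line:
            # Split on the first "=" only, so values may contain "=" themselves.
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings
```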
You can query the database to check whether a malware sample is present. The goal is to provide a common and easy way to get this information. DNS queries are a very common way to check or retrieve information; RBLs work this way to publish blacklists of spammers' IP addresses.
Many applications could benefit from checking a hash against a malware database. If a hash matches an entry in the database, this gives information about the security "state" of the file.
The malware DNS server emulates a DNS server but only answers TXT queries. If the record exists, the server replies with the "origfilename" as a TXT record and a NOERROR status. For all other queries, the NXDOMAIN status code is returned. A DNS label cannot exceed 63 bytes. Because a SHA-2 (SHA-256) hash in hex format is 64 characters long, the hash has to be split across several subdomain labels. It can be split at any position; the server rebuilds the full hash by default (see the example below). The server always sets the AA (Authoritative Answer) flag.
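The splitting, reassembly and answer rules described above can be sketched in a few lines. The zone name is taken from the example query that follows. The in-memory `db` dict (fingerprint to origfilename) stands in for the real database. The function names and the default split into 32-character labels are illustrative choices, not the server's actual code.

```python
MAX_LABEL = 63  # DNS limit on the length of a single label, in bytes
ZONE = "sha1.maldb.csrrt.org"  # zone used in the example query


def hash_to_qname(hexhash: str, zone: str = ZONE, label_len: int = 32) -> str:
    """Split a hex hash into labels of at most label_len characters under zone."""
    if not 0 < label_len <= MAX_LABEL:
        raise ValueError("label length must be between 1 and 63")
    labels = [hexhash[i:i + label_len] for i in range(0, len(hexhash), label_len)]
    return ".".join(labels + [zone])


def qname_to_hash(qname: str, zone: str = ZONE):
    """Rebuild the hash: drop the zone suffix and join the remaining labels."""
    name = qname.rstrip(".").lower()
    suffix = "." + zone
    if not name.endswith(suffix):
        return None
    return name[: -len(suffix)].replace(".", "")


def answer(qname: str, qtype: str, db: dict, zone: str = ZONE):
    """Return (rcode, txt): NOERROR plus origfilename for a known hash, else NXDOMAIN."""
    if qtype != "TXT":
        return "NXDOMAIN", None
    fingerprint = qname_to_hash(qname, zone)
    if fingerprint is None or fingerprint not in db:
        return "NXDOMAIN", None
    return "NOERROR", db[fingerprint]
```

Since the server simply concatenates the labels, a split into 34 and 30 characters (as in the example) and a split into 32 and 32 characters resolve to the same hash.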
An example query using a SHA-2 hash:
dig -t TXT 3d5a9097cda0565ccc4a0e8aaa703b8543.18731eb80bce12e8d9958f115fa468.sha1.maldb.csrrt.org @127.0.0.1

; <<>> DiG 9.2.5 <<>> -t TXT 3d5a9097cda0565ccc4a0e8aaa703b8543.18731eb80bce12e8d9958f115fa468.sha1.maldb.csrrt.org @127.0.0.1
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34721
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;3d5a9097cda0565ccc4a0e8aaa703b8543.18731eb80bce12e8d9958f115fa468.sha1.maldb.csrrt.org. IN TXT

;; ANSWER SECTION:
3d5a9097cda0565ccc4a0e8aaa703b8543.18731eb80bce12e8d9958f115fa468.sha1.maldb.csrrt.org. 3600 IN TXT "ffd05c84ee0803cc49423a2d45f3964d"

;; Query time: 2 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Feb 21 15:19:58 2006
;; MSG SIZE rcvd: 150
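An application that does not want to shell out to dig could send the same query itself. The sketch below builds a TXT query in the RFC 1035 wire format using only the Python standard library. It also reads two fields from a response: the status code (NOERROR or NXDOMAIN) and the AA flag. Full parsing of the TXT answer is left out. The function names are hypothetical, and `send_query` assumes a server listening on UDP port 53 at 127.0.0.1, as in the example.

```python
import random
import socket
import struct

QTYPE_TXT = 16
QCLASS_IN = 1


def build_txt_query(qname: str, query_id=None) -> bytes:
    """Encode a DNS query for the TXT record of qname (RFC 1035 wire format)."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = b""
    for label in qname.rstrip(".").split("."):
        encoded = label.encode("ascii")
        if not 0 < len(encoded) <= 63:
            raise ValueError(f"invalid label length: {label!r}")
        question += bytes([len(encoded)]) + encoded
    question += b"\x00" + struct.pack("!HH", QTYPE_TXT, QCLASS_IN)
    return header + question


def send_query(packet: bytes, server: str = "127.0.0.1", port: int = 53, timeout: float = 2.0) -> bytes:
    """Send the query over UDP and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, port))
        response, _ = sock.recvfrom(4096)
    return response


def rcode(response: bytes) -> int:
    """Response code from the header flags: 0 is NOERROR, 3 is NXDOMAIN."""
    return struct.unpack("!H", response[2:4])[0] & 0x000F


def is_authoritative(response: bytes) -> bool:
    """True when the AA (Authoritative Answer) bit is set."""
    return bool(struct.unpack("!H", response[2:4])[0] & 0x0400)
```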
is a string containing a unique ID
is a string containing one tag