Automation and Estimative Language in Information Exchange

Alexandre Dulaunoy a@foo.be

version 0.2 - 2017-10-08

In recent years, automation in incident response and information sharing has improved slightly. We have slowly moved from lists of indicators in random unstructured emails to automatic information exchange with partial contextual information. Information exchange is still in its infancy, and many new practices need to be created, communicated, shared, interpreted and finally used.

The objectives of information sharing and automation in incident response, cybersecurity (or even counter-terrorism) are diverse:

Proactive sharing of analytic reports to better understand current threats and to establish a common ground for evaluation (human-to-human exchange).
Bridging analytic reports from analysts to systems that then perform automation (human-to-machine exchange/automation).
Analytical information generated by machines (from monitoring, algorithmic analysis or numerical analysis) and directed toward a human (machine-to-human exchange).

Different analytic standards exist, such as Intelligence Community Directive (ICD) 203, Analytic Standards, but they are usually expressed in human-readable reports and are not easily interpreted or processed by software and tools. Even though such analytic standards have limitations, as described by Friedman and Zeckhauser in Handling and Mishandling Estimative Probability: Likelihood, Confidence, and the Search for Bin Laden, the benefit of using them to reach the objectives mentioned above can be significant.

Within the MISP project (an open source threat intelligence platform and a set of open standards), we strongly believe that such analytic standards should be available within the format used and in the tools available to analysts. In 2015, we investigated various ways to incorporate such analytical models into the system without having to change or update the MISP core software for each model used by a CSIRT, an intelligence organisation or a law-enforcement agency. We designed a system called misp-taxonomies in which anyone can describe their model in a simple format and use it directly to tag information within the MISP platform.

As an example in MISP, you can see below how an analyst can assign a specific estimation to an existing set of attributes, such as a blog post pointing to an external analysis report:

A practical example of estimative language within the MISP platform

Behind the scenes, misp-taxonomies are described in a simple JSON format:

{
  "namespace": "estimative-language",
  "expanded": "Estimative language ICD 203",
  "description": "Estimative language to describe quality and credibility of underlying sources, data, and methodologies based Intelligence Community Directive 203 (ICD 203)",
  "version": 2,
  "predicates": [
    {
      "value": "likelihood-probability",
      "expanded": "Likelihood or probability",
      "description": "Properly expresses and explains uncertainties associated with major analytic judgments: Analytic products should indicate and explain the basis for the uncertainties associated with major analytic judgments, specifically the likelihood of occurrence of an event or development, and the analyst's confidence in the basis for this judgment. Degrees of likelihood encompass a full spectrum from remote to nearly certain. Analysts' confidence in an assessment or judgment may be based on the logic and evidentiary base that underpin it, including the quantity and quality of source material, and their understanding of the topic. Analytic products should note causes of uncertainty (e.g., type, currency, and amount of information, knowledge gaps, and the nature of the issue) and explain how uncertainties affect analysis (e.g., to what degree and how a judgment depends on assumptions). As appropriate, products should identify indicators that would alter the levels of uncertainty for major analytic judgments. Consistency in the terms used and the supporting information and logic advanced is critical to success in expressing uncertainty, regardless of whether likelihood or confidence expressions are used."
    }
  ],
  "values": [
    {
      "predicate": "likelihood-probability",
      "entry": [
        ...
        {
          "value": "very-likely",
          "expanded": "Very likely - highly probable - 80-95%",
          "numerical_value": 80
        }
        ...
      ]
    }
  ]
}
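The structure above can be turned into tags programmatically. The following is a minimal sketch, not the MISP implementation: it assumes tags take the triple form namespace:predicate="value", and it works on a trimmed copy of the taxonomy that contains only the "very-likely" entry shown above (the elided entries are deliberately not filled in). The function name machine_tags is hypothetical.

```python
import json

# Trimmed copy of the taxonomy shown above. Only the "very-likely" entry
# from the excerpt is included; the elided entries are not reconstructed.
TAXONOMY_JSON = """
{
  "namespace": "estimative-language",
  "predicates": [{"value": "likelihood-probability"}],
  "values": [
    {
      "predicate": "likelihood-probability",
      "entry": [
        {
          "value": "very-likely",
          "expanded": "Very likely - highly probable - 80-95%",
          "numerical_value": 80
        }
      ]
    }
  ]
}
"""


def machine_tags(taxonomy):
    """Yield (tag, entry) pairs for every entry of every predicate.

    Assumption: a tag is written as namespace:predicate="value".
    """
    namespace = taxonomy["namespace"]
    for group in taxonomy.get("values", []):
        predicate = group["predicate"]
        for entry in group.get("entry", []):
            yield f'{namespace}:{predicate}="{entry["value"]}"', entry


taxonomy = json.loads(TAXONOMY_JSON)
tags = dict(machine_tags(taxonomy))

for tag, entry in tags.items():
    print(tag, "->", entry["expanded"])
```

Keeping the whole entry next to each tag means the human-readable label ("expanded") and any machine-oriented fields stay available to whatever consumes the tag.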

A taxonomy can be used to express an analytical model described in a document and attach it to structured information (e.g. indicators, contextual information, preventive measures). In addition, machines often need to be able to interpret such a model. We expanded the format to support a numerical_value attached to each entry, carrying quantitative information that other tools, software or machines can use for further processing. The numerical value conveyed with structured information can then be used to trigger additional actions based on a threshold, or as a parameter to a specific function.
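As an illustration of the threshold idea, here is a small sketch under stated assumptions. It is not part of MISP: the lookup table, the function names numerical_value and should_act, the default threshold of 75 and the second example tag are all hypothetical. Only the value 80 for "very-likely" comes from the taxonomy excerpt, and the tag syntax namespace:predicate="value" is assumed.

```python
import re

# Assumed tag syntax: namespace:predicate="value"
TAG_PATTERN = re.compile(
    r'^(?P<namespace>[^:="]+):(?P<predicate>[^:="]+)="(?P<value>[^"]+)"$'
)

# Hypothetical lookup table. In practice it would be built from the taxonomy
# JSON; only the "very-likely" value (80) is taken from the excerpt.
NUMERICAL_VALUES = {
    ("estimative-language", "likelihood-probability", "very-likely"): 80,
}


def numerical_value(tag):
    """Return the numerical_value attached to a tag, or None if unknown."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    key = (match["namespace"], match["predicate"], match["value"])
    return NUMERICAL_VALUES.get(key)


def should_act(tags, threshold=75):
    """Return True if any tag carries a numerical value at or above threshold."""
    scores = [v for v in map(numerical_value, tags) if v is not None]
    return bool(scores) and max(scores) >= threshold


# Tags attached to some piece of structured information (illustrative only).
event_tags = [
    'estimative-language:likelihood-probability="very-likely"',
    "example-namespace:example",  # carries no numerical value, so it is ignored
]

print(should_act(event_tags))                # True: 80 >= 75
print(should_act(event_tags, threshold=90))  # False: 80 < 90
```

Tags that do not match the pattern or are missing from the table are skipped rather than treated as zero, so information without an estimation never triggers an action by accident.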

With this approach, an analytical model described in a structured taxonomy allows human-readable information to be processed and shared together with analytical information, in order to support or integrate automation. Within the MISP project team, we always try to keep a very practical approach (in other words, it needs to work at least for us). Feedback, ideas or new analytical models can be shared with us via pull requests or issues, chat or during MISP training sessions.

References

Documentation about existing MISP taxonomies and classification as machine tags - PDF