Yara Exploration | Bleu blog

YARA Basics

Introduction to YARA

YARA (usually expanded as "YARA: Another Recursive Acronym" or "Yet Another Ridiculous Acronym") is a signature-based detection tool for identifying IOCs and malicious files on networks and endpoints. YARA rules classify files based on textual or binary patterns, and generic rules can identify attributes shared by various malware families. Each rule starts with the keyword rule followed by a rule identifier, and is typically composed of two sections: a set of strings, and a boolean condition that decides when the rule matches. An optional meta section holds descriptive key/value pairs. The strings section can be omitted if the rule doesn’t rely on any string, but the condition section is always required.

rule HelloWorld 
{
    meta:
        description = "Rule example"
    strings:
        $hello_world = "Hello World!"
        $hello_world_uppercase = "HELLO WORLD!"
        $hello_world_lowercase = "hello world!"
    condition:
        any of them
}
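To make the any of them condition concrete, here is a minimal Python sketch of the check the rule performs. It only illustrates the matching logic and says nothing about how YARA is implemented; the names HELLO_STRINGS and hello_world_matches are made up for this example.

```python
# Illustrative stand-in for the HelloWorld rule: three case-sensitive
# byte strings, and a condition that holds if any of them is present.
HELLO_STRINGS = [b"Hello World!", b"HELLO WORLD!", b"hello world!"]


def hello_world_matches(data: bytes) -> bool:
    """Return True when at least one of the strings occurs anywhere in data."""
    return any(s in data for s in HELLO_STRINGS)


print(hello_world_matches(b"log: HELLO WORLD! received"))  # True
print(hello_world_matches(b"log: Hello world! received"))  # False (case differs)
```

Because plain text strings are case sensitive, the rule needs one string per spelling it wants to catch.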

Typical YARA use cases include artefact classification to identify different file types, malicious signature identification based on IOCs, and threat hunting at scale by scanning across environments. At an enterprise level these translate to proactive alerting on malicious files and analysis of email attachments or user uploads.

Strings

YARA has three types of strings: hexadecimal strings, text strings and regular expressions. Hexadecimal strings define raw sequences of bytes, while text strings and regular expressions define legible text.

Hexadecimal strings allow four special constructions: wildcards, not operators, jumps and alternatives. Wildcards (?) are placeholders for unknown bytes or nibbles. The not operator (~) specifies that the byte at that position can take any value except the one given. Jumps, such as [4-6], describe gaps of variable content and length. Alternatives, such as ( 62 B4 | 56 ), list alternative fragments of the hex string.

rule HexstringExample
{
    meta:
        description = "Hexadecimal strings example"
    strings:
        $wildcard_string = { E2 34 ?? C8 A? FB }
        $not_string = { F4 23 ~?0 62 B4 }
        $jump_string = { F4 23 [4-6] 62 B4 }
        $alternative_string = { F4 23 ( 62 B4 | 56 ) 45 }
    condition:
        any of them
}
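One way to read the four hex strings is to translate them by hand into byte-level checks. The Python sketch below does that with the standard re module plus a small helper; it is an approximation for illustration, and the constant and function names are hypothetical.

```python
import re

# Hand-written approximations of the hex strings in HexstringExample.
# re.DOTALL lets "." stand for any byte, including 0x0A.
WILDCARD = re.compile(rb"\xE2\x34.\xC8[\xA0-\xAF]\xFB", re.DOTALL)  # { E2 34 ?? C8 A? FB }
JUMP = re.compile(rb"\xF4\x23.{4,6}\x62\xB4", re.DOTALL)            # { F4 23 [4-6] 62 B4 }
ALTERNATIVE = re.compile(rb"\xF4\x23(?:\x62\xB4|\x56)\x45")         # { F4 23 ( 62 B4 | 56 ) 45 }


def matches_not_string(data: bytes) -> bool:
    """{ F4 23 ~?0 62 B4 }: third byte is anything whose low nibble is not 0."""
    for i in range(len(data) - 4):
        window = data[i:i + 5]
        if (window[:2] == b"\xF4\x23"
                and window[2] & 0x0F != 0
                and window[3:] == b"\x62\xB4"):
            return True
    return False


print(bool(WILDCARD.search(bytes.fromhex("E23499C8A7FB"))))  # True
print(matches_not_string(bytes.fromhex("F4231162B4")))       # True
print(matches_not_string(bytes.fromhex("F4231062B4")))       # False (low nibble is 0)
print(bool(JUMP.search(bytes.fromhex("F4230000000062B4"))))  # True (4-byte gap)
print(bool(ALTERNATIVE.search(bytes.fromhex("F4235645"))))   # True
```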

Text strings are ASCII-encoded and case-sensitive by default. Modifiers appended after the string definition alter the way the string is interpreted.

rule TextStringExample
{
    meta:
        description = "Text strings example"
    strings:
        $base_string = "foobar"
        $caseinsensitive_string = "foobar" nocase   // equivalent to Foobar, FOOBAR, fOoBaR and others
        $twobyteschar_string = "foobar" wide        // typical in executable binaries f\x00o\x00o\x00b\x00a\x00r\x00
        $singlebytexor_string = "foobar" xor        // matches "foobar" XORed with any single-byte key, e.g. key 0x01 gives "gnnc`s" and key 0x02 gives "dmm`cp"
        $base64_string = "foobar" base64            // matches the three base64 permutations of the string, one per byte alignment
        $fullword_string = "foobar" fullword        // matches www.foobar.com but not www.foobarpgh.com
    condition:
        any of them
}
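The byte sequences behind the wide, xor and base64 modifiers are easy to reproduce, which helps when checking what a rule will actually look for. The sketch below is a rough Python illustration under the assumption that the three base64 permutations are obtained by encoding the string at each of the three possible byte alignments and discarding the characters that depend on neighbouring data; the helper names are hypothetical.

```python
import base64


def wide(text: str) -> bytes:
    """wide: each ASCII character is followed by a zero byte (UTF-16LE)."""
    return text.encode("utf-16-le")


def xor_variant(data: bytes, key: int) -> bytes:
    """xor: every byte is XORed with the same single-byte key."""
    return bytes(b ^ key for b in data)


def base64_permutations(data: bytes) -> list:
    """base64: the encoded form depends on the string's position modulo 3.

    For each alignment, encode with 0, 1 or 2 filler bytes in front, then
    drop the characters that also depend on bytes before or after the string.
    """
    perms = []
    for lead in range(3):
        encoded = base64.b64encode(b"\x00" * lead + data).decode("ascii")
        start = (0, 2, 3)[lead]                    # chars influenced by the filler
        tail = (0, 3, 2)[(lead + len(data)) % 3]   # chars influenced by what follows
        perms.append(encoded[start:len(encoded) - tail])
    return perms


print(wide("foobar"))                  # b'f\x00o\x00o\x00b\x00a\x00r\x00'
print(xor_variant(b"foobar", 0x01))    # b'gnnc`s'
print(base64_permutations(b"foobar"))  # ['Zm9vYmFy', 'Zvb2Jhc', 'mb29iYX']
```

Each permutation shows up inside the base64 encoding of any blob that contains "foobar" at the matching alignment, which is why three patterns are enough.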

Regular expressions are defined like text strings but are enclosed in forward slashes instead of double quotes. They can be followed by the nocase, wide and fullword modifiers described for text strings (xor and base64 apply to text strings only). The closing slash can additionally be followed by the characters i and s, which make the expression case-insensitive and let the dot (.) match new-line characters, respectively. The additional metacharacters, quantifiers and escape sequences supported by YARA are listed in the documentation.

rule RegexExample
{
    meta:
        description = "Regular expressions example"
    strings:
        $re1 = /md5: [0-9a-fA-F]{32}/
        $re2 = /state: (on|off)/
        $re3 = /foo/i                   // This regexp is case-insensitive
        $re4 = /bar./s                  // In this regexp the dot matches everything, including new-line
    condition:
        any of them
}
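For readers more used to Python, the i and s suffixes behave much like re.IGNORECASE and re.DOTALL. The sketch below uses Python's re module purely as a stand-in; it is a different engine from the one in YARA, so treat it as an analogy rather than an exact equivalent. The constant names are hypothetical.

```python
import re

# Rough Python counterparts of the patterns in RegexExample.
RE1 = re.compile(rb"md5: [0-9a-fA-F]{32}")     # /md5: [0-9a-fA-F]{32}/
RE3 = re.compile(rb"foo", re.IGNORECASE)       # /foo/i
RE4_PLAIN = re.compile(rb"bar.")               # /bar./  (dot stops at new-line)
RE4_DOTALL = re.compile(rb"bar.", re.DOTALL)   # /bar./s (dot matches new-line too)

print(bool(RE1.search(b"md5: d41d8cd98f00b204e9800998ecf8427e")))  # True
print(bool(RE3.search(b"FoO")))                                    # True
print(bool(RE4_PLAIN.search(b"bar\n")))                            # False
print(bool(RE4_DOTALL.search(b"bar\n")))                           # True
```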

Additionally, YARA supports private strings and unreferenced strings. The private modifier ensures that a string's matches are never included in YARA's output, even when the -s flag is used. Unreferenced strings are strings whose identifiers start with an underscore (“_”); they do not have to be referenced in the condition.

Conditions and Logical Operators

Conditions are boolean expressions, similar to those found in the if statements of other programming languages. They can contain boolean operators (and, or, not), relational operators (>=, <=, <, >, ==, !=), arithmetic operators (+, -, *, \, %) and bitwise operators (&, |, <<, >>, ~, ^). Note that \ is the integer division operator.

String counts and offsets let a condition test how many times a string occurs in a file or process memory, and whether it occurs at a specific file offset or at a virtual address within the process address space. The number of occurrences of each string is represented by a variable whose name is the string identifier with a # character in place of the $ character. The at operator tests for a string at a fixed offset in the file or virtual address in process memory, while the in operator tests for the string within a range of offsets or addresses. The offset or virtual address of the i-th occurrence of string $a is available as @a[i]; indexes are one-based, so the first occurrence is @a[1]. For many regular expressions and hex strings containing jumps, the length of the match is variable. The length of the i-th match can be used in a condition by placing the character ! in front of the string identifier, as in !a[i], in the same way as the @ character for the offset.

rule StringCountOffsetLenExample
{
    meta:
        description = "String counts, offsets and match length condition example"
    strings:
        $str1 = "malicious"    // Detect the word "malicious"
        $str2 = "exploit"      // Detect the word "exploit"
        $str3 = /vuln.*/       // Regular expression: "vuln" followed by any characters except new-line
    condition:
        #str1 >= 2 and                          // At least 2 occurrences of "malicious"
        ($str2 at 1200 or @str2[1] > 1000) and  // "exploit" is at offset 1200, or its first occurrence is after offset 1000 in the file
        $str1 in (1000..filesize) and           // "malicious" occurs between offset 1000 and the end of the file
        !str3[1] > 10                           // The first match of $str3 (e.g. "vulnerability") is longer than 10 bytes
}
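The count (#), offset (@) and length (!) variables can be pictured as three views over the same list of matches. The Python sketch below models that with re.finditer on a sample buffer; it finds non-overlapping matches only and treats range bounds as inclusive, both of which are assumptions of this illustration. The function name occurrences and the sample data are hypothetical.

```python
import re


def occurrences(pattern: bytes, data: bytes):
    """Return (offsets, lengths) of every non-overlapping regex match in data."""
    matches = list(re.finditer(pattern, data))
    return [m.start() for m in matches], [len(m.group()) for m in matches]


data = b"malicious code; malicious vulnerability"
offsets, lengths = occurrences(rb"malicious", data)

count = len(offsets)        # plays the role of #str1
first_offset = offsets[0]   # plays the role of @str1[1] (YARA indexes are one-based)
second_offset = offsets[1]  # plays the role of @str1[2]
at_16 = 16 in offsets       # like: $str1 at 16
in_range = any(10 <= o <= 30 for o in offsets)  # like: $str1 in (10..30)

# A regular expression such as vuln.* has a variable match length.
_, vuln_lengths = occurrences(rb"vuln.*", data)
first_vuln_length = vuln_lengths[0]  # plays the role of !str3[1]

print(count, offsets)        # 2 [0, 16]
print(at_16, in_range)       # True True
print(first_vuln_length)     # 13
```

Note the off-by-one between the two worlds: the Python list is zero-based, while YARA's @ and ! indexes start at 1.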

String identifiers are not the only variables that can appear in a condition (in fact, rules can be defined without any string definition, as shown below); there are other special variables as well. One of these is filesize, which holds, as its name indicates, the size in bytes of the file being scanned. The KB and MB postfixes, when attached to a numerical constant, multiply its value by 1024 and 2^20 respectively. Both postfixes can be used only with decimal constants, and filesize only makes sense when the rule is applied to a file.

rule FileSizeExample
{
    condition:
        filesize > 200KB
}
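The arithmetic behind the postfixes is simple enough to check by hand. As a small sketch, the Python below mirrors filesize > 200KB for an in-memory buffer; the names KB, MB and filesize_rule_matches are made up for the example.

```python
KB = 1024      # the KB postfix multiplies the constant by 1024
MB = 2 ** 20   # the MB postfix multiplies the constant by 2^20


def filesize_rule_matches(data: bytes) -> bool:
    """Mirror of `filesize > 200KB`, using the buffer length as the file size."""
    return len(data) > 200 * KB


print(200 * KB)                                         # 204800
print(filesize_rule_matches(b"\x00" * (200 * KB)))      # False (exactly 200KB is not greater)
print(filesize_rule_matches(b"\x00" * (200 * KB + 1)))  # True
```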