Wednesday, September 7, 2011

Fixing strings(1)

The first step in any examination of a binary is to run strings(1) on it.

This usually results in output like the following:
bash$ strings /bin/ls
FXL9GX
|"H9
GXL9FX
|"H9
F@L9G@
...which, needless to say, sucks.

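The problem here is that strings(1) treats any sequence of printable ASCII characters of a minimum length (usually 4) as a 'string', so runs of bytes in code and data that merely happen to be printable qualify too.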
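
To make that rule concrete, here is a rough Ruby sketch of it. The method name printable_runs is made up for illustration, and it assumes "printable" means space through tilde plus tab; the real strings(1) also has options for encodings and for which sections to scan, which this ignores.

```ruby
# Sketch of the rule described above: return every run of at least
# min_len printable ASCII bytes found in data.
# Assumption: "printable" is space (0x20) through tilde (0x7e), plus tab.
def printable_runs(data, min_len = 4)
  # String#b gives a binary (ASCII-8BIT) copy, so arbitrary bytes are safe to scan.
  data.b.scan(/[\t\x20-\x7e]{#{min_len},}/)
end

# "ELF" is only three characters long, so it is dropped; the two junk
# runs are long enough to count as 'strings'.
sample = "\x7fELF\x02\x01\x00FXL9GX\x00|\"H9\x00"
p printable_runs(sample)   # → ["FXL9GX", "|\"H9"]
```

Nothing in this rule looks at what the characters are, which is why machine code that happens to fall in the printable range shows up in the output.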

The output of strings(1) can be made a bit more usable by discarding any string that does not contain a word in the system dictionary. 

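An off-the-cuff implementation of such a filter in Ruby would be:
#!/usr/bin/env ruby
# Print only those input lines that contain at least one dictionary word.

DICT = '/usr/share/dict/american-english-insane'

# ARGF.each_line is used because ARGF.lines was removed in Ruby 3.0.
ARGF.each_line do |line|
  is_str = false
  # Split the line into alphanumeric 'words'; everything else becomes whitespace.
  line.gsub(/[^[:alnum:]]/, ' ').split.each do |word|
    next if word.length < 4
    # grep -Fxc counts the dictionary lines that match the word exactly.
    is_str = true if `grep -Fxc '#{word}' #{DICT}`.chomp.to_i > 0
  end
  puts line if is_str
end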

This would then be passed the output of strings(1):
bash$ strings /bin/ls | ./strings.rb
__gmon_start__
_fini
clock_gettime
acl_get_entry
acl_get_tag_type
...

This is super-slow for large executables, since it spawns a grep process for every word, and it will miss sprintf-style format strings that contain only conversion specifiers and escape sequences (e.g. "%p\n"), but for the general case it produces useful output.

Directions for improvement: load one or more dictionary files and perform lookups on them in Ruby. Search for English-like words by taking 4-letter prefixes and suffixes of each 'word' in the string and searching for dictionary words that start or end with that prefix/suffix. Provide early-exit from the inner loop over the words when a match is found. Allow matches of sprintf formatting strings, URIs ('http://'), etc.
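
A minimal sketch of some of those directions follows; it is untuned. The names load_dict, interesting?, filter_strings, FORMAT_RE and URI_RE are all made up for illustration. The two regexes are rough guesses at "looks like a format string" and "looks like a URI", and will let some junk containing a '%' through. Lookups are case-insensitive here, which is a looser match than grep -Fx. The prefix/suffix search is left out.

```ruby
require 'set'

# Hypothetical patterns: a printf-style conversion and a 'scheme://' prefix.
FORMAT_RE = /%[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|z|j|t|L)?[diouxXeEfgGcsp]/
URI_RE    = %r{\b[a-z][a-z0-9+.-]*://}i

# Load one or more word lists into a Set, so each lookup is a hash probe
# instead of a grep process. Words shorter than 4 letters are skipped.
def load_dict(*paths)
  paths.each_with_object(Set.new) do |path, words|
    File.foreach(path) do |line|
      word = line.chomp.scrub.downcase
      words << word if word.length >= 4
    end
  end
end

# True if the line looks like a format string or URI, or contains a
# dictionary word. any? stops at the first hit, which gives the early exit.
def interesting?(line, dict)
  return true if line.match?(FORMAT_RE) || line.match?(URI_RE)
  line.scan(/[[:alnum:]]+/).any? do |word|
    word.length >= 4 && dict.include?(word.downcase)
  end
end

# Copy the interesting lines from input to out.
def filter_strings(input, dict, out = $stdout)
  input.each_line { |line| out.puts line if interesting?(line, dict) }
end
```

To use it as a drop-in replacement for the script above, the last line of the file would be something like filter_strings(ARGF, load_dict('/usr/share/dict/american-english-insane')), assuming that dictionary file is installed.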
