Monday, July 23, 2012

Format 212

PhysioNet has a number of databases whose .dat signal files are stored in its "format 212" encoding.

The signal(5) manpage provides the following less-than-lucid explanation:

Each sample is represented by a 12-bit two's complement amplitude. The first sample is obtained from the 12 least significant bits of the first byte pair (stored least significant byte first). The second sample is formed from the 4 remaining bits of the first byte pair (which are the 4 high bits of the 12-bit sample) and the next byte (which contains the remaining 8 bits of the second sample). The process is repeated for each successive pair of samples.

Why write a line of code when 100 words of plaintext will suffice?

Basically, 3 bytes (A, B, C) encode 2 data points (x, y) as follows:
  x = ((B & 0x0F) << 8) | A
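  y = ((B & 0xF0) << 4) | C

Both values are 12-bit two's complement, so anything above 2047 needs 4096 subtracted from it to get its sign back.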
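To convince yourself that the A/B/C masks say the same thing as the manpage, here is a minimal sketch that decodes one 3-byte group both ways. The helper names `sign12`, `decode_manpage` and `decode_abc` are hypothetical, made up for this example. `decode_manpage` follows the manpage literally, reading the first byte pair as a little-endian 16-bit word with `String#unpack1('v')`; `decode_abc` uses the masks above. Both sign-extend the 12-bit two's complement result.

```ruby
# Hypothetical helpers, for illustration only.

# Sign-extend a 12-bit two's complement value (0..4095) to -2048..2047.
def sign12(v)
  v >= 0x800 ? v - 0x1000 : v
end

# Follow the manpage literally: the first two bytes form a 16-bit
# little-endian word; its low 12 bits are sample 1, its high 4 bits are
# the high bits of sample 2, and the third byte holds sample 2's low 8 bits.
def decode_manpage(bytes3)
  word = bytes3[0, 2].unpack1('v')
  x = word & 0x0FFF
  y = ((word >> 12) << 8) | bytes3.getbyte(2)
  [sign12(x), sign12(y)]
end

# The same decoding using the A/B/C bit masks.
def decode_abc(bytes3)
  a, b, c = bytes3.bytes
  x = ((b & 0x0F) << 8) | a
  y = ((b & 0xF0) << 4) | c
  [sign12(x), sign12(y)]
end

group = [0x01, 0xF2, 0x03].pack('C*')
p decode_manpage(group)  # → [513, -253]
p decode_abc(group)      # → [513, -253]
```

Without the sign extension, the second sample would come out as 3843 instead of -253.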

A throwaway Ruby script to convert a format 212 file into an array of sample pairs and print them as tab-separated columns:

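#!/usr/bin/env ruby

# Sign-extend a 12-bit two's complement value (0..4095) to -2048..2047.
def sign12(v)
  v >= 0x800 ? v - 0x1000 : v
end

# Convert a format 212 byte string to an array of [x, y] sample pairs.
# Every 3 bytes (A, B, C) hold two 12-bit samples; a trailing partial
# group (fewer than 3 bytes) is ignored.
def fmt212_to_a(buf)
  buf.bytes.each_slice(3).select { |g| g.size == 3 }.collect do |a, b, c|
    x = ((b & 0x0F) << 8) | a
    y = ((b & 0xF0) << 4) | c
    [sign12(x), sign12(y)]
  end
end

if __FILE__ == $0
  ARGV.each do |path|
    File.open(path, 'rb') do |f|
      fmt212_to_a(f.read).each_with_index do |(x, y), i|
        # generate gnuplot-friendly output: index, lead 1, lead 2
        puts [i, x, y].join("\t")
      end
    end
  end
end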
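Since the script is throwaway, a quick round-trip check is handy. Below is a minimal sketch of a hypothetical inverse, `to_fmt212` (not part of the original script or of any WFDB library), that packs [x, y] pairs back into 3-byte groups. It assumes every sample is in -2048..2047 and repeats a sign-extending `fmt212_to_a` so the snippet runs on its own. Decoding the encoder's output should give the input back.

```ruby
# Hypothetical round-trip check; to_fmt212 is an illustrative inverse,
# not an official WFDB routine.

# Sign-extend a 12-bit two's complement value (0..4095) to -2048..2047.
def sign12(v)
  v >= 0x800 ? v - 0x1000 : v
end

# Decode: every 3 bytes (A, B, C) -> [x, y], ignoring a trailing partial group.
def fmt212_to_a(buf)
  buf.bytes.each_slice(3).select { |g| g.size == 3 }.map do |a, b, c|
    [sign12(((b & 0x0F) << 8) | a), sign12(((b & 0xF0) << 4) | c)]
  end
end

# Encode: each [x, y] pair with samples in -2048..2047 -> 3 bytes.
def to_fmt212(pairs)
  out = pairs.flat_map do |x, y|
    ux = x & 0xFFF                     # 12-bit two's complement pattern of x
    uy = y & 0xFFF                     # 12-bit two's complement pattern of y
    a = ux & 0xFF                      # A: low 8 bits of x
    b = (ux >> 8) | ((uy >> 8) << 4)   # B: x's high nibble low, y's high nibble high
    c = uy & 0xFF                      # C: low 8 bits of y
    [a, b, c]
  end
  out.pack('C*')
end

pairs = [[0, 0], [513, -253], [-2048, 2047], [-1, 1]]
encoded = to_fmt212(pairs)
p encoded.bytesize               # → 12
p fmt212_to_a(encoded) == pairs  # → true
```

Ruby's `&` treats negative integers as infinitely sign-extended two's complement, which is why `x & 0xFFF` yields the 12-bit bit pattern directly.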

Some equally throwaway gnuplot code to plot the script's output, redirected to signal.dat:

# plot entire file
plot 'signal.dat' using 1:2 title 'lead 1' with lines, 'signal.dat' using 1:3 title 'lead 2' with lines

# plot first 1024 data points of potentially huge files
plot '< head -n 1024 signal.dat' using 1:2 title 'lead 1' with lines, '< head -n 1024 signal.dat' using 1:3 title 'lead 2' with lines
