Representing binary data

2017-06-18 | Martin Hoppenheit | 6 min read

Each piece of digital data that is stored, processed or transmitted by a computer consists of nothing more than a sequence of bits. This applies to image files, to office documents, to executable files and also to plain text files. On a basic level, it’s all bits. This text explains how such bit sequences are commonly displayed in a human-readable form known as binary or hexadecimal notation.

What is a bit?

Conceptually, a bit is the smallest unit of information in the digital world. It can take exactly two different values, so we could imagine a bit as a tiny switch (like the one for the light in your living room) that is either on or off, and we could imagine a storage device as a large sequence of such tiny switches (like a whole wall in your living room covered by light switches). While a single switch can hold only a very limited amount of information (on or off), a sequence of switches forms a pattern of on/off states. If we agree on a meaning for this pattern, we can pass messages or store information just by flipping some switches in the hypothetical sequence on your living room wall. For example, three switches in the off position followed by one switch in the on position could mean “I’ve gone to buy some beer, see you later”.

Now the “agree on a meaning” part in the previous paragraph is crucial: A sequence of bits becomes meaningful only when interpreted correctly, where the term “correctly” is a matter of agreement, definition, convention or standardization. Depending on the interpretation context of a given bit sequence (e.g., the application software a file is opened with), the very same bit sequence could represent the letter A, the number 65, a green pixel or an instruction for the computer’s processor to add two numbers together.
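As a small illustration of this point, the following Python sketch takes the eight bits 01000001 and reads them once as a number and once as an ASCII character. The variable names are made up for this example, and ASCII is only one of many possible conventions.

```python
# One and the same bit sequence, stored as a single byte.
bits = "01000001"
raw = bytes([int(bits, 2)])

# Interpretation 1: an unsigned integer.
as_number = raw[0]

# Interpretation 2: a character, assuming the ASCII convention.
as_letter = raw.decode("ascii")

print(as_number)  # 65
print(as_letter)  # A
```

Nothing in the byte itself says which reading is the right one; that is decided by whoever (or whatever program) interprets it.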

Usually we are not interested in the low-level bit sequence that makes up a file but in its interpreted view as defined by a file format and displayed by a suitable viewer. That’s why our computers by default open image files in image viewers, text files in text editors and office documents in office suites – these programs know how to interpret bit sequences according to the expected file format.

Sometimes, however, we actually want to see the plain bit sequence that is hidden in a file. (If that has never happened to you, the remainder of this text will be a bit boring.) These are the moments when we fire up hex editors to display a file’s raw bit content. And that leads us to the actual topic of this text: the representation of bit sequences by binary or hexadecimal numbers, as in hex editors.

Binary notation

The most straightforward way to represent bit sequences in a human-readable way is to write the two possible values of a bit using two distinct symbols. We could use little pictures of on and off light switches or red and blue dots, but more common are the symbols 1 and 0. This is called binary notation because it uses two symbols. A short bit sequence of eight bits in binary looks like 01000001, and the first few bits of a TIFF file look something like this, with each digit representing one bit:

01001001 01001001 00101010 00000000 00001000 00000000 00000000 00000000
00010111 00000000 11111110 00000000 00000100 00000000 00000001 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001
00000011 00000000 00000001 00000000 00000000 00000000 10011100 00001111
00000000 00000000 00000001 00000001 00000011 00000000 00000001 00000000
00000000 00000000 10110110 00001010 00000000 00000000 00000010 00000001
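A listing like the one above can be produced with a few lines of Python. This is only a sketch: the function name to_binary and the choice of eight groups per line are assumptions made for this example, and the sample bytes are simply the first eight bytes of the listing.

```python
def to_binary(data: bytes, per_line: int = 8) -> str:
    """Render each byte as eight 0/1 symbols, several bytes per line."""
    groups = [format(byte, "08b") for byte in data]
    lines = [
        " ".join(groups[i:i + per_line])
        for i in range(0, len(groups), per_line)
    ]
    return "\n".join(lines)


# The first eight bytes of the example above, written as binary literals.
header = bytes([
    0b01001001, 0b01001001, 0b00101010, 0b00000000,
    0b00001000, 0b00000000, 0b00000000, 0b00000000,
])

print(to_binary(header))
# 01001001 01001001 00101010 00000000 00001000 00000000 00000000 00000000
```

To look at a real file, the bytes could be read with open(path, "rb").read() instead of being typed in by hand.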

Hexadecimal notation

Obviously though, a sequence of 1 and 0 symbols like this is a little hard on the eyes and quite error-prone, even with the added whitespace. For that reason, longer bit sequences are rarely written in binary; they are usually written in hexadecimal notation (or hex for short). The same first few bits of a TIFF file look like this in hex:

49 49 2A 00 08 00 00 00 17 00 FE 00 04 00 01 00 00 00 00 00 00 00 00 01
03 00 01 00 00 00 9C 0F 00 00 01 01 03 00 01 00 00 00 B6 0A 00 00 02 01
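To check that both listings really describe the same bits, the first line of the binary listing can be converted to hex in Python. This is a minimal sketch, and the variable names are chosen for this example only.

```python
# First line of the binary listing, one group of eight bits per byte.
binary_line = (
    "01001001 01001001 00101010 00000000 "
    "00001000 00000000 00000000 00000000"
)

# Turn each group of eight bits into one byte ...
data = bytes(int(group, 2) for group in binary_line.split())

# ... and write each byte as two uppercase hex symbols.
hex_line = " ".join(format(byte, "02X") for byte in data)

print(hex_line)  # 49 49 2A 00 08 00 00 00
```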

In hexadecimal notation, each group of four consecutive bits is represented by one symbol from the set 0–9, A–F according to the table below. This may seem less intuitive than binary with its “one bit, one symbol” approach at first, but don’t worry. Since each hex symbol unambiguously corresponds to a sequence of four binary symbols (or four bits), it’s in fact very easy to convert between hex and binary notation using this table. Take the byte 3F in hex notation as an example (a byte is a sequence of eight bits): According to the table, the hex symbol 3 corresponds to the binary sequence 0011, and the hex symbol F corresponds to the binary sequence 1111, resulting in the binary sequence 00111111. This works just as well for conversion in the other direction, from binary to hexadecimal.

The fact that the intuitive binary notation and the concise hexadecimal notation are so easily interchangeable is the reason for the widespread use of hex symbols for the human-readable representation of bit sequences.

Hexadecimal and binary notation
Hexadecimal Binary
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
A 1010
B 1011
C 1100
D 1101
E 1110
F 1111
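
The table lookup described above can be sketched in Python as follows. The names HEX_TO_BIN, BIN_TO_HEX, hex_to_binary and binary_to_hex are invented for this example; the two dictionaries simply reproduce the table.

```python
# The table above as a dictionary: "0" -> "0000", ..., "F" -> "1111".
HEX_TO_BIN = {format(n, "X"): format(n, "04b") for n in range(16)}
# The same table read from right to left.
BIN_TO_HEX = {bits: symbol for symbol, bits in HEX_TO_BIN.items()}


def hex_to_binary(hex_text: str) -> str:
    """Replace every hex symbol by its group of four bits."""
    return "".join(HEX_TO_BIN[symbol] for symbol in hex_text.upper())


def binary_to_hex(bit_text: str) -> str:
    """Replace every group of four bits by its hex symbol."""
    if len(bit_text) % 4 != 0:
        raise ValueError("number of bits must be a multiple of four")
    groups = [bit_text[i:i + 4] for i in range(0, len(bit_text), 4)]
    return "".join(BIN_TO_HEX[group] for group in groups)


print(hex_to_binary("3F"))        # 00111111
print(binary_to_hex("00111111"))  # 3F
```

Note that no arithmetic is involved here; the conversion is a symbol-by-symbol substitution in both directions.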

Side note: numbers or bit patterns?

The binary and hexadecimal systems are more than just ways to represent bit patterns; they are positional notation methods for encoding numbers. This sounds fancy, but it’s just a way to write numbers. Usually, we work with numbers that are based on the digits 0–9; that’s called decimal notation. The binary and hexadecimal systems share the same structure (the positional notation) but are not based on ten, but on two or sixteen digits, respectively. That’s quite interesting but nothing you need to worry about if you use binary and hex just to represent bit patterns; in this context, they are just unambiguous mappings between symbols and bit patterns.
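
For the curious, here is a short Python sketch of what positional notation means. The function positional_value is a made-up helper for this example; Python's built-in int(text, base) does the same job.

```python
SYMBOLS = "0123456789ABCDEF"


def positional_value(digits: str, base: int) -> int:
    """Read a digit string as a number in the given base (2 to 16)."""
    value = 0
    for symbol in digits.upper():
        digit = SYMBOLS.index(symbol)  # raises ValueError for unknown symbols
        if digit >= base:
            raise ValueError(f"{symbol!r} is not a valid digit in base {base}")
        # Shift what we have so far one position to the left, then add the digit.
        value = value * base + digit
    return value


print(positional_value("63", 10))       # 63
print(positional_value("3F", 16))       # 63, i.e. 3 * 16 + 15
print(positional_value("00111111", 2))  # 63, i.e. 32 + 16 + 8 + 4 + 2 + 1
```

All three strings denote the same number; only the base, and with it the weight of each position, differs.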

Side note: binary vs. text files

People often distinguish between binary and (plain) text file formats, claiming that text files are more or less human-readable because they can be viewed and modified in any text editor while binary files are not. This distinction is very helpful in many situations; don’t be mistaken though! On the binary level that we have just discussed, text files and binary files look exactly the same; both are just sequences of bits.

The actual difference between text and binary files on the bit level is that plain text files can be read by applying a well-known and rather simple interpretation of bit patterns as characters (such as ASCII or UTF-8) – that’s why they can be opened with any text editor. Many binary files, on the other hand, contain bit patterns that are more complex, more difficult to interpret or not publicly documented, and they often require software that is written exclusively for a specific (binary) file format.
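
The following Python sketch illustrates the difference by trying to read two byte sequences as UTF-8 text. The helper decode_as_utf8 is invented for this example, the first sequence is the word "Hello" in ASCII, and the second consists of the first twelve bytes of the TIFF example from above.

```python
from typing import Optional


def decode_as_utf8(data: bytes) -> Optional[str]:
    """Return the bytes as text, or None if they are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


text_bytes = bytes.fromhex("48 65 6C 6C 6F")
tiff_bytes = bytes.fromhex("49 49 2A 00 08 00 00 00 17 00 FE 00")

print(decode_as_utf8(text_bytes))  # Hello
print(decode_as_utf8(tiff_bytes))  # None (the byte FE never occurs in UTF-8)
```

A failed decoding is a strong hint that a file is not plain text, but the reverse does not hold: a binary file may happen to consist only of bytes that are valid UTF-8.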