5. Protocol Analysis
Protocol analysis is the process of capturing network traffic (with sniffing
programs) and looking at it closely in order to figure out what is going on.
As data is
sent across a wire, it is "packetized", meaning broken down into multiple
packets that are each sent individually across the network, then reassembled back on the
other side. For example, you probably downloaded this document from the network. Since
this document is around 45,000 bytes and the typical packet size is 1,500 bytes, it took
about 30 packets to deliver this document to you.
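The arithmetic above can be sketched in a few lines of Python. This is only an illustration: it assumes a flat 1,500-byte chunk size and ignores the header overhead that every real packet carries, and the name `packetize` is made up for this example.

```python
def packetize(data, size=1500):
    """Split data into chunks of at most `size` bytes (headers ignored)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

document = bytes(45000)          # stand-in for a 45,000-byte document
packets = packetize(document)
print(len(packets))              # number of chunks needed
reassembled = b"".join(packets)  # the receiver glues them back together
print(reassembled == document)
```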
Below is a sample packet. This packet was taken from a packet sniffer that watched my
workstation download this FAQ from my website. This packet was originally 1514 bytes long,
but only the first 512 bytes are shown here:
000 00 00 BA 5E BA 11 00 A0 C9 B0 5E BD 08 00 45 00 ...^......^...E.
010 05 DC 1D E4 40 00 7F 06 C2 6D 0A 00 00 02 0A 00 ....@....m......
020 01 C9 00 50 07 75 05 D0 00 C0 04 AE 7D F5 50 10 ...P.u......}.P.
030 70 79 8F 27 00 00 48 54 54 50 2F 31 2E 31 20 32 py.'..HTTP/1.1.2
040 30 30 20 4F 4B 0D 0A 56 69 61 3A 20 31 2E 30 20 00.OK..Via:.1.0.
050 53 54 52 49 44 45 52 0D 0A 50 72 6F 78 79 2D 43 STRIDER..Proxy-C
060 6F 6E 6E 65 63 74 69 6F 6E 3A 20 4B 65 65 70 2D onnection:.Keep-
070 41 6C 69 76 65 0D 0A 43 6F 6E 74 65 6E 74 2D 4C Alive..Content-L
080 65 6E 67 74 68 3A 20 32 39 36 37 34 0D 0A 43 6F ength:.29674..Co
090 6E 74 65 6E 74 2D 54 79 70 65 3A 20 74 65 78 74 ntent-Type:.text
0A0 2F 68 74 6D 6C 0D 0A 53 65 72 76 65 72 3A 20 4D /html..Server:.M
0B0 69 63 72 6F 73 6F 66 74 2D 49 49 53 2F 34 2E 30 icrosoft-IIS/4.0
0C0 0D 0A 44 61 74 65 3A 20 53 75 6E 2C 20 32 35 20 ..Date:.Sun,.25.
0D0 4A 75 6C 20 31 39 39 39 20 32 31 3A 34 35 3A 35 Jul.1999.21:45:5
0E0 31 20 47 4D 54 0D 0A 41 63 63 65 70 74 2D 52 61 1.GMT..Accept-Ra
0F0 6E 67 65 73 3A 20 62 79 74 65 73 0D 0A 4C 61 73 nges:.bytes..Las
100 74 2D 4D 6F 64 69 66 69 65 64 3A 20 4D 6F 6E 2C t-Modified:.Mon,
110 20 31 39 20 4A 75 6C 20 31 39 39 39 20 30 37 3A .19.Jul.1999.07:
120 33 39 3A 32 36 20 47 4D 54 0D 0A 45 54 61 67 3A 39:26.GMT..ETag:
130 20 22 30 38 62 37 38 64 33 62 39 64 31 62 65 31 ."08b78d3b9d1be1
140 3A 61 34 61 22 0D 0A 0D 0A 3C 74 69 74 6C 65 3E :a4a"....<title>
150 53 6E 69 66 66 69 6E 67 20 28 6E 65 74 77 6F 72 Sniffing.(networ
160 6B 20 77 69 72 65 74 61 70 2C 20 73 6E 69 66 66 k.wiretap,.sniff
170 65 72 29 20 46 41 51 3C 2F 74 69 74 6C 65 3E 0D er).FAQ</title>.
180 0A 0D 0A 3C 68 31 3E 53 6E 69 66 66 69 6E 67 20 ...<h1>Sniffing.
190 28 6E 65 74 77 6F 72 6B 20 77 69 72 65 74 61 70 (network.wiretap
1A0 2C 20 73 6E 69 66 66 65 72 29 20 46 41 51 3C 2F ,.sniffer).FAQ</
1B0 68 31 3E 0D 0A 0D 0A 54 68 69 73 20 64 6F 63 75 h1>....This.docu
1C0 6D 65 6E 74 20 61 6E 73 77 65 72 73 20 71 75 65 ment.answers.que
1D0 73 74 69 6F 6E 73 20 61 62 6F 75 74 20 74 61 70 stions.about.tap
1E0 70 69 6E 67 20 69 6E 74 6F 20 0D 0A 63 6F 6D 70 ping.into...comp
1F0 75 74 65 72 20 6E 65 74 77 6F 72 6B 73 20 61 6E uter.networks.an
This is the standard "hexdump" representation of a network packet, before being decoded.
A hexdump has three columns: the offset of each line, the hexadecimal data, and the ASCII
equivalent. This packet contains a 14-byte Ethernet header, a 20-byte IP header, a 20-byte
TCP header, an HTTP header ending in two carriage-return/line-feed pairs (0D 0A 0D 0A),
and then the data.
The reason both hex and ASCII are shown is that sometimes one is easier to read than
the other. For example, at the top of the packet, the ASCII looks like garbage, but the
hex is readable, from which you can tell, for example, that my MAC address is
00-00-BA-5E-BA-11 (i.e. "BASEBAll").
A "protocol analyzer" will then take this hexdump and interpret the individual fields:
ETHER: Destination address : 0000BA5EBA11
ETHER: Source address : 00A0C9B05EBD
ETHER: Frame Length : 1514 (0x05EA)
ETHER: Ethernet Type : 0x0800 (IP)
IP: Version = 4 (0x4)
IP: Header Length = 20 (0x14)
IP: Service Type = 0 (0x0)
IP: Precedence = Routine
IP: ...0.... = Normal Delay
IP: ....0... = Normal Throughput
IP: .....0.. = Normal Reliability
IP: Total Length = 1500 (0x5DC)
IP: Identification = 7652 (0x1DE4)
IP: Flags Summary = 2 (0x2)
IP: .......0 = Last fragment in datagram
IP: ......1. = Cannot fragment datagram
IP: Fragment Offset = 0 (0x0) bytes
IP: Time to Live = 127 (0x7F)
IP: Protocol = TCP - Transmission Control
IP: Checksum = 0xC26D
IP: Source Address = 10.0.0.2
IP: Destination Address = 10.0.1.201
TCP: Source Port = Hypertext Transfer Protocol
TCP: Destination Port = 0x0775
TCP: Sequence Number = 97517760 (0x5D000C0)
TCP: Acknowledgement Number = 78544373 (0x4AE7DF5)
TCP: Data Offset = 20 (0x14)
TCP: Reserved = 0 (0x0000)
TCP: Flags = 0x10 : .A....
TCP: ..0..... = No urgent data
TCP: ...1.... = Acknowledgement field significant
TCP: ....0... = No Push function
TCP: .....0.. = No Reset
TCP: ......0. = No Synchronize
TCP: .......0 = No Fin
TCP: Window = 28793 (0x7079)
TCP: Checksum = 0x8F27
TCP: Urgent Pointer = 0 (0x0)
HTTP: Response (to client using port 1909)
HTTP: Protocol Version = HTTP/1.1
HTTP: Status Code = OK
HTTP: Reason = OK
In the above hexdump and decode, note the "Time to Live" field of 0x7F (the byte at
offset 0x016 in the hexdump). This is how a protocol decode works: it pulls each of the
fields out of the packet and attempts to explain what the numbers mean. Some fields are as
small as a single bit; others span many bytes.
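To make the idea of a decode concrete, here is a minimal sketch in Python that pulls a handful of fields out of the first 54 bytes of the sample packet. It assumes a plain Ethernet frame carrying IPv4 and TCP with no TCP options, which is true for this packet but not in general; the names `HEADERS`, `dotted`, and `decode` are made up for this example.

```python
import struct

# The first 54 bytes of the sample packet, copied from the hexdump above.
HEADERS = bytes.fromhex(
    "0000BA5EBA11" "00A0C9B05EBD" "0800"        # Ethernet header (14 bytes)
    "4500" "05DC" "1DE4" "4000" "7F06" "C26D"   # IP header (20 bytes)
    "0A000002" "0A0001C9"
    "0050" "0775" "05D000C0" "04AE7DF5"         # TCP header (20 bytes)
    "5010" "7079" "8F27" "0000"
)

def dotted(addr):
    """Format 4 raw bytes as a dotted-decimal IP address."""
    return ".".join(str(b) for b in addr)

def decode(frame):
    """Pull a few fields out of an Ethernet/IPv4/TCP frame."""
    _dst_mac, _src_mac, ether_type = struct.unpack("!6s6sH", frame[:14])
    (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _cksum,
     src_ip, dst_ip) = struct.unpack("!BBHHHBBH4s4s", frame[14:34])
    ip_header_len = (ver_ihl & 0x0F) * 4    # low nibble counts 32-bit words
    tcp_start = 14 + ip_header_len
    (src_port, dst_port, _seq, _ack, _offset, flags,
     _window, _tcp_cksum, _urgent) = struct.unpack(
        "!HHIIBBHHH", frame[tcp_start:tcp_start + 20])
    return {
        "ether_type": ether_type,
        "ip_version": ver_ihl >> 4,
        "ip_total_length": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src_ip": dotted(src_ip),
        "dst_ip": dotted(dst_ip),
        "src_port": src_port,
        "dst_port": dst_port,
        "tcp_flags": flags,
    }

fields = decode(HEADERS)
for name, value in fields.items():
    print(f"{name:16} = {value}")
```

The values it prints should agree with the decode listing: a real protocol analyzer does the same thing for every field, and then adds the human-readable explanations.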
Protocol analysis really is a difficult art, and requires a lot of knowledge
about protocols in order to do it well. However, the rewards are that a lot of information
can be easily gleaned from protocols. This info can be useful to network managers trying
to debug problems, or hackers who are trying to break into computers.
All data within a computer is represented as numbers. Hexadecimal (or simply
"hex") is a better numbering system for viewing this data than the
"decimal" numbers everyone is already familiar with. Hexadecimal is one of those
computer science concepts that is difficult to understand, until the "aha"
moment when you finally understand it. After that point, it becomes second nature.
Everybody has a different path to that "aha" moment, so don't feel bad if you
don't understand the following discussion. However, you must eventually understand
hexadecimal, so you really should look it up on the web.
The word "decimal"
has the root "dec", meaning "10". This means that there are 10 digits
in this numbering system:
0 1 2 3 4 5 6 7 8 9
The word "hexadecimal" has the roots "hex" meaning 6 and
"dec" meaning 10; add them together and you get 16. This means there are sixteen
digits in this numbering system:
0 1 2 3 4 5 6 7 8 9 A B C D E F
This is useful because all data is stored in a computer as "bits"
(binary digits, meaning two digits: 0 1), and bits are grouped into 8-bit units known
as "bytes" or "octets", which in theory have 256 digits. Bits are too
small to view data, because all we would see is a stream like
00101010101000010101010110101101101011110110, which is unreadable. Similarly, using 256
digits would be impossible: who can memorize that many different digits? Hexadecimal
breaks a "byte" down into two 4-bit "nibbles", each of which has 16 combinations
(256 = 16*16). This allows us to represent each byte as two hexadecimal digits.
Hexadecimal allows technical people to visualize the underlying binary data. A
technical person has the following table memorized:
0000 = 0 0001 = 1 0010 = 2 0011 = 3
0100 = 4 0101 = 5 0110 = 6 0111 = 7
1000 = 8 1001 = 9 1010 = A 1011 = B
1100 = C 1101 = D 1110 = E 1111 = F
In other words, when you encounter the hexadecimal digit "B", you should
immediately visualize the bit pattern "1011" in your head. It is much like
memorizing multiplication tables as a kid: memorizing this table will serve much the same
purpose.
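If you want to check your memory of the table, a short Python sketch can rebuild it and split any byte into its two nibbles. The helper name `nibbles` is made up for this example.

```python
def nibbles(byte):
    """Return the high and low 4-bit halves of a byte as bit strings."""
    return format(byte >> 4, "04b"), format(byte & 0x0F, "04b")

# Rebuild the table above: every 4-bit pattern and its hex digit.
table = {format(n, "04b"): format(n, "X") for n in range(16)}

high, low = nibbles(0x7F)
print(high, low)                  # the two bit patterns behind "7F"
print(table[high] + table[low])   # back to the two hex digits
```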
Hexadecimal is often preceded by one or more special characters. For example, when you
see the number "12", is this "twelve" (decimal) or "eighteen"
(hexadecimal)? If it is hex, it is often written as either "0x12",
"x12", or "$12". The first is the preferred version, since that is
how many programming languages represent it. Naturally, this isn't needed for hex dumps
because the fact we are showing hex is pretty much assumed.
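Python happens to be one of the languages that uses the "0x" prefix, so the ambiguity is easy to demonstrate. The variable names here are just for illustration.

```python
as_decimal = int("12")        # read the digits "12" as decimal
as_hex = int("12", 16)        # read the same digits as hexadecimal
print(as_decimal, as_hex)
print(0x12 == as_hex)         # the 0x prefix marks a hex literal in Python
print(hex(as_hex))            # Python writes hex back out with the same prefix
```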
5.3 What is ASCII?
Computers represent everything as numbers. This means the text you are reading right
now is represented as numbers within the computer. ASCII is just one such representation.
In ASCII, the letter 'A' is represented by the number 65, or in hex, 0x41. The letter
'B' is represented by the number 66/0x42. And the process continues for all
characters, numbers, punctuation, and so forth.
If you look at the normal (U.S.
English) keyboard you will count 32 punctuation characters, 10 decimal digits, 26 letters,
and 26 more letters when you take into account UPPER/lower case. This comes to 94
different characters. In binary, you need 7-bits to represent that number of combinations.
This maps nicely onto the standard 8-bit bytes used in computers, with room left over.
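The count is easy to verify with Python's standard `string` module, which lists the same groups of characters. This is just a quick check of the numbers above, not anything specific to sniffing.

```python
import string

visible = string.punctuation + string.digits + string.ascii_letters
print(len(string.punctuation), len(string.digits), len(string.ascii_letters))
print(len(visible))                      # total visible characters
print(ord("A"), hex(ord("A")))           # the number behind the letter 'A'
print(2 ** 7 >= len(visible) > 2 ** 6)   # 6 bits are too few, 7 are enough
```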
In hex dumps, note that the ASCII column contains lots of periods. A byte has 256
combinations, but we can only view 94 of them. Any character that is not one of these 94
visible characters is shown as a period.
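Here is a minimal sketch of how such a hex dump can be produced. It assumes the same layout as the sample packet earlier (three-digit hex offset, 16 bytes per row, spaces shown as periods); the function name `hexdump` is made up for this example and real sniffers differ in the details.

```python
def hexdump(data, width=16):
    """Format bytes as offset, hex, and ASCII columns, `width` bytes per row."""
    hex_width = width * 3 - 1     # two hex digits per byte plus separators
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in row)
        # Bytes 0x21..0x7E are the 94 visible characters; all else becomes '.'.
        ascii_part = "".join(chr(b) if 0x21 <= b <= 0x7E else "." for b in row)
        lines.append(f"{offset:03X} {hex_part:<{hex_width}} {ascii_part}")
    return "\n".join(lines)

dump = hexdump(b"HTTP/1.1 200 OK\r\nVia: 1.0 STRIDER\r\n")
print(dump)
```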
5.4 What is the "OSI 7-Layer Model"?
Any discussion of protocol analysis usually starts with the OSI model. Before
the OSI Model, networking was generally "monolithic". In other words, the
application that displayed the data on your screen was also responsible for the hardware
that moved the bits across the wire. You couldn't change either the software or the
hardware without upgrading the entire system. Imagine having to buy a new computer simply to
upgrade the software!
The concept behind the OSI model is to separate the functionality into different
conceptual modules. As a quick introduction to this, consider the following 3-layer model
that most consumers are familiar with:
||Web browser, e-mail, RealAudio
||Dial-up modem, Cable modem, DSL, Ethernet
Conceptually, this can be viewed in the following diagram:
| Computer |
| +-------+ |
| | Web | | ____
| |Browser| | __/ \__
| +----++-+ | / Internet\
| || | | cloud \
| +----||-+ | +--------+ | +------+
| | TCP \\| +-----+ Link | Router | \ | Web |
| | IP \+=+ NIC +===//==+ +=====//======+====+ Site |
| | | +-----+ | | / | |
| +-------+ | +--------+ | +------+
| | | /
+-------------+ \__ __/
In this conceptual representation, the user's "Web Browser" application is
trying to view a web-page located on a "Web Site" located out on the web
somewhere. The "Web Browser" passes the request down to the "TCP/IP" stack,
which sends it out the "NIC" across the local "Link" to the nearest
"Router" gateway. At this point, the client doesn't really know what is going to
happen to the data. Presumably, it is passed from router-to-router through the Internet
"cloud" until it reaches the destination "Server" hosting the web site.
The important point to learn from all this is the concept of abstraction. Each
component of this diagram does not know anything about the other components. For example,
consider the postman that delivers your mail (physical mail, not e-mail). The postman has
no knowledge of the contents of your letters. S/he simply moves the mail between the local
post-office and your mail box. In much the same way, the IP layer within the machine has
no knowledge of the contents of the packets. Its only responsibility is to accept packets
from the TCP layer, and send them out the NIC toward the local Router. The IP layer is even
fuzzy on the exact details of how the NIC transports the packets to the local router, and
is completely clueless as to what happens to the packets after that point.
In other words, each layer has a single job to do, and doesn't know anything about
what is going on in the other layers.
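The abstraction can be sketched in code. In this toy Python example each layer simply puts its own header in front of whatever the layer above handed it, without ever looking inside. The header formats are invented text strings for readability, not the real binary headers, and the function names are hypothetical.

```python
def tcp_layer(payload, src_port, dst_port):
    """Prepend a toy TCP header; the payload is treated as opaque bytes."""
    return f"TCP {src_port}>{dst_port}|".encode() + payload

def ip_layer(segment, src_ip, dst_ip):
    """Prepend a toy IP header; IP never looks inside the segment."""
    return f"IP {src_ip}>{dst_ip}|".encode() + segment

def link_layer(packet, next_hop):
    """Prepend a toy link header naming only the next hop."""
    return f"LINK to {next_hop}|".encode() + packet

request = b"GET /index.html"
segment = tcp_layer(request, 1909, 80)
packet = ip_layer(segment, "10.0.1.201", "10.0.0.2")
frame = link_layer(packet, "router")
print(frame)
```

Notice that `ip_layer` would work just as well if the bytes it was given had nothing to do with TCP or the web: that is the whole point of layering.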
This is difficult for humans to understand, because we can see what the entire process
is trying to accomplish. It is difficult for us to constrain our view to just a single
layer.
Note that in the above diagram, there is lots of stuff toward the left of the diagram,
but not much detail toward the right. This is because the user really has no idea how
packets are really routed on the Internet, nor does the user really know much about the
web site. In fact, the web site may consist of multiple computers for load-balancing
purposes, or conversely may consist of a single computer hosting many web sites.
In order to understand the OSI Model, you must first understand the political backdrop
behind its creation. In the late 1970s, computer networking was dominated by large,
proprietary systems. Once you bought a product from a single vendor, you could never buy
products from other vendors that would work with it. You were "locked-in", and
the vendor was free to charge whatever they wanted. Therefore, the OSI working group (OSI
= Open Systems Interconnection) as part of the ISO (International Organization for
Standardization) was created in order to standardize network protocols. In theory, if
everyone conformed to standards, then consumers could buy products from different vendors
at lower prices and save lots of money.
However, the OSI/ISO standardization process is fundamentally dysfunctional. For
example, what does the acronym I-S-O stand for? In English, this stands for
"International Organization for Standardization" or IOS. In French, it stands
for "Organisation internationale de normalisation", or OIN.
(This can be seen on their homepage at http://www.iso.ch/).
English and French are the two official languages of ISO, and acronyms are chosen so that
they match neither the English nor the French terms they refer to. Generally, standards that
start within the ISO follow the same logic: in an effort to appease everybody, they end up
The OSI Model was a blueprint for an entire protocol suite that would implement the
individual layers. They actually succeeded in generating this standard, but it never
achieved popular use and has largely been supplanted by TCP/IP. While large organizations
(government, industrial), mostly in Europe, have attempted to use it, it has largely
become a boat anchor.
The following is a description of the 7-Layers within the model, and how they map onto
the TCP/IP suite that we are familiar with.
||As far as...
Application layer (7):
This layer doesn't mean the application itself, but the protocols that do the work for
the application. Examples: HTTP for web browsers, SMTP/POP/IMAP for e-mail,
Presentation layer (6):
This layer is an example of the political processes mentioned above. The theory was
that the application layer didn't need to format the data for transmission across the
wire; that would be the job of an underlying layer. Furthermore, the idea was that a
client and server would negotiate which format they wanted to use.
This actually made
sense back in the late 1970s, because most networking involved dumb, character-mode
terminals controlled by mainframes. An application wanted to deal with abstract concepts
like database forms that a user filled out. Different terminals have different control
codes to display this type of information. However, it makes virtually no sense nowadays.
As a result, OSI protocols always negotiate a Presentation encoding, even though only one
option is available.
In TCP/IP, the only protocol that really does this negotiation is Telnet. Character-mode
applications such as 'vi' go through a package called "curses". When
you Telnet to a host, you exchange your terminal type with the host. When an
application such as 'vi' wishes to clear the screen, the 'curses' package tells it how to
do so for your particular terminal.
However, the concept of how data is formatted on the wire is extremely important to
applications. So even though it doesn't exist as a real layer outside of OSI protocols, it
is still an important component of all applications.
Session layer (5):
Like the Presentation layer, the Session layer is an artifact of the ISO political
processes, but more so. In fact, as it turns out, the Session layer is completely useless.
If you pick up a book on OSI and read up on the Session layer, you will end up reading
lots of technical content about it but you still won't be able to answer the question:
What is the session layer for? If you ask somebody what the Session layer does, they will
tell you something like "It establishes a session between two entities". If you
ask what a "session" is, the answer would be "It's what the Session layer
establishes". If they give you a more detailed answer, they are probably confused and
will really be describing a Transport layer "connection", which isn't the same
thing.
There is a Session layer protocol defined by OSI, but the OSI hasn't defined any
uses for it (that I know of). In other words, all the OSI applications establish both
Transport layer "connections" and Session layer "sessions" when they
start talking to each other, then tear down the connections/sessions together when they
stop talking. Luckily, the OSI protocol has been defined in a fairly efficient manner such
that both can be established in the same packet, and the Session layer only adds a couple
bytes on average to every packet sent.
Now I could actually tell you what the Session layer really is for, but I'm not going
to. (Hint: it has to do with terminals talking to mainframes.) You should simply remember
the concept that it really doesn't do anything. The 7-Layer model really just contains
6 layers (actually, 5 layers, because the Presentation layer should be thrown out as
well).
Transport layer (4):
The Transport layer is TCP (and UDP).
Like the discussions above, the official OSI
Transport layer has lots of worthless features attached to it. However, I'm going to
pretend it just describes how TCP implements this layer. Furthermore, in order to
understand this layer, you need to understand the layer below (IP), so I explain both in
the next section.
Network layer (3):
The Network layer is IP. Everything on the Internet pretty much focuses on the IP
protocol. Even though there are 7 layers in the OSI model, the "first" layer is
layer #3, the Network layer.
Like the discussions above, the OSI has specified lots of
useless features for the Network layer. Each layer was given to its own group to design,
so each group designed into their layers pretty much all the features of the rest of the
stack. Therefore, it is difficult to point to any particular networking feature and say
"it belongs to layer #X". Luckily, TCP/IP is much more clean that way.
Therefore, I will follow the standard practice of describing only the features that the
TCP/IP protocol implements rather than all possible features.
The IP protocol is designed around the concept of an "unreliable datagram".
Its one purpose is to get a packet of data from one machine to the destination machine all
the way across the Internet. In order to do this, each machine is given an IP address:
e.g. 192.0.2.14. The originating machine puts that address into a packet, then sends it to
the nearest router. The router then looks at the IP address, and decides which direction
it goes, and forwards it to the next router in that direction. The packet travels from
hop-to-hop through the Internet until it reaches its destination. The whole process takes,
on average, about a tenth of a second.
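The decision each router makes can be sketched as a table lookup. The routing table below is entirely hypothetical, and the sketch assumes the usual rule that the most specific matching network wins (so the catch-all 0.0.0.0/0 entry is used only when nothing better matches). It uses Python's standard `ipaddress` module.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream router"),
    (ipaddress.ip_network("192.0.2.0/24"), "router B"),
    (ipaddress.ip_network("10.0.0.0/8"), "local link"),
]

def next_hop(destination):
    """Pick the most specific route that contains the destination address."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if address in net]
    _net, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop

for destination in ("192.0.2.14", "10.0.1.201", "198.51.100.7"):
    print(destination, "->", next_hop(destination))
```

Each router along the path repeats this lookup with its own table; none of them needs to know the whole route.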
You can visualize this as being just like normal mail. You write a letter, wrap it in
an envelope, then address it. You stick the envelope in the mailbox and somehow it
disappears, gets routed from hop-to-hop through the postal network, and eventually ends up
at its destination.
The important concept to remember is "unreliable": letters get lost in the
mail. Roughly 1 out of every 100 IP datagrams gets lost on the Internet, sometimes more,
sometimes less. This is where TCP (from the Transport layer above) steps in: it keeps
track of all the datagrams going back and forth, and if one is lost, it automatically
retransmits it.
In order for TCP to provide this reliability, it must create a "connection".
In other words, before two programs can talk to each other across the Internet, they must
first establish a TCP connection through a process known as a "handshake".
Whenever one machine sends data to the other, the receiver must send back an
acknowledgement so that the sender knows it arrived.
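The acknowledge-or-resend idea can be illustrated with a toy simulation. This is a deliberately simplified "stop-and-wait" sketch: real TCP keeps many packets in flight at once and uses timers to decide when something is lost. The function name and the scripted list of losses are invented for this example so that the run is repeatable.

```python
def send_reliably(chunks, lost_attempts):
    """Stop-and-wait sketch: resend each chunk until it is acknowledged.

    `lost_attempts` is a set of (chunk_index, attempt) pairs that the
    pretend network drops.
    """
    received, log = [], []
    for index, chunk in enumerate(chunks):
        attempt = 0
        while True:
            attempt += 1
            if (index, attempt) in lost_attempts:
                log.append(f"chunk {index} attempt {attempt}: lost, resending")
                continue            # no acknowledgement arrived; try again
            received.append(chunk)  # receiver got it and acknowledges
            log.append(f"chunk {index} attempt {attempt}: acknowledged")
            break
    return received, log

received, log = send_reliably([b"one", b"two", b"three"], {(1, 1), (1, 2)})
print("\n".join(log))
```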
By contrast, IP itself requires no connection; it is "connectionless". In fact,
many protocols that don't want the complexity of TCP choose to bypass it by using UDP on top
of IP: UDP is essentially the same as TCP, but doesn't acknowledge packets and doesn't
require a connection. A good example of this usage is an application like IP Phone: it
doesn't matter if a bit of data is lost here and there (cell phones do it all the time).
Most of the time you won't notice a single lost packet, and if multiple packets are lost
in succession, you can simply ask the other person to repeat what they said. In fact, TCP
is very bad for this application: whenever a packet is lost, transmission halts until both
sides are caught up again. In an IP Phone conversation, this means you might hear a pause,
then the other person's delayed comments that need to be replayed very fast in order to
catch up. For this reason, "real time" applications like audio, video, and games
prefer UDP over TCP.
Finally, the IP address will get your data as far as the destination machine, but many
programs on that machine may be waiting for data to arrive. How does that machine know
which application the data belongs to? The Transport layer protocols (TCP and UDP) contain
their own unique addresses for each program on that machine. Each program is bound to a
different "port" number. For example, imagine a machine with two web services running on
it. You get to the different web services by specifying different ports in the URL. The
URL http://www.example.com:80/index.html will get to the web service running at port 80,
and http://www.example.com:81/index.html will get to the web service running at port 81.
Thus, these URLs will return different web pages, depending on which service responded.
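You can see how a URL splits into a machine and a port with Python's standard `urllib.parse` module. This only parses the text of the URL; it does not contact anything.

```python
from urllib.parse import urlsplit

for url in ("http://www.example.com:80/index.html",
            "http://www.example.com:81/index.html"):
    parts = urlsplit(url)
    # Same host (found via its IP address), different port (different program).
    print(parts.hostname, parts.port, parts.path)
```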
Data Link layer (2):
The most important concept to remember about the Data Link layer is "next
hop". Its only purpose is to connect two machines together: your machine and the
nearest router, or two routers. This is the "Link" component in the diagram
above.
In other words, on an Ethernet wire, your machine wraps the IP packet with
Ethernet information and sends it to the first router. That router then strips off the
Ethernet header and forgets about it. The router then decides which direction to forward
the packet, then wraps it with the Data Link framing information to go across that wire to
the next router.
A machine might have both an Ethernet "MAC" address and an IP address. The
Ethernet is the Data Link layer, and the MAC address is only visible locally, and is used
by the local router in order to figure out how to send incoming traffic to you (vs.
anybody else sharing the same Ethernet wire). Conversely, the IP address is global. If
somebody in Siberia sends you traffic, they will use your IP address.
Physical layer (1):
The Physical layer simply gets the bits out onto the wire. Different wires require
different ways of encoding the bits. A telephone modem, for example, converts the bits
into sound patterns that go across the telephone wire. Ethernet, on the other hand,
converts the bits into a series of high/low voltage levels.
People often have trouble understanding all these concepts. I like to summarize as
follows:
- The Physical layer (1) sends bits onto the wire.
- The Data Link layer (2/Ethernet/PPP) sends frames as far as the next hop.
- The Network layer (3/IP) sends packets as far as the destination machine across the
Internet.
- The Transport layer (4/TCP) creates connections to the program on that destination
machine.
- The Application layer (7/HTTP/SMTP/POP/IMAP) communicates the received information (such
as files) to the user.
- Forget about the Session layer (5) and Presentation layer (6).
A more laid-back approach can be found at: http://www.europa.com/~dogman/osi/
5.5 What is a packet?
To truly understand this answer, you must read sections 5.1 and 5.4 above.
However, this is a common question, so I'll introduce the concept briefly here.
All data that is transferred on the Internet is packaged in individual units known as
"packets". It takes between 30 and 50 packets for this document to be transferred
to your computer, for example. Each packet is labeled with an "IP address" that
specifies its destination.
The trick is that everything that is sent by the computer needs to be broken down into
these packets. For example, if you listen to a streaming Internet "radio"
broadcast, it appears to you as one continuous stream, but in reality the transmitter is
breaking the data down into individual packets, then your machine is reassembling them
back into a stream.
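A toy version of this break-down-and-reassemble process looks like the following. It assumes each packet carries a simple counter so the receiver can restore the order, and it shuffles the packets to stand in for a network that delivers them out of order; the function names are made up for this example.

```python
import random

def to_packets(stream, size):
    """Number each chunk so the receiver can restore the original order."""
    return [(seq, stream[i:i + size])
            for seq, i in enumerate(range(0, len(stream), size))]

def reassemble(packets):
    """Sort by sequence number and join the payloads back into a stream."""
    return b"".join(payload for _seq, payload in sorted(packets))

stream = b"it appears to you as one continuous stream"
packets = to_packets(stream, 8)
random.Random(0).shuffle(packets)     # pretend the network reordered them
print(reassemble(packets) == stream)
```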
The entire effort of sniffing consists of looking at either the individual packets, the
reassembled data, or the sifted information (like passwords) out of the reassembled data.
TCP is a "connection-oriented" protocol. This means that before you send data
across it, you must first establish the connection. In English terms, the TCP Three-Way
Handshake (TWHS) is the following:
- I would like to talk to you
- Sure, let's talk
TCP is a "reliable" service, meaning that everything has to be acknowledged in
order to verify that it was received correctly. Similar protocols to TCP use 2, 3, or 4
packet handshakes in order to set up the connection. An amazing amount of work went into
choosing the optimal 3 packet exchange.
In TCP-speak, we view this exchange as:
client -> server : SYN
server -> client : SYN-ACK
client -> server : ACK
Where "SYN" is a flag in the TCP header that means "let's start
talking", and which only occurs in the first two packets in the exchange. The
"ACK" field means that the "acknowledgement" field is valid.
TCP has this interesting concept that every packet acknowledges receipt of data in the
other direction. So if I send you one packet and you send me five in response, then each
of your five responses acknowledges that you have received my one packet. This means that all
TCP packets have the ACK bit set, except for the first one (because there's nothing to
acknowledge). Note: most firewalls that block incoming TCP connections really just block
packets without the ACK bit set; this means TCP traffic with that bit can still make it
through the firewall to "ping" you.
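The flag tests described above are simple bit operations on one byte of the TCP header. Here is a minimal sketch, assuming the standard bit values (the same ones the decode earlier shows for the 0x10 flags byte). The names `flag_names` and `blocked_by_simple_firewall` are made up for illustration, and the second function only mimics the naive rule described above, not any real firewall product.

```python
# Bit values of the six flags shown in the decode earlier.
TCP_FLAGS = [(0x20, "URG"), (0x10, "ACK"), (0x08, "PSH"),
             (0x04, "RST"), (0x02, "SYN"), (0x01, "FIN")]

def flag_names(flags):
    """List the names of the flag bits that are set in a TCP flags byte."""
    return [name for bit, name in TCP_FLAGS if flags & bit]

def blocked_by_simple_firewall(flags):
    """Mimic the rule described above: drop inbound packets that lack ACK."""
    return not flags & 0x10

for flags in (0x02, 0x12, 0x10):      # the three packets of the handshake
    print(hex(flags), flag_names(flags), blocked_by_simple_firewall(flags))
```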
All connection-oriented protocols have "sequence numbers" which help order
the data in the correct sequence, and acknowledge how far along in the sequence you've
received the data. Most connection-oriented protocols begin their sequence numbers at
zero, but TCP has this weird concept of starting them at a random number. Therefore, the
TWHS looks something like:
Both sides tell the other side what sequence numbers will be used in the connection.
The first data sent in either direction will use these sequence numbers.
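The bookkeeping can be sketched as follows. This is a simulation, not real network code: it assumes each side picks a uniformly random 32-bit starting number (real TCP stacks choose theirs more carefully), and it relies on the fact that a SYN uses up one sequence number, so each side acknowledges the other's starting number plus one. The function name is invented for this example.

```python
import random

def three_way_handshake(rng):
    """Return the three packets of a handshake with random starting numbers."""
    client_isn = rng.randrange(2 ** 32)   # client's initial sequence number
    server_isn = rng.randrange(2 ** 32)   # server's initial sequence number
    return [
        {"from": "client", "flags": "SYN", "seq": client_isn, "ack": None},
        {"from": "server", "flags": "SYN-ACK", "seq": server_isn,
         "ack": (client_isn + 1) % 2 ** 32},
        {"from": "client", "flags": "ACK", "seq": (client_isn + 1) % 2 ** 32,
         "ack": (server_isn + 1) % 2 ** 32},
    ]

for packet in three_way_handshake(random.Random()):
    print(packet)
```

The numbers differ on every run, which is the point: neither side can assume the other starts at zero.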
Another thing to remember is that a TCP connection occurs from the port on one machine
to the port on another. For example, a web server typically runs at port 80, and client
ports are allocated starting at port 1024. Therefore, we might expand our example to show
This explanation is just an overview provided as an alternative way of
looking at the subject. I strongly recommend you grab a book on
TCP/IP, or look up "three way handshake" on the web.