RFC 791, Internet Protocol
2015-03-11
I have some time off so I thought I'd take the opportunity to dive more deeply into the fundamental aspects of web development and work my way through the specification for one of the pillars of the internet, Internet Protocol.
In reading Foundations of Python Network Programming by Rhodes and Goerzen, I was linked to RFC 791: INTERNET PROTOCOL, DARPA Internet Program Protocol Specification. It seems as good a place as any to discover what this whole "internet" thing is. In an effort to really understand the content, I'm taking notes as I work my way through it. The entire document is only 45 pages; we'll see if I can make it in one sitting.
Before I Start
I'll try to note questions as they occur to me and call each one out as a top level bullet-point; the points beneath it are my attempt to answer the question as I understand it. It's not the most logical flow but it works for me.
Diving Right In
Protocol Hierarchy
+------+ +-----+ +-----+     +-----+
|Telnet| | FTP | | TFTP| === | ... |
+------+ +-----+ +-----+     +-----+
      |   |         |           |
     +-----+     +-----+     +-----+
     | TCP |     | UDP | === | ... |
     +-----+     +-----+     +-----+
        |           |           |
     +--------------------------+----+
     |    Internet Protocol & ICMP   |
     +--------------------------+----+
                    |
       +---------------------------+
       |   Local Network Protocol  |
       +---------------------------+
Protocol Relationships
In what instances would a local network protocol not use IP (or a higher level protocol such as TCP)?
- The local network protocols are those at the OSI Model's Data Link and Physical layers. They operate at a more fundamental level than the protocols built on IP: they carry IP datagrams rather than using IP themselves.
"The internet protocol can capitalize on the services of its supporting networks to provide various types and qualities of service." What does this refer to specifically? TCP and UDP are all built on IP - what comprises the supporting networks of IP?
- According to the OSI Model, the supporting networks of IP are the Data Link layer and the Physical layer. MAC (Media Access Control) addresses and Ethernet protocols fall within the Data Link layer; raw bits and electrical connectors are the subject of the Physical layer.
"A TCP module would call on the internet module to take a TCP segment as the data portion of an internet datagram". Would it be correct to say TCP is fitted over IP as a more stringent subset?
- Not a subset but a complement built on to IP. TCP segments are encapsulated into an IP datagram before exchange.
IP implements two basic functions: addressing and fragmentation.
Addressing
At what level of the "stack" does routing fall? Addressing specifies a destination, but there are all manner of routes to a given destination. Is it left to the discretion of the router?
- IP deals in addresses. Per the RFC, mapping local net addresses to routes is the responsibility of local nets and gateways (a lower level than IP).
Several examples are given of specific classes (A, B, C) for addresses, differentiated by how many bits go to the network number and how many to the local address (7/24, 14/16 and 21/8 respectively). What is the significance of the alternate address classes?
- Not very relevant any more since the industry has apparently switched to classless routing. The initial intent was to designate the various combinations of hosts to networks (Class A would be used for very large numbers of total hosts, Class C was intended for LANs). If a.b.c.d were a given IP address, in class A the network ID would be specified by a while b.c.d were for a host ID. Class C would be the inverse, with a.b.c used to identify the network ID and d for the host ID.
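The class rules can be sketched in a few lines of Python. This is only an illustration of the classful scheme described in the RFC: the function name classify and the "other" label are my own, and real networks today use classless prefixes instead.

```python
def classify(address: str) -> tuple[str, str, str]:
    """Return (class, network part, host part) for a dotted-quad address.

    The leading bits of the first octet select the class, as in RFC 791.
    Anything that is not class A, B or C is reported as "other" here.
    """
    octets = address.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad address: {address!r}")
    first = int(octets[0])
    if first & 0b10000000 == 0:
        # 0xxxxxxx: 7 bit network number, 24 bit local address
        cls, split = "A", 1
    elif first & 0b11000000 == 0b10000000:
        # 10xxxxxx: 14 bit network number, 16 bit local address
        cls, split = "B", 2
    elif first & 0b11100000 == 0b11000000:
        # 110xxxxx: 21 bit network number, 8 bit local address
        cls, split = "C", 3
    else:
        # 111xxxxx: reserved as an escape to extended addressing in the RFC
        return "other", address, ""
    return cls, ".".join(octets[:split]), ".".join(octets[split:])
```

For example, classify("10.1.2.3") puts the address in class A with network 10 and host 1.2.3, while classify("192.168.1.7") is class C with network 192.168.1 and host 7.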
Fragmentation
Fragmentation is necessary when a datagram has to traverse a network whose maximum packet size is smaller than the datagram itself, that is, smaller than what the earlier steps along the route allowed.
Has the utility of small networks declined with advances in technology (since 1981)?
- Seemingly not, though this is less a reflection of small or constrained networks than of link standards: the maximum transmission unit (MTU) depends on the communication interface, for example Ethernet V2, which specifies an MTU of 1500 bytes.
Fragmented datagrams include an identification field so that they can be reassembled. If IP makes no provision for sequencing, how are fragmented datagrams reassembled?
- The identification field groups the fragments that belong to the same datagram, and the fragment offset and length fields (together with the more-fragments flag) put them back in order. The omission of sequencing information refers to sequencing between discrete datagrams, rather than between the fragments of one datagram.
If a portion of a fragmented datagram is lost or malformed, is the whole datagram discarded?
- The datagram is discarded, which is why IP is said to make no guarantees on the transmission or delivery of packets. Higher level protocols (TCP) would trigger re-transmission on packet loss.
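A minimal sketch of reassembly, to make the last two answers concrete. It assumes the fragments passed in already share the same identification, source, destination and protocol, and it ignores timers, duplicates and overlapping fragments. The Fragment class and the reassemble function are hypothetical names for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    offset: int            # fragment offset, in units of 8 octets
    more_fragments: bool   # the "more fragments" flag
    data: bytes


def reassemble(fragments: list[Fragment]) -> Optional[bytes]:
    """Rebuild the original data, or return None if a piece is missing."""
    expected = 0  # octet position where the next fragment must start
    payload = b""
    for frag in sorted(fragments, key=lambda f: f.offset):
        start = frag.offset * 8
        if start != expected:
            return None  # a hole; duplicates and overlaps also land here
        payload += frag.data
        expected = start + len(frag.data)
        if not frag.more_fragments:
            return payload  # reached the last fragment with no gaps
    return None  # the final fragment never arrived
```

Fragments may arrive in any order; the offsets alone are enough to sort them. Returning None stands in for discarding the datagram, after which a protocol such as TCP would have to retransmit.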
A recursive fragmentation procedure is outlined:
if total_length <= maximum_transmission_unit
    submit datagram to next step in datagram processing
elif dont_fragment == 1
    discard_datagram
else
    copy_original_header
    old_header_length = header_length
    old_total_length = total_length
    old_fragment_offset = fragment_offset
    old_more_fragment_flag = more_fragments_flag
    number_fragment_blocks = (maximum_transmission_unit - header_length * 4) / 8
    Attach the first number_fragment_blocks * 8 data octets
    # Correct the header:
    more_fragments_flag = 1
    total_length = (header_length * 4) + (number_fragment_blocks * 8)
    recompute_checksum
    # submit this fragment to the next step in datagram processing
    return fragment
    # To produce the second fragment:
    selectively_copy_internet_header  # some options are not copied, see option definitions
    append_remaining_data
    # Correct the header:
    header_length = (((old_header_length * 4) - (length of options not copied)) + 3) / 4
    total_length = old_total_length
                   - (number_fragment_blocks * 8)
                   - ((old_header_length - header_length) * 4)
    fragment_offset = old_fragment_offset + number_fragment_blocks
    more_fragments_flag = old_more_fragment_flag
    recompute_checksum
    recur
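The procedure above can be turned into a small runnable Python sketch. To keep it short it assumes a fixed 20 octet header with no options (so nothing is copied selectively) and skips the checksum; the Datagram class, HEADER_LEN and the fragment function are names made up for this illustration.

```python
from dataclasses import dataclass

HEADER_LEN = 20  # assumption: IHL = 5, no options


@dataclass
class Datagram:
    data: bytes
    fragment_offset: int = 0       # in units of 8 octets
    more_fragments: bool = False
    dont_fragment: bool = False

    @property
    def total_length(self) -> int:
        return HEADER_LEN + len(self.data)


def fragment(dgram: Datagram, mtu: int) -> list[Datagram]:
    """Split dgram into pieces whose total length fits within mtu."""
    if dgram.total_length <= mtu:
        return [dgram]
    if dgram.dont_fragment:
        return []  # the datagram is discarded
    blocks = (mtu - HEADER_LEN) // 8  # number_fragment_blocks
    if blocks < 1:
        raise ValueError("MTU too small to carry any data")
    # First fragment: the first blocks * 8 octets, more-fragments set.
    first = Datagram(dgram.data[: blocks * 8], dgram.fragment_offset, True)
    # Second fragment: the remainder, keeping the original flag value.
    rest = Datagram(
        dgram.data[blocks * 8:],
        dgram.fragment_offset + blocks,
        dgram.more_fragments,
    )
    return [first] + fragment(rest, mtu)  # "recur" on the remainder
```

With 100 octets of data and an MTU of 68, each fragment can carry 48 octets, so fragment(Datagram(bytes(100)), 68) yields three fragments at offsets 0, 6 and 12, and only the last has the more-fragments flag cleared.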
The number of potential failure points in fragmentation and reassembly, as well as the lack of any kind of error handling (specifically around recomputing checksums) is, for me, reminiscent of the Unix Philosophy [1] (I've since found that this is not a novel idea [2]). Error handling is left out of the specification and is deferred to the interfaces - I'm curious how thoroughly this is addressed in the TCP specification, or any of the protocols built on IP.
Datagrams
Example 1:
This is an example of the minimal data carrying internet datagram:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 5 |Type of Service|        Total Length = 21      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Identification = 111     |Flg=0|   Fragment Offset = 0   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Time = 123  |  Protocol = 1 |        header checksum        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         source address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      destination address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     data      |
+-+-+-+-+-+-+-+-+
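As a rough illustration, the same datagram can be packed into bytes with Python's struct module. The diagram does not give addresses or a checksum, so the 192.0.2.x addresses below are placeholders and the checksum is left as zero; build_example_datagram is a made-up name.

```python
import struct


def build_example_datagram() -> bytes:
    """Pack the minimal datagram from the diagram: 20 octet header, 1 data octet."""
    version, ihl = 4, 5
    flags, fragment_offset = 0, 0
    header = struct.pack(
        "!BBHHHBBH4s4s",                  # "!" selects network (big-endian) byte order
        (version << 4) | ihl,             # Ver = 4 and IHL = 5 share one octet
        0,                                # type of service
        21,                               # total length: 20 header octets + 1 data octet
        111,                              # identification
        (flags << 13) | fragment_offset,  # 3 bits of flags, 13 bits of offset
        123,                              # time to live
        1,                                # protocol = 1 (ICMP)
        0,                                # header checksum, not computed in this sketch
        bytes([192, 0, 2, 1]),            # source address (placeholder)
        bytes([192, 0, 2, 2]),            # destination address (placeholder)
    )
    return header + b"\x00"               # a single octet of data
```

The result is 21 octets long, matching the total length field, and its first octet is 0x45: the version in the high four bits and the IHL in the low four.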
ICMP
Internet Control Message Protocol is implemented in the internet protocol module. Is it a further specification on IP? A datagram specification?
- ICMP is part of the Internet Protocol Suite and it is complementary to IP. Though ICMP messages are contained in standard IP packets the messages are processed as a special case. Errors are directed to the source address of the originating packet.
Four mechanisms of IP
Type of Service - Characterizes the service choices provided in the network.
Time to Live - TTL is set by the sender of the datagram and reduced at the points along the route where it is processed.
- Time to live has an upper bound of 255 seconds, or 4.25 minutes, though it seems the Routing Information Protocol (version 1) limits routes to a maximum of 15 hops. In playing with traceroute a bit I was interested to find that (on OSX at least) the default is to send UDP datagrams rather than ICMP messages. Setting the -I flag reports a maximum hop count of 64; I wonder though if that is mixing terminology between hops and TTL.
Header Checksum - Verification that the information used in processing the datagram has been transmitted correctly. The data may contain errors.
Options - Include provisions for timestamps, security, and special routing.
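- Record Route: it seems IP provides no guarantees against "spoofing" of routes. The recorded route sits in the header and so is covered by the header checksum, but the checksum only detects transmission errors; any gateway that rewrites the option simply recomputes it. I'm curious if something like TCP provides for stronger checks.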
- To the casual observer, the security options seem to be an anachronism from IP's DoD origins. I'll check later what place they have today in the day to day operation for "civilians", but it is amusing to see the following specification:
00000000 00000000 - Unclassified
11110001 00110101 - Confidential
01111000 10011010 - EFTO
10111100 01001101 - MMMM
01011110 00100110 - PROG
10101111 00010011 - Restricted
11010111 10001000 - Secret
01101011 11000101 - Top Secret
Notes on the Four Mechanisms of IP
What are the types of service available? Are they standardized?
- At some point in time, some networks offered service precedence, usually dependent on load. Type of Service has been redefined over the years, most recently to a Differentiated Services Code Point. It seems the general idea was to build control mechanisms for throttling traffic at the protocol level.
How do the security options affect header checksums?
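- Options are variable length, but they are part of the header, so they are included in the header checksum calculation. A gateway that changes an option has to recompute the checksum, just as it does after decrementing the TTL.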
How can errors within the data create a passing checksum?
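- The data is not a part of the header checksum, so corrupted data still arrives with a valid header checksum. Catching that is left to the higher level protocols.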
IP Headers
The minimum value of IHL for a correct IP header is 5: five 32 bit words, 160 bits (20 octets) in total.
- 4 bits for version
- 4 bits for IHL
- 8 bits type of service
- 16 bits total length
- 16 bits identification
- 3 bits flags (one reserved bit, don't fragment, more fragments)
- 13 bits fragment offset
- 8 bits TTL
- 8 bits protocol
- 16 bits header checksum
- 32 bits source address
- 32 bits destination address
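The field widths above add up to 160 bits, and a short Python sketch shows how they might be pulled back out of the first 20 octets of a datagram. The function name parse_header and the dictionary keys are my own choices for illustration; options beyond the first 20 octets are not handled.

```python
import struct


def parse_header(raw: bytes) -> dict:
    """Unpack the fixed 20 octet part of an IP header into named fields."""
    if len(raw) < 20:
        raise ValueError("an IP header is at least 20 octets")
    (ver_ihl, tos, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,                    # high 4 bits
        "ihl": ver_ihl & 0x0F,                      # low 4 bits, in 32 bit words
        "type_of_service": tos,
        "total_length": total_length,
        "identification": identification,
        "dont_fragment": bool(flags_frag & 0x4000),   # second of the 3 flag bits
        "more_fragments": bool(flags_frag & 0x2000),  # third of the 3 flag bits
        "fragment_offset": flags_frag & 0x1FFF,       # low 13 bits, in 8 octet units
        "ttl": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
    }
```

The version/IHL pair and the flags/fragment offset pair are the only fields that do not fall on octet boundaries, which is why they need the shifts and masks.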
A single datagram may be up to 65,535 octets in length (2^16 - 1, the largest value the 16 bit total length field can hold); however, such large datagrams are impractical and "all hosts must be prepared to accept datagrams of up to 576 octets".
TTL is measured in seconds, but each hop decrements by at least 1, giving an upper bound on the number of hops a datagram may traverse
What are the possible protocols for the protocol field?
- The protocol field specifies the next encapsulated protocol (TCP, ICMP, GGP). There are dozens of possible protocols.
The checksum is recomputed and verified at each point of processing. The algorithm is:
the 16 bit one's complement, of the one's complement sum, of all 16 bit words in the header.
(For purposes of computing the checksum, the value of the checksum field is zero).
One's complement is the binary inversion of a number (0011 => 1100), and a one's complement sum has the interesting property of "wrapping": any carry out of the top bit is added back in at the bottom (also called an end-around carry).
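The 16 bit words are the whole header taken two octets at a time, not only the fields that happen to be 16 bits wide. A minimal 20 octet header has ten of them:
- version, IHL and type of service
- total length
- identification
- flags and fragment offset
- TTL and protocol
- header checksum (given value 0)
- source address (two words)
- destination address (two words)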
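The checksum algorithm is short enough to sketch in Python. This assumes the caller has already set the checksum field to zero in the header bytes it passes in; header_checksum is a made-up name, and the sample header in the usage note is just arbitrary values.

```python
def header_checksum(header: bytes) -> int:
    """16 bit one's complement of the one's complement sum of the header words."""
    if len(header) % 2:
        raise ValueError("header must be a whole number of 16 bit words")
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]   # next big-endian 16 bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)    # end-around carry
    return ~total & 0xFFFF                          # one's complement of the sum
```

For the header 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7 the words sum to 0x2479c, which folds to 0x479e, giving a checksum of 0xb861. Running the same function over the header with 0xb861 filled in returns 0, which is how a receiver can verify it.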
Wrapping Up
The document wraps up with more specific examples and discussion around them. While informative, they are not particularly useful to me in my initial foray into the bowels of the internet. I can imagine referencing them if I am ever in a position to build directly on top of the internet protocol, but I can't say that's something I look forward to.
Take Away
While I was able to read the entire RFC in one sitting and then immediately write this post, I have to admit it took hours. Maybe I'm a slow writer; the reading was pretty quick, but the amount of time spent cross checking and referencing other sources was significant. I think I have a better understanding of the fundamentals of the internet, but the more important lesson for me was how approachable it all is. Yes, RFCs make for very dry reading, but they don't lack for thoroughness. The idea of reading more, while not exciting, is not so daunting as it might have been.
More and more I find that large, daunting systems like TCP/IP or the Unix operating system are actually quite unsurprising and very well documented. This might be obvious to anyone else, but I think their longevity is probably due to this careful simplicity.
[1] Specifically, I thought of Richard Gabriel's Worse is Better
[2] "There's something deep in software development that not everyone gets but the people at Bell Labs did. It's the undercurrent of "the New Jersey Style", "Worse is Better", and "the Unix philosophy" - and it's not just a feature of Bell Labs software either. You see it in the original Ethernet specification where packet collision was considered normal.. and the same sort of idea is deep in the internet protocol. It's deep awareness of design ramification - a willingness to live with a little less to avoid the bigger mess and a willingness to see elegance in the real rather than the vision." Michael Feathers - On Loving C