I have some time off, so I thought I'd take the opportunity to dive more deeply into the fundamental aspects of web development and work my way through the specification for one of the pillars of the internet, the Internet Protocol.
While reading Foundations of Python Network Programming by Rhodes and Goerzen, I was pointed to RFC 791: INTERNET PROTOCOL, DARPA Internet Program Protocol Specification. It seems as good a place as any to discover what this whole "internet" thing is. In an effort to really understand the content, I'm taking notes as I work my way through it. The entire document is only 45 pages, so we'll see if I can make it in one sitting.
I'll try to make note of any questions as they occur to me and call them out with a top-level bullet point; any subsequent points address those questions as I understand them. It's not the most logical flow, but it works for me.
+------+ +-----+ +-----+     +-----+
|Telnet| | FTP | | TFTP| === | ... |
+------+ +-----+ +-----+     +-----+
      |   |         |           |
     +-----+     +-----+     +-----+
     | TCP |     | UDP | === | ... |
     +-----+     +-----+     +-----+
        |           |           |
     +--------------------------+----+
     |    Internet Protocol & ICMP   |
     +--------------------------+----+
                    |
       +---------------------------+
       |   Local Network Protocol  |
       +---------------------------+
Protocol Relationships
In what instances would a local network protocol not use IP (or a higher level protocol such as TCP)?
"The internet protocol can capitalize on the services of its supporting networks to provide various types and qualities of service." What does this refer to specifically? TCP and UDP are both built on IP, so what comprises the supporting networks of IP?
"A TCP module would call on the internet module to take a TCP segment as the data portion of an internet datagram". Would it be correct to say TCP is fitted over IP as a more stringent subset?
IP implements two functions, addressing and fragmentation.
At what level of the "stack" does routing fall? Addressing specifies a destination, but there are all manner of routes to a given destination. Is it left to the discretion of the router?
Several examples are given of specific classes (A, B, C) for addresses, differentiated by the number of bits given to the network number and to the local address (7/24, 14/16 and 21/8 respectively). What is the significance of the alternate address classes?
If a.b.c.d were a given IP address, in class A the network ID would be specified by a, while b.c.d would be the host ID. Class C would be the inverse, with a.b.c used to identify the network ID and d for the host ID.
Fragmentation is necessary when traversing any network that limits packets to a smaller size than any previous step along the route.
Has the utility of small networks declined with advances in technology (since 1981)?
Fragmented datagrams include an identification field so that they can be reassembled. If IP makes no provision for sequencing, how are fragmented datagrams reassembled?
If a portion of a fragmented datagram is lost or malformed, is the whole datagram discarded?
A recursive fragmentation procedure is outlined:
if total_length <= maximum_transmission_unit
    submit datagram to next step in datagram processing
elif dont_fragment == 1
    discard_datagram
else
    # To produce the first fragment:
    copy_original_header
    old_header_length = header_length
    old_total_length = total_length
    old_fragment_offset = fragment_offset
    old_more_fragment_flag = more_fragments_flag
    number_fragment_blocks = (maximum_transmission_unit - header_length * 4) / 8
    attach the first number_fragment_blocks * 8 data octets
    # Correct the header:
    more_fragments_flag = 1
    total_length = (header_length * 4) + (number_fragment_blocks * 8)
    recompute_checksum
    submit fragment to next step in datagram processing
    # To produce the second fragment:
    selectively_copy_internet_header  # some options are not copied, see option definitions
    append_remaining_data
    # Correct the header:
    header_length = (((old_header_length * 4) - (length of options not copied)) + 3) / 4
    total_length = old_total_length - (number_fragment_blocks * 8) - ((old_header_length - header_length) * 4)
    fragment_offset = old_fragment_offset + number_fragment_blocks
    more_fragments_flag = old_more_fragment_flag
    recompute_checksum
    recur  # submit this fragment to the fragmentation test
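To convince myself the procedure really does terminate with sensible fragments, here is a runnable sketch of it in Python. It is a simplification rather than a faithful implementation: I assume a header with no options (so the header length never changes between fragments), I skip the checksum entirely, and the Datagram class and fragment function are names I made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Datagram:
    """Hypothetical stand-in for an internet datagram without options."""
    header_length: int    # IHL, in 32 bit words
    fragment_offset: int  # in units of 8 octets
    more_fragments: bool
    dont_fragment: bool
    data: bytes

    @property
    def total_length(self):
        return self.header_length * 4 + len(self.data)


def fragment(datagram, mtu):
    """Return the list of fragments that fit within mtu octets."""
    if datagram.total_length <= mtu:
        return [datagram]
    if datagram.dont_fragment:
        return []  # the datagram is discarded
    blocks = (mtu - datagram.header_length * 4) // 8
    if blocks < 1:
        raise ValueError("MTU too small to carry any data")
    # First fragment: same header, more-fragments set, first blocks * 8 octets.
    first = replace(datagram, more_fragments=True, data=datagram.data[:blocks * 8])
    # Second fragment: remaining data, offset advanced, original flag kept.
    rest = replace(
        datagram,
        fragment_offset=datagram.fragment_offset + blocks,
        data=datagram.data[blocks * 8:],
    )
    # The second fragment goes back through the fragmentation test.
    return [first] + fragment(rest, mtu)


original = Datagram(header_length=5, fragment_offset=0, more_fragments=False,
                    dont_fragment=False, data=bytes(range(100)))
for piece in fragment(original, mtu=68):
    print(piece.fragment_offset, len(piece.data), piece.more_fragments)
```

With 100 octets of data and an MTU of 68, this produces three fragments at offsets 0, 6 and 12. Sorting by fragment offset and joining the data gives back the original payload, which I suspect is at least part of the answer to my reassembly question above.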
The number of potential failure points in fragmentation and reassembly, together with the lack of any kind of error handling (specifically around recomputing checksums), is for me reminiscent of the Unix Philosophy [1] (I've since found that this is not a novel idea [2]). Error handling is left out of the specification and deferred to the interfaces. I'm curious how thoroughly this is addressed in the TCP specification, or in any of the protocols built on IP.
Example 1:
This is an example of the minimal data carrying internet datagram:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 5 |Type of Service|        Total Length = 21      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Identification = 111     |Flg=0|   Fragment Offset = 0   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Time = 123  |  Protocol = 1 |        header checksum        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         source address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      destination address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     data      |
+-+-+-+-+-+-+-+-+
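The datagram pictured above is small enough to build by hand, so here is a sketch using Python's struct module. The field values come from the example. The source and destination addresses are my own assumption (the RFC example leaves them unspecified, so I've used documentation addresses), the checksum is left at zero rather than computed, and build_header is a hypothetical helper name.

```python
import socket
import struct

# Network byte order: two 1 octet fields, three 2 octet fields,
# two 1 octet fields, one 2 octet field, then two 4 octet addresses.
HEADER_FORMAT = "!BBHHHBBH4s4s"


def build_header(total_length, identification, ttl, protocol, source, destination):
    """Pack a 20 octet header with no options and a zeroed checksum."""
    version_and_ihl = (4 << 4) | 5  # Ver = 4, IHL = 5 share one octet
    type_of_service = 0
    flags_and_offset = 0            # Flg = 0, Fragment Offset = 0
    checksum = 0                    # placeholder, not a valid checksum
    return struct.pack(
        HEADER_FORMAT,
        version_and_ihl, type_of_service, total_length,
        identification, flags_and_offset,
        ttl, protocol, checksum,
        socket.inet_aton(source), socket.inet_aton(destination),
    )


# One octet of data gives the total length of 21 shown in the diagram.
datagram = build_header(21, 111, 123, 1, "192.0.2.1", "192.0.2.2") + b"\x00"
print(len(datagram))  # 21
print(datagram.hex())
```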
Internet Control Message Protocol is implemented in the internet protocol module. Is it a further specification on IP? A datagram specification?
Type of Service - Characterizes the service choices provided in the network.
Time to Live - TTL is set by the sender of the datagram and reduced at the points along the route where it is processed.
After playing with traceroute a bit, I was interested to find that (on OS X at least) the default is to send UDP datagrams rather than ICMP messages; setting the -I flag reports the maximum hop count as 64. I wonder, though, whether that is mixing terminology between hops and TTL.
Header Checksum - Verification that the information used in processing the datagram has been transmitted correctly. The data may contain errors.
Options - Include provisions for timestamps, security, and special routing.
00000000 00000000 - Unclassified
11110001 00110101 - Confidential
01111000 10011010 - EFTO
10111100 01001101 - MMMM
01011110 00100110 - PROG
10101111 00010011 - Restricted
11010111 10001000 - Secret
01101011 11000101 - Top Secret
What are the types of service available? Are they standardized?
How do the security options affect header checksums?
How can errors within the data create a passing checksum?
The minimum value of IHL for a correct IP header is 5, that is, five 32 bit words, or 160 bits (20 octets) in total:
- 4 bits for version
- 4 bits for IHL
- 8 bits type of service
- 16 bits total length
- 16 bits identification
- 3 bits flags (reserved, don't fragment, more fragments)
- 13 bits fragment offset
- 8 bits TTL
- 8 bits protocol
- 16 bits header checksum
- 32 bits source address
- 32 bits destination address
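As a sanity check on that arithmetic, the field widths can be written down as data and used to slice up a header. This is a sketch under my own naming (FIELD_BITS and split_fields are hypothetical), and the sample header bytes are made up to match the example datagram from earlier, with documentation addresses and a zeroed checksum.

```python
# Field widths in bits, in the order they appear in the header.
FIELD_BITS = [
    ("version", 4),
    ("ihl", 4),
    ("type_of_service", 8),
    ("total_length", 16),
    ("identification", 16),
    ("flags", 3),
    ("fragment_offset", 13),
    ("ttl", 8),
    ("protocol", 8),
    ("header_checksum", 16),
    ("source_address", 32),
    ("destination_address", 32),
]

HEADER_BITS = sum(width for _, width in FIELD_BITS)
print(HEADER_BITS, HEADER_BITS // 32)  # 160 bits, 5 words


def split_fields(header):
    """Slice a 20 octet header into its fields, most significant bit first."""
    if len(header) * 8 != HEADER_BITS:
        raise ValueError("expected a 20 octet header without options")
    value = int.from_bytes(header, "big")
    fields = {}
    remaining = HEADER_BITS
    for name, width in FIELD_BITS:
        remaining -= width
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    return fields


sample = bytes.fromhex("45000015006f00007b010000c0000201c0000202")
for name, value in split_fields(sample).items():
    print(name, value)
```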
A single datagram may be up to 65,535 octets in length (2^16 - 1); however, such large datagrams are impractical, and "all hosts must be prepared to accept datagrams of up to 576 octets".
TTL is measured in seconds, but each hop decrements it by at least 1, giving an upper bound on the number of hops a datagram may traverse.
What are the possible protocols for the protocol header field?
The protocol field is used to specify the next encapsulated protocol (TCP, ICMP, GGP). There are dozens of possible protocols.
The checksum is recomputed and verified at each point of processing. The algorithm is:
the 16 bit one's complement, of the one's complement sum, of all 16 bit words in the header.
(For purposes of computing the checksum, the value of the checksum field is zero).
One's complement is the bitwise inversion of a number (0011 => 1100), and a one's complement sum has the interesting property of "wrapping": a carry out of the most significant bit is added back into the least significant bit (also called an end-around carry).
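The checksum description is short enough to try out directly. Below is a sketch in Python, where ones_complement_sum and header_checksum are names I picked myself. The sample header is a made-up UDP-style header rather than anything from the RFC, and I zero the checksum field before summing, as the algorithm says.

```python
import struct


def ones_complement_sum(words):
    """Add 16 bit words, folding any carry back into the low bits."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def header_checksum(header):
    """Compute the header checksum with the checksum field treated as zero."""
    if len(header) < 20 or len(header) % 2:
        raise ValueError("header must be at least 20 octets and an even length")
    # The checksum field occupies octets 10 and 11 of the header.
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    words = struct.unpack("!%dH" % (len(zeroed) // 2), zeroed)
    return ~ones_complement_sum(words) & 0xFFFF


print(hex(ones_complement_sum([0xFFFF, 0x0001])))  # wraps around to 0x1

sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(header_checksum(sample)))  # 0xb861
```

A receiver can verify a header by summing all of its words, checksum included: if nothing was corrupted, the one's complement sum comes out as 0xFFFF.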
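For a header without options, the 16 bit words in the header are: version, IHL and type of service; total length; identification; flags and fragment offset; TTL and protocol; header checksum; two words of source address; and two words of destination address.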
The document wraps up with more specific examples and discussion around them. While informative, they are not particularly useful to me in my initial foray into the bowels of the internet. I can imagine referencing them if I am ever in a position to build directly on top of the internet protocol, but I can't say that's something I look forward to.
While I was able to read the entire RFC in one sitting and then immediately write this post, I have to admit it took hours. Maybe I'm a slow writer. The reading was pretty quick, but the amount of time spent cross-checking and referencing other sources was significant. I think I have a better understanding of the fundamentals of the internet, but the more important lesson for me was how approachable it all is. Yes, RFCs make for very dry reading, but they don't lack for thoroughness. The idea of reading more, while not exciting, is not so daunting as it might have been.
More and more I find that large, daunting systems like TCP/IP or the Unix operating system are actually quite unsurprising and very well documented. This might be obvious to anyone else, but I think their longevity is probably due to this careful simplicity.