I have some time off so I thought I'd take the opportunity to dive more deeply into the fundamental aspects of web development and work my way through the specification for one of the pillars of the internet, Internet Protocol.
In reading Foundations of Python Network Programming by Rhodes and Goerzen, I was linked to RFC 791: INTERNET PROTOCOL, DARPA Internet Program Protocol Specification. It seems as good a place as any to discover what this whole "internet" thing is. In an effort to really understand the content, I'm taking notes as I work my way through it. The entire document is only 45 pages; we'll see if I can make it in one sitting.
I'll try to make note of any questions as they occur to me and call them out with a top-level bullet point; any subsequent points should address those questions as I understand them. It's not the most logical flow, but it works for me.
```
             +------+ +-----+ +-----+       +-----+
             |Telnet| | FTP | | TFTP| ...   | ... |
             +------+ +-----+ +-----+       +-----+
                   |   |         |           |
                  +-----+     +-----+     +-----+
                  | TCP |     | UDP | ... | ... |
                  +-----+     +-----+     +-----+
                     |           |           |
                  +--------------------------+----+
                  |    Internet Protocol & ICMP   |
                  +--------------------------+----+
                                 |
                    +---------------------------+
                    |   Local Network Protocol  |
                    +---------------------------+

                      Protocol Relationships
```
In what instances would a local network protocol not use IP (or a higher level protocol such as TCP)?
"The internet protocol can capitalize on the services of its supporting networks to provide various types and qualities of service." What does this refer to specifically? TCP and UDP are all built on IP - what comprises the supporting networks of IP?
"A TCP module would call on the internet module to take a TCP segment as the data portion of an internet datagram". Would it be correct to say TCP is fitted over IP as a more stringent subset?
IP implements two functions: addressing and fragmentation.
At what level of the "stack" does routing fall? Addressing specifies a destination, but there are all manner of routes to a given destination. Is it left to the discretion of the router?
Several examples are given of specific classes (A, B, C) of addresses, differentiated by how many bits go to the network number versus the local address (7/24, 14/16, and 21/8, respectively). What is the significance of the alternate address classes?
If `a.b.c.d` were a given IP address, in class A the network ID would be specified by `a`, with `b.c.d` for a host ID. Class C would be the inverse, with `a.b.c` used to identify the network ID and `d` for the host ID.
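As a sketch, the classful split can be expressed in Python. `split_classful` is a hypothetical helper of my own, and it covers only the original A/B/C classes from RFC 791, not the later class D/E ranges:

```python
def split_classful(address: str):
    """Split a dotted-quad address into (class, network, host) octets
    under the original classful scheme (RFC 791). Classes D/E, which
    came later, are not handled."""
    octets = [int(o) for o in address.split(".")]
    first = octets[0]
    if first < 128:        # high bit 0:    class A, 7-bit network
        return "A", octets[:1], octets[1:]
    elif first < 192:      # high bits 10:  class B, 14-bit network
        return "B", octets[:2], octets[2:]
    else:                  # high bits 110: class C, 21-bit network
        return "C", octets[:3], octets[3:]
```

So `10.1.2.3` splits as network `10`, host `1.2.3` (class A), while `192.0.2.1` splits as network `192.0.2`, host `1` (class C).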
Fragmentation is necessary when traversing any network that limits packets to a smaller size than any previous step along the route.
Has the utility of small networks declined with advances in technology (since 1981)?
Fragmented datagrams include an identification field in order to be reassembled. If IP makes no provision for sequencing, how are fragmented datagrams reassembled?
If a portion of a fragmented datagram is lost or malformed, is the whole datagram discarded?
A recursive fragmentation procedure is outlined:
```
if total_length <= maximum_transmission_unit
    submit datagram to next step in datagram processing
elif dont_fragment == 1
    discard_datagram
else
    # To produce the first fragment:
    copy_original_header
    old_header_length = header_length
    old_total_length = total_length
    old_fragment_offset = fragment_offset
    old_more_fragment_flag = more_fragments_flag
    number_fragment_blocks = (maximum_transmission_unit - header_length * 4) / 8
    attach the first number_fragment_blocks * 8 data octets
    # Correct the header:
    more_fragments_flag = 1
    total_length = (header_length * 4) + (number_fragment_blocks * 8)
    recompute_checksum
    return fragment  # submit this fragment to the next step in
                     # datagram processing

    # To produce the second fragment:
    selectively_copy_internet_header  # some options are not copied,
                                      # see option definitions
    append_remaining_data
    # Correct the header:
    header_length = (((old_header_length * 4)
                      - (length of options not copied)) + 3) / 4
    total_length = old_total_length
                   - (number_fragment_blocks * 8)
                   - ((old_header_length - header_length) * 4)
    fragment_offset = old_fragment_offset + number_fragment_blocks
    more_fragments_flag = old_more_fragment_flag
    recompute_checksum
    recur
```
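The recursion above can be sketched as an iterative Python model. This is my own simplification (`fragment` is a hypothetical helper): it splits only the payload, tracking the fragment offset in 8-octet blocks and the more-fragments flag, and it assumes a fixed 20-octet option-free header rather than modeling header copying:

```python
def fragment(payload: bytes, mtu: int, header_len: int = 20):
    """Split a payload into fragments whose header + data fit in mtu
    octets. Every fragment but the last carries a whole number of
    8-octet blocks, since the fragment offset field counts in blocks."""
    fragments = []
    nfb_max = (mtu - header_len) // 8   # data blocks per fragment
    offset = 0                          # in 8-octet blocks
    while payload:
        data, payload = payload[:nfb_max * 8], payload[nfb_max * 8:]
        more_fragments = 1 if payload else 0
        fragments.append((offset, more_fragments, data))
        offset += nfb_max
    return fragments
```

With an MTU of 48 and a 20-octet header, each fragment carries at most 24 data octets (3 blocks), so a 100-octet payload yields five fragments at offsets 0, 3, 6, 9, and 12 blocks, the last with its more-fragments flag clear.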
The number of potential failure points in fragmentation and reassembly, as well as the lack of any kind of error handling (specifically around recomputing checksums) is, for me, reminiscent of the Unix Philosophy 1 (I've since found that this is not a novel idea 2). Error handling is left out of the specification and is deferred to the interfaces - I'm curious how thoroughly this is addressed in the TCP specification, or any of the protocols built on IP.
Example 1: This is an example of the minimal data carrying internet datagram:

```
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver= 4 |IHL= 5 |Type of Service|       Total Length = 21       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Identification = 111     |Flg=0|   Fragment Offset = 0   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Time = 123  |  Protocol = 1 |        header checksum        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         source address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      destination address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     data      |
   +-+-+-+-+-+-+-+-+
```
Internet Control Message Protocol is implemented in the internet protocol module. Is it a further specification on IP? A datagram specification?
Type of Service - Characterizes the service choices provided in the network.
Time to Live - TTL is set by the sender of the datagram and reduced at the points along the route where it is processed.
Playing with `traceroute` a bit, I was interested to find that (on OSX at least) the default is to send UDP datagrams rather than ICMP messages; setting the `-I` flag uses ICMP ECHO instead. `traceroute` reports maximum hop count as 64, though I wonder if that is mixing terminology between hops and TTL.
Header Checksum - Verification that the information used in processing the datagram has been transmitted correctly. The data may contain errors.
Options - Include provisions for timestamps, security, and special routing.
```
00000000 00000000 - Unclassified
11110001 00110101 - Confidential
01111000 10011010 - EFTO
10111100 01001101 - MMMM
01011110 00100110 - PROG
10101111 00010011 - Restricted
11010111 10001000 - Secret
01101011 11000101 - Top Secret
```
What are the types of service available? Are they standardized?
How do the security options affect header checksums?
How can errors within the data create a passing checksum?
The minimum value for a correct IP header (the IHL field) is 5, measured in 32-bit words: 160 bits total.
- 4 bits for version
- 4 bits for IHL
- 8 bits for type of service
- 16 bits for total length
- 16 bits for identification
- 3 bits for flags (don't fragment, more fragments)
- 13 bits for fragment offset
- 8 bits for TTL
- 8 bits for protocol
- 16 bits for header checksum
- 32 bits for source address
- 32 bits for destination address
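As a sketch of how those fields pack into 20 octets, here's a hypothetical `build_header` of my own that lays out a minimal option-free header with Python's `struct` module (the checksum is left at zero, and the flags and fragment offset are collapsed into one 16-bit field):

```python
import struct

def build_header(src: bytes, dst: bytes, total_length: int,
                 identification: int = 0, ttl: int = 64,
                 protocol: int = 1) -> bytes:
    """Pack a minimal (option-free) IPv4 header: version 4, IHL 5.
    src and dst are 4-octet addresses; checksum is left as zero."""
    ver_ihl = (4 << 4) | 5      # version and IHL share one octet
    tos = 0
    flags_frag = 0              # 3 flag bits + 13-bit fragment offset
    return struct.pack("!BBHHHBBH4s4s",
                       ver_ihl, tos, total_length,
                       identification, flags_frag,
                       ttl, protocol, 0, src, dst)
```

The format string `!BBHHHBBH4s4s` is network (big-endian) byte order and sums to exactly 20 octets, matching the 160-bit minimum above; the first octet of the result is `0x45`, the familiar version/IHL byte.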
A single datagram may be up to 65,535 octets in length (2¹⁶ − 1); however, such large datagrams are impractical, and "all hosts must be prepared to accept datagrams of up to 576 octets".
TTL is measured in seconds, but each hop decrements it by at least 1, giving an upper bound on the number of hops a datagram may traverse.
What are the possible protocols for the protocol header field?
The checksum is recomputed and verified at each point of processing. The algorithm is:
the 16 bit one's complement, of the one's complement sum, of all 16 bit words in the header.
(For purposes of computing the checksum, the value of the checksum field is zero).
One's complement is the binary inversion of a number (0011 => 1100), and a one's complement sum has the interesting property of "wrapping" (an end-around carry): any carry out of the high bit is added back into the low bit. The sum is taken over the 16-bit words of the header.
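A minimal sketch of that algorithm in Python (`ip_checksum` is my own hypothetical helper; it assumes the header is supplied as bytes, which is always an even length, with the checksum field zeroed):

```python
def ip_checksum(header: bytes) -> int:
    """One's complement of the one's complement sum of the header's
    16-bit words. The checksum field must be zero when computing."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]   # big-endian word
    while total >> 16:                              # end-around carry:
        total = (total & 0xFFFF) + (total >> 16)    # fold overflow back in
    return ~total & 0xFFFF                          # one's complement
```

Verification at each hop works the same way: summing a header that includes a correct checksum yields 0xFFFF, whose complement is zero.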
The document wraps up with more specific examples and discussion around them. While informative, they are not particularly useful to me in my initial foray into the bowels of the internet. I can imagine referencing them if I am ever in a position to build directly on top of the internet protocol, but I can't say that's something I look forward to.
While I was able to read the entire RFC in one sitting and then immediately write this post, I have to admit it took hours. Maybe I'm a slow writer; the reading was pretty quick, but the amount of time spent cross-checking and referencing other sources was significant. I think I have a better understanding of the fundamentals of the internet, but the more important lesson for me was how approachable it all is. Yes, RFCs make for very dry reading, but they don't lack for thoroughness. The idea of reading more, while not exciting, is not so daunting as it might have been.
More and more I find that large, daunting systems like TCP/IP or the Unix operating system are actually quite unsurprising and very well documented. This might be obvious to anyone else, but I think their longevity is probably due to this careful simplicity.