RFC 791, Internet Protocol

2015-03-11

I have some time off so I thought I'd take the opportunity to dive more deeply into the fundamental aspects of web development and work my way through the specification for one of the pillars of the internet, Internet Protocol.

In reading Foundations of Python Network Programming by Rhodes and Goerzen, I was linked to RFC 791: INTERNET PROTOCOL, DARPA Internet Program Protocol Specification. It seems as good a place as any to discover what this whole "internet" thing is. In an effort to really understand the content, I'm taking notes as I work my way through it. The entire document is only 45 pages, so we'll see if I can make it through in one sitting.

Before I Start

I'll try to note any questions as they occur to me and call them out with a top level bullet-point. Any subsequent points address those questions as I understand them. It's not the most logical flow but it works for me.

Diving Right In

Protocol Hierarchy

     +------+ +-----+ +-----+     +-----+
     |Telnet| | FTP | | TFTP| === | ... |
     +------+ +-----+ +-----+     +-----+
           |   |         |           |
          +-----+     +-----+     +-----+
          | TCP |     | UDP | === | ... |
          +-----+     +-----+     +-----+
             |           |           |
          +--------------------------+----+
          |    Internet Protocol & ICMP   |
          +--------------------------+----+
                         |
            +---------------------------+
            |   Local Network Protocol  |
            +---------------------------+

                Protocol Relationships

In what instances would a local network protocol not use IP (or a higher level protocol such as TCP)?
- The local network protocols are those at the OSI Model's Data link layer and Physical layer. The local network operates at a more fundamental level than the protocols built on IP.
"The internet protocol can capitalize on the services of its supporting networks to provide various types and qualities of service." What does this refer to specifically? TCP and UDP are both built on IP - what comprises the supporting networks of IP?
- According to the OSI Model, the supporting networks of IP are the Data Link layer and the Physical layer. MAC (Media Access Control) addresses and Ethernet protocols fall within the Data link layer, while raw bits and electrical connectors are the subject of the Physical layer.
"A TCP module would call on the internet module to take a TCP segment as the data portion of an internet datagram". Would it be correct to say TCP is fitted over IP as a more stringent subset?
- Not a subset but a complement built on to IP. TCP segments are encapsulated into an IP datagram before exchange.

IP implements two functions, addressing and fragmentation.

Addressing

At what level of the "stack" does routing fall? Addressing specifies a destination, but there are all manner of routes to a given destination. Is it left to the discretion of the router?
- IP deals in addresses. Per the RFC, mapping local network addresses to routes is the task of lower level procedures (local nets and gateways).
Several examples are given of specific classes (A, B, C) for addresses, differentiated by the number of bits given to the network number and to the local address (7/24, 14/16, 21/8 respectively). What is the significance of the alternate address classes?
- Not very relevant any more since the industry has apparently switched to classless routing. The initial intent was to support different ratios of hosts to networks (Class A would be used for networks with very large numbers of hosts, Class C was intended for LANs). If a.b.c.d were a given IP address, in class A the network ID would be specified by a while b.c.d were for a host ID. Class C would be the inverse, with a.b.c used to identify the network ID and d for the host ID.
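
To check my understanding of the class rules, here is a minimal Python sketch that splits an address the way RFC 791 describes: the leading bits pick the class, and the rest is divided into a network number and a local address. The function name classful_split and its return shape are my own invention, and the sketch ignores everything that came later (subnetting, CIDR).

```python
import ipaddress


def classful_split(address):
    """Split a dotted-quad address into (class, network number, local address)
    using the original class rules. Returns (None, None, None) when the
    leading bits are 111, which RFC 791 leaves to an extended addressing mode.
    """
    value = int(ipaddress.IPv4Address(address))
    if value >> 31 == 0b0:        # leading bit 0: 7 network bits, 24 local bits
        name, class_bits, local_bits = "A", 1, 24
    elif value >> 30 == 0b10:     # leading bits 10: 14 network bits, 16 local bits
        name, class_bits, local_bits = "B", 2, 16
    elif value >> 29 == 0b110:    # leading bits 110: 21 network bits, 8 local bits
        name, class_bits, local_bits = "C", 3, 8
    else:
        return None, None, None
    network_bits = 32 - class_bits - local_bits
    network = (value >> local_bits) & ((1 << network_bits) - 1)
    local = value & ((1 << local_bits) - 1)
    return name, network, local


print(classful_split("10.1.2.3"))
print(classful_split("192.168.1.7"))
```

Note that the network number excludes the class bits themselves, which is why a class B address yields a 14 bit value rather than the first two octets verbatim.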

Fragmentation

Fragmentation is necessary when a datagram must traverse a network whose maximum packet size is smaller than the datagram itself.

Has the utility of small networks declined with advances in technology (since 1981)?
- Seemingly not, though this is not a reflection of small or constrained networks. The maximum transmission unit depends on the communication interface standard; Ethernet V2, for example, specifies an MTU of 1500 bytes.
Fragmented datagrams include an identification field in order to reassemble. If IP makes no provision for sequencing, how are fragmented datagrams reassembled?
- Reassembly uses the identification field to group the fragments of one datagram, and the fragment offset, more-fragments flag, and length fields to put them back in place. The omission of sequencing information refers to sequencing between discrete datagrams, rather than between the fragments of a single datagram.
If a portion of a fragmented datagram is lost or malformed, is the whole datagram discarded?
- The datagram is discarded, which is why IP is said to make no guarantees on the transmission or delivery of packets. Higher level protocols (TCP) would trigger re-transmission on packet loss.
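
The reassembly idea from the answers above can be sketched in a few lines of Python. This is my own simplification, not the procedure from the RFC: fragments are plain (fragment_offset, more_fragments, data) tuples that are assumed to already share the same identification, source, destination, and protocol, and overlapping fragments and reassembly timers are ignored.

```python
def reassemble(fragments):
    """Rebuild a payload from (fragment_offset, more_fragments, data) tuples.

    fragment_offset is in units of 8 octets, as in the IP header.
    Returns the payload as bytes, or None if any piece is missing.
    """
    pieces = {}
    total = None
    for offset, more_fragments, data in fragments:
        start = offset * 8
        pieces[start] = data
        if not more_fragments:
            # Only the last fragment tells us where the datagram ends.
            total = start + len(data)
    if total is None:
        return None
    payload = bytearray()
    while len(payload) < total:
        piece = pieces.get(len(payload))
        if not piece:
            return None  # a hole: some fragment never arrived
        payload += piece
    return bytes(payload)


# Fragments may arrive in any order.
print(reassemble([(1, False, b"orld"), (0, True, b"hello, w")]))
```

If anything is missing the function returns None, which mirrors the "whole datagram is discarded" behaviour: nothing partial is ever handed up to the next layer.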

A recursive fragmentation procedure is outlined:

    if total_length <= maximum_transmission_unit
        submit datagram to next step in datagram processing
    elif dont_fragment == 1
        discard_datagram
    else
        copy_original_header
        old_header_length = header_length
        old_total_length = total_length
        old_fragment_offset = fragment_offset
        old_more_fragment_flag = more_fragments_flag

        number_fragment_blocks = (maximum_transmission_unit
                                 - header_length * 4)
                                 / 8

        Attach the first number_fragment_blocks * 8 data octets

        # Correct the header:
        more_fragments_flag = 1
        total_length = (header_length * 4) + (number_fragment_blocks * 8)

        recompute_checksum

        return fragment # submit this fragment to the next step in datagram
                        # processing

        # To produce the second fragment:
        selectively_copy_internet_header # some options are not copied, see option
                                         # definitions

        append_remaining_data

        # Correct the header:
        header_length = (((old_header_length * 4)
                        - (length of options not copied))
                        + 3)
                        / 4

        total_length = old_total_length
                       - (number_fragment_blocks * 8)
                       - ((old_header_length - header_length) * 4)

        fragment_offset = old_fragment_offset + number_fragment_blocks
        more_fragments_flag = old_more_fragment_flag

        recompute_checksum

        recur
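
To convince myself the arithmetic works, I tried a runnable Python sketch of the same idea. It is a simplification under my own assumptions: it works on the payload only, assumes a header with no options (so the header length never changes between fragments), loops instead of recursing, and skips the checksum. The function name fragment and the tuple layout are hypothetical.

```python
def fragment(payload, mtu, header_length=5, fragment_offset=0,
             more_fragments=False, dont_fragment=False):
    """Split a payload into (fragment_offset, more_fragments, data) tuples.

    header_length is in 32-bit words and fragment_offset in 8-octet blocks,
    matching the units used in the IP header.
    """
    header_octets = header_length * 4
    if header_octets + len(payload) <= mtu:
        return [(fragment_offset, more_fragments, payload)]
    if dont_fragment:
        return []  # the datagram is discarded
    # Number of 8-octet blocks of data that fit next to the header.
    blocks = (mtu - header_octets) // 8
    if blocks < 1:
        raise ValueError("MTU too small to carry any data")
    pieces = []
    while header_octets + len(payload) > mtu:
        pieces.append((fragment_offset, True, payload[:blocks * 8]))
        payload = payload[blocks * 8:]
        fragment_offset += blocks
    # The final piece inherits the original more-fragments flag.
    pieces.append((fragment_offset, more_fragments, payload))
    return pieces


for offset, more, data in fragment(bytes(100), mtu=68):
    print(offset, more, len(data))
```

With a 68 octet MTU and a 20 octet header, each fragment carries 48 octets (6 blocks), so 100 octets of data are split at offsets 0, 6, and 12.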

The number of potential failure points in fragmentation and reassembly, as well as the lack of any kind of error handling (specifically around recomputing checksums) is, for me, reminiscent of the Unix Philosophy ¹ (I've since found that this is not a novel idea ²). Error handling is left out of the specification and is deferred to the interfaces - I'm curious how thoroughly this is addressed in the TCP specification, or any of the protocols built on IP.

Datagrams

    Example 1:

      This is an example of the minimal data carrying internet datagram:


        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |Ver= 4 |IHL= 5 |Type of Service|        Total Length = 21      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |      Identification = 111     |Flg=0|   Fragment Offset = 0   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |   Time = 123  |  Protocol = 1 |        header checksum        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         source address                        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      destination address                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |     data      |
       +-+-+-+-+-+-+-+-+
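
As an exercise, the datagram in the figure can be assembled with Python's struct module. The field values come from the figure; the type of service, checksum, addresses, and the single data octet are not given there, so the zeros below are placeholders of my own.

```python
import struct


def build_example_datagram():
    """Pack the minimal datagram from the figure: a 20 octet header plus 1 data octet."""
    version, ihl = 4, 5
    header = struct.pack(
        "!BBHHHBBH4s4s",          # network byte order, 20 octets in total
        (version << 4) | ihl,     # version and IHL share one octet
        0,                        # type of service (placeholder)
        21,                       # total length: 20 header octets + 1 data octet
        111,                      # identification
        0,                        # flags (3 bits) and fragment offset (13 bits)
        123,                      # time to live
        1,                        # protocol 1 is ICMP
        0,                        # header checksum (placeholder, not computed)
        bytes(4),                 # source address (placeholder)
        bytes(4),                 # destination address (placeholder)
    )
    return header + b"\x00"       # one octet of data (placeholder)


datagram = build_example_datagram()
print(len(datagram), datagram.hex())
```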

ICMP

Internet Control Message Protocol is implemented in the internet protocol module. Is it a further specification on IP? A datagram specification?
- ICMP is part of the Internet Protocol Suite and it is complementary to IP. Though ICMP messages are contained in standard IP packets the messages are processed as a special case. Errors are directed to the source address of the originating packet.

Four mechanisms of IP

Type of Service - Characterizes the service choices provided in the network.
Time to Live - TTL is set by the sender of the datagram and reduced at the points along the route where it is processed.
- Time to live has an upper bound of 255 seconds, or 4.25 minutes. It seems the Routing Information Protocol (version 1) also caps routes at 15 hops, though that is a limit on RIP's route metric rather than on the TTL field. In playing with traceroute a bit I was interested to find that (on OS X at least) the default is to send UDP datagrams rather than ICMP messages. Setting the -I flag reports the maximum hop count as 64; I wonder though if that is mixing terminology between hops and TTL.
Header Checksum - Verification that the information used in processing the datagram has been transmitted correctly. The data may contain errors.
Options - Include provisions for timestamps, security, and special routing.
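- Record Route: it seems IP provides no guarantees against "spoofing" of routes. Options are part of the header and so are covered by the header checksum, but the checksum is recomputed at every hop and only guards against transmission errors, not deliberate tampering. I'm curious if something like TCP provides for stronger checks.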
- To the casual observer, the security options seem to be an anachronism from IP's DoD origins. I'll check later what place they have today in the day to day operation for "civilians", but it is amusing to see the following specification:

            00000000 00000000 - Unclassified
            11110001 00110101 - Confidential
            01111000 10011010 - EFTO
            10111100 01001101 - MMMM
            01011110 00100110 - PROG
            10101111 00010011 - Restricted
            11010111 10001000 - Secret
            01101011 11000101 - Top Secret

Notes on the Four Mechanisms of IP

What are the types of service available? Are they standardized?
- At some point in time, some networks offered service precedence, usually dependent on load. Type of Service has been redefined over the years, most recently to a Differentiated Services Code Point. It seems the general idea was to build control mechanisms for throttling traffic at the protocol level.
How do the security options affect header checksums?
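- Options are variable length, but they are part of the header and so are included in the header checksum calculation. When an option changes in transit (Record Route or Timestamp, for example), the checksum is recomputed along with it.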
How can errors within the data create a passing checksum?
- The data is not a part of the header checksum

IP Headers

The minimum IHL value for a correct IP header is 5, meaning five 32 bit words, or 160 bits (20 octets) total.

     - 4 bits for version
     - 4 bits for IHL
     - 8 bits type of service
     - 16 bits total length
     - 16 bits identification
     - 3 bits flags (reserved, don't fragment, more fragments)
     - 13 bits fragment offset
     - 8 bits TTL
     - 8 bits protocol
     - 16 bits header checksum
     - 32 bits source address
     - 32 bits destination address
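
The field widths above map fairly directly onto a parser. Below is a minimal sketch using Python's struct module; parse_header and the dictionary keys are names I made up, options are ignored, and the sample bytes at the end are invented purely for illustration.

```python
import struct


def parse_header(raw):
    """Unpack the fixed 20 octets of an IP header into a dict of fields."""
    (version_ihl, tos, total_length, identification, flags_offset,
     ttl, protocol, checksum, source, destination) = struct.unpack(
        "!BBHHHBBH4s4s", raw[:20])
    return {
        "version": version_ihl >> 4,               # top 4 bits
        "ihl": version_ihl & 0x0F,                 # bottom 4 bits, in 32 bit words
        "type_of_service": tos,
        "total_length": total_length,
        "identification": identification,
        "dont_fragment": bool(flags_offset & 0x4000),
        "more_fragments": bool(flags_offset & 0x2000),
        "fragment_offset": flags_offset & 0x1FFF,  # bottom 13 bits, in 8 octet blocks
        "ttl": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": ".".join(str(octet) for octet in source),
        "destination": ".".join(str(octet) for octet in destination),
    }


# Invented sample header: 192.168.0.1 to 192.168.0.2 with don't fragment set.
sample = bytes.fromhex("45000015006f40007b010000c0a80001c0a80002")
print(parse_header(sample))
```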

A single datagram may be up to 65,535 octets in length (2^16 - 1); however, such large datagrams are impractical and "all hosts must be prepared to accept datagrams of up to 576 octets".

TTL is measured in seconds, but each hop decrements it by at least 1, which puts an upper bound on the number of hops a datagram may traverse.
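
The TTL rule is simple enough to sketch. This is only my reading of it: each module that processes the datagram subtracts the seconds it held the datagram, or 1 if that was less than a second, and a datagram whose TTL reaches zero is destroyed. Both function names are hypothetical.

```python
def decrement_ttl(ttl, seconds_held=0):
    """Return the TTL after one module has processed the datagram.

    The decrement is at least 1 even when processing took under a second.
    A result of 0 means the datagram must be destroyed.
    """
    return max(0, ttl - max(1, seconds_held))


def modules_until_expiry(ttl):
    """Count how many times a datagram can be processed before TTL reaches zero."""
    count = 0
    while ttl > 0:
        ttl = decrement_ttl(ttl)
        count += 1
    return count


print(modules_until_expiry(255))
```

Since the field is 8 bits wide, 255 is the largest starting value, so the seconds-based limit doubles as a hop limit in practice.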

What are the possible protocols for the protocol header field?
- The protocol field is used to specify the next encapsulated protocol (TCP, ICMP, GGP). There are dozens of possible protocols.

The checksum is recomputed and verified at each point of processing. The algorithm is:

the 16 bit one's complement, of the one's complement sum, of all 16 bit words in the header.

(For purposes of computing the checksum, the value of the checksum field is zero).

One's complement is the bitwise inversion of a number (0011 => 1100), and a one's complement sum has the interesting property of "wrapping": any carry out of the top bit is added back in at the bottom (also called an end-around carry).

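The sum covers every 16 bit word in the header, not only the fields that happen to be 16 bits wide. For a minimal 20 octet header that makes ten words:

version, IHL, and type of service
total length
identification
flags and fragment offset
TTL and protocol
header checksum (given value 0)
source address (two words)
destination address (two words)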
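
The checksum algorithm quoted above is short enough to try out in Python. This is a minimal sketch, assuming the header is passed in as bytes with the checksum field already zeroed; the function name and the sample header are my own and are not taken from the RFC.

```python
def internet_checksum(header):
    """Return the 16 bit one's complement of the one's complement sum
    of all 16 bit words in the header."""
    if len(header) % 2:
        raise ValueError("header must be a whole number of 16 bit words")
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]   # big-endian 16 bit word
    while total >> 16:
        # End-around carry: fold anything above 16 bits back into the sum.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Invented sample header with the checksum field (octets 10 and 11) set to zero.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(sample)))
```

A handy property: running the same function over a header that already contains its correct checksum gives 0, which is how each hop can verify a header before recomputing the checksum for the decremented TTL.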

Wrapping Up

The document wraps up with more specific examples and discussion around them. While informative, they are not particularly useful to me in my initial foray into the bowels of the internet. I can imagine referencing them if I am ever in a position to build directly on top of the internet protocol, but I can't say that's something I look forward to.

Take Away

While I was able to read the entire RFC in one sitting and then immediately write this post, I have to admit it took hours. Maybe I'm a slow writer. The reading was pretty quick, but the amount of time spent cross checking and referencing other sources was significant. I think I have a better understanding of the fundamentals of the internet, but the more important lesson for me was how approachable it all is. Yes, RFCs make for very dry reading, but they don't lack for thoroughness. The idea of reading more, while not exciting, is not so daunting as it might have been.

More and more I find that large, daunting systems like TCP/IP or the Unix operating system are actually quite unsurprising and very well documented. This might be obvious to anyone else, but I think their longevity is probably due to this careful simplicity.

Specifically, I thought of Richard Gabriel's Worse is Better
"There's something deep in software development that not everyone gets but the people at Bell Labs did. It's the undercurrent of "the New Jersey Style", "Worse is Better", and "the Unix philosophy" - and it's not just a feature of Bell Labs software either. You see it in the original Ethernet specification where packet collision was considered normal.. and the same sort of idea is deep in the internet protocol. It's deep awareness of design ramification - a willingness to live with a little less to avoid the bigger mess and a willingness to see elegance in the real rather than the vision." Michael Feathers - On Loving C