[nolan@nprescott.com] $>  cat blog archive feed

RFC 791, Internet Protocol

2015-03-11

I have some time off so I thought I'd take the opportunity to dive more deeply into the fundamental aspects of web development and work my way through the specification for one of the pillars of the internet, Internet Protocol.

In reading Foundations of Python Network Programming by Rhodes and Goerzen, I was linked to RFC 791: INTERNET PROTOCOL, DARPA Internet Program Protocol Specification. It seems as good a place as any to discover what this whole "internet" thing is. In an effort to really understand the content, I'm taking notes as I work my way through it. The entire document is only 45 pages; we'll see if I can make it in one sitting.

Before I Start

I'll try to make note of any questions as they occur to me and call them out with a top-level bullet point; any subsequent points address those questions as I understand them. It's not the most logical flow but it works for me.

Diving Right In

Protocol Hierarchy

     +------+ +-----+ +-----+     +-----+
     |Telnet| | FTP | | TFTP| === | ... |
     +------+ +-----+ +-----+     +-----+
           |   |         |           |
          +-----+     +-----+     +-----+
          | TCP |     | UDP | === | ... |
          +-----+     +-----+     +-----+
             |           |           |
          +--------------------------+----+
          |    Internet Protocol & ICMP   |
          +--------------------------+----+
                         |
            +---------------------------+
            |   Local Network Protocol  |
            +---------------------------+

                Protocol Relationships

IP implements two basic functions: addressing and fragmentation.

Addressing

Fragmentation

Fragmentation is necessary when a datagram must cross a network whose maximum packet size is smaller than that of the networks it has already traversed, so that the datagram no longer fits in a single packet.

A recursive fragmentation procedure is outlined:

    if total_length <= maximum_transmission_unit
        submit datagram to next step in datagram processing
    elif dont_fragment == 1
        discard_datagram
    else
        copy_original_header
        old_header_length = header_length
        old_total_length = total_length
        old_fragment_offset = fragment_offset
        old_more_fragment_flag = more_fragments_flag

        number_fragment_blocks = (maximum_transmission_unit
                                 - header_length * 4)
                                 / 8

        Attach the first number_fragment_blocks * 8 data octets

        # Correct the header:
        more_fragments_flag = 1
        total_length = (header_length * 4) + (number_fragment_blocks * 8)

        recompute_checksum

        emit fragment   # submit this fragment to the next step in datagram
                        # processing; the procedure continues below

        # To produce the second fragment:
        selectively_copy_internet_header # some options are not copied, see option
                                         # definitions

        append_remaining_data

        # Correct the header:
        header_length = (((old_header_length * 4)
                        - (length of options not copied))
                        + 3)
                        / 4

        total_length = old_total_length
                       - (number_fragment_blocks * 8)
                       - ((old_header_length - header_length) * 4)

        fragment_offset = old_fragment_offset + number_fragment_blocks
        more_fragments_flag = old_more_fragment_flag

        recompute_checksum

        recur
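To check my reading of the procedure, here is a minimal Python sketch of it. It rests on a few assumptions of mine that are not in the RFC: the `Datagram` class and `fragment` function are hypothetical names, the header carries no options (so the "selectively copy" step copies everything and the header length never changes), and the checksum is left out entirely.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Datagram:
    """Hypothetical, simplified datagram: no options, no checksum."""
    header_length: int    # IHL, counted in 32-bit words
    fragment_offset: int  # counted in 8-octet blocks
    more_fragments: bool
    dont_fragment: bool
    data: bytes

    @property
    def total_length(self) -> int:
        # Header octets plus data octets.
        return self.header_length * 4 + len(self.data)


def fragment(datagram: Datagram, mtu: int) -> list[Datagram]:
    """Split a datagram into pieces that each fit within mtu octets."""
    if datagram.total_length <= mtu:
        return [datagram]  # small enough, pass it along unchanged
    if datagram.dont_fragment:
        return []          # the datagram is discarded

    # Number of 8-octet data blocks that fit next to the header.
    blocks = (mtu - datagram.header_length * 4) // 8
    if blocks < 1:
        raise ValueError("MTU too small to carry any data")
    split = blocks * 8

    # First fragment: leading data, more-fragments flag forced on.
    first = replace(datagram, more_fragments=True, data=datagram.data[:split])

    # Second fragment: remaining data, offset advanced, original flag kept.
    rest = replace(
        datagram,
        fragment_offset=datagram.fragment_offset + blocks,
        data=datagram.data[split:],
    )

    # The remainder is submitted to the same test again (the "recur" step).
    return [first] + fragment(rest, mtu)
```

Calling `fragment(Datagram(5, 0, False, False, bytes(452)), 280)` yields two fragments with total lengths 276 and 216, the second at fragment offset 32. As far as I can tell, that agrees with the fragmentation example near the end of the RFC.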

The number of potential failure points in fragmentation and reassembly, together with the lack of any kind of error handling (specifically around recomputing checksums), is, for me, reminiscent of the Unix Philosophy [1] (I've since found that this is not a novel idea [2]). Error handling is left out of the specification and deferred to the interfaces. I'm curious how thoroughly this is addressed in the TCP specification, or in any of the protocols built on IP.

Datagrams

    Example 1:

      This is an example of the minimal data carrying internet datagram:


        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |Ver= 4 |IHL= 5 |Type of Service|        Total Length = 21      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |      Identification = 111     |Flg=0|   Fragment Offset = 0   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |   Time = 123  |  Protocol = 1 |        header checksum        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         source address                        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      destination address                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |     data      |
       +-+-+-+-+-+-+-+-+
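The diagram above maps fairly directly onto Python's `struct` module. The following is only a sketch: the source and destination addresses and the single data octet are made up by me, and the checksum is left at zero rather than computed.

```python
import struct

version, ihl = 4, 5

header = struct.pack(
    "!BBHHHBBH4s4s",        # "!" selects network (big-endian) byte order
    (version << 4) | ihl,   # version and IHL share a single octet
    0,                      # type of service
    21,                     # total length: 20 header octets + 1 data octet
    111,                    # identification
    0,                      # flags (3 bits) and fragment offset (13 bits)
    123,                    # time to live
    1,                      # protocol (1 is ICMP)
    0,                      # header checksum, left as zero in this sketch
    bytes([10, 0, 0, 1]),   # source address (made up)
    bytes([10, 0, 0, 2]),   # destination address (made up)
)

datagram = header + b"\x00"  # a single, arbitrary octet of data
```

Here the header comes to 20 octets and the whole datagram to 21, matching the IHL of 5 and the total length of 21 in the diagram.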

ICMP

Four mechanisms of IP

  1. Type of Service - Characterizes the service choices provided in the network.

  2. Time to Live - TTL is set by the sender of the datagram and reduced at the points along the route where it is processed.

  3. Header Checksum - Verification that the information used in processing the datagram has been transmitted correctly. The checksum covers only the header, so the data may still contain errors.

  4. Options - Include provisions for timestamps, security, and special routing.

            Security levels, as encoded in the 16 bit S field of the Security option:

            00000000 00000000 - Unclassified
            11110001 00110101 - Confidential
            01111000 10011010 - EFTO
            10111100 01001101 - MMMM
            01011110 00100110 - PROG
            10101111 00010011 - Restricted
            11010111 10001000 - Secret
            01101011 11000101 - Top Secret

Notes on the Four Mechanisms of IP

IP Headers

The minimum value of IHL (Internet Header Length) for a correct IP header is 5, meaning five 32 bit words, or 160 bits (20 octets) in total.

     - 4 bits for version
     - 4 bits for IHL
     - 8 bits type of service
     - 16 bits total length
     - 16 bits identification
     - 3 bits flags (reserved, don't fragment, more fragments)
     - 13 bits fragment offset
     - 8 bits TTL
     - 8 bits protocol
     - 16 bits header checksum
     - 32 bits source address
     - 32 bits destination address
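The widths in the list above add up to 160 bits (4 + 4 + 8 + 16 + 16 + 3 + 13 + 8 + 8 + 16 + 32 + 32). As a rough sketch of how those fields could be pulled out of raw bytes, here is a small parser. The function name `parse_header` and the dictionary keys are my own, and it ignores any options that follow the first 20 octets.

```python
import struct


def parse_header(raw: bytes) -> dict:
    """Decode the fixed 20-octet part of an IP header (options ignored)."""
    (ver_ihl, tos, total_length, identification, flags_offset,
     ttl, protocol, checksum, source, destination) = struct.unpack(
        "!BBHHHBBH4s4s", raw[:20])

    return {
        "version": ver_ihl >> 4,           # high 4 bits of the first octet
        "ihl": ver_ihl & 0x0F,             # low 4 bits, in 32-bit words
        "type_of_service": tos,
        "total_length": total_length,
        "identification": identification,
        # The top 3 bits are flags: reserved, don't fragment, more fragments.
        "dont_fragment": bool(flags_offset & 0x4000),
        "more_fragments": bool(flags_offset & 0x2000),
        "fragment_offset": flags_offset & 0x1FFF,  # low 13 bits
        "ttl": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": ".".join(str(octet) for octet in source),
        "destination": ".".join(str(octet) for octet in destination),
    }
```

For example, `parse_header(bytes.fromhex("450000730000400040" "11b861c0a80001c0a800c7"))` reports version 4, an IHL of 5, a total length of 115 and the don't fragment flag set.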

A single datagram may be up to 65,535 octets in length (2^16 - 1, the largest value the 16 bit total length field can hold); however, such large datagrams are impractical, and "all hosts must be prepared to accept datagrams of up to 576 octets".

TTL is measured in seconds, but each hop decrements it by at least 1, which gives an upper bound on the number of hops a datagram may traverse.
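A tiny sketch of that rule as I understand it. The `forward` function and its `seconds_held` parameter are hypothetical names of mine; the point is only that a module takes off at least 1 even when it handles the datagram in under a second, and that a datagram whose TTL reaches zero is destroyed.

```python
from typing import Optional


def forward(ttl: int, seconds_held: int = 0) -> Optional[int]:
    """Return the TTL to send onward, or None if the datagram is destroyed."""
    # Decrement by the time spent here, but never by less than 1.
    remaining = ttl - max(1, seconds_held)
    return remaining if remaining > 0 else None
```

Since the field is 8 bits wide, no datagram can start with a TTL above 255, so the number of hops is bounded no matter how quickly each one is processed.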

The checksum is recomputed and verified at each point of processing. The algorithm is:

the 16 bit one's complement, of the one's complement sum, of all 16 bit words in the header.

(For purposes of computing the checksum, the value of the checksum field is zero).

One's complement is the bitwise inversion of a number (0011 => 1100), and a one's complement sum has the interesting property of "wrapping": any carry out of the most significant bit is added back into the least significant bit (also called an end-around carry).
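The wrapping is easier to see with small numbers. This is a minimal sketch; the helper names `ones_complement` and `ones_complement_add` are mine, and the `bits` parameter exists only so the example can use 4 bit values instead of 16.

```python
def ones_complement(value: int, bits: int = 16) -> int:
    """Invert every bit of value within the given width."""
    return value ^ ((1 << bits) - 1)


def ones_complement_add(a: int, b: int, bits: int = 16) -> int:
    """Add two values, feeding any carry out of the top bit back in."""
    mask = (1 << bits) - 1
    total = a + b
    # The carry (0 or 1) is added back at the least significant bit.
    return (total & mask) + (total >> bits)
```

With 4 bit words, 1100 + 0110 overflows to 1 0010 in ordinary arithmetic; the carry wraps around and `ones_complement_add(0b1100, 0b0110, bits=4)` gives 0011.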

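The sum covers every 16 bit word in the header (ten of them for a minimal 20 octet header), not only the fields that happen to be 16 bits wide. The fields that are exactly 16 bits wide are:

  1. total length
  2. identification
  3. header checksum (given value 0)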
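Putting the checksum description into code helped me. This is a sketch, not a reference implementation: `header_checksum` is my own name, and it expects the caller to have already set the two checksum octets to zero.

```python
def header_checksum(header: bytes) -> int:
    """One's complement of the one's complement sum of all 16-bit words."""
    if len(header) % 2:
        raise ValueError("header must be a whole number of 16-bit words")

    total = 0
    for i in range(0, len(header), 2):
        # Each pair of octets is read as one big-endian 16-bit word.
        total += int.from_bytes(header[i:i + 2], "big")

    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    return ~total & 0xFFFF
```

For the header `4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7` (checksum field zeroed) this gives 0xB861. Running the same function over the header with that checksum filled in gives 0, which is how a receiver can verify it.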

Wrapping Up

The document wraps up with more specific examples and discussion around them. While informative, they are not particularly useful to me in my initial foray into the bowels of the internet. I can imagine referencing them if I am ever in a position to build directly on top of the internet protocol, but I can't say that's something I look forward to.

Take Away

While I was able to read the entire RFC in one sitting and then immediately write this post, I have to admit it took hours. Maybe I'm a slow writer; the reading was pretty quick, but the amount of time spent cross-checking and referencing other sources was significant. I think I have a better understanding of the fundamentals of the internet, but the more important lesson for me was how approachable it all is. Yes, RFCs make for very dry reading, but they don't lack for thoroughness. The idea of reading more, while not exciting, is not so daunting as it might have been.

More and more I find that large, daunting systems like TCP/IP or the Unix operating system are actually quite unsurprising and very well documented. This might be obvious to anyone else, but I think their longevity is probably due to this careful simplicity.


  1. Specifically, I thought of Richard Gabriel's Worse is Better
  2. "There's something deep in software development that not everyone gets but the people at Bell Labs did. It's the undercurrent of "the New Jersey Style", "Worse is Better", and "the Unix philosophy" - and it's not just a feature of Bell Labs software either. You see it in the original Ethernet specification where packet collision was considered normal.. and the same sort of idea is deep in the internet protocol. It's deep awareness of design ramification - a willingness to live with a little less to avoid the bigger mess and a willingness to see elegance in the real rather than the vision." Michael Feathers - On Loving C