Draft                     Simple Multilink                     June 1993

INTERNET DRAFT                              Expires: December 31st, 1993

                      A Simple Multilink Protocol
                  for Synchronizing the Transmission
                     of Multi-protocol Datagrams.

                             Keith Sklower
                      Computer Science Department
                  University of California, Berkeley

1. Status of This Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a ``working draft'' or ``work in progress.''

Please check the 1id-abstracts.txt listing contained in the internet-drafts Shadow Directories on nic.ddn.mil, nnsc.nsf.net, nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the current status of any Internet Draft.

2. Abstract

This document proposes a method for splitting and recombining datagrams across multiple logical data links. Such facilities are desirable to exploit the potential for increased bandwidth offered by multiple bearer channels in ISDN, yet to do so in a way that minimizes reordering of packets. More precisely, modifications to RFC1294[4] are proposed, and new PPP[2] options and protocols are suggested.

3. Acknowledgements

The author specifically wishes to thank Brian Lloyd of Lloyd & Associates, Fred Baker of ACC, Craig Fox of Network Systems, Gerry Meyer of Spider Systems, and the members of the IP over Large Public Data Networks and PPP extensions working groups, for much useful discussion on the subject.

4.
Conventions

The following language conventions are used in the items of specification in this document:

   o  Must, Shall or Mandatory -- the item is an absolute requirement of the specification.

   o  Should or Recommended -- the item should generally be followed for all but exceptional circumstances.

   o  May or Optional -- the item is truly optional and may be followed or ignored according to the needs of the implementor.

5. Introduction

Basic Rate and Primary Rate ISDN both offer the possibility of opening multiple simultaneous channels between systems, giving users additional bandwidth on demand (for additional cost). Previous proposals for the transmission of internet protocols over ISDN have stated as a goal the ability to make use of this capability (e.g. Leifer et al. [1]).

There are proposals being advanced by communications providers for providing synchronization between multiple streams at the bit level; such features are not as yet widely deployed, and it may be useful to have a software solution as what may prove to be a stopgap measure. Furthermore, even if the ISDN service providers guarantee bit-synchronization between different switched circuits, there may not be sufficiently flexible hardware to combine the multiple bit streams in arbitrary orders for HDLC bit-unstuffing.

There are other instances where bandwidth on demand can be exploited, such as opening additional X.25 SVCs where the window size is limited to two by international agreement.

The simplest possible algorithm, alternating packets between channels on a space available basis (which might be called the Bank Teller's algorithm), may have undesirable side effects due to reordering of packets.
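The reordering hazard of the Bank Teller's algorithm can be illustrated with a deliberately crude sketch. It assumes strict round-robin assignment of packets to channels and a fixed one-way delay per channel; the function name arrival_order and its parameters are hypothetical and are not part of this proposal.

```python
def arrival_order(num_packets, delays):
    """Return packet numbers in the order the receiver sees them.

    Assumption: packet i is sent at time i on channel i % len(delays)
    and arrives delays[channel] time units later.
    """
    arrivals = []
    for i in range(num_packets):
        channel = i % len(delays)
        arrivals.append((i + delays[channel], i))
    # Sort by arrival time; ties fall back to the packet number.
    return [packet for _, packet in sorted(arrivals)]


if __name__ == "__main__":
    # Channel 0 is four time units slower than channel 1.
    print(arrival_order(4, [5, 1]))
```

With equal delays the packets arrive in the order sent; once the channels differ in delay, packets sent on the faster channel overtake those on the slower one.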
By relaxing some requirements of the packet segmentation and reassembly algorithm described in RFC1294, and introducing only modestly more complexity into implementations supporting it, one can split packets amongst parallel virtual circuits between systems in such a way that packets do not become reordered, or at least the likelihood of this is greatly reduced.

The method discussed here is similar to the multilink protocol described in ISO 7776, but offers the additional ability to split and recombine packets, thereby reducing latency. Furthermore, there is no requirement here for acknowledged-mode operation on the link layer, although that is optionally permitted.

Any method for bandwidth aggregation would require some means of identifying which channels are to participate in such a process. This could be achieved specifically in the case of ISDN by use of the calling party information element, but more generally by methods discussed in the draft ``Protocol Negotiation for the Multiprotocol Interconnect'' (PNMI[7]), such as by using PPP's authentication protocols [3]. For Frame Relay run over dedicated channels, some sort of manual configuration could suffice.

6. Generic Description

6.1. Packet Formats

As stated above, the concept is to use the packet segmentation/reassembly formats defined in RFC1294, but to allow the fragments to be transmitted over multiple virtual circuits (or potentially multiple physical links). We'll refresh the reader's memory by shamelessly paraphrasing the section on segmentation from that RFC:

Packet fragments may be thought of as a special type of bridged packets, for a fictitious type of media. Large packets are first encapsulated according to normal RFC1294 procedures, and then are broken up into multiple frames sized appropriately for the multiple physical links, and each section is appended to a bridged packet header followed by a fragmentation header.
(Thus the first fragment of a packet in RFC1294 will have two headers, one for the fragment, followed by the header for the packet itself.)

Within RFC1294 fragments are encapsulated using the SNAP format with an OUI of 0x00-80-C2 and a PID of 0x00-0D. Individual fragments will, therefore, have the following format:

               Figure 1:  Fragment format for RFC1294

                 +---------------+---------------+
                 |         Q.922 Address         |
                 +---------------+---------------+
                 | Control 0x03  | pad     0x00  |
                 +---------------+---------------+
                 | NLPID   0x80  | OUI     0x00  |
                 +---------------+---------------+
                 | OUI                  0x80-C2  |
                 +---------------+---------------+
                 | PID                  0x00-0D  |
                 +---------------+---------------+
                 |        sequence number        |
                 +-+-+-+-+-+-----+---------------+
                 |F|  MBZ  |offset               |
                 +-+-------+-----+---------------+
                 |         fragment data         |
                 |               .               |
                 |               .               |
                 |               .               |
                 +---------------+---------------+
                 |              FCS              |
                 +---------------+---------------+

The sequence field is a two octet identifier that is incremented every time a new complete message is fragmented. It allows detection of lost frames and is set to a random value at initialization.

The offset field is an 11 bit value representing the logical offset of this fragment in bytes divided by 32. The first fragment must have an offset of zero.

The (F)inal bit is a one bit field set to 1 on the last fragment and set to 0 for all other fragments.

The reserved field (marked MBZ in Figure 1) is 4 bits long and is not currently defined. It must be set to 0.

In RFC1294, a separate and single reassembly structure is associated with each virtual circuit, and the sequence number is interpreted only in the context of that virtual circuit. By contrast, we propose having a reassembly structure that is associated with a group of virtual circuits, and that the sequence number then be interpreted in the context of the group.

6.2.
Trading Buffer Space Against Fragment Loss

In the RFC1294 segmentation procedure, where fragments must arrive in sequence and only on one circuit, it is easy to determine the amount of buffer space required for reassembly and easy to recognize when loss has occurred. In a multilink procedure, where one channel may be delayed with respect to the other channel, fragments may not arrive in the same sequence in which they left the sender. So, it is more difficult to determine that a fragment has been lost, and more difficult to estimate the amount of buffer space required. However, in any case the sender MUST transmit fragments with non-decreasing sequence numbers over any constituent circuit in a multilink arrangement.

In this section we present a default strategy for minimizing the buffer space required for retaining enough fragments to determine that a fragment has been lost.

6.2.1. Sender-Simple Reassembly

The idea is that the receiver should keep track of the minimum of the sequence numbers of all fragments received on all channels participating in the multilink procedure. (Call this M). Every time M advances, one can discard all incomplete packets with sequence number less than M.

The sender should transmit a null fragment increasing the sequence number for each channel when the queue for each physical link becomes empty; otherwise the receiver will stall until new packets arrive.

The amount of buffering required to guarantee correct recognition of fragment loss depends on the relative delay between the channels (D[c1,c2]), the number of channels participating (say N), the data rate of each channel R[c], the fragment size (f) and the maximum permissible reassembled size (also written M in the buffer estimate; not to be confused with the minimum sequence number above). When using PPP or PNMI, the delay between channels can be determined by LCP echo request and echo reply packets.
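The receiver rule of section 6.2.1 can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the names Fragment and GroupReassembler are hypothetical, a null fragment is modelled as a fragment with empty data (this draft does not fix its encoding), 16-bit sequence number wraparound is ignored, and non-final fragments are assumed to carry a multiple of 32 bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    seq: int            # sequence number of the packet being fragmented
    offset: int         # logical byte offset divided by 32
    final: bool         # the (F)inal bit
    data: bytes = b""   # empty data models a null fragment (assumption)


class GroupReassembler:
    """One reassembly structure shared by every channel in a group."""

    def __init__(self, channels):
        self.latest = {c: None for c in channels}  # newest seq per channel
        self.partial = {}                          # seq -> {offset: Fragment}
        self.discarded = []                        # seqs given up as lost

    def receive(self, channel, frag):
        """Accept one fragment; return a completed packet or None."""
        self.latest[channel] = frag.seq
        packet = None
        if frag.data:
            self.partial.setdefault(frag.seq, {})[frag.offset] = frag
            packet = self._try_complete(frag.seq)
        self._discard_stale()
        return packet

    def _try_complete(self, seq):
        pieces = self.partial[seq]
        expected, out = 0, []
        while expected in pieces:
            frag = pieces[expected]
            out.append(frag.data)
            if frag.final:
                del self.partial[seq]
                return b"".join(out)
            # Next offset, in units of 32 bytes (rounded up to guarantee progress).
            expected += (len(frag.data) + 31) // 32
        return None

    def _discard_stale(self):
        if None in self.latest.values():
            return  # M is undefined until every channel has been heard from
        m = min(self.latest.values())
        for seq in [s for s in self.partial if s < m]:
            del self.partial[seq]
            self.discarded.append(seq)
```

In this sketch a packet whose completing fragment was lost stays buffered until every channel has carried a fragment (possibly a null fragment) with a higher sequence number, at which point M passes it and it is discarded.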
In the common case where the data rates are the same, one could define, for each channel, its slippage as the bandwidth times the delay of that channel relative to the slowest:

   S[c] = R[c] * D[c, c-worst]

Given these conditions, having buffer space of

   N*(M-f) + S[1] + S[2] + ... + S[N]

should be sufficient to ensure that one has not erroneously thrown away an incomplete packet before its completing fragment arrives.

6.2.2. Other Reassembly Schemes

As this was intended to be a simple fragmentation/reassembly protocol, we'll just mention that it would be possible to have more elaborate timer and/or buffer size based implementations (like the traditional IP reassembly queues in end-system implementations), but we'll leave that to more ambitious future extensions.

7. PPP

The Point-to-Point Protocol (PPP) [2] provides a standard method of encapsulating Network Layer protocol information over (virtual) point-to-point links. PPP has four main components:

   1. A method for encapsulating datagrams over serial links.

   2. A Link Control Protocol (LCP) for establishing, configuring, and testing the data-link connection.

   3. A family of Network Control Protocols (NCPs) for establishing and configuring different network-layer protocols.

   4. A family of Network-Layer Protocols (NPs) which are the classes of data to be transmitted themselves.

7.1. Crude Description of Overall Intent

In order to establish communications over a point-to-point link, each end of the PPP link must first send LCP packets to configure the data link during Link Establishment phase. After the link has been established, PPP provides for an Authentication phase before proceeding to the Network-Layer Protocol phase.
Since the idea is to ``tie together'' multiple circuits between a fixed pair of systems, and since both of the current PPP authentication protocols permit the side effect of assigning identifiers to both systems, it makes sense to require the use of one or the other of the existing authentication protocols (or any future authentication protocol that assigns unique identifiers to communicating systems).

We suggest that multilink operation can be modeled as a separate PPP Network-Layer protocol with its own control protocol, which may be negotiated in the usual PPP manner. A slightly unusual (for PPP) convention might be that if the PPP-Simple Multilink (network-layer) Protocol were proposed and accepted in both directions between systems that had previously concluded all NCP negotiations on another circuit, the new circuit would ``inherit'' the results of the previous negotiation, and both the old and new circuit would then be combined into a single logical channel.

To construct the (network layer) packets (themselves) in this new ``protocol'' one would start with a complete PPP packet, prune the leading HDLC address and UI designation bytes, divide it into roughly equal parts, and to each section prepend the usual PPP control header followed by an RFC1294 fragmentation header.

An issue that was raised was whether fragmented packets could be reassembled and released in sequence, which is required by some network protocols such as VJ header compression for TCP/IP. Fred Baker also gave the example of reconstructing the data dictionary when more general compression methods were in use. One could negotiate a guarantee of this based on the sequence number in the RFC1294 fragmentation header, agreeing not to release packets out of sequence order, but that would require that delivery across the individual subchannels was reliable.
However, even in the face of unreliable delivery, it is the guess of the author of this modest proposal that following the method described above, of merely maintaining as many reassembly queues as there are physical channels and not beginning the transmission of packet N+G (G being the number of channels in the group) until all fragments of packet N are known to have been sent, should provide sequenced delivery in most cases.

7.2. More detailed description of the ``Network Protocol''

In our description here, we will need to make use of a PPP 16-bit network protocol identifier. Just for the sake of ease of description we will choose the numbers 0x3b to represent the proposed fragmentation network protocol and 0x803b to represent the corresponding control protocol. These numbers will clearly need to be reassigned should this proposal be accepted.

Individual fragments will thus be encoded in a PPP context according to the following format:

                   Figure 2:  Fragment format for PPP

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+
   |   HDLC FLAG   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     0xFF      |     0x03      |     0x00      |     0x3b      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Sequence Number        |F| RSVD  |       Offset        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         fragment data                         |
   |                             .....                             |
   |                             .....                             |
   |                             .....                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The descriptions of the fields are exactly the same as they are for RFC1294: a 16 bit sequence number, a single bit representing the completing fragment, and an offset understood to be multiplied by 32. The execution of the protocol itself is exactly as specified above.

7.3. Fragmentation Protocol Control Protocol

[Here, we equally shamelessly plagiarize RFC1220]

The Fragmentation Protocol Control Protocol (FPCP) is responsible for establishing agreement to initiate multilink operation, and for setting a limited number of parameters.
It is exactly the same as the Point-to-Point Link Control Protocol [2], with the following exceptions:

   Data Link Layer Protocol Field

      Exactly one Fragmentation Protocol Control Protocol packet is encapsulated in the Information field of PPP Data Link Layer frames where the Protocol field indicates type hex 803b. [to be reassigned appropriately]

   Code field

      Only Codes 1 through 7 (Configure-Request, Configure-Ack, Configure-Nak, Configure-Reject, Terminate-Request, Terminate-Ack and Code-Reject) are used. Other Codes should be treated as unrecognized and should result in Code-Rejects.

   Precedence

      FPCP packets may not be exchanged until the Link Control Protocol has reached the network-layer Protocol Configuration Negotiation phase (i.e. Authentication and Link Negotiation have concluded). The FPCP negotiation must precede any other network protocol negotiation.

   Configuration Option Types

      The Fragmentation Protocol Control Protocol has a separate set of Configuration Options. These permit the negotiation of the following items:

      o  Sequenced Delivery

      o  Reset on Loss

      o  Maximum Receive Reconstructed Unit

      o  Maximum Received Completed Sequence (Prior to Loss)

[The author of this proposal realizes the title of it was the ***SIMPLE*** multilink protocol, and would not be gravely disappointed if none of the options were permitted, but is enumerating them here merely for completeness' sake.]

7.3.1. Sequenced Delivery Option

                 Figure 3:  Sequenced Delivery Option

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    type=1     |  length = 2   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This option requests the system at the other end of the link not to deliver packets out of sequence.
In the absence of reliable delivery, a missing fragment in packet number N would delay the delivery of packet number N+1 until receipt of any fragment for packet N+G, where G is the number of channels participating in the multilink procedure.

7.3.2. Reset on Loss

                   Figure 4:  Reset on Loss Option

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    type=2     |  length = 2   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This option requests the system at the other end of the link to re-enter the fragmentation-control-negotiation phase upon detection of the loss of a fragment.

7.3.3. Maximum Receive Reconstructed Unit

          Figure 5:  Maximum Receive Reconstructed Unit Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    type=3     |  length = 4   | Number of Component Fragments |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This option advises the peer that the implementation will be able to reconstruct a datagram consisting of the specified number of fragments. Thus the maximum reconstructed size in bytes will be the product of the number of constituent fragments and the difference between the size negotiated by the LCP MRU option and the fragmentation header size.

7.3.4. Maximum Received Completed Sequence

          Figure 6:  Maximum Received Completed Sequence Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    type=4     |  length = 4   |        Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This option is used after an implementation reenters FPCP after detecting a fragment loss. The sequence number is that of the last complete datagram successfully reassembled prior to the detection of the loss.

7.4.
Synchronization

When sequenced delivery is in effect, there are issues about signaling loss, and about graceful termination of a constituent link among several in a group if it has been determined that the additional bandwidth is no longer needed. We propose a mechanism that can be used for both purposes.

Any time that FPCP has been successfully completed, on any constituent link, an implementation may send a FPCP configuration request including the ``Maximum Received Completed Sequence'' (MRCS) option. The implementation must not send any further fragmentation packets until both sides have sent and received configure-ack control packets.

Upon receipt of a config-request with a MRCS option, the peer implementation must eventually reply with either a config-request including the MRCS option, or a config-ack. The peer may also delay sending the ack by up to twice the round trip delay time. The peer must not begin transmitting fragmentation packets over the constituent link until the initiating implementation sends a config-ack. Both the implementation and its peer may, however, continue sending traffic on the other participating links in the group.

If the constituent link is to be closed, each peer may send terminate-request packets after verifying that all fragments transmitted prior to dropping back into control protocol mode have been received.

8. RFC 1294/B-Channel Frame Relay

The packet formats for the native fragmentation protocol remain unchanged. However, the encapsulation for the Control Protocol relies on PPP control packet formats. PPP control packets may be transmitted in a way compatible with RFC 1294 by SNAP encapsulation. This is detailed in the PNMI draft [7].

9.
RFC 1356/B-Channel X.25

RFC1356 can transmit packet formats found in RFC1294 either by specifying an NLPID of 0 in the Call User Data or by placing a common SNAP header in the Call User Data and omitting that initial segment on all packets subsequently transmitted over that virtual circuit. In the case of NLPID 0, an in-band negotiation could be used to identify members of a reassembly group by use of the proposed parameter negotiation over the multiprotocol interconnect.

If one is willing to settle for external manual configuration, then, since the RFC1294 fragmentation procedure employs SNAP encapsulated packets, one could open an X.25 virtual circuit with the fixed SNAP portion of the fragmentation header in the Call User Data, and with the reassembly group defined to be all virtual circuits having the same X.121 calling address.

10. References

[1] Leifer, D., Sheldon, S., and Gorsline, B., "A Subnetwork Control Protocol for ISDN Circuit-Switching", IPLPDN Working Group, Internet Draft (Expired), March 1991.

[2] Simpson, W., "The Point-to-Point Protocol (PPP) for the Transmission of Multi-protocol Datagrams over Point-to-Point Links", Network Working Group, RFC-1331, May 1992.

[3] Lloyd, B., and Simpson, W., "PPP Authentication Protocols", Network Working Group, RFC-1334.

[4] Bradley, T., Brown, C., and Malis, A., "Multiprotocol Interconnect over Frame Relay", Network Working Group, RFC-1294, January 1992.

[5] Malis, A., Robinson, D., and Ullman, R., "Multiprotocol Interconnect on X.25 and ISDN in the Packet Mode", Network Working Group, RFC-1356, August 1992.

[6] Sklower, K., and Frost, C., "Determination of Encapsulation of Multi-protocol Datagrams in Circuit-switched Environments", IPLPDN Working Group, Committee Document, February 1993.

[7] Sklower, K., "Parameter Negotiation for the Multiprotocol Interconnect", IPLPDN Working Group, Committee Document, November 1992.
[8] Boland, F., editor, "Stable Implementation Agreements for Open Systems Interconnection Protocols: Version 2 Edition 1", NIST Workshop for Implementors of OSI, NIST, February 1989.

[9] International Organisation for Standardization, "HDLC - Description of the X.25 LAPB-Compatible DTE Data Link Procedures", International Standard 7776, 1988.

11. Author's Address

   Keith Sklower
   Computer Science Department
   570 Evans Hall
   University of California
   Berkeley, CA 94720

   Phone: (510) 642-9587
   E-mail: sklower@cs.Berkeley.EDU

12. Expiration Date of this Draft

December 31, 1993