INTERNET-DRAFT                                             H. Nussbacher
Network Working Group          Israeli Inter-University Computer Center
                                                             Y. Bourvine
                                                       Hebrew University
                                                               June 1993


            Hebrew Character Encoding for Internet Messages


   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months. Internet-Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use
   Internet-Drafts as reference material or to cite them other than as
   a ``working draft'' or ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com, or
   munnari.oz.au.

Status of this Memo

   This memo provides information for the Internet community. It does
   not specify an Internet standard. Distribution of this memo is
   unlimited.

Abstract

   This document describes the encoding used for transferring Hebrew
   text in electronic mail [RFC822]. The encoding makes use of MIME
   [RFC1341] and the ISO-8859-8 character set [ISO-8859].

Description

   All Hebrew text transferred via e-mail must first be converted to
   ISO-8859-8 and then encoded using either Quoted-Printable (preferred)
   or Base64, as defined in MIME [RFC1341].
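   The two-step procedure in the Description can be sketched in Python
   as follows. This is a minimal illustration, not part of the draft:
   the function name encode_hebrew_body is hypothetical, and the sketch
   assumes the input is already in the visual order this draft requires
   and contains only characters that exist in ISO-8859-8. It relies on
   Python's standard iso8859_8 codec and the quopri and base64 modules.

```python
import base64
import quopri


def encode_hebrew_body(text: str, use_base64: bool = False) -> bytes:
    """Hypothetical helper: ISO-8859-8 first, then a MIME transfer encoding.

    Assumes `text` is already in visual order and uses only characters
    that ISO-8859-8 can represent (otherwise .encode raises an error).
    """
    raw = text.encode("iso8859_8")       # step 1: translate to ISO-8859-8
    if use_base64:
        return base64.encodebytes(raw)   # Base64, lines of at most 76 chars
    return quopri.encodestring(raw)      # Quoted-Printable (preferred)


# "shalom" stored in visual order: mem sofit, vav, lamed, shin
sample = "\u05dd\u05d5\u05dc\u05e9"
print(encode_hebrew_body(sample))                    # b'=ED=E5=EC=F9'
print(encode_hebrew_body(sample, use_base64=True))
```

   The Quoted-Printable output is the same =ED=E5=EC=F9 sequence that
   appears in the draft's Example. Quoted-Printable leaves any ASCII
   portions of a message readable as-is, whereas Base64 makes the whole
   body opaque.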
Nussbacher & Bourvine       Expires December 16, 1993           [Page 1]

INTERNET DRAFT          Hebrew email encodings                 June 1993


   The following table lists the four most common Hebrew encodings:

                    PC        IBM       PC        ISO
   Hebrew                                         8859-8
   letter           8-bit               7-bit     8-bit
                    Ascii     EBCDIC    Ascii     Ascii
   -----------      -----     ------    -----     ------
   aleph            128       41        96        224
   bet              129       42        97        225
   gimel            130       43        98        226
   daled            131       44        99        227
   hey              132       45        100       228
   vav              133       46        101       229
   zayin            134       47        102       230
   het              135       48        103       231
   tet              136       49        104       232
   yud              137       51        105       233
   chaf sofit       138       52        106       234
   chaf             139       53        107       235
   lamed            140       54        108       236
   mem sofit        141       55        109       237
   mem              142       56        110       238
   nun sofit        143       57        111       239
   nun              144       58        112       240
   samech           145       59        113       241
   ayin             146       62        114       242
   pey sofit        147       63        115       243
   pey              148       64        116       244
   zadik sofit      149       65        117       245
   zadik            150       66        118       246
   kuf              151       67        119       247
   raish            152       68        120       248
   shin             153       69        121       249
   tuff             154       71        122       250

   Note: All values are decimal, except those in the EBCDIC column,
   which are hexadecimal.

   The directionality of the text is visual, not implicit. That is, each
   line of Hebrew text is encoded from left to right in the order in
   which it appears on the screen, even though Hebrew text is entered
   from right to left, and it is transmitted in that left-to-right order
   by the standard MIME mechanisms. It is hoped that future extensions
   to the MIME Content-type field will include a "direction=" parameter;
   once such a parameter exists, visual order will be assumed as the
   default for Hebrew e-mail.

   Hebrew in e-mail, as well as Hebrew in other TCP/IP protocols, is
   discussed on the ilan-h@vm.tau.ac.il list.
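   The table and the visual-ordering rule described above can be
   illustrated with a short Python sketch. It is an illustration under
   stated assumptions, not part of the draft: the names convert_letters
   and logical_to_visual are hypothetical, the mappings are derived only
   from the letter rows of the table, and the line reversal handles only
   lines without embedded left-to-right text.

```python
# Codes taken from the table. In the three ASCII-based columns the 27
# letters (aleph .. tuff, final forms included) are consecutive.
LETTERS = 27
PC8_TO_ISO = {128 + i: 224 + i for i in range(LETTERS)}  # PC 8-bit -> ISO
PC7_TO_ISO = {96 + i: 224 + i for i in range(LETTERS)}   # PC 7-bit -> ISO

# The EBCDIC column (hexadecimal) has gaps, so list its codes in order.
EBCDIC_CODES = (
    [0x41 + i for i in range(9)]    # aleph .. tet
    + [0x51 + i for i in range(9)]  # yud .. samech
    + [0x62 + i for i in range(8)]  # ayin .. shin
    + [0x71]                        # tuff
)
EBCDIC_TO_ISO = {code: 224 + i for i, code in enumerate(EBCDIC_CODES)}


def convert_letters(data: bytes, table: dict[int, int]) -> bytes:
    """Translate Hebrew letter codes via `table`; other bytes pass through.

    Assumes the rest of the text is plain 7-bit ASCII, which only fits
    the two PC columns (and, for PC 7-bit, means the text contains no
    lowercase Latin letters, since their codes are reused for Hebrew).
    EBCDIC text would also need its non-Hebrew characters converted,
    which this sketch does not attempt.
    """
    return bytes(table.get(b, b) for b in data)


def logical_to_visual(line: str) -> str:
    """Reverse a right-to-left line into visual (left-to-right) order.

    Only correct for lines of Hebrew letters, spaces and simple
    punctuation. Embedded Latin text, digits or brackets need the full
    bidirectional algorithm, which this sketch does not implement.
    """
    return line[::-1]


# "shalom" as typed: shin, lamed, vav, mem sofit
typed = "\u05e9\u05dc\u05d5\u05dd"
print(logical_to_visual(typed).encode("iso8859_8"))              # b'\xed\xe5\xec\xf9'
print(convert_letters(bytes([153, 140, 133, 141]), PC8_TO_ISO))  # b'\xf9\xec\xe5\xed'
```

   The visually ordered bytes in the first printed line are the same
   =ED=E5=EC=F9 sequence that appears in the draft's Example.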
   To subscribe to ilan-h, send mail to listserv@vm.tau.ac.il containing
   the following single line of text:

      subscribe ilan-h firstname lastname

MIME Considerations

   Mail containing Hebrew must include at least the following MIME
   headers:

      MIME-Version: 1.0
      Content-type: text/plain; charset=ISO-8859-8
      Content-transfer-encoding: BASE64 | Quoted-Printable

   where "|" separates the two permitted transfer encodings; a given
   message uses exactly one of them.

   Users should keep lines to at most 72 columns, so that replies can
   quote the text by prefixing each line with ">". Users should also be
   aware that not all MIME implementations handle quoting properly, so
   quoting mail that contains Hebrew may cause problems.

   In the future, when all e-mail systems implement fully transparent
   8-bit transport as defined in [RFC1425] and [RFC1426], this encoding
   will become partially obsolete. The "Content-type:" field will still
   be necessary, as will an indication of directionality (which might be
   implicit for 8BIT; this is left for future discussion), but the
   "Content-transfer-encoding:" field will specify 8BIT rather than
   Base64 or Quoted-Printable.

Optional

   Support for encoding Hebrew in mail headers as specified in [RFC1342]
   is recommended but not required. In particular, the Q encoding, not
   the B encoding, should be the default method for encoding Hebrew in
   Internet mail headers.

Caveats

   Within Israel there are more than 40 Listserv lists that will now
   start using Hebrew for part of their conversations. Normally,
   Listserv delivers mail from a distribution list with a "shortened"
   header that omits the extra MIME headers. The body then arrives still
   in its MIME transfer encoding, and the recipient's user agent cannot
   decode it, because the headers that describe the encoding are
   missing. Each user can customize how Listserv delivers mail.
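   Because a shortened Listserv header drops exactly the MIME headers
   required under MIME Considerations, it may help to see those headers
   produced by a mail library. The following Python sketch is an
   illustration under stated assumptions, not part of the draft:
   build_hebrew_message and q_encode_word are hypothetical names, the
   body is assumed to be in visual order already, and q_encode_word is a
   hand-written version of the optional RFC 1342 Q encoding rather than
   a library call.

```python
from email.message import EmailMessage


def build_hebrew_message(visual_text: str) -> EmailMessage:
    """Hypothetical helper: a text/plain body carrying the minimum MIME headers.

    set_content() adds Content-Type with charset=iso-8859-8 and
    Content-Transfer-Encoding: quoted-printable; EmailMessage also adds
    MIME-Version: 1.0. The text is assumed to be in visual order already.
    """
    msg = EmailMessage()
    msg.set_content(visual_text, charset="iso-8859-8", cte="quoted-printable")
    return msg


def q_encode_word(text: str) -> str:
    """Hypothetical helper: RFC 1342 "Q" encoded-word for ISO-8859-8 text.

    ASCII letters and digits stay literal, a space becomes "_", and every
    other byte is written as =XX (encoding extra characters is allowed by
    the Q scheme). Encoded-words have a length limit, so long values
    would need to be split into several words; that is not done here.
    """
    out = []
    for byte in text.encode("iso8859_8"):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif ch == " ":
            out.append("_")
        else:
            out.append(f"={byte:02X}")
    return "=?ISO-8859-8?Q?" + "".join(out) + "?="


msg = build_hebrew_message("\u05dd\u05d5\u05dc\u05e9")  # "shalom", visual order
print(msg.as_string())
print(q_encode_word("Re: \u05dd\u05d5\u05dc\u05e9"))    # =?ISO-8859-8?Q?Re=3A_=ED=E5=EC=F9?=
```

   Python writes the headers in its own capitalization (for example
   charset="iso-8859-8"); MIME header field names, charset values and
   transfer-encoding values are case-insensitive, so the result is
   equivalent to the form listed under MIME Considerations.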
   For lists that carry Hebrew, users should send Listserv the following
   command:

      set listname full

   where listname is the name of the list for which the user wants full,
   unabridged headers. This updates the user's private entry, and all
   subsequent mail from that list will arrive with full RFC 822 headers,
   including the MIME headers.

   In addition, Listserv usually maintains automatic archives of all
   postings to a list. These archives, kept in the file
   "listname LOGyymm", do not contain the MIME headers, so all encoding
   information is lost. This is a limitation of the Listserv software.

Example

   Below is a short example of Hebrew e-mail encoded with
   Quoted-Printable:

      Date: Sun, 06 Jun 93 15:25:35 IDT
      From: Hank Nussbacher
      Subject: Sample Hebrew mail
      To: Hank Nussbacher , Yehavi Bourvine
      MIME-Version: 1.0
      Content-Type: Text/plain; charset=ISO-8859-8
      Content-Transfer-Encoding: QUOTED-PRINTABLE

      The end of this line contains Hebrew .=EC=E0=F8=F9=E9 =F5=
      =F8=E0=EE =ED=E5=EC=F9
      Hank Nussbacher =F8=EB=E1=F1=E5=
      =F0 =F7=F0=E4

Acknowledgements

   Many thanks to Rafi Sadowsky and Nathaniel Borenstein for all their
   help.

References

   [ISO-8859] Information Processing -- 8-bit Single-Byte Coded Graphic
              Character Sets, Part 8: Latin/Hebrew alphabet, ISO 8859-8,
              1988.

   [RFC822]   Crocker, D., "Standard for the Format of ARPA Internet Text
              Messages", STD 11, RFC 822, UDEL, August 1982.

   [RFC1341]  Borenstein, N., and N. Freed, "MIME (Multipurpose Internet
              Mail Extensions): Mechanisms for Specifying and Describing
              the Format of Internet Message Bodies", RFC 1341, Bellcore,
              Innosoft, June 1992.

   [RFC1342]  Moore, K., "Representation of Non-ASCII Text in Internet
              Message Headers", RFC 1342, University of Tennessee, June
              1992.

   [RFC1425]  Klensin, J., Freed, N., Rose, M., Stefferud, E., and
              D. Crocker, "SMTP Service Extensions", RFC 1425, February
              1993.
   [RFC1426]  Klensin, J., Freed, N., Rose, M., Stefferud, E., and
              D. Crocker, "SMTP Service Extension for 8bit-MIME
              transport", RFC 1426, February 1993.

Security Considerations

   Security issues are not discussed in this memo.

Authors' Addresses

   Hank Nussbacher
   Computer Center
   Tel Aviv University
   Ramat Aviv
   Israel

   Fax: +972 3 6409118
   Phone: +972 3 6408309
   EMail: hank@vm.tau.ac.il

   Yehavi Bourvine
   Computer Center
   Hebrew University
   Jerusalem
   Israel

   Phone: +972 2 585684
   Fax: +972 2 527349
   EMail: yehavi@vms.huji.ac.il