restemplate/indy/Protocols/IdCoderBinHex4.pas

512 lines
18 KiB
Plaintext

{
$Project$
$Workfile$
$Revision$
$DateUTC$
$Id$
This file is part of the Indy (Internet Direct) project, and is offered
under the dual-licensing agreement described on the Indy website.
(http://www.indyproject.org/)
Copyright:
(c) 1993-2005, Chad Z. Hower and the Indy Pit Crew. All rights reserved.
}
{
$Log$
}
{
Rev 1.7 10/6/2004 10:47:00 PM BGooijen
changed array indexer from 64 to 32 bit, it gave errors in dotnet, and making
>2GB arrays is not done anyways
Rev 1.6 2004.05.20 1:39:28 PM czhower
Last of the IdStream updates
Rev 1.5 2004.05.20 11:37:24 AM czhower
IdStreamVCL
Rev 1.4 2004.05.19 3:06:56 PM czhower
IdStream / .NET fix
Rev 1.3 2004.02.03 5:45:50 PM czhower
Name changes
Rev 1.2 1/21/2004 1:19:58 PM JPMugaas
InitComponent.
Rev 1.1 16/01/2004 18:00:26 CCostelloe
This is now working code.
Rev 1.0 14/01/2004 00:46:14 CCostelloe
An implementation of Apple's BinHex4 encoding. It is a "work-in-progress",
it does not yet work properly, only checked in as a placeholder.
}
unit IdCoderBinHex4;
{
Written by Ciaran Costelloe, ccostelloe@flogas.ie, December 2003.
Based on TIdCoderMIME, derived from TIdCoder3to4, derived from TIdCoder.
DESCRIPTION:
This is an implementation of the BinHex 4.0 decoder used particularly by Apple.
It is defined in RFC 1741. It is a variant of a 3-to-4 decoder, but it uses
character 90 for sequences of repeating characters, allowing some compression,
but thereby not allowing it to be mapped in as another 3-to-4 decoder.
Per the RFC, it must be encapsulated in a MIME part (it cannot be directly coded
inline in an email "body"), the part is strictly defined to have a header entry
(with the appropriate "myfile.ext"):
Content-Type: application/mac-binhex40; name="myfile.ext"
After the header, the part MUST start with the text (NOT indented):
(This file must be converted with BinHex 4.0)
This allows the option AND the ambiguity of identifying it by either the
Content-Type OR by the initial text line. However, it is also stated that any
text before the specified text line must be ignored, implying the line does not
have to be the first - an apparent contradiction.
The encoded file then follows, split with CRLFs (to avoid lines that are too long
for emails) that must be discarded.
The file starts with a colon (:), a header, followed by the file contents, and
ending in another colon.
There is also an interesting article on the web, "BinHex 4.0 Definition by Peter
N Lewis, Aug 1991", which has very useful information on what is implemeted in
practice, and seems to come with the good provenance of bitter experience.
From RFC 1741:
1) 8 bit encoding of the file:
Byte: Length of FileName (1->63)
Bytes: FileName ("Length" bytes)
Byte: Version
Long: Type
Long: Creator
Word: Flags (And $F800)
Long: Length of Data Fork
Long: Length of Resource Fork
Word: CRC
Bytes: Data Fork ("Data Length" bytes)
Word: CRC
Bytes: Resource Fork ("Rsrc Length" bytes)
Word: CRC
2) Compression of repetitive characters.
($90 is the marker, encoding is made for 3->255 characters)
00 11 22 33 44 55 66 77 -> 00 11 22 33 44 55 66 77
11 22 22 22 22 22 22 33 -> 11 22 90 06 33
11 22 90 33 44 -> 11 22 90 00 33 44
The whole file is considered as a stream of bits. This stream will
be divided in blocks of 6 bits and then converted to one of 64
characters contained in a table. The characters in this table have
been chosen for maximum noise protection. The format will start
with a ":" (first character on a line) and end with a ":".
There will be a maximum of 64 characters on a line. It must be
preceded, by this comment, starting in column 1 (it does not start
in column 1 in this document):
(This file must be converted with BinHex 4.0)
Any text before this comment is to be ignored.
The characters used are:
!"#$%&'()*+,- 012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr
IMPLEMENTATION NOTES:
There are older variants referred to in RFC 1741, but I have only come
across encodings in current use as separate MIME parts, which this
implementation is targetted at.
When encoding into BinHex4, you do NOT have to implement the run-length
encoding (the character 90 for sequences of repeating characters), and
this encoder does not do it. The CRC values generated in the header have
NOT been tested (because this decoder ignores them).
The decoder has to allow for the run-length encoding. The decoder works
irrespective of whether it is preceded by the identification string
or not (GBinHex4IdentificationString below). The string to be decoded must
include the starting and ending colons. It can deal with embedded CR and LFs.
Unlike base64 and quoted-printable, we cannot decode line-by-line cleanly,
because the lines do not contain a clean number of 4-byte blocks due to the
first line starting with a colon, leaving 63 bytes on that line, plus you have
the problem of dealing with the run-length encoding and stripping the header.
If the attachment only has a data fork, it is saved; if only a resource fork,
it is saved; if both, only the data fork is saved. The decoder does NOT
check that the CRC values are correct.
Indy units use the content-type to decide if the part is BinHex4:
Content-Type: application/mac-binhex40; name="myfile.ext"
WARNING: This code only implements BinHex4.0 when used as a part in a
MIME-encoded email. To have a part encoded, set the parts
ContentTransfer := 'binhex40'.
}
interface
{$i IdCompilerDefines.inc}
uses
Classes,
IdException,
IdCoder,
IdCoder3to4,
IdGlobal,
IdStream,
SysUtils;
type
TIdEncoderBinHex4 = class(TIdEncoder3to4)
protected
FFileName: String;
function GetCRC(const ABlock: TIdBytes; const AOffset: Integer = 0; const ASize: Integer = -1): Word;
procedure AddByteCRC(var ACRC: Word; AByte: Byte);
procedure InitComponent; override;
public
{$IFDEF WORKAROUND_INLINE_CONSTRUCTORS}
constructor Create(AOwner: TComponent); reintroduce; overload;
{$ENDIF}
procedure Encode(ASrcStream: TStream; ADestStream: TStream; const ABytes: Integer = -1); override;
//We need to specify this value before calling Encode...
property FileName: String read FFileName write FFileName;
end;
TIdDecoderBinHex4 = class(TIdDecoder4to3)
protected
procedure InitComponent; override;
public
{$IFDEF WORKAROUND_INLINE_CONSTRUCTORS}
constructor Create(AOwner: TComponent); reintroduce; overload;
{$ENDIF}
procedure Decode(ASrcStream: TStream; const ABytes: Integer = -1); override;
end;
const
//Note the 7th characeter is a ' which is represented in a string as ''
GBinHex4CodeTable: string = '!"#$%&''()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr'; {Do not Localize}
GBinHex4IdentificationString: string = '(This file must be converted with BinHex 4.0)'; {Do not Localize}
type
EIdMissingColon = class(EIdException);
EIdMissingFileName = class(EIdException);
var
GBinHex4DecodeTable: TIdDecodeTable;
implementation
uses
IdResourceStrings;
{ TIdDecoderBinHex4 }
{$IFDEF WORKAROUND_INLINE_CONSTRUCTORS}
constructor TIdDecoderBinHex4.Create(AOwner: TComponent);
begin
inherited Create(AOwner);
end;
{$ENDIF}
procedure TIdDecoderBinHex4.InitComponent;
begin
inherited InitComponent;
FDecodeTable := GBinHex4DecodeTable;
FCodingTable := ToBytes(GBinHex4CodeTable);
FFillChar := '='; {Do not Localize}
end;
procedure TIdDecoderBinHex4.Decode(ASrcStream: TStream; const ABytes: Integer = -1);
var
LCopyToPos: integer;
LIn : TIdBytes;
LInSize: Integer;
LOut: TIdBytes;
LN: Integer;
LRepetition: Integer;
LForkLength: Integer;
begin
LInSize := IndyLength(ASrcStream, ABytes);
if LInSize <= 0 then begin
Exit;
end;
SetLength(LIn, LInSize);
TIdStreamHelper.ReadBytes(ASrcStream, LIn, LInSize);
//We don't need to check if the identification string is present, since the
//attachment is bounded by a : at the start and end, and the identification
//string may have been stripped off already.
//While we are at it, remove all the CRs and LFs...
LCopyToPos := -1;
for LN := 0 to LInSize-1 do begin
if LIn[LN] = 58 then begin //Ascii 58 is a colon :
if LCopyToPos = -1 then begin
//This is the start of the file...
LCopyToPos := 0;
end else begin
//This is the second :, i.e. the end of the file...
SetLength(LIn, LCopyToPos);
LCopyToPos := -2; //Flag that we got an end marker
Break;
end;
end else begin
if (LCopyToPos > -1) and (not ByteIsInEOL(LIn, LN)) then begin
LIn[LCopyToPos] := LIn[LN];
Inc(LCopyToPos);
end;
end;
end;
//did we get the initial colon?
if LCopyToPos = -1 then begin
raise EIdMissingColon.Create('Block passed to TIdDecoderBinHex4.Decode is missing a starting colon :'); {Do not Localize}
end;
//did we get the terminating colon?
if LCopyToPos <> -2 then begin
raise EIdMissingColon.Create('Block passed to TIdDecoderBinHex4.Decode is missing a terminating colon :'); {Do not Localize}
end;
if Length(LIn) = 0 then begin
Exit;
end;
LOut := InternalDecode(LIn);
// Now expand the run-length encoding.
// $90 is the marker, encoding is made for 3->255 characters
// 00 11 22 33 44 55 66 77 -> 00 11 22 33 44 55 66 77
// 11 22 22 22 22 22 22 33 -> 11 22 90 06 33
// 11 22 90 33 44 -> 11 22 90 00 33 44
LN := 0;
while LN < Length(LOut) do begin
if LOut[LN] = $90 then begin
LRepetition := LOut[LN+1];
if LRepetition = 0 then begin
//90 is by itself, so just remove the 00
//22 90 00 -> 22 90
RemoveBytes(LOut, LN+1, 1);
Inc(LN); //Move past the $90
end
else if LRepetition = 1 then begin
//Not allowed: 22 90 01 -> 22
//Throw an exception or deal with it? Deal with it.
RemoveBytes(LOut, LN, 2);
end
else if LRepetition = 2 then begin
//Not allowed: 22 90 02 -> 22 22
//Throw an exception or deal with it? Deal with it.
LOut[LN] := LOut[LN-1];
RemoveBytes(LOut, LN+1, 1);
Inc(LN);
end
else if LRepetition = 3 then begin
//22 90 03 -> 22 22 22
LOut[LN] := LOut[LN-1];
LOut[LN+1] := LOut[LN-1];
Inc(LN, 2);
end
else begin
//Repetition is 4 to 255: expand the sequence.
//22 90 04 -> 22 22 22 22
LOut[LN] := LOut[LN-1];
LOut[LN+1] := LOut[LN-1];
ExpandBytes(LOut, LN+2, LRepetition-2, LOut[LN-1]);
Inc(LN, LRepetition-1);
end;
end else begin
Inc(LN);
end;
end;
//We are not finished yet. Strip off the header, by calculating the offset
//of the start of the attachment and it's length.
LN := 1 + LOut[0]; //Length byte + length of filename
Inc(LN, 1 + 4 + 4 + 2); //Version, type, creator, flags
// TODO: use one of the BytesTo...() functions here instead?
LForkLength := (((((LOut[LN]*256)+LOut[LN+1])*256)+LOut[LN+2])*256)+LOut[LN+3];
Inc(LN, 4); //Go past the data fork length
if LForkLength = 0 then begin
//No data fork present, save the resource fork instead...
// TODO: use one of the BytesTo...() functions here instead?
LForkLength := (((((LOut[LN]*256)+LOut[LN+1])*256)+LOut[LN+2])*256)+LOut[LN+3];
end;
Inc(LN, 4); //Go past the resource fork length
Inc(LN, 2); //CRC
//At this point, LOut[LN] points to the actual data (the data fork, if there
//is one, or else the resource fork if there is no data fork).
if Assigned(FStream) then begin
TIdStreamHelper.Write(FStream, LOut, LForkLength, LN);
end;
end;
{ TIdEncoderBinHex4 }
{$IFDEF WORKAROUND_INLINE_CONSTRUCTORS}
constructor TIdEncoderBinHex4.Create(AOwner: TComponent);
begin
inherited Create(AOwner);
end;
{$ENDIF}
procedure TIdEncoderBinHex4.InitComponent;
begin
inherited InitComponent;
FCodingTable := ToBytes(GBinHex4CodeTable);
FFillChar := '='; {Do not Localize}
end;
function TIdEncoderBinHex4.GetCRC(const ABlock: TIdBytes; const AOffset: Integer = 0;
const ASize: Integer = -1): Word;
var
LN: Integer;
LActual: Integer;
begin
Result := 0;
LActual := IndyLength(ABlock, ASize, AOffset);
if LActual > 0 then
begin
for LN := 0 to LActual-1 do begin
AddByteCRC(Result, ABlock[AOffset+LN]);
end;
end;
end;
procedure TIdEncoderBinHex4.AddByteCRC(var ACRC: Word; AByte: Byte);
//BinHex 4.0 uses a 16-bit CRC with an 0x1021 seed.
var
LWillShiftedOutBitBeA1: boolean;
LN: integer;
begin
for LN := 1 to 8 do begin
LWillShiftedOutBitBeA1 := (ACRC and $8000) <> 0;
//Shift the CRC left, and add the next bit from our byte...
ACRC := (ACRC shl 1) or (AByte shr 7);
if LWillShiftedOutBitBeA1 then begin
ACRC := ACRC xor $1021;
end;
AByte := (AByte shl 1) and $FF;
end;
end;
procedure TIdEncoderBinHex4.Encode(ASrcStream: TStream; ADestStream: TStream; const ABytes: Integer = -1);
var
LN: Integer;
LOffset: Integer;
LBlocks: Integer;
LOut: TIdBytes;
LSSize, LTemp: Integer;
LFileName: {$IFDEF HAS_AnsiString}AnsiString{$ELSE}TIdBytes{$ENDIF};
LCRC: word;
LRemainder: integer;
begin
if FFileName = '' then begin
raise EIdMissingFileName.Create('Data passed to TIdEncoderBinHex4.Encode is missing a filename'); {Do not Localize}
end;
//Read in the attachment first...
LSSize := IndyLength(ASrcStream, ABytes);
//BinHex4.0 allows filenames to be only 255 bytes long (because the length
//is stored in a byte), so truncate the filename to 255 bytes...
{$IFNDEF HAS_AnsiString}
LFileName := IndyTextEncoding_OSDefault.GetBytes(FFileName);
{$ELSE}
{$IFDEF STRING_IS_UNICODE}
LFileName := AnsiString(FFileName); // explicit convert to Ansi
{$ELSE}
LFileName := FFileName;
{$ENDIF}
{$ENDIF}
if Length(FFileName) > 255 then begin
SetLength(LFileName, 255);
end;
//Construct the header...
SetLength(LOut, 1+Length(LFileName)+1+4+4+2+4+4+2+LSSize+2);
LOut[0] := Length(LFileName); //Length of filename in 1st byte
for LN := 1 to Length(LFileName) do begin
LOut[LN] := {$IFNDEF HAS_AnsiString}LFileName[LN-1]{$ELSE}Byte(LFileName[LN]){$ENDIF};
end;
LOffset := 1+Length(LFileName); //Points to byte after filename
LOut[LOffset] := 0; //Version
Inc(LOffset);
for LN := 0 to 7 do begin
LOut[LOffset+LN] := 32; //Use spaces for Type & Creator
end;
Inc(LOffset, 8);
LOut[LOffset] := 0; //Flags
LOut[LOffset] := 0; //Flags
Inc(LOffset, 2);
LTemp := LSSize;
LOut[LOffset] := LTemp mod 256; //Length of data fork
LTemp := LTemp div 256;
LOut[LOffset+1] := LTemp mod 256; //Length of data fork
LTemp := LTemp div 256;
LOut[LOffset+2] := LTemp mod 256; //Length of data fork
LTemp := LTemp div 256;
LOut[LOffset+3] := LTemp; //Length of data fork
Inc(LOffset, 4);
LOut[LOffset] := 0; //Length of resource fork
LOut[LOffset+1] := 0; //Length of resource fork
LOut[LOffset+2] := 0; //Length of resource fork
LOut[LOffset+3] := 0; //Length of resource fork
Inc(LOffset, 4);
//Next comes the CRC for the header...
LCRC := GetCRC(LOut, 0, LOffset);
LOut[LOffset] := LCRC mod 256; //CRC of data fork
LCRC := LCRC div 256;
LOut[LOffset+1] := LCRC; //CRC of data fork
Inc(LOffset, 2);
//Next comes the data fork (we will not be using the resource fork)...
//Copy in the attachment...
TIdStreamHelper.ReadBytes(ASrcStream, LOut, LSSize, LOffset);
LCRC := GetCRC(LOut, LOffset, LSSize);
Inc(LOffset, LSSize);
LOut[LOffset] := LCRC mod 256; //CRC of data fork
LCRC := LCRC div 256;
LOut[LOffset+1] := LCRC; //CRC of data fork
Inc(LOffset, 2);
//To prepare for the 3to4 encoder, make sure our block is a multiple of 3...
LSSize := LOffset mod 3;
if LSSize > 0 then begin
ExpandBytes(LOut, LOffset, 3-LSSize);
end;
//We now need to 3to4 encode LOut...
//TODO: compress repetitive bytes to "<byte> $90 <run length>"
LOut := InternalEncode(LOut);
//Need to add a colon at the start & end of the block...
InsertByte(LOut, 58, 0);
AppendByte(LOut, 58);
//Expand any bare $90 to $90 $00
LN := 0;
while LN < Length(LOut) do begin
if LOut[LN] = $90 then begin
InsertByte(LOut, 0, LN+1);
Inc(LN);
end;
Inc(LN);
end;
WriteStringToStream(ADestStream, GBinHex4IdentificationString + EOL);
//Put back in our CRLFs. A max of 64 chars are allowed per line.
LBlocks := Length(LOut) div 64;
for LN := 0 to LBlocks-1 do begin
TIdStreamHelper.Write(ADestStream, LOut, 64, LN*64);
WriteStringToStream(ADestStream, EOL);
end;
LRemainder := Length(LOut) mod 64;
if LRemainder > 0 then begin
TIdStreamHelper.Write(ADestStream, LOut, LRemainder, LBlocks*64);
WriteStringToStream(ADestStream, EOL);
end;
end;
initialization
TIdDecoder4to3.ConstructDecodeTable(GBinHex4CodeTable, GBinHex4DecodeTable);
end.