public final class PentaCodec extends Object implements SymbolCodec
PentaCodec performs symbol coding and serialization using an extensible 5-bit encoding. Eligible characters are assigned penta codes (either a single 5-bit code or a double 10-bit code) according to the following table:
Characters | Penta code |
---|---|
'A' to 'Z' | 5-bit pentas from 1 to 26 |
'.' | 5-bit penta 27 |
'/' | 5-bit penta 28 |
'$' | 5-bit penta 29 |
''' and '`' | none (ineligible characters) |
' ' to '~' except above | 10-bit pentas from 960 to 1023 |
all other | none (ineligible characters) |

The 5-bit penta 0 represents empty space and is eligible only at the start. The 5-bit pentas 30 and 31 are used as a transition mark to switch to 10-bit pentas. The 10-bit pentas from 0 to 959 do not exist as they collide with 5-bit pentas.
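The character table above can be sketched as a small lookup. This is an illustrative stand-alone class, not the PentaCodec implementation: the class and method names are hypothetical, and it assumes that the 10-bit codes are handed out in ascending character order, which the table does not state.

```java
final class PentaTableSketch {
    /** Returns true for characters that the table maps to a single 5-bit penta. */
    static boolean isFiveBit(char c) {
        return (c >= 'A' && c <= 'Z') || c == '.' || c == '/' || c == '$';
    }

    /** Returns true for characters that the table maps to a 10-bit penta. */
    static boolean isTenBit(char c) {
        return c >= ' ' && c <= '~' && !isFiveBit(c) && c != '\'' && c != '`';
    }

    /** Returns the penta code for c, or -1 if c is ineligible. */
    static int pentaOf(char c) {
        if (c >= 'A' && c <= 'Z') return c - 'A' + 1; // 1..26
        if (c == '.') return 27;
        if (c == '/') return 28;
        if (c == '$') return 29;
        if (!isTenBit(c)) return -1;
        // Assumption: 10-bit codes are assigned in ascending character order.
        int penta = 960;
        for (char x = ' '; x < c; x++) {
            if (isTenBit(x)) penta++;
        }
        return penta;
    }

    /** Returns 5 or 10 for eligible characters, or -1 for ineligible ones. */
    static int bitLength(char c) {
        return isFiveBit(c) ? 5 : isTenBit(c) ? 10 : -1;
    }
}
```

Exactly 64 printable ASCII characters fall into the 10-bit group, which matches the range 960 to 1023. The high five bits of every such code are 30 or 31, so a decoder reading 5 bits at a time sees the transition mark first.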
The individual penta codes for a character sequence are packed into a 64-bit value from high bits to low bits, aligned to the low bits. This allows representation of penta-coded character sequences of up to 35 bits. If a symbol contains one or more ineligible characters or does not fit into a 35-bit penta, it is not penta-coded and is left as a string. The resulting penta-coded value can be serialized as defined below or, if possible, encoded into a 32-bit cipher. Note that penta code 0 is a valid code that represents the empty character sequence; do not confuse it with cipher value 0, which means 'void' or 'null'.
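The packing rule can be illustrated with a minimal sketch. The class name, the `pack` method and the use of -1 as a "not penta-codable" marker are hypothetical conventions chosen for this example. The ascending assignment of 10-bit codes is likewise an assumption.

```java
import java.util.Arrays;

final class PentaPackSketch {
    static final int MAX_BITS = 35;

    // Lookup table for chars 0..127; -1 marks ineligible characters.
    private static final int[] PENTA = new int[128];

    static {
        Arrays.fill(PENTA, -1);
        for (char c = 'A'; c <= 'Z'; c++) PENTA[c] = c - 'A' + 1;
        PENTA['.'] = 27;
        PENTA['/'] = 28;
        PENTA['$'] = 29;
        // Assumption: 10-bit codes are assigned in ascending character order.
        int next = 960;
        for (char c = ' '; c <= '~'; c++) {
            if (PENTA[c] < 0 && c != '\'' && c != '`') PENTA[c] = next++;
        }
    }

    static int pentaOf(char c) {
        return c < 128 ? PENTA[c] : -1;
    }

    /**
     * Packs the symbol into a penta value, first character in the highest
     * used bits, last character in the lowest bits.
     * Returns -1 (hypothetical marker) if the symbol must stay a string.
     */
    static long pack(String symbol) {
        long penta = 0;
        int bits = 0;
        for (int i = 0; i < symbol.length(); i++) {
            int code = pentaOf(symbol.charAt(i));
            if (code < 0) return -1;           // ineligible character
            int len = code < 32 ? 5 : 10;      // single or double penta
            bits += len;
            if (bits > MAX_BITS) return -1;    // does not fit into 35 bits
            penta = (penta << len) | code;
        }
        return penta;
    }
}
```

For example, "IBM" uses the codes 9, 2 and 13 and packs into the 15-bit value (9 << 10) | (2 << 5) | 13 = 9293, while the empty string packs into 0.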
The following table defines the serial format used (the first byte is given in bits, with 'x' representing a payload bit; the remaining bytes are given as a bit count):
First byte | Remaining bytes | Used for |
---|---|---|
0xxxxxxx | 8x | 15-bit pentas |
10xxxxxx | 24x | 30-bit pentas |
110xxxxx | ??? | reserved (payload TBD) |
1110xxxx | 16x | 20-bit pentas |
11110xxx | 32x | 35-bit pentas |
11111000 | | most recently used event flags |
11111001 | zzz | new event flags in the following compact int |
1111101x | ??? | reserved (payload TBD) |
11111100 | zzz | UTF-8 string with length (>=0) in bytes |
11111101 | zzz | CESU-8 string with length (>=0) in characters |
11111110 | | 0-bit penta (empty symbol) |
11111111 | | repeat of the last symbol |

See CESU-8 for format basics and IOUtil.writeUTFString(java.io.DataOutput, java.lang.String) and IOUtil.writeCharArray(java.io.DataOutput, char[]) for details of string encoding.

Nested classes/interfaces inherited from interface SymbolCodec: SymbolCodec.Reader, SymbolCodec.Resolver, SymbolCodec.Writer
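The penta rows of the serial format table above can be sketched as a pair of helper methods. This is a minimal illustration under stated assumptions, not the actual Reader/Writer code: the class and method names are hypothetical, the writer is assumed to pick the shortest form that fits, and the string, event flag and repeat forms are deliberately left out.

```java
import java.io.ByteArrayOutputStream;

final class PentaSerialSketch {
    /** Serializes a penta value (0 .. 2^35 - 1) using the shortest fitting form. */
    static byte[] write(long penta) {
        if (penta < 0 || penta >= (1L << 35)) {
            throw new IllegalArgumentException("not a 35-bit penta: " + penta);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (penta == 0) {
            out.write(0xFE);                              // 11111110: empty symbol
        } else if (penta < (1L << 15)) {
            writeLowBytes(out, penta, 2);                 // 0xxxxxxx + 8 bits
        } else if (penta < (1L << 20)) {
            out.write(0xE0 | (int) (penta >>> 16));       // 1110xxxx + 16 bits
            writeLowBytes(out, penta, 2);
        } else if (penta < (1L << 30)) {
            out.write(0x80 | (int) (penta >>> 24));       // 10xxxxxx + 24 bits
            writeLowBytes(out, penta, 3);
        } else {
            out.write(0xF0 | (int) (penta >>> 32));       // 11110xxx + 32 bits
            writeLowBytes(out, penta, 4);
        }
        return out.toByteArray();
    }

    /** Reads back a penta value written by write(long). */
    static long read(byte[] b) {
        int first = b[0] & 0xFF;
        if (first < 0x80) return payload(first, b, 1);          // 15-bit form
        if (first < 0xC0) return payload(first & 0x3F, b, 3);   // 30-bit form
        if (first < 0xE0) throw new IllegalArgumentException("reserved prefix");
        if (first < 0xF0) return payload(first & 0x0F, b, 2);   // 20-bit form
        if (first < 0xF8) return payload(first & 0x07, b, 4);   // 35-bit form
        if (first == 0xFE) return 0;                            // empty symbol
        throw new UnsupportedOperationException("not a penta form in this sketch");
    }

    // Writes the low 'count' bytes of value, most significant byte first.
    private static void writeLowBytes(ByteArrayOutputStream out, long value, int count) {
        for (int shift = 8 * (count - 1); shift >= 0; shift -= 8) {
            out.write((int) (value >>> shift)); // write(int) keeps the low 8 bits
        }
    }

    // Combines the payload bits of the first byte with 'count' following bytes.
    private static long payload(long high, byte[] b, int count) {
        long v = high;
        for (int i = 1; i <= count; i++) {
            v = (v << 8) | (b[i] & 0xFF);
        }
        return v;
    }
}
```

With this sketch the 15-bit value 9293 is written as the two bytes 0x24 0x4D, and the empty symbol as the single byte 0xFE.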
Modifier and Type | Field and Description |
---|---|
static PentaCodec | INSTANCE - The instance of PentaCodec. |
Fields inherited from interface SymbolCodec: VALID_CIPHER
Constructor and Description |
---|
PentaCodec() - Deprecated. Use INSTANCE. |
Modifier and Type | Method and Description |
---|---|
SymbolCodec.Reader | createReader() - Creates stateful symbol reader. |
SymbolCodec.Writer | createWriter() - Creates stateful symbol writer. |
String | decode(int cipher) - Returns decoded symbol for specified cipher. |
String | decode(int cipher, String symbol) - Returns decoded symbol for specified cipher-symbol pair. |
int | decodeCharAt(int cipher, int i) - Decodes one character from the given cipher at the given position. |
long | decodeToLong(int cipher) - Returns decoded symbol for specified cipher packed in the primitive long value. |
int | encode(char[] chars, int offset, int length) - Returns encoded cipher for specified symbol represented in a character array. |
int | encode(String symbol) - Returns encoded cipher for specified symbol. |
int | getWildcardCipher() - Returns the cipher that is used by the "wildcard" symbol; this implementation returns a value equal to encode("*"). |
int | hashCode(int cipher) - Returns a hash code for the specified cipher. |
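
public static final PentaCodec INSTANCE
The instance of PentaCodec.

public PentaCodec()
Deprecated. Use INSTANCE.

public int encode(String symbol)
Description copied from interface: SymbolCodec
Returns encoded cipher for specified symbol.
Specified by: encode in interface SymbolCodec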
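
public int encode(char[] chars, int offset, int length)
Description copied from interface: SymbolCodec
Returns encoded cipher for specified symbol represented in a character array. The result is equivalent to encode(new String(chars, offset, length)).
Specified by: encode in interface SymbolCodec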

public String decode(int cipher)
Description copied from interface: SymbolCodec
Returns decoded symbol for specified cipher.
Specified by: decode in interface SymbolCodec
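
public String decode(int cipher, String symbol)
Description copied from interface: SymbolCodec
Returns decoded symbol for specified cipher-symbol pair. Equivalent to: return symbol != null ? symbol : decode(cipher);
Specified by: decode in interface SymbolCodec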

public long decodeToLong(int cipher)
Description copied from interface: SymbolCodec
Returns decoded symbol for specified cipher packed in the primitive long value. This is the same encoding as specified by the ShortString class. The result of the ShortString.decode(decodeToLong(cipher)) expression shall be the same as the result of a decode(cipher) call, except for the null vs. empty string discrepancy. However, this method always aligns bytes in the returned value to the highest one rather than the lowest one.
Specified by: decodeToLong in interface SymbolCodec
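The difference between the two alignments can be illustrated with a small stand-alone sketch. It assumes one byte per character, at most 8 characters, and the first character in the most significant used byte. These details and all names below are assumptions made for illustration, and are not taken from the ShortString or PentaCodec sources.

```java
final class LongAlignSketch {
    /** Packs up to 8 one-byte characters so that the last one sits in the lowest byte. */
    static long packLowAligned(String s) {
        if (s.length() > 8) {
            throw new IllegalArgumentException("more than 8 characters");
        }
        long v = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 0 || c > 0xFF) {
                throw new IllegalArgumentException("character does not fit into one byte");
            }
            v = (v << 8) | c;
        }
        return v;
    }

    /** Packs the same bytes, but shifted so that the first one sits in the highest byte. */
    static long packHighAligned(String s) {
        long v = packLowAligned(s);
        // Shift left one byte at a time until the top byte is occupied.
        while (v != 0 && (v >>> 56) == 0) {
            v <<= 8;
        }
        return v;
    }
}
```

Under these assumptions "IBM" becomes 0x49424D when aligned to the lowest byte and 0x49424D0000000000 when aligned to the highest byte.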
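
public int decodeCharAt(int cipher, int i)
Description copied from interface: SymbolCodec
Decodes one character from the given cipher at the given position.
Specified by: decodeCharAt in interface SymbolCodec
Returns: -1 if i >= decode(cipher).length().

public int hashCode(int cipher)
Description copied from interface: SymbolCodec
Returns a hash code for the specified cipher. The result is consistent with decode(cipher).hashCode(), except that it does not throw NullPointerException for 0 cipher.
Specified by: hashCode in interface SymbolCodec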
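The contracts of decodeCharAt and hashCode can be expressed in terms of decode, as in the following sketch. The `decode` parameter is a hypothetical stand-in for SymbolCodec.decode(int) that returns null for cipher 0. Two further assumptions are that 0 is returned as the hash for a null symbol, and -1 as the character for a null symbol. The text above does not specify either value.

```java
import java.util.function.IntFunction;

final class CipherContractSketch {
    /** Reference behaviour for decodeCharAt: -1 once i runs past the decoded symbol. */
    static int charAt(IntFunction<String> decode, int cipher, int i) {
        String s = decode.apply(cipher);
        // Negative i is not covered by the text; String.charAt rejects it.
        return s == null || i >= s.length() ? -1 : s.charAt(i);
    }

    /** Reference behaviour for hashCode: like decode(cipher).hashCode(), but null-safe. */
    static int hash(IntFunction<String> decode, int cipher) {
        String s = decode.apply(cipher);
        return s == null ? 0 : s.hashCode(); // 0 for null is an assumption
    }
}
```

A real implementation would compute both results directly from the cipher bits, without building the intermediate String.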
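
public int getWildcardCipher()
Returns the cipher that is used by the "wildcard" symbol; this implementation returns a value equal to encode("*").
Specified by: getWildcardCipher in interface SymbolCodec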

public SymbolCodec.Reader createReader()
Description copied from interface: SymbolCodec
Creates stateful symbol reader.
Specified by: createReader in interface SymbolCodec

public SymbolCodec.Writer createWriter()
Description copied from interface: SymbolCodec
Creates stateful symbol writer.
Specified by: createWriter in interface SymbolCodec

Copyright © 2002–2025 Devexperts LLC. All rights reserved.