There are various ways in which the Asyntactic script may be (or may appear to be) corrupted by an application or its environment. This post is an attempt to place the varied symptoms of script corruption into a classification system.
R – Rendering
Asyntactic script characters are rendered:
1. as expected
2. no characters visible
3. unrecognisable characters, (mostly) all the same
   1. BMP (2-byte) characters OK, supplementary-plane (4-byte) characters no good
   2. all planes no good
4. unrecognisable characters, (mostly) all different (sub-categories same as 3)
5. recognisable but unexpected characters (sub-categories same as 3)
e.g. rendering in the two images in the previous post would be classified respectively as R.3.2 and R.1.
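Telling the BMP-only sub-category of R.3 apart from the all-planes one requires knowing which test characters lie in the BMP and which lie in the supplementary planes. The following is a minimal sketch of that split; the helper names `plane` and `split_by_plane` are hypothetical, and the sample characters are placeholders rather than actual Asyntactic script characters.

```python
def plane(ch: str) -> int:
    """Return the Unicode plane number (0 = BMP) of a single character."""
    return ord(ch) >> 16


def split_by_plane(sample: str) -> tuple[list[str], list[str]]:
    """Split a sample into BMP and supplementary-plane characters."""
    bmp = [ch for ch in sample if plane(ch) == 0]
    supplementary = [ch for ch in sample if plane(ch) > 0]
    return bmp, supplementary


# Placeholder sample: two BMP characters and one supplementary-plane character.
sample = "a\u00e9\U00010437"
bmp, supplementary = split_by_plane(sample)
print([f"U+{ord(ch):04X}" for ch in bmp])            # ['U+0061', 'U+00E9']
print([f"U+{ord(ch):04X}" for ch in supplementary])  # ['U+10437']
```

If only the characters in the second list fail to render, the symptom falls in the BMP-only sub-category; if characters from both lists fail, it falls in the all-planes one.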
C – Code Points
Each Asyntactic script code point:
1. maintains integrity through input / storage / communication / output
2. gets translated into an incorrect code point
   1. each one into the same character
   2. each one into different characters in the BMP only
   3. each one into different characters in the BMP and other planes
e.g. corruption in the modified Spaz Twitter client sample would be classified as R.3.1 C.1. Although Spaz (AIR) did not render most characters correctly, the destination site Twitter demonstrates that data integrity was preserved in the Spaz client.
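The C classification can be checked mechanically when both the text that was sent and the text that came back are available. The sketch below is one possible way to do it, not a definitive tool: the function name `classify_code_points` is hypothetical, the sample characters are placeholders rather than actual Asyntactic script characters, and it assumes both strings contain the same number of code points so they can be compared position by position.

```python
def classify_code_points(sent: str, received: str) -> str:
    """Return a C classification label for a sent/received pair of strings.

    Assumption: both strings hold the same number of code points.
    """
    if len(sent) != len(received):
        raise ValueError("lengths differ; use the L classification instead")
    if sent == received:
        return "C.1"  # integrity maintained
    # Collect the replacement characters at every position that changed.
    changed = [r for s, r in zip(sent, received) if s != r]
    if len(set(changed)) == 1:
        return "C.2.1"  # every changed code point became the same character
    if all(ord(ch) <= 0xFFFF for ch in changed):
        return "C.2.2"  # different replacement characters, all in the BMP
    return "C.2.3"      # different replacement characters, some outside the BMP


original = "\U00010437\U00010438"  # placeholder supplementary-plane characters
print(classify_code_points(original, original))         # C.1
print(classify_code_points(original, "??"))             # C.2.1
print(classify_code_points(original, "ab"))             # C.2.2
print(classify_code_points(original, "a\U0001F600"))    # C.2.3
```

Note that a string in which only one code point changed is reported as C.2.1 by this sketch, since a single replacement is trivially "the same character".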
L – Length of character string
Strings encoded in the Asyntactic script are:
1. the expected length
2. shorter than expected
   1. approximately half of the length
   2. other
3. longer than expected
e.g. the Twitter web client and the standard Spaz client both use the JavaScript String.length property to count the characters in a message. Because this property counts UTF-16 code units rather than characters, each character outside the BMP is counted as two, and both clients corrupt data by classification L.2.1.
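The halving effect can be reproduced outside a browser. The sketch below emulates, in Python, the count of UTF-16 code units that JavaScript's String.length reports, and shows how truncating on that count leaves about half of a supplementary-plane message. All function names are hypothetical, the `limit` value is an illustrative parameter rather than any client's real setting, the 10% tolerance used for "approximately half" is an assumption, and the sample character is a placeholder rather than an actual Asyntactic script character.

```python
def utf16_code_units(text: str) -> int:
    """Count UTF-16 code units, the quantity JavaScript's String.length reports."""
    return len(text.encode("utf-16-le")) // 2


def truncate_by_code_units(text: str, limit: int) -> str:
    """Cut text to at most `limit` UTF-16 code units, as a naive counter would.

    Any half of a split surrogate pair left at the end is discarded.
    """
    data = text.encode("utf-16-le")[: limit * 2]
    return data.decode("utf-16-le", errors="ignore")


def classify_length(expected: str, actual: str) -> str:
    """Return an L classification label, measuring length in code points."""
    e, a = len(expected), len(actual)
    if a == e:
        return "L.1"
    if a > e:
        return "L.3"
    # Assumption: "approximately half" means within 10% of the expected length.
    if abs(a - e / 2) <= 0.1 * e:
        return "L.2.1"
    return "L.2.2"


message = "\U00010437" * 10              # ten placeholder supplementary-plane characters
print(len(message))                       # 10 code points
print(utf16_code_units(message))          # 20 code units
stored = truncate_by_code_units(message, limit=10)
print(len(stored))                        # 5
print(classify_length(message, stored))   # L.2.1
```

A message made only of BMP characters passes through the same truncation unchanged, which is why the problem goes unnoticed with most scripts.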