

Types of corruption

There are various ways in which the Asyntactic script may be (or appear to be) corrupted by an application or its environment. This post is an attempt to place the various symptoms of script corruption into a classification system.

R – Rendering

Asyntactic script characters are rendered:

  1. as expected
  2. as nothing at all (no characters visible)
  3. as unrecognisable characters, (mostly) all the same
    1. BMP characters (2 bytes in UTF-16) are fine; characters from other planes are not
    2. characters from all planes are affected
  4. as unrecognisable characters, (mostly) all different (sub-categories as for 3)
  5. as recognisable but unexpected characters (sub-categories as for 3)

For example, the rendering in the two images in the previous post would be classified as R3.2 and R1 respectively.
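Choosing between sub-categories 1 and 2 means knowing which characters in a sample lie in the BMP and which lie in other planes. A minimal sketch of that split follows; the helper name `splitByPlane` is hypothetical, and U+1D11E is only a stand-in for a character outside the BMP, not an Asyntactic code point.

```javascript
// Hypothetical helper: split a string's code points by Unicode plane.
function splitByPlane(text) {
  const bmp = [];
  const supplementary = [];
  // for...of walks a string by whole code points, not by UTF-16 code units.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    // Code points up to U+FFFF belong to the BMP (plane 0).
    (cp <= 0xffff ? bmp : supplementary).push(cp);
  }
  return { bmp, supplementary };
}

// Stand-in sample: "A" (U+0041, BMP) followed by U+1D11E (plane 1).
const sample = "A\u{1D11E}";
const planes = splitByPlane(sample);
```

If only the characters listed under `supplementary` render badly, the symptom belongs in sub-category 1; if the `bmp` ones fail too, it belongs in sub-category 2.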

C – Code Points

Each Asyntactic script code point:

  1. maintains integrity through input / storage / communication / output
  2. gets translated into an incorrect code point
    1. each one into the same character
    2. each one into different characters in the BMP only
    3. each one into different characters in the BMP and other planes

For example, corruption in the modified Spaz Twitter client sample would be classified as R3.1 C1. Although Spaz (AIR) did not render most characters correctly, the destination site, Twitter, shows that the Spaz client preserved data integrity.
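The C categories can be checked mechanically by comparing the code points that went in with the code points that came out. The following is a rough sketch under assumptions: the function name `classifyCodePoints` is hypothetical, the sample characters are stand-ins from plane 1, and changes in length (the L categories) are deliberately ignored.

```javascript
// Hypothetical classifier for the C categories; it ignores length changes.
function classifyCodePoints(sent, received) {
  // Array.from iterates a string by code point, so surrogate pairs stay whole.
  const before = Array.from(sent, (ch) => ch.codePointAt(0));
  const after = Array.from(received, (ch) => ch.codePointAt(0));

  // C1: every code point survived unchanged.
  if (before.length === after.length && before.every((cp, i) => cp === after[i])) {
    return "C1";
  }
  // C2.1: everything collapsed into one repeated character (e.g. "?").
  if (after.length > 0 && new Set(after).size === 1) {
    return "C2.1";
  }
  // C2.2: different characters, all in the BMP; C2.3: other planes involved.
  return after.every((cp) => cp <= 0xffff) ? "C2.2" : "C2.3";
}
```

For instance, a round trip that turns two plane 1 characters into `"??"` would come out as C2.1, while one that returns them unchanged would come out as C1.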

L – Length of character string

Strings encoded in the Asyntactic script are:

  1. the expected length
  2. shorter than expected
    1. approximately half of the length
    2. other
  3. longer than expected

For example, the Twitter web client and the standard Spaz client both use the JavaScript String.length property to count the characters in a message. Because this property counts UTF-16 code units rather than characters, each character outside the BMP is counted as two, and both clients corrupt data by classification L2.1.
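The counting problem is easy to reproduce. In the sketch below, U+1D11E stands in for any character outside the BMP (it is not an Asyntactic code point), and `fitsLimit` is a hypothetical helper showing one way a client could count code points instead.

```javascript
// Three characters from plane 1; U+1D11E is only a stand-in.
const message = "\u{1D11E}\u{1D11E}\u{1D11E}";

// String.length counts UTF-16 code units: two per character here.
const codeUnits = message.length; // 6

// Array.from splits the string by code point, giving the real character count.
const codePoints = Array.from(message).length; // 3

// Hypothetical limit check that counts code points rather than code units.
function fitsLimit(text, limit) {
  return Array.from(text).length <= limit;
}
```

A client that compares `message.length` against its limit would reject this three-character message at a limit of 3, whereas `fitsLimit(message, 3)` accepts it.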

Posted in trouble-shooting.

