A Specimen of Yardley's Deciphering of Japanese Diplomatic Code Jp (1921)

The "American Black Chamber", led by Herbert O. Yardley, broke a series of Japanese diplomatic codes (Ja, Jb, Jc, ...) from 1919 onward and supplied the US government with decipherments of many secret Japanese messages. What Yardley considered "the most important and far-reaching telegram" (Yardley, p.312) was an instruction of 28 November 1921 to the Japanese plenipotentiary at the Washington Naval Conference. It mentioned the possibility of agreeing to a 10:6 ratio of US to Japanese naval tonnage, relinquishing the 10:7 ratio that Japan had asserted until then. (See Kahn (2004) around p.77 for a broader perspective.)

Coded Message with Plaintext

Yardley (1931) prints a typed working sheet of the "first step in the decipherment" of this telegram. Although the image is tantalizingly unclear in the 2004 reprint and in the Japanese version, and is not much better in the original 1931 edition, I was able to identify most of the individual codes with their plaintext counterparts. (The plaintext printed in the Japanese translation of Yardley (1931) appears to be a back translation from Yardley's deciphering, and it confused me until I found an authentic Japanese plaintext in Nihongaikoobunsho, compiled by the Japanese Ministry of Foreign Affairs.)

Actually, Kahn (2004) reproduces one page (out of three) of the reconstructed code (p.70) preserved in the US National Security Agency (NSA), with which I could confirm my identifications and clarify some uncertain points.

The following shows the message in code interlined with the plaintext. (The ten-letter groups in the original are divided here to illustrate the correspondence with the plaintext.) Light green indicates identifications confirmed by Kahn; dark green indicates those supported by two or more instances; red indicates some inconsistency (discussed below); and unmarked portions are not confirmed but are not contradicted either.

Japanese Diplomatic Code Jp

The code, dubbed Jp, threw Yardley's Cipher Bureau into a panic when it was first encountered on 18 July 1921. Although it had been anticipated by an earlier telegram stating that a new code would soon be put into use, the Cipher Bureau did not expect any difficulty in breaking it. To their dismay, however, the new code Jp was of an entirely new type (Yardley p.289). On 11 August, Yardley discovered that it was "24 different small codes instead of the usual 1" (Kahn p.71). Judging from the reconstructed code sheet, Yardley seems to be referring to the fact that the same two-letter group may represent different plaintext elements depending on the two-letter group prefixed to it. The following shows some examples.

akam 12-nichi
akaz 9 getsu
akec 12 getsu

amam tei
amaz tadachini
amec taki

azam boku
azaz aruiwa
azec bai

ecec hakari

etam gokuhi
etaz fuu
etec gaimu

ewaz shitagatta
ewec shikarubeki

exam roo
exaz rare
exec rei
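The prefix dependence shown in these examples can be pictured as a table of tables. The following is only a minimal sketch: the entries are copied from the examples above, while the names JP_SAMPLE, decode_four_letter, and readings_of_suffix are hypothetical and are not taken from Yardley or Kahn.

```python
# Sample of the Jp code as a nested table: first pair -> second pair -> plaintext.
# Entries are copied from the examples in the text; all names are illustrative.
JP_SAMPLE = {
    "ak": {"am": "12-nichi", "az": "9 getsu", "ec": "12 getsu"},
    "am": {"am": "tei", "az": "tadachini", "ec": "taki"},
    "az": {"am": "boku", "az": "aruiwa", "ec": "bai"},
    "ec": {"ec": "hakari"},
    "et": {"am": "gokuhi", "az": "fuu", "ec": "gaimu"},
    "ew": {"az": "shitagatta", "ec": "shikarubeki"},
    "ex": {"am": "roo", "az": "rare", "ec": "rei"},
}


def decode_four_letter(code):
    """Return the plaintext for a four-letter code, or None if it is not in the sample."""
    prefix, suffix = code[:2], code[2:]
    return JP_SAMPLE.get(prefix, {}).get(suffix)


def readings_of_suffix(suffix):
    """List every (prefix, plaintext) pair in which the given second pair occurs."""
    return [(p, table[suffix]) for p, table in JP_SAMPLE.items() if suffix in table]


if __name__ == "__main__":
    print(decode_four_letter("akam"))
    # The same second pair "am" has a different meaning under each prefix.
    for prefix, plain in readings_of_suffix("am"):
        print(prefix + "am", plain)
```

Each inner table behaves like one "small code" selected by the prefix, which is one way to read Yardley's remark.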

Yardley might have simply called these "four-letter codes" (as he spoke of "five-letter code-words" for the Ji code (Kahn, p.69)). He probably called them different small codes because the system does not fully utilize the possible combinations of four letters. The code words clearly consist of two two-letter groups, and these groups appear to be reserved for such combinations. That is, the two-letter groups are all vowel-consonant combinations, in this order, and they are never used as independent two-letter codes of their own.

Specifically, the two-letter groups used in the four-letter codes (either as the first half or the second half) appear to be: ab, ak, am, az, ec, eg, eh, es, et, ev, ew, ex, id, ij, in, ob, of, ok, or, ow, oy, ul, up. These account for 23 prefixes, and the unprefixed two-letter codes constitute the 24th "small code." That this is what Yardley meant is supported by the fact that the reconstructed code table has a whole section with the prefix "eg" (i.e., egab, egak, egam, ...) even though none of its entries has a plaintext counterpart.
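The count of 24 and the vowel-consonant pattern can be checked mechanically. The sketch below assumes that the list of 23 groups given above is complete, which is my reading of the reconstructed table and not something the sources state; the function names are hypothetical, and "y" is treated as a consonant.

```python
# The 23 two-letter groups listed in the text as halves of four-letter codes.
GROUPS = ("ab ak am az ec eg eh es et ev ew ex id ij in "
          "ob of ok or ow oy ul up").split()
GROUP_SET = set(GROUPS)
VOWELS = set("aeiou")  # assumption: "y" counts as a consonant here


def is_vowel_consonant(pair):
    """True if the pair is one vowel followed by one consonant."""
    return len(pair) == 2 and pair[0] in VOWELS and pair[1] not in VOWELS


def is_well_formed(code):
    """True if a four-letter code is built from two of the listed groups."""
    return len(code) == 4 and code[:2] in GROUP_SET and code[2:] in GROUP_SET


# 23 prefixed "small codes" plus the unprefixed two-letter codes.
small_codes = len(GROUPS) + 1

if __name__ == "__main__":
    print(len(GROUPS), small_codes)
    print(all(is_vowel_consonant(g) for g in GROUPS))
    print(is_well_formed("akam"), is_well_formed("egab"), is_well_formed("oaap"))
```

A check of this kind flags a group such as "oaap", whose first half "oa" is not among the listed groups.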

Examples of independent two-letter codes are as follows:

ac juu
ad ru
af e
ag fu
ah koku
aj nikansuru
al shakkan
ap oyobi
ar soo
at I
av bi
aw pa
ax mu
ay te

ba ooden
be joo
bi n
bo 5
bu ra
by itashitashi

ca de
ce ri
co na
cu 3
cy se

Thus, two-letter groups used as such and two-letter groups used in four-letter combinations appear to be clearly distinguished. An exception is two-letter codes representing English letters, which may also be used in four-letter combinations.
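Because the two kinds of groups are distinguished, a run of code letters can in principle be divided into two-letter and four-letter units. The following sketch assumes a simple greedy rule that is mine and not documented in the sources: a listed group followed by another listed group is read as a four-letter code, and anything else is read as a two-letter code. The exception for English letters means this rule can misread two consecutive letter codes as one four-letter code. The name split_codes is hypothetical.

```python
# The 23 groups that appear as halves of four-letter codes (from the text).
GROUP_SET = set("ab ak am az ec eg eh es et ev ew ex id ij in "
                "ob of ok or ow oy ul up".split())


def split_codes(letters):
    """Greedily divide a run of code letters into two- and four-letter units.

    Assumed rule: two consecutive listed groups form one four-letter code;
    any other pair stands alone. A trailing single letter is kept as is.
    """
    letters = letters.replace(" ", "")
    units = []
    i = 0
    while i < len(letters):
        first = letters[i:i + 2]
        second = letters[i + 2:i + 4]
        if first in GROUP_SET and second in GROUP_SET:
            units.append(first + second)
            i += 4
        else:
            units.append(first)
            i += 2
    return units


if __name__ == "__main__":
    print(split_codes("akamacad"))
    print(split_codes("acexec"))
    print(split_codes("abac"))
```

Under this rule, a lone "ab" followed by "ac" is kept as a two-letter code (the letter W in the list below), since "ac" is not one of the 23 groups.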

ab W
ak Z
am F
az V
ec S
es N
et T
ev P
ew H
ex K
....

Note also that although the code is not alphabetical (even in the Japanese syllabary, whether old (I-RO-HA order) or new (A-I-U-E-O order)), words and syllables beginning with the same sound tend to appear close together.

After the initial breakthrough, the new code did not resist the efforts of the Cipher Bureau much longer. With just fifteen intercepts, the table with more than 700 elements was reconstructed, and the first translation of a message in Jp was made as early as 23 August (Kahn, p.71). The reconstructed code table is dated 6 September 1921.

Notes on Enciphering Practice

On line (11), the word "shii" is enciphered as "dy iz". Here, "dy" indicates "4", which may be read as either "shi" or "yon" in Japanese. This does not necessarily mean the code for numerals is used for their phonetic value because the word "shii" literally means "four sides" (i.e., the surroundings).

On lines (8) and (18), the number "10" (as in 10:6.5 or 10:6) is enciphered as "ac". Apparently, no specific code is reserved for two-digit numbers such as "10" (read "juu" in Japanese) and the code "ac" is the one used for the sound "juu" as in "juubun" (sufficient) on line (20).

In Japanese, the character "ha" is pronounced as "wa" when it is used as a grammatical particle. Such a "ha" particle (lines (4)(14)(15)(21)) is enciphered with the same code as a "wa" forming part of an ordinary word (lines (5)(10)).

On the other hand, on line (1), the character "he", which is pronounced as "e" when it is used as a grammatical particle, is not enciphered with the code "af" used for "e" on lines (7)(12)(13).

Some Clerical Details

In the working sheet printed in Yardley (1931), which is only "the first step in the decipherment", quite a few errors can be seen. The final decipherment (as printed on p.313 of Yardley (1931) or p.77 of Kahn (2004)) was accurate, although it was noted there that "6.5" (a preliminary concession to be offered before agreeing to 10:6) was "reconstructed from a garbled passage" (Kahn p.77).

The following enumerates such errors in the working sheet, which may have been made either by the Japanese enciphering clerk, by the Japanese telegrapher, by the intercepting American telegrapher, by the typist, or (hopefully not but possibly) by me in transcribing the unclear photocopy.

On line (1), the plaintext "zen" is enciphered as "orup" but it is "oaap" on line (6). The latter seems to be in error at least in that "oa" does not conform to the construction of the four-letter code.

On line (2) of the working sheet, "esby" appears to be corrected to "esoy" by hand.

On line (2), the plaintext for "inof" should be "kaigi" (conference) according to Japanese records, rather than "kai i" (which does not make sense in Japanese) as appears on the working sheet.

On line (3) of the working sheet, "th" appears to be corrected as "ah" by handwriting. The Morse code for "a" (._) is similar to that for "t" (_).

On line (4), the first letter "e" is omitted in this working sheet. This might have been fatal but once one is familiar with the structure of the code (e.g., vowel-consonant patterns), it would not be difficult to notice the omission.

On line (4), the plaintext "sa" is enciphered as "hu" but it is "ho" on line (22). The latter seems to be in error because "ho" is used for "za" on lines (8)(12)(13)(14).

On line (4), "yoo" is enciphered as "vo", which may be in error because "yo" is enciphered as "ko" on (6)(9)(17)(20). On line (14), an orphan letter "k" should be "ko". On line (15), "ko" is deciphered as "yoo" in the working sheet but the plaintext should be "na", which indicates that the code should be "id".

On line (5), "ab" should be "ad". The Morse code for "b" (_...) is similar to that for "d" (_..).

On line (5), "r" (Morse code: ._.) should be "e" (.) "n"(_.).

On line (6), "et" should be "en".

On line (6), "ef" represents "kan" but it is for "ya" according to the reconstructed code table printed in Kahn. The code "ef" appears to be an error for "if", which occurs on line (16). The Morse code for "e" (.) is similar to that for "i" (..).

On line (6), "suru" is enciphered as "ofok" but as "ouok" on line (9). The latter appears to be in error because it does not conform to the pattern of the four-letter code. The Morse code for "f" (.._.) is similar to that for "u" (.._).

On line (7), the code "ot" represents "arita." The fact that "arita" is only an uninflecting part of a verb phrase "ari-taku" is not a problem by itself but the reconstructed code in print indicates "azak" represents "arita."

On line (8), the code "edup" corresponds to the plaintext "nioitemo" (which is a postposition "nioite" having a similar meaning to the English preposition "in" plus "mo", meaning "also"), while "nioite" should be represented by "da" as on line (14). (By the way, the Japanese record indicates line (14) also has "nioitemo" but the encoded telegram has "nioitewa".) At least, "ed" appears to be in error because it represents "ki" and should not be used in four-letter codes and also because a four-letter code "edup" is not in the "e"-section of the reconstructed code table.

On line (8), "asul" for "an" should be "abul". The Morse code for "b" (_...) may be said to be similar to that for "s" (...).

On line (8), "ug" for "dai" appears to be an error for "uz" as on lines (2)(13)(21). The Morse code for "g" (_ _.) is similar to that for "z" (_ _..).

On line (8), the plaintext "6" as in "6.5" (noted as "reconstructed from a garbled passage") is enciphered as "he", which appears to be an error for "hi" as on line (19). The Morse code for "e" (.) is similar to that for "i" (..).

On line (9), the code "ehob" corresponds to a decimal point. The reconstructed code table leaves the plaintext for "ehob" blank but it has "ehec" for "period."

On line (10), the code "ba" for "ra" should be "bu".

On line (11), the code "ehec" corresponds to the plaintext "nao", while the reconstructed code table in print indicates that "ehec" is "period", and the working sheet has a period here. The word "nao" means "incidentally" or the like, and the Japanese text would make sense if "nao" were replaced by a comma (not a full stop). That said, the telegram does not include punctuation elsewhere, so there must be some error in the code "ehec."

On line (12), the code letters "obaze" corresponding to "notame" must be in error because they do not consist of two-letter units. From the pattern of the four-letter code, the last "e" may be superfluous. (Since the correct translation "notame" appears to be written in the working sheet, the first four letters may have been correct.) While the reconstructed code table has "amin" for "tame" ("sake" in English), it is possible that there is another code for the postposition "notame" ("for the sake of").

On line (12), an orphan code letter "n" should be "ny."

On line (14), the code "azem" for "baai" should be "azet".

On line (17), the code "upan" for "ryoku" appears to be an error for "upak" as on lines (7)(10). The following "ce" should be "ne". The Morse code for "nc" (_. _._.) may be said to be similar to that for "kn" (_._ _.).

On line (20), the code "le" is translated as "ku" in the working sheet but there must be an error in the code because the plaintext should be "yo".

Further, there must be some error in the codes "itup" on line (2), "oysl" on line (3), and "ijuy" on line (15) in view of the two-letter groups "it", "sl", and "uy", which should not be used in the four-letter codes.
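The Morse-code resemblances suggested in the notes above can be checked with a small script. This is only an illustrative sketch: it uses the standard International Morse patterns for the letters involved (written with "_" for a dash, as in this article), and it takes a plain edit distance over dots and dashes as a rough stand-in for "similar", which is my assumption and not a model of how telegraphic errors actually arise. Spaces between letters are ignored.

```python
# International Morse patterns for the letters discussed ("_" = dash, "." = dot).
MORSE = {
    "a": "._", "b": "_...", "c": "_._.", "d": "_..", "e": ".", "f": ".._.",
    "g": "__.", "i": "..", "k": "_._", "n": "_.", "r": "._.", "s": "...",
    "t": "_", "u": ".._", "z": "__..",
}


def to_morse(text):
    """Concatenate the Morse patterns of the letters, ignoring letter spacing."""
    return "".join(MORSE[ch] for ch in text)


def edit_distance(a, b):
    """Levenshtein distance between two strings of dots and dashes."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # drop a symbol of a
                           cur[j - 1] + 1,              # add a symbol of b
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


# (received, intended) pairs taken from the notes above.
PAIRS = [("t", "a"), ("b", "d"), ("r", "en"), ("e", "i"),
         ("u", "f"), ("s", "b"), ("g", "z"), ("nc", "kn")]

if __name__ == "__main__":
    for received, intended in PAIRS:
        d = edit_distance(to_morse(received), to_morse(intended))
        print(received, to_morse(received), intended, to_morse(intended), d)
```

With spacing ignored, "r" and "en" give the same run of symbols, and every other pair differs by a single dot or dash.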

References

Herbert O. Yardley (1931, 2004), The American Black Chamber

David Kahn (2004), The Reader of Gentlemen's Mail

外務省『日本外交文書　ワシントン会議・上』（online）p.287-288 (in Japanese)