How to Break a Code (Not a Cipher)

Any student of cryptography will have been impressed by the way a cryptogram can be deciphered without a key, as demonstrated in Edgar Allan Poe's The Gold Bug (see also another article, which takes Cornwallis' cipher as an example). But this is actually breaking a cipher (as opposed to a code). With a cipher, which turns the letters of a message into different letters, the frequency of occurrence and other characteristics of the 26 letters of the alphabet allow one to methodically identify the original letters represented by the letters in a ciphertext.

On the other hand, code groups represent whole words or names. Although there are some known characteristics of words (e.g., "the" is the most frequently used word in English), characteristics among tens of thousands of words do not give as clear a clue as those among the 26 letters of the alphabet. The present article describes some examples of breaking such a code as opposed to a cipher.


Partial Encoding

Cipher in Code

Yardley (1918, 1926)

Mansfield Dictionary Code (1930s)

Commercial Code

Statistical Analysis

Codebreaking by Espionage

Partial Encoding

When only some particular words are encoded, guessing the words referred to is a matter of knowing the background. This has been done by historians such as Coxe (cf. another article), Bergenroth (cf. another article), and others. It would have been only slightly more difficult for contemporaries to do the same.

Examples during the Peninsular War

Three examples of careless encoding are given in A History of the Peninsular War, Volume V (Internet Archive), p. 611, Appendix XV.

When Dorsenne wrote to Jourdan on 16 April 1812 "Vous voulez de renseignement sur la situation militaire et administrative de 1238", it was easy to guess the code number stood for "the Army of the North."

When Joseph, Napoleon's brother who had been made King of Spain, wrote to Marmont "J'ai donné l'ordre au général Treillard de 117.8.7 la vallée de 1383, afin de marcher à 498.", the situation indicated that 117.8.7 was almost certainly "evacuate", 1383 "Tagus", and 498 some large town. The last still baffled Wellington, who wrongly guessed "Plasencia" when it was in fact "Aranjuez".

When Suchet wrote to Soult on 17 September 1812 "Le Général Maitland commande l'expédition anglaise venue de 747: O'Donnell peut réunir 786 692 1102 en y comprenant le corps de l'Anglais Roche. Le 19 août je n'avais que 135 692 1102 à lui opposer.", it was clear that "747" was "Sicily" and that both instances of "692 1102" meant "thousand men." The further identifications that 135 stood for 7 and 786 for 12 could be confirmed from other instances. (p.613-614)

Even if longer passages or sentences are encoded, as long as some part of the whole message is in plaintext, the latter provides a clue for codebreakers, who would eventually reveal the code, given sufficient materials.

Mark Urban, The Man Who Broke Napoleon's Codes, mentions such examples passim.

Examples of Bazeries

One French letter in cipher from 1813 (see another article) had a passage "Tant par l'effet de la 107 138 170 122 53 171 122 149 et de la 148 54 53 138 169 du 6 95 107 176 que par le 177 169 161 20 très 69 145 51 115 176 qu'elle a à faire". For Bazeries, it needed no imagination to guess the ending reads "que par le se-r-vi-ce très pé-ni-b-l-e (ou fa-ti-ga-n-t) qu'elle a à faire." (Bazeries, Les chiffres secrets dévoilés, p.182)

Other examples are given in Bazeries, Les chiffres de Napoléon Ier. p.21-22. For example, after having identified "71" as "de" and "637" as "ser", it was easy for Bazeries to see that "On s'est 386.996.110. tous les jours pour 593 po 294.637 - 117 - 595 mais il n'y a rien 477 - 71 - 637.887.874." represents "On s'est battu tous les jours pour repousser l'ennemi mais il n'y a rien eu de sérieux."

Cipher in Code

In modern parlance, a code represents words and phrases with figures (or other symbols), whereas a cipher represents letters with symbols (e.g., different letters, figures). Still, many codes had provisions to represent words or names not included in the vocabulary. That is, some part of a code may constitute a set of cipher symbols, which may give a clue to codebreakers.

Codebreaking by John Wallis

The English mathematician John Wallis (1616-1703) solved many letters in cipher (see another article), which included ones completely in code (see another article). Probably, Wallis detected that low numbers represented single letters and attacked sequences of such low numbers first, which would have been similar to breaking a simple cipher. When words represented in cipher were revealed, he must have used them as a clue to guess the meaning of other code groups.

Codebreaking by Etienne Bazeries

Commandant Bazeries decoded Louis XIV's great cipher with entries up to 587 (see another article). Upon first inspection of the letters in code, Bazeries was sure he could solve it. Since a code with such a small vocabulary has to encode syllable by syllable, once an initial breakthrough is made, reasoning similar to that used in cipher-breaking would reveal the code numbers one after another.

The great cipher before Bazeries did not have the weakness presumably exploited by John Wallis. So, Bazeries assumed that words "les ennemis" must appear in the letters and found that frequently occurring patterns with slight variation like 124 22 125 46 574 would correspond to "les en-ne-mi-s." This provided the breakthrough needed to proceed.

Stripping Superencipherment of a WWI Code

At the time of World War I, it was recognized by code compilers that "Words spelled out, letter by letter, ... are one of the favorite points of attack by enemy code men" (see another article). But the first American field code was not prepared for such an attack.

A report of 17 May 1918 (Friedman, "American Army Field Codes in the American Expeditionary Forces during the First World War", p.117, Appendix 10) evaluated the first American field code. Given 44 short test messages encoded with monoalphabetically enciphered three-letter code groups, together with the underlying codebook, the report's authors successfully identified the superencipherment.

Code groups for single letters provided the first clue. In particular, TKG, which appeared in succession as follows, as well as those around them were considered to represent single letters.

    BCN TKG TKG BCN

    BCN TKG TKG GRO

TWS BCN TKG TKG GWY

TWS BCN TKG TKG GWY

After trying T and S in vain, L was assumed for TKG, which suggested ED as a likely candidate to follow L-L. To check this assumption, the codebook was consulted, which gave MIH for L and HEG for ED. The third letter of the code group for L matches the first letter of the code group for ED, which is consistent with the enciphered pair "TKG GWY", where G ends the first group and begins the second. Once this was confirmed, the preceding groups were taken to be "K-I", forming the word "killed". (Incidentally, the report pointed out the absence of the word "killed" and other frequent words as defects of the code.)
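The consistency check described above can be sketched in a few lines of Python. This is only an illustration, not the procedure of the 1918 report: the function name extend_mapping is hypothetical, and the only data used are the two codebook groups quoted above (MIH for L, HEG for ED) and the enciphered pair TKG GWY.

```python
def extend_mapping(mapping, plain_group, cipher_group):
    """Extend a plain->cipher letter mapping with one hypothesis.

    Returns the extended mapping, or None if the hypothesis contradicts
    the monoalphabetic (one-to-one) substitution assumed so far.
    """
    if len(plain_group) != len(cipher_group):
        return None
    new = dict(mapping)
    used = {c: p for p, c in new.items()}  # cipher letter -> plain letter
    for p, c in zip(plain_group, cipher_group):
        if new.get(p, c) != c or used.get(c, p) != p:
            return None  # conflict: hypothesis rejected
        new[p] = c
        used[c] = p
    return new

# Hypothesis 1: TKG is the enciphered form of MIH (= L).
m = extend_mapping({}, "MIH", "TKG")
# Hypothesis 2: GWY is the enciphered form of HEG (= ED).
# Both hypotheses need H -> G, so they support each other.
m = extend_mapping(m, "HEG", "GWY")
print(m)
```

A contradictory guess (for instance, a group that would require H to be enciphered as some letter other than G) makes the function return None, which is the mechanical counterpart of "trying T and S in vain".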

The report also pointed out that monoalphabetic encipherment provides no security, because it does not matter to codebreakers whether L is represented by MIH or superenciphered as TKG. (Of course, superencipherment can conceal the one-part nature of the code, but once some identifications have been made, it adds only some complication, not the real difficulty of a two-part code.)

The report also pointed out that once the codebook is known, the monoalphabetically enciphered "AVA" is immediately recognized as the code group KYK for H, because KYK was the only code group for a single letter that had the same initial and final letter.
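The pattern argument can be illustrated as follows. This is a sketch under stated assumptions: letter_pattern is a hypothetical helper, and the table of single-letter code groups contains only the two entries quoted in this article (the real codebook had one for each letter).

```python
def letter_pattern(group):
    """Return the repetition pattern of a group, e.g. 'AVA' -> (0, 1, 0)."""
    seen = {}
    return tuple(seen.setdefault(ch, len(seen)) for ch in group)

# Partial table: only the single-letter groups quoted in the text.
SINGLE_LETTER_GROUPS = {"L": "MIH", "H": "KYK"}

def candidates(cipher_group):
    """Plaintext letters whose code group has the same repetition pattern."""
    target = letter_pattern(cipher_group)
    return [letter for letter, group in SINGLE_LETTER_GROUPS.items()
            if letter_pattern(group) == target]

print(candidates("AVA"))
```

A monoalphabetic substitution preserves repetition patterns, so a group of the form x-y-x can only come from a codebook group of the same form.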

The report claimed it took only five hours to reveal the enciphering alphabet and another five hours to complete the decoding.

Yardley (1918, 1926)

German Code (1918)

Herbert O. Yardley's The American Black Chamber (Chapter VI) tells how he decoded two mysterious messages transmitted from a German station in 1918. The messages were in five-figure code, without address or signature, and were transmitted over and over again nearly every day. There was reason to believe that it was addressed to Mexico.

The message No. 1 had 141 code groups and No. 42 had 138 code groups. The code groups ranged from 00308 to 48001.

Of these, the most frequent group 42635 occurred 16 times and 28709, 28223, and 19707 occurred 8 times each. It may be tempting to identify 42635 with "the" but it must be remembered that words such as "a" and "the" were often omitted in telegraphy. Yardley identified that the language was English rather than German or Spanish because assumption of German or Spanish "leads to a blank wall".

The largest numbers were 55927, 55934, 55936, of which the first two occurred twice and thus may be considered to be common words. Since they were towards the end of the alphabet, they might begin with y. Assuming "you" for 55927, 55934 "your" and 55936 "yourself" nicely fit the sequence in the only dictionary Yardley had at hand.

55927 you
55928 young
55929 younger
55930 youngish
55931 youngling
55932 youngster
55933 younker
55934 your
55935 yours
55936 yourself

Then, it was noticed that the last two figures of the code groups were all in the range from 01 to 62. This seemed to suggest that the five-figure numbers were not serial numbers but indicated a three-figure page number and a two-figure word or line number.
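The observation about the last two figures can be checked mechanically. The following is a minimal sketch, assuming the page/line layout conjectured above; the list contains only the code groups quoted in this article, not the full messages.

```python
def split_group(group):
    """Split a five-figure group into (page, line), assuming a PPPLL layout."""
    return int(group[:3]), int(group[3:])

# Code groups quoted in this article (not the complete telegrams).
groups = ["00308", "48001", "42635", "28709", "28223", "19707",
          "55927", "55934", "55936", "21206", "31511", "31259"]

lines = [split_group(g)[1] for g in groups]
print(min(lines), max(lines))   # smallest and largest "line" value
print(split_group("19707"))     # page 197, line 7
```

If the five-figure numbers were plain serial numbers, the last two figures would be expected to spread over 00 to 99; a range confined to 01-62 fits a dictionary page with about 62 lines or entries.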

Then, Yardley examined the most frequent code group 42635 by listing what came before and after this code group. The results showed that it was often preceded by the same group but was always followed by a different group. Yardley thought it must be a termination or ending of some sort. But neither a period (.) nor any common words were likely because 16 occurrences in a total of no more than 279 words were too many. Considering that the place of 42635 in a sequence up to about 56000 suggested r or s as the initial letter, Yardley assumed it might be a plural ending "s".

Then, with these findings, the beginning of the telegram No. 42 was examined: "19707 21206 31511 31259". Of these, 19707, being one of the groups that occur 8 times, must be a common word. A search of pages around p.197 in an English dictionary of about 600 pages found "for" on p.203. Yardley's dictionary thus appeared to run about 6 pages ahead of the one actually used by the Germans. So, in order to decode the next group 21206, pages around p.212+6=218 were searched and "German" was found on p.217. Then, pages around page 315+6=321 were searched to identify 31511. Though it had to be borne in mind that the same offset might not apply after some hundred pages, it must be a word beginning with m. Indeed, "minister" on p.312 seemed right.

The next group 31259 should be found 3 pages earlier. The telegram was supposed to be addressed to Mexico. So "For German Minister Mexico" seemed to be the plaintext for the first four code groups.

One might continue such a process to gradually reveal code groups one by one. Alternatively, one may identify the dictionary that has "for" as the 7th line or word on p.197, etc. As it turned out, it was the English-French half of Clifton's Nouveau Dictionnaire Français.

The decoded message is as follows. Words and names not in the dictionary such as Bleichroeder (Wikipedia) seem to be represented by portions of words. (Association of code groups with plaintext portions is my conjecture. Code groups such as 28223, 28709 seem to have some predefined function rather than mere nulls but details are not known to me.)


Peruvian Code (1926)

Yardley (Chapter XIX) describes another example of breaking a code. Even if his descriptions are inaccurate, they give an idea of how codebreaking (as opposed to cipher-breaking) may work.

In 1926, Yardley was given by the Department of State a document in code containing 390 code words (of which 250 occurred only once). It began as follows:

8453207440 5400000001 19977 NCOTRAL 2116388212 0000178607 4747722681 2212567444 0757021928 2105311032 8151788212 6742358138 4346728381

That the message is composed of five-figure code groups sent in blocks of ten figures is fairly obvious from the common practice at the time as well as from the recurrence of 00001. NCOTRAL would be a word not in the code, enciphered letter by letter.

Yardley says although he was not given details about the document, he assumed that the message concerned a dispute between Chile and Peru, for which the United States was acting as a mediator. Since the disposition of the City of Arica was at issue, they reasoned that "ciudad de Arica" should appear in the message. From the past experience with Chilean and Peruvian codes, it was expected that the code would be alphabetical (one-part code).

Assuming that the most frequent group 36166 represented de, "Arica" would be one of the groups that follow 36166, and would be "00xxx." No success was obtained with this or the next hypothesis 36166=en.

When another high-frequency group 27359 was supposed to be de, "ciudad de Arica" was identified (though it must be remembered that such an early identification may occasionally have to be given up at a later stage). Thus, a few common words such as de, en, el, que, y, a, etc. were identified on the first day.

The next morning, the words "Secretary of State" (Secretario de Estado in Spanish) and the Spanish word for "said" were found.

Now sure of the topic of the message, Yardley requested a resumé of the conference between the Secretary of State and the Chilean or Peruvian ambassador. The crib allowed decoding of the whole message in a few days. It turned out to be a report by the Peruvian ambassador.

The message began in translation as "No. 37. 1. Last night I had dinner alone with STABLER. After dinner ...." So, the enciphered name was S(=N)T(=C)A(=O)B(=T)L(=R)E(=A)R(=L).
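Once a crib such as this name is known, the letter substitution follows by simple alignment. A minimal sketch (variable and function names are illustrative only):

```python
plain, cipher = "STABLER", "NCOTRAL"

# Build the plain -> cipher table, checking that no plaintext letter
# is asked to map to two different cipher letters.
substitution = {}
for p, c in zip(plain, cipher):
    if substitution.setdefault(p, c) != c:
        raise ValueError(f"inconsistent crib at {p}")

# Inverse table for reading other enciphered names.
inverse = {c: p for p, c in substitution.items()}

def decipher(text):
    """Decipher with the partial table; unknown letters become '?'."""
    return "".join(inverse.get(ch, "?") for ch in text)

print(substitution)
print(decipher("NCOTRAL"))
```

Only seven letter pairs are recovered this way, so any other enciphered word would at first decipher only partially.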

Mansfield Dictionary Code (1930s)

As demonstrated by Yardley's example above, as long as assignment of plaintext words to numbers is in alphabetical order, the relative position of a number allows estimating an approximate place in a dictionary. Mansfield's progressive dictionary list is a reference tool to facilitate such estimation.

Lanaki (1996), Classical Cryptography Course, Lecture 20: Codes demonstrates the solution of a very short code message. (I made some corrections in the following.) (It cites [DAGA] D'agapeyeff, Alexander, "Codes and Ciphers," Oxford University Press, London, 1974; [MANS] Mansfield, Louis C. S., "The Solution of Codes and Ciphers", Alexander Maclehose & Co., London, 1936; and [MAN1] Mansfield, L.C.S, "One Hundred Problems in Cipher", London, 1936.) According to a posting "Mansfield Dictionary Code", this example appears to have been created by D'agapeyeff by picking code groups from Mansfield to suit his demonstration.

The message in code is as follows.

55381 42872 35284 55381 45174 56037 55381 46882 23171 44234 55366 55381 00723 12050 61571 36173 55381 56442

Arranging the above code groups in ascending order results in

00723 12050 23171 35284 36173 42872 44234 45174 46882 55366 55381 (5 times)
56037 56442 61571

The group "55381", which occurs no less than five times in this short message, is assumed to be THE. (A possibility that THE is omitted in the message or that 5 times in such a short message is too many was not considered, at least for a starter.) The highest number "61571" may be assumed to be a word beginning with W. ("You" may also have been a possibility.)

Now, use is made of a tool called Mansfield's progressive dictionary lists, which provide serial numbers for words beginning with any two letters in dictionaries of 10,000-100,000 words. For example, given the total number of words in the dictionary would be around 65,000, the list tells that the last word beginning with DA has a serial number 11646 and the last word beginning with DE has a serial number 12850, which implies the group 12050 is a word beginning with DE. (Today, a calculator makes things easier. If you have a dictionary of say 800 pages, the page 120 out of a total of 650 would correspond to around page 148 (=120/650*800) of your dictionary.)
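The two estimates in the paragraph above can be reproduced as follows. This is a sketch only: the function names are hypothetical, and SECTION_ENDS holds just the two boundaries quoted here, whereas Mansfield's actual lists give one for every pair of initial letters.

```python
from bisect import bisect_left

def scale_position(position, code_size, own_size):
    """Proportional position in one's own dictionary (pages or words)."""
    return position / code_size * own_size

# Last serial number of each section, for a dictionary of about 65,000
# words. Only the two values quoted in the text are included, so numbers
# below the start of DA cannot be classified correctly with this table.
SECTION_ENDS = [(11646, "DA"), (12850, "DE")]

def section_of(number):
    """Return the two-letter section containing a serial number, if known."""
    ends = [end for end, _ in SECTION_ENDS]
    i = bisect_left(ends, number)
    return SECTION_ENDS[i][1] if i < len(SECTION_ENDS) else None

print(round(scale_position(120, 650, 800)))  # page in an 800-page dictionary
print(section_of(12050))                     # section of code group 12050
```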

Working similarly for the other code numbers, we may have the following:

THE RE-- OF-- THE RO-- TO-- THE SE-- HA-- RE-- TH-- THE AE-- DE-- WA-- OV-- THE TO--

While, of course, the border between bigrams varies depending on the particular dictionary used, at least the position of THE (55381) is confirmed. Somewhat preceding this is 55366 (TH--). Of candidates such as THAN, THANK, THAT, etc., THAT would be the most probable. Two groups 56037 and 56442 are in the TO-- section, which starts at 56037 and ends at 56466. So the former would be TO. The latter may be TOWN. Similarly, the R- words may be guessed by considering, e.g., that 42872 is about 300 words from the end of the RA section at 42573.

THE RECONNAISSANCE OF-- THE ROUTE TO THE SE-- HA-- REVEALED THAT THE AE-- DE-- WA-- OV-- THE TOWN.

Similarly, AE-- may be AEROPLANE. And DE-- is one quarter of the way into the DE section, which suggests DEF--. Thus, "aeroplane defensive" sounds right.

The rest may be fairly obvious.

THE RECONNAISSANCE OF THE ROUTE TO THE SEA HAS REVEALED THAT THE AEROPLANE DEFENSE WAS OVER THE TOWN.

According to [MANS], actually "AEROPLANE DEFENSE" should be "air defenses."

André Langie's Example

A technique similar to those of Yardley and Mansfield is also described in André Langie, Cryptography (1922; English translation (adaptation) of the original in French), p.88 ff.

The following message is given.

5761 3922 7642 0001 9219 6448 6016 4570 4368 7159 8686 8576 1378 2799 6018 4212 3940 0644 7262 8686 7670 4049 3261 4176 6638 4833 4827 0001 3696 6062 8686 2137 4049 2485 7948 0300 9712 0300 4212 9576 2475 8576 8337 0702 9185

Sorting this in numerical order gives:

0001 2485 4212 6062 8576
0001 2799 4212 6448 8576
0300 3261 4368 6638 8686
0300 3696 4570 7159 8686
0644 3922 4827 7262 8686
0702 3940 4833 7642 9185
1378 4049 5761 7670 9219
2137 4049 6016 7948 9576
2475 4176 6018 8337 9712

The most frequent groups are: 8686 (3x), 0001, 0300, 4049, 4212, 8576 (2x).

Pairs of nearby groups are: 2475:2485, 3922:3940, 4827:4833, 6016:6018, 7642:7670, and 9185:9219.
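These preliminary tabulations are easy to reproduce today. The following is a minimal sketch using the message above; the threshold of 35 for "nearby" groups is my own arbitrary choice, picked because it happens to reproduce the pairs listed.

```python
from collections import Counter

message = """
5761 3922 7642 0001 9219 6448 6016 4570 4368 7159 8686 8576 1378 2799 6018
4212 3940 0644 7262 8686 7670 4049 3261 4176 6638 4833 4827 0001 3696 6062
8686 2137 4049 2485 7948 0300 9712 0300 4212 9576 2475 8576 8337 0702 9185
""".split()

counts = Counter(message)

# Groups occurring more than once.
repeated = {g: n for g, n in counts.items() if n > 1}

# Distinct groups in numerical order (string order works because every
# group has exactly four digits).
distinct = sorted(counts)

# Adjacent groups that lie close together in the code.
MAX_GAP = 35  # arbitrary threshold, an assumption of this sketch
near = [(a, b) for a, b in zip(distinct, distinct[1:])
        if int(b) - int(a) <= MAX_GAP]

print(repeated)
print(near)
```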

0001 is supposed to be "a".

Other common words that may be expected to occur are "of", "to", "and", "the", etc. We first consider "of".

Now, an English dictionary is used as a reference. It shows that words beginning with O occupy pages corresponding to 58-61 percent from the beginning. This means O words would be between 5800 and 6100 in this code. This assumption gives three candidates: 6016, 6018, and 6062. If 6016 is taken as "of", the dictionary gives a plausible reading "offensive" for 6018.
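The selection of candidates by dictionary proportion can be sketched as below. The function name is hypothetical, and the code size of 10,000 is an assumption based on the four-figure groups; the percentages are those quoted above.

```python
message = """
5761 3922 7642 0001 9219 6448 6016 4570 4368 7159 8686 8576 1378 2799 6018
4212 3940 0644 7262 8686 7670 4049 3261 4176 6638 4833 4827 0001 3696 6062
8686 2137 4049 2485 7948 0300 9712 0300 4212 9576 2475 8576 8337 0702 9185
""".split()

def groups_in_share(groups, start_pct, end_pct, code_size=10000):
    """Distinct groups whose number falls in a percentage range of the code."""
    low = start_pct * code_size // 100
    high = end_pct * code_size // 100
    return sorted({g for g in groups if low <= int(g) <= high})

# Words beginning with O: 58-61 percent of the way through the dictionary.
print(groups_in_share(message, 58, 61))
```

In practice the bounds would be widened somewhat, since the proportions of an arbitrary dictionary only approximate those of the codebook.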

As for "the" and "to", it is noted that 8576 occurs twice and 8686 occurs three times. Although this is outside the range of T-words between 8715 and 9298 expected from the proportion of initials in the English dictionary, it seems within allowable error bounds. So, one may suppose 8576 to be "the" and 8686 to be "to".

Another frequent group 0300 is in an expected range of A (6.43%, or up to 0643). The word "and" is found just about the right position.

The remaining groups occurring multiple times --4049 and 4212-- are in the H section, which, however, has too many candidates: "have", "has", "he", "him", etc.

If the above candidates are substituted for the code groups, there is found "TO THE 1378 2799 OFFENSIVE." Since 2799 is in the E section, "enemy" is a good candidate. The number 1378 is just about halfway between 0001 (A) and 2799 (ENEMY), and "coming" is found.

Then, we look at "AND 9712 AND 4212", where 4212 also occurs after "offensive." The number 4212 is about halfway between 0001 (A) and 8576 (THE). So, although the dictionary proportion puts it in the H section, 4212 is supposed to be the pronoun "I." Considering 9712 is the highest number, it may be supposed to mean "you." This reveals "AND YOU AND I."

The above sequence is followed by 9576, which is the second highest number and may be supposed to be WERE or WILL. It is followed by 2475 in the D section, which in turn is followed by THE. A closer examination of the position shows it is DI... or DO.... So, this (2475) or 2485 may be "do." Assuming the latter is "do", "divulge" and "divide" are found as candidates for 2475, of which the latter seems more plausible. This gives the reading "AND YOU AND I WILL DIVIDE THE."

With similar guidance from the dictionary position and context, the groups following this are found to be "SUM BETWEEN US." When "A GOOD O... TO" is revealed, one may arrive at "a good opportunity to."

Further processing like this reveals the following plaintext.

Mi ... has secured a valuable piece of information in regard to the coming enemy offensive. I have been requested to send him five hundred pounds. It is a good opportunity to denounce him. Do so, and you and I will divide the sum between us.

Dictionary Code Challenge

The feasibility of this method of guessing a word by its relative position in a dictionary was put to the test by a challenge presented in Klausis Krypto Kolumne in 2018.

Commercial Code

Some telegraphic codebooks provided for some measure of secrecy (see another article). Continental codebooks like Sittler represented a word with a four-digit number consisting of a page number and a line number and correspondents could make their own arrangement to reassign page numbers to keep the secrecy of their message.

Etienne Bazeries demonstrated how easily such enciphering could be broken (Bazeries, Les chiffres secrets dévoilés, p.142-145).

He was given the following message coded by Sittler.

2213 2379 2836 5034 6360 9051
1302 1086 7131 2394 7514 1933

Given a hint that the message was about finance, he started by assuming some words such as bourse, titres, millions were used in the message.

The word million appears on real page 57, line 04 in Sittler. Although the page numbers might be reassigned, the line number 04 is assumed to be unchanged. Then, it is noted that the fourth group 5034 contains both 0 and 4. The 1st and 3rd digits may indicate the page and the 2nd and the 4th digits may indicate the line. If this group indeed is million, the preceding group 2836 would be a figure. From its 2nd and 4th digits, one might look for a numeral appearing on line 86 on some page. This finds 17 on page 27 and 41 on page 74.

Considering that the real page 57 is assigned "53" (1st and 3rd digits), the difference being 4, it is noted that "23" formed by the 1st and 3rd digits of 2836 is also displaced from the real page 27 by 4.

With the assumptions thus far, the preceding group 2379 must refer to real page 31 (=27+4), line 39, which gives emprunter (borrow), which nicely fits the context.
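The reasoning so far can be summarized in code. This is a sketch of the hypothesis only: the names are illustrative, and the constant page shift of 4 is what the three groups discussed above suggest, whereas correspondents were free to reassign pages in other ways.

```python
def split_sittler(group):
    """Split a four-digit group into (page, line).

    Assumes digits 1 and 3 give the page and digits 2 and 4 the line,
    as conjectured from the group 5034.
    """
    page = int(group[0] + group[2])
    line = int(group[1] + group[3])
    return page, line

PAGE_SHIFT = 4  # inferred from "million": real page 57 sent as 53

def real_reference(group):
    """Real (page, line) in the codebook under the constant-shift hypothesis."""
    page, line = split_sittler(group)
    return page + PAGE_SHIFT, line

for group in ("2379", "2836", "5034"):
    print(group, real_reference(group))
```

Under this hypothesis the three groups point to page 31 line 39, page 27 line 86, and page 57 line 04, matching emprunter, 17, and million.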

The plaintext turned out to be "Je désire emprunter 17 millions, pouvez-vous vous charger de les réaliser et à quelles conditions?"

Panizzardi Telegram (1894)

The Panizzardi telegram, intercepted at an early stage of the Dreyfus Affair, was also enciphered based on Baravelli's commercial code. As with the above, the unchanged line numbers gave a clue to the codebreakers. See another article for details.

Slater's Code

The Canadian government used Slater's code (Telegraphic Code (1870) by Robert Slater, for which see another article). In 2021, Matthew Brown from England found more than a hundred encoded telegrams from John A. Macdonald (Wikipedia), the first prime minister of Canada (1867-1873, 1878-1891), or people around him, in the Canadian archives and succeeded in decoding most of them (Cipherbrain, where a link is given to a list of all the telegrams he found). The telegrams are from 1873, 1879, 1881-1891, when Macdonald was in office.

Since Slater's code does not simply translate a word into a code group (a code word or a code number) by table lookup, knowledge of the codebook is not sufficient to allow reading the telegrams in code.

Specifically, Slater's code involves translating a word into a number and, after some manipulation of the number, translating it back into a different word. While Slater proposed many kinds of manipulation including addition, subtraction, transposition, and regrouping, Brown assumed the manipulation involves simply adding some key number, and wrote a computer program to check every possibility (Slater's code includes 25000 entries) to see if the resulting plaintext makes sense. This plausibility is scored according to the frequency of words occurring in the text. That is, if the text is full of uncommon words, its score is low. (More specifically, Brown sorted the words by frequency and simply used each word's position as its score. So, actually, lower scores are better.)
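The idea of trying every additive key and ranking the results by word frequency can be sketched as follows. This is not Brown's program: the twelve-word codebook and the frequency list are toy data invented for illustration, the wrap-around at the end of the codebook is an assumption, and all names are hypothetical.

```python
# Toy codebook in alphabetical order; a word's index is its number.
CODEBOOK = ["abacus", "come", "gizzard", "house", "isthmus", "money",
            "nadir", "send", "the", "tomorrow", "xylem", "zither"]

# Toy frequency list, most common first; the position is the score.
COMMON = ["the", "send", "money", "come", "tomorrow", "house"]
RANK = {word: i for i, word in enumerate(COMMON)}

def shift(words, key):
    """Replace each word by the word `key` places further on (with wrap)."""
    n = len(CODEBOOK)
    return [CODEBOOK[(CODEBOOK.index(w) + key) % n] for w in words]

def score(words):
    """Sum of frequency ranks; unlisted words get the worst rank.
    Lower is better."""
    return sum(RANK.get(w, len(COMMON)) for w in words)

def best_keys(cipher_words, top=3):
    """Try every additive key and return the best (score, key) pairs."""
    trials = [(score(shift(cipher_words, -key)), key)
              for key in range(len(CODEBOOK))]
    return sorted(trials)[:top]

cipher = shift(["send", "money", "tomorrow"], 3)   # encode with key 3
print(cipher)
print(best_keys(cipher))
print(shift(cipher, -best_keys(cipher)[0][1]))     # decode with best key
```

With a real codebook of 25,000 entries the search is the same, only longer, and a longer telegram makes the correct key stand out more clearly.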

As it turned out, in 1873, an additive key such as 250, 50, or 500 was used. In 1881-1884, multiples of 100 such as 300, 400, 500 were used as the key, though other keys were also used (e.g., 365 for a message to Tupper (Wikipedia)).

From about 1885, keys were not so simple. As Brown found out, sometimes, the date was added to a base key (e.g., in December 1887, the key was 1242(=1234+8) on the 8th, 1243(=1234+9) on the 9th, and 1244(=1234+10) on the 10th; all to Tupper). It appears telegrams to Tupper (and others) follow this convention afterwards.

A subtractive key was sometimes used (e.g., -200 in a telegram of 15 June 1886 to J.W. Trutch).


Brown left unsolved a few telegrams, which were tackled by George Lasry, who used frequencies of word bigrams for scoring. Although not all the results were quite satisfactory, Brown confirmed they seemed to be the only ones which made grammatical sense among the top 100 solutions obtained by his method.

Statistical Analysis

Some of the examples above are only the simplest cases. Actual codebreaking seldom proceeds as readily as these and usually requires much more materials in code to get an insight.

After finding out what code groups occur how frequently, codebreakers examine what other code groups tend to precede or follow each particular code group. Before computers or tabulators (Wikipedia) were introduced, such a task required many typists, as told by Yardley (Chapter XIV). By such a tedious process, code groups are identified one by one.
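What once required rooms of typists can now be sketched in a few lines. The example below tabulates frequencies and neighbours for the short Mansfield-style message quoted earlier in this article; the function name is hypothetical, and a real attack would need far more material.

```python
from collections import Counter, defaultdict

def neighbours(groups):
    """Count which groups precede and follow each code group."""
    before = defaultdict(Counter)
    after = defaultdict(Counter)
    for prev, cur in zip(groups, groups[1:]):
        after[prev][cur] += 1
        before[cur][prev] += 1
    return before, after

message = ("55381 42872 35284 55381 45174 56037 55381 46882 23171 "
           "44234 55366 55381 00723 12050 61571 36173 55381 56442").split()

frequency = Counter(message)
before, after = neighbours(message)

top_group, top_count = frequency.most_common(1)[0]
print(top_group, top_count)
print("followed by:", sorted(after[top_group]))
print("preceded by:", sorted(before[top_group]))
```

A group that is always followed by a different group, like 55381 here, behaves as an article or similar function word would.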

Codebreaking by Espionage

Codebreakers were occasionally given a head start by espionage.

Spanish Codes (1918)

Yardley's The American Black Chamber, Chapter VIII, describes how he used agents to obtain information to break Spanish codes.

In 1918, Yardley was pressed to solve Spanish codes because Spain was suspected of assisting German espionage. Yardley sent an agent to a Spanish consulate in South America. The agent stole into the consulate at night, opened the steel safe, and found the diplomatic code, but it would take time to photograph the entire codebook because only a few pages could be photographed each night. Although the code did not match the messages between Spain, America, and Germany, Yardley had expected that. Analysis indicated there were about ten different codes for different channels, but Yardley thought there were only one or two basic codes and the others were merely secondary codes based on them.

Yardley hired a woman to draw information from a Spanish diplomatic secretary in the Spanish embassy in America. As it turned out, the Spaniards used 25 different codes for different stations, which were grouped into 9 different groups.

Then, a circular telegram was sent from the Spanish Foreign Office in Madrid in four different codes: to Washington and Costa Rica in code number ("indicator") 301, to Lima in 141, to Santo Domingo in 32, and to Panama in 74.

Soon, the agent provided the photograph of the Code No.74. It assigned four-figure numbers to alphabetically arranged words. The plaintext entry was written as, e.g., "abdic-ar-acion-es", which represented various inflected forms of the word, of which the decoder should identify the intended one according to the context. (This kind of entry is also seen in Napoleonic codes.)

This allowed decoding the circular telegram. With the plaintext revealed, it was a matter of time to identify code groups for the other codes.

Gray Code

Yokoi Toshiyuki, Teikoku Kaigun Kimitsu Shitsu (The Black Chamber of the Imperial Navy) (1953) describes decoding by the Japanese of the Gray Code of the US Department of State.

The Japanese Navy designated the code as NADED because the code group NADED occurred very frequently. First, code groups of the intercepted messages (which were in the form of either CVCVC or CVCCV, with C being consonant and V vowel) were alphabetically arranged and their occurrences were recorded. The most frequent group NADED was readily identified as a period.

But it was far from decoding messages. The Navy asked the military police to obtain waste paper from an American consulate. Laborious searching of the waste was sometimes rewarded with a draft, which revealed some code groups and their plaintext counterparts. When the US ambassador submitted a memorandum of the US Department of State to the Japanese government, a telegram intercepted before that had counterparts for the memorandum, which revealed many code groups.

Before long, about 5000 code groups were identified. Although it was only a small fraction of what seemed to have 100,000 entries, it allowed the general meaning of telegrams to be known.

Japanese Attempts at British Codes

In late 1934, the Navy asked the military police to get hold of a British codebook. When the British consulate in Sapporo was moving, a Japanese typist helping the work dropped from a window of the second floor a codebook, which was picked up and carried away by a workman. In about an hour, the codebook was returned to the original casket. In March 1935, the stolen codebook, which was a British inter-departmental codebook, allowed decoding of a telegram of the commander of the British Eastern fleet in Hankou to the ministers at home.

It was, however, a rarely used one. The main code of the British Foreign Ministry was a two-part code. In the summer of 1935, when some 400 code groups had been identified, the code was replaced.

Again, they resorted to the military police, who apparently had already made attempts at the request of the army. But its first attempt at an embassy in Tokyo had been a fiasco. The agent could open the safe but was detected and arrested. (Yokoi deplores their naïveté by referring to Yardley, whose agent stole into a consulate in South America rather than in Washington (Yardley p.192).)

Success came from Osaka. A military police officer disguised as a tailor had obtained access to the British consulate in Osaka since late 1934 and soon succeeded in inducing a clerk to obtain an imprint of the key to the safe in wax. One night when the consul was absent, Japanese agents opened the safe with a copy key and photographed ten-plus codebooks.

The codebooks were delivered to the Navy's secret agency in Shanghai. However, this was not the end, for important telegrams tended to be missing in the intercepts. As it turned out, they were sent via wireline rather than radio. The Japanese agency had to bribe telegraph operators of the Commercial Pacific and Great Eastern cable companies to obtain copies of telegrams.

Other Examples

Similar activities abound in history. When Yardley was struggling with Japanese telegrams (see another article), he contemplated stealing into the Japanese consulate in New York (Yardley p.264) or handing the Japanese military attaché a memorandum for transmission to Japan in order to obtain an encoded message of a known plaintext (ibid. p.266-268).

In 1935, the US Navy stole into a Japanese naval attaché's apartment in Washington in an attempt to obtain information on the RED cipher machine.

Back in 1905, it was discovered that Japanese codebooks at the consulate at The Hague had been secretly photographed by a Russian agent.



©2014 S.Tomokiyo
First posted on 1 May 2014. Last modified on 25 July 2021.
Articles on Historical Cryptography