I recently succeeded in deciphering a fragment of a letter, written entirely in cipher, that Davys (1737) printed for the reader. The letter was from the Earl of Clanricarde to the Marquis of Ormonde. Although the cipher turned out to be of the simplest kind (it took Davys only about four hours), a brief description of how I solved it "might be of use to an industrious man," as Davys put it (p.30).
The fragment in cipher is as follows.
Numerical ciphers in seventeenth-century England generally used numbers up to several hundred (see here for ciphers of Charles I). Although there are a few instances of numbers above 1000 (Peterson's, Manning's, and Bampfield's; see here), they never ran to figures of so many digits. Davys also says he had never seen this kind of cipher.
Come to think of it, however, the number of codes would not be much more than 1000, if at all. These long figures should represent some combination of a limited number of low figures. The simplest assumption would be that the long figures actually represent two-digit codes run into one another. This assumption is supported by the fact that all the long figures (i.e., all except 102, 107, etc.) have an even number of digits. Further, when one breaks the long figures down into two-digit groups, the groups generally fall in the range from 40 to 90. Thus, after all, the cipher appears to be essentially a two-digit code system, in which each letter of the alphabet is assigned two figures (at least on average). It would be similar to many of the ciphers of Charles I (see here).
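The splitting test described above can be sketched in a few lines of Python. This is only an illustration of the working hypothesis: the function name `split_pairs` is my own, and the sample run is a hypothetical one assembled from figures quoted later in this article, not a figure copied from the letter.

```python
def split_pairs(figure: str) -> list[int]:
    """Split a run of digits into two-digit groups.

    Raises ValueError for an odd number of digits, since such a run
    cannot consist purely of two-digit codes.
    """
    if len(figure) % 2 != 0:
        raise ValueError(f"odd number of digits: {figure!r}")
    return [int(figure[i:i + 2]) for i in range(0, len(figure), 2)]


# Hypothetical sample run (not taken from the letter itself).
sample = "75785883"
groups = split_pairs(sample)
print(groups)                              # [75, 78, 58, 83]
print(all(40 <= g <= 90 for g in groups))  # True
```

If most groups obtained this way fall between 40 and 90, the two-digit hypothesis gains weight; a run that raises the error or yields many stray values would count against it.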
(In reaching this working hypothesis, I have to admit that, unlike Davys, I had seen a cipher like this before. A letter dated Rome, January 12 1675, addressed to Edward Coleman, who was executed in the turmoil of the Popish Plot, contains similar runs of many digits, which turned out to consist of two-digit groups. (Hay p.203 ff.) [After writing this, I found Davys described this cipher letter in his Postscript. See another article.])
It would then be natural to assume that three-digit figures, and possibly figures above 90, represent frequently used words or names. Breaks between figures probably represent word breaks. This last assumption may be challenged by sequences such as "78 30" and "37 73", because two one-letter words ("a" or "I") are unlikely to follow one another in English. However, such a problem may be explained by nulls or transcription errors ("7830" might easily be copied as "78 30").
The usual frequency analysis indicated the following high-frequency figures: "69" (21 times) - "45" (17) - "83" (15) - "78" (14) - "53" (12) - "60"/"73"/"49"/"76" (10). These figures are likely to represent letters among ETAONRISH, the letters known to be of high frequency in English. In particular, the most frequent figure "69" may represent "e", but the margin is too meagre to say anything conclusive.
Taking advantage of what seem to be word breaks, I examined the frequency of "words". The high-frequency groups were "83 78" (7 times), "76 63" (5 times), and "66 45 68" (4 times). In English, it is well known that the most frequently used three-letter word is "the." I was tempted to identify 66(t)-45(h)-68(e), but this hypothesis had to be discarded outright. First of all, the frequency of "66 45 68" is too low compared with that of the two-letter groups. Further, "68", which should be of very high frequency if it represents "e", appears only 4 times. Again, considering that this is not a simple substitution cipher, nothing is conclusive, since the fifty or so figures from 40 to 90 allow the frequency of individual letters to be obscured by assigning many figures to high-frequency letters such as "e". At the least, it seemed very likely that a high-frequency word like "the" would be given a special code above 90.
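Both tallies, of single figures and of whole groups, are easy to mechanize. The following Python sketch assumes the text has already been split into words, each a list of two-digit figures; the function names and the tiny sample are hypothetical and only reuse figures quoted in this article, so the counts it prints are not those of the letter.

```python
from collections import Counter


def figure_frequencies(words):
    """Count how often each two-digit figure occurs."""
    counts = Counter()
    for word in words:
        counts.update(word)
    return counts


def group_frequencies(words):
    """Count how often each whole group (candidate word) occurs."""
    return Counter(tuple(word) for word in words)


# Hypothetical sample, not the actual ciphertext.
words = [[83, 78], [76, 63], [83, 78], [75, 78, 58, 83]]

print(figure_frequencies(words).most_common(2))  # [(83, 3), (78, 3)]
print(group_frequencies(words).most_common(1))   # [((83, 78), 2)]
```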
The most striking result of the word frequency count is the high frequency of "83 78". By itself, however, it could represent any of the common two-letter words such as "he", "me", "we", "be", "at", "in", "to", "or". (Other possibilities such as "so" or "no" are not likely to appear as often as seven times in such a short fragment.) In order to find out which is correct, I examined how each of "83" and "78" was used in the ciphertext. The fact that "83", the third most frequent figure, occurs at the end of a word implies it would not be "a", "i", or "o." Further, it would not be "e", because it is the first letter of a high-frequency two-letter word. Thus, of ETAONRISH, "83" may correspond to "h" or "t", and the two-letter word "83 78" may be "he" or "to." Considering that "78" is only the fourth most frequent figure, it would be more likely to be "o" than "e". Of course, given the homophony made possible by the fifty figures, we cannot jump to that conclusion.
About this time, I also produced a contact chart (see Kahn pp.99-105). Although it did not provide a clue as in the textbook, it did show that "78" does not appear together with other high-frequency figures. This is consistent with the above hypothesis that "78" would be "o" rather than "e." Thus, I assumed "83" to be "t" and "78" to be "o."
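For readers unfamiliar with contact charts, the idea is simply to record, for every figure, which figures stand immediately before and after it. A minimal Python sketch follows. It assumes contacts are counted only within a word, which is my simplification, and the sample words are hypothetical ones built from figures quoted in this article.

```python
from collections import Counter, defaultdict


def contact_chart(words):
    """Return (before, after): for each figure, counts of its neighbours."""
    before = defaultdict(Counter)
    after = defaultdict(Counter)
    for word in words:
        # Walk over each adjacent pair of figures inside the word.
        for left, right in zip(word, word[1:]):
            after[left][right] += 1
            before[right][left] += 1
    return before, after


# Hypothetical sample, not the actual ciphertext.
words = [[83, 78], [59, 78], [75, 78, 58, 83]]
before, after = contact_chart(words)

print(dict(before[78]))  # {83: 1, 59: 1, 75: 1}
print(dict(after[78]))   # {58: 1}
```

On a full text, a figure whose neighbours are spread widely over the other figures behaves like a vowel, while one with few distinct neighbours behaves like a consonant; that is the kind of clue the chart is meant to surface.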
The next step was to substitute these findings into the original ciphertext and see whether any known pattern appeared. With such a short fragment, the most that could be obtained was "75 78(o) 58 83(t)", which might be "boat", "bolt", "boot", "bout", etc. It might be just about anything, and there seemed to be no way to proceed further.
Then came a breakthrough. I noticed that the distance between "83" and "78" (five) is exactly the distance between "t" and "o" in the alphabet. This implies that the letters of the alphabet are assigned code numbers in regular order. Further, there is another two-letter combination, "59 78", occurring twice. A two-letter word ending in "o" (78) might again be "to", though with such a low frequency, "so" and "no" cannot be rejected altogether. However, the distance between "83" and "59" is 24, the number of letters in the alphabet! (At the time, "i" and "j", as well as "u" and "v", were not distinguished.)
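The arithmetic behind this observation can be checked directly. The sketch below assumes a 24-letter alphabet in which "j" and "v" are dropped and "w" is kept; that particular choice is my assumption, though it agrees with the identifications reported further on.

```python
# Assumed 24-letter alphabet: i/j and u/v share a place.
ALPHABET = "abcdefghiklmnopqrstuwxyz"

print(len(ALPHABET))                               # 24
print(ALPHABET.index("t") - ALPHABET.index("o"))   # 5
print(83 - 78)                                     # 5
print(83 - 59)                                     # 24
```

So if "83" and "59" are both "t", the second run of figures is simply the first run shifted by one full alphabet.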
I was sure this could not be a coincidence. Without testing the hypothesis any further, I keyed in the following cipher table.
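Such a table can be regenerated from the two anchors "83" = "t" and "59" = "t". The Python sketch below is a reconstruction under stated assumptions, not a copy of the table I keyed in: it assumes the 24-letter alphabet without "j" and "v", and two consecutive runs starting at 41 and 65 (the values obtained by counting back from "t"). The name `build_table` is purely illustrative.

```python
# Assumed 24-letter alphabet: i/j and u/v share a place.
ALPHABET = "abcdefghiklmnopqrstuwxyz"


def build_table(starts=(41, 65)):
    """Map each figure to a letter, one alphabet run per starting value."""
    table = {}
    for start in starts:
        for offset, letter in enumerate(ALPHABET):
            table[start + offset] = letter
    return table


table = build_table()
for start in (41, 65):
    # Print one row per run, e.g. "41=a 42=b ... 64=z".
    print(" ".join(f"{start + i}={ALPHABET[i]}" for i in range(len(ALPHABET))))
```

Under these assumptions each letter receives exactly two figures, covering 41 to 88.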
Running a Perl script for deciphering (yes, I did have the advantage of computer tools over Davys) immediately showed that my assumption was correct.
As it turned out, the high-frequency figures were identified as follows: "69" (e) - "45" (e) - "83" (t) - "78" (o) - "53" (n) - "60"(u)/"73"(i)/"49"(i)/"76"(m). The four-letter pattern was found to be "75(l) 78(o) 58(s) 83(t)".
The numbers up to 40 appeared to be nulls. It was not difficult to guess "108" stood for "the."
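Putting the pieces together, the whole decipherment can be sketched as follows. This is a Python illustration rather than the Perl script mentioned above, and it rests on several assumptions: a 24-letter alphabet without "j" and "v", two runs starting at 41 and 65, figures up to 40 treated as nulls, and "108" as the only word code filled in. Other word codes are left in brackets rather than guessed, and the sample input is hypothetical.

```python
# Assumed 24-letter alphabet and two runs starting at 41 and 65.
ALPHABET = "abcdefghiklmnopqrstuwxyz"
TABLE = {start + i: ch for start in (41, 65) for i, ch in enumerate(ALPHABET)}
WORD_CODES = {108: "the"}  # other word codes are deliberately not filled in


def decipher_token(token: str) -> str:
    """Decipher one whitespace-separated figure."""
    if len(token) % 2 != 0:
        # Odd length: treat as a word code; unknown codes stay bracketed.
        return WORD_CODES.get(int(token), f"[{token}]")
    letters = []
    for i in range(0, len(token), 2):
        n = int(token[i:i + 2])
        if n <= 40:
            continue  # assumed null
        letters.append(TABLE.get(n, "?"))
    return "".join(letters)


def decipher(text: str) -> str:
    pieces = [decipher_token(tok) for tok in text.split()]
    return " ".join(p for p in pieces if p)  # drop tokens that were all nulls


# Hypothetical sample, not the actual ciphertext.
print(decipher("108 75785883 8378 30 7663"))  # the lost to my
```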
The final decoding is as follows. This letter (17 March 1643 [i.e., 1644]) is printed as CCLI (p.63) in Carte. (Actually, I filled in further blanks from this printed version.) In a postscript, Clanricarde excuses his use of cipher: "I beleeue I should not haue troubled your lordship with a character [i.e., cipher], but that St. Patrick's day makes letters subject to miscarry; and yet at best leisure your lordship may be pleased to be the translator of it."
John Davys, An Essay on the Art of Decyphering (1737)
Thomas Carte, The life of James duke of Ormond, a new edition (1851) (Google)
Malcolm V. Hay, Jesuits and the Popish Plot (Google, reprint in 2003)
David Kahn, The Codebreakers (1967)