Secret codes (excerpt left: Voynich manuscript in the public domain; excerpt right: image Colourbox)

Scientific Codebreakers

In an interdisciplinary cooperation, computer scientists and historians at Bergische Universität are working on modern programs for deciphering secret codes

Cryptography: the word alone sounds mysterious, and it describes the science of encryption. Keen readers and moviegoers quickly think of the U.S. thriller author Dan Brown and his hero, the symbologist Robert Langdon, who makes the secrets of coded writings comprehensible to us laypeople. Not quite as simple, but just as exciting, is the work of researchers from various disciplines at Bergische Universität on both encrypted historical texts and modern information security. What all of these scientists have in common is an interest in uncovering coded information and, at the same time, in protecting it. In this interdisciplinary cooperation, historians and computer scientists at Bergische Universität are coming together for the first time and finding that they can help each other in their research.

A ciphered chance find was the beginning

Dr. Jessika Nowak, a staff member in Medieval History, came across ciphered legation reports in Milan's Archivio di Stato while doing research for her dissertation on a 15th-century Italian cardinal. "They were letters that were entirely or partially ciphered," the researcher says. "In a few cases, one was lucky and contemporaries had already produced decryptions; now and then the original, not yet encoded drafts of the letters were preserved; and sometimes keys had been handed down, so that one could reconstruct what had been in those passages." Most of the time, however, she says, the encodings were very complex and elaborate, and the relevant passages were often unreadable to historians. Contact with her computer science colleagues Prof. Dr.-Ing. Tibor Jager, Dr.-Ing. Kai Gellert and Jonas von der Heyden, and with the head of the Digital Humanities department, Prof. Dr. Patrick Sahle, whose work forms a kind of intersection between the humanities and computer science, suddenly opened up new possibilities for her.

Where do secret languages come from?

As far back as ancient Egypt, there is evidence of coded texts describing a deity who could no longer be named, and around 1500 B.C. a potter from Mesopotamia ciphered his glaze recipes to protect himself from the theft of his ideas. The Spartans, too, had developed a clever secret script in the 5th century B.C.: military messages were delivered on a leather strap that had been wound around a staff before being written on. Only those who had a staff of the same diameter could read the correct message after winding the strap around it again. Caesar, for example, used two discs bearing the Latin alphabet, one on top of the other, which had to be rotated against each other to obtain the right message; Charlemagne communicated with his generals through secret codes; and Hildegard of Bingen is even the earliest known medieval celebrity to have designed a secret script, the Lingua Ignota, which was supposed to serve magical purposes. "In the classical Caesar cipher, each letter of the plaintext is replaced by the letter three places further along in the alphabet," Nowak explains. "Similar substitutions are also known from the Middle Ages. Letters are simply rearranged or, for example, vowels are replaced by an ascending or descending number of dots. Vowels are also replaced by altered H-bars, i.e. components of the letter H, and sometimes individual letters are replaced by runes or, to make access even more difficult, by invented characters." Even whole words that referred to particularly important persons, such as king, emperor or pope, were often ciphered. To conceal explosive information, people liked to use familiar words from everyday life. In Venetian legation reports, for example, sensitive key words were replaced by the names of trade goods. "You find a great many fish and fruits there, whereas in risky letters from the ecclesiastical sphere, explosive content was sometimes veiled in vocabulary from monastic life."
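The two oldest techniques mentioned here can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the alphabet is reduced to the 26 letters A to Z, the Spartan staff is modelled as a grid with a fixed number of letters per row (the hypothetical parameter `width` stands in for the diameter of the staff), and the function names are made up for this example.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text: str, shift: int) -> str:
    """Replace each letter A-Z by the letter `shift` places further along."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # spaces and punctuation stay as they are
    return "".join(out)


def scytale_encrypt(text: str, width: int, pad: str = "X") -> str:
    """Write the text in rows of `width` letters, then read it column by column."""
    text += pad * (-len(text) % width)  # fill up the last row
    rows = [text[i:i + width] for i in range(0, len(text), width)]
    return "".join(row[c] for c in range(width) for row in rows)


def scytale_decrypt(cipher: str, width: int) -> str:
    """Undo scytale_encrypt; only the matching `width` restores the message."""
    return scytale_encrypt(cipher, len(cipher) // width)


secret = caesar("VENI VIDI VICI", 3)
strap = scytale_encrypt("ATTACKATDAWN", 4)
print(secret, caesar(secret, -3))
print(strap, scytale_decrypt(strap, 4), scytale_decrypt(strap, 3))
```

Shifting by 3 and then by -3 returns the original text, and the strap only yields the message again when it is read with the same width that was used for writing.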
In the very elaborate Milanese letters of the 15th century with which the historian is concerned, further points of attack were eliminated as well: substitutions were made exclusively with invented characters. Letter frequencies were evened out by using several alphabets consisting of abstract characters. Especially for the vowels, which occur most frequently, there were up to four characters each. Word separation was dropped, and sometimes the last letter of a word was even merged with the first letter of the following word. Furthermore, frequently occurring words, especially conjunctions and prepositions, as well as key words such as war or peace, but also central actors and places, were each hidden behind a single invented character. Another classical point of attack was the so-called qu-combination: because the letter "q" was almost always followed by a "u", these combinations were also eliminated and replaced by secret characters. In addition, meaningless characters and intentional misspellings were interspersed in the text to confuse the unauthorized reader. "It is a combination of a polyalphabetic cipher and a long list of coded keywords consisting of invented characters that further raise the mental hurdle, coupled with meaningless characters. And each addressee has a different key. So, as a historian, if I don't have the particular key for it, I'm lost," she says.
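Why several characters per vowel even out the letter frequencies can be shown with a small Python sketch. It rests on assumptions that are not taken from the Milanese keys: every vowel gets four made-up symbols (written here as numbered tokens such as `#3`), every consonant gets one, the symbols for a letter are used in turn, and spaces are dropped. All names are hypothetical.

```python
from collections import Counter
from itertools import cycle
import string

VOWELS = "AEIOU"


def build_key(vowel_symbols: int = 4) -> dict:
    """Give each vowel several symbols and each consonant exactly one."""
    key, next_id = {}, 0
    for letter in string.ascii_uppercase:
        count = vowel_symbols if letter in VOWELS else 1
        key[letter] = [f"#{next_id + i}" for i in range(count)]
        next_id += count
    return key


def encrypt(text: str, key: dict) -> list:
    """Replace letters by symbols, rotating through the options for each letter."""
    pickers = {letter: cycle(symbols) for letter, symbols in key.items()}
    # Characters without an entry in the key (such as spaces) are dropped.
    return [next(pickers[ch]) for ch in text.upper() if ch in pickers]


def decrypt(symbols: list, key: dict) -> str:
    """Map every symbol back to its letter; word breaks are not recovered."""
    reverse = {s: letter for letter, syms in key.items() for s in syms}
    return "".join(reverse[s] for s in symbols)


message = "PEACE BETWEEN VENICE AND MILAN"
key = build_key()
cipher = encrypt(message, key)
plain_counts = Counter(message.replace(" ", ""))
cipher_counts = Counter(cipher)
print(plain_counts.most_common(1), max(cipher_counts.values()))
```

In the plaintext the letter E clearly stands out, whereas in the ciphertext its occurrences are spread over four symbols, so counting symbols no longer points directly to the vowels.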

The Voynich manuscript, an unsolvable riddle?

In the same century as the Milanese ciphers, the most discussed object in historical cryptography to date was created: the so-called Voynich Manuscript, which continues to defeat historians. "It's very exciting," the scientist reports enthusiastically, "because it's an extremely mysterious manuscript. The name comes from Wilfrid Michael Voynich, who acquired it in 1912. The manuscript is so famous because it is written in a script that has not been deciphered to this day. It contains many illustrations that were colored in later, some of them very bizarre, which also raise many questions, and because the text is not so easy to work with, the manuscript has been divided into different sections on the basis of the illustrations." These deal with herbalism, anatomy, the cosmos, and pharmacy, among other things. And although the manuscript even mentions a key, this has not helped to decipher it to this day, she says. There is much speculation, but the mystery remains unsolved.

A cat-and-mouse game thousands of years old

Secret ciphers have been used time and again for tactical and political purposes right up to the present day, including during World War I and World War II. Thanks to the 2014 movie "The Imitation Game," which focused on the German Wehrmacht's rotor cipher machine "Enigma" and its cracking by the Englishman Alan Turing, many people have become interested in the topic again. The "Enigma" was only one of many rotor cipher machines that can be admired today in the German Spy Museum in Berlin. Prof. Dr.-Ing. Tibor Jager, head of the IT Security and Cryptography department at the Faculty of Electrical Engineering, Information Technology and Media Technology, who develops cryptographic methods that can effectively protect security and privacy in modern applications, says: "First of all, we have to say that it was not only Turing, but above all Polish cryptanalysts such as Marian Rejewski, and also a few other scientists, who did the decisive preliminary work and had quite central ideas, and who are usually not mentioned. What they mainly lacked, however, was computing power. They teamed up with the British, who then built machines called bombes, which ticked like a bomb." The "Enigma" resembled an old typewriter, Jager says, continuing, "The idea behind it was that instead of the little arms that normally strike letters onto paper, there were rotors, so that every time you pressed a key for a letter, a little lamp would light up. Each little lamp in turn corresponds to a letter. So, for example, I press an 'A' and a 'W' lights up, and which letter is encoded into which other letter is determined by the position of the rotors. That is how I know how to encrypt. The rotors beneath the keyboard work much like the odometer in older cars. You have a wheel that turns one step further with each keystroke, and when it has completed a full turn, the next wheel advances by one step.
So if you then press 'A' again, it is encoded into a different letter than before, for example a 'D'." Variants of the machine with different rotors existed, and every turn of a wheel produced a new letter assignment, he says. Additional plug connections that were part of the key further complicated decipherment, so that the number of possible combinations was almost endless, he says. "This is where the genius of these Polish mathematicians comes into play. They understood that these are all permutations, that is, arrangements of objects in a certain order. They looked into the mathematical theory of permutations and realized that you could break them down into cycles and use that as a lever to gradually narrow down this huge key space. With the machines created for that purpose, you could then search this huge set of keys in an automated way." From 1940 to the end of the war, two and a half million radio messages could thus be intercepted and deciphered.
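The odometer principle and the idea of breaking a permutation down into cycles can be illustrated with a toy model in Python. It is deliberately not a faithful Enigma: there is no reflector and no plugboard, the stepping is a plain odometer, and the rotor wirings are made-up permutations produced by the hypothetical helper `make_wiring`, not the historical ones.

```python
import string

ALPHABET = string.ascii_uppercase


def make_wiring(a: int, b: int) -> list:
    """Made-up rotor wiring i -> (a*i + b) mod 26; a must be coprime to 26."""
    return [(a * i + b) % 26 for i in range(26)]


class ToyRotorMachine:
    def __init__(self, wirings, positions):
        self.wirings = wirings
        self.positions = list(positions)

    def _step(self):
        # Odometer: the first wheel always moves; the next one moves
        # only when the wheel before it has completed a full turn.
        for i in range(len(self.positions)):
            self.positions[i] = (self.positions[i] + 1) % 26
            if self.positions[i] != 0:
                break

    def press(self, letter: str) -> str:
        self._step()
        c = ALPHABET.index(letter)
        for wiring, pos in zip(self.wirings, self.positions):
            c = (wiring[(c + pos) % 26] - pos) % 26
        return ALPHABET[c]


def cycle_lengths(perm: list) -> list:
    """Lengths of the cycles of a permutation, longest first."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        lengths.append(length)
    return sorted(lengths, reverse=True)


wirings = [make_wiring(5, 3), make_wiring(7, 11), make_wiring(11, 2)]
machine = ToyRotorMachine(wirings, [0, 0, 0])
first, second = machine.press("A"), machine.press("A")
print(first, second, cycle_lengths([1, 2, 0, 4, 3, 5]))
```

Pressing the same key twice gives two different letters because the first wheel has moved in between. The cycle lengths of a permutation stay the same when its letters are merely renamed, which gives a rough idea of why cycles are a useful lever on a large key space.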

The invention of provable security

Much has happened since the Caesar cipher, but the most impressive advance did not occur until 1984. "That's not so long ago," Jager says. "That is when so-called provable security was invented. Some people say that was when the art of encryption became the science of encryption. Since then, we've been able to build ciphers that are particularly good, provably secure in a mathematical sense, and practical." Dr.-Ing. Kai Gellert, a member of Jager's team, regularly offers courses titled "Modern Cryptography" that build on the 1984 findings. Secure ciphers matter to every citizen, for example in online banking. In addition to confidentiality, however, people have other requirements there, Jager notes: "If I transfer 10 euros to someone, I don't want someone to suddenly turn it into 100 euros." In the course of digitalization, cryptographic techniques are also becoming increasingly important on the Internet in general, where they are meant to protect not only banking secrecy but also the users themselves, because there have already been cases in which people's search behavior was exploited. Jager cites a prominent example. "There are reports, for example, that the company Cambridge Analytica analyzed the behavior of Internet users in order to specifically manipulate their voting behavior in U.S. elections and the Brexit vote. Such methods are made possible because we communicate with each other digitally. Cryptography can help there, too."
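The requirement that 10 euros must not silently become 100 euros is usually called integrity. One standard building block for it is a message authentication code. The following Python sketch uses HMAC with SHA-256 from the standard library; the article does not say which mechanism any particular bank uses, so the key, the message format and the function names are purely illustrative assumptions.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag that depends on both key and message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_authentic(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


# Hypothetical shared secret between customer and bank.
shared_key = b"example key known only to customer and bank"

order = b"transfer 10 EUR to Alice"
tag = make_tag(shared_key, order)

tampered = b"transfer 100 EUR to Alice"
print(is_authentic(shared_key, order, tag))     # genuine order
print(is_authentic(shared_key, tampered, tag))  # manipulated amount
```

Anyone who changes the amount in transit would also have to produce a matching tag, which is not feasible without the shared key.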

Modern encryption methods = new secret scripts

"There are, of course, new ciphers. These are today's modern encryption methods, all the cryptographic procedures that are still based on partly similar principles but are mathematically much better understood," Gellert explains. Starting from the original method of symmetric encryption, in which the sender enciphers and the receiver deciphers a message with the same shared key, complicated methods have developed in the computer age that take a long time to explain, even to students with technical expertise. A whole new world is opening up there, Gellert enthuses, and it must continuously take new dangers into account and ward them off. "You hear in the media that quantum computers are becoming more and more realistic, and they threaten certain mathematical problems on which our methods are based," Gellert says. As a result, researchers are currently looking for methods that can be standardized to protect against attacks by such computers. A recent decision by the U.S. standardization authority could be a first groundbreaking success in this regard. Says Gellert: "The American standardization authority (National Institute of Standards and Technology, or NIST for short) recently selected the first procedures that are now to be standardized after years of discussion. This will allow us to ensure the security of our data in the future."

History meets computer science

To understand both the approach of historical cryptography research and the way computer science applies secure cryptographic procedures, it first took a number of conversations in which the very different subjects explained their working methods to each other, because the modern cryptography community has a very different approach and works with different means and different literature. "There's also a technical component that plays into it," Gellert says, "because back then, people were still encrypting by hand. So these were letters that are difficult to access in the archives. Now with World War II, these cipher machines come along, you could intercept radio, and more materials come together on a topic that are also more readily available," and Jager adds, "Plus, in modern scholarship, these old ciphers are considered insecure, so they're not as interesting." The big problem with old documents, he says, is the way they are stored. Various documents lie in various boxes on endless rows of shelves in the archives and have to be sifted through by hand, he says. They are not digitized and can therefore only be viewed in a very time-consuming manner, Jager notes. "At this point, it becomes interesting again for us computer scientists, because the question is not so much how to crack it, but how to crack it automatically on a large scale, i.e. without this laborious work. That's where it gets algorithmically interesting for us again, because you can now teach the computer to decrypt these things. What we always need is a computer-readable representation." Once this prerequisite is met, Jonas von der Heyden explains, "it is interesting to develop algorithms that can then be used to automatically examine and decrypt encrypted documents. For this, we then need historians to check the results for plausibility and also explain symbols that can then be entered into the program." This is where the third subject, "Digital Humanities," headed by Prof. Dr. Patrick Sahle, comes into play.
He acts as a mediator, so to speak, between history and computer science; he knows how these historical documents can be digitized.
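What automatic decryption with a plausibility check by historians might look like in the very simplest case can be sketched in Python. The example is a toy and not the method of the Wuppertal team: it only handles a Caesar shift, tries all 26 keys, and keeps those results that contain a word a historian expects in the text. The function names and the sample sentence are invented for illustration.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_letters(text: str, shift: int) -> str:
    """Caesar shift on A-Z; all other characters are left unchanged."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % 26] if ch in ALPHABET else ch
        for ch in text.upper()
    )


def candidate_plaintexts(ciphertext: str, expected_word: str) -> list:
    """Try every key and keep the decryptions that contain the expected word."""
    hits = []
    for shift in range(26):
        guess = shift_letters(ciphertext, -shift)
        if expected_word.upper() in guess:
            hits.append((shift, guess))
    return hits


# A transcribed, computer-readable ciphertext (here produced artificially).
ciphertext = shift_letters("THE DUKE WANTS PEACE WITH VENICE", 7)

# The historian suspects that the letter is about peace negotiations.
hits = candidate_plaintexts(ciphertext, "peace")
print(hits)
```

The computer does the tedious search through the keys, while the expected word plays the role of the historian's knowledge about what the letter is likely to contain. Real historical ciphers such as the Milanese ones have far larger key spaces and need much more elaborate search strategies.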

Authenticity of documents must be preserved

Sahle believes that preparing historical encoded documents for modern research is already largely standard practice. "However, it becomes more difficult in smaller archives or those that do not have their own digitization department," the expert explains. "For this, however, there is 'semi-professional' equipment with which users can make digital copies themselves, if the archives allow it." If that is not the case, one must unfortunately fall back on simple cell phone photos, photocopies or, for example, the microfilms that already exist in archives. The digital copies need descriptions in the form of 'metadata', and here, too, there is an established practice that can be followed. Digital editions are now published both as facsimiles, i.e. images, and as transcriptions. "In the process, the transcriptions can also be normalized, regularized, modernized, or transferred to another writing system in several versions, so that they can be used either closer to the source or closer to the user."

Of course, the cultural-historical dimension and authenticity of the documents should not be lost, Sahle explains. "For transparency and verifiability of processing and decryption, one would also like to reproduce the ciphered texts in the cipher characters themselves. Creating an appropriate font (typeface) is not rocket science and involves 'medium effort'." Since there are a great many cipher keys and widely differing character systems, he says, the main thing to examine is the extent to which such processes can be automated. "To this end, we are testing image recognition algorithms with colleagues in Erlangen that can independently identify and cluster the characters that occur. The next step would then be automatic font generation. However, this still requires some development effort and will probably not be able to run completely without human intervention."

Until it is possible to algorithmically evaluate old manuscripts in archives across the board using computer programs, historians such as Dr. Jessika Nowak will continue to meticulously research encrypted documents and letters about intrigue, love, power and rule for a long time to come, and will be extremely pleased about even small decryptions.

Uwe Blass

Dr. Jessika Nowak works as a research assistant in medieval history. Prof. Dr.-Ing. Tibor Jager heads the IT Security and Cryptography department of the Faculty of Electrical Engineering, Information Technology and Media Technology. His colleagues include Dr.-Ing. Kai Gellert and Jonas von der Heyden. Prof. Dr. Patrick Sahle is head of Digital Humanities in the History Department.
