The Voynich Manuscript

The Voynich manuscript is an illustrated codex hand-written in an unknown writing system. The vellum on which it is written has been carbon-dated to the early 15th century (1404–1438), and the manuscript may have been composed in Northern Italy during the Italian Renaissance. The manuscript is named after Wilfrid Voynich, a Polish book dealer who purchased it in 1912.

Some of the pages are missing, but about 240 remain. The text is written from left to right, and most of the pages have illustrations or diagrams.

The Voynich manuscript has been studied by many professional and amateur cryptographers, including American and British codebreakers from both World War I and World War II. No one has yet succeeded in deciphering the text, and it has become a famous case in the history of cryptography. The mystery of the meaning and origin of the manuscript has excited the popular imagination, making the manuscript the subject of novels and speculation. None of the many hypotheses proposed over the last hundred years has yet been independently verified.

The Voynich manuscript was donated by Hans P. Kraus to Yale University's Beinecke Rare Book and Manuscript Library in 1969, where it is catalogued under call number MS 408.

An award-winning professor from the University of Bedfordshire has followed in the footsteps of Indiana Jones by cracking the code of a 600-year-old manuscript, deemed ‘the most mysterious’ document in the world.

Stephen Bax, Professor of Applied Linguistics, has just become the first professional linguist to crack the code of the Voynich manuscript using an analytical approach.

The world-renowned manuscript is full of illustrations of exotic plants, stars, and mysterious human figures, as well as many pages written in an unknown text.

Up until now the 15th-century cryptic work has baffled scholars, cryptographers and codebreakers, who have failed to read a single letter of the script or any word of the text.

Over time it has attained an infamous reputation, even featuring in the latest hit computer game Assassin’s Creed, as well as in the Indiana Jones novels, when Indiana decoded the Voynich and used it to find the ‘Philosopher's Stone’.

However, in reality no one has come close to revealing the Voynich’s true messages.

Many grand theories have been proposed. Some suggest it was the work of Leonardo da Vinci as a boy, or secret Cathars, or the lost tribe of Israel, or most recently Aztecs … some have even proclaimed it was done by aliens!

Professor Bax, however, has begun to unlock the mystery meanings of the Voynich manuscript using his wide knowledge of mediaeval manuscripts and his familiarity with Semitic languages such as Arabic. Using careful linguistic analysis, he is working on the script letter by letter.

“I hit on the idea of identifying proper names in the text, following historic approaches which successfully deciphered Egyptian hieroglyphs and other mystery scripts, and I then used those names to work out part of the script,” explained Professor Bax.

“The manuscript has a lot of illustrations of stars and plants. I was able to identify some of these, with their names, by looking at mediaeval herbal manuscripts in Arabic and other languages, and I then made a start on a decoding, with some exciting results.”

The manuscript measures 23.5 by 16.2 by 5 centimetres (9.3 by 6.4 by 2.0 in), with hundreds of vellum pages collected into eighteen quires. The total number of pages is around 240, but the exact number depends on how the manuscript's unusual foldouts are counted. The quires have been numbered from 1 to 20 in various locations, with numerals consistent with the 1400s, and the top right-hand corner of each recto (right-hand) page has been numbered from 1 to 116, with numerals of a later date. From the various numbering gaps in the quires and pages, it seems likely that in the past the manuscript had at least 272 pages in 20 quires, some of which were already missing when Wilfrid Voynich acquired the manuscript in 1912. There is strong evidence that many of the book's bifolios were reordered at various points in its history, and that the original page order may well have been quite different from what it is today.

The binding and covers are not original to the book, but date to its possession by the Collegio Romano.

Every page in the manuscript contains text, mostly in an unknown script, but some have extraneous writing in Latin script. Many pages contain substantial drawings or charts which are colored with paint. Modern analysis has determined that a quill pen and iron gall ink were used for the text and figure outlines; the colored paint was applied (somewhat crudely) to the figures, possibly at a later date.

Among the words Professor Bax has identified is the term for Taurus, alongside a picture of seven stars which seem to be the Pleiades, and also the word KANTAIRON alongside a picture of the plant Centaury, a known mediaeval herb, as well as a number of other plants.

Although Professor Bax’s decoding is still only partial, it has generated a lot of excitement in the world of codebreaking and linguistics because it could prove a crucial breakthrough for an eventual full decipherment.

“My aim in reporting on my findings at this stage is to encourage other linguists to work with me to decode the whole script using the same approach, though it still won’t be easy. That way we can finally understand what the mysterious authors were trying to tell us,” he added.

“But already my research shows conclusively that the manuscript is not a hoax, as some have claimed, and is probably a treatise on nature, perhaps in a Near Eastern or Asian language.”

Marcelo Montemurro, a theoretical physicist from the University of Manchester, UK, has spent many years analysing its linguistic patterns. He hopes to unravel the manuscript's mystery and believes his new research brings that goal one step closer.

"The text is unique, there are no similar works and all attempts to decode any possible message in the text have failed. It's not easy to dismiss the manuscript as simple nonsensical gibberish, as it shows a significant [linguistic] structure," he says.
Dr Montemurro and a colleague used a computerised statistical method to analyse the text, an approach that has been known to work on other languages.
They focused on patterns of how the words were arranged in order to extract meaningful content-bearing words.
"There is substantial evidence that content-bearing words tend to occur in a clustered pattern, where they are required as part of the specific information being written," he explains.
"Over long spans of texts, words leave a statistical signature about their use. When the topic shifts, other words are needed.
"The semantic networks we obtained clearly show that related words tend to share structure similarities. This also happens to a certain degree in real languages."
Dr Montemurro believes it unlikely that these features were simply "incorporated" into the text to make a hoax more realistic, as most of the required academic knowledge of these structures did not exist at the time the Voynich manuscript was created.
Though he has found a pattern, what the words mean remains a mystery. The very fact that a century of brilliant minds have analysed the work with little progress means some believe a hoax is the only likely explanation.
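The clustering idea can be illustrated with a much simpler measure than the published analysis. The sketch below is an assumption for illustration, not the researchers' method: it splits a text into equal blocks and computes the entropy of how one word is spread across them. A word confined to one block scores 0 bits, while a word spread evenly over four blocks scores 2 bits. The function name and the toy word lists are hypothetical.

```python
from collections import Counter
from math import log2

def block_entropy(words, target, n_blocks):
    """Entropy in bits of how `target` is spread over n_blocks equal blocks.

    Low values mean the word is clustered; high values mean it is spread
    evenly. Returns None if the word never occurs.
    """
    size = -(-len(words) // n_blocks)  # ceiling division: words per block
    counts = Counter(i // size for i, w in enumerate(words) if w == target)
    total = sum(counts.values())
    if total == 0:
        return None
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Toy word lists (hypothetical, not taken from the manuscript)
clustered = ["t", "t", "a", "a", "b", "b", "c", "c"]
spread = ["e", "a", "e", "b", "e", "c", "e", "d"]
print(block_entropy(clustered, "t", 4))
print(block_entropy(spread, "e", 4))
```

A real analysis would also have to correct for word frequency, since rare words look clustered by chance.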
Unidentified language

Gordon Rugg, a mathematician from Keele University, UK, is one such academic. He has even produced his own complex code deliberately similar to "Voynichese" to show how a text can appear to have meaningful patterns, even though it is "gibberish hoax text".

He says the new findings do not rule out the hoax theory, contrary to what the researchers argue.

"The findings aren't anything new. It's been accepted for decades that the statistical properties of Voynichese are similar, but not identical, to those of real languages.

"I don't think there's much chance that the Voynich manuscript is simply an unidentified language, because there are too many features in its text that are very different from anything found in any real language."


Gordon Rugg does not believe it contains an unknown code, which is another theory of what the text may be: "Some of the features of the manuscript's text, such as the way that it consists of separate words, are inconsistent with most methods of encoding text. Modern codes almost invariably avoid having separate words, as those would be an easy way to crack most coding systems."

As to its enduring appeal, an unsolved cipher could be "hiding almost anything", says Craig Bauer, author of Secret History: The Story of Cryptology.

"It could solve a major crime, reveal buried treasure worth millions or in the case of the Voynich manuscript, rewrite the history of science," he adds.

Dr Bauer's opinion of whether it is meaningful is often swayed, he admits. While he recently believed it to be a hoax, the new analysis has now shifted his opinion.

But despite this, he still believes it is a made-up language, as opposed to a real, naturally evolving one, or "it would have been broken years ago".

"However, I still feel that it's very much an open question and I may change my mind a few times before a proof is obtained one way or the other."

But Dr Montemurro is firm in his belief, and argues that the hoax hypothesis cannot possibly explain the semantic patterns he has discovered.

He is aware that his analysis leaves many questions unanswered, such as whether the text is an encoded version of a known language or a totally invented language.

"After this study, any new support for the hoax hypothesis should address the emergence of this sophisticated structure explicitly. So far, this has not been done.

"There must be a story behind it, which we may never know," Dr Montemurro adds.

The bulk of the text in the manuscript of 240 pages is written in an unknown script, running left to right. Most of the characters are composed of one or two simple pen strokes. While there is some dispute as to whether certain characters are distinct or not, a script of 20–25 characters would account for virtually all of the text; the exceptions are a few dozen rarer characters that occur only once or twice each. There is no obvious punctuation.

Much of the text is written in a single column in the body of a page, with a slightly ragged right margin and paragraph divisions, and sometimes with stars in the left margin. Other text occurs in charts or as labels associated with illustrations. There are no indications of any errors or corrections made at any place in the document. The ductus flows smoothly, giving the impression that the symbols were not enciphered, as there is no delay between characters as would normally be expected in written encoded text.

The text consists of over 170,000 characters, with spaces dividing the text into about 35,000 groups of varying length, usually referred to as "words". The structure of these words seems to follow phonological or orthographic laws of some sort, e.g., certain characters must appear in each word (like English vowels), some characters never follow others, some may be doubled or tripled but others may not, etc. The distribution of letters within words is also rather peculiar: some characters occur only at the beginning of a word, some only at the end, and some always in the middle section. Many researchers have commented upon the highly regular structure of the words.
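The positional restrictions described above are easy to tabulate once the text is in a Latin transcription. The following is a minimal sketch, assuming a space-separated transcription; the sample words are a hypothetical illustration and are not quoted from any particular page.

```python
from collections import defaultdict

def positional_profile(words):
    """Map each character to the set of positions it occupies within words.

    Positions are 'initial', 'medial' and 'final'. In a one-character word
    the character counts as both initial and final.
    """
    profile = defaultdict(set)
    for word in words:
        last = len(word) - 1
        for i, ch in enumerate(word):
            if i == 0:
                profile[ch].add("initial")
            if i == last:
                profile[ch].add("final")
            if 0 < i < last:
                profile[ch].add("medial")
    return dict(profile)

# Hypothetical sample in a Latin transcription alphabet
sample = "qokedy daiin shedy qokaiin chol dain".split()
profile = positional_profile(sample)
for ch in sorted(profile):
    print(ch, sorted(profile[ch]))
```

Characters whose set contains a single position are candidates for the "only at the beginning" or "only at the end" behaviour, although a sample this small proves nothing by itself.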

Some words occur only in certain sections, or in only a few pages; others occur throughout the manuscript. There are very few repetitions among the thousand or so labels attached to the illustrations. There are practically no words with fewer than two letters or more than ten. There are instances where the same common word appears up to three times in a row. Words that differ by only one letter also repeat with unusual frequency, causing single-substitution alphabet decipherings to yield babble-like text. In 1962, Elizebeth Friedman described such attempts as "doomed to utter frustration".

Various transcription alphabets have been created to equate the Voynich characters with Latin characters in order to help with cryptanalysis, such as the European Voynich Alphabet. The first major one was created by cryptographer William F. Friedman in the 1940s; each line of the manuscript was transcribed to an IBM punch card to make it machine readable.

Only a few words in the manuscript are considered not to be written in the unknown script:
f1r: A sequence of Latin letters in the right margin parallel with characters from the unknown script. There is also the now unreadable signature of "Jacobj à Tepenece" in the bottom margin.
f17r: A line of writing in the Latin script in the top margin.
f70v–f73v: The astrological series of diagrams in the astronomical section has the names of ten of the months (from March to December) written in Latin script, with spelling suggestive of the medieval languages of France, northwest Italy or the Iberian Peninsula.
f66r: A small number of words in the bottom left corner near a drawing of a naked man. They have been read as "der musz del", a High German word for a widow's share.
f116v: Four lines of writing written in rather distorted Latin script, except for two words in the unknown script. The words in Latin script appear to be distorted with characteristics of the unknown language. The lettering resembles European alphabets of the late 14th and 15th centuries, but the words do not seem to make sense in any language.

It is not known whether these bits of Latin script were part of the original text or were added later.

Because the text cannot be read, the illustrations are conventionally used to divide most of the manuscript into six different sections. Each section is typified by illustrations with different styles and supposed subject matter, except for the last section, in which the only drawings are small stars in the margin. Following are the sections and their conventional names:
Herbal: Each page displays one or two plants and a few paragraphs of text—a format typical of European herbals of the time. Some parts of these drawings are larger and cleaner copies of sketches seen in the "pharmaceutical" section. None of the plants depicted are unambiguously identifiable.
Astronomical: Contains circular diagrams, some of them with suns, moons, and stars, suggestive of astronomy or astrology. One series of 12 diagrams depicts conventional symbols for the zodiacal constellations (two fish for Pisces, a bull for Taurus, a hunter with crossbow for Sagittarius, etc.). Each of these has 30 female figures arranged in two or more concentric bands. Most of the females are at least partly naked, and each holds what appears to be a labeled star or is shown with the star attached by what could be a tether or cord of some kind to either arm. The last two pages of this section (Aquarius and Capricornus, roughly January and February) were lost, while Aries and Taurus are split into four paired diagrams with 15 women and 15 stars each. Some of these diagrams are on fold-out pages.
Biological: A dense continuous text interspersed with figures, mostly showing small naked women, some wearing crowns, bathing in pools or tubs connected by an elaborate network of pipes.
Cosmological: More circular diagrams, but of an obscure nature. This section also has foldouts; one of them spans six pages and contains a map or diagram, with nine "islands" or "rosettes" connected by "causeways" and containing castles, as well as what might be a volcano.
Pharmaceutical: Many labeled drawings of isolated plant parts (roots, leaves, etc.); objects resembling apothecary jars, ranging in style from the mundane to the fantastical; and a few text paragraphs.
Recipes: Full pages of text broken into many short paragraphs, each marked with a star in the left margin.

The overall impression given by the surviving leaves of the manuscript is that it was meant to serve as a pharmacopoeia or to address topics in medieval or early modern medicine. However, the puzzling details of illustrations have fueled many theories about the book's origins, the contents of its text, and the purpose for which it was intended.

The first section of the book is almost certainly herbal, but attempts to identify the plants, either with actual specimens or with the stylized drawings of contemporary herbals, have largely failed. Only a few of the plant drawings (such as a wild pansy and the maidenhair fern) can be identified with reasonable certainty. Those herbal pictures that match pharmacological sketches appear to be clean copies of these, except that missing parts were completed with improbable-looking details. In fact, many of the plant drawings in the herbal section seem to be composite: the roots of one species have been fastened to the leaves of another, with flowers from a third.

Hugh O'Neill believed that one illustration depicted a New World sunflower, which would help date the manuscript and open up intriguing possibilities for its origin; unfortunately the identification is only speculative.

The basins and tubes in the "biological" section are sometimes interpreted as implying a connection to alchemy, yet bear little obvious resemblance to the alchemical equipment of the period.

Astrological considerations frequently played a prominent role in herb gathering, bloodletting and other medical procedures common during the likeliest dates of the manuscript. However, apart from the obvious Zodiac symbols, and one diagram possibly showing the classical planets, interpretation remains speculative.

Some suspected Voynich of having fabricated the manuscript himself. As an antique book dealer, he probably had the necessary knowledge and means, and a "lost book" by Roger Bacon would have been worth a fortune. Furthermore, Baresch's letter (and Marci's as well) establish only the existence of a manuscript, not that the Voynich manuscript is the same one spoken of there. In other words, these letters could possibly have been the motivation for Voynich to fabricate the manuscript (assuming he was aware of them), rather than proofs authenticating it. However, many consider the expert internal dating of the manuscript and the recent discovery of Baresch's letter to Kircher as having eliminated this possibility.

Voynich was able, sometime before 1921, to read a name faintly written at the foot of the manuscript's first page: "Jacobj à Tepenece". This is taken to be a reference to Jakub Hořčický of Tepenec (1575–1622), also known by his Latin name Jacobus Sinapius. Rudolf II had ennobled him in 1607, appointed him Imperial Distiller, and made him curator of his botanical gardens as well as one of his personal physicians. Voynich, and many other people after him, concluded from this that Jacobus owned the Voynich manuscript prior to Baresch, and drew a link to Rudolf's court from that, in confirmation of Mnishovsky's story.

Jacobus's name is still clearly visible under UV light; however, it does not match the copy of his signature in a document located by Jan Hurych in 2003. As a result, it has been suggested that the signature was added later, possibly even fraudulently by Voynich himself. Yet because the writing on page f1r might well have been an ownership mark added by a librarian at the time, the difference between the two signatures does not necessarily disprove Hořčický's ownership.

It has been noted that Baresch's letter bears some resemblance to a hoax that orientalist Andreas Mueller once played on Kircher. Mueller sent some unintelligible text to Kircher with a note explaining that it had come from Egypt, and asking Kircher for a translation, which Kircher reportedly produced at once. It has been speculated that both were cryptographic tricks played on Kircher to make him look foolish. However, the Voynich manuscript is on a vastly larger scale than a few signs in a letter, which makes it seem disproportionate for such a prank.

Raphael Mnishovsky, the friend of Marci who was the reputed source of the story of Bacon's authorship, was himself a cryptographer (among many other things) and apparently invented a cipher that he claimed was uncrackable (ca. 1618). This has led to the speculation that Mnishovsky might have produced the Voynich manuscript as a practical demonstration of his cipher and made Baresch his unwitting test subject. Indeed, the disclaimer in the Voynich manuscript cover letter could mean that Marci suspected some kind of deception was at play. However, there is no definite evidence for this theory.

In his 2006 book, Nick Pelling proposed that the Voynich manuscript was written by the 15th-century North Italian architect Antonio Averlino (also known as "Filarete"), a theory broadly consistent with the radiocarbon dating.

Richard SantaColoma has speculated that the Voynich Manuscript may be connected to Cornelis Drebbel, initially suggesting it was Drebbel's cipher notebook on microscopy and alchemy, and then later hypothesising it is a fictional "tie-in" to Francis Bacon's utopian novel New Atlantis, in which some Drebbel-related items (submarine, perpetual clock) are said to appear.

According to the "letter-based cipher" theory, the Voynich manuscript contains a meaningful text in some European language that was intentionally rendered obscure by mapping it to the Voynich manuscript "alphabet" through a cipher of some sort, that is, an algorithm that operated on individual letters. This has been the working hypothesis for most twentieth-century deciphering attempts, including an informal team of NSA cryptographers led by William F. Friedman in the early 1950s. The main argument for this theory is that the use of a strange alphabet by a European author is awkward to explain except as an attempt to hide information. Indeed, even Roger Bacon knew about ciphers, and the estimated date for the manuscript roughly coincides with the birth of cryptography in Europe as a relatively systematic discipline.

The counterargument is that almost all cipher systems consistent with that era fail to match what we see in the Voynich manuscript. For example, simple monoalphabetic ciphers can be excluded because the distribution of letter frequencies does not resemble that of any common language; while the small number of different letter-shapes used implies that we can rule out nomenclator ciphers and homophonic ciphers, because these typically employ larger cipher alphabets. Similarly, polyalphabetic ciphers, first invented by Alberti in the 1460s and including the later Vigenère cipher, usually yield ciphertexts where all cipher shapes occur with roughly equal probability, quite unlike the language-like letter distribution the Voynich Manuscript appears to have.
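The difference between the two cipher families can be made concrete with the index of coincidence, a standard statistic that is high for language-like letter distributions and low for flat ones. The sketch below is an illustration on Latin letters only; the key values and sample sentence are arbitrary assumptions, and it says nothing about the manuscript itself. A monoalphabetic substitution leaves the statistic unchanged, because it only relabels the letters, whereas a Vigenère cipher typically flattens the distribution and lowers it.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase

def index_of_coincidence(text):
    """Probability that two letters drawn at random from text are equal."""
    letters = [c for c in text.lower() if c in ALPHABET]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))

def substitute(text, key):
    """Monoalphabetic substitution; key is a 26-letter permutation."""
    return text.lower().translate(str.maketrans(ALPHABET, key))

def vigenere(text, key):
    """Polyalphabetic (Vigenere) encipherment with a lowercase key."""
    out, i = [], 0
    for c in text.lower():
        if c in ALPHABET:
            shift = ALPHABET.index(key[i % len(key)])
            out.append(ALPHABET[(ALPHABET.index(c) + shift) % 26])
            i += 1
        else:
            out.append(c)  # leave spaces and punctuation untouched
    return "".join(out)

# Arbitrary sample sentence and keys, chosen only for illustration
plain = "the manuscript has been studied by many professional cryptographers"
shift_key = ALPHABET[1:] + ALPHABET[0]
print(index_of_coincidence(plain))
print(index_of_coincidence(substitute(plain, shift_key)))
print(index_of_coincidence(vigenere(plain, "lemon")))
```

Because the first two printed values are always identical, letter statistics alone cannot distinguish a plain text from a simple substitution of it; this is why an unfamiliar frequency profile counts against that kind of cipher.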

However, the presence of many tightly grouped shapes in the Voynich manuscript (such as "or", "ar", "ol", "al", "an", "ain", "aiin", "air", "aiir", "am", "ee", "eee", etc.) does suggest that its cipher system may make use of a "verbose cipher", where single letters in a plaintext get enciphered into groups of fake letters. For example, the first two lines of page f15v contain "or or or" and "or or oro r", which strongly resemble how Roman numerals such as "CCC" or "XXXX" would look if verbosely enciphered. Yet, even though verbose encipherment is arguably the best match, it still falls well short of being able to explain all of the Voynich manuscript's odd textual properties.
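A verbose cipher of this kind takes only a few lines to sketch. The substitution table below is entirely hypothetical, chosen so that the Roman numeral "CCC" produces the repeated group mentioned above; no actual mapping for the manuscript is known.

```python
def verbose_encipher(plaintext, table, sep=" "):
    """Replace each plaintext symbol with its multi-character group."""
    return sep.join(table[ch] for ch in plaintext)

# Hypothetical table for illustration only
TABLE = {"C": "or", "X": "ar", "I": "ol"}

print(verbose_encipher("CCC", TABLE))  # or or or
```

Because one plaintext symbol becomes several ciphertext characters, the ciphertext is longer than the plaintext and repeats short groups far more often than ordinary writing does.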

It is also entirely possible that the encryption system started from a fundamentally simple cipher and then augmented it by adding nulls (meaningless symbols), homophones (duplicate symbols), transposition (letter rearrangement), false word breaks, and so on.

According to the "codebook cipher" theory, the Voynich manuscript "words" would actually be codes to be looked up in a "dictionary" or codebook. The main evidence for this theory is that the internal structure and length distribution of many words are similar to those of Roman numerals—which, at the time, would be a natural choice for the codes. However, book-based ciphers are viable only for short messages, because they are very cumbersome to write and to read.


According to the steganography theory, the text of the Voynich manuscript is mostly meaningless, but contains meaningful information hidden in inconspicuous details, e.g. the second letter of every word, or the number of letters in each line. This technique, called steganography, is very old, and was described by Johannes Trithemius in 1499. Though it has been speculated that the plain text was to be extracted by a Cardan grille of some sort, this seems somewhat unlikely because the words and letters are not arranged on anything like a regular grid. Still, steganographic claims are hard to prove or disprove, since stegotexts can be arbitrarily hard to find. An argument against steganography is that having a cipher-like cover text highlights the very existence of the secret message, which would be self-defeating. Yet because the cover text equally resembles an unknown natural language, this argument is not hugely persuasive.
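The two extraction rules mentioned as examples, the second letter of every word and the number of letters in each line, can be sketched as follows. The cover text is an invented English example, and nothing here implies that either rule applies to the manuscript.

```python
def second_letters(cover_text):
    """Collect the second letter of every word that has one."""
    return "".join(w[1] for w in cover_text.split() if len(w) > 1)

def line_lengths(cover_text):
    """Count the non-space characters in each line."""
    return [sum(1 for c in line if not c.isspace())
            for line in cover_text.splitlines()]

# Invented cover text for illustration
cover = "ohm red old spy"
print(second_letters(cover))         # help
print(line_lengths("ab cd\nefg"))    # [4, 3]
```

The difficulty noted above follows directly: the number of possible extraction rules is unbounded, so failing to find a message with any particular rule rules out very little.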

It has been suggested that the meaningful text could be encoded in the length or shape of certain pen strokes. There are indeed examples of steganography from about that time that use letter shape (italic vs. upright) to hide information. However, when examined at high magnification, the Voynich manuscript pen strokes seem quite natural, and substantially affected by the uneven surface of the vellum.

Natural language

Statistical analysis of the text reveals patterns similar to those of natural languages. For instance, the word entropy (about 10 bits per word) is similar to that of English or Latin texts. In 2013, Diego Amancio et al. argued that the Voynich manuscript "is mostly compatible with natural languages and incompatible with random texts".
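Word entropy here can be read as the Shannon entropy of the word frequency distribution. The sketch below is one plausible way to compute it, under the assumption that words are treated as independent draws; the cited figure may rest on a different estimator. For scale, 10 bits is the entropy of a uniform choice among 2^10 = 1024 distinct words.

```python
from collections import Counter
from math import log2

def word_entropy(words):
    """Shannon entropy, in bits per word, of the word frequency distribution."""
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Four equally likely words carry 2 bits each
print(word_entropy(["a", "b", "c", "d"]))  # 2.0
```

The estimate depends on sample size, so comparisons between texts are only meaningful when the texts are of similar length.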

The linguist Jacques Guy once suggested that the Voynich manuscript text could be some little-known natural language, written in the plain with an invented alphabet. The word structure is similar to that of many language families of East and Central Asia, mainly Sino-Tibetan (Chinese, Tibetan, and Burmese), Austroasiatic (Vietnamese, Khmer, etc.) and possibly Tai (Thai, Lao, etc.). In many of these languages, the words have only one syllable; and syllables have a rather rich structure, including tonal patterns. This theory has some historical plausibility. While those languages generally had native scripts, these were notoriously difficult for Western visitors. This difficulty motivated the invention of several phonetic scripts, mostly with Latin letters but sometimes with invented alphabets. Although the known examples are much later than the Voynich manuscript, history records hundreds of explorers and missionaries who could have done it, even before Marco Polo's thirteenth-century journey, but especially after Vasco da Gama sailed the sea route to the Orient in 1499.