THE FANTASIES & FALLACIES OF UNIVERSAL (DIGITAL) KNOWLEDGE

In 1961, the staff at the Bureau of the Census had access to computers for the first time. In order to simplify some of the data analysis that the Census Bureau must conduct, they used new computers to create the “micro-aggregation files” that contain statistical information. This information had been entered on punch cards in earlier censuses, but magnetic tape was the storage medium of choice in the ’60s. The Bureau of the Census had the required data stored on 9,121 reels of magnetic tape: 7,297 reels “readable” with UNIVAC II-A tape drives; 1,678 tapes “readable” with UNIVAC III-A tape drives, and another 146 magnetic tapes created on still other brands of tape drives and “readable” with the new contemporary industry-compatible tape drives. The reports needed were generated and printed on paper. Once the reports were completed, the tapes were placed in storage.[i]

When the United States Census Bureau spent countless hours recording statistical information onto UNIVAC magnetic tape in 1961, it was unaware that the information would nearly be lost fifteen years later. Not until this near disaster, around 1976, did the Census Bureau realize just how precious its data was.

Prior to the use of magnetic tape read by the first generations of computers, the Census Bureau manually recorded data onto punch cards. These punch cards were then used to generate reports which, in this particular case, could be described as “mostly demographic in nature; they described the ethnic make–up of the U.S. population…American migration patterns…”[i]

The new data recording technology used by the US Census Bureau saved time, money, and space. Such a technological advance promised to completely alter the way in which the Bureau operated. The recorded data which previously took up countless boxes within multiple storage rooms could now be saved onto compact, sleek black reels. The Census Bureau must have been enthused by the new method of data storage, all the more so because this was UNIVAC’s second iteration of the magnetic tape, making it state of the art.[ii] Fourteen years after the reels were archived into “permanent storage,” a consultation with the National Archives led to a review of the micro-aggregation files which the Census Bureau had saved from the 1960 Census.[i] Upon this review, the Bureau realized the magnitude of the dilemma. Seven of the low-level micro-aggregations were deemed to have “long-term” value and plans were made to transfer the data from the magnetic tapes to a contemporary method of data storage. Unfortunately, like many other forms of contemporary data storage, the fifteen-year-old devices used to read the magnetic tape data had quickly become obsolete, which resulted in a “major engineering challenge” to transfer the data.

If the first risk was the extinction of reading devices, the second risk which threatened the magnetic reels was related to the environment in which they were “permanently stored.” This factor was not recognized until the transfer process had already begun and, because this new storage technology was unfamiliar to the Census Bureau, its staff had no knowledge of the storage conditions (primarily temperature and humidity) best suited to preserving the tape. This innocent negligence on the Bureau’s part was all but impossible to avoid, as there had never been reason to question the durability of the media used to store data. The punch cards used in the past were simply packaged and placed in storage with little thought to how the paper might decay or deteriorate over extended periods of time. The material had a proven durability to which no one ever gave a second thought and, because there had only ever been one method of recording data in the Census Bureau’s history, there was no expectation that an environmental storage protocol would be needed for the new magnetic tape. The Bureau was to find out later that, despite its efforts, 0.7% of the data could not be recovered due to deterioration or human error.[i]

On the surface, data storage may appear to be a lifeless and mechanical subject, one with which only reclusive technocrats are concerned. However, in the following discussion we will examine the fallacies, the fantasies (both past and present), and the consequences surrounding contemporary forms of data storage. As a result we will be able to pull back the curtain which has hidden the truths about digital data storage and see how this seemingly abstract substance reveals itself in our daily lives.


THE FANTASY OF UNIVERSAL KNOWLEDGE

There is, in our contemporary lives, a growing fantasy in which all the world’s knowledge is contained within a single, uniform platform with complete accessibility. This notion of infinite and completely accessible knowledge dates back much further than the birth of the computer or that of the internet. The Enlightenment architect Étienne-Louis Boullée proposed several of these ideas in his Deuxième projet pour la Bibliothèque du Roi in 1785. During the Enlightenment the perceived abundance of new knowledge, manifested above all in the encyclopaedic efforts of Diderot and d’Alembert, inspired Boullée to propose a state-run library making all of its contents open to the public. The notion of infinity within Boullée’s proposal is based upon the assumption that the library continues beyond the frame of the image: “These seemingly endless bookcases were open and easily browsable, in dramatic contrast to the earlier medieval system of chaining that bound both books, and readers, to a specific location. Visitors [were] free to wander about and converse in small groups, but there is no provision of study desks or chairs for extensive research in this idealized environment.”[iii] In Boullée’s proposal a massive skylight is placed where the two vaults would, in reality, meet. The skylight runs the entire length of the library to provide natural daylight throughout the space dedicated to the public. This wondrous structural suggestion is as fantastical as Boullée’s concept of the infinite bureau, in which any published work is attainable by the public.

Taking Boullée’s ideas one step further, the Argentinian short story writer Jorge Luis Borges proposed an even more outlandish idea within his 1941 short story “The Library of Babel.” In his scheme, Borges conceived of hexagonal rooms housing books of exactly 410 pages, with 3,200 characters per page.[iv] These rooms would be linked together, with two sides of each hexagon serving as entry and exit points. Thus, Borges developed a unique spatial template which could be continuously expanded upon until every possible arrangement of characters had been written. A writer from Brooklyn, Jonathan Basile, describes the Library as one that would contain “every book that ever has been written, and every book that ever could be – including every play, every song, every scientific paper, every legal decision, every constitution, every piece of scripture, and so on.”[v] This notion of rearranging and reorganizing characters, words, and numbers is one with which we are now much more familiar. The fantasy is continually propelled forward by breakthroughs such as the internet and cloud storage, which suggest that, one day, we may all have the digital version of Borges’ Library sitting in our home office.

Basile certainly believed this. More than 70 years after Borges’ provocative story, Basile has attempted to recreate the fantasy and believes that there must be a digital equivalent to the Total Library. His investigations have revealed the magnitude of the library’s seeming endlessness: Basile states it would take “about 10^4668 years to go through the library,” assuming one clicked through each book at a rate of one per second.[v]
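The scale of such a library can be checked with a few lines of arithmetic. The sketch below is illustrative only: the page and book dimensions come from the text above, while the alphabet size of 25 symbols is an assumption not stated in this essay (digital recreations may use a different character set), and the function and constant names are hypothetical. Because the counts are far too large to print in full, the code works with base-10 logarithms.

```python
import math


def log10_arrangements(alphabet_size: int, length: int) -> float:
    """Return log10(alphabet_size ** length), i.e. the order of magnitude of
    the number of distinct strings of `length` symbols, without ever
    building the enormous integer itself."""
    return length * math.log10(alphabet_size)


# Dimensions taken from the text above.
CHARS_PER_PAGE = 3200
PAGES_PER_BOOK = 410

# ASSUMPTION (not from this essay): a 25-symbol alphabet.
ALPHABET_SIZE = 25

# log10 of the seconds in a 365.25-day year, used to convert
# "one item per second" into years.
LOG10_SECONDS_PER_YEAR = math.log10(365.25 * 24 * 3600)

page_exp = log10_arrangements(ALPHABET_SIZE, CHARS_PER_PAGE)
book_exp = log10_arrangements(ALPHABET_SIZE, CHARS_PER_PAGE * PAGES_PER_BOOK)

print(f"distinct pages: about 10^{math.floor(page_exp)}")
print(f"distinct books: about 10^{math.floor(book_exp)}")
print(
    "years to view every page at one per second: about "
    f"10^{math.floor(page_exp - LOG10_SECONDS_PER_YEAR)}"
)
```

Changing ALPHABET_SIZE, or counting pages rather than whole books, shifts the exponent considerably, so estimates for digital versions of the library need not agree with one another. Under any of these assumptions, however, the number of years exceeds the age of the universe by thousands of orders of magnitude.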

Despite its absurdity, we continue to believe that one day we will be infinitely knowledgeable through some yet-to-be-developed technological means. Today, this belief is often tied to the process of digitizing the analog text within books in order to convert all the knowledge of the past into a “contemporary” storage medium. This digitization process, as Wade Roush suggests in his article “The Infinite Library”, is no small task and one that requires many resources, but it holds the potential to make material widely accessible to all.[vi] Since his article was published over ten years ago, we have seen this digitization manifest itself within online readers, such as Google Books. Similarly, it is becoming more common to see recently published periodicals or journal articles enter the contemporary library in a format such as “.pdf” without ever being printed. This allows one to log into a library catalogue and search not just by title, author, publisher, etc., but also directly within the text of the article itself. Both Google Books and the contemporary library catalogue are methods of standardizing data to increase its ‘searchability’ or what Foucault referred to as “universal addressability.”[vii]

THE CONTRADICTION WITHIN UNIVERSAL (DIGITAL) KNOWLEDGE

Despite the fantasies and the technological advances which appear to support the search for infinite universal knowledge, the notion is limited by several material frictions, as the 1961 Census Bureau incident demonstrates. The first friction, as mentioned above, can be termed “device extinction”: the obsolescence of the systems used to create and read digital data. In the case of the Census Bureau incident, it was the tape reader which became obsolete due to technological advances, leaving no way to read or extract the data encoded onto the magnetic tape. This is a loss many of us have experienced with once-mainstream storage media that have disappeared, such as the floppy disk, zip disk, tape cassette, and video cassette. If you were unfortunate enough to have left your old storage media to gather dust in the attic and then remembered them several years later, you can only hope that someone still owns an archaic device which can play your data back; otherwise you are left to track down expensive data recovery specialists or, worse yet, lose your data altogether. Interestingly enough, the practice of making devices obsolete happens to be integral to the economic logic of our current digital culture. That is to say, devices are purposely designed to become obsolete so that hardware manufacturers can sell new, ‘upgraded’ models the following year.

The second friction in the way of the fantasy of universal digital data storage is temporality. The temporal, impermanent nature of data can be found in the act of transferring it from one storage device to another. During this process there is a small window in which the data on the destination is deleted and then overwritten from the source. Within this window the data is in motion and, essentially, non-existent at the destination. Today, this process can be seen within cloud or online storage platforms such as Dropbox or Basecamp. Wolfgang Ernst describes this transition thoughtfully: “nowadays the static residential archive as permanent storage is being replaced by dynamic temporal storage, the time-based archive as a topological place of permanent data transfer.”[viii] These online storage services continue to grow rapidly, as their ease of use (and lack of cables) is quite appealing to the average household user. Simply drag and drop, and your files are safe. Somewhere.

The third friction, and perhaps the most tangible, is that of decay. Universal digital knowledge is most limited by its own material foundations, which become susceptible to deterioration over time. Not even the glorious digital age can escape its own material infrastructures, perhaps its biggest restriction of all. As we have already seen, some of the Census Bureau data which had not already been lost to human error fell victim to decay over the course of about 15 years. If we go back a little further, we can find the beginning of this idea in the well-known mathematician and philosopher Norbert Wiener who, over 50 years ago, stated, “Every digital device is really an analogical device.”[ix] The profundity of this statement cannot be overstated. In brief, all digital archiving is constructed from analogical components and, thus, every digital storage device is composed of many small parts, all with tangible, physical, material properties. Unfortunately, all materials eventually deteriorate, some much more quickly than others, especially when stored in spaces with extreme temperature and humidity.

Furthermore, decay applies not only to the internal (static) components of storage devices, but also occurs during the process of data transfer. This version of decay, one closely related to temporality, occurs when friction is introduced between moving parts. If you’ve ever worn through a pair of running shoes, then you are already familiar with this concept. Every time your shoes’ rubber treads meet the pavement, small fragments are left embedded in the ground. Over the course of a few months, or a few days if you happen to be a marathon runner, your shoe treads could be completely worn away. The same principle applies when transferring data between storage devices: a cable is taken from Device A and inserted into Device B, data moves through conductors (usually made of metal) from Device A into Device B, and the data is then stored onto a magnetic plate (magnets retain their magnetization even after the power is turned off). Most of us are unaware that every time this process occurs fragments of metal are worn away due to the heat caused by friction from electricity flowing through the wires. With enough wear, these wires will eventually deteriorate and your precious data will be trapped inside a magnetic plate with no way to escape. Once again, your next move is to call a data recovery specialist and pray they have the tools and knowledge to extract your life’s work from the slab of metal.

In summary, all of the frictions standing in the way of universal digital knowledge are, fundamentally, material. No matter how weightless the internet or online data storage may appear on the surface (of your computer screen) there is, in fact, a material substance associated with every piece of information visible through these means. What we are unable to comprehend is the back-of-house requirements or, in other words, the material foundations which are required to support our digital devices. Moreover, these ‘invisible’ infrastructures manifest themselves in various forms, such as the communications satellites which serve our cellphones and the trans-continental submarine cables which connect our internet.[x] To reiterate Wiener’s point, all digital devices are made from analogical, material components. It is the properties of these material components which have very real, tangible limits. Infinite digital knowledge cannot be made possible without an infinite material supply to support it. Your handheld device is merely a platform for displaying the information which is housed in factory-sized magnetic storage cabinets located in a bunker somewhere between California and Mumbai. Every time you drop a file into Dropbox, it does not float in space but, rather, is housed in that bunker far away and out of sight. To clarify, I am not suggesting we vilify the desire for universal knowledge, but that we cannot forget the material realities from which our world is constructed. As we have seen, these material realities manifest themselves in various frictions which restrict infinite mobility.

THE FATAL CONSEQUENCES OF ‘LOST’ DATA

During the latter stages of the Vietnam War, the United States began recording air combat data in what was termed the Combat Air Activities File (CACTA). These logs were descriptive in nature, and included various information about each mission such as the “name and date; function and location of the mission; type, number, and identification of aircraft; results of the mission, including loss and damage data about aircraft and crew; and free-text comments.”[i] In addition, the records described each bomb type and the geographical location where each bomb was dropped. This last piece of information was critical because over 30% of the 2 million tons of explosives dropped did not detonate upon impact. One would expect that the undetonated bombs were recovered using the CACTA information. Unfortunately, this was not the case, as simple data loss prevented the bomb recovery. The details of the situation are as follows:

The initial effort was unsuccessful because the geographic coordinates were flawed. The reason for these anomalies is that the data initially were created and used in a report generator system called the National Information Processing System (otherwise known as NIPS). NIPS was developed for the Department of Defense. By the time the records came to the National Archives they were legacy records and the staff on the Machine Readable Archives Division began a process of “deNIPSing” the records that is moving them out of NIPS by reformatting them to a flat-file non-proprietary format in standard EBCDIC. The “de-NIPSed” files are no longer dependent on the NIPS software with which they were created. Instead, as flat files, users can process and manipulate them using widely-available software applications.[i]

It was a simple discrepancy in combination with human error which then caused faulty bomb coordinates:

For 25 years it was believed that the “deNIPSed” files were trustworthy reformatted records. However, the data anomalies found in the Combat Activities File raised a question. It now appears that at the time the “deNIPSing” occurred, the documentation accompanying the data file either was incomplete or perhaps even missing, because the geographic coordinates, which were encoded in binary (binary angular measurement, a form of “packing data”) in order to conserve space, were incorrectly treated as 7-bit ASCII in each data field. Consequently, all of the geographic coordinates were wrong.[i]
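The kind of misreading described in the passage above can be illustrated with a small sketch. Binary angular measurement stores an angle as an integer fraction of a full turn, so the same bytes mean something entirely different if they are treated as text. Everything specific here is an assumption for illustration: the source does not give the actual NIPS field width, byte order, or record layout, so the 16-bit big-endian field and the function names below are hypothetical.

```python
# ASSUMPTION: a 16-bit field, chosen only to keep the example small.
BITS = 16
FULL_TURN = 1 << BITS  # number of integer steps in a full 360-degree turn


def encode_bam(angle_deg: float) -> bytes:
    """Pack an angle in degrees into a binary angular measurement field."""
    steps = round(angle_deg / 360.0 * FULL_TURN) % FULL_TURN
    # ASSUMPTION: big-endian byte order.
    return steps.to_bytes(BITS // 8, "big")


def decode_bam(raw: bytes) -> float:
    """Correct reading: interpret the bytes as a fraction of a full turn."""
    return int.from_bytes(raw, "big") * 360.0 / FULL_TURN


def misread_as_ascii(raw: bytes) -> str:
    """Incorrect reading: treat each byte as a 7-bit ASCII character."""
    return "".join(chr(b & 0x7F) for b in raw)


packed = encode_bam(107.25)  # an arbitrary example longitude
print("stored bytes:    ", packed.hex())
print("read as BAM:     ", decode_bam(packed))
print("misread as ASCII:", repr(misread_as_ascii(packed)))
```

The misreading does not raise an error. It simply yields plausible-looking characters in place of a coordinate, which suggests how such a conversion could be trusted for years when the documentation describing the field was incomplete or missing.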

The biggest concern which the CACTA files raise is that of blindness in the face of digital conversion and transfer. One could argue that the scale at which information is transferred prevents us from understanding the process. Thus, within many institutions of power where data is in continual movement between devices, one small error could have fatal consequences. The other argument to be made revolves around the nature of mistakes. Just as humans make mistakes, so can machines, as both are constructed from and limited by their material composition; we have seen how material inaccuracies can cost us important data and, at the most extreme, cost lives.

 

NOTES

  1. Ruggiero, Alessandra, ed. n.d. "Preservation of digital memories: risks and emergencies." Accessed November 07, 2015. http://www.iccu.sbn.it/opencms/export/sites/iccu/documenti/emergenze.pdf.

  2. U.S. Census Bureau. 2015. Univac I. September 24. Accessed November 07, 2015. https://www.census.gov/history/www/innovations/technology/univac_i.html.

  3. The University of Chicago Library. 2011. The Enlightenment and grand library design. April 26. Accessed November 07, 2015. http://news.lib.uchicago.edu/blog/2011/04/26/the-enlightenment-and-grand-library-design/.

  4. Borges, Jorge Luis. 1998. "The Library of Babel." In Collected Fictions. New York: Penguin.

  5. Flood, Alison. 2015. Virtual Library of Babel makes Borges's infinite store of books a reality – almost. May 04. Accessed December 18, 2015. http://www.theguardian.com/books/2015/may/04/virtual-library-of-babel-makes-borgess-infinite-store-of-books-a-reality-almost.

  6. Roush, Wade. 2005. The Infinite Library. May 01. Accessed December 15, 2015. http://www.technologyreview.com/featuredstory/404002/the-infinite-library/.

  7. Fuggle, Sophie. 2015. Foucault and the History of our Present. Edited by Sophie Fuggle, Yari Lanci and Martina Tazzioli. New York: Palgrave Macmillan.

  8. Ernst, Wolfgang. 2010. "Archival Times. Tempor(e)alities of Media Theory." Oslo: Not Published. Accessed November 05, 2015.

  9. Pias, Claus. 2003. "Transactions/Protokolle, vol. 1." Cybernetic / Kybernetik: The Macy Conferences 1946-1953. Zurich: Diaphanes. 158.

  10. Miller, Greg. 2015. Undersea Internet Cables Are Surprisingly Vulnerable. October 29. Accessed December 20, 2015. http://www.wired.com/2015/10/undersea-cable-maps/.