Patrick Durusau

Patrick Durusau, patrick@durusau.net

Research

In very broad strokes, my research interests include:

Ancient Near Eastern Languages and Texts: Ancient Near Eastern languages and texts were the original impetus for my journey into markup languages. It seemed odd to me at the time (and still does) that the study of such languages and texts is still largely confined to print media. A notable exception to that observation is Steve Tinney's work at the University of Pennsylvania, which has resulted in the entire corpus of Sumerian being encoded for the Sumerian Dictionary project. If similar work existed for Akkadian, Egyptian, Ugaritic and other ANE languages, grammatical rules and usage studies could be based on all the evidence, as opposed to what a single scholar can read and retain over the course of an academic career.
Bible encoding, analysis and delivery: Computers have been employed for analysis of the Bible since the 1960s, but that work has too often resulted in proprietary data sets that are not available to all scholars. The unfortunate result has been that the Hebrew Bible, for instance, has been entered, proofed, re-entered and re-proofed more than a few times, with each project starting from ground zero. Such practices hardly represent a cumulation of scholarship at best and, at worst, are simply duplication of rote effort under the guise of scholarship.

A common text upon which scholars could add their analysis, as opposed to simply duplicating the work of others, would be a starting point for cumulative scholarship. Beyond that, scholars need tools that let them apply their scholarly knowledge without the tools themselves getting in the way of that task. Since most biblical scholars already work in several languages, it seems churlish to insist that they learn markup syntax or arcane computer languages simply to go about their common tasks.
Collaborative scholarship: An area where biblical and ANE scholars have lagged behind their counterparts in the natural sciences is collaborative research. It is true that small groups of scholars may work together on projects, but the routine multi-institution approaches taken as a matter of course in other disciplines are sorely missing.

It is also unfortunate that biblical scholars have made few efforts to enlist the aid of those who are not employed as biblical scholars. Colleges and seminaries graduate far more people trained in Hebrew and Greek than there are positions for employment, resulting in a vast pool of talent that is being ignored by the traditional biblical studies community.

The WWW has the potential to tap into the talent pool of non-traditional biblical scholars, whose only lack is professional employment as biblical scholars. There are any number of collaboration environments that can be adapted to utilize that pool of talent.
Digitization (both imaging and encoding) of primary/secondary materials: Part of my interest in digitization (both senses) stems from my early interest in ANE studies. I was located over 200 miles from the nearest research library and specialized materials could be obtained only with difficulty. That certainly remains the case for scholars at second-tier (an arrogant designation that is unconnected with quality of teaching or scholarship) and lower institutions in the United States and even more the case in developing countries. To say nothing of talented individuals who are not employed as scholars.

The technologies exist now or are easily adaptable to make access to primary and secondary materials a matter of choice and not physical location. Granted that substantial resources would be required to make everything available, but the natural sciences have done quite well with, at least initially, a minimum of resources.
Fonts for biblical and Ancient Near Eastern studies: Rendering of encoded resources for biblical and Ancient Near Eastern studies has long been problematic. Even with the advent of Unicode, assuming successful proposals for Akkadian and Egyptian, there remains the problem of encoding and displaying the texts as written.

That is in part due to the absence of the character/glyph distinction that works relatively well in post-Gutenberg typography, but not in all cases. The further prior to Gutenberg the composition of the text, the more likely the distinction is to be pernicious. Still, some form of universal interchange is necessary and the weight of full representation of ANE languages will fall back onto markup systems.

Very complex representations, such as Egyptian tomb inscriptions, will no doubt require a combination of Unicode, markup and interchangeable representation formats such as Scalable Vector Graphics (SVG) for rendering of texts both in their typical textbook rendition and as written.
Markup Languages: While I originally became interested in markup languages while pursuing studies in ANE languages, I have come to appreciate them as languages in their own right. It is interesting to note that, at a certain level of abstraction, the lessons from linguistics, formal languages (and automata), markup parsers and similar disciplines all begin to coalesce, despite surface differences.
Overlapping markup: Texts often do not follow the rather simplistic content models offered by most markup languages, or supported by parsers for those languages that do offer more complex content models. That problem has been my concern for the past decade or so. Any number of proposals have been offered, several by Matthew O'Donnell and myself, but none has quite proved to be an elegant and widely accepted solution to the problem.

This is one of the major unsolved problems for the use of markup in academic work, since representing a text as it is seen by the researcher is obviously of more benefit than flattening it to conform to the arbitrary limitations of "good enough" markup languages. (There are a number of solutions to the overlapping markup problem but suffice it to say that none has attracted a significant following. All involve trade-offs of one sort or another.)

If a commercial motivation is necessary, it should be noted that "solving" the problem of overlapping markup will have an enormous impact on legal, governmental, publishing and other areas. Any activity that has changes to its texts will benefit from a solution to this problem in terms of access, cost of production and management of changes.
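To make the problem concrete, here is a minimal Python sketch. It assumes a standoff representation (character ranges kept separately from the text), which is only one of the trade-off approaches alluded to above and is not presented as any of the proposals mentioned. The names Span, crosses and find_crossings, and the sample text, are made up for illustration. The sketch simply reports pairs of ranges that share characters without one containing the other, which is exactly what a single element tree cannot express.

```python
from itertools import combinations
from typing import NamedTuple


class Span(NamedTuple):
    """A standoff annotation: a label over the half-open range [start, end)."""
    kind: str
    start: int
    end: int


def crosses(a: Span, b: Span) -> bool:
    """True if a and b share characters but neither contains the other."""
    share_text = a.start < b.end and b.start < a.end
    a_inside_b = b.start <= a.start and a.end <= b.end
    b_inside_a = a.start <= b.start and b.end <= a.end
    return share_text and not (a_inside_b or b_inside_a)


def find_crossings(spans):
    """Return every pair of spans that cannot be nested in a single tree."""
    return [(a, b) for a, b in combinations(spans, 2) if crosses(a, b)]


# Illustrative data only: a physical line and a phrase that straddles its end.
text = "one two three four five"
spans = [
    Span("line", 0, 13),    # "one two three"
    Span("word", 4, 7),     # "two", nests cleanly inside the line
    Span("phrase", 8, 23),  # "three four five", runs past the end of the line
]

for a, b in find_crossings(spans):
    print(a.kind, repr(text[a.start:a.end]), "crosses",
          b.kind, repr(text[b.start:b.end]))
```

The word nests inside the line and causes no difficulty; the line and the phrase cross, so one of them would have to be split or flattened to fit a single hierarchy.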
Topic Maps: I became involved in topic maps during the formation of TopicMaps.Org, an organization that produced an XML version of ISO 13250. While XTM is certainly one way to produce topic maps for the WWW, it does not represent the warp and woof of the topic maps paradigm. It does have the conveniences of a semantically interoperable syntax and a simple notion of subject identity, which may be sufficient for many tasks, but not all.

At the core of the topic maps paradigm is a notion of semantic integration. That is to say that if you tell me the basis upon which you have identified subjects and I know the same information about my subjects, I can meaningfully integrate information about subjects that you and I have identified independently of each other. And, if I have taken the trouble to perform that task with my subjects, you can, without consulting me, achieve the same end. The ultimate result is that either of us can have all the information about a particular subject, however either of us identified it, in a single location.

It is important to note that semantic interoperability, that is, the seamless interchange of information, relies upon having a common language, like XML or RDF, for the exchange of information. If we all mean the same thing by car, such a system works fairly well. Where such systems start to fall apart is when the terms are not easy to define or agree upon, such as democracy or freedom, or even marriage. (It has been reported to me that EU automobile manufacturers cannot even agree on general classes of parts, something that is rather critical for inventory, ordering and other information systems.)

Enabling semantic integration, the goal of topic maps, is much more modest. All topic maps request is that the user of a term say how they identify the subject that is represented by that term. That does not guarantee that another user will find it possible to integrate a particular term with the one they use, but it does enable such a process to take place. Without that information, semantic integration is by definition impossible.
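The integration step described above can be sketched in a few lines of Python. This is a deliberate simplification, not the merging rules of ISO 13250 or XTM: it assumes each topic declares a set of subject identifiers and treats two topics as the same subject whenever those sets intersect. The function name merge_topics, the dictionary layout and the example.org identifiers are all hypothetical.

```python
def merge_topics(*topic_sets):
    """Merge topics that share at least one declared subject identifier."""
    merged = []  # each entry: {"identifiers": set, "facts": dict}
    for topics in topic_sets:
        for topic in topics:
            ids = set(topic["identifiers"])
            facts = dict(topic["facts"])
            remaining = []
            # Absorb every already-merged topic that shares an identifier.
            for existing in merged:
                if existing["identifiers"] & ids:
                    ids |= existing["identifiers"]
                    facts = {**existing["facts"], **facts}
                else:
                    remaining.append(existing)
            remaining.append({"identifiers": ids, "facts": facts})
            merged = remaining
    return merged


# Illustrative data only: two people describe subjects independently.
mine = [
    {"identifiers": {"urn:example:gilgamesh",
                     "http://example.org/epics/gilgamesh"},
     "facts": {"my_note": "recorded by me"}},
]
yours = [
    {"identifiers": {"http://example.org/epics/gilgamesh"},
     "facts": {"your_note": "recorded by you"}},
    {"identifiers": {"http://example.org/epics/enuma-elish"},
     "facts": {"your_note": "a different subject"}},
]

for topic in merge_topics(mine, yours):
    print(sorted(topic["identifiers"]), topic["facts"])
```

Because both parties declared how they identified the first subject, its notes end up in one place without either party consulting the other. The second subject shares no identifier with anything else and stays separate, and a topic with no declared identifiers could never be merged at all.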