Thursday, 28 June 2007

John Sinclair 1933-2007

I first met John under the best of circumstances. It was in the idyllic surroundings of his Tuscan Word Centre, and I had four days of exposure to his ideas, his barbed criticisms of my work, and his unrivalled hospitality. I could also add that it was my first visit to Italy, that I had just fallen in love, and that my partner, a close colleague of John's at the time, had ensured a place for me on the course, all of which added to the glamour and excitement of the trip.

But I have to say that on first acquaintance I found him somewhat arrogant, as well as deliberately and pointlessly controversial, and I didn't agree with him on some key points about the design and analysis of corpora. Not to mention the fact that he appeared to fall asleep every time I started speaking. Having said that, I had a wonderful and stimulating time in Tuscany, and met several people whom I still value as colleagues and friends.

In subsequent years, I was invited back several times to teach on TWC courses, meeting many more wonderful people, hearing the same talks from John and Elena, and also reflecting on his ideas in my work, and slowly realising their value. I will treasure for ever the opportunity that he gave to me to take part in his hugely influential work at the TWC in training a new generation of scholars in his ideas. Furthermore, I had the opportunity to work with him for a short time on the TELRI project, through which I saw the enormous impact he had on corpus and computational linguistics in Central and Eastern Europe, and the huge esteem in which he was held in many countries.

Which is not to say that I came to agree with John about everything. I will now forever regret not having completed in John's lifetime, and received his response to, a paper under the working title of 'There is no degree zero of text encoding', written in response to his appeals to eradicate markup and maintain the integrity of electronic text.

In 2001 I applied for a job at the Oxford Text Archive, but was unable to attend the interview because I was teaching at the TWC. Oxford kindly allowed me to have a telephone interview, which I conducted from John's office. I was fortunate enough to get the job.

When I came to work in Oxford, I soon received a query from Birmingham about whether I could track down in the archive a copy of John's 1963 spoken corpus which had been collected and analysed for the OSTI report. Birmingham University Press wanted the corpus, or a sample of it, to print in the republication of the OSTI report. Unfortunately, the copy of the corpus which we held in the OTA appeared to be incomplete. As far as we could work out, what is likely to have happened is that a deal would have been made with John in which he would deposit a copy of this corpus in the archive, and in return the OTA would scan some documents for him for his next corpus-building project. OTA texts were normally collected in this way by some sort of barter process in those days. But John liked to drive a hard bargain. It appears that he must have only given us half the corpus, without anyone at the OTA realising at the time. Unfortunately, this subterfuge rebounded on him 20 years later when he found that he no longer had a copy and wanted it back. If nothing else, this now stands as an excellent cautionary tale in the field of digital preservation. (The half-corpus is freely available now from the Oxford Text Archive, catalogued as the Lexis corpus, catalogue number 0163).

I was thrilled to have the opportunity to invite John to talk at the first event I organised in Oxford, a seminar on corpus building, and to have him contribute a key chapter to a book which I edited, Developing Linguistic Corpora: a Guide to Good Practice. John not only completed his chapter ahead of the deadline, he contributed exactly the type of chapter I wanted, and was also on hand to chivvy me periodically about the delays in publication. This chapter will stand, I hope, as a summary and testament to his views on corpus design (available free online at

Whenever my email client told me I had mail from John, I opened it with trepidation, wondering what inactivity or backsliding I was to be taken to task for this time. He was a spiky character, although I found his views always stimulating and his criticism constructive. His uncompromising attitudes and strong work ethic were a constant inspiration.

After John attended a PALA conference in Istanbul in 2003 as a keynote speaker, he confided in me that he was horrified that, to his mind, no-one was following an authentically empirical and evidence-based approach to stylistics, and typically he had no hesitation in pointing out that I was as guilty as anyone. But his criticism led to the proposal of a practical and constructive organisational solution, namely a project to invite PALA members to write a short analysis of a poem, in which every interpretative assertion is backed up by clear linguistic evidence - a project which, to my shame, I was not sufficiently well-organised to have put into operation in time for him to be able to take part.

After struggling for some years to find a text which adequately presented his theories and methods, I was extremely pleased to see the appearance of Trust the Text in 2004. John's death is a great loss to scholarship, but this work, along with the republication of English Collocation Studies, stands as a great testament to his work. We can also look forward to the posthumous release of several papers currently in press from the highly productive final period of his life.

Yet another recent successful initiative of John's was to get the agreement of the Scottish ministry of education to make corpus resources available to every child in Scottish schools. A corpus and an analysis tool have been developed for this project. When he recently unveiled the tool, I was astonished to see that rather than simply presenting the data to the student for them to conduct their own data-driven analysis and learning, the tool will pre-process the concordance lines, find typical examples and present only these to the user. After years of training from John and Elena to read concordances to find patterns in the co-text, I was horrified. The user won't have the opportunity to see for themselves repeated patterns of usage in the corpus. I think this is a big mistake. But I have learned to expect that, once more, I will be wrong and he will be proved right, and that this project will start another revolution in the teaching of English.

