An Executive Summary




The present thesis stands at the interface of several converging developments in linguistics and language teaching:



All of the above strands of research have received further impetus from the ‘corpus revolution’ which has been reshaping language science since at least the 1980s. It has enabled linguists to go beyond intuition and pen-and-paper analysis, so that their research can now bear comparison with that of hard-pure sciences such as physics and chemistry. 

This study attempted to weave together the aforementioned strands of research with a view to describing a particular type of conventionalized expression which was here termed the ‘multi-word’ or ‘second-level’ discourse marker. Second-level discourse markers are fixed expressions or restricted collocations usually composed of two or more printed words; typical examples are it is argued that, the same goes for, strictly speaking or with this in mind. Although ubiquitous in both academic and journalistic language, they have so far been paid scant attention. This is the first large-scale contrastive study of such expressions using natural language corpora.


Study Objective


The chief objective of this thesis was to provide a functional taxonomy of second-level discourse markers and a contrastive analysis of their use in English, French and German academic and journalistic text. The term ‘contrastive analysis’ here refers to the processes involved in identifying and recording multi-word units which assume identical or similar functions in actual manifestations of English, French and German language use. Put more simply, the type of contrastive analysis undertaken in this study aimed to set up equivalent categories of second-level markers and to describe the equivalence relations obtaining between them. Since there were found to be more than twenty categories of discourse markers, each counting hundreds of members, the analysis was restricted to four major categories: exemplifiers, reformulators, inferrers and restrictors.




The present thesis locates itself within the British tradition of text analysis established by Firth. It took as its object of study actual occurrences of language rather than introspective data. These occurrences were drawn from four different types of computer-readable text archive in each language:



The academic corpora were purpose-built from Internet sources and private contributions; running into more than 30 million words, they are easily the largest and most diverse electronic archives of academic language ever created, including substantial quantities of text from various subject fields and genres. The size and diversity of these corpora lend additional authority to the statements based on them.

The academic corpora were searched using commercially available utility programs, such as Wordsmith and Microconcord. The use of such software was impossible in the case of the CD-ROM encyclopaedias. Thus the Encyclopaedia Britannica had to be searched using Netscape Navigator, and the French and German reference works cited above also came with their own restricted search facilities.


Unlike the bulk of recent corpus-based scholarship, this study could not rely exclusively on computer-driven analysis. This is because the above-mentioned retrieval software is still limited in its ability to extract complex, variable sequences of words, with the result that it was impossible to identify all the instances, including all the permutations, of a particular marker (e.g. it is to be noted, it must be noted, it will be noted, it is notable, it is noticeable, it is worth noting, etc.). Therefore an ocular-scan based inventory was established and its content categorized using Mann and Thompson’s (1988) rhetorical structure theory and common-sense criteria; this inventory was based on an extensive list of multi-word markers compiled by the author during the six years of his studies and beyond. Next the computer corpora were tapped to check the categories thus developed against a larger amount of data. This kind of investigation then provided feedback which necessitated a rethinking of categories, additions to the inventory, and so on, in an iterative cycle.
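The difficulty with variable sequences can be illustrated with a small sketch. Assuming a plain-text corpus and Python's standard re module (neither of which was used in the study itself), a hand-written pattern can capture the variants of the marker cited above. The names NOTE_MARKER and find_note_markers are hypothetical, and the pattern covers only the six variants listed, not the open-ended ‘etc.’

```python
import re

# Hypothetical pattern for the "note" family of markers listed above.
# Illustration only: it was not part of the original study.
NOTE_MARKER = re.compile(
    r"\bit\s+(?:"
    r"(?:is\s+to\s+be|must\s+be|will\s+be)\s+noted"  # it is to be / must be / will be noted
    r"|is\s+(?:notable|noticeable)"                   # it is notable / noticeable
    r"|is\s+worth\s+noting"                           # it is worth noting
    r")\b",
    re.IGNORECASE,
)


def find_note_markers(text: str) -> list[str]:
    """Return each matched variant, lower-cased and with whitespace normalized."""
    return [
        " ".join(match.group(0).lower().split())
        for match in NOTE_MARKER.finditer(text)
    ]
```

A pattern of this kind has to be written by hand for each family of markers and presupposes that the variants are already known, which is why an inventory compiled by reading remains the natural starting point.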

Once a categorized, corpus-informed list had been drawn up, the investigation could proceed on a quantitative basis, enabling the analyst to assign frequencies to various tokens of markers and to approximate to citation forms of discourse markers to be used in dictionaries and teaching materials. Frequency data based on the parallel academic corpora allowed a ranking of categorized markers by frequency of use as well as a cross-language comparison of the overall frequency of some marker types.


Definition and Description of Multi-word Markers


A review of the linguistic literature showed two fields of research to be of prime importance to the definition of multi-word discourse markers: pragmatics and lexicology. Research in pragmatics has so far focussed on the functions of oral discourse markers; this has led to the erroneous assumption that discourse markers are short items carrying pragmatic meaning which are primarily found in the spoken language. The present study refuted such claims, demonstrating that the term ‘discourse marker’ can be applied to natural-language strings of varying length which carry pragmatic and/or propositional meaning and occur in both speech and writing. Just like the well-researched oral discourse markers, multi-word markers were found to serve as signalling devices which indicate the coherence relations obtaining between a particular unit of discourse and other, surrounding units and/or aspects of the communicative situation. They thus serve to facilitate the listener’s or reader’s task of comprehending the discourse.

On the lexicographic side, the focus of attention has been on issues of phraseological fixity. Recent research in the Firthian tradition argues that the boundary between compositional, or non-idiomatic, and non-compositional, or idiomatic, word sequences is more fluid than traditionally assumed. There is now overwhelming evidence on the frequency of word sequences showing that the primary division between collocations and non-collocations is a matter of more or less rather than yes or no. For example, depending on the size and content of the corpus used, a word sequence such as book + proclaim may be viewed either as a free combination or a collocation. Research has also suggested that the notion of collocation itself must be widened to include more than combinations of two words (e.g. not + wildly + original, not + forget + in a hurry). Accordingly, this study applied Howarth’s (1996) three-level classification of two-item collocations to multi-word discourse markers, which were defined as collocations of varying degrees of restrictedness or as fixed expressions.

It then turned out that the pragmatic and lexicological criteria just discussed were not sufficient to account for some items intuited to be equivalent to multi-word markers, such as English worse or French pire. To arrive at a fully satisfactory definition, it was necessary to have recourse to the additional criterion of probability of occurrence, or ‘frequency level’. It was found that typical ‘one-word’ or ‘first-level’ markers, at upwards of 150 tokens in 10 million words, occur with significantly higher frequency than typical ‘multi-word’ or ‘second-level’ markers, at between 3 and 50 tokens in 10 million words.
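The frequency criterion lends itself to a simple worked illustration. The Python sketch below scales a raw count to a rate per 10 million words and maps it onto the two bands reported above. The function names are hypothetical, and rates falling outside both bands are labelled ‘unclassified’ because the summary does not say how such items were treated.

```python
def per_ten_million(tokens: int, corpus_size: int) -> float:
    """Scale a raw token count to occurrences per 10 million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return tokens * 10_000_000 / corpus_size


def frequency_level(rate: float) -> str:
    """Map a normalized rate onto the frequency bands reported in the text."""
    if rate >= 150:
        return "first-level"
    if 3 <= rate <= 50:
        return "second-level"
    # Assumption: the summary gives no band for rates below 3 or between 50 and 150.
    return "unclassified"
```

With invented figures, a marker counted 90 times in a 30-million-word corpus comes out at 30 tokens per 10 million words and so falls in the second-level band. Frequency is only one of the defining criteria and would not by itself identify a second-level marker.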

With this in mind, the term ‘second-level discourse marker’ was defined as follows:


‘Second-level discourse markers are medium-frequency fixed expressions or restricted collocations composed of two or more printed words acting as a single unit. Their function is to facilitate the process of interpreting coherence relation(s) between elements, sequences or text segments and/or aspects of the communicative situation.’ 


On the basis of this definition, the great variety of syntactic realizations of second-level markers was then described. In so doing, three major categories (set expressions, sentence fragments and sentence-integrated markers) were distinguished, which in turn were divided into a number of subcategories. These subcategories turned out to be somewhat different for each of the languages involved. It was then shown that interlingual equivalence cannot be established on the basis of structural similarity, so that a functional taxonomy became necessary.


A Functional Taxonomy


The rationale for a functional taxonomy was that correspondences between source and target-language markers can only be inferred from functional similarities in their contextual uses. Two points in particular militate in favour of the taxonomy established in this study: firstly, the detailed textual evidence compiled both manually and by computer allowed a higher degree of descriptive delicacy than could have been achieved in earlier work based on unaided intuition or a slim research base. Thus, while all the relations encoded by markers classified in the taxonomy could be described in terms of RST, usually such description turned out to be far more coarsely grained than the functional taxonomy here proposed. The elaboration relation, for example, was found to be encoded by announcers, topic initiators, digression markers and clarification markers. Secondly, the taxonomy derived additional authority from being multilingual, whereas previous research had tended to be monolingual, mainly focussing on British and American English.

The taxonomy comprises 22 categories of second-level markers, the names of which are meant to be self-explanatory:


· comparison and contrast markers
· concession markers
· exemplifiers
· explainers
· definers
· enumerators
· summarizers and concluders
· inferrers
· cause and reason markers
· announcers
· topic initiators
· excluders
· digression markers
· question and answer markers
· emphasizers
· informers
· clarification markers
· suggestors
· hypothesis and model markers
· restrictors
· referrers and attributors
· reformulators and resumers


A few points of interest emerged from this taxonomy. One was that the boundaries between some types of second-level marker are quite fluid, so that some items may be said to have dual or even triple category membership; this is obvious with such items as le parallèle s’arrête là or da hören die Gemeinsamkeiten auf, which function simultaneously as enumerators, contrast markers and concluders (or initiators). The second point was that some second-level markers are functionally and semantically close to first-level markers (e.g. a complication is that and however), whereas others, such as it is often said that, bear no such resemblance. A third point which emerged was the existence of lexical dependencies between second-level markers which sometimes operate over some considerable distance. The kinds of collocational pattern involved here, such as with this in mind + let us turn to or turning to + we find that, had so far gone completely unnoticed. They necessitated a radical widening of the idiom principle (Sinclair 1991) to accommodate ‘collocational combinations’ and ‘long-distance collocations’.


Marker Equivalence


Two chapters of Part I of this study were devoted to interlingual equivalence. Chapter 4 attempted to show how general features of written language impact on the setting up of interlingual equivalence between second-level markers; Chapter 5 looked at interlingual correspondences between four of the 22 categories discussed above.

It was demonstrated that syntactic differences between English, French and German second-level markers can be accounted for in terms of five general theses, four of which were first put forward by Blumenthal (1987). They concern divergences in verb valency, in the order of information in the clause, in the degree of specificity of expression, in the degree of activity inherent in the clause and in syntactic progression at clause and sentence level. Related to this is the thematic status of second-level markers as staging devices; it was found that they ease the reader’s text processing by shifting the informational focus onto subsequent elements, and by marking a change in thematic choice.

The subsequent analysis of the literature on cross-cultural difference in writing styles revealed that opinion among contrastivists is divided as to the frequency of marker use in different languages and its implications for clarity of style. In this respect a major shortcoming of previous studies was found to lie in the neglect of second-level markers.

This shortcoming was to be addressed in Chapter 5, where empirical evidence was adduced showing that French second-level markers outnumber those available in English and German. Not only were French writers found to show particular partiality for second-level markers, but they also exhibited greater stylistic variability and elegance, often replacing an object complement clause by an object noun phrase, as with on voit que NP est important -> on voit l’importance de NP.

Beyond this, the analysis yielded a rich harvest of interlingual equivalents between the four categories under discussion, most of which had so far gone unrecorded, and provided a finely grained analysis of the syntactic, semantic and pragmatic properties of second-level markers which impact on translational equivalence. It was shown that interlingual differences may sometimes be very wide, as when there is an equivalence relationship between first-level and second-level markers. One example among many is the translation of French cela dit by German allerdings. An even more striking instance was afforded by such French restaters as disons-nous, which give rise to quite distinct linguistic environments difficult to imitate in English and German.  

At other times the differences were found to be fairly small, but they nevertheless gave rise to serious translation problems. This was illustrated with second-level discourse markers (SLDMs) formed from nouns such as example, exemple and Beispiel, which showed extremely subtle divergences in their collocational patterns. Even where the languages under investigation offered clear similarities, straightforward equivalences were sometimes found to be barred for reasons of frequency. The zero connector is a case in point. For some types of reformulation zero usage was shown to be more frequent than the use of a marker.

From a lexicological point of view, there was ample evidence of collocational phenomena which have not yet received their fair share of attention from linguists. It was shown, for example, that adverbial markers such as voire and carrément may exhibit a statistically significant frequency of co-occurrence.
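The summary does not name the statistic used to establish significance. One measure widely used in corpus linguistics for this purpose is the log-likelihood ratio (G²) computed over a 2 × 2 contingency table; the following Python sketch assumes that measure purely for illustration. The function name and the cell layout are hypothetical choices, not the study's own procedure.

```python
import math


def log_likelihood(both: int, first_only: int, second_only: int, neither: int) -> float:
    """Log-likelihood ratio (G2) for a 2 x 2 co-occurrence table.

    both        -- contexts containing both items
    first_only  -- contexts containing the first item but not the second
    second_only -- contexts containing the second item but not the first
    neither     -- contexts containing neither item
    """
    observed = ((both, first_only), (second_only, neither))
    total = both + first_only + second_only + neither
    if total <= 0:
        raise ValueError("the table must contain at least one observation")
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g2 += o * math.log(o / expected)
    return 2 * g2
```

Under this measure a value above 3.84 corresponds to p < 0.05 at one degree of freedom. A table in which the two items are distributed independently yields a value of zero.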

It has been noted by several authors that some first-level markers put in place either paradigmatic relationships which make two successive sentences the evocation of a whole and of one of its elements or syntagmatic relationships that make one sentence the background of another (MacNamara 1995, Blumenthal 1980). The same method of categorization was found to have applicability to the SLDMs discussed here: while exemplifiers mark a paradigmatic relationship, reformulators, inferrers and restrictors mark a syntagmatic relationship.

Another fact of general intralingual as well as interlingual importance concerns correspondences between nominal and verbal SLDMs such as the implication is that and this implies that. These equivalences were first noted by Gallagher (1992). It should be borne in mind, however, that they are far from perfect, as seen with this suggests that and the suggestion is that, where the former comes within the province of inferrers while the latter is a suggestor. A relatively significant tendency emerging from our analysis is that English and especially French build up a large number of SLDMs using lexically variable nominal patterns, where German exhibits a partiality for relatively fixed verbal structures. A clear example is provided by restrictors introducing an adverse point such as a complication is that vs. erschwerend kommt hinzu, dass. This divergence is a result of the more general differences in information structuring between the two languages discussed in Chapter 4. Such findings appear to contradict the widely held view that German is generally more nominal in style than English.

A final point to be noted is the implicit assumption in much of the literature (e.g. Grote, Lenke and Stede 1997, Fraser 1998) that a typical sentence will contain just one discourse marker cuing only one relation. The reality was found to be different from this: firstly, one-word and second-level markers may occur in one sentence; secondly, two discourse markers both from the same category (that said and however) and from different categories (c’est-à-dire and finalement) may be used together; thirdly, as already mentioned, some such co-occurrences form strong collocations: with this in mind + let us revisit, voire + franchement/carrément, c’est-à-dire + en l’occurrence, to name but a few.

Following is a more detailed overview of the results for the four types of second-level marker which were subjected to detailed analysis:




English, French and German were found to use semantically and pragmatically similar sets of exemplifiers. These have so far been perceived as free combinations, but were here shown to be fairly rigid collocations which exhibit only a small degree of variation. Generalizing across all groups, such variation appears to be somewhat higher in French and in German than in English. Close analysis of exemplificatory infinitive clauses suggested that the large degree of variation found among German items may result from a general German tendency to ad-hoc formulation which stands in marked contrast with English and French reliance on stock phrases.

Further, frequency data obtained from the parallel corpora showed that French exemplifiers occur with considerably higher frequency than English or German items. There is thus empirical support for the hitherto unsubstantiated claim that, on average, French writers make more extensive use of connectors than their English or German counterparts. The frequency counts also illustrated a principle familiar from other areas of corpus-driven lexicography: just as the commonest meanings of words have been shown to be many times more frequent than their next commonest meanings (Sinclair 1991), so too some standard realizations of SLDMs have a far higher likelihood of occurrence than other items. Finally, some evidence was found of a correlation between length of SLDM types and frequency of occurrence, although less so in German than in English and French.

The translation problems posed by exemplifiers are usually easy to solve, with the exception of collocational gaps: a noun-verb collocation such as exemple + donner, which cannot be translated literally when it encodes a relational process, may turn out to be a pitfall even for the experienced translator. The same holds true for noun-adjective collocations such as exemple + criant.




Reformulators were divided into four subcategories: pure reformulators, gradational reformulators, repetitional reformulators and reformulatory stance markers.

Two basic modes of reformulation were distinguished, viz. the intensional and the extensional, which in turn gave rise to more subtle distinctions. A browsing of the monolingual corpora showed that the pure reformulators that is, c’est-à-dire and das heißt all have the full range of intensional and extensional modes, which suggests a great degree of translational equivalence. This finding was confirmed and further refined through an inspection of the English and French sections of the multilingual translation corpus, which showed the zero connector to be another important choice in rendering that is. This choice was particularly frequent when that is occurred in an extensional mode close in meaning to namely or when it introduced intensional definitions. In the other translation direction c’est-à-dire was commonly rendered by zero when it occurred in bracketed or dashed glosses.

The pure reformulators in other words, en d’autres termes and mit anderen Worten all occur in the intensional mode; this mode therefore poses few translation problems. However, in other words also has an extensional/quantificational mode not paralleled by its French and German equivalents. In this mode it is normally rendered by soit.

Generally speaking, there were found to be perfect equivalents between pure reformulators of the type also called, most of which have so far escaped scholarly notice: then called -> alors appelé, previously known as -> anciennement nommé, locally called -> appelé localement, etc. However, the apparent simplicity of such pairings tends to conceal many subtleties of usage. Thus, désormais appelé translates either as henceforth called or as thereafter called, depending on context, and variously called finds only a partial equivalent in diversement nommé.

The combination of monolingual and multilingual approaches showed that French possesses a more subtly differentiated set of pure reformulators, with a large number of different items serving functions performed by a smaller number of English equivalents.

As for gradational reformulators, we found that not to say, pour ne pas dire and um nicht zu sagen are exact semantic and syntactic equivalents, whereas things are more complex with if not, voire, ja and similar markers. For example, if not cannot be placed in front of prepositional phrases without changing its meaning, so that or even, or indeed and if not indeed must be used as translation equivalents; voire can introduce combinations of verb and noun phrases, a feature which German can only replicate through changes in word order. We also noted interesting collocational features of gradational reformulators which complicate equivalence relations. Thus, voire was found to collocate with simplement, carrément, franchement and tout court.

Our analysis of repetitional reformulators revealed an intriguing lexical gap: French restaters of the type disons-nous were found to have no immediate equivalent in either English or German. Recapitulators of the type comme on l’a déjà noté were shown to be syntactically rather than semantically difficult to handle, as the rules governing their position are not identical in all three languages.

Reformulatory stance markers such as strictly speaking were found to be so numerous that they would deserve book-length treatment in their own right. There appeared to be certain functional asymmetries across the languages under survey; thus, French markers of the type pour le dire vite have no direct equivalent in English or German. In these languages the metaphorization of approximation as lack of speaking time is uncommon; a more common metaphor is simplification, as in simply put or vergröbernd gesprochen.




Our distinction between three types of inferrers yielded the following results:


· Most inferrers based on verba dicendi display total or partial equivalence with at least one item in the other languages (e.g. this is not to say that -> cela n’est pas pour dire que). Sometimes such equivalence may, for all practical purposes, be total while at the same time being subject to usage and frequency restrictions. The English inferrer this is not to say that, for instance, rarely collocates with an adversative first-level marker, whereas French and German markers with a similar function do so in at least fifty per cent of cases. Thus, syntagmas such as ce n’est pas dire cependant que should normally be rendered by a mere this is not to say that.

· Suggestive inferrers provide an assessment of a situation as obvious or introduce an important fact directly inferrable from the previous discourse. Correspondences between languages are fairly easy to establish, although subject to differences in verb valency (es überrascht also nicht, dass vs. il n’est donc pas surprenant que [*il ne surprend pas que]) and adjective choice (il est donc normal que -> es ist somit einsichtig, dass).

· The closest similarities between languages were found to exist among two-element inferrers. Two-element inferrers were so called because they consist of a verb or adjective phrase indicating inference or certainty and a noun phrase referring back to the previous discourse (e.g. it is concluded from this research that). Although French two-element inferrers display greater transformation potential than their English and German counterparts, allowing also the verbal or adjectival element to be left implicit (d’après ce qui précède), equivalence relations can be established on the basis of categorization into five types. A minor complication is that some subtypes occur with differing frequencies in the languages under investigation.




There is a bewildering variety of different types of restrictors. Five of these were subjected to close scrutiny, with the following results:

English and French restrictors with topic shift and inferential functions are closely similar in function. All the uses of cela dit, for example, closely parallel those of that said, with the exception of the mode in which cela dit is followed by a si d’opposition. Variants of cela dit, such as ce point acquis, can be translated using with this in mind or one of its nominal variants, although English and French differ somewhat with regard to the nouns occurring within this pattern. German, on the other hand, prefers adversative one-word connectors such as allerdings, dennoch or freilich or the zero connector to mark restrictive topic shifts and inferences.

In English and French most restrictors introducing an adverse point are built around an abstract head noun occurring either in prepositional phrases (with the added complication that) or as subjects of clauses (one problem is that). German, while also offering a fair number of such constructions (mit dem Unterschied, dass), appears to favour verbal constructions (erschwerend kommt hinzu, dass; einschränkend ist zu sagen, dass). There is a one-to-many relationship between German restrictors such as erschwerend kommt hinzu, dass on the one hand and their English and French equivalents (e.g. a further complication is that, to further confound the picture; pour compliquer le tout, plus gravement encore) on the other; such correspondences further vindicate one of the general translation principles posited in this thesis, whereby the clause-internal informational order needs to be redistributed in English-German and English-French translation. There are also complex phraseological constraints on the set of nouns which go to make up the nominal restrictors in question. Thus, while it is possible to say the caveat/constraint is that in English, we cannot say *la réserve est que in French, and so on.

Another translation problem is that there are wide interlingual differences in the range and distribution of nouns occurring in prepositional phrases or as subjects of clauses. Thus, for example, English and German have no direct equivalent for French avec cette parenthèse que. In such cases one of the other noun structures available in English and German usually fills the bill; avec la parenthèse que, for example, can usually be rendered by with the exception that and mit der Ausnahme, dass without any serious loss of semantic information.

Restrictors expressing a degree of uncertainty are superficially similar, albeit found with differing frequencies. The English restrictor in the current state of our knowledge, for example, which is directly equivalent to French en l’état actuel de nos connaissances and German nach dem gegenwärtigen Erkenntnisstand, was found to be notably uncommon.

Similar remarks hold true for restrictors expressing doubt. It is their very similarity across languages which may cause translation problems. Thus, the English restrictor NP + should be viewed with some reservations translates into French as NP + est à considérer avec prudence or as NP + appelle des réserves.


Marker Use in Non-native English


It is no great exaggeration to say that second-level markers have so far completely escaped the attention of teachers and language pedagogues alike. This is just one aspect of the still fashionable neglect of language form in ‘communicative’ and ‘neo-communicative’ language teaching. This neglect became evident from the critical analysis undertaken in Part II of advanced English-language writing by German academics and students as well as of a small number of published translations by professional translators. The major finding was that, in both quantitative and qualitative terms, second-level marker use by advanced German writers of English compares unfavourably with that of natives.

Quantitatively speaking, there emerged a fairly consistent pattern of over- and under-use in the texts by German natives. Frequency counts indicated that their writing is heavily skewed in favour of lexicalized first-level markers. In particular, it appeared that the wide variety of syntactically integrated markers in native academic prose (e.g. an example is provided by) is replaced by a limited number of short, non-integrated devices (e.g. for example) in non-native writing. Where German writers of English do resort to second-level markers, they tend to use the commonest of these with much greater frequency than natives and to fight shy of structures which lack a ‘direct’ equivalent in their mother tongue. Yet it was also observed that they are much less inclined to over-use sentence fragments followed by a that-clause than their French counterparts, and that there seems to be a correlation between the adequate use of SLDMs by non-natives and their coverage in dictionaries or textbooks.

Qualitatively speaking, the analysis revealed a number of recurrent error types across different categories of second-level markers. Many of these errors concerned complex points of usage, such as semantic prosody and verb valency. The reasons for such errors are not entirely clear. It is a reasonable assumption that most non-native professionals, although very advanced learners by most standards, continue to base their writing around a number of lexical and rhetorical ‘teddy bears’ (Hasselgreen 1994) manifesting themselves as preferred (i.e. overused) lexical choices and rhetorical strategies. There is a strong case for believing that such fixed points are remnants of misinformed instruction and/or teaching materials relying on tightly circumscribed sets of one-for-one correspondences between languages. By contrast, L1 writers use a broad repertoire of rhetorical strategies, and these rhetorical choices in turn determine the use of a wider range of lexis.

Evidence from published translations completed this picture. Translators were shown to succumb to source-language interference in translating second-level markers. Two reasons may account for this: firstly, translators working into their mother tongue may find it difficult to recognize source-language markers and hence to memorize them; in other words, they cannot tell a second-level marker from a free combination of words and therefore treat it in terms of general lexico-grammatical rules. Secondly, there are practically no teaching materials on second-level markers, and they are rarely discussed in foreign language and translation classrooms.


Lexicographic Remedies


An analysis of a number of unabridged dictionaries and vocabulary books revealed that both monolingual and bilingual lexicography are still a long way from giving second-level markers adequate treatment. The general neglect of these markers by lexicographers and teachers, as well as the under-use of these items by non-native speakers, made it necessary to discuss issues related to their inclusion in dictionaries. Among these issues were: factors governing the selection of second-level markers; organizing principles and entry structures; the provision of metalinguistic information; exemplification.

It was suggested that selection be based on a large inventory of the lexical class under consideration. The computer corpus can then be queried to determine the frequency of each inventoried item, and it can be decided which items to include in the dictionary by defining an arbitrary frequency threshold.
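
The selection procedure just outlined can be illustrated with a minimal Python sketch. It is not the tooling used in this study: the function names, the normalization per million words and the threshold value are assumptions introduced purely for illustration, and a real corpus query would need proper tokenization and some handling of syntactic variants of each marker.

```python
import re
from collections import Counter


def count_markers(text, inventory):
    """Count case-insensitive occurrences of each inventoried marker."""
    counts = Counter()
    for marker in inventory:
        # Word boundaries prevent matches inside longer words.
        pattern = r"\b" + re.escape(marker) + r"\b"
        counts[marker] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def select_entries(text, inventory, min_per_million):
    """Return the markers whose relative frequency reaches the threshold.

    The threshold (occurrences per million words) is arbitrary and is
    set by the lexicographer; it is a hypothetical parameter here.
    """
    total_words = len(re.findall(r"\w+", text))
    selected = {}
    for marker, n in count_markers(text, inventory).items():
        per_million = n / total_words * 1_000_000 if total_words else 0.0
        if per_million >= min_per_million:
            selected[marker] = per_million
    return selected
```

Called with a corpus held as a single string and an inventory such as ["it is argued that", "strictly speaking", "with this in mind"], select_entries returns only those markers frequent enough to merit a dictionary entry under the chosen threshold, together with their relative frequencies.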

As for the positioning of second-level markers, it was concluded from a review of the phraseological literature that there should be neither consistent conflation into end-of-article nests nor arbitrary allocation to a particular sense division. Rather, a middle course should be steered between considerations of semantic relatedness, user convenience, and economy of treatment.

As regards metalinguistic information, the minimum requirement for coverage of second-level markers was found to be the provision of explicit guidance on syntactic variation, textual function and collocational patterning. Ideally, all the usual types of metalinguistic information should be supplied, with the one exception that phonetic information can usually be dispensed with. Learner dictionaries could additionally benefit from the inclusion of warnings against typical errors gleaned from corpus-based studies such as the present one.

Examples of second-level marker use should preferably be authentic; slight editing may be permissible to remove or replace words and phrases that may cause difficulty for the non-native. Since the sometimes numerous functions of discourse markers cannot easily be illustrated within the limited compass of a paper dictionary, the ideal place for them is the electronic dictionary, which, apart from being cheaper to produce and update, offers easier and faster access.

These suggestions for lexicographic treatment were then illustrated by means of sample entries for various types of dictionaries.


Second-level Markers and Composition Teaching


This study has provided further evidence for the importance of phraseology to native-like target-language performance. Indeed, the imitative use of phraseology is the foundation on which effective processing and communication on the one hand, and creative expression on the other, rest: the mature native’s command of fixed expressions and collocations facilitates text production through savings in processing time and the quasi-automatic provision of textual bulk, while at the same time enhancing the reader’s chances of interpreting the text; conversely, the lower density of familiar phraseology in an L2 writer’s text found in this study is likely to have an adverse effect on naturalness and on reader comprehension.

This comparatively simple realization, it was argued, has long been obscured by the inordinate weight given within communicative language teaching to ‘message focus’ rather than ‘form focus’, and this is where one of the major deficiencies of much composition teaching seems to lie. Under the process-oriented paradigm, students have been encouraged to hone strategic skills such as ‘prewriting’, ‘drafting’ and ‘revision’ without paying serious attention to textual building blocks or language tout court. The result is that, despite non-natives’ mastery of general writing strategies, which usually transfer positively to the L2, even very advanced non-native writers continue to flout lexico-grammatical and discursive norms.

It was suggested that one way of tackling this problem might be to raise student awareness of first-level and second-level markers on the one hand, and of the tight interplay between long-distance collocations formed by discourse markers and ‘standard’ rhetorical patterns on the other. A number of proposals were then made for classroom interaction and exercises.