
II Simpósio (1990)


 

 

From MC4 to MC5: an evolution in term bank management

Jean-Michel Henning
Université de Clermont-Ferrand
France

 

Abstract

MC4 is the fourth version of a software package designed by Terminformatique for the management of terminological data banks. A new version, MC5, is in preparation. It will take into account the changes which have occurred in terminology practice since MC4 was designed, although the main principles on which it was based remain unchanged. We shall describe the software MC4, currently in use at Terminformatique and several terminology centers in Europe, and point out several improvements envisaged for the new software and made necessary by a general evolution and progress in terminology work, equipment facilities and new requirements.

MC4 is the fourth version of a software package designed by Terminformatique for the management of terminological data banks. The first version, written in FORTRAN 77, was operational as early as 1982, while MC4, written in DBASE 3 and CLIPPER, was available in 1987. This last version thus has the benefit of seven years of practice in terminological data management. The new version in preparation, MC5, will take into account the changes which have occurred in terminology practice since MC4 was designed several years ago, although the main principles on which it was based remain practically unchanged in the new package. We shall thus describe the software MC4, currently in use at Terminformatique and in many other terminology organizations, and point out several improvements envisaged for the new software and made necessary by a general evolution and progress in terminology work, equipment facilities, new requirements, etc. MC5, like MC4, was designed in Clermont-Ferrand (France) by a team of university teachers and researchers composed of linguistics and computer science specialists, respectively interested in terminology and in data base structuring and management.

The software MC4

MC4 was to be consistent with the theory of terminology and offer all the facilities likely to be required by potential users. It was to be easily portable and run on all PC-compatible microcomputers with a minimum of 512 KB of RAM. On the other hand, it was to be powerful enough to handle a large term bank of several hundred thousand terms, with interfaces to other banks, exchange facilities between banks, dictionary printouts, etc. In fact, many users of the software, who are now using it to manage large term banks, were pleased to receive the package by post, install it in a few minutes and run it on their old XT.
Changes in terminology practices

PC-XTs have been replaced by ATs and are now being replaced by PSs with large disk capacity, allowing terminologists to handle large collections of terms with the same MC4 software. However, terminology practices are obviously changing as the needs for translation and terminology work are steadily increasing. Large and middle-sized firms, government organizations, ministries, etc. employ teams of translators who use term banks and are familiar with word processors. In many cases, separate teams of terminologists provide them with the information they need, are familiar with micro and mini computers, and manage and maintain large terminology banks accessible from a number of workstations organized in a network. The equipment used is more sophisticated, more powerful and still very diverse, and the requirements for software adapted to such organizations are of course more exacting.

One of those requirements is, for instance, the necessity for terminology bank management software to cope with non-Latin characters. That necessity appeared in Europe when Greece joined the European Community and became more and more obvious as communication and exchanges increased with such countries as the USSR, North African and Middle East countries, etc.

Finally, due to the growing experience of terminologists and an increasing diversity in terminology practices, new requirements have appeared through articles, comments, meetings and remarks from some users of MC4. Some regretted that the software did not allow them to define fields for their personal use in addition to those available in MC4. Others expressed the need for a network between terms in addition to the conceptual network. Others again (translators using word processors) wished to have facilities to access their bank without leaving the word processing program.
The sum of all the requirements stated above (and others which have been left aside) is enough to discourage any ordinary programmer, but it has been shown that the needs are really there and, on the other hand, progress in equipment and software facilities can certainly help in designing specialized software better adapted to the needs expressed by specialists.

The software MC5

MC5 (to be operational in 1991) will be written in TURBO-C and will thus be widely portable, particularly to UNIX-operated systems. It will accept all characters written from left to right. This, however, will necessitate an EGA or VGA card and a minimum of 640 KB of RAM. But we know that this equipment is now available at reasonable prices (another consequence of progress in equipment) and that it is, or will become, quite common among terminology or translation organizations.

The program will have more flexibility than MC4. The user will be able to define a certain number of parameters adapted to his personal needs (including new fields, non-Latin characters, word processor, network, etc.) and the TURBO-C language will allow further modifications and updates to the program which were impossible with DBASE 3. The users will be given the possibility to organize terms in networks, in one language or from one language to another, in the same way as conceptual networks can be organized in MC4 (explained in the next paragraphs). This was suggested by translators interested in preparing lists of equivalents adapted to particular needs (particular clients, for instance, or specific vocabulary used in the firm, etc.). The term bank will be accessible directly from most word processors. Those, and other minor changes, constitute improvements brought to MC4 with a view to making it more powerful, flexible and adapted to a modern context.
The principles on which MC4 was based will, however, remain unchanged, namely: classification of human activities into subject-fields, definition of concepts as units of knowledge, and organization of concepts in conceptual networks within each subject-field. The following explanations are thus valid for both MC4 and MC5.

MC4/MC5 common structure

Terminological principles

 

1. Subject-field classification

The classification of terminologies into subject-fields and subfields means that the users of a term bank managed by MC4/MC5 should be able to define and organize a network of fields and subfields and select one of them when beginning a working session, just like a translator who selects a specialized dictionary on a shelf and lays it on his desk. The bank, in this way, can well be compared to a number of dictionaries, glossaries, etc., among which one can choose one or several volumes. The difference is that the terminologists using MC4/MC5 are given the possibility to define their own classification system or to choose among the different existing systems, as no agreement has been reached yet on the use of one international system. This first aspect of the structure of the bank is thus very rough and very simple, but it does give the bank the sound foundations that are necessary to proceed with more sophisticated structuring. It also has several consequences on the facilities offered by the system (term retrieval, printouts, etc.), on the facilities for exchanges of data, and even on the theoretical rules applied in the bank. As a consequence of this classification, homographs, for example, cannot be considered as ambiguous if they belong to different subject-fields.
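The point about homographs can be illustrated with a minimal Python sketch. It is not how MC4/MC5 store their data (they are written in DBASE 3/CLIPPER and TURBO-C respectively); the class `TermBank`, its methods, the flat list of subject-fields and the "computing" entry are all assumptions made for illustration only.

```python
from collections import defaultdict


class TermBank:
    """Toy term bank partitioned by subject-field (hypothetical structure)."""

    def __init__(self):
        # subject-field -> term -> definition
        self._fields = defaultdict(dict)

    def add(self, subject_field, term, definition):
        """Store a term and its definition inside one subject-field."""
        self._fields[subject_field][term.upper()] = definition

    def lookup(self, term, subject_field=None):
        """Return {subject_field: definition} for every field containing the term.

        When a subject-field is selected, at most one entry comes back,
        so a homograph is no longer ambiguous.
        """
        key = term.upper()
        if subject_field is not None:
            candidates = [subject_field]
        else:
            candidates = list(self._fields)
        return {
            field: self._fields[field][key]
            for field in candidates
            if key in self._fields.get(field, {})
        }


bank = TermBank()
bank.add("botany", "tree", "A type of tall plant...")
# Illustrative second entry, not taken from the paper.
bank.add("computing", "tree", "A hierarchical data structure...")

print(bank.lookup("tree"))            # matches in two subject-fields
print(bank.lookup("tree", "botany"))  # a single match once a field is selected
```

Selecting the subject-field first, as a translator picks one dictionary from the shelf, is what reduces the two homographs to a single entry.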

 

2. Relational system of concepts within a subject-field

The structure of a relational data base fits exactly that of a system of concepts within a subject-field, since both define entities, respectively called "records" and "concepts", to which a number of fields can be attached.

Concepts in MC4/MC5 are thus represented by numbers, each of which identifies a terminological record. At this stage, they are abstract entities, with no linguistic supports (fig. 1).

Record nb    Term    Definition           ...
1            ...     .................    ...

In figure 2, concept number 1 does have linguistic supports, which are a term, "tree", and a definition, "A type of tall plant..."

Record nb    Term    Definition                 ...
1            TREE    A type of tall plant...    ...

The structure, of course, is not as simple as it appears in figure 1, as the linguistic supports of a concept have their own structure (fig. 3), but all fields are related to the same record (concept) and can be accessed from a record number.

On the other hand, each record (concept) can be accessed from one of its fields, so that "ARBRE", being linked by a pointer to concept nb. 1, leads to "TREE" and/or the other fields attached to the record. With proper instructions in the program, we can obtain: "The French equivalent of TREE is ARBRE" and further information like: "ARBRE is a masculine noun, ... The French definition of ARBRE is ...", etc. Once again, this is a very simplified representation of the structure of MC4/MC5, and a good deal of programming is needed to access and display information in a reasonable time, or to deal with homographs, synonyms, etc.
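The two access paths described above (from a record number to its fields, and from a field back to the record) can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary layout, the field names `term`, `definition`, `gender` and `part_of_speech`, and the function `equivalent` are hypothetical and do not reproduce the actual MC4/MC5 file structure. The French definition is left out because the paper elides it.

```python
# Concept records keyed by record number; each language carries its own fields.
records = {
    1: {
        "en": {"term": "TREE", "definition": "A type of tall plant..."},
        "fr": {"term": "ARBRE", "gender": "masculine", "part_of_speech": "noun"},
    },
}

# Index playing the role of the pointers from a term back to its concept number.
term_index = {}
for number, record in records.items():
    for lang, fields in record.items():
        term_index[(lang, fields["term"])] = number


def equivalent(term, source_lang, target_lang):
    """Follow the pointer from a term to its concept, then read another language."""
    number = term_index[(source_lang, term)]
    return records[number][target_lang]["term"]


print(equivalent("TREE", "en", "fr"))   # ARBRE
print(equivalent("ARBRE", "fr", "en"))  # TREE
print(records[term_index[("fr", "ARBRE")]]["fr"]["gender"])  # masculine
```

A real term bank would also need to handle homographs and synonyms, which this sketch ignores: here each (language, term) pair points to exactly one concept.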

 

3. Conceptual networks

The theory of terminology insists on the necessity to organize concepts in conceptual networks within each subject-field. This, however, is optional in MC4/MC5, the reason being that not all term-bank users are ready to undertake the very difficult and time-consuming work of organizing concepts in logical or ontological networks. The user can thus waive this facility, and in this case the records (concepts) contained in the bank will be considered as a set of individual entities, with no relationships between them, each of them being accessed directly and separately. In the case of a conceptual network, the relationships defined by the user of the bank can be hierarchical or not. Hierarchical relationships are used to create tree-structures with hierarchical dependencies between concepts like, for example, the logical two-way relationship "is a type of...", "the types are...", as shown in figure 4.

Non-hierarchical relationships are used to create limited networks of concepts linked to one another by pointers instead of a tree structure. They can be of any type that is useful to the user of the bank and produce networks like the one in figure 5.

The organization, in a data base, of concepts in tree-structures or networks is of course of great interest to users interested in data-retrieval procedures, since a question such as "What is X?" will prompt the answer "X is a type of Y", and "What are the parts of X (an engine for instance)?" will prompt "The parts of X are A, B, C, D, ...". Its interest is certainly even greater when considered as a means to elaborate some form, even primitive, of knowledge representation that could be used by expert systems. A non-hierarchical network between terms, envisaged in MC5, will also certainly help in the elaboration of knowledge representation. In a terminological data base, such organization between terms, managed by object-oriented programs, can be very useful to handle particular terminological difficulties, such as those illustrated in figure 6.
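The retrieval questions quoted above ("What is X?", "What are the parts of X?") can be illustrated with a small Python sketch of user-defined two-way relationships. It is an assumption-laden illustration, not the MC4/MC5 implementation: the class `ConceptNetwork`, its methods, the relation name "is a part of" and the sample concepts (OAK, PINE, PISTON, CRANKSHAFT) are all hypothetical.

```python
from collections import defaultdict


class ConceptNetwork:
    """Toy network of concepts linked by named two-way relationships."""

    def __init__(self):
        # relation name -> concept -> list of related concepts
        self._links = defaultdict(lambda: defaultdict(list))
        # relation name -> name of the relation read in the other direction
        self._inverse = {}

    def define_relation(self, name, inverse_name):
        """Declare a relationship and the wording of its reverse reading."""
        self._inverse[name] = inverse_name
        self._inverse[inverse_name] = name

    def link(self, source, relation, target):
        """Record the link in both directions (the relation must be defined first)."""
        self._links[relation][source].append(target)
        self._links[self._inverse[relation]][target].append(source)

    def related(self, concept, relation):
        """Return the concepts reached from `concept` through `relation`."""
        return list(self._links.get(relation, {}).get(concept, []))


net = ConceptNetwork()
net.define_relation("is a type of", "the types are")
net.define_relation("is a part of", "the parts are")

net.link("OAK", "is a type of", "TREE")
net.link("PINE", "is a type of", "TREE")
net.link("PISTON", "is a part of", "ENGINE")
net.link("CRANKSHAFT", "is a part of", "ENGINE")

print(net.related("OAK", "is a type of"))      # ['TREE']
print(net.related("TREE", "the types are"))    # ['OAK', 'PINE']
print(net.related("ENGINE", "the parts are"))  # ['PISTON', 'CRANKSHAFT']
```

Because relations are just user-supplied names, the same structure could hold a non-hierarchical relationship by declaring a relation that is its own inverse; nothing in the sketch enforces a tree shape.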

Particular and general needs

It would take a long time indeed to explain in detail how information is accessed in MC4 or MC5 (some improvements in MC5 are not significant enough for a separate explanation), how terms, definitions, notes, etc. can be amended, parameters modified, new data entered, and security and consistency of data ensured, or to describe the characteristics and contents of each field, etc. The general structure of a modern term bank is known by all terminologists, and national and international organizations have proposed standard terminological records which, when compared to one another, do not appear significantly different. A good number of software packages are available today which offer more or less the same minimum facilities. They serve different purposes or are adapted to different types of equipment, volumes of data, work specializations, numbers of users, etc. The software MC5 can be considered as an effort to propose a package which could cover as large a range of needs as possible, being more powerful than its predecessor MC4.

However, two points have to be considered. MC5, as we have seen, was not only meant to meet a large number of requirements but also to be easily adapted to new equipment, new habits and new terminological activities, whether they already exist or are still to come. On the other hand, the two packages should be considered as serving two different purposes, for it is not true that the most sophisticated, most powerful and most up-to-date software is necessarily the best adapted to everyone's needs. Some terminologists use word processors perfectly adapted to the simple jobs they are doing, while others require the services of very complex programs running on huge computers. Between those two extremes, MC4 will certainly, and still for many years, serve the needs of a majority of terminology or translation centers.

Both packages were designed to meet what we considered as essential requirements: comply with the essential principles of theoretical terminology, be simple to operate, user-friendly and portable, and offer all the facilities for exchanges of data and cooperation between users which, we feel, should be given priority in terminological activities.

 
