PLAN FOR THE ESTABLISHMENT OF ISO/TC 37/SC4 “Language Resource Management”
 

  Key-Sun Choi

Korea Advanced Institute of Science and Technology

1 BUSINESS ENVIRONMENT OF THE ISO/TC37/SC4

  1.1 Description of the Business Environment.

Language resources consist of content represented by linguistic data, together with their formats, covering all aspects of human language (e.g., speech data, written (full) text corpora, general language lexical corpora). Text corpora, lexicons, grammars, and terminologies are typical types of language resources used for language and knowledge engineering. Wherever and whenever information and knowledge content are being

-          prepared (e.g. in research and development),

-          used (e.g., in texts or in data fields),

-          recorded and processed (e.g. in databases),

-          represented (in the form of language texts or data),

-          passed on (e.g. via training and teaching),

-          transformed (e.g. in the course of re-use),

-          implemented and transferred (e.g. in knowledge and technology transfer), and

-          translated and interpreted (e.g., in localization for the global market)

in both monolingual and multilingual environments, language resources play a crucial role in the preparation, processing and management of information and knowledge by humans and computers. Therefore, one can rightfully say, “There is no information, communication and knowledge processing without language resources”. But in order to be prepared, recorded, processed, distributed and applied efficiently and effectively, language resources need methodologies (and methodology standards) and software tools (and the respective standards for mark-up, interchange, evaluation, etc.).

  Relevant research areas, such as computational linguistics, computational lexicography and language engineering, have produced industrial or de facto standards that are waiting to become official standards, which in turn would help develop the language industries at large.

  Language engineering and computational linguistics provide the methodology for the preparation, recording, processing and re-use of language resources. Computerized lexicography supplies the tools for the efficient preparation and processing of dictionary data. Language engineering and natural language processing provide the tools to represent, manage and access knowledge represented by linguistic data of different degrees of complexity. Language resource management cannot be efficient without a strong language engineering component (comprising language data, methods and tools).

  In the constantly accelerating development of the global multilingual information society, characterized by the all-pervading influence of information and communication technology (ICT), language resource management including the respective data, methods and tools is becoming more and more important not only in the field of linguistics and language engineering itself, but even more in many fields of application whether integrated or not into larger systems. The emerging knowledge and contents industries will strongly rely on language resources, methods and tools.

  In addition, more and more experts are becoming active in language engineering (or human language technology). Every year new language communities become interested or involved in language resource management activities. New types of language resources (in terms of special-purpose languages, language registers, kinds of texts, etc.) are increasingly needed on the market to provide the ‘raw material’ for all kinds of consultancy services, training, and enhanced language products (e.g., word processors, speech recognition, machine translation, internet information retrieval, knowledge management, etc.). All information and knowledge management applications need both terminological data and general language resources, together with their methods and tools.

    1.2 Quantitative Indicators of the Business Environment.

  The following list of quantitative indicators describes the business environment in order to provide adequate information to support actions of ISO/TC 37/SC 4:

  All over the world, linguistic infrastructures are being established or reinforced as part of the rapid evolution of the information and communication society. Globalization and its counterpart, localization (not to mention personalization, customization, etc.), require multilingual communication. The ubiquity of the Internet requires computational standards for language processing. Computer-assisted language learning at all educational levels and the increased demand for access to language resources of all kinds require standards for language resources that are accepted by both commercial software companies and open source developers. International standards are the prerequisite for meeting these new requirements concerning the reuse, interoperability and usability of data and of the respective systems (or system components).

  Activities by experts related to language resource sharing and standardization are increasing in

-          intergovernmental organizations (IGOs) and international non-governmental organizations (NGOs, e.g., Universal Networking Language of UN University, ELRA/ELDA),

-          regional associations and their international federations (e.g., EAGLES for EU, ISLE for EU and USA, Asian Federation of Natural Language Processing),

-          national NGOs and non-profit organizations (NPOs),

-          public institutions and organizations,

-          standards bodies,

-          educational and training institutions/organizations,

-          international activities for web documents (e.g., SemWeb, a forum for the semantic linking of chunks inside web documents at the European Conference on Digital Libraries, and NKOS, the Networked Knowledge Organization Systems),

-          commercial enterprises, etc.

  New language and knowledge engineering tools assist all products of knowledge management and information management. A variety of value-added information products and services are conceived on the basis of language resources, as well as the respective methods and tools.

  Regional language engineering expert groups and associations have each released language resources for distribution to users, institutions and companies without standardizing the formats of those resources. There is a growing need for new standardization as well as for fast recognition of existing de facto standards and their transformation into International Standards. ISO/TC 37/SC 4, therefore, has a broad range of potential standardizing activities, which are pivotal to the further development of the language, content and knowledge industries.

  2     BENEFITS EXPECTED FROM THE WORK OF ISO/TC 37/SC 4.

  The rules for language resources endorsed in International Standards should serve all languages, irrespective of their grammar and writing systems, so that any language community wishing to develop its mother tongue into a modern tool of communication can build the necessary language resources. Special consideration is needed to meet the various needs of developing countries and to co-operate with specialized regional and international organizations. ISO/TC 37/SC4 documents should particularly allow the development of linguistic and language engineering applications in a multilingual environment.

  ISO/TC 37/SC4 sees to it that new developments in language engineering (or human language technology), knowledge management and information engineering are followed in international standardization work.

 

3     REPRESENTATION AND PARTICIPATION IN ISO/TC 37/SC4

  P-members and O-members of ISO/TC 37 are herewith called upon to register their interest in the new ISO/TC 37/SC 4. This indication of interest will be forwarded together with the decision of ISO/TC 37 to establish SC 4 to ISO/CS for circulation to all ISO members in order to call for the nomination of experts to participate in the standardizing activities of ISO/TC 37/SC 4.

  The national standardization body of the Republic of Korea, KATS (Korea Agency of Technical Standards), will support the secretariat operation for ISO/TC 37/SC4.

  Many high-ranking international organizations (e.g., EAGLES, ISLE, Korean KIBS, Japanese GSK, etc.) will be in liaison with ISO/TC 37/SC4, because most of them have extensive language resource activities, both for operational purposes (they all operate on a multilingual basis) and for their very mission, which necessitates language engineering work. Most of the international and regional organizations focussing on language resources proper will be in liaison with ISO/TC 37/SC4.

 

4  OBJECTIVES OF ISO/TC 37/SC 4 AND THE STRATEGIES FOR THEIR ACHIEVEMENT.

 

4.1  Defined objectives of ISO/TC 37/SC 4.

 

The objective of ISO/TC 37/SC4 is to prepare standards specifying principles and methods for creating, coding, processing and managing language resources, such as written corpora, lexical corpora, speech corpora, etc. Standards produced by ISO/TC 37/SC4 particularly address the needs of industry, international trade and the global economy regarding cross-lingual information retrieval, multilingual knowledge management and human language communication. Its technical work results in International Standards (and Technical Reports) covering language resource management principles and methods, as well as various aspects of computer-assisted lexicography and language engineering, including their use in a broad array of applications.

  The objective of ISO/TC 37/SC 4 would be to develop standards containing specifications for computer-assisted language resource management, focusing on data modeling, mark-up, data exchange, and evaluation of language resources (other than terminologies).

 

4.2  Identified strategies to achieve ISO/TC 37/SC 4’s defined objectives.

 The standardization of principles and methods for the collection, processing and presentation of language resources is a distinct type of standardization activity. Its results are basic standards that have a wide-ranging application.

 The point of reference of ISO/TC 37/SC4 standards includes EAGLES documents EAG-CSG/IR-T1.1 “Recommendations on Corpus Typology”, EAG-TCWG-TTYP/P “Recommendations on Text Typology”, and EAG-TCWG-CES/R-F “Corpus Encoding”. ISO 1087 may be extended by a part 4 to cover the terminology related to SC 4.

  VISION:

ISO/TC 37/SC4 shall prepare the International Standards to support language resource management (knowledge management, translation management, language resource aspects within global content management, etc.) in the multilingual information society.

 World-wide use of ISO/TC 37/SC 4 standards will help to:

-          enhance overall quality of language resources in human language communication aspects;

-          improve information management within various industrial, technical and scientific environments and reduce its costs;

-          increase efficiency in computer-supported language communication.

  ORGANIZATION OF WORKING GROUPS:

WG 1 Terminology (of SC 4)

Project: either a part 4 of ISO 1087 (with extended scope) or a terminology standard with a new number containing the key concepts of the work of SC 4

  WG 2 Data modeling and mark-up methods

Projects: existing and future EAGLES and ISLE standard documents

 

WG 3 Data exchange

Projects: e.g., OLIF

  WG 4 Evaluation of language resources and language resource management systems

Projects: see EAGLES and ISLE documents, Speechdat, etc.

  5 FACTORS AFFECTING IMPLEMENTATION OF THE ISO/TC 37/SC4 WORK PROGRAMME.

  Recent European and US work on language resources has been integrated through ISLE (International Standards for Language Engineering), which was preceded by the EAGLES project in Europe. ISO standards for language resource management must be extended to

-          include other languages (e.g., Asian languages) that are not covered by the regime of ISLE;

-          prepare ISO standards for de-facto standards that had already been set up in EAGLES;

-          promote the language resource standards to industries, research institutes and academic societies for efficient and effective sharing of very large and highly reliable language resources;

-          provide the evaluation method and tools for human language technologies based on language resource management.

 

The new ISO/TC 37/SC4 will promote new policies, research and projects to develop language resources on the basis of its standards in all languages and in all countries where there is a need for them.

 