NooJ

Written in: Java, C#
Website: https://nooj.univ-fcomte.fr/

NooJ is a linguistic development environment and corpus processor created by Max Silberztein. It enables linguists to construct and apply grammars belonging to each of the four classes of the Chomsky-Schützenberger hierarchy of generative grammars: regular grammars, context-free grammars, context-sensitive grammars and unrestricted grammars. Users can create these grammars either in a text editor, by writing regular expressions, or in a graphical editor.[1]

NooJ allows linguists to develop orthographical and morphological grammars; dictionaries of simple words, compound words and discontinuous expressions; local syntactic grammars (such as named-entity recognizers);[2][3] structural syntactic grammars (which produce syntactic trees); and Zellig Harris's transformational grammars.

All NooJ parsers operate on Atomic Linguistic Units (ALUs) rather than traditional word forms (i.e. sequences of letters between two space characters),[4] allowing for nuanced parsing of phrases like “can not” as well as their contracted forms, such as “cannot” or “can’t”. This feature facilitates the creation of relatively simple syntactic grammars, even for agglutinative languages.
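
As a minimal sketch of this idea (not NooJ's actual implementation or dictionary format), the following Python snippet expands word forms into sequences of units, so that "can not", "cannot" and "can't" yield the same result. The mini-lexicon `CONTRACTIONS`, the function name `to_alus` and the tokenizing regular expression are assumptions made for illustration only.

```python
import re

# Hypothetical mini-lexicon: one word form may expand to several units.
CONTRACTIONS = {
    "cannot": ["can", "not"],
    "can't": ["can", "not"],
}

def to_alus(text):
    """Split text into word forms, then expand each form into its units."""
    # Normalize the typographic apostrophe so that "can’t" matches the lexicon key.
    normalized = text.lower().replace("\u2019", "'")
    forms = re.findall(r"[a-z']+", normalized)
    alus = []
    for form in forms:
        # Forms absent from the lexicon are kept as single units.
        alus.extend(CONTRACTIONS.get(form, [form]))
    return alus
```

A grammar written over the output of `to_alus` only needs one rule for the sequence "can" followed by "not", whichever spelling appeared in the text.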

ALUs are represented by annotations stored in the Text Annotation Structure (or TAS), which enables NooJ parsers to add or remove annotations dynamically. A typical analysis in NooJ involves applying a series of elementary grammars in a cascading, bottom-up approach, progressing from spelling to semantics.
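
The cascade described above can be pictured with a small Python sketch. It is an illustration under stated assumptions, not NooJ's data model: the class names `Annotation` and `TextAnnotationStructure`, the label notation `"can,V"`, and the two toy passes are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Annotation:
    start: int   # offset of the first character covered
    end: int     # offset just past the last character covered
    label: str   # hypothetical notation, e.g. "can,V"

@dataclass
class TextAnnotationStructure:
    text: str
    annotations: set = field(default_factory=set)

    def add(self, start, end, label):
        self.annotations.add(Annotation(start, end, label))

    def remove(self, start, end, label):
        self.annotations.discard(Annotation(start, end, label))

    def labels_at(self, start):
        """Sorted labels of the annotations that begin at `start`."""
        return sorted(a.label for a in self.annotations if a.start == start)

def lexical_pass(tas):
    # Lower level: propose every reading of "can" found in the toy lexicon.
    i = tas.text.find("can")
    if i >= 0:
        tas.add(i, i + 3, "can,V")
        tas.add(i, i + 3, "can,N")

def syntactic_pass(tas):
    # Higher level: drop the noun reading when "can" is followed by "not".
    i = tas.text.find("can not")
    if i >= 0:
        tas.remove(i, i + 3, "can,N")

def analyze(text, cascade=(lexical_pass, syntactic_pass)):
    """Apply each elementary grammar in order to one shared TAS."""
    tas = TextAnnotationStructure(text)
    for grammar in cascade:
        grammar(tas)
    return tas
```

Because every pass reads and writes the same structure, a later grammar can remove an ambiguity that an earlier one introduced, which is the point of keeping annotations separate from the text itself.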

History of NooJ

NooJ originated from the research conducted by Max Silberztein and the INTEX community, a group of linguists focused on the Lexicon-Grammar approach developed by Maurice Gross at the LADL (Laboratoire d’Automatique Documentaire et Linguistique). This approach posits that no grammar rule can be formulated without a precise definition of its domain of application, emphasizing the importance of context in linguistic analysis.

Since its inception, NooJ has been used as a corpus processor in a range of fields: linguistics (to analyze language structures and patterns),[5][6] history (to process historical texts and documents),[7] psychology (in studies of language use and cognition),[8][9] literary studies (for textual analysis and interpretation),[10] sentiment analysis (to assess the emotional tone of written content),[11] data mining (to extract information from large datasets),[12][13][14] and the analysis of musical notation.[15] NooJ also played a role in the MARS 500 experiment.[16] In addition, several software companies have used NooJ to develop information extraction and information retrieval systems.

Complexity and application

NooJ’s dictionaries are represented by finite-state transducers and can handle simple words[17] (e.g. 'table'), compound words[18] (e.g. 'as a matter of fact'), and discontinuous expressions such as phrasal verbs (e.g. 'to turn … off'),[19] idiomatic expressions[20] (e.g. 'to take the bull by the horns') and support verb/predicative noun associations (e.g. 'to take a nap').
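
To make the finite-state view concrete, here is a minimal Python sketch of a dictionary stored as a character trie, which is a simple deterministic automaton whose accepting states carry an output. It is not NooJ's dictionary format: the entries, the category codes, and the names `build_trie` and `longest_match` are assumptions, and discontinuous expressions are deliberately left out.

```python
FINAL = object()  # marks an accepting state; cannot collide with a character

def build_trie(entries):
    """Build a character trie; accepting states store the entry's lexical info."""
    root = {}
    for form, info in entries.items():
        node = root
        for ch in form:
            node = node.setdefault(ch, {})
        node[FINAL] = info
    return root

def longest_match(trie, text, start=0):
    """Follow transitions from `start`; return (end, info) of the longest entry."""
    node, best = trie, None
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:
            break
        # Accept only at a word boundary, so "table" does not match "tablecloth".
        at_boundary = i + 1 == len(text) or not text[i + 1].isalpha()
        if FINAL in node and at_boundary:
            best = (i + 1, node[FINAL])
    return best

# Hypothetical entries: a simple word, a compound word, and a shorter competitor.
LEXICON = build_trie({
    "table": "N",
    "as": "CONJ",
    "as a matter of fact": "ADV",
})
```

Preferring the longest match means that 'as a matter of fact' is returned as one compound entry rather than stopping at 'as'. The lookup reads each character once, which is what makes this kind of representation attractive for large dictionaries.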

NooJ enables linguists to create, edit, debug and maintain a wide range of grammars that fall within the four classes of generative grammars in the Chomsky-Schützenberger hierarchy: regular grammars, context-free grammars, context-sensitive grammars, and unrestricted grammars.

NooJ can apply grammars to texts in linear time. For instance, recursion can be removed from many NooJ context-free grammars for efficiency. Context-sensitive grammars in NooJ consist of two components: a context-free (or even regular) grammar that is applied efficiently to the text, and a set of constraints that are applied to the matching sequences, each in constant time.
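
The two-component design can be illustrated with the language aⁿbⁿcⁿ, a standard example of a language that is context-sensitive but not context-free. The Python sketch below is only an analogy: it uses Python's `re` module rather than NooJ's grammar formalism, and the names `PATTERN`, `constraint` and `recognizes` are hypothetical.

```python
import re

# Component 1: a regular grammar matching a+ b+ c+, with three named variables.
PATTERN = re.compile(r"(?P<A>a+)(?P<B>b+)(?P<C>c+)")

def constraint(match):
    # Component 2: a check on the captured sequences (here, equal lengths).
    a, b, c = match.group("A"), match.group("B"), match.group("C")
    return len(a) == len(b) == len(c)

def recognizes(s):
    """Accept s only if the regular part matches and the constraint holds."""
    match = PATTERN.fullmatch(s)
    return match is not None and constraint(match)
```

The regular part alone would accept "aabbc"; the constraint is what rejects it. The snippet shows the division of labour only and says nothing about NooJ's performance, since Python's backtracking matcher does not guarantee linear time in general.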

Unrestricted grammars in NooJ are context-sensitive grammars that can incorporate variables and modify the input text. These grammars are typically used for transformational analysis and generation (see Zellig Harris). Several research teams have also shown that, when combined with multilingual lexicons, NooJ can perform machine translation.[21][22]
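
As a rough illustration of a grammar that binds variables and rewrites its input, the following Python sketch performs a toy active-to-passive transformation. Everything in it is an assumption made for the example: the variable names `N0`, `V` and `N1`, the two-entry `PARTICIPLES` lexicon, and the very restricted sentence pattern. It does not reproduce NooJ's grammar notation.

```python
import re

# Hypothetical lexicon mapping a finite verb form to its past participle.
PARTICIPLES = {"eats": "eaten", "sees": "seen"}

# Pattern with three variables: subject, verb, and a "the + noun" object.
ACTIVE = re.compile(r"(?P<N0>\w+) (?P<V>\w+) (?P<N1>the \w+)")

def passivize(sentence):
    """Return the passive counterpart, or None if the pattern does not apply."""
    match = ACTIVE.fullmatch(sentence)
    if match is None or match.group("V") not in PARTICIPLES:
        return None
    participle = PARTICIPLES[match.group("V")]
    # Produce the output by reordering the bound variables.
    return f"{match.group('N1')} is {participle} by {match.group('N0')}"
```

For example, `passivize("John eats the apple")` returns `"the apple is eaten by John"`, while a sentence outside the pattern, such as `"John sleeps"`, returns `None`.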

References

  1. Silberztein M. 2016. Formalizing Natural Languages: The NooJ Approach. Cognitive Science Series, Wiley-ISTE, UK. ISBN 9781848219021.
  2. Fehri H., Haddar K. and Ben Hamadou A. 2011. A new representation model for the automatic recognition and translation of Arabic Named Entities with NooJ. RANLP 2011 (Hissar, Bulgaria).
  3. Mota C. and Grishman R. 2008. Is this NE tagger getting old? Proceedings of LREC 2008. Marrakech: ELRA, pp. 1196-1202.
  4. Silberztein M. 2003. NooJ manual.
  5. Mesfar S. 2011. Towards a Cascade of Morpho-syntactic Tools for Arabic Natural Language Processing. Computational Linguistics and Intelligent Text Processing, LNCS Vol. 6008, Springer, pp. 150-162.
  6. Trouilleux F. 2014. Un dictionnaire et une grammaire de composés français. TALN 2014, Marseille.
  7. Gucul-Milojević S., Radulović V. and Krstev C. 2010. A View on the Representation of Women in Serbian Newspaper Texts. Applications of Finite-State Language Processing: Selected Papers from the NooJ 2008 International Conference (Budapest, Hungary). Edited by Kuti Judit, Silberztein Max, Varadi Tamas. Cambridge Scholars Publishing, Newcastle, UK, pp. 166-176.
  8. Ehmann B., Lendvai P., Pólya T., Vincze O., Miháltz M., Tihanyi L., Váradi T. and László J. 2012. Narrative Psychological Application of Semantic Role Labeling. Formalising Natural Languages with NooJ: Selected Papers from the NooJ 2011 International Conference (Dubrovnik, Croatia). Edited by Kristina Vučković, Božo Bekavac and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK, pp. 218-228.
  9. Pilar L. and Reimerink A. 2014. From term dynamics to concept dynamics: term variation and multidimensionality in the psychiatric domain. Proceedings of EURALEX 2014. Bolzano, Italy, July 15-19.
  10. Mesfar S., Gambin M. and Piton O. 2012. In the Pursuit of a Lost Manuscript: Ptolemy’s Planisphaerium. Formalising Natural Languages with NooJ: Selected Papers from the NooJ 2011 International Conference (Dubrovnik, Croatia). Edited by Kristina Vučković, Božo Bekavac and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK, pp. 205-217.
  11. Merkler D. and Agić Ž. 2013. Sentiscope: A System for Sentiment Analysis in Daily Horoscopes. Formalising Natural Languages with NooJ: Selected Papers from the NooJ 2012 International Conference (Paris, France). Edited by Anaïd Donabédian, Victoria Khurshudian and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK, pp. 173-181.
  12. Elia A., Vietri S., Postiglione A., Monteleone M. and Marano F. 2010. Data Mining Modular Software System. SWWS2010 - Proceedings of the 2010 International Conference on Semantic Web & Web Services, Las Vegas, Nevada, USA, pp. 127-133. ISBN 9781601321619.
  13. Matos S., Barreiro A. and Oliveira J.L. 2009. Syntactic Parsing for Bio-molecular Event Detection from Scientific Literature. Progress in Artificial Intelligence, LNCS Vol. 5816, pp. 79-85.
  14. Pilar L. and Faber P. 2012. Causality in the Specialized Domain of the Environment. Proceedings of the Workshop Semantic Relations-II. Enhancing Resources and Applications (LREC12), eds. Mititelu V.B., Popescu O. and Pekar V. Istanbul, Turkey: ELRA, pp. 10-17.
  15. Kocijan K., Librenjak S. and Dovedan Z. 2014. Introducing Music to NooJ. Formalising Natural Languages with NooJ 2013: Selected Papers from the NooJ 2013 International Conference (Saarbrücken, Germany). Edited by Svetla Koeva, Slim Mesfar and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK, pp. 209-222.
  16. Ehmann B., Balázs L., Shved D., Bénet V. and Gushin V. 2013. The Russian Linguistic Resources in Space Psychological Research. Formalising Natural Languages with NooJ: Selected Papers from the NooJ 2012 International Conference (Paris, France). Edited by Anaïd Donabédian, Victoria Khurshudian and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK, pp. 150-161.
  17. Piton O., Lagji Kl. and Përnaska R. 2007. Electronic Dictionaries and Transducers for Automatic Processing of Albanian Language. Proceedings of 12th International conference NLDB 2007, CNAM, Paris, France. LNCS Series, Springer Verlag, pp. 407-413.
  18. Chadjipapa E., Papadopoulou E. and Gavriilidou Z. 2010. New data in the Greek NooJ module: Compounds and Proper Nouns. Applications of Finite-State Language Processing: Selected Papers from the NooJ 2008 International Conference (Budapest, Hungary). Edited by Kuti Judit, Silberztein Max, Varadi Tamas. Cambridge Scholars Publishing, Newcastle, UK, pp. 93-100.
  19. Machonis P.A. 2010. English Phrasal Verbs: from Lexicon-Grammar to Natural Language Processing. Southern Journal of Linguistics 34.1, United States, pp. 21-48.
  20. Vietri S. 2014. Idiomatic Constructions in Italian. A Lexicon-Grammar Approach. John Benjamins BV, Amsterdam, Netherlands. ISBN 9789027231413.
  21. Barreiro A. 2008. Port4NooJ: Portuguese Linguistic Module and Bilingual Resources for Machine Translation. In Proceedings of the 2007 International NooJ Conference (Barcelona, Spain). Edited by Xavier Blanco and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK, pp. 19-47.
  22. Soussi R., Mesfar S. and Faget M. 2014. STORM Project: Towards a NooJ Module within Armadillo Database to Manage Museum Collection. Formalising Natural Languages with NooJ 2013: Selected Papers from the NooJ 2013 International Conference (Saarbrücken, Germany). Edited by Svetla Koeva, Slim Mesfar and Max Silberztein. Cambridge Scholars Publishing, Newcastle, UK, pp. 223-232.