ONLP-LAB::About YAP

About

General

YAP provides complete, automatic morphological and syntactic annotation for Hebrew texts. The code is freely available on our git page. The model is based on More et al. (2019).

Citation

If you make use of this parser or software for research, please cite:

            
@article{more-etal-2019-joint,
    title = "Joint Transition-Based Models for Morpho-Syntactic Parsing: Parsing Strategies
            for {MRL}s and a Case Study from Modern {H}ebrew",
    author = "More, Amir  and
            Seker, Amit  and
            Basmova, Victoria  and
            Tsarfaty, Reut",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "7",
    year = "2019",
    url = "https://www.aclweb.org/anthology/Q19-1003",
    doi = "10.1162/tacl_a_00253",
    pages = "33--48",
}

Background

Computational models for the automatic analysis of natural language texts were originally developed for parsing European languages. They are not well suited to languages that exhibit strong interaction between morphology and syntax, known as Morphologically Rich Languages (MRLs). These languages, including Hebrew, Arabic and other Semitic languages, pose a specific challenge to the standard language processing pipeline.
We at the ONLP lab aim to develop computational models that can successfully cope with parsing MRLs.

The Challenge

The challenge, in a nutshell, is as follows: in order to syntactically analyze MRL texts, we first have to break the input tokens down into their constituent morphemes. For instance, the word אהבתיה needs to be broken down into אני + אהבתי + את + היא in order to find out its predicate-argument structure, or as it is commonly termed: "who did what to whom". However, due to extreme morphological ambiguity, global context is required in order to correctly decompose raw tokens into morphemes. For instance, the token הקפה will receive different analyses in different contexts:

השלמנו הקפה מלאה של המסלול ("We completed a full lap of the track")
הקפה היה מאוד מאוד מאוד חם ("The coffee was very, very, very hot")

In the first sentence, הקפה is a single word; in the second, it corresponds to two distinct segments, ה + קפה.
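The ambiguity described above can be pictured as a small table that maps one raw token to several candidate segmentations. The following is only a minimal sketch in Go: the `Analysis` type, the `Candidates` function and the hand-written `lexicon` are hypothetical names invented for this illustration, and they are not taken from yap's code or its API.

```go
package main

import (
	"fmt"
	"strings"
)

// Analysis is one candidate segmentation of a raw token into morphemes.
// Hypothetical type for illustration only; not part of yap.
type Analysis struct {
	Morphemes []string // segments in surface order
	Gloss     string   // rough English gloss
}

// lexicon is a toy, hand-written ambiguity table covering only the
// example token from the text. A real analyzer would use a full lexicon.
var lexicon = map[string][]Analysis{
	"הקפה": {
		{Morphemes: []string{"הקפה"}, Gloss: "lap"},
		{Morphemes: []string{"ה", "קפה"}, Gloss: "the + coffee"},
	},
}

// Candidates returns every known analysis of a token. Tokens missing
// from the toy lexicon get a single, unsegmented analysis.
func Candidates(token string) []Analysis {
	if analyses, ok := lexicon[token]; ok {
		return analyses
	}
	return []Analysis{{Morphemes: []string{token}, Gloss: "unknown"}}
}

func main() {
	// List all candidate segmentations of the ambiguous token.
	for _, a := range Candidates("הקפה") {
		fmt.Printf("%s\t(%s)\n", strings.Join(a.Morphemes, " + "), a.Gloss)
	}
}
```

Nothing in this table says which candidate is correct for a given sentence. That choice depends on the surrounding words, which is exactly where syntax comes in.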
So, global syntactic analysis requires, but at the same time is required for, accurate morphological analysis.
How can we solve this apparent loop?

The Solution

To overcome this challenge, we at the ONLP lab develop joint morphosyntactic models that exploit the interaction between morphology and syntax to allow accurate morphological as well as syntactic parsing of raw Hebrew texts. On our parser page we present yap (yet another parser), written in Go by Amir More as part of his MSc thesis. Yap was implemented to test the joint morphosyntactic parsing hypothesis in a transition-based framework.

Yap currently presents the state of the art in both morphological and syntactic analysis of Modern Hebrew texts. It is trained on an updated version of the SPMRL 2013 Hebrew treebank, which contains about 6K annotated sentences from the Hebrew newspaper Haaretz. Both the code and its documentation are under active development. The project page can be found here.

If you have any questions or comments, do not hesitate to get in touch.