Motivation: NLP researchers design new word representation models to address the shortcomings of traditional word embeddings. However, (1) there is no standard method to evaluate the quality of these new models, and (2) the majority of existing evaluation techniques have been developed only for English and are limited to measuring similarity between words.
Task Description:
- Main Task: Implement a unified evaluation framework in PyTorch. It will consist of a number of classifiers that answer questions such as 'Does the model understand that the word is plural or a noun?' and 'Can the model predict its gender?'. The final software should be an easy-to-use, API-like tool that supports various forms of word representations.
- Evaluate existing word representation models with your framework
- Optional: Analyze the correlation between your classifiers and downstream NLP tasks
Conneau et al., What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties, ACL (1) 2018: 2126-2136
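To make the main task more concrete, here is a minimal sketch of what one such classifier (often called a probe) could look like in PyTorch. It is not the framework to be built. All names (`LinearProbe`, `train_probe`, `probe_accuracy`, `make_toy_data`) are hypothetical. The data is synthetic: random vectors stand in for real word embeddings, and an assumed binary property (say, singular vs. plural) is planted in one dimension so the probe has something to find.

```python
import torch
from torch import nn


class LinearProbe(nn.Module):
    """A single linear layer mapping a fixed word vector to property labels."""

    def __init__(self, embedding_dim: int, num_classes: int):
        super().__init__()
        self.linear = nn.Linear(embedding_dim, num_classes)

    def forward(self, vectors: torch.Tensor) -> torch.Tensor:
        # Returns raw logits, one row per word.
        return self.linear(vectors)


def train_probe(vectors, labels, num_classes, epochs=200, lr=0.1, seed=0):
    """Fit the probe on frozen vectors; the embeddings themselves are never updated."""
    torch.manual_seed(seed)
    probe = LinearProbe(vectors.shape[1], num_classes)
    optimizer = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(probe(vectors), labels)
        loss.backward()
        optimizer.step()
    return probe


def probe_accuracy(probe, vectors, labels):
    """Fraction of words whose property label is predicted correctly."""
    with torch.no_grad():
        predictions = probe(vectors).argmax(dim=1)
    return (predictions == labels).float().mean().item()


def make_toy_data(n_words=200, dim=16, seed=0):
    """Synthetic stand-in for real embeddings (an assumption for illustration).

    The label (e.g. 0 = singular, 1 = plural) is encoded in the sign of
    dimension 0, so a linear probe should be able to recover it.
    """
    generator = torch.Generator().manual_seed(seed)
    vectors = torch.randn(n_words, dim, generator=generator)
    labels = (vectors[:, 0] > 0).long()
    return vectors, labels


if __name__ == "__main__":
    vectors, labels = make_toy_data()
    # Simple train/test split: first 150 words for training, last 50 held out.
    probe = train_probe(vectors[:150], labels[:150], num_classes=2)
    accuracy = probe_accuracy(probe, vectors[150:], labels[150:])
    print(f"held-out accuracy: {accuracy:.2f}")
```

In an actual framework, the toy data would be replaced by vectors looked up from the word representation model under evaluation and by labels taken from an annotated linguistic resource. High held-out accuracy would then suggest that the property is recoverable from the representation.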
Required Skills:
Interest in NLP and world languages. Excellent programming skills in Python. Familiarity with Linux and shell scripting. Previous experience with PyTorch is a plus but not a must. In a nutshell:
- Programming (5/5)
- Analysis (3/5)
- Literature (1/5)
If you are interested, please send an e-mail to: thesis@ukp.informatik.tu-darmstadt.de with the title of the posting as the subject.
- Dr. Gözde Gül Şahin