rau-nlp.github.io

Natural Language Processing

Russian-Armenian University, 2nd semester of the 2017-2018 academic year

Syllabus

Classes: Wednesdays, 10:45–11:30 and 11:35–12:20
Office hours: Saturdays, by appointment

Please join the mailing list: https://groups.google.com/group/rau-nlp-2017-2018-ii

You can also join the Slack channel.

Projects

d-lemma

Lemmatisation using Deep Learning, Tsolak Ghukasyan
github.com/tsolakghukasyan/d-lemma

Content Parsing

Finding the main content of web pages, Sargis Abrahamyan
github.com/sargisabrahamyan/contentparser

Authorify

Text author style reduction - identifying and applying style, Harutyun Beybutyan
github.com/hbeybutyan/authorify

Misclassification detector

Harutyun Shmavonyan
github.com/harutyun3172800/Misclassification_detector

ModelJack

Nerses Nersesyan
github.com/nerses0/ModelJack

Universal Transliteration

Tigran Mardanyan

Lessons and assignments

Week I
Slides
Read Norvig’s spelling corrector
Try the Google Cloud demo (scroll down) and the spaCy demo
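To make the Norvig reading concrete, here is a condensed sketch of its core idea: generate every string within one or two edits (deletes, transposes, replaces, inserts) of the input, keep the ones that are known words, and pick the most frequent. The tiny CORPUS string is an assumption standing in for the large text file Norvig counts words from, so the probabilities here are only illustrative.

```python
import re
from collections import Counter

# Toy corpus (assumption): Norvig counts words from a large text file instead.
CORPUS = "the quick brown fox jumps over the lazy dog the dog sleeps"
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def probability(word):
    """Relative frequency of a word in the toy corpus."""
    return WORDS[word] / sum(WORDS.values())


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away from word."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words):
    """Keep only the words that occur in the corpus."""
    return {w for w in words if w in WORDS}


def candidates(word):
    # Prefer the word itself, then distance-1 words, then distance-2 words.
    return known([word]) or known(edits1(word)) or known(edits2(word)) or {word}


def correction(word):
    """Most probable spelling correction for word."""
    return max(candidates(word), key=probability)


print(correction("dgo"))  # dog
```

With a real corpus the same four functions do all the work; what changes is only how good the word counts are.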

Week II
Slides
Break a parser: install spaCy, find a sentence that it parses incorrectly, and explain why

Week III
Slides
Review NLP fundamentals (tokens, lemmata, surface forms, n-grams, typology)
Play with the language library's ngrams module
Play with spaCy token attributes like lemma_, is_oov and pos_
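As a warm-up for the n-gram exercises, the sketch below extracts word n-grams and padded character n-grams using only the Python standard library. It does not reproduce the API of the language library's ngrams module; the function names ngrams and char_ngrams and the "#" padding symbol are choices made for this illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Word n-grams as tuples: zip the sequence against shifted copies of itself."""
    return list(zip(*(tokens[i:] for i in range(n))))


def char_ngrams(word, n, pad="#"):
    """Character n-grams of a word padded with a boundary symbol on both sides."""
    padded = pad + word + pad
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


tokens = "the cat sat on the mat".split()
bigrams = ngrams(tokens, 2)
print(bigrams)
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
print(Counter(ngrams(tokens, 1)).most_common(1))
# [(('the',), 2)]
print(char_ngrams("cat", 3))
# ['#ca', 'cat', 'at#']
```

The unigram count shows the type/token distinction: six tokens, but only five types, since "the" occurs twice.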

Week IV
Slides
Review NLP fundamentals for upcoming exam

Week V
Exam I
Slides

Week VI
Slides
Visualise word embeddings at projector.tensorflow.org or anvaka.github.io/pm/#/galaxy/word2vec-wiki
Assignment:
Download the Amazon Reviews sentiment dataset
Clone and build fastText
Email your models and any pre-processing script to the instructor, and email your score and training time to the group
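fastText's supervised mode expects one example per line, with the label written as __label__<name> followed by the text. The sketch below assumes (this is an assumption, check your download) that the reviews come as a CSV with label, title and body columns; the normalisation roughly mirrors the sed one-liner in the fastText text-classification tutorial (lowercase, and put spaces around punctuation).

```python
import csv
import io
import re


def normalise(text):
    """Lowercase and separate punctuation into its own tokens."""
    text = text.lower()
    text = re.sub(r"([.!?,'/()])", r" \1 ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def csv_to_fasttext(rows):
    """Turn (label, title, body) rows into fastText supervised-format lines."""
    return [f"__label__{label} {normalise(title + ' ' + body)}"
            for label, title, body in rows]


# Two made-up rows in the assumed CSV layout.
sample = io.StringIO('"2","Great book","Loved it, would read again!"\n'
                     '"1","Meh","Too long."\n')
for line in csv_to_fasttext(csv.reader(sample)):
    print(line)
# __label__2 great book loved it , would read again !
# __label__1 meh too long .
```

Written to files, the output can be used with the fastText command line, for example ./fasttext supervised -input train.txt -output model and then ./fasttext test model.bin test.txt (the file names here are placeholders). Prefixing the training command with time is one simple way to measure the training time the assignment asks for.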

Week VII
Slides
Think about project ideas
Assignment:
Train a fastText model to distinguish between three languages written in the same alphabet (for example English, Spanish and Russian translit), see fasttext.cc/blog/2017/10/02/blog-post.html
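Languages that share an alphabet still differ a lot in which character sequences they use, which is why character n-grams work well for language identification (fastText can add them as subword features via its -minn and -maxn options). The sketch below is not fastText: it swaps in a multinomial naive Bayes classifier over character trigrams, a classic baseline, with add-one smoothing and uniform class priors. The nine training sentences are made up purely for illustration; a real model needs far more data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Character n-grams of a lowercased text, padded with a space on each side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams (uniform class priors)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha                  # add-alpha smoothing constant
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.vocab = set()

    def fit(self, samples):
        for text, lang in samples:
            grams = char_ngrams(text, self.n)
            self.counts[lang].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen n-grams
        scores = {}
        for lang, counter in self.counts.items():
            total = sum(counter.values())
            # Sum of smoothed log-probabilities of each n-gram under this language.
            scores[lang] = sum(
                math.log((counter[g] + self.alpha) / (total + self.alpha * vocab_size))
                for g in grams
            )
        return max(scores, key=scores.get)


# Made-up illustrative examples: English, Spanish, and transliterated Russian.
TRAIN = [
    ("the weather is very nice today", "en"),
    ("where is the train station", "en"),
    ("i would like a cup of tea", "en"),
    ("el tiempo es muy bueno hoy", "es"),
    ("donde esta la estacion de tren", "es"),
    ("quisiera una taza de te", "es"),
    ("segodnya ochen khoroshaya pogoda", "ru"),
    ("gde nakhoditsya vokzal", "ru"),
    ("ya khotel by chashku chaya", "ru"),
]

model = CharNgramNaiveBayes().fit(TRAIN)
print(model.predict("donde esta"))  # es
```

Comparing a baseline like this against the fastText model on the same held-out sentences is one way to see what the learned subword embeddings add.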

Week VIII
Slides
Look at:
Twitter language identification evaluation blog post
fastText language identification blog post
YerevaNN transliteration blog post
spaCy sense2vec blog post and demo
spaCy instructions and blog post on adding a language
Quora Question Pairs challenge
NYU Winograd Schema Challenge, its rules and dataset

Week IX
Quiz I
Slides
Write project proposal according to the project proposal guidelines
Submit a link to the git repo with the proposal to the class mailing list
Reading on seq2seq: github.com/google/seq2seq, TensorFlow tutorial

Weeks X-XI
Guest lecture by YerevaNN
Slides from Stanford CS231n
Experiment with char-RNNs on an interesting text corpus or even source code

Week XII
Notes on machine translation
Project updates
More reading: byte-pair encoding (BPE) for deep learning, as presented in "Neural Machine Translation of Rare Words with Subword Units" by Sennrich, Haddow and Birch
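The BPE idea from the Sennrich, Haddow and Birch reading fits in a few lines: represent each word as space-separated symbols ending in an end-of-word marker, repeatedly count adjacent symbol pairs weighted by word frequency, and merge the most frequent pair. The sketch below is written for this page rather than copied from the paper; the toy low/lower/newest/widest vocabulary follows the small example the paper uses, and ties between equally frequent pairs are broken by first occurrence.

```python
import re
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every standalone occurrence of the pair with the merged symbol."""
    merged = "".join(pair)
    # (?<!\S) and (?!\S) make sure we only match whole symbols, not parts of them.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub(lambda m: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    """Learn up to num_merges merge operations; return them with the final vocabulary."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair, first one on ties
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


VOCAB = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, final_vocab = learn_bpe(VOCAB, 10)
print(merges[:3])  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

The learned merge list is then applied in the same order to segment new words, so a rare word falls apart into frequent subword units instead of becoming an unknown token.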

Week XIII
Quiz II (by email)
Project presentations (by email)

Week XIV
Victory Day (public holiday)

Week XV
nlpguide.github.io
nlpguide.github.io/2017

Week XVI
How do teams build systems for the top 100-200 languages?
NLP vs. general deep learning: Who is Chomsky? Who is Norvig? Who is Manning? Who is LeCun?
On Chomsky and the Two Cultures of Statistical Learning, Norvig, 2011
Yann LeCun and Christopher Manning discuss Deep Learning and Innate Priors, 2018

Week XVII
Final exam
Final project presentations

Materials

Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax
June 2013
Emily M. Bender, University of Washington

Speech and Language Processing (3rd ed. draft)
October 2017 draft
Dan Jurafsky, Stanford University
James H. Martin, University of Colorado

A Primer on Neural Network Models for Natural Language Processing
2015 draft
Yoav Goldberg, Bar-Ilan University