Russian-Armenian University, 2nd semester of the 2017-2018 academic year
Classes: Wednesdays, 10:45-11:30 and 11:35-12:20
Office hours: Saturdays, by appointment
Please join the mailing list https://groups.google.com/group/rau-nlp-2017-2018-ii
You can also join the Slack channel
Lemmatisation using Deep Learning, Tsolak Ghukasyan
github.com/tsolakghukasyan/d-lemma
Finding the main content of web pages, Sargis Abrahamyan
github.com/sargisabrahamyan/contentparser
Text author style reduction - identifying and applying style, Harutyun Beybutyan
github.com/hbeybutyan/authorify
Harutyun Shmavonyan
github.com/harutyun3172800/Misclassification_detector
Nerses Nersesyan
github.com/nerses0/ModelJack
Tigran Mardanyan
Week I
Slides
Read Norvig’s spelling corrector
Try the Google Cloud demo (scroll down) and the spaCy demo
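Norvig's corrector picks the most probable candidate for a word. It prefers the word itself if it is known, then known words one edit away, then known words two edits away. The sketch below condenses that approach into self-contained Python. The tiny inline corpus is an assumption standing in for the large text file his version counts words from, so the corrections are only illustrative.

```python
import re
from collections import Counter

# Toy corpus: an assumption standing in for the large text file Norvig counts words from.
CORPUS = "the spelling corrector corrected the spelling of the word"

def words(text):
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())

WORDS = Counter(words(CORPUS))
TOTAL = sum(WORDS.values())

def probability(word):
    """Unigram probability of a word in the corpus."""
    return WORDS[word] / TOTAL

def known(candidates):
    """Keep only candidates that occur in the corpus."""
    return {w for w in candidates if w in WORDS}

def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    """All strings two edits away from word (lazily generated)."""
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))

def candidates(word):
    """The word itself if known, else known words at distance 1, else distance 2."""
    return known([word]) or known(edits1(word)) or known(edits2(word)) or {word}

def correction(word):
    """Most probable correction among the candidates."""
    return max(candidates(word), key=probability)

print(correction("speling"))    # spelling
print(correction("korrectud"))  # corrected
```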
Week II
Slides
Break a parser: install spaCy, find an example that it parses incorrectly, and explain why
Week III
Slides
Review NLP fundamentals (tokens, lemmata, surface forms, n-grams, typology)
Play with a language library's ngrams module
Play with spaCy token attributes like lemma_, is_oov and pos_
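To see what an n-gram module computes, here is a plain-Python sketch of contiguous word and character n-grams. The function name ngrams is our own choice and is not taken from any particular library. The < and > boundary markers around the word are likewise an illustrative assumption.

```python
def ngrams(items, n):
    """Return the list of contiguous n-grams (as tuples) over a sequence."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

# Word bigrams.
tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]

# Character trigrams of a word padded with boundary markers.
print(["".join(g) for g in ngrams("<cat>", 3)])
# ['<ca', 'cat', 'at>']
```

A sequence shorter than n simply yields an empty list, because the range is empty.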
Week IV
Slides
Review NLP fundamentals for upcoming exam
Week V
Exam I
Slides
Week VI
Slides
Visualise word embeddings at projector.tensorflow.org or anvaka.github.io/pm/#/galaxy/word2vec-wiki
Assignment:
Download the Amazon Reviews sentiment dataset
Clone and build fastText
Email your models and any pre-processing script to the instructor, and email your score and training time to the group
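To view your own vectors (for example, ones trained in this assignment) in projector.tensorflow.org, you can use the Embedding Projector's Load option. It accepts a tab-separated file of vectors, one per line, plus an optional metadata file with one label per line in the same order. Below is a minimal sketch that produces both as strings. The function name and the toy vector values are hypothetical.

```python
def to_projector_tsv(embeddings):
    """Return (vectors_tsv, metadata_tsv) strings for a {word: vector} mapping.

    Line i of the vectors text holds the tab-separated components of the
    vector whose label is on line i of the metadata text.
    """
    words = list(embeddings)
    vectors_tsv = "".join("\t".join(str(x) for x in embeddings[w]) + "\n" for w in words)
    metadata_tsv = "".join(w + "\n" for w in words)
    return vectors_tsv, metadata_tsv

# Hypothetical 3-dimensional vectors, made up for illustration only.
toy = {"king": [0.5, 0.1, 0.9], "queen": [0.45, 0.2, 0.85], "apple": [-0.3, 0.8, 0.0]}
vectors_tsv, metadata_tsv = to_projector_tsv(toy)
print(vectors_tsv)
print(metadata_tsv)
```

Save the two strings as, for example, vectors.tsv and metadata.tsv. The resulting files can then be uploaded through the projector's Load dialog.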
Week VII
Slides
Think about project ideas
Assignment:
Train a fastText model to distinguish between three languages written in the same alphabet (for example, English, Spanish and transliterated Russian); see fasttext.cc/blog/2017/10/02/blog-post.html
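fastText's supervised mode reads one example per line, with the class given by a __label__ prefix. The sketch below shows one way to produce such lines. The example sentences, the label names and the lowercasing and whitespace cleanup are assumptions for illustration, not the setup from the linked blog post.

```python
import re

def to_fasttext_line(label, text):
    """Format one example as a fastText supervised line: '__label__<label> <text>'."""
    # Lowercase and collapse all whitespace so each example stays on one line.
    cleaned = re.sub(r"\s+", " ", text.lower()).strip()
    return f"__label__{label} {cleaned}"

# Hypothetical toy examples in three languages that share the Latin alphabet.
examples = [
    ("en", "Where is the train station?"),
    ("es", "¿Dónde está la estación de tren?"),
    ("ru", "Gde nakhoditsya vokzal?"),
]
lines = [to_fasttext_line(lang, text) for lang, text in examples]
print("\n".join(lines))
```

Save the lines to a file; train.txt is a hypothetical name. A model can then be trained with ./fasttext supervised -input train.txt -output langid -minn 2 -maxn 4. The -minn and -maxn options turn on character n-grams, which are probably the most informative features when the languages share an alphabet.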
Week VIII
Slides
Look at the Twitter language identification evaluation blog post, the fastText language identification blog post, the YerevaNN transliteration blog post, the spaCy sense2vec blog post and demo, spaCy's instructions and blog post on adding a language, the Quora Question Pairs challenge, and the NYU Winograd Schema Challenge with its rules and dataset
Week IX
Quiz I
Slides
Write a project proposal following the project proposal guidelines
Send a link to the git repo containing the proposal to the class mailing list
Reading on seq2seq: github.com/google/seq2seq, TensorFlow tutorial
Weeks X-XI
YerevaNN guest lecturer
Slides from Stanford CS231n
Experiment with char-RNNs on an interesting text corpus or even source code
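Whatever framework you use, a char-RNN is trained to predict the next character from the preceding ones. Here is a hedged sketch of just the data-preparation step. The function name and the choice of non-overlapping windows are our own and are not taken from the CS231n material.

```python
def char_rnn_examples(text, seq_len):
    """Build a character vocabulary and (input, target) id sequences.

    Each target is its input shifted one character to the right, which is
    the next-character prediction signal a char-RNN is trained on.
    """
    vocab = sorted(set(text))
    char_to_id = {c: i for i, c in enumerate(vocab)}
    ids = [char_to_id[c] for c in text]
    examples = []
    # Non-overlapping windows of seq_len + 1 ids; the loop bound keeps every window full.
    for start in range(0, len(ids) - seq_len, seq_len):
        window = ids[start:start + seq_len + 1]
        examples.append((window[:-1], window[1:]))
    return vocab, examples

vocab, examples = char_rnn_examples("hello world", seq_len=4)
print(vocab)
print(examples)
```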
Week XII
Notes on machine translation
Project updates
More reading: byte-pair encoding (BPE) for deep learning, as presented in "Neural Machine Translation of Rare Words with Subword Units" by Sennrich, Haddow and Birch
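The core of BPE learning in that paper is to merge the most frequent adjacent symbol pair again and again. Each word starts as characters plus an end-of-word marker </w>. The sketch below uses the toy vocabulary from the paper's own example. The helper names, the list-based merge and the tie-breaking rule (first pair encountered wins) are our choices, not necessarily the paper's.

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every adjacent occurrence of pair with one merged symbol."""
    merged = {}
    for word, freq in vocab.items():
        symbols, out, i = word.split(), [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[" ".join(out)] = freq
    return merged

def learn_bpe(vocab, num_merges):
    """Learn up to num_merges merge operations; ties go to the first pair seen."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab

toy_vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, final_vocab = learn_bpe(toy_vocab, 3)
print(merges)       # [('e', 's'), ('es', 't'), ('est', '</w>')]
print(final_vocab)
```

At test time, a new word is split into characters and the learned merges are applied in the same order. This is how rare words end up represented as sequences of known subword units.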
Week XIII
Quiz II (by email)
Project presentations (by email)
Week XIV
Victory Day
Week XV
nlpguide.github.io
nlpguide.github.io/2017
Week XVI
How do teams build systems for the top 100-200 languages?
NLP vs general DL: Who is Chomsky? Who is Norvig? Who is Manning? Who is LeCun?
On Chomsky and the Two Cultures of Statistical Learning, Peter Norvig, 2011
Yann LeCun and Christopher Manning discuss Deep Learning and Innate Priors, 2018
Week XVII
Final exam
Final project presentations
Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax
June 2013
Emily M. Bender, University of Washington
Speech and Language Processing (3rd ed. draft)
October 2017 draft
Dan Jurafsky, Stanford University
James H. Martin, University of Colorado
A Primer on Neural Network Models for Natural Language Processing
2015 draft
Yoav Goldberg, Bar-Ilan University