21⟩ Tell me, what are the different categories into which the sequence learning process can be categorized?
☛ a) Sequence prediction
☛ b) Sequence generation
☛ c) Sequence recognition
☛ d) Sequential decision
Statistical learning techniques allow learning a function or predictor from a set of observed data that can then make predictions about unseen or future data. These techniques provide guarantees on the performance of the learned predictor on future unseen data, based on statistical assumptions about the data-generating process.
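As a minimal sketch of this idea (assuming scikit-learn is available; the data is a toy example), a predictor is learned from observed data and then applied to unseen points:

```python
# Minimal sketch: learn a predictor from observed data, then
# predict on unseen data.
from sklearn.linear_model import LogisticRegression

# Observed data: feature vectors with known labels (toy example).
X_observed = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
y_observed = [0, 1, 0, 1]

predictor = LogisticRegression()
predictor.fit(X_observed, y_observed)     # learn the function from data

X_unseen = [[0.15, 0.18], [0.85, 0.92]]   # future / unseen data
print(predictor.predict(X_unseen))        # e.g., [0 1]
```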
The difference is that the heuristics for decision trees evaluate the average quality of a number of disjoint sets, while rule learners only evaluate the quality of the set of instances covered by the candidate rule.
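A toy sketch of the contrast (the functions and data below are illustrative, not taken from any particular learner): a decision-tree heuristic such as weighted entropy scores all the disjoint subsets a split produces, while a rule-learner heuristic such as rule precision scores only the instances the candidate rule covers:

```python
import math

def entropy(labels):
    """Entropy of a multiset of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

# Decision-tree style heuristic: average (weighted) quality of ALL
# disjoint subsets produced by a split.
def weighted_split_entropy(subsets):
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * entropy(s) for s in subsets)

# Rule-learner style heuristic: quality of ONLY the instances
# covered by the candidate rule (precision for a target class).
def rule_precision(covered_labels, target_class):
    return covered_labels.count(target_class) / len(covered_labels)

left, right = ["+", "+", "-"], ["-", "-", "+"]   # a split's two subsets
print(weighted_split_entropy([left, right]))     # scores both subsets
print(rule_precision(["+", "+", "-"], "+"))      # scores covered set only
```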
C) 9
Bigrams: Analytics Vidhya, Vidhya is, is a, a great, great source, source to, to learn, learn data, data science
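The count can be verified with a few lines of plain Python:

```python
# Minimal sketch: extract bigrams from the sentence and count them.
sentence = "Analytics Vidhya is a great source to learn data science"
tokens = sentence.split()
bigrams = list(zip(tokens, tokens[1:]))
print(bigrams)       # the 9 bigrams listed above
print(len(bigrams))  # 9 (one fewer than the 10 tokens)
```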
B) FALSE
Word2vec also contains a preprocessing model, and it is not a deep neural network.
D) 1 and 2
Choices 1 and 2 are correct: stopword removal decreases the number of features in the matrix, and normalizing words (for example, converting all words to lowercase) reduces redundant features and hence the dimensionality.
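A small sketch of the effect (assuming scikit-learn; the two documents are illustrative):

```python
# Sketch: stopword removal and lowercasing shrink the feature space
# of a term-document matrix.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["The Movie was GREAT", "the movie was not great"]

raw = CountVectorizer(lowercase=False, token_pattern=r"\S+")
reduced = CountVectorizer(lowercase=True, stop_words="english")

print(len(raw.fit(docs).vocabulary_))      # 8 features, case kept
print(len(reduced.fit(docs).vocabulary_))  # 2 features after cleanup
```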
E) 12345
Except for using the entire document as a feature, all the rest can be used as features in a text classification model.
B) Rule-based learning and Sequence-to-Sequence model
Choice B best explains examples of retrieval-based models and generative models.
☛ Do you know about latent semantic indexing? Where can you apply it?
☛ Is it possible to find all the occurrences of quoted text in an article? If yes, explain how?
☛ What is a POS tagger? Explain the simplest approach to building a POS tagger.
☛ Which is a better algorithm for POS tagging – SVM or hidden Markov models?
☛ What is the difference between shallow parsing and dependency parsing?
☛ What packages in Python are you aware of that are used in NLP and ML?
☛ Explain one application in which stop words should be removed.
☛ How will you train a model to identify whether the word “Raymond” in a sentence represents a person’s name or a company?
☛ As a beginner in Natural Language processing, from where should I start?
☛ What is the relation between sentiment analysis, natural language processing and machine learning?
☛ What is the current state of the art in natural language processing?
☛ What is the state of the art in natural language understanding?
☛ Which publications would you recommend reading for someone interested in natural language processing?
☛ What are the basics of natural language processing?
☛ Could you please explain the pros/cons that constrain the choice between Word2Vec, GloVe, or any other thought vectors you have used?
☛ How do you explain NLP to a layman?
☛ How do I explain NLP, text mining, and their difference in layman’s terms?
☛ What is the relationship between N-gram and Bag-of-words in natural language processing?
☛ Is deep learning suitable for NLP problems like parsing or machine translation?
☛ What is a simple explanation of a language model?
☛ What is the definition of word embedding (word representation)?
☛ How is Computational Linguistics different from Natural Language Processing?
☛ Natural Language Processing: What is a useful method to generate a vocabulary for a large corpus of data?
☛ How do I learn Natural Language Processing?
☛ Natural Language Processing: What are good algorithms related to sentiment analysis?
☛ What makes natural language processing difficult?
☛ What are the ten most popular algorithms in natural language processing?
☛ What is the most interesting new work in deep learning for NLP in 2017?
☛ How is word2vec different from the RNN encoder decoder?
☛ How does word2vec work?
☛ What’s the difference between word vectors, word representations and vector embeddings?
☛ What are some interesting Word2Vec results?
☛ How do I measure the semantic similarity between two documents?
☛ What is the state of the art in word sense disambiguation?
☛ What is the main difference between word2vec and fastText?
☛ In layman's terms, how would you explain the Skip-Gram word embedding model in natural language processing (NLP)?
☛ In layman’s terms, how would you explain the continuous bag of words (CBOW) word embedding technique in natural language processing (NLP)?
☛ What is a natural language processing pipeline?
☛ What are the available APIs for NLP (Natural Language Processing)?
☛ How does perplexity function in natural language processing?
☛ How is deep learning used in sentiment analysis?
☛ Differentiate regular grammar and regular expression.
☛ How will you estimate the entropy of the English language?
☛ Describe dependency parsing.
☛ What do you mean by Information rate?
☛ Explain Discrete Memoryless Channel (DMC).
☛ How does correlation work in text mining?
☛ How to calculate TF*IDF for a single new document to be classified?
☛ How to build ontologies?
☛ What is an N-gram in the context of text mining?
☛ What do you know about linguistic resources such as WordNet?
☛ Explain the tools you have used for training NLP models.
☛ Artificial Intelligence: What is an intuitive explanation for recurrent neural networks?
☛ How are RNNs storing ‘memory’?
☛ What are encoder-decoder models in recurrent neural networks?
☛ Why do Recurrent Neural Networks (RNN) combine the input and hidden state together and not separately?
☛ What is an intuitive explanation of LSTMs and GRUs?
☛ Are GRU (Gated Recurrent Unit) a special case of LSTM?
☛ How many time-steps can LSTM RNNs remember inputs for?
☛ How does attention model work using LSTM?
☛ How do RNNs differ from Markov Chains?
☛ For modelling sequences, what are the pros and cons of using Gated Recurrent Units in place of LSTMs?
☛ What is exactly the attention mechanism introduced to RNN (recurrent neural network)? It would be nice if you could make it easy to understand!
☛ Is there any intuitive or simple explanation for how attention works in the deep learning model of an LSTM, GRU, or neural network?
☛ Why is it a problem to have exploding gradients in a neural net (especially in an RNN)?
☛ For a sequence-to-sequence model in RNN, does the input have to contain only sequences or can it accept contextual information as well?
☛ Can “generative adversarial networks” be used in sequential data in recurrent neural networks? How effective would they be?
☛ What is the difference between states and outputs in LSTM?
☛ What is the advantage of combining Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN)?
☛ Which is better for text classification: CNN or RNN?
☛ How are recurrent neural networks different from convolutional neural networks?
☛ What is part of speech (POS) tagging? What is the simplest approach to building a POS tagger that you can imagine?
☛ How would you build a POS tagger from scratch given a corpus of annotated sentences? How would you deal with unknown words?
☛ How would you train a model that identifies whether the word “Apple” in a sentence belongs to the fruit or the company?
☛ How would you find all the occurrences of quoted text in a news article?
☛ How would you build a system that auto corrects text that has been generated by a speech recognition system?
☛ What is latent semantic indexing and where can it be applied?
☛ How would you build a system to translate English text to Greek and vice-versa?
☛ How would you build a system that automatically groups news articles by subject?
☛ What are stop words? Describe an application in which stop words should be removed.
☛ How would you design a model to predict whether a movie review was positive or negative?
☛ What is entropy? How would you estimate the entropy of the English language?
☛ What is a regular grammar? Does this differ in power to a regular expression and if so, in what way?
☛ What is the TF-IDF score of a word and in what context is this useful?
☛ How does the PageRank algorithm work?
☛ What is dependency parsing?
☛ What are the difficulties in building and using an annotated corpus of text such as the Brown Corpus and what can be done to mitigate them?
☛ What tools for training NLP models (nltk, Apache OpenNLP, GATE, MALLET etc…) have you used?
☛ Do you have any experience in building ontologies?
☛ Are you familiar with WordNet or other related linguistic resources?
☛ Do you speak any foreign languages?
Instance-based learning algorithms are also referred to as lazy learning algorithms because they delay the induction or generalization process until classification is performed.
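For example (a minimal sketch, assuming scikit-learn), k-nearest neighbours is a classic lazy learner: fit() essentially just stores the instances, and generalization is deferred to prediction time:

```python
# Sketch: k-NN as a lazy learner. fit() stores the training data;
# the actual generalization happens only when predict() is called.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0], [1], [8], [9]]
y_train = ["low", "low", "high", "high"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)   # no model induced here, instances stored
print(knn.predict([[7]]))   # neighbours consulted only now -> ['high']
```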
In various areas of information science such as machine learning, a set of data used to discover a potentially predictive relationship is known as a 'training set'. The training set is the set of examples given to the learner, while the test set is the set of examples held back from the learner and used to test the accuracy of the hypotheses the learner generates. The training set is distinct from the test set.
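A minimal sketch of the split (assuming scikit-learn and its bundled iris dataset):

```python
# Sketch: hold back a test set from the learner and measure the
# accuracy of the learned hypothesis on it.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)   # test set held back

model = DecisionTreeClassifier().fit(X_train, y_train)  # sees train set only
print(model.score(X_test, y_test))   # accuracy on the held-back examples
```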
B) CRF is a discriminative model whereas HMM is a generative model
B) K * log(3) / T
The formula for TF is K/T.
The formula for IDF is log(total documents / number of documents containing "data"). Since "data" appears in one-third of the documents, IDF = log(1 / (1/3)) = log(3).
Hence TF*IDF = (K/T) * log(3), so the correct choice is K * log(3) / T.
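A quick numeric check (K and T below are illustrative placeholders):

```python
import math

K, T = 4, 100          # occurrences of "data" and total terms (illustrative)
doc_fraction = 1 / 3   # "data" appears in one-third of the documents

tf = K / T
idf = math.log(1 / doc_fraction)          # = log(3)
print(tf * idf, K * math.log(3) / T)      # both print the same value
```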
C) 5
After performing stopword removal and punctuation replacement, the text becomes: "Analytics vidhya great source learn data science"
Trigrams: Analytics vidhya great, vidhya great source, great source learn, source learn data, learn data science
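The same count can be reproduced in plain Python (the stopword set below is just the words removed in the example):

```python
# Sketch: strip the example's stopwords, then form trigrams.
text = "Analytics vidhya is a great source to learn data science"
stopwords = {"is", "a", "to"}   # the words removed above

tokens = [w for w in text.split() if w not in stopwords]
trigrams = list(zip(tokens, tokens[1:], tokens[2:]))
print(len(trigrams))   # 5 trigrams from the 7 remaining tokens
```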
C) 1, 2
Collaborative filtering can be used to check the patterns used by other people, while Levenshtein distance is used to measure the edit distance between dictionary terms.
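For reference, a minimal sketch of the classic dynamic-programming Levenshtein distance:

```python
# Sketch: Levenshtein (edit) distance via dynamic programming,
# e.g., for matching a misspelling against dictionary terms.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(levenshtein("analitics", "analytics"))  # 1 (one substitution)
```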
D) 7, 4, 2
Nouns: I, New, Delhi, Analytics, Vidhya, Delhi, Hackathon (7)
Verbs: am, planning, visit, attend (4)
Words with frequency counts > 1: to, Delhi (2)
Hence option D is correct.
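A hedged sketch of checking such counts automatically, assuming the quiz sentence was "I am planning to visit New Delhi to attend Analytics Vidhya Delhi Hackathon" (reconstructed from the words listed above) and that nltk with its tagger data is installed; note that nltk tags "I" as a pronoun, so automatic counts may differ slightly from the manual ones:

```python
# Sketch: POS-tag the sentence with nltk and tally nouns, verbs,
# and words appearing more than once. Requires the
# averaged_perceptron_tagger resource to be downloaded first.
from collections import Counter
import nltk

sentence = ("I am planning to visit New Delhi to attend "
            "Analytics Vidhya Delhi Hackathon")
tokens = sentence.split()
tags = nltk.pos_tag(tokens)

nouns = [w for w, t in tags if t.startswith("NN")]
verbs = [w for w, t in tags if t.startswith("VB")]
repeats = [w for w, c in Counter(tokens).items() if c > 1]
print(len(nouns), len(verbs), repeats)   # compare with 7, 4, ['to', 'Delhi']
```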