Machine Reading Comprehension (MRC) is a natural way to evaluate a computer’s language understanding ability. It has gained extensive attention in the Natural Language Processing (NLP) community, where the goal is to have a machine read text passages and then answer questions about them. It is also proving to be an important technology for applications such as search engines and dialog systems. Advances in representation learning have led to separate progress in both Information Retrieval (IR) and Machine Comprehension (MC). However, very few studies have examined the combined design of retrieval and comprehension at multiple levels of granularity for the development of Machine Reading at Scale (MRS) systems.
In this blog, we’ll discuss the idea and the need for MRS –
Machine Reading at Scale (MRS)
Instead of focusing only on short text passages, Danqi Chen et al. proposed machine reading at scale. To read Wikipedia for answering open-domain questions, they combined a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers within Wikipedia paragraphs.
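To make the search component more concrete, here is a minimal sketch of TF-IDF matching over hashed unigrams and bigrams, written with only the Python standard library. It is not the code from the paper: the bucket count `NUM_BUCKETS`, the CRC32 hash, the smoothed IDF formula, and the function names are all assumptions chosen for illustration.

```python
import math
import re
import zlib
from collections import Counter

NUM_BUCKETS = 2 ** 20  # assumed hash-table size for this sketch


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hashed_ngrams(text):
    """Count unigrams and bigrams after hashing each one into a bucket."""
    toks = tokenize(text)
    grams = toks + [" ".join(pair) for pair in zip(toks, toks[1:])]
    return Counter(zlib.crc32(g.encode("utf-8")) % NUM_BUCKETS for g in grams)


def build_index(docs):
    """Return per-document bucket counts and a smoothed IDF per bucket."""
    doc_vectors = [hashed_ngrams(d) for d in docs]
    doc_freq = Counter()
    for vec in doc_vectors:
        doc_freq.update(vec.keys())
    n = len(docs)
    idf = {b: math.log((n + 1) / (c + 1)) + 1.0 for b, c in doc_freq.items()}
    return doc_vectors, idf


def rank(query, doc_vectors, idf, k=1):
    """Return the indices of the k documents that best match the query."""
    q = hashed_ngrams(query)
    scored = []
    for i, vec in enumerate(doc_vectors):
        # A bucket missing from the document contributes log1p(0) == 0.
        s = sum(math.log1p(vec[b]) * idf.get(b, 0.0) for b in q)
        scored.append((s, i))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]
```

Hashing the n-grams keeps memory bounded however many distinct bigrams the corpus contains, at the cost of occasional collisions. In a full system, the top-ranked documents would then be passed to the neural reader.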
MRC algorithms assume that a short, relevant passage has already been identified and given to the model, which is not realistic when building an open-domain QA system. On the other hand, methods that retrieve over documents must treat search as an integral part of the solution.
MRS creates a fine balance between the two approaches because it is focused on simultaneously maintaining the challenge of machine comprehension while keeping the realistic constraints of searching over a large open resource.
Why Is MRS Important?
You may have seen almost every enterprise adopting chatbots in recent times. Many industries have turned to conversational AI approaches, especially banking, insurance, and telecommunications sectors, where large text logs are involved.
One of the major challenges they face with conversational AI is understanding complex sentences in human speech the way humans do. The issue becomes more complex when this needs to be done over a large volume of text.
MRS can help with both of these concerns by answering objective questions from large text logs or corpora with high accuracy. The approach can be used in real-world applications such as customer service.
Let’s look at how the MRS approach provides automatic QA over a large corpus –
DrQA Model
DrQA is a system for reading comprehension applied to open-domain QA. It targets MRS, where we search for the answer to a question in a huge corpus of unstructured documents. The system must therefore combine the challenges of document retrieval with those of machine comprehension of text.
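The two-stage idea of retrieving first and then reading can be sketched as follows. This is an illustrative skeleton only, not DrQA's API. The `retrieve` function uses simple word overlap in place of a real retriever, and `read` picks the best-matching sentence as a toy stand-in for the neural reader. All function names here are hypothetical.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Stage 1: keep the k documents sharing the most words with the question."""
    q = tokens(question)
    order = sorted(
        range(len(documents)),
        key=lambda i: (-len(q & tokens(documents[i])), i),
    )
    return [documents[i] for i in order[:k]]


def read(question, document):
    """Stage 2 (toy reader): return the best-matching sentence and its score."""
    q = tokens(question)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    if not sentences:
        return "", 0
    best = max(sentences, key=lambda s: len(q & tokens(s)))
    return best, len(q & tokens(best))


def answer(question, documents, k=2):
    """Run retrieval, read each candidate, and return the top-scoring sentence."""
    candidates = [read(question, d) for d in retrieve(question, documents, k)]
    if not candidates:
        return ""
    return max(candidates, key=lambda c: c[1])[0]
```

The key design point is that the reader only sees the few documents the retriever returns, so the expensive comprehension step never has to run over the whole corpus.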
The Deep Learning Virtual Machine (DLVM) is used as the computing environment, which makes it more straightforward to use GPU-based VM instances for deep learning models. It is supported on the Windows 2016 and Ubuntu Data Science Virtual Machine (DSVM), shares the same core VM images as the DSVM, and is configured to make deep learning easier. The experiments are run on a Linux DLVM with NVIDIA Tesla P100 GPUs. The PyTorch backend is used to build the models.
The Facebook Research GitHub repository is forked for this blog’s work, and the DrQA model is trained on the SQuAD dataset. The pre-trained MRS model is then used to evaluate the large Gutenberg corpora using transfer learning techniques.
Machine Learning Algorithms
Large-scale Python machine learning projects involve problems tied to specialized machine learning architectures and designs, which are hard to tackle. At the same time, there is a growing need to find algorithms and to design and build platforms that can deal with large text logs, corpora, or data. With the rise of big data, there’s an increasing demand for computational and algorithmic efficiency. Machine learning with Python training can help you speed up your algorithms and improve scalability.
A comprehensive machine learning course with Python can help you understand in detail how machine learning is implemented in the Python programming language, and leverage this knowledge in your role as a data scientist.
Learn machine learning from Cognixia, the world’s leading digital talent transformation company. Our machine learning certification covers all the important concepts, libraries, and techniques that can help you kickstart your career in machine learning.
Here’s what this course will cover –
- Introduction to Python Programming, various packages, and related functions
- Data Wrangling using Python
- Introduction to Machine Learning with Python
- Supervised Learning – Regression & Classification
- Dimensionality Reduction
- Unsupervised Learning – Clustering
- Additional Performance Evaluation and Model Selection
- Recommendation Engines
- Association Rules Mining
- Time Series Analysis
- Reinforcement Learning
- Artificial Neural Networks & Introduction to Deep Learning