Conference video: Rachel Wagner-Kaiser, PhD: Teaching Computers to Read: Natural Language Processing and Deep Learning Techniques for Parsing Documents
This is Rachel Wagner-Kaiser’s Tech Talk at WiDS Puget Sound Conference 2020. Enjoy!
ABSTRACT:
You have a million contracts scanned and stored on your company server from decades of doing business. To prove compliance, you need to know the termination clause, renewal terms, and expiration date for each of those million documents. What are your options? You could hire 100 people to each read 50 contracts a day for a year, or you could teach a computer to read the documents for you! Companies often struggle to automate this process and transform their thousands or millions of documents into tangible benefits. I will discuss the challenges of extracting information from documents as well as strategies to overcome them, such as custom word embeddings, sequence labeling, B-I-O tagging, and bidirectional LSTM model architectures. With effective sampling techniques and data augmentation, the human effort needed to obtain a sufficient sample size can be minimized while still producing performant models that unlock value.
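As a rough illustration of the B-I-O tagging mentioned in the abstract (this is not code from the talk), the sketch below marks each token as Beginning, Inside, or Outside a labeled span, which is the form a sequence-labeling model is typically trained on. The function name `bio_tags`, the label `EXPIRATION_DATE`, and the example sentence are all hypothetical and chosen only for illustration.

```python
def bio_tags(tokens, spans):
    """Return one B-I-O tag per token.

    spans is a list of (start, end, label) tuples over token indices,
    with end exclusive. Spans are assumed not to overlap.
    """
    tags = ["O"] * len(tokens)            # default: outside any span
    for start, end, label in spans:
        tags[start] = f"B-{label}"        # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"        # remaining tokens of the span
    return tags


# Hypothetical contract sentence, already split into tokens.
tokens = "This agreement expires on December 31 , 2025 .".split()
tags = bio_tags(tokens, [(4, 8, "EXPIRATION_DATE")])

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A tagger trained on pairs like these predicts one tag per token, and the predicted B/I runs are then stitched back together to recover fields such as the expiration date.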
Rachel received her PhD in astronomy examining chemical differences in ancient star clusters living in the nearby universe, combining the power of the Hubble Space Telescope and Bayesian statistics. After graduation, she joined KPMG Digital Lighthouse, where she has worked as a consultant and data scientist since 2017. She specializes in using natural language processing and deep learning to help companies unlock their unstructured data to solve a variety of business problems and drive value through automation. She loves to travel, eat good food, and hike cool new places (and ideally, all three at once).