Document Type

Working Paper

Publication Date



We develop a protocol for using a well-known lawyer-coded data set on Material Adverse Change/Effect clauses in acquisition agreements to tokenize and calibrate a machine learning algorithm for textual analysis. Our protocol, built on both regular expression (RE) and latent semantic analysis (LSA) approaches, is designed to replicate, correct, and extend the reach of the hand-coded data. Our preliminary results indicate that both approaches perform well, though a hybridized approach improves predictive power further. We employ Monte Carlo simulations to show that our results generally carry over to out-of-sample predictions. We conclude that similar approaches could be used much more broadly in empirical legal scholarship, most specifically in the study of transactional documents in business law.
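
As a rough illustration of the RE side of such a protocol, the sketch below flags one kind of carve-out with a regular expression and measures agreement with hand-coded labels over repeated random subsamples. It is not the paper's implementation: the pattern `CARVE_OUT`, the functions `regex_flag` and `monte_carlo_agreement`, and the four toy clauses in `TOY` are all hypothetical and written only for this example, and the LSA component is omitted.

```python
import random
import re

# Hypothetical pattern (an assumption, not taken from the paper): flags a
# carve-out for "general economic conditions" introduced by an exclusion phrase.
CARVE_OUT = re.compile(
    r"\b(?:other than|except(?: for)?|excluding)\b.*?"
    r"\bgeneral\s+economic\s+conditions\b",
    re.IGNORECASE | re.DOTALL,
)


def regex_flag(clause: str) -> int:
    """Return 1 if the clause appears to contain the carve-out, else 0."""
    return 1 if CARVE_OUT.search(clause) else 0


def monte_carlo_agreement(labeled, n_draws=200, holdout=0.5, seed=0):
    """Average agreement between regex flags and hand-coded labels
    across repeated random holdout subsamples of the labeled clauses."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_draws):
        sample = list(labeled)
        rng.shuffle(sample)
        k = max(1, int(len(sample) * holdout))
        held_out = sample[:k]
        hits = sum(regex_flag(text) == label for text, label in held_out)
        scores.append(hits / k)
    return sum(scores) / len(scores)


# Toy clauses invented for illustration only; they are not drawn from the
# lawyer-coded data set. Each pair is (clause text, hand-coded label).
TOY = [
    ("any change that is materially adverse to the business, "
     "other than changes in general economic conditions", 1),
    ("any event having a material adverse effect on the Company", 0),
    ("a material adverse change, excluding effects arising from "
     "general economic conditions or acts of war", 1),
    ("any material adverse effect on the financial condition of the Seller", 0),
]

if __name__ == "__main__":
    print(monte_carlo_agreement(TOY))
```

Because a fixed pattern has no parameters to fit, the random draws here only measure how stable the agreement rate is across subsamples. A calibrated model would instead be fit on one part of each draw and scored on the held-out part.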