Abstract:
In recent years the exponential growth in digital data and the expansion of machine learning have fostered the development of new applications in geosciences. Natural Language Processing (NLP) tackles various issues that arise from using human language data. In this study, NLP is applied to classify and map lithological descriptions in a three dimensional space. The data originates from the Australian Groundwater Explorer dataset of the Bureau of Meteorology, which contains the description and geolocation of bores drilled in New South Wales (NSW), Australia. A GloVe model trained with scientific journal articles and Wikipedia contents related to geosciences was used to obtain embeddings (vectors) from borehole descriptions. In parallel, and as a baseline, the descriptions were classified combining regular expressions and expert criterion. The description embeddings were subsequently classified using a multilayer perceptron neural network (MLP). The performance was evaluated using different accuracy metrics. The embeddings were triangulated and the resulting embeddings were classified using the trained MLP and compared against a nearest neighbour (NN) interpolation of lithological classes. The mapping of the descriptions was carried out by using 3D voxels. Coupling NLP with supervised classification alternatives and interpolation methods resulted in reasonable 3D representation of lithologies. This methodology is a first step in demonstrating the applicability of NLP to the geosciences, which also allows for an uncertainty quantification in the different steps of the process, such as classification and interpolation. Interpolation techniques, although acceptable, might be replaced by machine learning techniques to improve the performance of 3D models.