Applications of Natural Language Processing to Geoscience Text Data and Prospectivity Modeling

Lawley C.J.M.; Gadd M.G.; Parsa M.; Lederer G.W.; Graham G.E.; Ford A.

doi:10.1007/s11053-023-10216-1

Files

Lawl_23.pdf (5.77 MB)

Date

2023

Abstract

Geological maps are powerful models for visualizing the complex distribution of rock types through space and time. However, the descriptive information that forms the basis for a preferred map interpretation is typically stored in geological map databases as unstructured text data that are difficult to use in practice. Herein we apply natural language processing (NLP) to geoscientific text data from Canada, the U.S., and Australia to address that knowledge gap. First, rock descriptions, geological ages, lithostratigraphic and lithodemic information, and other long-form text data are translated to numerical vectors, i.e., a word embedding, using a geoscience language model. Network analysis of word associations, nearest neighbors, and principal component analysis are then used to extract meaningful semantic relationships between rock types. We further demonstrate, using simple Naive Bayes classifiers and the area under the receiver operating characteristic curve (AUC), how word vectors can be used to: (1) predict the locations of "pegmatitic" (AUC = 0.962) and "alkalic" (AUC = 0.938) rocks; (2) predict mineral potential for Mississippi Valley-type (AUC = 0.868) and clastic-dominated (AUC = 0.809) Zn-Pb deposits; and (3) search geoscientific text data for analogues of the giant Mount Isa clastic-dominated Zn-Pb deposit using the cosine similarities between word vectors. This form of semantic search is a promising NLP approach for assessing mineral potential with limited training data. Overall, the results highlight how geoscience language models and NLP can be used to extract new knowledge from unstructured text data and reduce the mineral exploration search space for critical raw materials.
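The abstract describes finding analogues of the Mount Isa deposit by ranking text descriptions on the cosine similarity between word vectors. The following is a minimal sketch of that ranking idea only, not the authors' model or code. It assumes that each description is turned into one vector by averaging (mean pooling) the vectors of its known words; the toy three-dimensional embeddings, the unit names, and the helpers `describe` and `rank_analogues` are hypothetical stand-ins for the output of a real geoscience language model.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy embeddings. A real geoscience language model would supply
# learned, much higher-dimensional vectors for a large vocabulary.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "shale": [0.9, 0.1, 0.0],
    "siltstone": [0.8, 0.2, 0.1],
    "dolomite": [0.1, 0.9, 0.2],
    "granite": [0.0, 0.1, 0.95],
}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def describe(text: str, embeddings: Dict[str, List[float]]) -> List[float]:
    """Assumed pooling step: average the vectors of words found in the vocabulary.

    Words missing from the vocabulary are skipped.
    """
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        raise ValueError(f"no known words in description: {text!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rank_analogues(
    query: str,
    candidates: Dict[str, str],
    embeddings: Dict[str, List[float]],
) -> List[Tuple[str, float]]:
    """Rank candidate descriptions by cosine similarity to the query, best first."""
    q = describe(query, embeddings)
    scored = [
        (name, cosine_similarity(q, describe(text, embeddings)))
        for name, text in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical map-unit descriptions; "carbonaceous" is not in the toy
    # vocabulary, so it is ignored when pooling.
    units = {
        "unit_a": "carbonaceous shale siltstone",
        "unit_b": "dolomite",
        "unit_c": "granite",
    }
    for name, score in rank_analogues("shale siltstone", units, TOY_EMBEDDINGS):
        print(f"{name}: {score:.3f}")
```

Mean pooling is only one simple way to turn a long-form description into a single vector; weighted averages or sentence-level encoders are common alternatives, and any of them can feed the same cosine-similarity ranking.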

Keywords

Natural language processing, Language model, Word embedding, Semantics, Prospectivity, Critical mineral

Citation

Natural Resources Research, 2023, Vol. 32, No. 4, p.1503-1527

URI

https://repository.geologyscience.ru/handle/123456789/41597

Collections

Articles, conference abstracts
