Learning Foundation Language Models for Geoscience Knowledge Understanding and Utilization

dc.contributor.author: Deng C.
dc.contributor.author: Zhang T.
dc.contributor.author: He Z.
dc.contributor.author: Chen Q.
dc.contributor.author: Shi Y.
dc.contributor.author: Zhou L.
dc.contributor.author: Fu L.
dc.contributor.author: Zhang W.
dc.contributor.author: Wang X.
dc.contributor.author: Zhou C.
dc.contributor.author: Lin Z.
dc.contributor.author: He J.
dc.date.accessioned: 2023-07-26T12:13:36Z
dc.date.available: 2023-07-26T12:13:36Z
dc.date.issued: 2023
dc.description.abstract: Large language models (LLMs) have achieved great success in general domains of natural language processing. In this paper, we bring LLMs to the realm of geoscience with the objective of advancing research and applications in this field. To this end, we present the first-ever LLM in geoscience, K2, alongside a suite of resources developed to further promote LLM research within geoscience. For instance, we have curated the first geoscience instruction tuning dataset, GeoSignal, which aims to align LLM responses to geoscience-related user queries. Additionally, we have established the first geoscience benchmark, GeoBenchmark, to evaluate LLMs in the context of geoscience. In this work, we experiment with a complete recipe to adapt a pretrained general-domain LLM to the geoscience domain. Specifically, we further train the LLaMA-7B model on over 2 million pieces of geoscience literature (3.9B tokens) and utilize GeoSignal’s supervised data to fine-tune the model. Moreover, we share a protocol that can efficiently gather domain-specific data and construct domain-supervised data, even in situations where manpower is scarce. Experiments conducted on the GeoBenchmark demonstrate the effectiveness of our approach and datasets. [ru_RU]
dc.identifier.citation: arXiv:2306.05064, 2023 [ru_RU]
dc.identifier.uri: https://repository.geologyscience.ru/handle/123456789/41601
dc.language.iso: en [ru_RU]
dc.subject: Geoscience Language Model [ru_RU]
dc.subject: Domain Adaptation [ru_RU]
dc.title: Learning Foundation Language Models for Geoscience Knowledge Understanding and Utilization [ru_RU]
dc.type: Article [ru_RU]

Files

Original bundle

Showing 1 - 1 of 1
Name: Deng_23.pdf
Size: 1.47 MB
Format: Adobe Portable Document Format

License bundle

Showing 1 - 1 of 1
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission