# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: SimulateGPT
message: GPT-4 as a biomedical simulator
type: software
authors:
  - given-names: Moritz
    family-names: Schaefer
    email: mschaefer@cemm.oeaw.ac.at
    affiliation: CeMM Research Center for Molecular Medicine
    orcid: 'https://orcid.org/0000-0001-6489-1947'
  - given-names: Stephan
    family-names: Reichl
    email: sreichl@cemm.oeaw.ac.at
    affiliation: CeMM Research Center for Molecular Medicine
    orcid: 'https://orcid.org/0000-0001-8555-7198'
  - given-names: Rob
    family-names: ter Horst
    email: rterhorst@cemm.oeaw.ac.at
    affiliation: CeMM Research Center for Molecular Medicine
    orcid: 'https://orcid.org/0000-0003-0576-5873'
  - given-names: Adele M
    family-names: Nicolas
    email: anicolas@cemm.oeaw.ac.at
    affiliation: CeMM Research Center for Molecular Medicine
    orcid: 'https://orcid.org/0000-0003-0784-7207'
  - given-names: Thomas
    family-names: Krausgruber
    email: tkrausgruber@cemm.oeaw.ac.at
    affiliation: CeMM Research Center for Molecular Medicine
    orcid: 'https://orcid.org/0000-0002-1374-0329'
  - given-names: Francesco
    family-names: Piras
    email: fpiras@cemm.oeaw.ac.at
    affiliation: CeMM Research Center for Molecular Medicine
    orcid: 'https://orcid.org/0000-0002-0938-6072'
  - given-names: Peter
    family-names: Stepper
    email: pstepper@cemm.oeaw.ac.at
    affiliation: CeMM Research Center for Molecular Medicine
    orcid: 'https://orcid.org/0000-0003-1785-2405'
  - given-names: Christoph
    family-names: Bock
    email: cbock@cemm.oeaw.ac.at
    affiliation: CeMM Research Center for Molecular Medicine
    orcid: 'https://orcid.org/0000-0001-6091-3088'
  - given-names: Matthias
    family-names: Samwald
    email: matthias.samwald@meduniwien.ac.at
    affiliation: Medical University of Vienna
    orcid: 'https://orcid.org/0000-0002-4855-2571'
identifiers:
  - type: doi
    value: 10.1016/j.compbiomed.2024.108796
    description: Computers in Biology and Medicine Paper DOI
  - type: url
    value: 'https://doi.org/10.1016/j.compbiomed.2024.108796'
    description: Computers in Biology and Medicine Paper URL
  - type: doi
    value: 10.1101/2023.06.16.545235
    description: bioRxiv DOI
  - type: url
    value: 'https://www.biorxiv.org/content/10.1101/2023.06.16.545235v1'
    description: bioRxiv URL
repository-code: 'https://github.com/OpenBioLink/SimulateGPT'
abstract: >-
  Background

  Computational simulation of biological processes can be a
  valuable tool for accelerating biomedical research, but
  usually requires extensive domain knowledge and manual
  adaptation. Large language models (LLMs) such as GPT-4
  have proven surprisingly successful for a wide range of
  tasks. This study provides proof-of-concept for the use of
  GPT-4 as a versatile simulator of biological systems.


  Methods

  We introduce SimulateGPT, a proof-of-concept for
  knowledge-driven simulation across levels of biological
  organization through structured prompting of GPT-4. We
  benchmarked our approach against direct GPT-4 inference in
  blinded qualitative evaluations by domain experts in four
  scenarios and in two quantitative scenarios with
  experimental ground truth. The qualitative scenarios
  included mouse experiments with known outcomes and
  treatment decision support in sepsis. The quantitative
  scenarios included prediction of gene essentiality in
  cancer cells and progression-free survival in cancer
  patients.


  Results

  In qualitative experiments, biomedical scientists rated
  SimulateGPT's predictions favorably over direct GPT-4
  inference. In quantitative experiments, SimulateGPT
  substantially improved classification accuracy for
  predicting the essentiality of individual genes and
  increased correlation coefficients and precision in the
  regression task of predicting progression-free survival.


  Conclusion

  This proof-of-concept study suggests that LLMs may enable
  a new class of biomedical simulators. Such text-based
  simulations appear well suited for modeling and
  understanding complex living systems that are difficult to
  describe with physics-based first-principles simulations,
  but for which extensive knowledge is available as written
  text. Finally, we propose several directions for further
  development of LLM-based biomedical simulators, including
  augmentation through web search retrieval, integrated
  mathematical modeling, and fine-tuning on experimental
  data.
keywords:
  - Biomedicine
  - Simulation
  - Large Language Models
  - Computational Biology
  - Artificial Intelligence
license: MIT