Added additional notes on safety

README.md CHANGED

@@ -15,7 +15,7 @@ tags:
 
 ether0 is a 24B language model trained to reason in English and output molecular structures as SMILES.
 It is derived from fine-tuning and reinforcement learning training from Mistral-Small-24B-Instruct-2501.
-Ask questions in English, but they may also include molecules specified as SMILES. The SMILES
+Ask questions in English, but they may also include molecules specified as SMILES. The SMILES do not need to be canonical and may contain stereochemistry information.
 ether0 has limited support for IUPAC names.
 
 ## Usage

@@ -25,7 +25,7 @@ It has been trained specifically for these tasks:
 
 * IUPAC names
 * formulas to structures
-* modifying solubilities by
+* modifying solubilities by a specific LogS
 * constrained edits (e.g., do not affect group X or do not affect scaffold)
 * pKa
 * smell/scent

@@ -40,7 +40,7 @@ It has been trained specifically for these tasks:
 * blood-brain barrier permeability
 
 For example, you can ask "Propose a molecule with a pKa of 9.2" or "Modify CCCCC(O)=OH to increase its pKa by about 1 unit." You cannot ask it "What is the pKa of CCCCC(O)=OH?"
-If you ask it questions that lie significantly beyond those tasks, it can fail.
+If you ask it questions that lie significantly beyond those tasks, it can fail. You can combine properties, although we have not benchmarked this extensively.
 
 ## Limitations
 

@@ -54,10 +54,12 @@ See our [preprint](arxiv.org) for details on data and training process.
 
 ## Safety
 
 We performed refusal post-training for compounds listed on OPCW schedules 1 and 2.
-
+We also post-trained ether0 to refuse questions about standard malicious topics like making explosives or poisons.
+As the model knows pharmacokinetics, it can modulate toxicity.
+However, the structures of toxic or narcotic compounds are generally known, so we do not consider this a safety risk. The model can provide
 no uplift on "tacit knowledge" tasks like purification, scale-up, or processing beyond a web search or a similarly sized language model.
 
 ## License
 
 Open-weights (Apache 2.0)