whitead committed (verified)
Commit f88e3e1 · 1 parent: 88249e6

Added additional notes on safety

Files changed (1): README.md (+8 −6)

README.md CHANGED
@@ -15,7 +15,7 @@ tags:
 
 ether0 is a 24B language model trained to reason in English and output molecular structures as SMILES.
 It is derived from fine-tuning and reinforcement learning training from Mistral-Small-24B-Instruct-2501.
-Ask questions in English, but they may also include molecules specified as SMILES. The SMILES need not be canonical.
+Ask questions in English, but they may also include molecules specified as SMILES. The SMILES do not need to be canonical and may contain stereochemistry information.
 ether0 has limited support for IUPAC names.
 
 ## Usage
@@ -25,7 +25,7 @@ It has been trained specifically for these tasks:
 
 * IUPAC names
 * formulas to structures
-* modifying solubilities by specific LogS
+* modifying solubilities by a specific LogS
 * constrained edits (e.g., do not affect group X or do not affect scaffold)
 * pKa
 * smell/scent
@@ -40,7 +40,7 @@ It has been trained specifically for these tasks:
 
 * blood-brain barrier permeability
 
 For example, you can ask "Propose a molecule with a pKa of 9.2" or "Modify CCCCC(O)=O to increase its pKa by about 1 unit." You cannot ask it "What is the pKa of CCCCC(O)=O?"
-If you ask it questions that lie significantly beyond those tasks, it can fail.
+If you ask it questions that lie significantly beyond those tasks, it can fail. You can combine properties, although we haven't significantly benchmarked this.
 
 ## Limitations
@@ -54,10 +54,12 @@ See our [preprint](arxiv.org) for details on data and training process.
 
 ## Safety
 
-We performed refusal post-training for compounds listed on OPCW schedules 1 and 2. As the model knows pharmacokinetics, it can modulate toxicity.
-As the structures of toxic or narcotic compounds are generally known, we do not consider this a significant safety risk. The model can provide
+We performed refusal post-training for compounds listed on OPCW schedules 1 and 2.
+We also post-trained ether0 to refuse questions about standard malicious topics like making explosives or poisons.
+As the model knows pharmacokinetics, it can modulate toxicity.
+However, the structures of toxic or narcotic compounds are generally known, and thus we do not consider this a significant safety risk. The model can provide
 no uplift on "tacit knowledge" tasks like purification, scale-up, or processing beyond a web search or a similar-sized language model.
 
 ## License
 
-Open-weights (Apache 2.0) for any use.
+Open-weights (Apache 2.0)