Per IANA registry, iw was deprecated as the code for Hebrew in 1989 and the preferred code is he
Per the [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he`.
The original PR was merged into whisper here: https://github.com/openai/whisper/pull/401
The corresponding HuggingFace transformers PR is here: https://github.com/huggingface/transformers/pull/21310
The correct subtag:
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
```
And the deprecation record:
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```
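As an illustration of how the two records above relate, here is a minimal Python sketch that reads a single registry record and resolves it to its preferred subtag. The helper names `parse_record` and `preferred_subtag` are hypothetical (they are not part of whisper or transformers), and the sketch assumes each field fits on one line, which holds for the two records quoted here but not for every record in the full registry.

```python
def parse_record(record: str) -> dict[str, str]:
    """Parse one 'Field: value' registry record into a dict.

    Assumption: no folded (continuation) lines, as in the records quoted above.
    """
    fields = {}
    for line in record.strip().splitlines():
        line = line.strip()
        if not line or line == "%%":
            continue  # skip blank lines and record separators
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def preferred_subtag(record: str) -> str:
    """Return Preferred-Value for a deprecated record, else the record's own subtag."""
    fields = parse_record(record)
    if "Deprecated" in fields and "Preferred-Value" in fields:
        return fields["Preferred-Value"]
    return fields["Subtag"]


HE_RECORD = """
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
"""

IW_RECORD = """
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
"""

print(preferred_subtag(IW_RECORD))
print(preferred_subtag(HE_RECORD))
```

Both calls resolve to `he`, which is why the token is renamed rather than a second Hebrew token being added.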
added_tokens.json (+1 -1)

    @@ -30,6 +30,7 @@
       "<|gu|>": 50333,
       "<|haw|>": 50352,
       "<|ha|>": 50354,
    +  "<|he|>": 50279,
       "<|hi|>": 50276,
       "<|hr|>": 50291,
       "<|ht|>": 50339,
    @@ -38,7 +39,6 @@
       "<|id|>": 50275,
       "<|is|>": 50311,
       "<|it|>": 50274,
    -  "<|iw|>": 50279,
       "<|ja|>": 50266,
       "<|jw|>": 50356,
       "<|ka|>": 50329,
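The change above amounts to renaming one key while keeping its token ID (50279). A minimal sketch of that operation in Python follows; `rename_token` is a hypothetical helper and the sample mapping is a small excerpt of the entries shown in the diff, not the full file. Writing the result with sorted keys is an assumption, chosen because it matches the position of `<|he|>` in the diff.

```python
import json


def rename_token(added_tokens: dict, old: str, new: str) -> dict:
    """Return a copy of the mapping with `old` renamed to `new`, keeping the same ID."""
    if old not in added_tokens:
        raise KeyError(f"{old!r} is not in the token mapping")
    if new in added_tokens:
        raise ValueError(f"{new!r} already exists in the token mapping")
    renamed = dict(added_tokens)  # leave the caller's dict untouched
    renamed[new] = renamed.pop(old)
    return renamed


# Excerpt of the entries shown in the diff (IDs copied from it).
tokens = {
    "<|ha|>": 50354,
    "<|hi|>": 50276,
    "<|it|>": 50274,
    "<|iw|>": 50279,
    "<|ja|>": 50266,
}

updated = rename_token(tokens, "<|iw|>", "<|he|>")

# Assumption: the file is serialized with sorted keys, which places
# "<|he|>" between "<|ha|>" and "<|hi|>" as in the diff.
print(json.dumps(updated, indent=2, sort_keys=True))
```

Because only the key changes, any model weights or embeddings tied to ID 50279 are unaffected by the rename.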