---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:46338
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-m-v2.0
widget:
- source_sentence: What role does ESMA play in the development of guidelines and regulatory
    technical standards related to cooperation arrangements with third countries as
    mentioned in the text?
  sentences:
  - 'If a planned change is implemented notwithstanding the first and second subparagraphs,
    or if an unplanned change has taken place pursuant to which the AIFM’s management
    of the AIF would no longer comply with this Directive or the AIFM otherwise would
    no longer comply with this Directive, the competent authorities of the home Member
    State of the AIFM shall take all due measures in accordance with Article 46, including,
    if necessary, the express prohibition of marketing of the AIF.


    If the changes are acceptable because they do not affect the compliance of the
    AIFM’s management of the AIF with this Directive, or the compliance by the AIFM
    with this Directive otherwise, the competent authorities of the home Member State
    of the AIFM shall, without delay, inform ESMA in so far as the changes concern
    the termination of the marketing of certain AIFs or additional AIFs marketed and,
    if applicable, the competent authorities of the host Member States of the AIFM
    of those changes.


    11.


    The Commission shall adopt, by means of delegated acts in accordance with Article
    56 and subject to the conditions of Articles 57 and 58, measures regarding the
    cooperation arrangements referred to in point (a) of paragraph 2 in order to design
    a common framework to facilitate the establishment of those cooperation arrangements
    with third countries.


    12.


    In order to ensure uniform application of this Article, ESMA may develop guidelines
    to determine the conditions of application of the measures adopted by the Commission
    regarding the cooperation arrangements referred to in point (a) of paragraph 2.


    13.


    ESMA shall develop draft regulatory technical standards to determine the minimum
    content of the cooperation arrangements referred to in point (a) of paragraph
    2 so as to ensure that both the competent authorities of the home and the host
    Member States receive sufficient information in order to be able to exercise their
    supervisory and investigatory powers under this Directive.


    Power is delegated to the Commission to adopt the regulatory technical standards
    referred to in the first subparagraph in accordance with Article 10 to 14 of Regulation
    (EU) No 1095/2010.


    14.'
  - (23) This Regulation should also apply to Union institutions, bodies, offices
    and agencies when acting as a provider or deployer of an AI system.
  - An operator that is a natural person or a microenterprise may mandate the next
    operator or trader further down the supply chain that is not a natural person
    or a microenterprise to act as an authorised representative. Such next operator
    or trader further down the supply chain shall not place or make available relevant
    products on the market or export them without submitting the due diligence statement
    pursuant to Article 4(2) on behalf of that operator. In such cases, the operator
    that is a natural person or a microenterprise shall retain responsibility for
    compliance of the relevant product with Article 3, and shall communicate to that
    next operator or trader further down the supply chain all information necessary
    to confirm that due
- source_sentence: A review is scheduled for June 2019 to determine if the regulations
    regarding hazardous substances should be broadened, based on practical experiences.
    Additionally, the Commission aims to promote alternatives to animal testing by
    reassessing testing requirements, potentially leading to amendments that prioritize
    health and environmental safety.
  sentences:
  - '18 June 1994, until such plant and machinery is disposed of; (b) in the case
    of the maintenance of plant and machinery already in service within a Member State
    on 18 June 1994. For the purposes of point (a) Member States may, on grounds of
    human health protection and environmental protection, prohibit within their territory
    the use of such plant or machinery before it is disposed of. 25. Monomethyl-dichloro-diphenyl
    methane Trade name: Ugilec 121 Ugilec 21 Shall not be placed on the market, or
    used, as a substance or in mixtures. Articles containing the substance shall not
    be placed on the market. 26. Monomethyl-dibromo-diphenyl methane bromobenzylbromotoluene,
    mixture of isomers Trade name: DBBT CAS No 99688-47-8 Shall not be placed on'
  - (35) | The fight against litter is a shared effort between competent authorities,
    producers and consumers. Public authorities, including the Union institutions,
    should lead by example.
  - '7.


    By 1 June 2013 the Commission shall carry out a review to assess whether or not,
    taking into account latest developments in scientific knowledge, to extend the
    scope of Article 60(3) to substances identified under Article 57(f) as having
    endocrine disrupting properties. On the basis of that review the Commission may,
    if appropriate, present legislative proposals.


    8.


    By 1 June 2019, the Commission shall carry out a review to assess whether or not
    to extend the scope of Article 33 to cover other dangerous substances, taking
    into account the practical experience in implementing that Article. On the basis
    of that review, the Commission may, if appropriate, present legislative proposals
    to extend that obligation.


    9.


    In accordance with the objective of promoting non-animal testing and the replacement,
    reduction or refinement of animal testing required under this Regulation, the
    Commission shall review the testing requirements of Section 8.7 of Annex VIII
    by 1 June 2019. On the basis of this review, while ensuring a high level of protection
    of health and the environment, the Commission may propose an amendment in accordance
    with the procedure referred to in Article 133(4).


    Article 139


    Repeals


    Directive 91/155/EEC shall be repealed.


    Directives 93/105/EC and 2000/21/EC and Regulations (EEC) No 793/93 and (EC) No
    1488/94 shall be repealed with effect from 1 June 2008.


    Directive 93/67/EEC shall be repealed with effect from 1 August 2008.


    Directive 76/769/EEC shall be repealed with effect from 1 June 2009.


    References to the repealed acts shall be construed as references to this Regulation.


    Article 140


    Amendment of Directive 1999/45/EC


    Article 14 of Directive 1999/45/EC shall be deleted.


    Article 141


    Entry into force and application


    1.


    This Regulation shall enter into force on 1 June 2007.


    2.


    Titles II, III, V, VI, VII, XI and XII as well as Articles 128 and 136 shall apply
    from 1 June 2008.


    3.


    Article 135 shall apply from 1 August 2008.


    4.


    Title VIII and Annex XVII shall apply from 1 June 2009.


    This Regulation shall be binding in its entirety and directly applicable in all
    Member States.


    LIST OF ANNEXES


    ANNEX I GENERAL PROVISIONS FOR ASSESSING SUBSTANCES AND PREPARING CHEMICAL SAFETY
    REPORTS ANNEX II REQUIREMENTS FOR THE COMPILATION OF SAFETY DATA SHEETS ANNEX
    III CRITERIA FOR SUBSTANCES REGISTERED IN QUANTITIES BETWEEN 1 AND 10 TONNES ANNEX
    IV EXEMPTIONS FROM THE OBLIGATION TO REGISTER IN ACCORDANCE WITH ARTICLE 2(7)(a)
    ANNEX V EXEMPTIONS FROM THE OBLIGATION TO REGISTER IN ACCORDANCE WITH ARTICLE
    2(7)(b) ANNEX VI INFORMATION REQUIREMENTS REFERRED TO IN ARTICLE 10 ANNEX VII
    STANDARD INFORMATION REQUIREMENTS FOR SUBSTANCES MANUFACTURED OR IMPORTED IN QUANTITIES
    OF ONE TONNE OR MORE ANNEX VIII STANDARD INFORMATION REQUIREMENTS FOR SUBSTANCES
    MANUFACTURED OR IMPORTED IN QUANTITIES OF 10 TONNES OR MORE ANNEX IX STANDARD
    INFORMATION REQUIREMENTS FOR SUBSTANCES MANUFACTURED OR IMPORTED IN QUANTITIES
    OF 100 TONNES OR MORE ANNEX X STANDARD INFORMATION REQUIREMENTS FOR SUBSTANCES
    MANUFACTURED OR IMPORTED IN QUANTITIES OF 1 000 TONNES OR MORE ANNEX XI GENERAL
    RULES FOR ADAPTATION OF THE STANDARD TESTING REGIME SET OUT IN ANNEXES VII TO
    X ANNEX XII GENERAL PROVISIONS FOR DOWNSTREAM USERS TO ASSESS SUBSTANCES AND PREPARE
    CHEMICAL SAFETY REPORTS ANNEX XIII CRITERIA FOR THE IDENTIFICATION OF PERSISTENT,
    BIOACCUMULATIVE AND TOXIC SUBSTANCES, AND VERY PERSISTENT AND VERY BIOACCUMULATIVE
    SUBSTANCES ANNEX XIV LIST OF SUBSTANCES SUBJECT TO AUTHORISATION ANNEX XV DOSSIERS
    ANNEX XVI SOCIO-ECONOMIC ANALYSIS ANNEX XVII RESTRICTIONS ON THE MANUFACTURE,
    PLACING ON THE MARKET AND USE OF CERTAIN DANGEROUS SUBSTANCES, MIXTURES AND ARTICLES


    ANNEX I


    GENERAL PROVISIONS FOR ASSESSING SUBSTANCES AND PREPARING CHEMICAL SAFETY REPORTS


    0. INTRODUCTION


    ▼M51'
- source_sentence: What actions must the Commission take if the economic operator
    does not provide commitments or if the provided commitments are deemed inappropriate
    or insufficient to address the distortion?
  sentences:
  - '2.


    Where the economic operator concerned does not offer commitments or where the
    Commission considers that the commitments referred to in paragraph 1 are neither
    appropriate nor sufficient to fully and effectively remedy the distortion, the
    Commission shall adopt an implementing act in the form of a decision prohibiting
    the award of the contract to the economic operator concerned (‘decision prohibiting
    the award of the contract’). That implementing act shall be adopted in accordance
    with the advisory procedure referred to in Article 48(2). Following that decision,
    the contracting authority or contracting entity shall reject the tender.


    3.'
  - 6,5 8,9 (1) The values for biogas production from manure include negative emissions
    for emissions saved from raw manure management. The value of esca considered is
    equal to – 45 g CO2eq/MJ manure used in anaerobic digestion. (2) Maize whole
    plant means maize harvested as fodder and ensiled for preservation. (3) Transport
    of agricultural raw materials to the transformation plant is, according to the
    methodology provided in the Commission's report of 25 February 2010 on sustainability
    requirements for the use of solid and gaseous biomass sources in electricity,
    heating and cooling, included in the ‘cultivation’ value. The value for transport
    of maize silage accounts for 0,4 g CO2eq/MJ biogas.
  - reduction in the consumption of lightweight plastic carrier bags. It should be
    possible for Member States, while observing the general rules laid down in the
    TFEU and acting in accordance with this Regulation, to adopt provisions which
    go beyond the minimum waste prevention targets set out in this Regulation. When
    implementing such measures, Member States should be aware of the risk of a shift
    from heavier to lighter packaging materials and should prioritise measures that
    minimise that risk.
- source_sentence: The content provides a comprehensive overview of numerous chemical
    substances, including their structural formulas and potential applications. It
    emphasizes the significance of specific compounds like acrylamide and thioacetamide,
    while also addressing mixtures derived from coal tar. The information reflects
    the intricate nature of chemical synthesis and the importance of understanding
    the properties and uses of these compounds in various industrial contexts.
  sentences:
  - '2.


    Each Member State shall ensure that a producer as defined in Article 3(1)(f)(iv)
    and established on its territory, which sells EEE to another Member State in which
    it is not established, appoints an authorised representative in that Member State
    as the person responsible for fulfilling the obligations of that producer, pursuant
    to this Directive, on the territory of that Member State.


    3.


    Appointment of an authorised representative shall be by written mandate.


    Article 18


    Administrative cooperation and exchange of information'
  - '(a) display to customers and potential customers, in a visible manner, the labels
    provided in accordance with Article 32(1), point (b) or (c); (b) make reference
    to the information included on the labels provided in accordance with Article
    32(1), point (b) or (c), in visual advertisements or in technical promotional
    material for a specific model, in accordance with the applicable delegated acts
    adopted pursuant to Article 4; and --- --- (c) not provide or display other labels,
    marks, symbols or inscriptions that are likely to mislead or confuse customers
    and potential customers with regard to the information included on the label regarding
    ecodesign requirements. --- ---


    Article 32


    Obligations related to labels'
  - '[2] 612-196-00-0 202-441-6 [1] 221-627-8 [2] 95-69-2 [1] 3165-93-3 [2] ►M5 —
    ◄ 2,4,5-Trimethylaniline [1] 2,4,5-trimethylaniline hydrochloride [2] 612-197-00-6
    205-282-0 [1] -[2] 137-17-7 [1] 21436-97-5 [2] ►M5 — ◄ 4,4''-Thiodianiline [1]
    and its salts 612-198-00-1 205-370-9 [1] 139-65-1 [1] ►M5 — ◄ 4,4''-Oxydianiline
    [1] and its salts p-Aminophenyl ether [1] 612-199-00-7 202-977-0 [1] 101-80-4
    [1] ►M5 — ◄ 2,4-Diaminoanisole [1] 4-methoxy-m-phenylenediamine 2,4-diaminoanisole
    sulphate [2] 612-200-00-0 210-406-1 [1] 254-323-9 [2] 615-05-4 [1] 39156-41-7
    [2] N, N,N'',N''-tetramethyl-4,4''-methylendianiline 612-201-00-6 202-959-2 101-61-1
    C.I. Basic Violet 3 with ≥ 0,1 % of Michler''s ketone (EC No 202-027-5) 612-205-00-8
    208-953-6 548-62-9 ►M5 — ◄ 6-Methoxy-m-toluidine p-cresidine 612-209-00-X 204-419-1
    120-71-8 ►M5 — ◄ [▼M14](./../../../legal-content/EN/AUTO/?uri=celex:32012R0109
    "32012R0109: INSERTED") Biphenyl-3,3′,4,4′-tetrayltetraamine; Diaminobenzidine
    612-239-00-3 202-110-6 91-95-2 (2-chloroethyl)(3-hydroxypropyl)ammonium chloride
    612-246-00-1 429-740-6 40722-80-3 3-Amino-9-ethyl carbazole; 9-Ethylcarbazol-3-ylamine
    612-280-00-7 205-057-7 132-32-1 [▼M49](./../../../legal-content/EN/AUTO/?uri=celex:32018R0675
    "32018R0675: INSERTED") Reaction products of paraformaldehyde and 2-hydroxypropylamine
    (ratio 3:2); [formaldehyde released from 3,3′-methylenebis[5-methyloxazolidine];
    formaldehyde released from oxazolidin]; [MBO] 612-290-00-1 — — Reaction products
    of paraformaldehyde with 2-hydroxypropylamine (ratio 1:1); [formaldehyde released
    from α,α,α-trimethyl-1,3,5-triazine-1,3,5(2H,4H,6H)-triethanol]; [HPT] 612-291-00-7
    — — Methylhydrazine 612-292-00-2 200-471-4 60-34-4 [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
    "32006R1907R(01): REPLACED") Ethyleneimine; aziridine 613-001-00-1 205-793-9 151-56-4
    2-Methylaziridine; propyleneimine 613-033-00-6 200-878-7 75-55-8 ►M5 — ◄ Captafol
    (ISO); 1,2,3,6-tetrahydro-N-(1,1,2,2-tetrachloroethylthio) phthalimide 613-046-00-7
    219-363-3 2425-06-1 Carbadox (INN); methyl 3-(quinoxalin-2-ylmethylene)carbazate
    1,4-dioxide; 2-(methoxycarbonylhydrazonomethyl) quinoxaline 1,4-dioxide 613-050-00-9
    229-879-0 6804-07-5 A mixture of: 1,3,5-tris(3-aminomethylphenyl)-1,3,5-(1H,3H,5H)-triazine-2,4,6-trione;
    a mixture of oligomers of 3,5-bis(3-aminomethylphenyl)-1-poly[3,5-bis(3-aminomethylphenyl)-2,4,6-trioxo-1,3,5-(1H,3H,5H)-triazin-1-yl]-1,3,5-(1H,3H,5H)-triazine-2,4,6-trione
    613-199-00-X 421-550-1 — [▼M14](./../../../legal-content/EN/AUTO/?uri=celex:32012R0109
    "32012R0109: INSERTED") Quinoline 613-281-00-5 202-051-6 91-22-5 [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
    "32006R1907R(01): REPLACED") Acrylamide 616-003-00-0 201-173-7 79-06-1 [▼M69](./../../../legal-content/EN/AUTO/?uri=celex:32021R2204
    "32021R2204: INSERTED") Butanone oxime; ethyl methyl ketoxime; ethyl methyl ketone
    oxime 616-014-00-0 202-496-6 96-29-7 [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
    "32006R1907R(01): REPLACED") Thioacetamide 616-026-00-6 200-541-4 62-55-5 A mixture
    of: N-[3-hydroxy-2-(2-methylacryloylamino-methoxy)propoxymethyl]-2-methylacrylamide;
    N-[2,3-Bis-(2-methylacryloylamino-methoxy)propoxymethyl]-2-methylacrylamide; methacrylamide;
    2-methyl-N-(2-methyl-acryloylaminomethoxymethyl)-acrylamide; N-2,3-dihydroxypropoxymethyl)-2-methylacrylamide
    616-057-00-5 412-790-8 — [▼M14](./../../../legal-content/EN/AUTO/?uri=celex:32012R0109
    "32012R0109: INSERTED") N-[6,9-dihydro-9-[[2-hydroxy-1-(hydroxymethyl)ethoxy]methyl]-6-oxo-1H-purin-2-yl]acetamide
    616-148-00-X 424-550-1 84245-12-5 [▼M69](./../../../legal-content/EN/AUTO/?uri=celex:32021R2204
    "32021R2204: INSERTED") N-(hydroxymethyl)acrylamide; methylolacrylamide; [NMA]
    616-230-00-5 213-103-2 924-42-5 [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
    "32006R1907R(01): REPLACED") Distillates (coal tar), benzole fraction; Light oil
    (A complex combination of hydrocarbons obtained by the distillation of coal tar.
    It consists of hydrocarbons having carbon numbers primarily in the range of C4
    to C10 and distilling in the approximate range of 80 to 160 °C.) 648-001-00-0
    283-482-7 84650-02-2 Tar oils, brown-coal; Light oil (The distillate from lignite
    tar boiling in the range of approximately 80 to 250 °C. Composed primarily of
    aliphatic and aromatic hydrocarbons and monobasic phenols.) 648-002-00-6 302-674-4
    94114-40-6 J Benzol forerunnings (coal); Light oil redistillate, low boiling'
- source_sentence: How does the new Eurostat methodology differ in scope from the
    indicators used in this Directive for calculating energy consumption?
  sentences:
  - (29) The methodology for calculation of primary energy consumption and final energy
    consumption is aligned with the new Eurostat methodology, but the indicators used
    for the purpose of this Directive have a different scope, in that they exclude
    ambient energy and include energy consumption in international aviation for the
    targets in primary energy consumption and final energy consumption. The use of
    new indicators also implies that any changes in energy consumption of blast furnaces
    are now only reflected in primary energy consumption.
  - (92) InvestEU is the Union flagship programme to boost investment, especially
    the green and digital transition, by providing financing and technical assistance,
    for instance through blending mechanisms. Such an approach contributes to crowd
    in additional public and private capital. Moreover, Member States are encouraged
    to contribute to the InvestEU Member State compartment to support financial products
    available to net-zero technology manufacturing, without prejudice to applicable
    State aid rules.
  - be used, filled or transported through the system; --- --- (iii) specify the terms
    and conditions for proper handling and packaging use; --- --- (iv) specify detailed
    requirements for packaging reconditioning; --- --- (v) specify the requirements
    for packaging collection; --- --- (vi) specify the requirements for packaging
    storage; --- --- (vii) specify the requirements for packaging filling or uploading;
    --- --- (viii) specify rules to ensure the effective and efficient collection
    of reusable packaging, including by providing for incentives for end users to
    return the packaging to the collection points or grouped collection system; ---
    --- (ix) specify rules to ensure equal and fair access to the re-use system, including
    for vulnerable
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v2.0
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: cosine_accuracy@1
      value: 0.7136198860693941
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.9243915069911963
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.9589159330226135
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.981874676333506
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.7136198860693941
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.30813050233039874
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1917831866045227
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09818746763335057
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.7136198860693941
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.9243915069911963
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.9589159330226135
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.981874676333506
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8626251072928146
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.8227635844026309
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.8236564067385257
      name: Cosine Map@100
---

# SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v2.0

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Snowflake/snowflake-arctic-embed-m-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Snowflake/snowflake-arctic-embed-m-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0) <!-- at revision 5d1bbbdf0d1c2772eff7961f4cdc32b8426dac69 -->
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: GteModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

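The last two modules listed above use the `[CLS]` token vector as the sentence embedding and then L2-normalize it. One consequence is that cosine similarity between two normalized embeddings reduces to a plain dot product. The sketch below illustrates this with made-up three-dimensional vectors (real embeddings from this model have 768 dimensions); the helper names are illustrative and are not part of the library.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length, as a Normalize() step would."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for the pooled [CLS] output of two texts.
cls_a = [3.0, 4.0, 0.0]
cls_b = [1.0, 2.0, 2.0]

emb_a = l2_normalize(cls_a)
emb_b = l2_normalize(cls_b)

# After normalization, the dot product matches the cosine similarity.
print(dot(emb_a, emb_b), cosine(cls_a, cls_b))
```

In practice this means a vector index configured for inner-product search should rank documents the same way as one configured for cosine similarity, provided the stored embeddings are the normalized outputs.
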
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (replace the placeholder with this model's Hub ID)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'How does the new Eurostat methodology differ in scope from the indicators used in this Directive for calculating energy consumption?',
    '(29) The methodology for calculation of primary energy consumption and final energy consumption is aligned with the new Eurostat methodology, but the indicators used for the purpose of this Directive have a different scope, in that they exclude ambient energy and include energy consumption in international aviation for the targets in primary energy consumption and final energy consumption. The use of new indicators also implies that any changes in energy consumption of blast furnaces are now only reflected in primary energy consumption.',
    '(92) InvestEU is the Union flagship programme to boost investment, especially the green and digital transition, by providing financing and technical assistance, for instance through blending mechanisms. Such an approach contributes to crowd in additional public and private capital. Moreover, Member States are encouraged to contribute to the InvestEU Member State compartment to support financial products available to net-zero technology manufacturing, without prejudice to applicable State aid rules.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the pairwise similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```

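The tags for this model list `MatryoshkaLoss`, a loss that trains the leading dimensions of each embedding to be useful on their own. A minimal sketch of how a shortened embedding could be derived is shown below. It assumes the target size is one of the dimensions the model was actually trained with, which this section does not list, so the size used here is purely hypothetical; `truncate_and_renormalize` is an illustrative helper, not a library function.

```python
import math


def truncate_and_renormalize(embedding, dim):
    """Keep the first `dim` values and rescale them to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be rescaled
    return [x / norm for x in head]


# Toy 8-dimensional unit vector standing in for a 768-dimensional embedding.
full = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]

# 4 plays the role of a hypothetical smaller Matryoshka dimension.
small = truncate_and_renormalize(full, 4)
print(len(small), small)
```

Re-normalizing after truncation keeps dot products interpretable as cosine similarities. Recent sentence-transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which may be more convenient; check the documentation of the installed version before relying on it.
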
## Evaluation

### Metrics

#### Information Retrieval

* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7136     |
| cosine_accuracy@3   | 0.9244     |
| cosine_accuracy@5   | 0.9589     |
| cosine_accuracy@10  | 0.9819     |
| cosine_precision@1  | 0.7136     |
| cosine_precision@3  | 0.3081     |
| cosine_precision@5  | 0.1918     |
| cosine_precision@10 | 0.0982     |
| cosine_recall@1     | 0.7136     |
| cosine_recall@3     | 0.9244     |
| cosine_recall@5     | 0.9589     |
| cosine_recall@10    | 0.9819     |
| **cosine_ndcg@10**  | **0.8626** |
| cosine_mrr@10       | 0.8228     |
| cosine_map@100      | 0.8237     |

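In the table, accuracy@k equals recall@k and precision@k equals accuracy@k divided by k (for example, 0.9244 / 3 ≈ 0.3081). That pattern is what one would expect if every evaluation query has exactly one relevant document. Under that assumption, which the card does not state explicitly, the per-query metrics can be sketched as follows. The function and the toy document IDs are illustrative; this is not the evaluator's actual implementation.

```python
def metrics_at_k(ranked_ids, relevant_id, k):
    """Per-query retrieval metrics when exactly one document is relevant."""
    top_k = ranked_ids[:k]
    hit = relevant_id in top_k
    accuracy = 1.0 if hit else 0.0   # is the relevant doc anywhere in the top k?
    precision = accuracy / k         # at most one hit among k returned results
    recall = accuracy                # only one relevant doc exists in total
    # Reciprocal rank of the relevant doc, or 0 if it is outside the top k.
    reciprocal_rank = 1.0 / (top_k.index(relevant_id) + 1) if hit else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "reciprocal_rank": reciprocal_rank,
    }


# Toy ranking: the relevant document "d7" is returned in second place.
ranking = ["d3", "d7", "d1", "d9", "d4"]
at_1 = metrics_at_k(ranking, "d7", k=1)
at_3 = metrics_at_k(ranking, "d7", k=3)
print(at_1)
print(at_3)
```

Averaging these per-query values over all queries gives corpus-level figures of the kind reported above, with the mean reciprocal rank corresponding to an MRR@k style metric.
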
## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 46,338 training samples
* Columns: <code>query_text</code> and <code>doc_text</code>
* Approximate statistics based on the first 1000 samples:
|         | query_text                                                                         | doc_text                                                                             |
|:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
| type    | string                                                                             | string                                                                               |
| details | <ul><li>min: 9 tokens</li><li>mean: 39.44 tokens</li><li>max: 311 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 233.15 tokens</li><li>max: 1900 tokens</li></ul> |
* Samples:
| query_text | doc_text |
|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>The regulation's applicability extends to various stakeholders involved in AI systems, including providers, deployers, importers, and manufacturers, regardless of their location. It specifically addresses high-risk AI systems and outlines the limitations of its scope, particularly concerning national security and military applications. Additionally, it clarifies that it does not interfere with the responsibilities of member states regarding national security or the operations of public authorities and international organizations in specific contexts.</code> | <code>(180) The European Data Protection Supervisor and the European Data Protection Board were consulted in accordance with Article 42(1) and (2) of Regulation (EU) 2018/1725 and delivered their joint opinion on 18 June 2021,<br><br>HAVE ADOPTED THIS REGULATION:<br><br>CHAPTER I<br><br>GENERAL PROVISIONS<br><br>Article 1<br><br>Subject matter`<br><br>1. The purpose of this Regulation is to improve the functioning of the internal market and promote the uptake of human-centric and trustworthy artificial intelligence (AI), while ensuring a high level of protection of health, safety, fundamental rights enshrined in the Charter, including democracy, the rule of law and environmental protection, against the harmful effects of AI systems in the Union and supporting innovation.<br><br>2. This Regulation lays down:<br><br>(a) harmonised rules for the placing on the market, the putting into service, and the use of AI systems in the Union; (b) prohibitions of certain AI practices; --- --- (c) specific requirements for high-risk AI systems and oblig...</code> |
| <code>How should loans with unknown use of proceeds be allocated in terms of sectors and alignment metrics?</code> | <code>instruments. For loans whose use of proceeds is known, the value shall be included for the relevant sector and alignment metric. For loans whose use of proceeds is unknown, the gross carrying amount of the exposure shall be allocated to the relevant sectors and alignment metrics based on the counterparties’ activity distribution, including by counterparties’ turnover by activity. Institutions shall add a row in the template for each relevant combination of sectors disclosed in column (b) and alignment metrics included in column (d). ---|--- (f) | Column (f): the point in time distance of the column (d) metric(s) to the 2030 data points of the Net Zero Emissions by 2050 Scenario (NZE2050), shall be expressed in percentage points. That</code> |
| <code>What measures must AIFMs implement to ensure they do not rely solely on credit ratings for assessing the creditworthiness of AIFs' assets?</code> | <code>▼M1<br><br>The measures specifying the risk-management systems referred to in point (a) of the first subparagraph shall ensure that the AIFMs are prevented from relying solely or mechanistically on credit ratings, as referred to in the first subparagraph of paragraph 2, for assessing the creditworthiness of the AIFs’ assets.<br><br>▼B<br><br>Article 16<br><br>Liquidity management<br><br>1.<br><br>AIFMs shall, for each AIF that they manage which is not an unleveraged closed- ended AIF, employ an appropriate liquidity management system and adopt procedures which enable them to monitor the liquidity risk of the AIF and to ensure that the liquidity profile of the investments of the AIF complies with its underlying obligations.</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
```
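
To make these parameters concrete, the following is a minimal pure-Python sketch of the idea: the inner ranking loss is evaluated on the first `dim` components of each embedding for every entry in `matryoshka_dims`, and the results are summed using `matryoshka_weights`. It is an illustration, not the library's implementation. The similarity scale of 20, the use of cosine similarity, the helper names, and the toy 8-dimensional vectors are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_ranking_loss(queries, passages, scale=20.0):
    """Cross-entropy over scaled cosine similarities.

    Passage i is treated as the positive for query i; every other passage
    in the batch acts as a negative. The scale value is an assumption.
    """
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cosine(query, passage) for passage in passages]
        # Numerically stable log-sum-exp.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


def matryoshka_loss(queries, passages, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding prefixes."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_queries = [vec[:dim] for vec in queries]
        truncated_passages = [vec[:dim] for vec in passages]
        total += weight * in_batch_ranking_loss(truncated_queries, truncated_passages)
    return total


# Toy 8-dimensional "embeddings" (hypothetical values, for illustration only).
queries = [
    [1.0, 0.0, 0.2, 0.0, 0.1, 0.0, 0.0, 0.3],
    [0.0, 1.0, 0.0, 0.2, 0.0, 0.1, 0.3, 0.0],
]
passages = [
    [0.9, 0.1, 0.1, 0.0, 0.2, 0.0, 0.0, 0.1],
    [0.1, 0.8, 0.0, 0.3, 0.0, 0.2, 0.1, 0.0],
]

loss = matryoshka_loss(queries, passages, dims=[8, 4, 2], weights=[1, 1, 1])
print(f"toy Matryoshka loss: {loss:.6f}")
```

With equal weights, as configured above, every prefix length contributes equally to the objective, which is what encourages the leading dimensions to remain useful when embeddings are truncated at inference time.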
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `warmup_ratio`: 0.1
- `fp16`: True
- `load_best_model_at_end`: True
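
As a rough worked example of what these settings imply, the sketch below derives the step counts and the learning-rate curve. It assumes a single training device, takes the training-set size from the card's `dataset_size:46338` tag and the batch size of 8 from the full hyperparameter list, and models the `linear` scheduler as a linear warmup followed by a linear decay to zero. The function name `linear_lr` is hypothetical.

```python
import math

DATASET_SIZE = 46338   # from the card's `dataset_size` tag
BATCH_SIZE = 8         # `per_device_train_batch_size`, assuming one device
EPOCHS = 4             # `num_train_epochs`
WARMUP_RATIO = 0.1     # `warmup_ratio`
PEAK_LR = 2e-05        # `learning_rate`

# `dataloader_drop_last` is False, so the last partial batch still counts.
steps_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)


print("steps per epoch:", steps_per_epoch)
print("total steps:", total_steps)
print("warmup steps:", warmup_steps)
print("epoch fraction at step 500:", round(500 / steps_per_epoch, 4))
```

Under these assumptions, step 500 corresponds to an epoch fraction of 0.0863, which agrees with the first logged row in the training logs and so supports the single-device reading.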
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
### Training Logs

| Epoch | Step | Training Loss | cosine_ndcg@10 |
|:----------:|:--------:|:-------------:|:--------------:|
| -1 | -1 | - | 0.7763 |
| 0.0863 | 500 | 0.2343 | - |
| **0.1726** | **1000** | **0.1259** | **0.814** |
| 0.2589 | 1500 | 0.1027 | - |
| 0.3452 | 2000 | 0.0757 | 0.8288 |
| 0.4316 | 2500 | 0.0617 | - |
| 0.5179 | 3000 | 0.0651 | 0.8288 |
| 0.6042 | 3500 | 0.0863 | - |
| 0.6905 | 4000 | 0.06 | 0.8376 |
| 0.7768 | 4500 | 0.0579 | - |
| 0.8631 | 5000 | 0.0593 | 0.8342 |
| 0.9494 | 5500 | 0.0485 | - |
| 1.0357 | 6000 | 0.0465 | 0.8384 |
| 1.1220 | 6500 | 0.0276 | - |
| 1.2084 | 7000 | 0.0353 | 0.8392 |
| 1.2947 | 7500 | 0.0335 | - |
| 1.3810 | 8000 | 0.0292 | 0.8436 |
| 1.4673 | 8500 | 0.0276 | - |
| 1.5536 | 9000 | 0.0404 | 0.8485 |
| 1.6399 | 9500 | 0.0476 | - |
| 1.7262 | 10000 | 0.0265 | 0.8601 |
| 1.8125 | 10500 | 0.017 | - |
| 1.8988 | 11000 | 0.0217 | 0.8549 |
| 1.9852 | 11500 | 0.0329 | - |
| 2.0715 | 12000 | 0.0207 | 0.8577 |
| 2.1578 | 12500 | 0.0199 | - |
| 2.2441 | 13000 | 0.015 | 0.8544 |
| 2.3304 | 13500 | 0.0143 | - |
| 2.4167 | 14000 | 0.0117 | 0.8574 |
| 2.5030 | 14500 | 0.0204 | - |
| 2.5893 | 15000 | 0.0141 | 0.8595 |
| 2.6756 | 15500 | 0.0123 | - |
| 2.7620 | 16000 | 0.0211 | 0.8538 |
| 2.8483 | 16500 | 0.0207 | - |
| 2.9346 | 17000 | 0.0134 | 0.8562 |
| 3.0209 | 17500 | 0.0276 | - |
| 3.1072 | 18000 | 0.0106 | 0.8552 |
| 3.1935 | 18500 | 0.0129 | - |
| 3.2798 | 19000 | 0.0157 | 0.8582 |
| 3.3661 | 19500 | 0.0164 | - |
| 3.4524 | 20000 | 0.0192 | 0.8614 |
| 3.5388 | 20500 | 0.0138 | - |
| 3.6251 | 21000 | 0.0141 | 0.8601 |
| 3.7114 | 21500 | 0.0109 | - |
| 3.7977 | 22000 | 0.0178 | 0.8605 |
| 3.8840 | 22500 | 0.0088 | - |
| 3.9703 | 23000 | 0.0255 | 0.8626 |

* The bold row denotes the saved checkpoint.
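
For readers unfamiliar with the `cosine_ndcg@10` column, here is a minimal sketch of how NDCG@10 is commonly computed for one query with binary relevance, where the ranking comes from sorting passages by cosine similarity to the query. The function names and document IDs are hypothetical, and the evaluator that produced the logged values may differ in details such as tie handling and averaging over queries.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions (rank 1 has discount log2(2))."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, relevant_ids, k=10):
    """NDCG@k for a single query with binary relevance labels."""
    gains = [1.0 if doc_id in relevant_ids else 0.0 for doc_id in ranked_doc_ids]
    # The ideal ranking places every relevant document at the top.
    ideal_gains = [1.0] * min(len(relevant_ids), k)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0.0:
        return 0.0
    return dcg_at_k(gains, k) / ideal_dcg


# Hypothetical ranking: the single relevant passage is retrieved at rank 2.
print(ndcg_at_k(["doc_b", "doc_a", "doc_c"], {"doc_a"}))
```

A score of 1.0 means every relevant passage was ranked ahead of all irrelevant ones within the top 10; the dataset-level figure is typically the mean of this value over all evaluation queries.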
### Framework Versions

- Python: 3.10.15
- Sentence Transformers: 4.0.2
- Transformers: 4.49.0
- PyTorch: 2.6.0+cu126
- Accelerate: 0.26.0
- Datasets: 3.5.0
- Tokenizers: 0.21.1
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->