MEscriva committed on
Commit 94e82e6 · verified · 1 Parent(s): a19fa88

Update README.md

  1. README.md +84 -7
README.md CHANGED
@@ -1,10 +1,87 @@
  ---
- title: README
- emoji: 🦀
- colorFrom: pink
- colorTo: yellow
- sdk: static
- pinned: false
  ---
 
- Edit this `README.md` markdown file to author your organization card.
+
+ **Elixir**
+
+ ## 🏷️ Tagline (visible under the name)
+ **Sovereign AI for PDF intelligence – Multimodal, local, efficient.**
+
  ---
+
+ ## 📝 Description (Organization Card)
+
+ **What is Elixir?**
+ Elixir builds **sovereign multimodal AI models** to extract, structure, and activate data from complex PDF documents: **locally**, with **no external dependencies**, and with **full control over your data**.
+
+ We focus on **regulated and sensitive use cases**, especially in the finance, legal, and public sectors, starting with recurring documents such as **KIDs** (Key Information Documents), **financial reports**, and **technical annexes**.
+
+ ---
+
+ ### ⚙️ Our Models – Small, Specialized, Sovereign
+
+ We develop and fine-tune our own compact LLMs and VLMs, tailored to the needs of regulated organizations.
+ These models, collectively named **SAGE models** (Sovereign AI for Governance & Extraction), are:
+
+ - Efficient enough to run on a **standard CPU or Apple silicon**
+ - Specialized for **real-world document structures**
+ - Fine-tuned on **in-house datasets** that we build ourselves (see: **Elixir Corpus**)
+
+ We offer a selection of these models for **open use and testing** directly via Hugging Face Spaces.
+
  ---
 
+ ### 📚 Elixir Corpus – Our Data Foundation
+
+ All our models are trained on the **Elixir Corpus**, a structured collection of open datasets built from public and regulatory documents.
+ Each subset focuses on a key domain: finance, public governance, legal frameworks, ESG reporting, and more.
+
+ - ✅ PDFs (text-based or scanned)
+ - ✅ Tables, text, images, and charts
+ - ✅ Built from **OpenData**, web scraping, and internal sources
+ - ✅ Annotations: manual, semi-automated, or model-assisted
+ - ✅ License: **Apache 2.0** (open for research & commercial use)
+ - ✅ Tested with our models and available for download or as a demo
+
+ First available dataset: **KIDs Dataset (Finance)**
+
+ ---
+
+ ### 🌍 Why Elixir?
+
+ - 💡 **No hype** – real value from real data
+ - 🔐 **No cloud** – full data sovereignty
+ - ⚙️ **No guesswork** – structured output for real-world operations
+ - 🌱 **Low carbon** – 100x more efficient than standard cloud AI
+ - 🧠 **Open knowledge** – accelerating compliance & innovation for everyone
+
+ ---
+
+ ### 💼 Who is it for?
+
+ Elixir is built for:
+
+ - Researchers working on document AI, multimodality, or regulatory tech
+ - Companies and public institutions looking to embed **sovereign AI** into their infrastructure
+ - Builders who need **clean, structured data** to train or benchmark new models
+
+ ---
+
+ ### 🧪 Want to try?
+
+ Use one of our models or datasets.
+ Explore a demo.
+ Clone a Space.
+ Or just reach out.
+
+ We believe **your data should work for you, not the other way around**.
+
+ Let’s make it happen.
+ **→ https://huggingface.co/Elixir**
+
+ ---
+
+ Would you like me to help you:
+ - Add a logo/banner?
+ - Generate a template README for the datasets?
+ - Write the page for a “SAGE” model?
+ - Propose a small demo Space?
+
+ I can draft and structure all of it for you, according to your needs.
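
The Elixir Corpus section of the added README lists what a dataset entry may contain: text-based or scanned PDFs; tables, text, images, and charts; and manual, semi-automated, or model-assisted annotations. As a rough illustration only, one such entry could be modelled as below. This is a sketch under stated assumptions: the schema, the class names (`CorpusSample`, `Element`, `SourceKind`, `AnnotationMethod`, `ElementKind`), and every field are hypothetical and are not taken from the actual Elixir datasets.

```python
from dataclasses import dataclass, field
from enum import Enum


# All names below are hypothetical; the README does not publish a schema.
class SourceKind(Enum):
    TEXT_PDF = "text-based"
    SCANNED_PDF = "scanned"


class AnnotationMethod(Enum):
    MANUAL = "manual"
    SEMI_AUTOMATED = "semi-automated"
    MODEL_ASSISTED = "model-assisted"


class ElementKind(Enum):
    TABLE = "table"
    TEXT = "text"
    IMAGE = "image"
    CHART = "chart"


@dataclass
class Element:
    kind: ElementKind
    page: int      # assumed to be a 1-based page number
    content: str   # assumed: extracted text or a serialized table


@dataclass
class CorpusSample:
    document_id: str
    domain: str                   # e.g. "finance", one of the domains the README lists
    source: SourceKind
    annotation: AnnotationMethod
    license: str = "Apache-2.0"   # the license stated in the README
    elements: list[Element] = field(default_factory=list)

    def count_by_kind(self) -> dict[str, int]:
        """Count the annotated elements of this sample per element kind."""
        counts: dict[str, int] = {}
        for element in self.elements:
            counts[element.kind.value] = counts.get(element.kind.value, 0) + 1
        return counts
```

A sample would then be built with placeholder values, for instance `CorpusSample(document_id="example-0001", domain="finance", source=SourceKind.SCANNED_PDF, annotation=AnnotationMethod.MANUAL)`, and elements appended to `sample.elements` as they are annotated. Using enums for the closed sets named in the README keeps invalid values out of the records; whether the real datasets follow this shape is unknown.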