Commit 39d4f87 · 1 Parent(s): ce82a96
check
app.py CHANGED
@@ -19,7 +19,7 @@ relation_extractor = CustomGLiNERRelationExtractor(model=model, return_index=Tru
 
 # Sample text
 SAMPLE_TEXT = (
-    "
+    "Encuesta Nacional de Hogares (ENAHO) is the Peruvian version of the Living Standards Measurement Survey, e.g. a nationally representative household survey collected monthly on a continuous basis. For our analysis, we use data from January 2007 to December 2020. The survey covers a wide variety of topics, including basic demographics, educational background, labor market conditions, crime victimization, and a module on respondent’s perceptions about the main problems in the country and trust in different local and national‐level institutions. Observations are also spatially identified at the municipality level, but here we focus on variation in the Venezuelan share of the population at the province level, of which there are 196, as these are best representative of local labor markets. Latin American Public Opinion Project (LAPOP) is an opinion survey conducted bi-annually in all countries in Latin America and designed to be representative of urban populations. This was fielded in Peru in 2010, 2012, 2014, 2017, and 2019 and consists of about 2,000 observations from mostly urban areas"
 )
 
 # Post-processing: prune acronyms and self-relations
@@ -161,7 +161,7 @@ def _cached_predictions(state):
     return json.dumps(state, indent=2)
 
 with gr.Blocks() as demo:
-    gr.Markdown("""# Data Use Detector
+    gr.Markdown(f"""# Data Use Detector
 
 This Space demonstrates our fine-tuned GLiNER model’s ability to spot **dataset mentions** and **relations** in any input text. It identifies dataset names via NER, then extracts relations such as **publisher**, **acronym**, **publication year**, **data geography**, and more.
 
@@ -177,7 +177,7 @@ with gr.Blocks() as demo:
 4. Click **Get Model Predictions** to view the raw JSON output.
 
 **Resources**
-- **Model:** [rafmacalaba/gliner_re_finetuned-v3](https://huggingface.co/
+- **Model:** [rafmacalaba/gliner_re_finetuned-v3](https://huggingface.co/{DATA_MODEL_ID})
 - **Paper:** _Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers_ – ArXiv: [2502.10263](https://arxiv.org/pdf/2502.10263)
 - [GLiNER GitHub Repo](https://github.com/urchade/GLiNER)
 - [Project Docs](https://worldbank.github.io/ai4data-use/docs/introduction.html)
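Note on the change at lines 164 and 180: the `f` prefix added to the `gr.Markdown` string is what lets `{DATA_MODEL_ID}` in the model link be substituted. Without it, the braces would appear literally in the rendered URL. The sketch below illustrates the difference in isolation. It assumes `DATA_MODEL_ID` is a plain string defined elsewhere in `app.py`. The value used here is a guess taken from the link text, and `render_model_line` is a hypothetical helper that is not part of the commit.

```python
# Assumed value: the real DATA_MODEL_ID is defined elsewhere in app.py
# and is not shown in this diff.
DATA_MODEL_ID = "rafmacalaba/gliner_re_finetuned-v3"

# Plain string: the placeholder is kept as literal text.
plain_url = "https://huggingface.co/{DATA_MODEL_ID}"

# f-string: the placeholder is replaced with the variable's value.
formatted_url = f"https://huggingface.co/{DATA_MODEL_ID}"


def render_model_line(model_id: str) -> str:
    """Hypothetical helper: build the Markdown 'Model' bullet from one ID,
    so the link text and the URL cannot drift apart."""
    return f"- **Model:** [{model_id}](https://huggingface.co/{model_id})"


print(plain_url)
print(formatted_url)
print(render_model_line(DATA_MODEL_ID))
```

Once the Markdown block is an f-string, any literal `{` or `}` added to that text later must be doubled (`{{`, `}}`) to avoid being read as a placeholder.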