rafmacalaba committed on
Commit 39d4f87 · 1 Parent(s): ce82a96
Files changed (1)
  1. app.py +3 -3
app.py CHANGED
@@ -19,7 +19,7 @@ relation_extractor = CustomGLiNERRelationExtractor(model=model, return_index=Tru
 
  # Sample text
  SAMPLE_TEXT = (
-     "In 2010, Smith published the third round of the Demographic and Health Survey (DHS), a nationally representative cross-sectional survey funded and published by the World Bank with fieldwork reference year 2019 and a reference population of women aged 15–49. Conducted in 2020, it serves as the principal data source for collecting household composition, fertility and mortality rates, maternal and child health indicators, and access to water and sanitation across Nigeria, Kenya, and Ghana."
+     "Encuesta Nacional de Hogares (ENAHO) is the Peruvian version of the Living Standards Measurement Survey, e.g. a nationally representative household survey collected monthly on a continuous basis. For our analysis, we use data from January 2007 to December 2020. The survey covers a wide variety of topics, including basic demographics, educational background, labor market conditions, crime victimization, and a module on respondent’s perceptions about the main problems in the country and trust in different local and national‐level institutions. Observations are also spatially identified at the municipality level, but here we focus on variation in the Venezuelan share of the population at the province level, of which there are 196, as these are best representative of local labor markets. Latin American Public Opinion Project (LAPOP) is an opinion survey conducted bi-annually in all countries in Latin America and designed to be representative of urban populations. This was fielded in Peru in 2010, 2012, 2014, 2017, and 2019 and consists of about 2,000 observations from mostly urban areas"
  )
 
  # Post-processing: prune acronyms and self-relations
@@ -161,7 +161,7 @@ def _cached_predictions(state):
      return json.dumps(state, indent=2)
 
  with gr.Blocks() as demo:
-     gr.Markdown("""# Data Use Detector
+     gr.Markdown(f"""# Data Use Detector
 
  This Space demonstrates our fine-tuned GLiNER model’s ability to spot **dataset mentions** and **relations** in any input text. It identifies dataset names via NER, then extracts relations such as **publisher**, **acronym**, **publication year**, **data geography**, and more.
 
@@ -177,7 +177,7 @@ with gr.Blocks() as demo:
  4. Click **Get Model Predictions** to view the raw JSON output.
 
  **Resources**
- - **Model:** [rafmacalaba/gliner_re_finetuned-v3](https://huggingface.co/rafmacalaba/gliner_re_finetuned-v3)
+ - **Model:** [rafmacalaba/gliner_re_finetuned-v3](https://huggingface.co/{DATA_MODEL_ID})
  - **Paper:** _Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers_ – ArXiv: [2502.10263](https://arxiv.org/pdf/2502.10263)
  - [GLiNER GitHub Repo](https://github.com/urchade/GLiNER)
  - [Project Docs](https://worldbank.github.io/ai4data-use/docs/introduction.html)
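
Note on the `f"""` change: turning the Markdown literal into an f-string is what allows `{DATA_MODEL_ID}` to be interpolated into the model URL. The link label is still hardcoded in this commit, so the label and URL could drift apart if the constant changes. The sketch below is one possible way to derive both from the same value; it is not the author's code. The value assigned to `DATA_MODEL_ID` here is an assumption (its definition is not shown in this diff), and `build_resources_markdown` and `escaped_braces_example` are hypothetical helper names used only for illustration.

```python
# Assumed value: the real DATA_MODEL_ID is defined elsewhere in app.py
# and does not appear in this diff.
DATA_MODEL_ID = "rafmacalaba/gliner_re_finetuned-v3"


def build_resources_markdown(model_id: str) -> str:
    """Return a Resources block whose link label and URL both come from model_id."""
    return f"""**Resources**
- **Model:** [{model_id}](https://huggingface.co/{model_id})
- [GLiNER GitHub Repo](https://github.com/urchade/GLiNER)
"""


def escaped_braces_example(model_id: str) -> str:
    """Show that literal braces inside an f-string must be doubled."""
    # {{ and }} render as single braces; {model_id} is interpolated.
    return f'Example output: {{"model": "{model_id}"}}'


if __name__ == "__main__":
    print(build_resources_markdown(DATA_MODEL_ID))
    print(escaped_braces_example(DATA_MODEL_ID))
```

One thing to watch once the string is an f-string: any literal `{` or `}` added to the Markdown text later (for example a JSON snippet in the usage instructions) has to be written as `{{` or `}}`, otherwise Python will try to evaluate it as an expression.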