Spaces: ai4data/datause-detector

rafmacalaba committed 8 days ago

Commit 39d4f87

1 Parent(s): ce82a96

check


Files changed (1)

app.py +3 -3


@@ -19,7 +19,7 @@ relation_extractor = CustomGLiNERRelationExtractor(model=model, return_index=Tru
 # Sample text
 SAMPLE_TEXT = (
-"In 2010, Smith published the third round of the Demographic and Health Survey (DHS), a nationally representative cross-sectional survey funded and published by the World Bank with fieldwork reference year 2019 and a reference population of women aged 15–49. Conducted in 2020, it serves as the principal data source for collecting household composition, fertility and mortality rates, maternal and child health indicators, and access to water and sanitation across Nigeria, Kenya, and Ghana."
+"Encuesta Nacional de Hogares (ENAHO) is the Peruvian version of the Living Standards Measurement Survey, e.g. a nationally representative household survey collected monthly on a continuous basis. For our analysis, we use data from January 2007 to December 2020. The survey covers a wide variety of topics, including basic demographics, educational background, labor market conditions, crime victimization, and a module on respondent’s perceptions about the main problems in the country and trust in different local and national‐level institutions. Observations are also spatially identified at the municipality level, but here we focus on variation in the Venezuelan share of the population at the province level, of which there are 196, as these are best representative of local labor markets. Latin American Public Opinion Project (LAPOP) is an opinion survey conducted bi-annually in all countries in Latin America and designed to be representative of urban populations. This was fielded in Peru in 2010, 2012, 2014, 2017, and 2019 and consists of about 2,000 observations from mostly urban areas"
 )
 # Post-processing: prune acronyms and self-relations
@@ -161,7 +161,7 @@ def _cached_predictions(state):
     return json.dumps(state, indent=2)
 with gr.Blocks() as demo:
-    gr.Markdown("""# Data Use Detector
+    gr.Markdown(f"""# Data Use Detector
     This Space demonstrates our fine-tuned GLiNER model’s ability to spot **dataset mentions** and **relations** in any input text. It identifies dataset names via NER, then extracts relations such as **publisher**, **acronym**, **publication year**, **data geography**, and more.
@@ -177,7 +177,7 @@ with gr.Blocks() as demo:
     4. Click **Get Model Predictions** to view the raw JSON output.
     **Resources**
-    - **Model:** [rafmacalaba/gliner_re_finetuned-v3](https://huggingface.co/rafmacalaba/gliner_re_finetuned-v3)
+    - **Model:** [rafmacalaba/gliner_re_finetuned-v3](https://huggingface.co/{DATA_MODEL_ID})
     - **Paper:** _Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers_ – ArXiv: [2502.10263](https://arxiv.org/pdf/2502.10263)
     - [GLiNER GitHub Repo](https://github.com/urchade/GLiNER)
     - [Project Docs](https://worldbank.github.io/ai4data-use/docs/introduction.html)
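
Reviewer note: the change turns the Markdown header into an f-string so the model link can be built from `DATA_MODEL_ID`. The sketch below illustrates how that interpolation behaves in isolation. It is not the code in `app.py`: the value assigned to `DATA_MODEL_ID` is an assumption (its definition is not visible in this diff), and `build_header` and `json_hint` are hypothetical helpers used only for illustration. It also shows one thing worth checking when converting a long string to an f-string: any literal `{` or `}` elsewhere in the text has to be doubled.

```python
# Assumption: DATA_MODEL_ID is a Hub repo id string. Its real definition
# lives elsewhere in app.py and is not shown in this diff.
DATA_MODEL_ID = "rafmacalaba/gliner_re_finetuned-v3"


def build_header(model_id: str) -> str:
    """Hypothetical helper: build the Markdown header from one model id.

    Using the same variable for the link text and the URL keeps the two
    from drifting apart if the model id changes later.
    """
    return f"""# Data Use Detector

**Resources**
- **Model:** [{model_id}](https://huggingface.co/{model_id})
"""


def json_hint(model_id: str) -> str:
    """Hypothetical helper: literal braces in an f-string must be doubled."""
    return f'{{"model": "{model_id}"}}'


header = build_header(DATA_MODEL_ID)
hint = json_hint(DATA_MODEL_ID)

if __name__ == "__main__":
    print(header)
    print(hint)
```

In the committed line the link text is still hardcoded while only the URL uses the variable, so the two can diverge; interpolating both, as in the sketch, is one way to avoid that.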