Model documentation & parameters
Algorithm version: The model version to use. Note that any HF model can be wrapped as a KeyBERT model.
Text: The main text prompt to "understand", i.e., the text for which keywords are generated.
Minimum keyphrase ngram: Lower bound for phrase size. Each keyword will have at least this many words.
Maximum keyphrase ngram: Upper bound for phrase size. Each keyword will have at most this many words.
Stop words: Stop words to remove from the document. If not provided, no stop words are removed.
Use MaxSum: Controls whether max sum similarity is used to diversify the generated keywords. When enabled, we first take the 2 x MaxSum candidates words/phrases that are most similar to the document, where MaxSum candidates is the parameter described on the next line. Then, we form all top_n combinations of those candidates (top_n being the Number of keywords) and return the combination whose members are least similar to each other by cosine similarity.
MaxSum candidates: The number of candidates considered when Use MaxSum is enabled.
Use Max. marginal relevance: To diversify the results, we can use Maximal Marginal Relevance (MMR) to create keywords/keyphrases. MMR is also based on cosine similarity.
Diversity: The diversity of the results when Use Max. marginal relevance is enabled.
Number of keywords: How many keywords should be generated (maximum 50).
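The two diversification options above (Use MaxSum and Use Max. marginal relevance) can be illustrated with a minimal, standard-library-only sketch. This is not the KeyBERT implementation. The function names max_sum_select and mmr_select, the pool_size argument (standing in for the size of the MaxSum candidate pool), and the toy 2-D "embeddings" are all assumptions made for illustration. In practice the vectors would come from a BERT-style model.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def max_sum_select(doc_vec, cand_vecs, top_n, pool_size):
    """Keep the pool_size phrases closest to the document, then return the
    top_n-sized combination whose members are least similar to each other."""
    ranked = sorted(cand_vecs, key=lambda w: cosine(doc_vec, cand_vecs[w]), reverse=True)
    pool = ranked[:pool_size]

    def pairwise_similarity(combo):
        return sum(cosine(cand_vecs[a], cand_vecs[b]) for a, b in combinations(combo, 2))

    return list(min(combinations(pool, top_n), key=pairwise_similarity))


def mmr_select(doc_vec, cand_vecs, top_n, diversity):
    """Greedy MMR: start with the most relevant phrase, then repeatedly add the
    phrase maximizing (1 - diversity) * relevance - diversity * redundancy."""
    relevance = {w: cosine(doc_vec, v) for w, v in cand_vecs.items()}
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(top_n, len(cand_vecs)):
        remaining = [w for w in cand_vecs if w not in selected]

        def score(w):
            # Redundancy: similarity to the closest already-selected phrase.
            redundancy = max(cosine(cand_vecs[w], cand_vecs[s]) for s in selected)
            return (1 - diversity) * relevance[w] - diversity * redundancy

        selected.append(max(remaining, key=score))
    return selected


# Toy data (hypothetical): a document vector and three candidate phrases.
doc = [1.0, 1.0]
candidates = {
    "deep learning": [1.0, 0.9],
    "deep neural learning": [1.0, 0.8],
    "chemistry": [0.2, 1.0],
}

print(mmr_select(doc, candidates, top_n=2, diversity=0.0))
print(mmr_select(doc, candidates, top_n=2, diversity=0.7))
print(max_sum_select(doc, candidates, top_n=2, pool_size=3))
```

With diversity set to 0.0, MMR reduces to plain relevance ranking and returns the two near-duplicate phrases. Raising it trades some relevance for a less redundant second keyword. The MaxSum sketch enumerates every combination in the pool, so its cost grows quickly with the pool size, which is one reason to keep the candidate count small.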
Model card -- KeywordBERT
Model Details: KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.
Developers: Maarten Grootendorst.
Distributors: Original developer's code from https://github.com/MaartenGr/KeyBERT.
Model date: 2020.
Model type: Different BERT and SciBERT models, trained on CIRCA data.
Information about training algorithms, parameters, fairness constraints or other applied approaches, and features: N.A.
Paper or other resource for more information: The KeyBERT GitHub repo.
License: MIT
Where to send questions or comments about the model: Open an issue on the GT4SD repository.
Intended Use. Use cases that were envisioned during development: N.A.
Primary intended uses/users: N.A.
Out-of-scope use cases: Production-level inference.
Metrics: N.A.
Datasets: N.A.
Ethical Considerations: Unclear, please consult with original authors in case of questions.
Caveats and Recommendations: Unclear, please consult with original authors in case of questions.
Model card prototype inspired by Mitchell et al. (2019)
Citation
@misc{grootendorst2020keybert,
author = {Maarten Grootendorst},
title = {KeyBERT: Minimal keyword extraction with BERT.},
year = 2020,
publisher = {Zenodo},
version = {v0.3.0},
doi = {10.5281/zenodo.4461265},
url = {https://doi.org/10.5281/zenodo.4461265}
}