| Title: | Advancing Spanish clinical language understanding through domain-adaptive pretraining and new open clinical resources |
|---|---|
| Authors: | Guillem García Subies, Álvaro Barbero Jiménez, Paloma Martínez Fernández |
| Year: | 2026 |

We introduce ClinText-SP, the largest publicly available clinical corpus for Spanish, together with a state-of-the-art clinical encoder language model for Spanish clinical Natural Language Processing (NLP). The corpus was curated from diverse open sources, including clinical cases from medical journals and annotated corpora from shared tasks, bringing together data that was previously difficult to access. The model, developed through domain-adaptive pretraining on this corpus, significantly outperforms existing models on multiple clinical NLP benchmarks. By publicly releasing both the dataset and the model, we aim to give the research community robust resources for further advances in clinical NLP and, ultimately, for improved healthcare applications.

If you are interested in this publication, you can download it:
Advancing Spanish clinical language understanding through domain-adaptive pretraining and new open clinical resources.
