| Funder | Swedish Research Council |
|---|---|
| Recipient Organization | University of Gothenburg |
| Country | Sweden |
| Start Date | Jan 01, 2023 |
| End Date | Dec 31, 2028 |
| Duration | 2,191 days |
| Number of Grantees | 4 |
| Roles | Principal Investigator; Co-Investigator |
| Data Source | Swedish Research Council |
| Grant ID | 2022-02311_VR |
Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared because it contains personal and sensitive information, e.g. names or political opinions.
The GDPR suggests pseudonymization as a solution, but we need to learn more about it before adopting it for the manipulation of research data.
This environment targets several aspects of pseudonymization, aiming to advance Sweden's work on open access to research data:
- algorithms to automatically detect, label and pseudonymize personal identifiers in freely written texts (essays/blogs), focusing on linguistic challenges such as spelling errors, ambiguous entities and semantic constraints;
- analysis of the type and number of personal identifiers versus acceptable protection, followed by re-identification tests to ensure that pseudonymization is effective;
- analysis of the effects of pseudonymization on research data, e.g. on the readability of the resulting texts, their utility for answering the intended research questions, and their applicability to practical scenarios (e.g. language assessment).

We will use Swedish learner-written essays, collected and manually annotated by us, and generalize to the social media domain (through available corpora).
Natural language processing, machine learning, neural networks and word embeddings are some of the methods we will work with. Tools and datasets will be openly shared; theoretical and methodological insights will be discussed in articles.