Erik Buunk, M.Sc.

Project/Data Science Officer

Innovation and Entrepreneurship Research

+49 89 24246-583
erik.buunk(at)ip.mpg.de

Areas of Interest

Data Science, Information Management, Business Consultancy, Project Management, Process Optimization (Lean Six Sigma), Software Development, Data Visualization, Graphic Design, Geographical Information Systems (GIS)

Résumé

since 2021
Project/Data Science Officer at the Max Planck Institute for Innovation and Competition (Innovation and Entrepreneurship Research)

2019 - 2020
Institute Fellow, Institute for Quantitative Social Science, Harvard University, MA, USA

2016 - 2019
Information management consultant, Security Region Utrecht, Netherlands

2011 - 2016
Senior IT Consultant, Municipality of Amersfoort, Netherlands

2007 - 2011
Information Consultant/Project manager Social Services, Municipality of Amersfoort, Netherlands

2003 - 2006
Graphic Design

1994 - 1996
Environmental Sciences, M.Sc. Degree (1996)

1991 - 1994
Science, Business and Administration, Propaedeutic Diploma, with Distinction (1992)

Publications

Discussion Papers

Ghosh, Mainak; Erhardt, Sebastian; Rose, Michael; Buunk, Erik; Harhoff, Dietmar (2024). PaECTER: Patent-level Representation Learning using Citation-informed Transformers, arXiv preprint 2402.19411.

  • PaECTER is a publicly available, open-source document-level encoder designed specifically for patents. We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents. PaECTER performs better in similarity tasks than current state-of-the-art models used in the patent domain. More specifically, our model outperforms the next-best patent-specific pre-trained language model (BERT for Patents) on our patent citation prediction test dataset on two different rank evaluation metrics. On average, PaECTER places at least one most similar patent at rank 1.32 when it is compared against 25 irrelevant patents. Numerical representations generated by PaECTER from patent text can be used for downstream tasks such as classification, tracing knowledge flows, or semantic similarity search. Semantic similarity search is especially relevant in the context of prior art search for both inventors and patent examiners. PaECTER is available on Hugging Face.
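The rank figure quoted in the abstract can be read as the average position of the first relevant patent when candidates are sorted by similarity to a query. The following is a minimal sketch of that kind of metric, not the authors' evaluation code. The function names are hypothetical, the two-dimensional vectors are made-up toy data rather than PaECTER embeddings, and the use of cosine similarity is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_of_first_relevant(query, candidates, relevant):
    """Return the 1-based rank of the best-placed relevant candidate.

    candidates: list of embedding vectors (hypothetical toy data here)
    relevant:   set of indices into `candidates` that count as relevant
    """
    # Sort candidate indices from most to least similar to the query.
    order = sorted(
        range(len(candidates)),
        key=lambda i: cosine(query, candidates[i]),
        reverse=True,
    )
    for rank, idx in enumerate(order, start=1):
        if idx in relevant:
            return rank
    raise ValueError("no relevant candidate in the list")


def average_first_relevant_rank(cases):
    """Average the first-relevant rank over (query, candidates, relevant) cases."""
    ranks = [rank_of_first_relevant(q, c, r) for q, c, r in cases]
    return sum(ranks) / len(ranks)


# Two toy queries: the relevant candidate is ranked 1st in one and 2nd in the other.
cases = [
    ([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]], {0}),
    ([0.0, 1.0], [[1.0, 0.0], [0.6, 0.8], [0.1, 0.9]], {1}),
]
average_rank = average_first_relevant_rank(cases)
print(average_rank)
```

A value close to 1 means a relevant document is almost always the top-ranked candidate, which is how a figure such as 1.32 would be interpreted under this reading.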

Erhardt, Sebastian; Ghosh, Mainak; Buunk, Erik; Rose, Michael; Harhoff, Dietmar (2022). Logic Mill - A Knowledge Navigation System, arXiv preprint 2301.00200.

  • Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently, it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
  • https://doi.org/10.48550/arXiv.2301.00200

Presentations

19.09.2023
Logic Mill / Tracing The Flow Of Knowledge
Research Seminar
Location: Ringberg Castle


18.09.2023
Research Project Management
Research Seminar
Location: Ringberg Castle


12.07.2023
Tracing The Flow Of Knowledge
EPO/ARP Workshop
Location: online


03.07.2023
Tracing The Flow Of Knowledge
Poster presentation
Board of Trustees, Max Planck Institute for Innovation and Competition
Location: Munich


27.02.2023
Startup Data
Research Seminar
Location: Frauenchiemsee


06.09.2022
Startup Data Project and GDPR
Research Seminar
Location: Bernried


04.07.2022
Logic Mill – Applications of Machine Learning to Patents, Publications, and Other Text Corpora
Poster presentation
Board of Trustees, Max Planck Institute for Innovation and Competition
Location: Munich


09.06.2022
Logic Mill – Applications of Machine Learning to Patents, Publications, and Other Text Corpora
Poster presentation
Munich Summer Institute
Location: Munich


13.04.2022
New Data Sources
Research Seminar
Location: Ohlstadt


02.12.2021
Logic Mill
Research Seminar
Location: Ringberg Castle


01.12.2021
Dataroom Reproducibility
Research Seminar
Location: Ringberg Castle


01.10.2021
Tools and Resources for Reproducibility
Research Seminar
Location: Feldkirchen-Westerham


30.09.2021
Logic Mill
Research Seminar
Location: Feldkirchen-Westerham


27.07.2021
Information and Data Management at MPI-IC: Human Research Data in Practice
Location: online


06.07.2021
Logic Mill, Applications of Machine Learning to Patents, Publications, and Other Text Corpora
Location: online


26.03.2021
Replicability
Research Seminar
Location: online

Projects