In this project, we use text mining techniques to mine forum data and build a search system that gives users insight into forum posts (in either Dutch or English).

Forum posts contain a lot of valuable information, but that information can be hard to find in the multitude of posts available. This digital instrument helps users find the relevant posts through a search function and gain insight into them through a graph of related terms and the relations between them. The projects focused on specific types of cancer, such as GIST (English forum) and hemato-oncological types (Dutch fora). The website has been validated by experts in the field.

State-of-the-art Natural Language Processing techniques are used to extract entities (potentially relevant nouns and verbs) and to build an entity graph from co-occurrence counts. An Elasticsearch database and search function then retrieve the posts related to a query. A state-of-the-art summarization technique, developed by Leiden University, selects only the relevant sentences, and a visualization was developed to present all of this functionality.
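
As a rough illustration of the co-occurrence idea, the sketch below counts how often two entities appear in the same post and then lists the terms most strongly related to a given term. It is not the project's actual code: the function names, the `min_count` threshold, and the toy posts are assumptions made for this example, and the entity extraction step is taken as already done.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(posts, min_count=1):
    """Count how often two entities appear in the same post.

    `posts` is a list of entity lists, one per forum post (hypothetical
    input; entity extraction itself is not shown here).
    Returns a dict mapping (entity_a, entity_b) -> co-occurrence count,
    with each pair stored in alphabetical order.
    """
    counts = Counter()
    for entities in posts:
        # Deduplicate within a post so a repeated term counts once per post.
        unique = sorted(set(entities))
        counts.update(combinations(unique, 2))
    # Drop weak edges below the (assumed) minimum count.
    return {pair: n for pair, n in counts.items() if n >= min_count}


def related_terms(graph, term, top_n=5):
    """Return the terms that co-occur most often with `term`, strongest first."""
    neighbours = Counter()
    for (a, b), n in graph.items():
        if a == term:
            neighbours[b] += n
        elif b == term:
            neighbours[a] += n
    return neighbours.most_common(top_n)


# Toy data, invented for illustration only.
posts = [
    ["imatinib", "nausea", "fatigue"],
    ["imatinib", "fatigue"],
    ["surgery", "fatigue"],
]
graph = build_cooccurrence_graph(posts)
print(related_terms(graph, "fatigue"))
```

Counting each pair once per post keeps long posts that repeat a term from dominating the graph; raising `min_count` is one simple way to prune incidental edges before visualizing it.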

The website is still available, but it is password-protected to keep track of its users.


  • Maaike de Boer, Scientist, TNO, e-mail: