Master Theses

Tag aggregation

Summary: Investigating community detection algorithms to aggregate tag synonym pairs of Stack Overflow into meaningful topics.

On the question-and-answer site Stack Overflow, more than 38,000 diverse tags are used to classify posts. These tags often have the same or a similar meaning. The Stack Overflow community provides tag synonyms to reduce the number of redundant tags. In our previous research, we investigated those synonym pairs and derived a number of strategies to create tag synonyms automatically. The tag synonym suggestion tool TSST implements these strategies. Furthermore, we presented an approach to group tag synonyms into meaningful topics. For this, we represent our synonyms as directed, weighted graphs and investigate several graph community detection algorithms provided by the igraph package for R to build meaningful groups of tags, also called tag communities.
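The graph representation described above can be illustrated with a minimal Python sketch. It is not the TSST or igraph implementation: the synonym pairs, the weights, and the function names `build_graph` and `weakly_connected_groups` are hypothetical, and the grouping step uses plain weakly connected components as a simple stand-in baseline rather than one of the igraph community detection algorithms such as walktrap.

```python
from collections import defaultdict

# Hypothetical synonym pairs: (synonym tag, master tag, weight).
# Tags and weights are illustrative only, not data from the study.
SYNONYM_PAIRS = [
    ("android-layouts", "android-layout", 3.0),
    ("layout-android", "android-layout", 1.0),
    ("android-sqlite-db", "android-sqlite", 2.0),
    ("sqlite-android", "android-sqlite", 1.0),
]


def build_graph(pairs):
    """Return a directed, weighted graph as {source: {target: weight}}."""
    graph = defaultdict(dict)
    for source, target, weight in pairs:
        graph[source][target] = weight
        graph.setdefault(target, {})  # tags with no outgoing edge are vertices too
    return dict(graph)


def weakly_connected_groups(graph):
    """Baseline grouping: tags that are connected when edge direction
    and weight are ignored end up in the same group."""
    undirected = defaultdict(set)
    for source, targets in graph.items():
        undirected[source]  # register the vertex even if it is isolated
        for target in targets:
            undirected[source].add(target)
            undirected[target].add(source)

    seen, groups = set(), []
    for start in sorted(undirected):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(undirected[node] - seen)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    for group in weakly_connected_groups(build_graph(SYNONYM_PAIRS)):
        print(group)
```

A real community detection algorithm would additionally use the edge weights and could split a large connected component into several communities; the baseline here only shows the data structure and the idea of deriving tag groups from synonym edges.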

We apply our approach to the tags obtained from Android-related Stack Overflow posts and evaluate the resulting tag communities quantitatively with various community metrics, such as the minimum and maximum size of the communities or the number of communities. In addition, we evaluate our approach qualitatively through manual inspection and comparison of a random sample of tag communities. Our results show that, using the walktrap community detection algorithm, we can cluster the Android tags into 2,249 meaningful tag communities.
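The quantitative metrics named above (number of communities, minimum and maximum community size) can be computed from any tag-to-community assignment. The following Python sketch assumes a hypothetical `membership` mapping from tag name to community id; the function name `community_metrics` and the example data are illustrative and not taken from the study.

```python
from collections import Counter


def community_metrics(membership):
    """Compute simple size metrics from a {tag: community_id} mapping."""
    sizes = Counter(membership.values())  # community_id -> number of tags
    if not sizes:
        return {"count": 0, "min_size": 0, "max_size": 0}
    return {
        "count": len(sizes),
        "min_size": min(sizes.values()),
        "max_size": max(sizes.values()),
    }


if __name__ == "__main__":
    # Hypothetical assignment of tags to community ids.
    membership = {
        "android-layout": 0,
        "android-layouts": 0,
        "layout-android": 0,
        "android-sqlite": 1,
        "sqlite-android": 1,
        "android-ndk": 2,
    }
    print(community_metrics(membership))
```

Such metrics only describe the shape of a clustering; whether the communities are meaningful still requires the manual inspection described above.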

A topic for a master thesis could be the investigation of further community detection algorithms on graphs. In particular, the idea is to investigate community detection algorithms that allow a tag to belong to more than one community and to evaluate these algorithms on a larger set of Stack Overflow tags.

Topic revision: r2 - 2016-03-29 - StefanieBeyer
 

Copyright © 2012-2019 by the Software Engineering Research Group, University of Klagenfurt, Austria