Mining the Network of Coordination Interactions in Wikipedia

Short Description: This project aims to analyze the interactions between wiki users through the study of discussion pages (Talk: and User talk: namespaces).
Coordinator: MarcoColombetti (colombet@elet.polimi.it)
Tutor: DavidLaniado (david.laniado@gmail.com), RiccardoTasso (tasso@elet.polimi.it)
Collaborator:
Students: DanieleMetilli (daniele.metilli@gmail.com)
Research Area: Social Software and Semantic Web
Research Topic: Social Network Analysis
Start: 2009/07/16
End: 2010/02/22
Status: Closed
Level: Bs
Type: Thesis

This project, named Mining the Network of Coordination Interactions in Wikipedia, aims to analyze the interactions between Wikipedia users through the study of discussion pages. It can be seen as an extension of the Wikipedia Social Network study from a slightly different perspective. The aim is the same (to study the network of Wikipedia users by applying the techniques of social network analysis to it), but this work centers on the direct coordination interactions between users, since it focuses on their discussions.

Project outline

Phase 1: State of the art

In the first phase, previous papers on the subject will be collected. Many papers have analyzed Wikipedia's content and editing patterns, but only a few have studied the social network formed by its users. Studies of web forums will also be useful, since Wikipedia's discussion pages, although not structured as a forum, serve a similar function and develop in a similar way. All these papers will be listed here.

Phase 2: Modeling

A few Wikipedia discussion pages will be studied to create a suitable model that can properly describe the relations between users and their interactions. Which user should be considered the "top user" in a discussion? How should repliers be connected to each other? How should anonymous users be considered? Should the model be a bipartite network? We will also have to distinguish between discussions about articles (talk pages) and personal discussions between users (user talk pages).
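One possible answer to the modeling questions above is a directed, weighted graph in which an edge from user A to user B counts how many times A replied to B. The sketch below is only an illustration under that assumption, not the model the project settled on: the class name `ReplyGraph`, its methods, and the decision to ignore self-replies are all hypothetical. Anonymous users could be represented by their IP address as the user name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a directed, weighted reply graph between users. */
class ReplyGraph {
    // weights.get(a).get(b) = number of times user a replied to user b
    private final Map<String, Map<String, Integer>> weights = new HashMap<>();

    /** Records one reply from replier to target. Self-replies are ignored (an assumption). */
    void addReply(String replier, String target) {
        if (replier.equals(target)) {
            return;
        }
        weights.computeIfAbsent(replier, k -> new HashMap<>())
               .merge(target, 1, Integer::sum);
    }

    /** Returns how many times replier replied to target, or 0 if never. */
    int weight(String replier, String target) {
        return weights.getOrDefault(replier, Collections.<String, Integer>emptyMap())
                      .getOrDefault(target, 0);
    }
}
```

A bipartite alternative (users on one side, discussion threads on the other) would need a different structure; the choice between the two is exactly the kind of question this phase is meant to settle.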

Phase 3: Implementation

The model will then be implemented in Java. A parser will be built to read the wikitext of Wikipedia discussion pages and extract the relevant data from it. The program will be used to collect data from a suitable number of Wikipedia pages, so that we have a large dataset of user connections to analyze. It would also be interesting to create a "live" version of the program that runs on demand while browsing Wikipedia pages, or even other wikis, and immediately shows the results for that particular page.
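As a rough illustration of what such a parser might do, the sketch below relies on two common talk page conventions: replies are indented with leading colons, and comments end with a signature containing a `[[User:Name]]` link. It treats each signed line as one comment and links it to the nearest less-indented comment in the same section. The class `TalkPageParser` and its methods are hypothetical, and real talk pages are considerably messier (unsigned comments, multi-line comments, bullet indentation, user links that are mentions rather than signatures), so this is a starting point under stated assumptions rather than a working extractor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extracts reply edges from talk page wikitext. */
class TalkPageParser {
    // Matches the user name in [[User:Name]] or [[User:Name|label]].
    private static final Pattern USER_LINK = Pattern.compile("\\[\\[User:([^\\]|#/]+)");

    /** Indentation depth = number of leading colons. */
    static int depth(String line) {
        int d = 0;
        while (d < line.length() && line.charAt(d) == ':') {
            d++;
        }
        return d;
    }

    /** Assumes the last user link on the line is the signature; null if none. */
    static String author(String line) {
        Matcher m = USER_LINK.matcher(line);
        String last = null;
        while (m.find()) {
            last = m.group(1).trim();
        }
        return last;
    }

    /** Returns edges as {replier, target} pairs. */
    static List<String[]> replyEdges(String wikitext) {
        List<String[]> edges = new ArrayList<>();
        List<String> authorAtDepth = new ArrayList<>(); // index = indentation depth
        for (String line : wikitext.split("\n")) {
            if (line.startsWith("==")) { // a section heading starts a new thread
                authorAtDepth.clear();
                continue;
            }
            String who = author(line);
            if (who == null) {
                continue; // unsigned line: skipped in this sketch
            }
            int d = depth(line);
            // The parent is the nearest shallower level with a known author.
            for (int p = Math.min(d, authorAtDepth.size()) - 1; p >= 0; p--) {
                if (authorAtDepth.get(p) != null) {
                    edges.add(new String[] {who, authorAtDepth.get(p)});
                    break;
                }
            }
            // Drop deeper levels, pad skipped levels, then record this comment.
            while (authorAtDepth.size() > d) {
                authorAtDepth.remove(authorAtDepth.size() - 1);
            }
            while (authorAtDepth.size() < d) {
                authorAtDepth.add(null);
            }
            authorAtDepth.add(who);
        }
        return edges;
    }
}
```

Each returned pair could then be fed into whatever graph structure the model calls for. Fetching the wikitext itself (for example from a database dump) is left out of the sketch.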

Phase 4: Data analysis

The collected data will be analyzed and graphed with Java frameworks such as JUNG. This will allow us to build a map of the Wikipedia social network and draw conclusions about the way Wikipedia users relate to each other and coordinate their contributions. It would also be interesting to compare these results with those found in previous papers, and in particular in the Wikipedia Social Network study: is the network of Wikipedia "talkers" similar to that of Wikipedia editors in general? Do users who talk a lot also edit a lot? How is the evolution of a Wikipedia article affected by the discussions on its talk page? Are there users who edit a lot without ever talking? Are there users who talk but don't edit? How do different classes of users discuss their editing? What is the role of administrators on discussion pages? What is the role of anonymous users? And so on...
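A first step toward questions such as "who talks a lot?" is to count, for each user, the replies written and the replies received, which correspond to weighted out-degree and in-degree in a directed reply graph. The sketch below does this in plain Java as a stand-in for what a graph framework would provide; it does not use JUNG, and the class `DegreeStats`, its methods, and the `{replier, target}` edge representation are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: per-user reply counts from a list of {replier, target} edges. */
class DegreeStats {
    /** Counts how often each user appears in the given column of the edge list. */
    private static Map<String, Integer> count(List<String[]> edges, int column) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by user name
        for (String[] edge : edges) {
            counts.merge(edge[column], 1, Integer::sum);
        }
        return counts;
    }

    /** Weighted out-degree: number of replies each user wrote. */
    static Map<String, Integer> repliesWritten(List<String[]> edges) {
        return count(edges, 0);
    }

    /** Weighted in-degree: number of replies each user received. */
    static Map<String, Integer> repliesReceived(List<String[]> edges) {
        return count(edges, 1);
    }
}
```

Joining these counts with per-user edit counts would give a direct way to look for users who edit a lot without talking, or the reverse.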