Difference between revisions of "Wikipedia Category Graph"
JacopoFarina (Talk | contribs) (start to write) |
JacopoFarina (Talk | contribs) m (→Creation of the database) |
Revision as of 11:43, 16 June 2010
Wikipedia Category Graph
Short Description: | Represent Wikipedia categories with a graph-based model in order to analyze them further. |
Coordinator: | MarcoColombetti (colombet@elet.polimi.it) |
Tutor: | DavidLaniado (david.laniado@gmail.com), RiccardoTasso (tasso@elet.polimi.it) |
Collaborator: | |
Students: | JacopoFarina (jacopo1.farina@mail.polimi.it) |
Research Area: | Social Software and Semantic Web |
Research Topic: | Graph Mining and Analysis |
Start: | 2010/06/10 |
End: | 2010/10/01 |
Status: | Active |
Level: | Bs |
Type: | Course |
The goal of the project is to analyze Wikipedia categories by representing them in a graph-based database.
Wikipedia categories do not form a tree: a category may be contained in another one, which is contained in a third, which is in turn contained in the first, producing a cyclic reference. Moreover, many categories may be root categories (not contained in any other).
For these reasons, a graph database is better suited to represent the structure.
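The two properties described above (cycles and multiple roots) can be illustrated with a minimal Python sketch. It is not the project's code: the function names and the sample category names are hypothetical, and the input is assumed to be a list of (child, parent) pairs.

```python
from collections import defaultdict


def build_graph(edges):
    """Map each category to the set of categories that contain it."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
        parents[parent]  # touching the key registers the parent as a node too
    return dict(parents)


def root_categories(parents):
    """Return the categories that are not contained in any other category."""
    return sorted(c for c, ps in parents.items() if not ps)


def find_cycle(parents):
    """Return one containment cycle as a list of categories, or None.

    Depth-first search with three states. The recursion is fine for a
    small example; a full category dump would need an iterative version.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {c: UNSEEN for c in parents}

    def visit(node, path):
        state[node] = IN_PROGRESS
        path.append(node)
        for parent in sorted(parents[node]):
            if state[parent] == IN_PROGRESS:
                # The parent is already on the current path: cycle found.
                return path[path.index(parent):] + [parent]
            if state[parent] == UNSEEN:
                found = visit(parent, path)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for category in sorted(parents):
        if state[category] == UNSEEN:
            cycle = visit(category, [])
            if cycle:
                return cycle
    return None


# Hypothetical sample data: A, B and C contain each other cyclically,
# while E belongs to two categories that have no parent of their own.
sample_edges = [("A", "B"), ("B", "C"), ("C", "A"),
                ("D", "A"), ("E", "F"), ("E", "G")]
graph = build_graph(sample_edges)
```

With this sample, `root_categories(graph)` lists two roots and `find_cycle(graph)` reports the A, B, C loop, which is exactly the kind of structure a tree cannot represent.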
Creation of the database
Wikipedia lets users download the entire site database (with all versions of all articles) or just selected parts of it. We use a selection which contains the category list and the membership of articles in those categories.
Neo4j is a graph database that allows a program to create and manipulate graph structures such as nodes and relationships.
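As a rough illustration of how categories could map onto nodes and relationships, the sketch below builds parameterized Cypher statements (the query language of current Neo4j releases) from (child, parent) pairs. The page does not say which Neo4j API the project uses, so this is only one possible approach; the `Category` label, the `SUBCATEGORY_OF` relationship type, the `name` property and the function name are all assumptions made for the example.

```python
def cypher_statements(edges):
    """Return (query, parameters) pairs that would load the given edges.

    Label, relationship type and property names are illustrative
    assumptions, not taken from the project.
    """
    # MERGE creates the node only if it does not exist yet.
    merge_node = "MERGE (c:Category {name: $name})"
    merge_edge = (
        "MATCH (child:Category {name: $child}), "
        "(parent:Category {name: $parent}) "
        "MERGE (child)-[:SUBCATEGORY_OF]->(parent)"
    )
    names = sorted({name for edge in edges for name in edge})
    statements = [(merge_node, {"name": name}) for name in names]
    statements += [(merge_edge, {"child": child, "parent": parent})
                   for child, parent in edges]
    return statements


# Hypothetical sample: two categories that contain each other.
statements = cypher_statements([("A", "B"), ("B", "A")])
for query, params in statements:
    print(query, params)
```

Each pair could then be sent to a running Neo4j server through a client driver; passing the names as parameters avoids having to escape category titles inside the query text.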