Difference between revisions of "Mining wikipedia categories"

From AIRWiki
Line 3: Line 3:
 
|image=wikipedia_categories.png
 
|image=wikipedia_categories.png
 
|description=Wikipedia articles are organized in a hierarchy of categories, manually assigned by users. This process can be considered a huge effort for the collective categorization of human knowledge; the result is a wide and disordered graph which can provide precious information for a variety of applications (natural language processing, information retrieval, ontology building...).
 
|description=Wikipedia articles are organized in a hierarchy of categories, manually assigned by users. This process can be considered a huge effort for the collective categorization of human knowledge; the result is a wide and disordered graph which can provide precious information for a variety of applications (natural language processing, information retrieval, ontology building...).
Aim of the project is the development of a tool for the visualization of this graph. The project can be extended to a thesis in various directions, as the development of advanced visualization features or the creation and population of an ontology.
+
In the project "Wikipedia Category Map" a tool has been developed to extract the graph of Wikipedia categories, to store it in RDF format and to interactively visualize and explore it.  
 +
Aim of this project is to analyze the resulting graph for the extraction of semantic relationships; for example it is possible to define metrics of distance between topics in the graph, which can be useful for various purposes in information retrieval.
 
|tutor=DavidLaniado;RiccardoTasso;MarcoColombetti
 
|tutor=DavidLaniado;RiccardoTasso;MarcoColombetti
 
|cfumin=5
 
|cfumin=5
Line 10: Line 11:
 
|studmax=2
 
|studmax=2
 
|resarea=Social Software and Semantic Web
 
|resarea=Social Software and Semantic Web
|restopic=Wiki Analysis
+
|restopic=Semantic Tagging;Semantic Search
 
|level=Bs;Ms
 
|level=Bs;Ms
 
|type=Course;Thesis
 
|type=Course;Thesis
Line 18: Line 19:
  
 
Wikipedia articles are organized in a hierarchy of categories, manually assigned by users. This process can be considered a huge effort for the collective categorization of human knowledge; the result is a wide and disordered graph which can provide precious information for a variety of applications (natural language processing, information retrieval, ontology building...).
 
Wikipedia articles are organized in a hierarchy of categories, manually assigned by users. This process can be considered a huge effort for the collective categorization of human knowledge; the result is a wide and disordered graph which can provide precious information for a variety of applications (natural language processing, information retrieval, ontology building...).
Aim of the project is the development of a tool for the visualization of this graph. The project can be extended to a thesis in various directions, as the development of advanced visualization features or the creation and population of an ontology.
+
In [[Wikipedia Category Map]] a tool has been developed to extract the graph of Wikipedia categories, to store it in RDF format and to interactively visualize and explore it.
 +
Aim of this project is to analyze the resulting graph for the extraction of semantic relationships; for example it is possible to define metrics of distance between topics in the graph, which can be useful for various purposes in information retrieval.
  
 
;Tools and instruments
 
;Tools and instruments
:the software can be implemented in any programming language; we have already developed a java prototype that queries the wikipedia APIs, which can be used as a starting point.
+
:the software can be implemented in any programming language.
 +
 
 +
;Related projects
 +
:[[Wikipedia Category Map]]

Revision as of 13:32, 21 September 2009

Title: Wikipedia category map

Image:wikipedia_categories.png

Description: Wikipedia articles are organized in a hierarchy of categories that users assign manually. This process amounts to a large collective effort to categorize human knowledge; the result is a large, loosely structured graph that can provide valuable information for a variety of applications (natural language processing, information retrieval, ontology building, and so on).

In the project "Wikipedia Category Map", a tool was developed to extract the graph of Wikipedia categories, store it in RDF format, and visualize and explore it interactively. The aim of this project is to analyze the resulting graph in order to extract semantic relationships; for example, it is possible to define distance metrics between topics in the graph, which can be useful for various purposes in information retrieval.

Tutor: DavidLaniado (david.laniado@gmail.com), RiccardoTasso (tasso@elet.polimi.it), MarcoColombetti (colombet@elet.polimi.it)
Start: Now
Students: 1 - 2
CFU: 5 - 20
Research Area: Social Software and Semantic Web
Research Topic: Semantic Tagging, Semantic Search
Level: Bs, Ms
Type: Course, Thesis
Status: Proposal


Wikipedia articles are organized in a hierarchy of categories that users assign manually. This process amounts to a large collective effort to categorize human knowledge; the result is a large, loosely structured graph that can provide valuable information for a variety of applications (natural language processing, information retrieval, ontology building, and so on). In Wikipedia Category Map, a tool was developed to extract the graph of Wikipedia categories, store it in RDF format, and visualize and explore it interactively. The aim of this project is to analyze the resulting graph in order to extract semantic relationships; for example, it is possible to define distance metrics between topics in the graph, which can be useful for various purposes in information retrieval.
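
As an illustration of what a distance metric between topics could look like, the sketch below computes the number of category links on the shortest path between two categories, ignoring link direction. This is only one possible metric, not necessarily the one the project will adopt. The category names, the PARENTS dictionary and the function names are hypothetical; in practice the graph would be loaded from the RDF data produced by Wikipedia Category Map rather than written by hand.

```python
from collections import deque

# Hypothetical toy data: each category maps to its parent categories.
# A real graph would be loaded from the extracted RDF data instead.
PARENTS = {
    "Machine learning": ["Artificial intelligence"],
    "Information retrieval": ["Information science", "Computer science"],
    "Artificial intelligence": ["Computer science"],
    "Computer science": ["Science"],
    "Information science": ["Science"],
    "Science": [],
}


def build_undirected(parents):
    """Turn the child -> parents mapping into an undirected adjacency map."""
    adjacency = {}
    for child, parent_list in parents.items():
        adjacency.setdefault(child, set())
        for parent in parent_list:
            adjacency[child].add(parent)
            adjacency.setdefault(parent, set()).add(child)
    return adjacency


def category_distance(adjacency, source, target):
    """Return the number of links on the shortest path, or None if unreachable."""
    if source not in adjacency or target not in adjacency:
        return None
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    # Breadth-first search: the first time the target is reached,
    # the path used is a shortest one.
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


if __name__ == "__main__":
    graph = build_undirected(PARENTS)
    print(category_distance(graph, "Machine learning", "Information retrieval"))
```

A plain shortest-path count treats every link as equally meaningful; since the category graph is wide and disordered, a real metric might need to weight links, for example by how general the categories involved are.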

Tools and instruments
The software can be implemented in any programming language.
Related projects
Wikipedia Category Map