AIRWiki user contributions: Emergent Semantics in Wikipedia, latest revision by FabioColzada, 2012-05-19T17:35:56Z
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph, with edge weights computed by means of a "social similarity" metric, and detect emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/16<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
|image=grafo_WIKI.gif<br />
}}<br />
<br />
The aim of this project was to develop a social measure of the similarity between two Wikipedia pages, based on the quality and quantity of the contributions made by shared users. By building a bipartite network of pages and users, it is possible to derive a one-mode network whose nodes are the pages themselves.<br />
<br />
==Build a network of pages==<br />
The network used in our analysis emerges from the interaction between the users and the encyclopedia; in this way, the object of our analysis is created dynamically by a distributed system of contributions.<br />
We adopted the EditLongevity measure to estimate the quality (and the quantity) of the edits a given user makes to a page. This metric directly weights the edges of the bipartite network between users and pages, in which two pages are connected through the edits of a shared contributor.<br />
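The bipartite construction can be sketched as follows. This is a minimal, illustrative sketch: the real EditLongevity score depends on how long an edit survives later revisions, while here the edits are assumed to arrive already scored as (user, page, longevity) tuples; all names and values are hypothetical.<br />

```python
from collections import defaultdict

def build_bipartite(edits):
    """Return weight[user][page] = summed edit-longevity of that user's edits to that page."""
    weight = defaultdict(lambda: defaultdict(float))
    for user, page, longevity in edits:
        # Each scored edit adds its longevity to the user-page edge weight.
        weight[user][page] += longevity
    return weight

edits = [
    ("alice", "Rome", 0.9), ("alice", "Milan", 0.4),
    ("bob",   "Rome", 0.7), ("alice", "Rome", 0.3),
]
w = build_bipartite(edits)
print(w["alice"]["Rome"])  # 1.2 (0.9 + 0.3)
```

Two pages then share a connection whenever some user has a nonzero edge to both.<br />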
<br />
To obtain a one-mode network from the two-mode network, the cosine similarity measure was chosen to compute the weight of a direct edge between two pages. In this way, not only the relative weights of the two user-page connections are considered, but also the sizes of the pages being compared.<br />
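The one-mode projection described above can be sketched like this: each page is represented by its vector of per-user contribution weights, and the projected edge weight is the cosine of the two vectors (page names and weights are illustrative, not the project's data).<br />

```python
import math

def cosine(page_a, page_b):
    """Cosine similarity of two sparse user->weight contribution vectors."""
    users = set(page_a) | set(page_b)
    dot = sum(page_a.get(u, 0.0) * page_b.get(u, 0.0) for u in users)
    norm_a = math.sqrt(sum(v * v for v in page_a.values()))
    norm_b = math.sqrt(sum(v * v for v in page_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

rome  = {"alice": 1.2, "bob": 0.7}
milan = {"alice": 0.4}
print(round(cosine(rome, milan), 3))
```

Because the dot product is normalized by both vector norms, a shared contributor counts for less between two heavily edited pages than between two small ones, which is the size comparison mentioned above.<br />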
<br />
[[File:Bipartite_network.png|500px|thumb|center|Bipartite network structure]]<br />
<br />
==Community detection in the Wikipedia graph==<br />
By applying a fast community-detection algorithm to the network, we obtained groups of Wikipedia articles. We analyzed the results of two different algorithms, both based on modularity optimization: the Fastgreedy algorithm (with a workaround for its resolution-limit issue) and the Louvain method. Although the two produced similar results, we based the final analysis on the data obtained with the Louvain method, which we considered more reliable than Fastgreedy.<br />
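Both algorithms optimize Newman's modularity Q. A minimal sketch of Q for an unweighted, undirected graph is shown below, using the per-community form Q = Σ_c (e_c/m − (d_c/2m)²); the project's graphs are weighted, but the shape of the formula is the same. The example graph (two triangles joined by one edge) is hypothetical.<br />

```python
def modularity(edges, community):
    """Newman modularity of a partition: sum over communities of
    (intra-community edge fraction) - (expected fraction from degrees)."""
    m = len(edges)
    degree, intra = {}, {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
        if community[a] == community[b]:
            intra[community[a]] = intra.get(community[a], 0) + 1
    deg_sum = {}
    for node, c in community.items():
        deg_sum[c] = deg_sum.get(c, 0) + degree.get(node, 0)
    return sum(intra.get(c, 0) / m - (d / (2.0 * m)) ** 2
               for c, d in deg_sum.items())

# Two triangles bridged by edge (2, 3); the natural split scores Q = 5/14.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, community)
print(round(q, 4))
```

Fastgreedy and Louvain differ only in how they search for the partition maximizing this score, which is why their results are similar.<br />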
<br />
==Results==<br />
To both restrict the dataset and keep the most meaningful information, we set several thresholds during the process. This led to a graph with many small groups of pages (often containing just 2 or 3 articles) and two main components: one containing only pages about geography, and the largest one, more interesting for our purpose, containing elements from various semantic areas.<br />
After the community detection, we carried out a further analysis of the internal semantics of the largest communities of pages. Taking the structure of Wikipedia's categories, and forcing its representation into a tree, we calculated how strongly a cluster is bound to a limited set of semantically related categories. We found that almost all of the analyzed groups have elements belonging to the same semantic area, so it is reasonable to think that an emergent semantics can be extracted from such a network.<br />
Particularly interesting is the presence of a single large and very heterogeneous group of pages, not covered by any restricted set of categories. Some hypotheses have been made about this group, noting that its elements are usually large pages often considered of general interest.<br />
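One simple way to sketch the category-cohesion check described above (this is an illustrative simplification, not the thesis' exact measure) is to score a cluster by the fraction of its pages whose tree-flattened category falls among the cluster's k most frequent categories; a cohesive cluster scores near 1, the heterogeneous "general interest" group would score low for small k.<br />

```python
from collections import Counter

def category_cohesion(page_categories, k=3):
    """Fraction of pages covered by the cluster's k most common categories."""
    counts = Counter(page_categories)
    covered = sum(c for _, c in counts.most_common(k))
    return covered / len(page_categories)

# Hypothetical cluster: 3 of 5 pages fall under the single top category.
cluster = ["Geography", "Geography", "History", "Geography", "Music"]
print(category_cohesion(cluster, k=1))
```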
<br />
==Get materials==<br />
*[[Media:Thesis_Colzada_Di_Vitto_Italian.pdf|Thesis (Italian)]]<br />
*[[Media:Slides_thesis_Colzada_Di_Vitto_IT.pdf|Thesis overview (Italian)]]<br />
<br />
==Related works==<br />
*[http://airlab.elet.polimi.it/index.php/Wikipedia_Folksonomy Project: Wikipedia Folksonomy]<br />
<br />
==Related guides==</div>
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social similarity" metrics and detect an emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/16<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
|image=grafo_WIKI.gif<br />
}}<br />
<br />
<br />
----<br />
EDIT IN PROGRESS<br />
----<br />
<br />
The aim of this project was to develop a social-measure for the similarity between two Wikipedia's pages, according to the quality and the quantity of the contributions generated by shared users. By building a bipartite network of pages and users it is possible to obtain from it a one-mode network, where the nodes are the pages itself.<br />
<br />
==Build a network of pages==<br />
The network used in our analysis comes from the interaction between the users and the encyclopedia, in such a way the object of our analysis is dynamically created in a distributed system of contributions.<br />
We adopted the EditLongevity measure to estimate the quality (and the quantity) of an edit on a page generated by a certain user, this metric is directly used to weight the edges in the first bipartite network created between users and pages, where two pages are connected by the edits of a shared contributor.<br />
<br />
To obtain a one-mode network from a two-mode network the cosine similarity measure has been chosen to calculate the correct weight of a direct edge between two pages. In this way not only the relative weight of the two user-page connections is considered, but also a comparison with the size of these pages.<br />
<br />
[[File:Bipartite_network.png|500px|thumb|center|Bipartite network structure]]<br />
<br />
==Community detection in the Wikipedia graph==<br />
By applying to the network an algorithm for fast community detection in a graph, we got groups of Wikipedia's articles. Actually we analyzed the results obtained by two different algorithm, both based on modularity optimization: the Fastgreedy algorithm (with a workaround to solve the resolution limit issue) and the Louvain method. Despite the similar results, we lead the final analysis on the set of data obtained by the Louvain method, considered more reliable than the Fastgreedy algorithm.<br />
<br />
==Results==<br />
To both restrict the dataset and keep the most meaningful information we set up many thresholds during the process. This lead to a graph with many small groups of pages (often containing just 2 or 3 articles) and two main components: one containing only pages about geography, and the biggest one, more interesting for our purpose, containing elements from various semantic areas.<br />
After the community detection we lead a further analysis on the internal semantics of the biggest communities of pages. Taking the structure of Wikipedia's categories, and forcing its representation as a tree, we calculated how strongly a cluster is binded to a limited set of related-meaning categories. We stated that almost all of the analyzed groups have elements belonging to the same semantic area, in such a way it is reasonable to think it is possible to extract an emergent semantics from such a network.<br />
Particulary interesting is the presence of a single very heterogeneous and big group of pages, not covered by any restricted set of categories. Some hypothesis have been made about this groups, underlining that the elements of this community are usually big pages often considered of general interest.<br />
<br />
==Get materials==<br />
*[[Media:Thesis_Colzada_Di_Vitto_Italian.pdf|Thesis (Italian)]]<br />
*[[Media:Slides_thesis_Colzada_Di_Vitto_IT.pdf|Thesis overview (Italian)]]<br />
<br />
==Related works==<br />
*[http://airlab.elet.polimi.it/index.php/Wikipedia_Folksonomy Project: Wikipedia Folksonomy]<br />
<br />
==Related guides==</div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=File:Grafo_WIKI.gif&diff=14152File:Grafo WIKI.gif2011-10-23T18:35:08Z<p>FabioColzada: </p>
<hr />
<div></div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=Emergent_Semantics_in_Wikipedia&diff=14151Emergent Semantics in Wikipedia2011-10-23T17:50:03Z<p>FabioColzada: /* Related works */</p>
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social similarity" metrics and detect an emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/16<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
}}<br />
<br />
<br />
----<br />
EDIT IN PROGRESS<br />
----<br />
<br />
The aim of this project was to develop a social-measure for the similarity between two Wikipedia's pages, according to the quality and the quantity of the contributions generated by shared users. By building a bipartite network of pages and users it is possible to obtain from it a one-mode network, where the nodes are the pages itself.<br />
<br />
==Build a network of pages==<br />
The network used in our analysis comes from the interaction between the users and the encyclopedia, in such a way the object of our analysis is dynamically created in a distributed system of contributions.<br />
We adopted the EditLongevity measure to estimate the quality (and the quantity) of an edit on a page generated by a certain user, this metric is directly used to weight the edges in the first bipartite network created between users and pages, where two pages are connected by the edits of a shared contributor.<br />
<br />
To obtain a one-mode network from a two-mode network the cosine similarity measure has been chosen to calculate the correct weight of a direct edge between two pages. In this way not only the relative weight of the two user-page connections is considered, but also a comparison with the size of these pages.<br />
<br />
[[File:Bipartite_network.png|500px|thumb|center|Bipartite network structure]]<br />
<br />
==Community detection in the Wikipedia graph==<br />
By applying to the network an algorithm for fast community detection in a graph, we got groups of Wikipedia's articles. Actually we analyzed the results obtained by two different algorithm, both based on modularity optimization: the Fastgreedy algorithm (with a workaround to solve the resolution limit issue) and the Louvain method. Despite the similar results, we lead the final analysis on the set of data obtained by the Louvain method, considered more reliable than the Fastgreedy algorithm.<br />
<br />
==Results==<br />
To both restrict the dataset and keep the most meaningful information we set up many thresholds during the process. This lead to a graph with many small groups of pages (often containing just 2 or 3 articles) and two main components: one containing only pages about geography, and the biggest one, more interesting for our purpose, containing elements from various semantic areas.<br />
After the community detection we lead a further analysis on the internal semantics of the biggest communities of pages. Taking the structure of Wikipedia's categories, and forcing its representation as a tree, we calculated how strongly a cluster is binded to a limited set of related-meaning categories. We stated that almost all of the analyzed groups have elements belonging to the same semantic area, in such a way it is reasonable to think it is possible to extract an emergent semantics from such a network.<br />
Particulary interesting is the presence of a single very heterogeneous and big group of pages, not covered by any restricted set of categories. Some hypothesis have been made about this groups, underlining that the elements of this community are usually big pages often considered of general interest.<br />
<br />
==Get materials==<br />
*[[Media:Thesis_Colzada_Di_Vitto_Italian.pdf|Thesis (Italian)]]<br />
*[[Media:Slides_thesis_Colzada_Di_Vitto_IT.pdf|Thesis overview (Italian)]]<br />
<br />
==Related works==<br />
*[http://airlab.elet.polimi.it/index.php/Wikipedia_Folksonomy Project: Wikipedia Folksonomy]<br />
<br />
==Related guides==</div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=Emergent_Semantics_in_Wikipedia&diff=14150Emergent Semantics in Wikipedia2011-10-23T17:42:27Z<p>FabioColzada: /* Get materials */</p>
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social similarity" metrics and detect an emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/16<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
}}<br />
<br />
<br />
----<br />
EDIT IN PROGRESS<br />
----<br />
<br />
The aim of this project was to develop a social-measure for the similarity between two Wikipedia's pages, according to the quality and the quantity of the contributions generated by shared users. By building a bipartite network of pages and users it is possible to obtain from it a one-mode network, where the nodes are the pages itself.<br />
<br />
==Build a network of pages==<br />
The network used in our analysis comes from the interaction between the users and the encyclopedia, in such a way the object of our analysis is dynamically created in a distributed system of contributions.<br />
We adopted the EditLongevity measure to estimate the quality (and the quantity) of an edit on a page generated by a certain user, this metric is directly used to weight the edges in the first bipartite network created between users and pages, where two pages are connected by the edits of a shared contributor.<br />
<br />
To obtain a one-mode network from a two-mode network the cosine similarity measure has been chosen to calculate the correct weight of a direct edge between two pages. In this way not only the relative weight of the two user-page connections is considered, but also a comparison with the size of these pages.<br />
<br />
[[File:Bipartite_network.png|500px|thumb|center|Bipartite network structure]]<br />
<br />
==Community detection in the Wikipedia graph==<br />
By applying to the network an algorithm for fast community detection in a graph, we got groups of Wikipedia's articles. Actually we analyzed the results obtained by two different algorithm, both based on modularity optimization: the Fastgreedy algorithm (with a workaround to solve the resolution limit issue) and the Louvain method. Despite the similar results, we lead the final analysis on the set of data obtained by the Louvain method, considered more reliable than the Fastgreedy algorithm.<br />
<br />
==Results==<br />
To both restrict the dataset and keep the most meaningful information we set up many thresholds during the process. This lead to a graph with many small groups of pages (often containing just 2 or 3 articles) and two main components: one containing only pages about geography, and the biggest one, more interesting for our purpose, containing elements from various semantic areas.<br />
After the community detection we lead a further analysis on the internal semantics of the biggest communities of pages. Taking the structure of Wikipedia's categories, and forcing its representation as a tree, we calculated how strongly a cluster is binded to a limited set of related-meaning categories. We stated that almost all of the analyzed groups have elements belonging to the same semantic area, in such a way it is reasonable to think it is possible to extract an emergent semantics from such a network.<br />
Particulary interesting is the presence of a single very heterogeneous and big group of pages, not covered by any restricted set of categories. Some hypothesis have been made about this groups, underlining that the elements of this community are usually big pages often considered of general interest.<br />
<br />
==Get materials==<br />
*[[Media:Thesis_Colzada_Di_Vitto_Italian.pdf|Thesis (Italian)]]<br />
*[[Media:Slides_thesis_Colzada_Di_Vitto_IT.pdf|Thesis overview (Italian)]]<br />
<br />
==Related works==<br />
<br />
==Related guides==</div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=Emergent_Semantics_in_Wikipedia&diff=14149Emergent Semantics in Wikipedia2011-10-23T17:41:41Z<p>FabioColzada: /* Get materials */</p>
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social similarity" metrics and detect an emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/16<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
}}<br />
<br />
<br />
----<br />
EDIT IN PROGRESS<br />
----<br />
<br />
The aim of this project was to develop a social-measure for the similarity between two Wikipedia's pages, according to the quality and the quantity of the contributions generated by shared users. By building a bipartite network of pages and users it is possible to obtain from it a one-mode network, where the nodes are the pages itself.<br />
<br />
==Build a network of pages==<br />
The network used in our analysis comes from the interaction between the users and the encyclopedia, in such a way the object of our analysis is dynamically created in a distributed system of contributions.<br />
We adopted the EditLongevity measure to estimate the quality (and the quantity) of an edit on a page generated by a certain user, this metric is directly used to weight the edges in the first bipartite network created between users and pages, where two pages are connected by the edits of a shared contributor.<br />
<br />
To obtain a one-mode network from a two-mode network the cosine similarity measure has been chosen to calculate the correct weight of a direct edge between two pages. In this way not only the relative weight of the two user-page connections is considered, but also a comparison with the size of these pages.<br />
<br />
[[File:Bipartite_network.png|500px|thumb|center|Bipartite network structure]]<br />
<br />
==Community detection in the Wikipedia graph==<br />
By applying to the network an algorithm for fast community detection in a graph, we got groups of Wikipedia's articles. Actually we analyzed the results obtained by two different algorithm, both based on modularity optimization: the Fastgreedy algorithm (with a workaround to solve the resolution limit issue) and the Louvain method. Despite the similar results, we lead the final analysis on the set of data obtained by the Louvain method, considered more reliable than the Fastgreedy algorithm.<br />
<br />
==Results==<br />
To both restrict the dataset and keep the most meaningful information we set up many thresholds during the process. This lead to a graph with many small groups of pages (often containing just 2 or 3 articles) and two main components: one containing only pages about geography, and the biggest one, more interesting for our purpose, containing elements from various semantic areas.<br />
After the community detection we lead a further analysis on the internal semantics of the biggest communities of pages. Taking the structure of Wikipedia's categories, and forcing its representation as a tree, we calculated how strongly a cluster is binded to a limited set of related-meaning categories. We stated that almost all of the analyzed groups have elements belonging to the same semantic area, in such a way it is reasonable to think it is possible to extract an emergent semantics from such a network.<br />
Particulary interesting is the presence of a single very heterogeneous and big group of pages, not covered by any restricted set of categories. Some hypothesis have been made about this groups, underlining that the elements of this community are usually big pages often considered of general interest.<br />
<br />
==Get materials==<br />
- [[Media:Thesis_Colzada_Di_Vitto_Italian.pdf|Thesis (Italian)]]<br />
- [[Media:Slides_thesis_Colzada_Di_Vitto_IT.pdf|Thesis overview (Italian)]]<br />
<br />
==Related works==<br />
<br />
==Related guides==</div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=File:Slides_thesis_Colzada_Di_Vitto_IT.pdf&diff=14148File:Slides thesis Colzada Di Vitto IT.pdf2011-10-23T17:40:40Z<p>FabioColzada: </p>
<hr />
<div></div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=Emergent_Semantics_in_Wikipedia&diff=14147Emergent Semantics in Wikipedia2011-10-23T17:39:13Z<p>FabioColzada: /* Get materials */</p>
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social similarity" metrics and detect an emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/16<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
}}<br />
<br />
<br />
----<br />
EDIT IN PROGRESS<br />
----<br />
<br />
The aim of this project was to develop a social-measure for the similarity between two Wikipedia's pages, according to the quality and the quantity of the contributions generated by shared users. By building a bipartite network of pages and users it is possible to obtain from it a one-mode network, where the nodes are the pages itself.<br />
<br />
==Build a network of pages==<br />
The network used in our analysis comes from the interaction between the users and the encyclopedia, in such a way the object of our analysis is dynamically created in a distributed system of contributions.<br />
We adopted the EditLongevity measure to estimate the quality (and the quantity) of an edit on a page generated by a certain user, this metric is directly used to weight the edges in the first bipartite network created between users and pages, where two pages are connected by the edits of a shared contributor.<br />
<br />
To obtain a one-mode network from a two-mode network the cosine similarity measure has been chosen to calculate the correct weight of a direct edge between two pages. In this way not only the relative weight of the two user-page connections is considered, but also a comparison with the size of these pages.<br />
<br />
[[File:Bipartite_network.png|500px|thumb|center|Bipartite network structure]]<br />
<br />
==Community detection in the Wikipedia graph==<br />
By applying to the network an algorithm for fast community detection in a graph, we got groups of Wikipedia's articles. Actually we analyzed the results obtained by two different algorithm, both based on modularity optimization: the Fastgreedy algorithm (with a workaround to solve the resolution limit issue) and the Louvain method. Despite the similar results, we lead the final analysis on the set of data obtained by the Louvain method, considered more reliable than the Fastgreedy algorithm.<br />
<br />
==Results==<br />
To both restrict the dataset and keep the most meaningful information we set up many thresholds during the process. This lead to a graph with many small groups of pages (often containing just 2 or 3 articles) and two main components: one containing only pages about geography, and the biggest one, more interesting for our purpose, containing elements from various semantic areas.<br />
After the community detection we lead a further analysis on the internal semantics of the biggest communities of pages. Taking the structure of Wikipedia's categories, and forcing its representation as a tree, we calculated how strongly a cluster is binded to a limited set of related-meaning categories. We stated that almost all of the analyzed groups have elements belonging to the same semantic area, in such a way it is reasonable to think it is possible to extract an emergent semantics from such a network.<br />
Particulary interesting is the presence of a single very heterogeneous and big group of pages, not covered by any restricted set of categories. Some hypothesis have been made about this groups, underlining that the elements of this community are usually big pages often considered of general interest.<br />
<br />
==Get materials==<br />
- [[Media:Thesis_Colzada_Di_Vitto_Italian.pdf|Thesis (Italian)]]<br />
<br />
==Related works==<br />
<br />
==Related guides==</div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=File:Thesis_Colzada_Di_Vitto_Italian.pdf&diff=14146File:Thesis Colzada Di Vitto Italian.pdf2011-10-23T17:10:27Z<p>FabioColzada: </p>
<hr />
<div></div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=Emergent_Semantics_in_Wikipedia&diff=14144Emergent Semantics in Wikipedia2011-10-23T17:00:26Z<p>FabioColzada: /* Build a network of pages */</p>
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social similarity" metrics and detect an emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/16<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
}}<br />
<br />
<br />
----<br />
EDIT IN PROGRESS<br />
----<br />
<br />
The aim of this project was to develop a social-measure for the similarity between two Wikipedia's pages, according to the quality and the quantity of the contributions generated by shared users. By building a bipartite network of pages and users it is possible to obtain from it a one-mode network, where the nodes are the pages itself.<br />
<br />
==Build a network of pages==<br />
The network used in our analysis comes from the interaction between the users and the encyclopedia, in such a way the object of our analysis is dynamically created in a distributed system of contributions.<br />
We adopted the EditLongevity measure to estimate the quality (and the quantity) of an edit on a page generated by a certain user, this metric is directly used to weight the edges in the first bipartite network created between users and pages, where two pages are connected by the edits of a shared contributor.<br />
<br />
To obtain a one-mode network from a two-mode network the cosine similarity measure has been chosen to calculate the correct weight of a direct edge between two pages. In this way not only the relative weight of the two user-page connections is considered, but also a comparison with the size of these pages.<br />
<br />
[[File:Bipartite_network.png|500px|thumb|center|Bipartite network structure]]<br />
<br />
==Community detection in the Wikipedia graph==<br />
By applying to the network an algorithm for fast community detection in a graph, we got groups of Wikipedia's articles. Actually we analyzed the results obtained by two different algorithm, both based on modularity optimization: the Fastgreedy algorithm (with a workaround to solve the resolution limit issue) and the Louvain method. Despite the similar results, we lead the final analysis on the set of data obtained by the Louvain method, considered more reliable than the Fastgreedy algorithm.<br />
<br />
==Results==<br />
To both restrict the dataset and keep the most meaningful information we set up many thresholds during the process. This lead to a graph with many small groups of pages (often containing just 2 or 3 articles) and two main components: one containing only pages about geography, and the biggest one, more interesting for our purpose, containing elements from various semantic areas.<br />
After the community detection we lead a further analysis on the internal semantics of the biggest communities of pages. Taking the structure of Wikipedia's categories, and forcing its representation as a tree, we calculated how strongly a cluster is binded to a limited set of related-meaning categories. We stated that almost all of the analyzed groups have elements belonging to the same semantic area, in such a way it is reasonable to think it is possible to extract an emergent semantics from such a network.<br />
Particulary interesting is the presence of a single very heterogeneous and big group of pages, not covered by any restricted set of categories. Some hypothesis have been made about this groups, underlining that the elements of this community are usually big pages often considered of general interest.<br />
<br />
==Get materials==<br />
<br />
==Related works==<br />
<br />
==Related guides==</div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=Emergent_Semantics_in_Wikipedia&diff=14143Emergent Semantics in Wikipedia2011-10-23T16:59:06Z<p>FabioColzada: /* Build a network of pages */</p>
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social similarity" metrics and detect an emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/16<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
}}<br />
<br />
<br />
----<br />
EDIT IN PROGRESS<br />
----<br />
<br />
The aim of this project was to develop a social-measure for the similarity between two Wikipedia's pages, according to the quality and the quantity of the contributions generated by shared users. By building a bipartite network of pages and users it is possible to obtain from it a one-mode network, where the nodes are the pages itself.<br />
<br />
==Build a network of pages==<br />
The network used in our analysis comes from the interaction between the users and the encyclopedia, in such a way the object of our analysis is dynamically created in a distributed system of contributions.<br />
We adopted the EditLongevity measure to estimate the quality (and the quantity) of an edit on a page generated by a certain user, this metric is directly used to weight the edges in the first bipartite network created between users and pages, where two pages are connected by the edits of a shared contributor.<br />
<br />
To obtain a one-mode network from this two-mode network, cosine similarity was chosen to compute the weight of a direct edge between two pages. In this way, not only the relative weight of the two user-page connections is considered: the result is also normalised by the overall edit volume ("size") of each page.<br />
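The cosine-similarity projection described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the matrix values and its shape are invented, and we assume the EditLongevity weights have already been computed and arranged as a user-by-page matrix.

```python
import numpy as np

# Hypothetical edit-quality matrix: rows = users, columns = pages.
# W[u, p] is the assumed EditLongevity weight of user u's edits on page p.
W = np.array([
    [0.9, 0.4, 0.0],
    [0.0, 0.7, 0.8],
    [0.5, 0.0, 0.6],
])

# Cosine similarity between page columns: the dot product of two pages'
# user-edit vectors, normalised by each page's "size" (the vector norm).
norms = np.linalg.norm(W, axis=0)
similarity = (W.T @ W) / np.outer(norms, norms)
```

The entry `similarity[i, j]` then serves as the weight of the edge between pages `i` and `j` in the one-mode network; edges with zero (or near-zero) weight can simply be dropped.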
<br />
[[File:Bipartite_network.png|500px|thumb|center]]<br />
<br />
==Community detection in the Wikipedia graph==<br />
By applying a fast community-detection algorithm to the network, we obtained groups of Wikipedia articles. We analyzed the results of two different algorithms, both based on modularity optimization: the Fastgreedy algorithm (with a workaround for its resolution-limit issue) and the Louvain method. Although the results were similar, we carried out the final analysis on the data obtained with the Louvain method, which we considered more reliable than Fastgreedy.<br />
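The community-detection step can be sketched with NetworkX, assuming the weighted page-similarity graph has already been built; the page names and weights below are purely illustrative, not taken from the project's dataset.

```python
import networkx as nx

# Toy weighted page-similarity graph: two tight topical groups
# joined by a single weak link.
G = nx.Graph()
G.add_weighted_edges_from([
    ("Milan", "Turin", 0.9), ("Milan", "Rome", 0.8), ("Turin", "Rome", 0.7),
    ("Jazz", "Blues", 0.9), ("Jazz", "Rock", 0.6), ("Blues", "Rock", 0.8),
    ("Rome", "Jazz", 0.1),  # weak inter-group edge
])

# Louvain modularity optimisation (available in networkx >= 2.8);
# nx.community.greedy_modularity_communities would play the Fastgreedy role.
communities = nx.community.louvain_communities(G, weight="weight", seed=42)
```

On this toy graph the two topical groups are recovered as separate communities, since keeping them apart yields a higher modularity than merging them across the weak edge.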
<br />
==Results==<br />
To restrict the dataset while keeping the most meaningful information, we applied several thresholds during the process. This led to a graph with many small groups of pages (often containing just 2 or 3 articles) and two main components: one containing only pages about geography, and the largest one, more interesting for our purpose, containing elements from various semantic areas.<br />
After the community detection we carried out a further analysis of the internal semantics of the largest communities of pages. Taking the structure of Wikipedia's categories, and forcing its representation into a tree, we measured how strongly each cluster is bound to a limited set of semantically related categories. We found that almost all of the analyzed groups have elements belonging to the same semantic area, so it is reasonable to think that an emergent semantics can be extracted from such a network.<br />
Particularly interesting is the presence of a single, very large and heterogeneous group of pages, not covered by any restricted set of categories. Some hypotheses have been made about this group, noting that its elements are usually long pages often considered of general interest.<br />
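The idea of measuring how strongly a cluster is bound to a category subtree can be sketched as below. This is a hedged illustration under simplified assumptions: the category tree, the `parent` map, and the `coverage` function are all hypothetical stand-ins for whatever representation the project actually used.

```python
# Hypothetical category tree (child -> parent) and a cluster given as
# the category of each of its pages; all names are illustrative.
parent = {"Jazz": "Music", "Blues": "Music", "Music": "Culture", "Opera": "Culture"}

def ancestors(cat):
    """All categories on the path from cat up to the root of the tree."""
    out = {cat}
    while cat in parent:
        cat = parent[cat]
        out.add(cat)
    return out

def coverage(cluster_categories, category):
    """Fraction of a cluster's pages whose category lies under `category`."""
    hits = sum(1 for c in cluster_categories if category in ancestors(c))
    return hits / len(cluster_categories)
```

A cluster whose coverage is high for some narrow category (rather than only for a near-root one such as "Culture") is the kind of semantically coherent group described above; the large heterogeneous community is the one for which no restricted category reaches high coverage.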
<br />
==Get materials==<br />
<br />
==Related works==<br />
<br />
==Related guides==</div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=File:Bipartite_network.png&diff=14142File:Bipartite network.png2011-10-23T16:57:28Z<p>FabioColzada: </p>
<hr />
<div></div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=Emergent_Semantics_in_Wikipedia&diff=13895Emergent Semantics in Wikipedia2011-10-06T11:36:10Z<p>FabioColzada: /* Community detection in the Wikipedia graph */</p>
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social similarity" metrics and detect an emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/16<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
}}<br />
<br />
<br />
----<br />
EDIT IN PROGRESS<br />
----<br />
<br />
The aim of this project was to develop a social measure of the similarity between two Wikipedia pages, based on the quality and quantity of the contributions made by shared users. By building a bipartite network of pages and users, it is possible to derive from it a one-mode network whose nodes are the pages themselves.<br />
<br />
==Build a network of pages==<br />
The network used in our analysis arises from the interaction between the users and the encyclopedia, so the object of our analysis is dynamically created by a distributed system of contributions.<br />
We adopted the EditLongevity measure to estimate the quality (and quantity) of a user's edits on a page. This metric directly weights the edges of the bipartite network between users and pages, in which two pages are connected through the edits of a shared contributor.<br />
<br />
To obtain a one-mode network from this two-mode network, cosine similarity was chosen to compute the weight of a direct edge between two pages. In this way, not only the relative weight of the two user-page connections is considered: the result is also normalised by the overall edit volume ("size") of each page.<br />
<br />
==Community detection in the Wikipedia graph==<br />
By applying a fast community-detection algorithm to the network, we obtained groups of Wikipedia articles. We analyzed the results of two different algorithms, both based on modularity optimization: the Fastgreedy algorithm (with a workaround for its resolution-limit issue) and the Louvain method. Although the results were similar, we carried out the final analysis on the data obtained with the Louvain method, which we considered more reliable than Fastgreedy.<br />
<br />
==Results==<br />
To restrict the dataset while keeping the most meaningful information, we applied several thresholds during the process. This led to a graph with many small groups of pages (often containing just 2 or 3 articles) and two main components: one containing only pages about geography, and the largest one, more interesting for our purpose, containing elements from various semantic areas.<br />
After the community detection we carried out a further analysis of the internal semantics of the largest communities of pages. Taking the structure of Wikipedia's categories, and forcing its representation into a tree, we measured how strongly each cluster is bound to a limited set of semantically related categories. We found that almost all of the analyzed groups have elements belonging to the same semantic area, so it is reasonable to think that an emergent semantics can be extracted from such a network.<br />
Particularly interesting is the presence of a single, very large and heterogeneous group of pages, not covered by any restricted set of categories. Some hypotheses have been made about this group, noting that its elements are usually long pages often considered of general interest.<br />
<br />
==Get materials==<br />
<br />
==Related works==<br />
<br />
==Related guides==</div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=Emergent_Semantics_in_Wikipedia&diff=13485Emergent Semantics in Wikipedia2011-08-18T13:40:10Z<p>FabioColzada: </p>
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social similarity" metrics and detect an emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/16<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
}}<br />
<br />
<br />
----<br />
EDIT IN PROGRESS<br />
----<br />
<br />
The aim of this project was to develop a social measure of the similarity between two Wikipedia pages, based on the quality and quantity of the contributions made by shared users. By building a bipartite network of pages and users, it is possible to derive from it a one-mode network whose nodes are the pages themselves.<br />
<br />
==Build a network of pages==<br />
The network used in our analysis arises from the interaction between the users and the encyclopedia, so the object of our analysis is dynamically created by a distributed system of contributions.<br />
We adopted the EditLongevity measure to estimate the quality (and quantity) of a user's edits on a page. This metric directly weights the edges of the bipartite network between users and pages, in which two pages are connected through the edits of a shared contributor.<br />
<br />
To obtain a one-mode network from this two-mode network, cosine similarity was chosen to compute the weight of a direct edge between two pages. In this way, not only the relative weight of the two user-page connections is considered: the result is also normalised by the overall edit volume ("size") of each page.<br />
<br />
==Community detection in the Wikipedia graph==<br />
By applying a fast community-detection algorithm to the network, we obtained groups of Wikipedia articles. We analyzed the results of two different algorithms, both based on modularity optimization: the Fastgreedy algorithm (with a workaround for its resolution-limit issue) and the Louvain method. Although the results were similar, we carried out the final analysis on the data obtained with the Louvain method, which we considered more reliable than Fastgreedy.<br />
<br />
==Results==<br />
To restrict the dataset while keeping the most meaningful information, we applied several thresholds during the process. This led to a graph with many small groups of pages (often containing just 2 or 3 articles) and two main components: one containing only pages about geography, and the largest one, more interesting for our purpose, containing elements from various semantic areas.<br />
After the community detection we carried out a further analysis of the internal semantics of the largest communities of pages. Taking the structure of Wikipedia's categories, and forcing its representation into a tree, we measured how strongly each cluster is bound to a limited set of semantically related categories. We found that almost all of the analyzed groups have elements belonging to the same semantic area, so it is reasonable to think that an emergent semantics can be extracted from such a network.<br />
Particularly interesting is the presence of a single, very large and heterogeneous group of pages, not covered by any restricted set of categories. Some hypotheses have been made about this group, noting that its elements are usually long pages often considered of general interest.<br />
<br />
==Get materials==<br />
<br />
==Related works==<br />
<br />
==Related guides==</div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=Emergent_Semantics_in_Wikipedia&diff=13442Emergent Semantics in Wikipedia2011-07-28T21:23:22Z<p>FabioColzada: </p>
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social similarity" metrics and detect an emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/16<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
}}<br />
<br />
EDIT IN PROGRESS<br />
<br />
==Build a network of pages==<br />
<br />
==Community detection in the Wikipedia graph==<br />
<br />
==Results==<br />
<br />
==Get materials==<br />
<br />
==Related works==<br />
<br />
==Related guides==</div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=Emergent_Semantics_in_Wikipedia&diff=13441Emergent Semantics in Wikipedia2011-07-28T21:11:29Z<p>FabioColzada: </p>
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social measure" and detect an emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/16<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
}}<br />
<br />
EDIT IN PROGRESS</div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=Emergent_Semantics_in_Wikipedia&diff=13440Emergent Semantics in Wikipedia2011-07-28T21:10:15Z<p>FabioColzada: </p>
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social measure" and detect an emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/16<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
}}</div>FabioColzadahttps://airwiki.elet.polimi.it/index.php?title=Emergent_Semantics_in_Wikipedia&diff=13439Emergent Semantics in Wikipedia2011-07-28T21:07:50Z<p>FabioColzada: New page: {{Project |title=Emergent Semantics in Wikipedia |short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social measure" and detect an emergent...</p>
<hr />
<div>{{Project<br />
|title=Emergent Semantics in Wikipedia<br />
|short_descr=Represent Wikipedia's pages as a graph with connections' weight realized by means of a "social measure" and detect an emergent semantics in it.<br />
|coordinator=MarcoColombetti<br />
|tutor=DavidLaniado;RiccardoTasso<br />
|students=FabioColzada;MattiaDiVitto<br />
|resarea=Social Software and Semantic Web<br />
|restopic=Graph Mining and Analysis; Semantic Tagging;<br />
|start=2011/02/20<br />
|end=2011/07/01<br />
|status=Closed<br />
|level=Bs<br />
|type=Thesis<br />
}}</div>FabioColzada