Assignment10DelanyTamabayeva

Last modified by Hal Eden on 2010/08/20 11:06


To Do

Be prepared to give a 5-10 minute presentation about your project in class so that the other students and the instructors understand your ideas and can see the progress you have made.

Please post the following information about your project in the Wiki:

title
abstract
team members and their responsibilities (e.g., how have you split up your work?)
why is the project of interest to you?
description of what you have done so far
plan for what you will do in the remaining time until the final project reports are due
relationship of your work to the major themes presented and discussed in the class
references identified

Group response

1. Members of the Group

Daniel Delany and Diana Tamabayeva

2. Question 1

Title: Freebase and the Collaborative Semantic Web

Abstract:

The purpose of this project is to consider and experiment with semantic networks as a data structure for organizing very large, complex databases. Currently, large amounts of related data are stored in relational databases: tables of information that relate to one another through matching key columns. For example, a grocery store's computer system may store a log of the day's sales in one table, with an "item ID" column holding a unique identifier that can be used to look up the item's description and price in another table. This works well for moderately sized, well-structured databases that cover a single topic. On the massive scale of the Internet, however, this organizational scheme breaks down, and user access to the data must rely on a powerful text-matching search engine. Many computer scientists speculate that a solution may lie in the semantic network data structure: instead of representing each piece of information as a row in a table, a semantic network treats each piece as a node with its own types and properties and with defined relationships to other nodes. These relationships are not data in the strict sense but metadata, information about the information, and users can search them directly. On a large scale, this allows users to find information based on what they already know about it and the relationships it has to other information, rather than on a phrase or set of words they think may appear in a document.
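A minimal Python sketch of the contrast described above, using made-up grocery data. The relational half joins two in-memory "tables" on a shared item_id key; the semantic-network half stores typed nodes plus labeled (subject, relationship, object) edges and finds a product by what is known about it rather than by words it contains. All table names, node names (such as valley_farm), and relationship labels are hypothetical, and neither half reflects how a real database engine stores data.

```python
# Relational model: two hypothetical tables joined through a matching key column.
ITEMS = {  # item_id -> row of an "items" table (prices in cents)
    101: {"description": "Milk, 1 gal", "price_cents": 349},
    102: {"description": "Bread", "price_cents": 219},
}
SALES = [  # rows of a "sales" table for one day
    {"sale_id": 1, "item_id": 101, "qty": 2},
    {"sale_id": 2, "item_id": 102, "qty": 1},
]

def sale_totals():
    """Look up each sale's price by joining on the shared item_id key."""
    return {s["sale_id"]: ITEMS[s["item_id"]]["price_cents"] * s["qty"] for s in SALES}

# Semantic network: typed nodes with properties, plus labeled relationships.
# Every node id and relationship name here is a hypothetical example.
NODES = {
    "milk": {"types": {"Product"}, "props": {"price_cents": 349}},
    "bread": {"types": {"Product"}, "props": {"price_cents": 219}},
    "dairy": {"types": {"Category"}, "props": {}},
    "bakery": {"types": {"Category"}, "props": {}},
    "valley_farm": {"types": {"Supplier"}, "props": {}},
}
EDGES = {  # (subject, relationship, object) triples
    ("milk", "in_category", "dairy"),
    ("milk", "supplied_by", "valley_farm"),
    ("bread", "in_category", "bakery"),
}

def find(node_type, **relations):
    """Return sorted ids of nodes of node_type that have every given relationship."""
    return sorted(
        n for n, data in NODES.items()
        if node_type in data["types"]
        and all((n, rel, obj) in EDGES for rel, obj in relations.items())
    )

print(sale_totals())  # {1: 698, 2: 219}
print(find("Product", in_category="dairy", supplied_by="valley_farm"))  # ['milk']
```

The second query asks for "a product in the dairy category from Valley Farm" without knowing the product's name or any words in its description, which is the kind of relationship-based search the abstract describes.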

This new way of thinking could drastically change the way people interact with information on the Internet. Beyond making search easier, semantic networks may replace relational databases in areas like social networking, where semantic structures can store rich information about your relationships with the people in your "social graph." Tim Berners-Lee, the computer scientist widely credited with inventing the World Wide Web, imagines a reinvention of the Internet's structure that he calls the "Giant Global Graph" (GGG). In this scenario, the web is made not of documents and table rows but of nodes of information related to one another. This layer of abstraction makes it unnecessary to keep separate friend lists on multiple social networks; instead, each user has a single "canonical" User node in the GGG, which has certain properties (gender, name, age, etc.) and is related to other User nodes in certain ways (friendship, acquaintance, family, etc.). The User node is also related to many non-user nodes, such as the user's place of work, grocery list, and calendar events. Although privacy issues would need to be addressed, this idea of knitting together the social graph and the facts of a user's life through structured, semantically meaningful relationships has vast implications for the way we interact with computers and with each other.
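To make the canonical User node concrete, here is a small Python sketch in which one hypothetical node per person holds that person's properties, and typed edges connect it both to other users and to non-user nodes such as a workplace, a grocery list, and an event. The node ids, relationship names, and the friends_at_same_workplace query are our own illustrative assumptions, not part of any published GGG design.

```python
# Hypothetical canonical nodes in a tiny Giant Global Graph-style store.
NODES = {
    "alice": {"type": "User", "name": "Alice", "age": 29},
    "bob": {"type": "User", "name": "Bob", "age": 31},
    "carol": {"type": "User", "name": "Carol", "age": 58},
    "acme": {"type": "Workplace", "name": "Acme Corp"},
    "alice_groceries": {"type": "GroceryList", "items": ["milk", "bread"]},
    "dentist_visit": {"type": "Event", "date": "2010-09-01"},
}

# Typed relationships as (subject, relation, object) edges. Friendship is
# stored in one direction only to keep the sketch short.
RELATIONS = [
    ("alice", "friend", "bob"),
    ("alice", "family", "carol"),
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("alice", "has_list", "alice_groceries"),
    ("alice", "has_event", "dentist_visit"),
]

def neighbors(subject, relation):
    """Sorted ids of nodes reachable from `subject` over `relation` edges."""
    return sorted(o for s, r, o in RELATIONS if s == subject and r == relation)

def friends_at_same_workplace(user):
    """Friends of `user` who share at least one workplace with them."""
    workplaces = set(neighbors(user, "works_at"))
    return [f for f in neighbors(user, "friend")
            if workplaces & set(neighbors(f, "works_at"))]

print(neighbors("alice", "family"))        # ['carol']
print(friends_at_same_workplace("alice"))  # ['bob']
```

The last query crosses from the social graph (friend edges) into the user's factual life (works_at edges) in a single traversal, which is the kind of knitting-together the paragraph above describes.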

This idea of a "canonical" or "factual" semantic network has spawned an open source community project called Freebase (freebase.com), an attempt to build an open database of the world's information. Unlike Wikipedia, which has a similar goal, Freebase stores its data as semantically related "topic" nodes, each of which can have multiple types. We have many questions about the feasibility, usefulness, and trustworthiness of such a large-scale semantic web, and we plan to use Freebase as a testing ground to answer them. Freebase provides developers with a full-featured API for reading from and writing to its database, and we'd like to use it to explore how this paradigm affects information consumers and publishers alike. Freebase's database is admittedly still very small, but we'd like to determine the conditions under which these data structures are most useful, and to compare the collaborative effort behind Freebase with similar projects, both open source and otherwise, such as Wikipedia and IMDB.
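As we understand it, Freebase's read API answered queries written in MQL, a JSON-based query-by-example language in which null marks a value for the service to fill in. The plain-Python sketch below only imitates that style over a few hypothetical topics, to show how one multi-typed topic can answer queries aimed at different types. It does not call Freebase; the topic IDs, names, and the directed_by property are illustrative assumptions modeled on Freebase-style type paths.

```python
# Hypothetical Freebase-style topics; each topic may carry several types.
TOPICS = [
    {"id": "/hypothetical/jane_doe", "name": "Jane Doe",
     "type": ["/people/person", "/film/actor", "/film/director"]},
    {"id": "/hypothetical/john_roe", "name": "John Roe",
     "type": ["/people/person", "/film/actor"]},
    {"id": "/hypothetical/some_film", "name": "Some Film",
     "type": ["/film/film"], "directed_by": "Jane Doe"},
]

def matches(topic, pattern):
    """True if every non-None value in `pattern` agrees with `topic`."""
    for key, want in pattern.items():
        if want is None:
            continue  # None plays the role of MQL's null: "fill this in"
        have = topic.get(key)
        if isinstance(have, list):
            if want not in have:  # multi-valued field such as "type"
                return False
        elif have != want:
            return False
    return True

def run_query(pattern):
    """Return a copy of `pattern` with its None slots filled, per matching topic."""
    return [
        {k: (topic.get(k) if v is None else v) for k, v in pattern.items()}
        for topic in TOPICS
        if matches(topic, pattern)
    ]

print(run_query({"type": "/film/director", "name": None}))
# [{'type': '/film/director', 'name': 'Jane Doe'}]
print(run_query({"type": "/film/film", "directed_by": "Jane Doe", "name": None}))
# [{'type': '/film/film', 'directed_by': 'Jane Doe', 'name': 'Some Film'}]
```

The same Jane Doe topic would also be returned by a query for type "/film/actor": one node carries several types, rather than the same person being duplicated as rows in separate actor and director tables.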

3. Question 2

Team Members and Responsibilities:

Daniel Delany:

Research on Freebase structure
Freebase API: how easy is it for developers to access graph data?
Freebase contribution: how do content publishers publish to the graph?
Most writing, including abstract
Research on "Giant Global Graph" concept

Diana Tamabayeva:

Research on general semantic web, other examples for comparison
Researching credible references and scientific papers on the topic
Comparisons and contrasts with other large databases
Creation of PowerPoint presentations

Project Interest: Currently, the World Wide Web treats all linked information as "documents" with no particular semantic meaning. We believe that the advent of the "collaborative semantic web" (an Internet that treats linked objects as pieces of information with semantic meaning and structure, and that allows its users to create and edit those objects) is a plausible and interesting future scenario. We believe the web is becoming too large to manage or search effectively, and that storing not only data but also structured, semantic metadata about that data and its relationships has the potential to revolutionize how information is created, stored, and accessed.

Work Done So Far: So far, most of our work has been researching existing ideas on this subject, in scientific papers and blogs alike, as well as systems that embody these concepts. The project will be centered on Freebase, an open source collaborative graph database; accordingly, we have signed up as collaborators on the site and begun to explore the relationships and data that already exist there, the tools for publishing and editing, and the limitations of the system. We've also begun looking at the Freebase Developer API and existing community-made applications to see how this data can be accessed through a standardized interface. Finally, we've started researching similar systems like Wikipedia and IMDB, comparing and contrasting them with "Web 3.0" systems like Freebase to piece together a coherent picture of these databases' future potential.

Plan for Future Work:

4. Question 3