Assignment10DelanyTamabayeva
Last modified by Hal Eden on 2010/08/20 11:06
To Do
Be prepared to give a 5-10 minute presentation about your project in class, so that the other students and the instructors understand your ideas and can see the progress you have made.
Please post in the Wiki the following things about your project:
- title
- abstract
- team members and their responsibilities (e.g., how have you split up your work?)
- why is the project of interest to you?
- description of what you have done so far
- plan for what you will do in the remaining time until the final project reports are due
- relationship of your work to the major themes presented and discussed in the class
- references identified
Group response
- 1. Members of the Group
- Daniel Delany and Diana Tamabayeva
- 2. Question 1
- Title: Freebase and the Collaborative Semantic Web
- Abstract: The purpose of this project is to consider and experiment with using semantic networks as a data structure for organizing very large, complex databases. Currently, large amounts of related data are stored in relational databases: tables of information that relate to one another through matching key columns. For example, a grocery store's computer system may store a log of the day's sales in one table, with a column for "item ID," a unique identifier that can be used to look up that item's description and price in another table. This works well for moderately sized, well-structured databases that hold information on a single topic. On the massive scale of the Internet, however, this organization system breaks down, and user access to the data must rely on a powerful text-matching search engine.
Many computer scientists speculate that a solution may lie in the semantic network data structure. Instead of representing each piece of information as a row in a table, a semantic network treats each piece as a node with its own types and properties and with defined relationships to other nodes. These relationships are not strictly data but metadata: information about the information that is directly searchable by the user. On a large scale, this allows users to search for information based on what they know about it and on its relationships to other information, rather than on a phrase or set of words they think may appear in a document. This new way of thinking could have dramatic effects on the way people interact with information on the Internet. In addition to making search easier, semantic networks may replace relational databases in areas like social networks, where semantic structures can store rich information about relationships with those in your "social graph."
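The contrast between the two structures can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the item names, the store, the city, and the helper functions `sale_total`, `find`, and `sold_in` are hypothetical and do not come from Freebase or any real system.

```python
# Relational style: two tables linked through a matching key column (item_id).
# Prices are kept in integer cents to avoid floating-point rounding.
items = {
    101: {"description": "Milk", "price_cents": 349},
    102: {"description": "Bread", "price_cents": 225},
}
sales_log = [
    {"sale_id": 1, "item_id": 101, "quantity": 2},
    {"sale_id": 2, "item_id": 102, "quantity": 1},
]


def sale_total(sale):
    """Look up the item's price in the other table via the key column."""
    return items[sale["item_id"]]["price_cents"] * sale["quantity"]


# Semantic-network style: every fact is a (subject, relationship, object)
# triple, so the relationships themselves can be searched.
triples = [
    ("milk", "is_a", "dairy_product"),
    ("milk", "sold_by", "corner_grocery"),
    ("bread", "is_a", "baked_good"),
    ("bread", "sold_by", "corner_grocery"),
    ("corner_grocery", "located_in", "Springfield"),
]


def find(subject=None, relation=None, obj=None):
    """Return every triple matching the given fields; None matches anything."""
    return [
        (s, r, o)
        for (s, r, o) in triples
        if subject in (None, s) and relation in (None, r) and obj in (None, o)
    ]


def sold_in(city):
    """Follow two relationships: which things are sold by stores in a city?"""
    stores = {s for (s, _, _) in find(relation="located_in", obj=city)}
    return sorted(s for (s, _, o) in find(relation="sold_by") if o in stores)
```

In the relational half, the only way from a sale to a price is the key column. In the triple half, a question such as `sold_in("Springfield")` is answered by following relationships, without knowing in advance which table holds what.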
Tim Berners-Lee, the computer scientist widely credited with the invention of the World Wide Web, imagines a reinvention of the structure of the Internet which he calls the "Giant Global Graph" (GGG). In this scenario, the web is not made of documents and table rows, but of nodes of information with relationships to one another. This layer of abstraction renders obsolete the idea of multiple friends on multiple social networks; instead, the user has a "canonical" User node in the GGG, which has certain properties (gender, name, age, etc.) and is related to other User nodes in certain ways (friendship, acquaintance, family, etc.). The User node is also related to many non-user nodes, such as the user's place of work, grocery list, and scheduled events. While there may be privacy issues that need to be addressed, this idea of knitting together information from the social graph and the user's factual life through structured yet semantically meaningful relationships has vast implications for the way we interact with computers and each other.
This idea of a "canonical" or "factual" semantic network has spawned an open source community project called Freebase (freebase.com), an attempt to build an open database of the world's information. Unlike Wikipedia, which has a similar goal, Freebase stores its data as semantically related "topic" nodes, each of which can have multiple types. We have many questions about the feasibility, usefulness, and trustworthiness of such a large-scale semantic web, and plan to use Freebase as a testing ground to answer them. Freebase provides developers with a full-featured API to read from and write to the Freebase database, and we'd like to use it to explore how this paradigm affects information consumers and publishers alike.
Freebase's database is currently very small, but we'd like to determine the conditions under which these data structures are most useful, and to compare the collaborative effort behind Freebase with similar projects, both open source and otherwise, such as Wikipedia and IMDb.
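To make the idea of a "canonical" node with several types, properties, and typed relationships more concrete, here is a minimal in-memory sketch. It does not use the Freebase API; the `Node` class, the `search` helper, and all names and values in it are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical graph node with several types, properties, and links."""
    name: str
    types: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (relationship, target Node)

    def relate(self, relationship, other):
        """Record a named, one-directional relationship to another node."""
        self.links.append((relationship, other))

    def related(self, relationship):
        """Return the nodes reachable through the given relationship."""
        return [target for (rel, target) in self.links if rel == relationship]


# One canonical node per person, place, or event (all values are made up).
alice = Node("Alice", {"user", "person"}, {"age": 30})
bob = Node("Bob", {"user", "person"})
acme = Node("Acme Corp", {"organization", "employer"})
meeting = Node("Project meeting", {"event"})

alice.relate("friend", bob)
alice.relate("works_at", acme)
alice.relate("attends", meeting)
bob.relate("works_at", acme)

graph = [alice, bob, acme, meeting]


def search(node_type, relationship, target_name):
    """Find nodes of a type by what they are related to, not by text match."""
    return [
        node.name
        for node in graph
        if node_type in node.types
        and any(t.name == target_name for t in node.related(relationship))
    ]
```

A call such as `search("user", "works_at", "Acme Corp")` asks for users by a relationship they hold rather than by words that might appear in a document, which is the kind of search the abstract describes.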
- 3. Question 2
- Team Members and Responsibilities:
Daniel Delany:
- Research on Freebase structure
- Freebase API - how easy is it for developers to access graph data?
- Freebase contribution - How do content publishers publish to the graph?
- Most writing, including abstract
- Research on "Giant Global Graph" concept
- Research on general semantic web, other examples for comparison
- Researching credible references and scientific papers on the topic
- Comparisons and contrasts with other large databases
- Creation of PowerPoint presentations
- 4. Question 3
- References:
- "Freebase: an open, shared database of the world's knowledge," official Freebase website
- "Freebase Parallax: A New Way to Browse or Explore Data," video presentation of Freebase
- "Freebase Parallax a promising search tool" by Martin Heller, August 23, 2008
- "Giant Global Graph" by Tim Berners-Lee, Nov 21, 2007
- "Bye bye World Wide Web, welcome Giant Global Graph" by Tim Leberecht, Dec 2, 2007
- "The Semantic Web: An Introduction"
- "State of the Semantic Web"
- "W3C Semantic Web Frequently Asked Questions"