Posts Tagged ‘big data’

New Graph Database: agamemnon

08 Aug

agamemnon is a Python-based graph database built on pycassa, the Python client library for Apache Cassandra. In short, it enables you to use Cassandra as a graph database. The API is inspired by the Python wrapper for Neo4j.


From our earlier post on graph databases:

Graph databases apply graph theory to the storage of information about the relationships between entries. The relationships between people in social networks are the most obvious example. The relationships between items and attributes in recommendation engines are another. Yes, many have noted the irony that relational databases aren't good at storing relationship data. Adam Wiggins from Heroku has a lucid explanation of why that is here. Short version: among other things, relationship queries in RDBMSes can be complex, slow and unpredictable. Since graph databases are designed for this sort of thing, the queries are more reliable.
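To make the "relationship query" idea concrete, here is a minimal sketch in plain Python, not tied to any particular graph database. The `follows` data and the `within_hops` helper are made up for illustration. It answers "who can this person reach in at most N hops?", a question that in a relational schema typically needs one self-join per hop.

```python
from collections import deque

# Hypothetical social graph: each person maps to the set of people they follow.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave", "erin"},
    "dave": set(),
    "erin": {"alice"},
}


def within_hops(graph, start, max_hops):
    """Return everyone reachable from `start` in at most `max_hops` edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            # Do not expand nodes that already sit at the hop limit.
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)  # the starting person is not part of the answer
    return seen


friends_of_friends = within_hops(follows, "alice", 2)
```

Because the traversal follows edges directly, the work grows with the number of relationships actually visited rather than with the size of the whole table, which is roughly the predictability argument made above.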

Neo4j is an open source, Java-based database sponsored by Neo Technology and is one of the most popular graph databases.

Cassandra is a key-value store database inspired by both Amazon's Dynamo and Google's BigTable. It was created at Facebook and is now sponsored by DataStax.
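For a rough sense of how a graph can sit on top of a key-value store, here is a small in-memory sketch. It is an assumption-laden illustration, not agamemnon's actual schema or API: the `WideRowGraph` class and its method names are hypothetical, and nested dicts stand in for Cassandra rows and columns. Each node gets one row of outgoing edges and one row of incoming edges, so either direction can be read with a single row lookup.

```python
from collections import defaultdict


class WideRowGraph:
    """Toy graph layout: row key = node id, column name = (relationship, other node)."""

    def __init__(self):
        # Two "column families" (hypothetical names), each mapping
        # row key -> {column name -> edge properties}.
        self.outbound = defaultdict(dict)
        self.inbound = defaultdict(dict)

    def relate(self, source, rel_type, target, **properties):
        # Write the edge twice so it can be found from either endpoint.
        self.outbound[source][(rel_type, target)] = dict(properties)
        self.inbound[target][(rel_type, source)] = dict(properties)

    def targets(self, source, rel_type):
        """Nodes that `source` points to via `rel_type`."""
        row = self.outbound.get(source, {})
        return sorted(node for (rel, node) in row if rel == rel_type)

    def sources(self, target, rel_type):
        """Nodes that point to `target` via `rel_type`."""
        row = self.inbound.get(target, {})
        return sorted(node for (rel, node) in row if rel == rel_type)


graph = WideRowGraph()
graph.relate("alice", "follows", "bob", since=2010)
graph.relate("alice", "follows", "carol")
graph.relate("carol", "follows", "bob")
graph.relate("alice", "likes", "cassandra")

alice_follows = graph.targets("alice", "follows")
bob_followers = graph.sources("bob", "follows")
```

Storing every edge in both directions doubles the writes, but it keeps each lookup to a single row, which suits a store that is fast at fetching one row by key.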



Infochimps Launches Even More API Calls

13 Mar

Right now, big data is restricted to (1) the companies that can afford Oracle and (2) the companies that can leverage Hadoop and Cassandra, HBase or other NoSQL alternatives. These tools are robust and will always be necessary. They also take considerable amounts of time and knowledge to deploy.

Our mission at Infochimps is to democratize the world’s access to data. The best way to do this is to host useful data in one place so that people can share it. By collectively offsetting the hosting costs, a lot of people can access useful information without the pains of scraping and hosting it.

We have launched more data API calls on our website and intend to launch hundreds more in the next few weeks. Our data API allows you to query databases like our Twitter conversations database, which is over half a terabyte in size. This is not something you can comfortably do with MySQL, and we are giving you access to it for free. Using your Infochimps API key, you can access this data within seconds. I don’t even have MySQL installed on my computer and our data team has given me the power to find and understand data that only “big data” companies have the resources to access. It is truly inspiring.

Here are just a few of the types of data you can query with no prior knowledge of non-relational databases:
* Twitter People Search: Tired of poking around on Twitter forever just to find cool people to follow? Think of a subject you like and query this data set for it. It helps you find like-minded people on Twitter.
* The 100 million word British National Corpus: a representative sample of spoken and written British English in the late 20th century. This is incredibly useful for linguistics and language processing.
* Qwerly: Query a person’s social media handle and find all of their corresponding social media presences online. It helps you get a stronger sense of who a person is.
* IP to Demographic: Be smarter about the people who visit your website. Find the demographics of your visitors based on their IP addresses.
* Wikipedia Articles Abstract Search: Look up a term in Wikipedia and get general descriptions that contain that word. This helps people or machines instantly understand something.

We are still in beta with our new API calls, but you can take a look at some of them yourself. We have bookmarked them with the tag “AwesomeAPIs”. They are here:

It doesn’t matter if you’ve ever written a MySQL query in your life. Find a data set you like and query it using our new API Explorer located on each data set that is accessible via our API. You’ll think it’s cool. We promise.

We are still in beta, so feel free to email me directly at michelle(at) should you have any questions or issues. Oh, and sign up for an API key. This will put you in the loop for when we launch more.