Gora - Amazon DynamoDB datastore for Gora
Renato Marroquin
Short description: Provide a gora-amazondynamodb module for Gora so that the community can use a popular datastore. The main objective is to broaden the use of Gora within the open source community.
Project title:
Gora-DynamoDB
Project objectives:
-
Research and compare the Amazon DynamoDB data model and functionality against other Dynamo-like NoSQL solutions, e.g. Cassandra and Riak, to determine how the data model influences Gora's internals.
-
Analyze existing Gora datastore implementations, e.g. gora-cassandra, gora-hbase, gora-sql and gora-accumulo, and compare them with the findings above to determine when the gora-dynamodb datastore should be used as an alternative to the others.
-
Review the Amazon SDK[1] and identify areas of key importance, e.g. the request authentication process, response time (number of retries for writing or reading)[5], and the cost of allocated resources, among others.
-
Document the conclusions alongside the new Java gora-dynamodb datastore; the documentation will be attached to the Gora wiki and actively reviewed throughout the project.
-
Begin creating a component that decides which Gora datastore to use according to the system's capabilities or the user's configuration (e.g. if we know the user has an Amazon account, they could try the Gora-DynamoDB datastore; if the user's computer is a low-end machine, Amazon could be suggested as an alternative).
-
Integrate the Gora-DynamoDB datastore with other cloud computing projects like Whirr[2] or jclouds[3] in order to offer the ability to test database scalability.
One caveat here is that DynamoDB does not support asynchronous writing. This will probably become a problem when writing thousands upon thousands of records. We will have to design a data model good enough to overcome this limitation (maybe creating a different DynamoDB table for each group of related websites?) and study how this problem relates to the use of Amazon EMR[4].
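The table-per-group idea mentioned in the caveat could be sketched roughly as follows. This is only an illustration under stated assumptions, not part of Gora or the Amazon SDK: the class name TableRouter, the "gora_" prefix, and the choice of grouping websites by host name are all hypothetical. The sketch only computes which table each URL would go to, so that the writes for each table could later be issued by separate workers.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: routes each URL to a per-host DynamoDB table name. */
class TableRouter {

    // Hypothetical prefix; not defined by Gora or the AWS SDK.
    private static final String PREFIX = "gora_";

    /** Derives a table name from the URL's host, ignoring a leading "www.". */
    static String tableFor(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        host = host.toLowerCase(Locale.ROOT);
        if (host.startsWith("www.")) {
            host = host.substring(4);
        }
        // DynamoDB table names allow letters, digits, '_', '-' and '.';
        // any other character is replaced with '_'.
        return PREFIX + host.replaceAll("[^a-z0-9_.-]", "_");
    }

    /** Groups URLs by target table, preserving first-seen table order. */
    static Map<String, List<String>> groupByTable(List<String> urls) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String url : urls) {
            groups.computeIfAbsent(tableFor(url), k -> new ArrayList<>()).add(url);
        }
        return groups;
    }
}
```

For example, TableRouter.groupByTable applied to two pages of the same site and one page of another site yields two groups, one per table. Whether grouping by host is the right granularity for "related websites" is an open design question for the project.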
Benefit to Gora:
-
Use a popular NoSQL solution in order to gain more attention from the community.
-
Explore a new Dynamo-like database, which will ease the inclusion of other popular datastores such as Riak and Voldemort.
-
Easier deployment of Gora using cloud computing resources by leveraging other open source projects (Whirr, jclouds, or others).
Success criteria:
-
Perform all tasks described above. Gora-DynamoDB should be able to read and write Gora's results from/into Amazon DynamoDB with acceptable overall performance (of course, this will depend on how much money our Amazon accounts have).
-
Implement the features identified while analyzing the pros and cons of using a closed-source NoSQL database for Gora.
-
Write tests for the Gora-DynamoDB datastore to ensure functionality.
Deliverables:
-
A Gora-DynamoDB datastore that lets Gora interoperate with Amazon cloud computing resources.
Project schedule:
-
I can begin working on Gora-DynamoDB as soon as I am accepted as a GSoC participant. From what I have seen in [1], the source code looks straightforward. In the first part of the project I could write all the unit tests for the datastore, and then continue working on its development.
Why Gora?
-
I am a Big Data enthusiast and I am trying to apply my knowledge of distributed databases to the information retrieval field.
-
I see a huge opportunity in making Gora the default data access layer for other NoSQL data sources, and a great chance for learning more about two fields which I am passionate about.
[1] http://aws.amazon.com/sdkforjava/
[4] http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/EMRforDynamoDB.html
