@garyshort
The NoSQL philosophy
Why it began; really "NoREL" (no relational) rather than "no SQL"
Relational: store each piece of data only once, in relationships
But storage is now cheap
ACID
atomicity – if a transaction happens, it all happens – a guarantee
consistency – move from one valid state to the next
isolation – while you change a row, no one else can get to it
durability – once committed it stays committed; can be played back over and over again
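A minimal sketch of the atomicity guarantee above, using Python's built-in sqlite3 module. The accounts table, the names and the transfer helper are made up for illustration and were not part of the talk. The credit is applied first and the debit second, so a failed debit shows the earlier credit being rolled back.

```python
import sqlite3

# In-memory database; the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint, so the credit was undone too.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) should leave both balances untouched, because the whole transaction is rolled back.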
NoSQL – term used by Johan Oskarsson: open source, non-relational, distributed, no ACID guarantee
no:sql(east) conference, Atlanta, 2009
No fixed table schema, no relations, avoid join operations (costly), scale horizontally
DOCUMENT STORAGE
RavenDB, Apache Jackrabbit, CouchDB, MongoDB, SimpleDB, XML databases – MarkLogic Server
GRAPH STORAGE
AllegroGraph
Nodes and edges
e.g. FlockDB, used by Twitter: nodes = users, edges = relationships
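A rough sketch of the nodes = users, edges = relationships idea, as adjacency sets held in memory. This is only an illustration of the data shape; the FollowGraph class and its method names are hypothetical and say nothing about how FlockDB actually stores edges.

```python
from collections import defaultdict


class FollowGraph:
    """Directed graph: nodes are user names, an edge src -> dst means 'src follows dst'."""

    def __init__(self):
        self.following = defaultdict(set)  # user -> users they follow
        self.followers = defaultdict(set)  # user -> users who follow them

    def follow(self, src, dst):
        # Store the edge in both directions so either lookup is a single read.
        self.following[src].add(dst)
        self.followers[dst].add(src)

    def mutual(self, user):
        # Set intersection: people the user follows who also follow back.
        return self.following[user] & self.followers[user]


graph = FollowGraph()
graph.follow("ann", "bob")
graph.follow("bob", "ann")
graph.follow("cat", "ann")
```

Keeping both directions costs extra storage, which fits the "storage now cheap" point made earlier in the notes.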
KEY / VALUE STORES
On disk or cached in RAM; concept of strong and weak; keys can be ordered, e.g. look at alphabetical order
OBJECT DATABASE
How to index NoSQL databases?
Key/value pairs just work
How you index a document database depends on the DB
RavenDB example (rob coder): LINQ expression (retrieve data, prediction)
Thorny issue is indexing, but it can be done
Real world scenarios
constant consistency goal
Consistent at every point in time, e.g. financial or medical records, or bonded goods (e.g. a warehouse of whisky): best to use the relational model
horizontal scalability goal
A number of geographic regions, vast quantities of data, game server sharding: could use NoSQL for this, or can use cloud SQL, e.g. Amazon RDS
MOBILE stuff
Rarely read: use a NoSQL key/value store
BIG DATA
Weather satellites, maps: use NoSQL, Hadoop
BINARY BABY
e.g. YouTube, Flickr, S3: don’t use NoSQL
TRANSIENT DATA short-term data, here today gone tomorrow, e.g. shopping cart: memcached
DATA REPLICATION e.g. music example: desktop, mobile etc.: CouchDB
HIGH AVAILABILITY e.g. gambling, pay per view, where high numbers are important: e.g. Cassandra
TWITTER
Challenges: many graphs to store (who you follow, who follows you), reach, status online, text updates, removes, set arithmetic e.g. @mentions
Tried relational, key/index approaches: blue whale
Under the hood it is complicated, so needed a simple solution: horizontal partitioning, writes can arrive out of order or be processed more than once
Result: FlockDB
Stores graphs
Not optimised for graph traversal operations (factorial time, non-polynomial in n, mathematics)
But limited to a user's followers, not the whole graph at all times
Optimised for large adjacency lists (the edges of the graph)
Optimised for fast read write
Optimised for paged set arithmetic
Data partitioned by node, i.e. per person, so all queries are answered by a single partition
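A small sketch of partitioning by node, assuming a simple hash of the user name picks the partition. The partition count, the CRC32 hash and the PartitionedEdges class are all illustrative assumptions, not details from the talk.

```python
import zlib

NUM_PARTITIONS = 4  # assumed value, for illustration only


def partition_for(user_id: str) -> int:
    """Map a user to a partition with a stable hash (CRC32 here, as an assumption)."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_PARTITIONS


class PartitionedEdges:
    """Keeps every follower edge of a user in that user's single partition."""

    def __init__(self):
        self.partitions = [dict() for _ in range(NUM_PARTITIONS)]

    def add_follower(self, user, follower):
        part = self.partitions[partition_for(user)]
        part.setdefault(user, set()).add(follower)

    def followers(self, user):
        # Only one partition is consulted, whatever the size of the whole graph.
        return self.partitions[partition_for(user)].get(user, set())


store = PartitionedEdges()
store.add_follower("gary", "ann")
store.add_follower("gary", "bob")
```

Because the partition is derived from the user alone, a query such as store.followers("gary") never has to fan out across partitions.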
Idempotent: can be applied multiple times without changing the result, e.g. someone follows you twice without getting an error
Commutative
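Idempotency – mathematics: for an operation O on a set S, x O x = x for every x in S
e.g. set union A ∪ B, set intersection A ∩ B
Commutative: the order of doing the sums does not matter; can take an immediate dump and go through live writes and the dump together, happy to mix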
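One way to get writes that are both idempotent and commutative is to stamp each operation with a timestamp and let the newest one win per edge. That last-writer-wins rule is an assumption made for this sketch and is not taken from the talk; it also assumes timestamps are unique per edge. The function names are hypothetical.

```python
import itertools


def apply_op(edges, op):
    """Apply op = (timestamp, action, src, dst); the newest timestamp per edge wins."""
    ts, action, src, dst = op
    current = edges.get((src, dst))
    if current is None or ts > current[0]:
        edges[(src, dst)] = (ts, action)
    return edges


def replay(ops):
    """Apply ops in the given order and return the set of live follow edges."""
    edges = {}
    for op in ops:
        apply_op(edges, op)
    return {edge for edge, (_, action) in edges.items() if action == "follow"}


ops = [
    (1, "follow", "ann", "bob"),
    (2, "unfollow", "ann", "bob"),
    (3, "follow", "ann", "bob"),
    (1, "follow", "cat", "bob"),
]

# Every arrival order, and every duplicate delivery, gives the same final state.
outcomes = {frozenset(replay(p)) for p in itertools.permutations(ops)}
```

With this property a dump and the live stream of writes can be applied in whatever order they arrive, which is the "happy to mix" point in the notes.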
Performance: 13 billion edges, 20k writes per second, 100k reads per second
Lessons learned: aggressive timeouts; same path for error and normal ops; ignore, just try and try until fail; instrument
Punchline: MySQL sits below Flock
So is this the future?
Yes and no, from Gary's point of view