NoSQL start

@garyshort

The NoSQL philosophy

Why it began: really it is "noREL" (non-relational) rather than "no SQL".

Relational design stores each piece of data only once, but storage is now cheap.

In the relational model:

ACID
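Atomicity – if a transaction happens, it all happens; this is a guarantee.
Consistency – the database moves from one valid state to the next.
Isolation – while a transaction changes a row, no one else can get to it.
Durability – committed changes survive; the log can be played over and over again.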
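To make the atomicity guarantee concrete, here is a minimal sketch using Python's built-in `sqlite3` module (a relational store, chosen only because it ships with Python). The `accounts` table and the `transfer` function are hypothetical. The `with conn:` block commits if everything succeeds and rolls back if an exception escapes, so a failed transfer leaves neither account changed.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Hypothetical transfer: both UPDATEs are applied, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers the rollback
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False: the first UPDATE was rolled back
print(dict(conn.execute("SELECT name, balance FROM accounts")))  # {'alice': 70, 'bob': 30}
```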

NoSQL, a name popularised by Johan Oskarsson: open source, non-relational, distributed, with no ACID guarantee.

no:sql(east) conference, Atlanta, 2009

No fixed table schema, no relations, avoids join operations (which are costly), scales horizontally.

DOCUMENT STORAGE

Examples: RavenDB, Apache Jackrabbit, CouchDB, MongoDB, SimpleDB, and XML databases such as MarkLogic Server.
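A rough sketch of the document model, assuming nothing about any particular product's API: the `store`, `put` and `get` names below are hypothetical. Each document is a self-contained JSON object, so related data (an order's line items) is embedded rather than joined, and documents need not share a schema.

```python
import json

# Hypothetical in-memory "document store": id -> serialised JSON document.
store = {}

def put(doc_id, doc):
    store[doc_id] = json.dumps(doc)  # stored as JSON text, no fixed schema

def get(doc_id):
    return json.loads(store[doc_id])

# The order embeds its line items, so reading it back needs no join.
put("orders/1", {"customer": "Gary", "lines": [{"sku": "whisky", "qty": 2}]})
# A different shape (extra field) needs no schema change.
put("orders/2", {"customer": "Rob", "lines": [], "gift_wrap": True})

order = get("orders/1")
print(sum(line["qty"] for line in order["lines"]))  # 2
```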

GRAPH STORAGE

e.g. AllegroGraph

Data is stored as nodes and edges.

e.g. FlockDB, used by Twitter: nodes are users and edges are the relationships between them.
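A minimal sketch of the node/edge idea using adjacency lists, with users as nodes and "follows" as directed edges. This is not FlockDB's API; the `follow`/`unfollow` helpers are hypothetical. Keeping both directions makes "who follows X?" and "whom does X follow?" single lookups.

```python
from collections import defaultdict

# A directed edge u -> v means "u follows v".
following = defaultdict(set)  # forward adjacency list: whom u follows
followers = defaultdict(set)  # reverse adjacency list: who follows v

def follow(u, v):
    following[u].add(v)
    followers[v].add(u)

def unfollow(u, v):
    following[u].discard(v)
    followers[v].discard(u)

follow("alice", "garyshort")
follow("bob", "garyshort")
follow("alice", "bob")
unfollow("bob", "garyshort")

print(sorted(followers["garyshort"]))  # ['alice']
print(sorted(following["alice"]))      # ['bob', 'garyshort']
```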

KEY / VALUE STORES

Data can live on disk, in a cache, or in RAM. There is a concept of strong and weak. Keys can be ordered, e.g. in alphabetical order.
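Ordering keys is what makes range scans cheap. The sketch below, a hypothetical `OrderedKV` class rather than any real store's interface, keeps keys sorted with the standard `bisect` module so that all keys sharing a prefix can be read back in alphabetical order.

```python
import bisect

class OrderedKV:
    """Hypothetical key/value store that keeps its keys in sorted (alphabetical) order."""

    def __init__(self):
        self._keys = []  # sorted list of keys
        self._data = {}  # key -> value

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # insert while keeping the list sorted
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def range(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._data[key]

kv = OrderedKV()
for name in ["user:carol", "user:alice", "session:42", "user:bob"]:
    kv.put(name, len(name))

# ';' is the character right after ':', so this range covers every "user:" key.
print([k for k, _ in kv.range("user:", "user;")])  # ['user:alice', 'user:bob', 'user:carol']
```

A real ordered store would use a B-tree or similar on-disk structure rather than a Python list, whose inserts are O(n).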

OBJECT DATABASE

How do you index NoSQL databases?

Key/value pairs work: you look things up by key.

How you index a document database depends on the database.

RavenDB example (rob coder): the index is defined with a LINQ expression that retrieves data using a predicate.

Indexing is a thorny issue, but it can be done.
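Assuming "link expression" refers to the LINQ map expressions RavenDB uses to define indexes, the general idea can be sketched in Python: a map function says what to index for each document, and the resulting secondary index turns a predicate query into a lookup. The documents and the `build_index` helper are hypothetical and do not reflect RavenDB's API.

```python
from collections import defaultdict

# Hypothetical documents, keyed by id.
docs = {
    "posts/1": {"author": "gary", "tags": ["nosql", "graphs"]},
    "posts/2": {"author": "rob", "tags": ["nosql"]},
    "posts/3": {"author": "gary", "tags": ["sql"]},
}

def build_index(documents, map_fn):
    """Secondary index: each key emitted by map_fn -> set of matching document ids."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for key in map_fn(doc):
            index[key].add(doc_id)
    return index

# The "map" step decides what gets indexed: here, every tag of every post.
by_tag = build_index(docs, lambda d: d["tags"])

# The query is now a lookup instead of a scan over every document.
print(sorted(by_tag["nosql"]))  # ['posts/1', 'posts/2']
```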

Real-world scenarios

Goal: constant consistency
Data must be consistent at every point in time, e.g. financial or medical records, or bonded goods such as a warehouse of whisky. Best to use the relational model.

Goal: horizontal scalability
A number of geographic regions, vast quantities of data, game-server sharding: you could use NoSQL for this, or you can use a cloud SQL service, e.g. Amazon RDS.

MOBILE stuff

Data that is rarely read: use NoSQL, a key/value store.

BIG DATA

Weather data, satellite maps: use NoSQL, e.g. Hadoop.
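Hadoop processes data like this with MapReduce. Here is a toy, single-process sketch of the map, shuffle and reduce phases, computing the maximum temperature per weather station over hypothetical readings; real Hadoop jobs run these phases distributed across a cluster.

```python
from collections import defaultdict

# Hypothetical weather readings: (station, temperature in °C)
readings = [("EDI", 12.5), ("ATL", 31.0), ("EDI", 15.0), ("ATL", 28.5), ("LHR", 19.0)]

def map_phase(record):
    station, temp = record
    yield station, temp  # emit key/value pairs

def shuffle(pairs):
    groups = defaultdict(list)  # group every value by its key
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, max(values)  # one result per key

pairs = (pair for record in readings for pair in map_phase(record))
result = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(result)  # {'EDI': 15.0, 'ATL': 31.0, 'LHR': 19.0}
```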

BINARY BABY
e.g. YouTube, Flickr: store the binaries in something like S3; don't use NoSQL.

Transient data: short-term data, here today and gone tomorrow, e.g. a shopping cart. Use memcached.
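Transient data like a shopping cart only needs to live for a while. A minimal sketch of expiring entries, assuming a time-to-live per entry: the `TTLCache` class is hypothetical and is not memcached's client API, and the fake clock is there only so expiry is easy to demonstrate.

```python
import time

class TTLCache:
    """Hypothetical in-process cache where each entry expires after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._items = {}  # key -> (expiry time, value)

    def set(self, key, value):
        self._items[key] = (self.clock() + self.ttl, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]  # lazily drop expired entries on read
            return default
        return value

# Usage with a fake clock so expiry is easy to see
now = [0.0]
carts = TTLCache(ttl=1800, clock=lambda: now[0])  # 30-minute shopping carts
carts.set("cart:alice", ["whisky"])
print(carts.get("cart:alice"))  # ['whisky']
now[0] += 1800
print(carts.get("cart:alice"))  # None
```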

DATA REPLICATION: e.g. the music example, with copies on desktop, mobile etc. Use CouchDB.

HIGH AVAILABILITY: e.g. gambling, pay-per-view, where a high availability figure is important. Use e.g. Cassandra.

TWITTER

Challenges: many graphs to store (following, followers, reach, online status, sending to text), with updates and removals, plus set arithmetic, e.g. for @mentions.

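They tried relational, then a key index. Result: the blue whale (Twitter's "fail whale").
Under the hood this was complicated, so they needed a simple solution: horizontal partitioning, with writes that can arrive out of order and be processed more than once.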

Result: FlockDB

It stores graphs, but:

It is not optimised for graph-traversal operations, which can blow up to factorial (non-polynomial) time in n.
Instead, queries are limited to a user's followers, never the whole graph at once.

Optimised for large adjacency lists (the edges of the graph).

Optimised for fast reads and writes.

Optimised for paging and set arithmetic.
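Reading "page arithmetics" as paging plus set arithmetic (an assumption), here is a small sketch: intersect two follower sets, as one might for the @mentions case (users who follow both accounts, also an assumption), then page through the result with a cursor. The `page` helper and the follower ids are hypothetical.

```python
import bisect

def page(ids, cursor=None, size=2):
    """Return (chunk, next_cursor); the cursor is the last id already returned.

    Sorting on every call keeps the sketch short; a real store keeps lists sorted.
    """
    ordered = sorted(ids)
    start = 0 if cursor is None else bisect.bisect_right(ordered, cursor)
    chunk = ordered[start:start + size]
    next_cursor = chunk[-1] if start + size < len(ordered) else None
    return chunk, next_cursor

followers = {
    "garyshort": {101, 102, 103, 105, 108},
    "rob": {102, 103, 104, 108, 109},
}

# Set arithmetic: who follows both accounts
both = followers["garyshort"] & followers["rob"]
print(sorted(both))  # [102, 103, 108]

cursor = None
while True:
    chunk, cursor = page(both, cursor)
    print(chunk)  # [102, 103], then [108]
    if cursor is None:
        break
```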

Data is partitioned by node, i.e. per person, so every query can be answered by a single partition.
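Partitioning by node can be sketched as hashing each user id to a partition and storing all of that user's edges there, so a per-person query touches one partition. `NUM_PARTITIONS`, `partition_for` and the in-memory partitions are hypothetical; `hashlib.md5` is used instead of Python's built-in `hash()` because string hashing is randomised per process, and partition placement must be stable.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical shard count

def partition_for(user_id: str) -> int:
    """Map a user (node) to a partition using a stable hash of its id."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# Each partition maps a source user -> set of users they follow.
partitions = [dict() for _ in range(NUM_PARTITIONS)]

def add_edge(src, dst):
    # Edges live on the partition of their source node.
    partitions[partition_for(src)].setdefault(src, set()).add(dst)

def following(src):
    # "Whom does src follow?" reads exactly one partition.
    return partitions[partition_for(src)].get(src, set())

add_edge("alice", "garyshort")
add_edge("alice", "bob")
print(sorted(following("alice")))  # ['bob', 'garyshort']
```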

Idempotent: an operation can be applied multiple times without changing the result, e.g. someone can follow you twice without getting an error.
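Idempotency falls out naturally if edges are kept in a set. A minimal sketch with hypothetical helper names; note `set.discard` rather than `set.remove`, so a repeated unfollow does not raise `KeyError`.

```python
# Edges held as a set of (follower, followee) pairs.
edges = set()

def follow(follower, followee):
    edges.add((follower, followee))  # adding an existing edge is a no-op

def unfollow(follower, followee):
    edges.discard((follower, followee))  # discard, not remove: repeats don't raise

follow("alice", "garyshort")
follow("alice", "garyshort")  # duplicate, e.g. a retried request: same state, no error
print(len(edges))  # 1
unfollow("alice", "garyshort")
unfollow("alice", "garyshort")
print(len(edges))  # 0
```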

Commutative

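Idempotency in mathematics: an operation $\circ$ on a set $S$ is idempotent if $x \circ x = x$ for every $x \in S$.

Set union $A \cup B$ and set intersection $A \cap B$ are both idempotent: $A \cup A = A$ and $A \cap A = A$.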

Commutative: the order in which you do the sums does not matter. You can take an immediate dump of the data and replay it while live writes are still going through, and be happy to mix the two.
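One common way to make follow/unfollow operations commutative as well as idempotent is last-writer-wins: each operation carries a timestamp and only the newest one per edge counts. This is an assumed technique for illustration, not a claim about FlockDB's internals. The sketch replays the operations in every possible order, with one of them duplicated, and gets the same final state each time; it assumes timestamps are unique per edge.

```python
import itertools

def apply_op(state, op):
    """Apply (timestamp, follower, followee, action); the newest timestamp per edge wins."""
    ts, follower, followee, action = op
    key = (follower, followee)
    current = state.get(key)
    if current is None or ts > current[0]:
        state[key] = (ts, action)

def followed_edges(state):
    """Edges whose latest action is a follow."""
    return {key for key, (_, action) in state.items() if action == "follow"}

ops = [
    (1, "alice", "garyshort", "follow"),
    (2, "alice", "garyshort", "unfollow"),
    (3, "bob", "garyshort", "follow"),
]

# Replay every ordering of the ops plus a duplicate of the first one.
outcomes = set()
for order in itertools.permutations(ops + [ops[0]]):
    state = {}
    for op in order:
        apply_op(state, op)
    outcomes.add(frozenset(followed_edges(state)))

print(outcomes)  # {frozenset({('bob', 'garyshort')})}
```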

Performance: 13 billion edges, 20k writes per second, 100k reads per second.

Lessons learned: use aggressive timeouts; use the same code path for errors and normal operations; rather than special-casing errors, just try and try again until it fails; instrument everything.
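The lessons about timeouts and retries can be sketched as a single wrapper in which every failure, timeout or otherwise, takes the same path: record it, wait, try again, and give up only after a fixed number of attempts. `call_with_retries`, its parameters and `flaky_write` are hypothetical; retrying blindly like this is only safe because the operations are idempotent.

```python
import time

def call_with_retries(op, attempts=3, timeout=0.05, backoff=0.01):
    """Call op(timeout) until it succeeds; re-raise the last error after `attempts` tries.

    Timeouts and other errors go down the same path: record, wait, retry.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return op(timeout)  # short (aggressive) timeout passed to every call
        except Exception as exc:  # no special-casing by error type
            last_error = exc
            print(f"attempt {attempt} failed: {exc!r}")  # crude instrumentation
            time.sleep(backoff * attempt)  # back off a little more each time
    raise last_error

# Usage: a hypothetical write that times out twice, then succeeds
calls = {"n": 0}

def flaky_write(timeout):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError(f"no reply within {timeout}s")
    return "ok"

print(call_with_retries(flaky_write))  # ok, after two logged failures
```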

Punchline: MySQL sits underneath FlockDB.

So is this the future?

Yes and no, from Gary's point of view.
