this post was submitted on 05 Apr 2024

89 points (95.9% liked)

How we’ve saved 98% in cloud costs by writing our own database (hivekit.io)

submitted 10 months ago by testeronious@lemmy.world to c/programming@programming.dev

[–] fubarx@lemmy.ml 61 points 10 months ago

Anyone tempted to write a custom database should ask if they're prepared to be in the database maintenance business.

Because that's where they're heading, for a really long time.

[–] tsonfeir@lemm.ee 46 points 10 months ago* (last edited 10 months ago) (3 children)

we’ve replaced our $10k/month Aurora instances with a $200/month Elastic Block Storage (EBS) volume.

Lol, I’m sure just refactoring it and using a different service could have lowered that price significantly.

The description of their data says “relational database” to me, but don’t murder me for using such an unpopular phrase!

[–] paysrenttobirds@sh.itjust.works 16 points 10 months ago (1 children)

I had the same thought. Like, I think Aurora is one of the most expensive ways to do this in AWS. But, since this particular set of data is so well-defined, and unlikely to change, roll your own is maybe not crazy. The transactions per second and size don't seem that huge to me, so as things grow I imagine they can revisit this.

[–] otl@apubtest2.srcbeat.com 14 points 10 months ago

But, since this particular set of data is so well-defined, and unlikely to change, roll your own is maybe not crazy.

I think that's the trick here. A relational database lets you do a whole bunch of complex operations on all sorts of data. That flexibility doesn't come for free, neither financially nor performance-wise! Given:

engineering chops
a firm idea of the type of data
a firm idea of the possible operations you may want to do with that data

then there's a whole range of different approaches to take. The "just use PostgreSQL" guideline makes sense for most CRUD web systems out there. And there are architecture astronauts who will push stuff because they can, not because they should.

Every now and then it's nice to think about what exactly is needed and be able to build that. That's engineering after all!

[–] Turun@feddit.de 15 points 10 months ago (2 children)

They never would have been able to get the same performance from any solution that incorporates a general purpose database.

Their requirements/explicitly-not-required-ments include that it's fine to drop 1s of data. That would be an insane proposition for any other database. Also their read/write rates and latency requirements are unusual to say the least.

It's the same thing as TigerBeetle. Ridiculously narrow domains allow for ridiculous performance improvements compared to off-the-shelf solutions.
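
To make the "fine to drop 1s of data" trade-off concrete, here is a minimal sketch of a writer that batches records in memory and only touches the underlying file at most once per interval. This is not the article's implementation; `BufferedAppender`, its parameters, and the injectable `clock` are names made up for illustration.

```python
import io
import time


class BufferedAppender:
    """Hypothetical append-only writer that flushes at most once per interval.

    Anything still sitting in `pending` is lost on a crash, which is the
    durability trade-off being discussed (up to `interval` seconds of data).
    """

    def __init__(self, sink, interval=1.0, clock=time.monotonic):
        self.sink = sink            # any binary file-like object
        self.interval = interval    # seconds of data we accept losing
        self.clock = clock          # injectable so the behaviour can be tested
        self.pending = []
        self.last_flush = clock()

    def write(self, record: bytes) -> None:
        # Cheap in-memory append; no I/O on the hot path.
        self.pending.append(record)
        # The check only runs on writes, so a quiet period delays the flush
        # until the next record arrives (or until flush() is called).
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self) -> None:
        # One large write instead of many small ones.
        self.sink.write(b"".join(self.pending))
        self.sink.flush()
        self.pending.clear()
        self.last_flush = self.clock()


if __name__ == "__main__":
    appender = BufferedAppender(io.BytesIO(), interval=1.0)
    appender.write(b"location-update")
    appender.flush()
```

A general-purpose database usually cannot make this choice for you, because it has to assume every committed write matters.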

[–] tyler@programming.dev 6 points 10 months ago (1 children)

What is TigerBeetle?

[–] Turun@feddit.de 4 points 10 months ago

A new database specifically designed for financial transactions.

I'm not an expert on finance software, so I can't critically assess how good they really are. But they claim much, much higher throughput than traditional databases, higher fault tolerance, self-healing networks if several replicas are running, etc.
From a purely technical standpoint it's interesting for being written in Zig. Because the database scope is so narrow, they know exactly how much memory they will need on startup, so they allocate all required memory up front and never allocate more, nor free the acquired memory.
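
As a rough illustration of the "allocate everything on startup" idea (in Python rather than Zig, and not TigerBeetle's actual code), a fixed-capacity store can reserve one buffer up front and refuse new records once it is full. `FixedPool` and the three-field record layout are hypothetical.

```python
import struct


class FixedPool:
    """Hypothetical fixed-capacity record store: memory is reserved once."""

    # Assumed layout for illustration: two unsigned 64-bit ids and a signed
    # 64-bit amount, little-endian, 24 bytes per record.
    RECORD = struct.Struct("<QQq")

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.count = 0
        # The only allocation: made at startup, never grown, never freed.
        self.buf = bytearray(capacity * self.RECORD.size)

    def append(self, debit_id: int, credit_id: int, amount: int) -> int:
        if self.count == self.capacity:
            # Refuse the write instead of asking for more memory.
            raise MemoryError("pool is full")
        offset = self.count * self.RECORD.size
        self.RECORD.pack_into(self.buf, offset, debit_id, credit_id, amount)
        self.count += 1
        return self.count - 1

    def get(self, index: int) -> tuple:
        if not 0 <= index < self.count:
            raise IndexError(index)
        return self.RECORD.unpack_from(self.buf, index * self.RECORD.size)


if __name__ == "__main__":
    pool = FixedPool(capacity=1024)
    slot = pool.append(1, 2, 500)
    print(pool.get(slot))
```

The point of the design is that running out of memory becomes an explicit, predictable error at a known limit rather than something that can happen anywhere at runtime.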

[–] fruitycoder@sh.itjust.works 5 points 10 months ago

I'm really excited to see what forks of TigerBeetle for other domains look like. They supposedly built it so that the state machine can be modified for other data schemes, but that code mostly just made me realize I had no idea what I was looking at.

As soon as someone makes a KV store on it, I'm tempted to have it be my etcd database.

[–] greyhathero@lemmy.world 0 points 10 months ago (1 children)

Aurora is a relational database

[–] tsonfeir@lemm.ee 1 points 10 months ago

Cool

[–] ChubakPDP11@programming.dev 23 points 10 months ago* (last edited 10 months ago) (1 children)

For anyone seeking to write their own database, I have one recommendation: the Tokyo Cabinet Library.

Tokyo Cabinet abstracts away the need to write your own serializers and deserializers for binary formats. You can have hashtable databases, B+ trees and everything else all prepared for you under one roof.

Of course, that is if you have a brain and don't use a text storage format like JSON. If you use shit like JSON and YAML and add potentially hundreds of milliseconds of parsing time just to deserialize data from text into machine-readable binary, then please submit your name and address to my council so we can get rid of you when we own the world.

Cleansing of the undesirables aside (seriously, give me ONE good thing about text storage formats! They are EXCHANGE formats, not STORAGE formats!), Tokyo Cabinet is written in C so you can easily bind it with SWIG. But there's probably bindings around if you look.
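
For a sense of the difference between a fixed-width binary record and a text format, here is a small sketch using only Python's standard library. It does not use Tokyo Cabinet, and the record layout (id, timestamp, latitude, longitude) is an assumption chosen for illustration.

```python
import json
import struct

# Hypothetical record: uint32 object id, uint64 unix timestamp, two doubles.
# "<" means little-endian with no padding, so the size is 4 + 8 + 8 + 8 bytes.
RECORD = struct.Struct("<IQdd")


def encode_binary(obj_id: int, ts: int, lat: float, lon: float) -> bytes:
    """Pack one record into a fixed-width binary blob."""
    return RECORD.pack(obj_id, ts, lat, lon)


def decode_binary(data: bytes) -> tuple:
    """Unpack a blob produced by encode_binary; no text parsing involved."""
    return RECORD.unpack(data)


def encode_json(obj_id: int, ts: int, lat: float, lon: float) -> bytes:
    """The same record as UTF-8 JSON, for comparison."""
    doc = {"id": obj_id, "ts": ts, "lat": lat, "lon": lon}
    return json.dumps(doc).encode("utf-8")


if __name__ == "__main__":
    sample = (42, 1712300000, 52.520008, 13.404954)
    binary = encode_binary(*sample)
    text = encode_json(*sample)
    print(len(binary), len(text))
    print(decode_binary(binary) == sample)
```

Because every binary record has the same size, record `n` sits at byte offset `n * RECORD.size`, which a text format cannot offer without an extra index.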

[–] Tramort@programming.dev 3 points 10 months ago

Never heard of this. Thanks for mentioning it!

[–] cyclohexane@lemmy.ml 17 points 10 months ago* (last edited 10 months ago)

Comparing cost to AWS Aurora is unfair. Give us the self host price, and compare to that.

Also, they should have tried Scylla or Cassandra. Both are very scalable and handle a lot of writes.

[–] Yewb@lemmy.world 13 points 10 months ago (1 children)

Why does everyone on the planet hate sql nowadays?

If you're not dealing with billions of transactions, why not?

[–] InternetCitizen2@lemmy.world 5 points 10 months ago

Maybe it reminds them of Oracle.

[–] some_guy@lemmy.sdf.org 7 points 10 months ago (1 children)

What has changed though, is that we’ve replaced our $10k/month Aurora instances with a $200/month Elastic Block Storage (EBS) volume.

Holy shit, I hope whoever wrote this gets a fat bonus at the end of the year. That's a truly astounding savings.

[–] dohpaz42@lemmy.world 34 points 10 months ago (3 children)

What they don’t say is how much in developer time they’ve spent rolling their own database. Then there’s also maintenance and new features.

[–] shnizmuffin@lemmy.inbutts.lol 18 points 10 months ago

It's called job security, bro.

[–] FalseMyrmidon@kbin.run 17 points 10 months ago

I'm sure their custom database will be easy to find people to support and maintain

[–] FlaminGoku@reddthat.com 2 points 10 months ago

It seems like features would be mainly additional key/values like temperature, humidity, etc. This wouldn't really change the underlying infrastructure greatly but still give good enhancements to certain customers.

[–] keccsx@programming.dev 4 points 10 months ago (1 children)

Aren't time series databases like Prometheus pretty good at storing this kind of data?

[–] nik9000@programming.dev 2 points 10 months ago (1 children)

I don't believe Prometheus supports geospatial data. Two minutes of googling though, so I could be wrong.

[–] keccsx@programming.dev 1 points 10 months ago (1 children)

They're just storing doubles in their own format too. I'm not sure if they even need any spatial lookups on the data, they didn't mention anything about that in the article. Maybe they do that in-memory?

[–] nik9000@programming.dev 1 points 10 months ago

I imagine a delta encoding scheme similar to what the time series DBs use would work well for the geo points. Maybe even a delta-of-delta encoding for things like ships which move very consistently.

It's probably not worth it given how small they've already got their data. But it is fun.
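
A minimal sketch of the delta and delta-of-delta idea, assuming coordinates are first converted to fixed-point integers so the differences are exact. The function names and the 1e-7 degree scale are choices made here for illustration, not anything from the article or a specific time series DB.

```python
def to_fixed(values, scale=10_000_000):
    """Convert degrees to integers (steps of 1/scale degree)."""
    return [round(v * scale) for v in values]


def delta_encode(ints):
    """Keep the first value, then store differences between neighbours."""
    if not ints:
        return []
    return [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]


def delta_decode(deltas):
    """Inverse of delta_encode: a running sum restores the original values."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def delta_of_delta_encode(ints):
    """Apply delta encoding twice; constant speed turns into zeros."""
    return delta_encode(delta_encode(ints))


def delta_of_delta_decode(dods):
    """Inverse of delta_of_delta_encode."""
    return delta_decode(delta_decode(dods))


if __name__ == "__main__":
    # A ship moving north at a perfectly steady rate (already fixed-point).
    lats = [520000000, 520000100, 520000200, 520000300]
    print(delta_encode(lats))
    print(delta_of_delta_encode(lats))
```

For the steady track above, plain deltas become a run of `100`s and the delta-of-delta values after the first two entries are all `0`, which is what a variable-length integer or bit-packing step would then squeeze down.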

[–] FUBAR@lemm.ee 3 points 10 months ago

If you’re using Java you can use EclipseStore. Seems like a good project.