When I grow up I want to be a Database Administrator (said no-one ever)
Instead of just sharing the slides from the keynote talk that I gave at
DjangoCon Europe 2024
in Vigo this month, I'm trying a new approach, inspired by
Simon Willison's annotated presentations.
This post contains the slides together with the transcript, links etc. so it should be much more useful.
You can watch the video here.
This was my third time speaking at DjangoCon Europe, and my first ever keynote. I hope to be able to attend many more because,
as I said at the conference, spending time with the Django community is way up there on my list of favourite things.
I was so excited to be asked to speak to a room full of Django developers about my all-time favourite topic - databases in general, and PostgreSQL in particular.
The talk is aimed at developers who are somewhat forced into looking after databases, whether they like it or not.
It looks at some of the things that a developer actually needs to know about databases in their role as a developer
and tries to encourage them to love databases just a little bit.
I live in a small village in the French Alps, and when I'm not playing with or talking about databases
I love getting out on one of my bikes - road, gravel or mountain bike depending on how I’m feeling.
As well as being a Senior Solutions Architect, which I came to from a DBA (database administrator)
background, I’m on the PostgreSQL Europe Board of directors, I’m leading the (very) recently
established PostgreSQL Europe diversity task force,
and I write and present talks (about databases) at PostgreSQL and Developer conferences.
When asked “What do you want to be when you grow up?”
I doubt any kid ever said “A database administrator”.
In fact, I asked the audience "Who, in this room, said they wanted to be a DBA when they grew up?"
and unsurprisingly not a single hand went up.
I’ve spent my whole career working with databases, as illustrated by this diagram of my roles
over the last 25 years. I’ve had such "wide-ranging" job titles as Junior Database Administrator”,
“Senior Database Administrator”, “Database Expert” and “Senior Database Consultant”.
My current job title is the only one I’ve ever had that doesn’t contain the
word “database”, although I still work exclusively with databases, and even I
didn’t want to be a DBA when I grew up!
Although I think at one point I wanted to be a dinosaur, so maybe I didn’t know too much about career options.
Anyway, just in case I was wrong, and kids do in fact say they want to be database administrators when they grow up,
I thought should do a bit of actual research.
Fortunately, lots of people have already done the hard work for me;
they’ve been out and asked a bunch of children what they want to be when they grow up,
and they’ve asked adults what they wanted to be back when they were growing up [1] [2] [3].
And you’ll actually be surprised to hear that database administrator
...
didn’t feature in any of the lists.
Vet or Doctor was a popular choice,
as was professional footballer,
astronaut, obviously,
and even engineer featured in most of the lists,
but not database administrator.
I even had to inflict my drawing "skills" on you to be able to include a picture of a DBA
because it’s not so easy to find useful images.
So I have to concede that maybe (just maybe) not everyone is as passionate about databases as I am.
Most of the developers I speak to just want the database to quietly do its thing in the background
so they can concentrate on developing their application, which doesn’t seem like an unreasonable request.
But the world of databases is changing. The traditional DBA role is becoming less common,
and developers are often expected to manage their own databases.
Even if you don’t have to actually look after your own database,
it often forms the backbone of your application, so it’s helpful to know how to interact with it.
So I wanted to talk about how you, as a developer, can navigate this uncomfortable reality,
and what you actually need to know about databases.
Obviously, because I love databases so much, I think everyone should want to know everything about databases.
If you don’t want to talk constantly about Boyce-Codd normal form, then why not?
If your party trick isn’t to recite the ACID properties of transactions,
and the different transaction isolation modes available in each of the database management systems,
and discuss how your chosen RDBMS either conforms to or differs from the SQL standard, then are you even really living?
Don't hesitate to get in touch if you need other ideas for sparkling conversation starters for your next dinner party.
I was in the audience for
Daniele Procida’s “The Attentive Programmer” at PyCon Italia
in May, and he also gave the talk at DjangoCon Europe.
If you didn't manage to see it, I recommend catching the recording.
Daniele talked about the idea of writing a love program.
I love the ode to his favourite camera that he wrote in Python,
and it got me think about how I would express love in database terms.
My chosen love language would be the entity relationship diagram.
I think it’s a beautiful way to represent almost any real-world objects in terms
of their attributes and the relationships between them.
I love the fact that there are rules, and syntax, and conventions.
I like the simplicity of the crows-foot notation,
and the fact that so much information can be conveyed by just a few little lines.
Just from the tiny snippet above, we know that one department can have one or more employees,
and that an employee belongs to one and only one department.
I like the way the colouring and spacing and organisation of the elements in the diagram
can help even a non-technical reader to see how the different concepts and pieces of data fit together.
I’ve been known to lose myself and spend far too long rearranging an ER diagram
to look balanced, pretty, untangled, and ordered, and to make sure it fits on to one sheet of paper
(albeit sometimes a very large sheet of paper).
What better thing to use my love language for than to describe
one of my favourite things – a Postgres cluster!
The process always seems quite simple to start with:
- I modelled my Postgres cluster, which comprises one or more databases.
- I can define one or more roles or users for my cluster.
- I can create one or more tablespaces.
- Each of my databases can contain one or more schemas,
- and each of my tables will belong to one of those schemas.
- The tables can also belong to one of the tablespaces.
- An index can be created on one or more of my table columns,
- but equally, a column could belong to one or more indices.
I could model the permissions granted to the roles,
and I could model the fact that certain databases have access to certain tablespaces.
I need to think about how changes over time will be modelled; do I want to record the history?
I could add in other object types like views, functions, stored procedures, extensions...
I’ve not even started to add in the column names or the primary and foreign key columns.
And I had to force myself to stop there, because the goal of the exercise
wasn’t to come up with a realistic model of my database cluster,
but to show how I could lovingly "draw" my databases in this way!
To go back to the actual subject of the talk, What even is a DBA?
Well, according to various different definitions from careers services,
database management system vendors and job sites,
there seems to be a general consensus that a DBA is
someone who uses specialist software to manage and secure computer systems that store data.
Which tells us pretty much nothing at all about what a DBA actually does all day
So, what does a DBA actually do?
I compiled a list of DBA responsibilities based on various definitions and sample job adverts.
It turns out to be a pretty long list.
Apparently, a DBA is expected to do some or all of the following things:
- Design, implement and maintain backup and recovery procedures
- Design and implement security procedures, manage database access
- Monitor the database’s availability, performance, security, space etc
- Logical and physical data modelling
- 24/7 support and troubleshooting
- Plan, perform and test database software install and upgrades
- Provide database expertise, advice and support to other teams
- Fix performance problems, generally improve database performance
- Plan for future database size/resource needs
- Create databases
- Design and implement database maintenance procedures
- Make sure data protection/GDPR rules are obeyed
No problem!
And then there’s the list of skills that a DBA is expected to have.
I created this chart last year, based on data for the UK from IT Jobs Watch.
It ranks the required skills and capabilities for DBA job roles
based on how often they appeared in DBA job adverts.
It boils down to:
- SQL skills, procedural languages, data integration & ORMs
- Knowledge of one or more database management systems
(SQL Server, Oracle, Postgres) - Knowledge of various operating systems
including Linux, Windows and various cloud environments - Performance Tuning
- Database Migration
- Disaster Recovery
- Social Skills
I’m not sure what they’re trying to say about DBAs there, and when I did a quick check this week to make sure the list was still up to date, social skills ranked even higher in the list: in 5th place instead of down in 16th! - High Availability
- Replication
- Clustering
- DevOps and automation methodologies and tools
That’s a lot to know and do!
Oh, and to make things even less clear, there are various different “flavours” of DBA.
A traditional split is between the Production DBA and Development DBA,
where the development DBA works closely with developers,
and focuses on building and maintaining a database environment that supports the application development.
Then usually throws things over the fence to the production DBA,
who makes sure the databases in your production environment are kept up and running,
focusing on things like availability, performance and security.
Maybe you are the development DBA in your organisation.
In some organisations, the split is between application DBA
(concerned with the logical, application-related aspects of the database) and
system DBA (responsible for the underlying software and physical infrastructure).
In this situation, you’re probably the application DBA.
As well as production DBAs and development DBAs, application DBAs and System DBAs, you’ll hear of
- Datawarehouse DBAs
- Cloud DBAs
- Database architects or even
- Replication DBAs or
- Backup and recovery DBAs
who just look after one aspect of database admin.
These different types of DBA role will differ from one organisation to another, or even from one team to the next.
The division of responsibilities isn’t necessarily clear-cut, and the roles often overlap.
Fortunately, you really don’t need to be an expert in all of that.
I really enjoyed Katie McLaughlin’s keynote “What should you have to worry about” from DjangoCon Europe 2022 in Porto.
Katie pointed out that although you absolutely can run Django by creating a server, installing Django and Postgres and nginx and making it available to the world,
there are then all sorts of things to worry about, including:
- Server and network availability.
- OS, Django, nginx and Postgres updates.
- Database migrations.
- Database administration.
- Backups.
and if you’re the one who's worrying about these things, Katie wondered
"Are you a Django Developer, or are you a combination Django developer and database administrator and
systems administrator and network administrator and full stack engineer and grossly, grossly underpaid?"
You absolutely don’t need to know everything about databases.
I certainly don’t know everything about app development,
and even that statement makes it sounds as though I know much more than I do.
I know just enough to be able to help developers out with their database-related questions.
Likewise, you need to know just enough about databases to help you out with your app development.
This is the agenda from a talk I gave called “How to keep your database Happy” at PyCon UK last September.
My top 5 tips for things you can put in place, without too much effort,
to make sure you have a robust, performant database environment
- Check that a few key configuration parameters are set correctly for your environment and workload
- Make sure you take regular backups of your database (and test your recovery process)
- Put a high availability architecture in place
- Make sure the right users/applications can connect to and interact with your database
- Make sure you know what's going on and can react quickly if something goes wrong
I gave about a 3-minute overview of how to do each of those things.
Funnily enough, that wasn't enough time to go into the details of
everything you need to know to look after your database,
i.e. everything I've spent the last 25 years learning.
And that was just the things that relate to the production / system side of things.
We didn't even start to think about the application side of things:
database design, queries design and performance tuning, connection management,
stored procedures, transaction management and more.
But you don’t need to know these things in detail.
The main thing is to know that these ideas and concepts exist,
and to know where to look and who to ask for more information if and when you need it.
I always include links to the relevant documentation or other useful resources
when I do that type of presentation, because I know I can’t cover everything in a 30-minute talk,
and that even if I did, people wouldn’t remember most of what I said.
If, when you have to implement something, or when you have a problem, you think
"Oh, I vaguely remember Karen saying something about that.”
and you can go and read up on it, then I feel as though I’ve done something useful.
If you don’t have the time, skills, inclination or infrastructure available to do all of the things you need to do to look after a database,
there are various options, including managed services, that will do it all for you.
The big cloud providers provide managed database services,
the company I work for (Crunchy Data) has a managed PostgreSQL service called Crunchy Bridge,
and there are plenty of others out there.
And even if do you have the skills and the inclination, it might just be a better use of your time and expertise
as an application developer to outsource some of these tasks.
So, going back to the list of DBA responsibilities,
which of those tasks do you actually have to know about or be able to do yourself?
- It's definitely useful to have some knowledge of how database backup and recovery works. You might need to do it yourself for your development environments, you may want to make sure that sensible backup processes are in place for your chosen managed database service, and you definitely want to test out recovering a database in case something goes very wrong.
- You may not be the one who designs and enforces database security policies, but it's helpful to have in the back of your mind some of the best practices - for example principle of least privilege & separation of duties - when deciding who should own and manipulate database objects, which users should connect to the database and how etc.
- You probably won't be responsible for putting database monitoring in place, but at least having an idea of what some of the metrics shown on your database dashboard mean can be a valuable debugging tool.
- Data modelling is a must from my point of view. Having that vision of what the data in your system looks like and how it fits together can be really helpful when looking at the best ways to write code that accesses and updates that data.
- Hopefully, you won't have to do 24/7 support, but being able to do some basic database troubleshooting could save you a lot of time - debugging connection errors, figuring out why the database went down, understanding why (and resolving) performance of a certain query suddenly dropped off a cliff.
- Hopefully, the software install and upgrades can be left to someone else, especially if you're using a managed service.
- Although being a "database expert" is unlikely to be part of your remit, it will probably give you a boost if you manage to understand some databasey-stuff and explain it to the rest of your team
- It's definitely important to be able to work out why a particular query or part of your application is performing badly and be able to make improvements to performance. Even better is learning how to write code that performs well against the database in the first place.
- Capacity planning may not be in your remit, but it's good to be aware of the resources being used for the database, and how they're expected to grow so you can design your application to be able to cope.
- >A lot of organisations are putting automated processes in place to create database clusters, and individual databases. If you're using a managed service, there's likely an API for that. Even if that's the case, it's helpful to have a basic understanding of some of the most common parameters to make sure they're set to appropriate values for your use case when the database is created.
- Hopefully, "someone else" will deal with general database maintenance tasks, especially if you're using a managed service. Again, it's helpful to be familiar with the basics of things like autovacuum, rebuilding indexes, refreshing materialized views, keeping statistics up to date etc. so you're aware of the impact that the processes could have on your application (whether in terms of negative impact of not keeping on top of it, or the potential impact of running certain processes).
- It can be a real pain, but it's everyone's responsibility to make sure data protection/GDPR rules are obeyed. This one's not a database administration specific task, but it's tightly related to the database.
Going back to the list of DBA skills and capabilities; which of them to you actually need?
- I know, I know, everyone uses ORMs. But if you want to understand what the ORM’s doing, and potentially why a query seems to be misbehaving, there’s no better way than to understand and to be able to write your own SQL queries.
- You don’t need to know about lots of different database technologies - just the one that you’re actually using. Obviously, I’m going to recommend PostgreSQL for pretty much any situation, but you might have your reasons for using something else.
- I already touched on performance tuning and disaster recovery.
- Having spent time with the Django community, I’d say you’re generally fine on the social skills front!
- And you probably already know at least as much about DevOps and Automation than the average DBA.
You can get an idea of some of the things that I think it’s useful for you to know by looking at the titles of some of the talks that I've written for developer events:
- How to keep your database happy (as mentioned already)
- How to tune PostgreSQL to work even better talks about some of the most useful PostgreSQL configuration parameters and how to set them to the right values for your use-case.
- Tuning your database for Analytics looks at the implications of running a mixed workload on your database, and how you can minimise the performance impact of that.
- Everything you wanted to know about databases but were too afraid to ask your DBA answers all sorts of questions from "what's a database instance?" and "what's the difference between a view and a materialized view?" to "How do I read a query execution plan?".
- Database troubleshooting for developers explains how to recognise and fix the most common database issues you’ll come across.
- How Postgres is misused and abused in the wild was written for the people who develop PostgreSQL itself. The aim was to encourage reflection about what we as a project can do to make the process of learning about and using Postgres as straightforward as possible.
- PostgreSQL documentation
- Talk recordings from PostgreSQL conferences (many can be found via a search on YouTube)
- PostgreSQL Conferences
- PostgreSQL Books
- Tutorials, for example the Crunchy Data Postgres Playground
Hopefully, once you've learnt about some of those things, you'll be able to proudly talk about "how I learned to stop worrying and love the database".
I've included a link to the non-annotated, pdf version of the slides, and some contact details so you can
get in touch with me, or follow me for more database chat.
Mastodon: @karenhjex@mastodon.online
Twitter/X: @karenhjex
References:
[1]
fatherly.com: What kids want to be when they grow up
[2]
statista.com: Childhood aspirations in China/US/UK
[3]
Moneypeny.com: What did you want to be?
[4]
reed.co.uk: What your kids really want to be when they grow up
[5]
itjobswatch.co.uk: DBA roles