Updating millions of rows
Updating a large table in Postgres is not as straightforward as it seems. If you have a table with hundreds of millions of rows, you will find that simple operations, such as adding a column or changing a column type, are hard to do in a timely manner. Doing these kinds of operations without downtime is an even harder challenge. In this blog post I will try to outline a few strategies to minimize the impact on table availability while managing large data sets.

When you update a value in a column, Postgres writes a whole new row to disk, deprecates the old row, and then proceeds to update all indexes. This process is equivalent to an INSERT plus a DELETE for every row, which takes a considerable amount of resources.

With this in mind, let's look at a few strategies that you can use to effectively update a large number of rows in your table.

The fastest way to update a large table is to create a new one. If you can safely drop the existing table and if there is enough disk space, then the easiest way to perform the update is to insert the data into a new table and rename it afterwards. Here is a script with the base structure for this operation:

```sql
CREATE TABLE new_tbl (
  field1 int,
  field2 int,
  ...
);

INSERT INTO new_tbl
  SELECT ...; -- use your new logic here to insert the updated data

CREATE INDEX ...; -- add your constraints and indexes to new_tbl

DROP TABLE tbl;
ALTER TABLE new_tbl RENAME TO tbl;
```

If you can't drop the original table, because you don't want to recreate views or because of other constraints, then you can use a temporary table to hold the new values, truncate the old table, and rewrite the data there.

If you can segment your data using, for example, sequential IDs, you can update rows incrementally in batches. This maximizes your table availability, since you only need to keep locks for a short amount of time. In the case of adding a new column, you can set it temporarily as nullable and start gradually filling it with new values. This approach may, however, require more complex application logic during the migration.
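For the truncate-and-refill variant, a minimal sketch could look like the following; `tmp_tbl` and the doubling logic are illustrative placeholders, not from the original post:

```sql
BEGIN;

-- Stage the updated rows in a temporary table.
CREATE TEMP TABLE tmp_tbl AS
  SELECT id, field1 * 2 AS field1   -- your new logic here
  FROM tbl;

-- TRUNCATE takes an ACCESS EXCLUSIVE lock, so other sessions
-- block from this point until COMMIT.
TRUNCATE tbl;

INSERT INTO tbl (id, field1)
  SELECT id, field1 FROM tmp_tbl;

COMMIT;
```

Because the table is never dropped, views that depend on it stay intact.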
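The batch strategy can be sketched as a single-slice update that a driver script repeats with advancing bounds; the table, columns, and batch size below are illustrative placeholders:

```sql
-- One batch: update a slice of rows chosen by sequential id, then
-- COMMIT so the row locks are released before the next slice starts.
-- A driver script repeats this, advancing the bounds by the batch
-- size (here 10000) until max(id) is passed.
BEGIN;
UPDATE tbl
SET    field2 = field1 * 2        -- your new logic here
WHERE  id >= 0 AND id < 10000;    -- bounds for the current batch
COMMIT;
```

Keeping each batch in its own transaction is what keeps the lock windows short; running the whole loop inside one transaction would hold every row lock until the very end.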
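For the add-a-column case, the gradual migration might proceed in three steps; the column name `new_field`, the backfill logic, and the final NOT NULL step are assumptions about a typical rollout, not prescribed by the post:

```sql
-- 1. Add the column as nullable: a fast, metadata-only change
--    (no table rewrite, since there is no default to apply).
ALTER TABLE tbl ADD COLUMN new_field int;

-- 2. Backfill gradually, in small batches as described above, while
--    the application already writes new_field for newly created rows.
UPDATE tbl
SET    new_field = field1 * 2     -- your new logic here
WHERE  new_field IS NULL
AND    id >= 0 AND id < 10000;    -- bounds for the current batch

-- 3. Once every row has a value, enforce the constraint.
ALTER TABLE tbl ALTER COLUMN new_field SET NOT NULL;
```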