Merge multiple backups of the same table schema into 1 master table

I have about 200 copies of a SQLite database. All taken at different times with different data in them. Some rows are deleted and some are added. They are all in a single directory.

I want to merge all the rows in the table my_table, using all the .db files in the directory. I want duplicate rows to be deleted, showing all entires from all the databases, just once.

I’d like to do this in pure SQL, but I don’t think it’s possible, so we can use Python too.

Table definition:

CREATE TABLE my_table (
    ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
    guid TEXT UNIQUE NOT NULL,
    text TEXT,
    replace INTEGER DEFAULT 0,
    service_center TEXT,
    handle_id INTEGER DEFAULT 0,
    subject TEXT,
    country TEXT,
    attributedBody BLOB,
    version INTEGER DEFAULT 0,
    type INTEGER DEFAULT 0,
    service TEXT,
    account TEXT,
    account_guid TEXT,
    error INTEGER DEFAULT 0,
    date INTEGER,
    date_read INTEGER,
    date_delivered INTEGER,
    is_delivered INTEGER DEFAULT 0,
    is_finished INTEGER DEFAULT 0,
    is_emote INTEGER DEFAULT 0,
    is_from_me INTEGER DEFAULT 0,
    is_empty INTEGER DEFAULT 0,
    is_delayed INTEGER DEFAULT 0,
    is_auto_reply INTEGER DEFAULT 0,
    is_prepared INTEGER DEFAULT 0,
    is_read INTEGER DEFAULT 0,
    is_system_message INTEGER DEFAULT 0,
    is_sent INTEGER DEFAULT 0,
    has_dd_results INTEGER DEFAULT 0,
    is_service_message INTEGER DEFAULT 0,
    is_forward INTEGER DEFAULT 0,
    was_downgraded INTEGER DEFAULT 0,
    is_archive INTEGER DEFAULT 0,
    cache_has_attachments INTEGER DEFAULT 0,
    cache_roomnames TEXT,
    was_data_detected INTEGER DEFAULT 0,
    was_deduplicated INTEGER DEFAULT 0,
    is_audio_message INTEGER DEFAULT 0,
    is_played INTEGER DEFAULT 0,
    date_played INTEGER,
    item_type INTEGER DEFAULT 0,
    other_handle INTEGER DEFAULT -1,
    group_title TEXT,
    group_action_type INTEGER DEFAULT 0,
    share_status INTEGER,
    share_direction INTEGER,
    is_expirable INTEGER DEFAULT 0,
    expire_state INTEGER DEFAULT 0,
    message_action_type INTEGER DEFAULT 0,
    message_source INTEGER DEFAULT 0
)

Best answer

To be able to access both the master database and a snapshot at the same time, use ATTACH.
To delete and old version of a row, use INSERT OR REPLACE:

ATTACH 'snapshot123.db' AS snapshot;
INSERT OR REPLACE INTO main.my_table SELECT * FROM snapshot.my_table;
DETACH snapshot;

Do this with all databases, in order from oldest to newest.
(SQLite has no loop control mechanism for this; do this loop in Python.)

Alternatively, you can go backwards, from newest and oldest, and insert only rows that do not yet exist:

ATTACH 'snapshot123.db' AS snapshot;
INSERT OR IGNORE INTO main.my_table SELECT * FROM snapshot.my_table;
DETACH snapshot;