How to Bulk Insert Nested Attributes in Rails
Tuesday, Aug 2nd 2022
When using accepts_nested_attributes_for
in Rails, especially for many-to-many relations,
you may find that things slow down during large inserts.
For example, take the common Author
, Post
and Tag
models:
class Author < ActiveRecord::Base has_many :postsend class Post < ActiveRecord::Base belongs_to :author has_many :post_tags has_many :tags, through: :post_tags accepts_nested_attributes_for :post_tagsend class Tag < ActiveRecord::Base has_many :post_tags has_many :posts, through: :post_tags has_many :authors, through: :postsend
Then for the many-to-many Post
to Tag
relation, we have a PostTag
:
class PostTag < ActiveRecord::Base belongs_to :post belongs_to :tagend
Now if an Author
were to create a Post
that has 10+ tags:
author = Author.create!(name: 'Z')author.posts.create( post_tags_attributes: [ { tag: Tag.first }, { tag: Tag.second }, { tag: Tag.third }, { tag: Tag.fourth }, { tag: Tag.fifth }, # ... ])
This is what the resulting SQL would look like:
BEGIN; INSERT INTO "posts" ("created_at", "updated_at")VALUES ( '2022-08-02 17:08:40.202914', '2022-08-02 17:08:40.202914' ); INSERT INTO
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")VALUES ( 1, 1, '2022-08-02 17:08:40.203770', '2022-08-02 17:08:40.203770' ); INSERT INTO
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")VALUES ( 1, 2, '2022-08-02 17:08:40.204044', '2022-08-02 17:08:40.204044' ); INSERT INTO
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")VALUES ( 1, 3, '2022-08-02 17:08:40.204301', '2022-08-02 17:08:40.204301' ); INSERT INTO
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")VALUES ( 1, 4, '2022-08-02 17:08:40.204535', '2022-08-02 17:08:40.204535' ); INSERT INTO
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")VALUES ( 1, 5, '2022-08-02 17:08:40.204757', '2022-08-02 17:08:40.204757' ); INSERT INTO
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")VALUES ( 1, 6, '2022-08-02 17:08:40.204976', '2022-08-02 17:08:40.204976' ); INSERT INTO
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")VALUES ( 1, 7, '2022-08-02 17:08:40.205224', '2022-08-02 17:08:40.205224' ); INSERT INTO
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")VALUES ( 1, 8, '2022-08-02 17:08:40.205442', '2022-08-02 17:08:40.205442' ); INSERT INTO
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")VALUES ( 1, 9, '2022-08-02 17:08:40.205542', '2022-08-02 17:08:40.205542' ); INSERT INTO "post_tags" ("post_id", "tag_id", "created_at", "updated_at")VALUES ( 1, 10, '2022-08-02 17:08:40.205662', '2022-08-02 17:08:40.205662' ); COMMIT;
That's a lot sequential inserts, and a lot of wasted time and wasted compute.
And imagine timings if you were inserting tens or hundreds of rows.
To speed things up, we could insert the post tags in bulk.
PostTag.insert_all([ { post_id: 1, tag_id: 1 }, { post_id: 1, tag_id: 2 }, { post_id: 1, tag_id: 3 }, # ...])
That's pretty easy to do for normal inserts… but how do we use insert_all
with nested attributes? Digging into the ActiveRecord internals, I found a little method called autosave_associated_records_for_
.
By defining an override, you can hook into the autosaving of nested attributes:
class Post belongs_to :author
has_many :post_tags has_many :tags, through: :post_tags accepts_nested_attributes_for :post_tags + def autosave_associated_records_for_post_tags + return if post_tags.empty?+ + PostTag.insert_all(+ post_tags.map {{ post_id: id, tag_id: _1.tag_id }},+ record_timestamps: true,+ )+ end end
Now when we create another Post
, we can peek the SQL:
BEGIN; INSERT INTO "posts" ("created_at", "updated_at")VALUES ( '2022-08-02 17:15:06.098818', '2022-08-02 17:15:06.098818' ); INSERT INTO "post_tags" ("post_id", "tag_id", "created_at", "updated_at")VALUES ( 1, 1, STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'), STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW') ), (
1, 2, STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'), STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW') ), ( 1, 3, STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'), STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW') ), ( 1, 4, STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'), STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW') ), ( 1, 5, STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'), STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW') ), ( 1, 6, STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'), STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW') ), ( 1, 7, STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'), STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW') ), ( 1, 8, STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'), STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW') ), ( 1, 9, STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'), STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW') ), ( 1, 10, STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'), STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW') ) ON CONFLICT DO NOTHING; COMMIT;
Much better! This does come with caveats, however:
- If needed, you will have to handle record validation yourself, since
bulk_insert
does not run model validations. Ditto for model callbacks. - If needed, you will have to handle records that are
marked_for_destruction?
, and delete them accordingly. - This is using an, AFAICT, undocumented method in Rails.
Until next time.