Open, source-available — the new KeygenStar us on GitHub arrow_right_alt

How to Bulk Insert Nested Attributes in Rails

Tuesday, Aug 2nd 2022

When using accepts_nested_attributes_for in Rails, especially for many-to-many relations, you may find that things slow down during large inserts.

For example, take the common Author, Post and Tag models:

class Author < ActiveRecord::Base
has_many :posts
end
 
class Post < ActiveRecord::Base
belongs_to :author
has_many :post_tags
has_many :tags, through: :post_tags
 
accepts_nested_attributes_for :post_tags
end
 
class Tag < ActiveRecord::Base
has_many :post_tags
has_many :posts, through: :post_tags
has_many :authors, through: :posts
end

Then for the many-to-many Post to Tag relation, we have a PostTag:

class PostTag < ActiveRecord::Base
belongs_to :post
belongs_to :tag
end

Now if an Author were to create a Post that has 10+ tags:

author = Author.create!(name: 'Z')
author.posts.create(
post_tags_attributes: [
{ tag: Tag.first },
{ tag: Tag.second },
{ tag: Tag.third },
{ tag: Tag.fourth },
{ tag: Tag.fifth },
# ...
]
)

This is what the resulting SQL would look like:

BEGIN;
 
INSERT INTO
"posts" ("created_at", "updated_at")
VALUES
(
'2022-08-02 17:08:40.202914',
'2022-08-02 17:08:40.202914'
);
 
INSERT INTO ...
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")
VALUES
(
1,
1,
'2022-08-02 17:08:40.203770',
'2022-08-02 17:08:40.203770'
);
 
INSERT INTO ...
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")
VALUES
(
1,
2,
'2022-08-02 17:08:40.204044',
'2022-08-02 17:08:40.204044'
);
 
INSERT INTO ...
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")
VALUES
(
1,
3,
'2022-08-02 17:08:40.204301',
'2022-08-02 17:08:40.204301'
);
 
INSERT INTO ...
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")
VALUES
(
1,
4,
'2022-08-02 17:08:40.204535',
'2022-08-02 17:08:40.204535'
);
 
INSERT INTO ...
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")
VALUES
(
1,
5,
'2022-08-02 17:08:40.204757',
'2022-08-02 17:08:40.204757'
);
 
INSERT INTO ...
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")
VALUES
(
1,
6,
'2022-08-02 17:08:40.204976',
'2022-08-02 17:08:40.204976'
);
 
INSERT INTO ...
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")
VALUES
(
1,
7,
'2022-08-02 17:08:40.205224',
'2022-08-02 17:08:40.205224'
);
 
INSERT INTO ...
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")
VALUES
(
1,
8,
'2022-08-02 17:08:40.205442',
'2022-08-02 17:08:40.205442'
);
 
INSERT INTO ...
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")
VALUES
(
1,
9,
'2022-08-02 17:08:40.205542',
'2022-08-02 17:08:40.205542'
);
 
INSERT INTO
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")
VALUES
(
1,
10,
'2022-08-02 17:08:40.205662',
'2022-08-02 17:08:40.205662'
);
 
COMMIT;

That's a lot sequential inserts, and a lot of wasted time and wasted compute.

And imagine timings if you were inserting tens or hundreds of rows.

To speed things up, we could insert the post tags in bulk.

PostTag.insert_all([
{ post_id: 1, tag_id: 1 },
{ post_id: 1, tag_id: 2 },
{ post_id: 1, tag_id: 3 },
# ...
])

That's pretty easy to do for normal inserts… but how do we use insert_all with nested attributes? Digging into the ActiveRecord internals, I found a little method called autosave_associated_records_for_.

By defining an override, you can hook into the autosaving of nested attributes:

class Post
belongs_to :author ...
has_many :post_tags
has_many :tags, through: :post_tags
 
accepts_nested_attributes_for :post_tags
 
+ def autosave_associated_records_for_post_tags
+ return if post_tags.empty?
+ 
+ PostTag.insert_all(
+ post_tags.map {{ post_id: id, tag_id: _1.tag_id }},
+ record_timestamps: true,
+ )
+ end
end

Now when we create another Post, we can peek the SQL:

BEGIN;
 
INSERT INTO
"posts" ("created_at", "updated_at")
VALUES
(
'2022-08-02 17:15:06.098818',
'2022-08-02 17:15:06.098818'
);
 
INSERT INTO
"post_tags" ("post_id", "tag_id", "created_at", "updated_at")
VALUES
(
1,
1,
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'),
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW')
),
( ...
1,
2,
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'),
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW')
),
(
1,
3,
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'),
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW')
),
(
1,
4,
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'),
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW')
),
(
1,
5,
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'),
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW')
),
(
1,
6,
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'),
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW')
),
(
1,
7,
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'),
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW')
),
(
1,
8,
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'),
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW')
),
(
1,
9,
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'),
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW')
),
(
1,
10,
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW'),
STRFTIME('%Y-%m-%d %H:%M:%f', 'NOW')
) ON CONFLICT DO NOTHING;
 
COMMIT;

Much better! This does come with caveats, however:

  1. If needed, you will have to handle record validation yourself, since bulk_insert does not run model validations. Ditto for model callbacks.
  2. If needed, you will have to handle records that are marked_for_destruction?, and delete them accordingly.
  3. This is using an, AFAICT, undocumented method in Rails.

Until next time.