Change data in migrations like a bossHot vacancies! Apply to a job of your dream!
Changing data on change database schema in production is a common problem for Rails developers. Assume that you have a Rails project. Some day you decided to change the database schema and want to add some new column. Then you have to go through all your models and change actual data according this new schema. Currently there are solutions to overcome this. But all of them have their disadvantages. You will see in this chapter them. This post tells about these disadvantages and how to get rid of the issue with the
Solutions with disadvantages
There are many solutions to avoid the issue. This is a list of them:
- Writing code in migrations without caution
- Duplicate classes in migrations
- Writing raw SQL in migrations
- Using seeds
- Other methods
Now let’s look at all of them one by one and see what are problems have these solutions.
Writing code in migrations without caution
Say we are going to add a column to a
User model and then update all users in our database. So the migration may look like this:
Today this migration works without problems. We are committing this code to our version control system (I hope it’s
git). In a week we will have to run it on the production server. All our team members get this new code and run the migration. It works as expected - perfect! But tomorrow we decide to rename the
User model to
Customer. We create new migration to rename the table, rename the
User model to
Customer and don’t touch the old migrations. Tests are working, new migration is working and everyone is happy. But what will be with this migration in one week when we run it on the production server? It will be broken because at those moment we won’t have
User model yet. We forgot to rename the
User model in the old migrations. It happens too often in Rails development. So don’t be like a kamikaze and don’t write code to update data in migrations. Never!
Duplicate classes in migrations
There is a similar solution to the previous one and it may be useful in some cases. Just define the
User model in the migration:
Now when you rename the
User model in the new migration this code won’t fail. For this example the solution is suitable. But the problems come when you have to define for example polymorphic association in the migration:
The code will work without exception but actually it doesn’t set correct association because the defined class are under namespace
AddStatusToUser. This is what happens in reality:
And when you have the bug in production you may decide to figure out why the association is not set for the user. You open rails console and check the association:
As you see the
role_type is set incorrectly it should be
Role without the namespace. That’s why this solution bug prone as well.
Writing raw SQL in migrations
To be honest this solution doesn’t have disadvantages except a few. You have to know SQL and sometimes well. And as a result requires more time to write the code. Besides of this Rails developers don’t prefer to write raw SQL because if you want to migrate through PostgreSQL to MySQL for example you may have to fix much raw SQL code. Check out official documentation to get examples how to write raw SQL in migrations.
Rails has useful approach to populate the database. You may write any code in the
db/seeds.rb file and run
rake db:seed command. This is great solution to populate the database by some data at first start but then, when your project is released and you have to change the data on changing the database schema this won’t help you. Also writing the code in one file may lead to the mess. Of course you may create your own dependent files and then load them in the
seeds.rb file but anyway you have to worry about what to do on second run. There is also gem which helps to structuring you seeds data - seedbank. But again it doesn’t solve all these problems.
No doubt that there are many other methods. For example create rake tasks, of even going to production console and write code there after deploying (I hope you don’t do it). But none of them doesn’t solve all problems with the problem.
Use migration_data gem
Recently I’ve released gem which solves all these problems. This is the migration_data. After it’s installing you will able to define a
data method in your migrations and write the code here. The method runs only on migrating up (i.e. on
rake db:migrate but doesn’t run on
rake db:rollback). Additionally to keep the code in
data method up to date just write tests for this. The gem provides
require_migration method to help load migrations easily to the tests.
And this is a test for the migration:
The test will fail you have some unexpected changes in the code and you will be informed that your migration is not actual immediately. Currently the gem works with Ruby >= 2.0 and Rails >= 4.0.0.rc1.
The only solution to keep your migrations up to date with any code which lives there is to write tests for them. But if you write the migration data code and code to change database schema simultaneously in
change methods you won’t able to write tests for these migrations. Change database schema in tests is not good idea, isn’t it? So if you have these problems this gem is what you are looking for.
Over time you will notice that it’s rather hard to maintain your old migrations. Especially it’s true when the code is changing a lot. At this point the best solution that I know is to just remove all the old migrations. After the removal we have to just insert current database structure into the last migration. This way we will have clean migrations history and it will be possible to run them on a new database.
In order to perform the migrations squashing I added a rake task into https://github.com/ka8725/migration_data#clean-old-migration today. The task’s name is
db:migrate:squash. And finally don’t forget to remove tests for the migrations if you have them after the task execution. Enjoy it!
One note for the rake task. You have to make sure that all team members and deployment servers have already run all current migrations before the squashing. Otherwise they can have collisions. This process can be automated later.
Once you can get the following state:
This can be happened when somebody does the squashing and some other person is creating a new migration. This way you can have inconsistency - the migration for change can try to alter the not created yet tables (for example, it just adds a new column to the
users table, but as it applied to the fresh DB accordingly to the DB history it unfortunately fails). In order to fix the issue you have to update the timestamps for the migration change to be newer than the squash migration.
When you have to update a lot of data then the
migration_data gem may be not a good choice. It takes time
to process huge amount of data and the deployment process will be slowed down dramatically. It will increase
downtime of your application what is not acceptable for a production ready application with many real clients.
So if you have such type application then, please, don’t use this gem. There are may be used other techniques
that already discussed in other great posts:
- Rails Migrations with Zero Downtime
- Data Migrations in Rails
- SEEDMIGRATIONS. LIKE SCHEMA MIGRATIONS, BUT FOR YOUR DATA