ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks()

During truncate we are sometimes forced to start a new transaction as
the number of blocks to be journaled is both quite large and hard to
predict. So far we restarted a transaction while holding i_data_sem and
that violates lock ordering because i_data_sem ranks below a transaction
start (and it can lead to a real deadlock with ext4_get_blocks() mapping
blocks in some page while having a transaction open).

We fix the problem by dropping the i_data_sem before restarting the
transaction and reacquiring it afterwards. It's slightly subtle that
this works:

1) By the time ext4_truncate() is called, all the page cache for the
   truncated part of the file is dropped so get_block() should not be
   called on it (we only have to invalidate extent cache after we
   reacquire i_data_sem because some extent from not-truncated part
   could extend also into the part we are going to truncate).

2) Writes, migrate or defrag hold i_mutex so they are stopped for the
   whole duration of the truncate.

This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

commit: 487caeef9fc08c0565e082c40a8aaf58dad92bbb
author: Jan Kara <jack@suse.cz> Mon Aug 17 22:17:20 2009 -0400
committer: Theodore Ts'o <tytso@mit.edu> Mon Aug 17 22:17:20 2009 -0400
tree: 69920293cfe3a50bdbbf845be785350e7c203a2b
parent: 9599b0e597d810be9b8f759ea6e9619c4f983c5e
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9a4c929..d61fb52 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c

@@ -192,11 +192,24 @@
  * so before we call here everything must be consistently dirtied against
  * this transaction.
  */
-static int ext4_journal_test_restart(handle_t *handle, struct inode *inode)
+int ext4_truncate_restart_trans(handle_t *handle, struct inode *inode,
+				 int nblocks)
 {
+	int ret;
+
+	/*
+	 * Drop i_data_sem to avoid deadlock with ext4_get_blocks(). At this
+	 * moment, get_block can be called only for blocks inside i_size since
+	 * page cache has been already dropped and writes are blocked by
+	 * i_mutex. So we can safely drop the i_data_sem here.
+	 */
 	BUG_ON(EXT4_JOURNAL(inode) == NULL);
 	jbd_debug(2, "restarting handle %p\n", handle);
-	return ext4_journal_restart(handle, blocks_for_truncate(inode));
+	up_write(&EXT4_I(inode)->i_data_sem);
+	ret = ext4_journal_restart(handle, nblocks);
+	down_write(&EXT4_I(inode)->i_data_sem);
+
+	return ret;
 }
 
 /*
@@ -3658,7 +3671,8 @@
 			ext4_handle_dirty_metadata(handle, inode, bh);
 		}
 		ext4_mark_inode_dirty(handle, inode);
-		ext4_journal_test_restart(handle, inode);
+		ext4_truncate_restart_trans(handle, inode,
+					    blocks_for_truncate(inode));
 		if (bh) {
 			BUFFER_TRACE(bh, "retaking write access");
 			ext4_journal_get_write_access(handle, bh);
@@ -3869,7 +3883,8 @@
 				return;
 			if (try_to_extend_transaction(handle, inode)) {
 				ext4_mark_inode_dirty(handle, inode);
-				ext4_journal_test_restart(handle, inode);
+				ext4_truncate_restart_trans(handle, inode,
+					    blocks_for_truncate(inode));
 			}
 
 			ext4_free_blocks(handle, inode, nr, 1, 1);