Programming with Reason
Programming with Reason
Java File IO Comparison
Friday, April 30, 2010
I've recently been working with low-level file reading and writing in Java. I started with the java.io classes since I knew that inside and out, but moved to java.nio to see if I could improve performance. The results were shocking. First, the details:
- For the java.io code, I used a RandomAccessFile object and wrote directly to the file, seeking to the proper location as records were written, read back, and deleted.
- For the java.nio code, first I used a FileChannel object. NIO is much more efficient than java.io because it allows you to read and write to a file (or a network) using large data chunks. Java.io basically works a byte-at-a-time, which is not as efficient.
- Further, to make things even more efficient, I updated the NIO code to use a MappedByteBuffer, which itself is built on top of the host OS platform's virtual memory system. According to the documentation, and the excellent O'Reilly book, Java NIO, this should result in the most efficient code in terms of performance and storage resources.
To start, I wrote a test application that simulated an employee database. The Employee data structure is as such:
class Employee {
String last; // the key
String first;
int id;
int zip;
boolean employed;
String comments;
}
Employees are written to the file, and an index by key (last name) is maintained. Employee data is later loaded from the file by this key. Using a java.io.RandomAccessFile is required in all three cases - IO, NIO, and MappedByteBuffer code. The following code creates the file "employee.ejb", stored as the variable journal, in the current user's home directory:
String userHome = System.getProperty("user.home");
StringBuffer pathname = new StringBuffer(userHome);
pathname.append(File.separator);
pathname.append("employees.ejb");
java.io.RandomAccessFilejournal =
new RandomAccessFile(pathname.toString(), "rw");
To use this file with java.nio code, one more line is added:
java.nio.channels.FileChannel channel = journal.getChannel();
To further use this file with a MappedByteBuffer (also part of java.nio), add the following lines:
journal.setLength(PAGE_SIZE);
MappedByteBuffer mbb =
channel.map(FileChannel.MapMode.READ_WRITE, 0, journal.length() );
When you map a file, if the file size grows, the map will not see the new parts of the file. Since we want to read AND write, we need to re-map it each time it data is written to the end of it. To make this more efficient, the file is initially sized and subsequently grown by a predetermined amount (a page, if you will). This also makes the code a little more complex, but we can deal with it.
Writing Employee Records
With java.io, writing an employee record involves the following code:
public boolean addRecord_IO(Employee emp) {
try {
byte[] last = emp.last.getBytes();
byte[] first = emp.first.getBytes();
byte[] comments = emp.comments.getBytes();
// Just hard-code the sizes for perfomance
int size = 0;
size += emp.last.length();
size += 4; // strlen - Integer
size += emp.first.length();
size += 4; // strlen - Integer
size += 4; // emp.id - Integer
size += 4; // emp.zip - Integer
size += 1; // emp.employed - byte
size += emp.comments.length();
size += 4; // strlen - Integer
long offset = getStorageLocation(size);
//
// Store the record by key and save the offset
//
if ( offset == -1 ) {
// We need to add to the end of the journal. Seek there
// now only if we're not already there
long currentPos = journal.getFilePointer();
long jounralLen = journal.length();
if ( jounralLen != currentPos )
journal.seek(jounralLen);
offset = jounralLen;
}
else {
// Seek to the returned insertion point
journal.seek(offset);
}
// Fist write the header
journal.writeByte(1);
journal.writeInt(size);
// Next write the data
journal.writeInt(last.length);
journal.write(last);
journal.writeInt(first.length);
journal.write(first);
journal.writeInt(emp.id);
journal.writeInt(emp.zip);
if ( emp.employed )
journal.writeByte(1);
else
journal.writeByte(0);
journal.writeInt(comments.length);
journal.write(comments);
// Next, see if we need to append an empty record if we inserted
// this new record at an empty location
if ( newEmptyRecordSize != -1 ) {
// Simply write a header
journal.writeByte(0); //inactive record
journal.writeLong(newEmptyRecordSize);
}
employeeIdx.put(emp.last, offset);
return true;
}
catch ( Exception e ) {
e.printStackTrace();
}
return false;
}
In contrast java.nio, the code to add and employee record is as follows:
public boolean addRecord_NIO(Employee emp) {
try {
data.clear();
byte[] last = emp.last.getBytes();
byte[] first = emp.first.getBytes();
byte[] comments = emp.comments.getBytes();
data.putInt(last.length);
data.put(last);
data.putInt(first.length);
data.put(first);
data.putInt(emp.id);
data.putInt(emp.zip);
byte employed = 0;
if ( emp.employed )
employed = 1;
data.put(employed);
data.putInt(comments.length);
data.put(comments);
data.flip();
int dataLen = data.limit();
header.clear();
header.put((byte)1); // 1=active record
header.putInt(dataLen);
header.flip();
long headerLen = header.limit();
int length = (int)(headerLen + dataLen);
long offset = getStorageLocation((int)dataLen);
//
// Store the record by key and save the offset
//
if ( offset == -1 ) {
// We need to add to the end of the journal. Seek there
// now only if we're not already there
long currentPos = channel.position();
long jounralLen = channel.size();
if ( jounralLen != currentPos )
channel.position(jounralLen);
offset = jounralLen;
}
else {
// Seek to the returned insertion point
channel.position(offset);
}
// Fist write the header
long written = channel.write(srcs);
// Next, see if we need to append an empty record if we inserted
// this new record at an empty location
if ( newEmptyRecordSize != -1 ) {
// Simply write a header
data.clear();
data.put((byte)0);
data.putInt(newEmptyRecordSize);
data.flip();
channel.write(data);
}
employeeIdx.put(emp.last, offset);
return true;
}
catch ( Exception e ) {
e.printStackTrace();
}
return false;
}
Using a MappedByteBuffer, the code to add and employee record is as follows:
public boolean addRecord_MBB(Employee emp) {
try {
byte[] last = emp.last.getBytes();
byte[] first = emp.first.getBytes();
byte[] comments = emp.comments.getBytes();
int datalen = last.length + first.length + comments.length + 12 + 9;
int headerlen = 5;
int length = headerlen + datalen;
//
// Store the record by key and save the offset
//
long offset = getStorageLocation(datalen);
if ( offset == -1 ) {
// We need to add to the end of the journal. Seek there
// now only if we're not already there
long currentPos = mbb.position();
long journalLen = channel.size();
if ( (currentPos+length) >= journalLen ) {
//log("GROWING FILE BY ANOTHER PAGE");
mbb.force();
journal.setLength(journalLen + PAGE_SIZE);
channel = journal.getChannel();
journalLen = channel.size();
mbb = channel.map(FileChannel.MapMode.READ_WRITE, 0, journalLen);
currentPos = mbb.position();
}
if ( currentEnd != currentPos )
mbb.position(currentEnd);
offset = currentEnd;//journalLen;
}
else {
// Seek to the returned insertion point
mbb.position((int)offset);
}
// write header
mbb.put((byte)1); // 1=active record
mbb.putInt(datalen);
// write data
mbb.putInt(last.length);
mbb.put(last);
mbb.putInt(first.length);
mbb.put(first);
mbb.putInt(emp.id);
mbb.putInt(emp.zip);
byte employed = 0;
if ( emp.employed )
employed = 1;
mbb.put(employed);
mbb.putInt(comments.length);
mbb.put(comments);
currentEnd += length;
// Next, see if we need to append an empty record if we inserted
// this new record at an empty location
if ( newEmptyRecordSize != -1 ) {
// Simply write a header
mbb.put((byte)0);
mbb.putInt(newEmptyRecordSize);
currentEnd += 5;
}
employeeIdx.put(emp.last, offset);
return true;
}
catch ( Exception e ) {
e.printStackTrace();
}
return false;
}
I then tried each method in a loop to add 100,000 employee records, and recorded the elapsed time. Here are the results:
java.io: ~10,000 milliseconds
java.nio: ~2,000 milliseconds
MappedByteBuffer: ~970 milliseconds
It's quite an improvement just going from java.io to java.nio. Going further to use a MappedByteBuffer (and the OS' virtual memory system) cut the elapsed time even further. Impressive!
Reading the records using the same three forms of file IO yielded the following results:
java.io: ~6,900 milliseconds
java.nio: ~1,400 milliseconds
MappedByteBuffer: ~355 milliseconds
Again, java.nio resulted in the most efficient code, with the use of MappedByteBuffer yielding dramatic improvements. Therefore, if you cannot or do not want to use a real database to persist data in your application, java.nio with a MappedByteBuffer yields the most efficient code. Of course, there are details such as actually when the data is persisted to the file system that you need to be aware of. You can learn all about that in the JavaDoc documentation.
The complete code is below:
package filetest;
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.util.*;
/**
*
* @author ericjbruno
*/
public class Main {
class DestinationEntry {
public String dirName = null;
public String[] msgNames = null;
}
class Header {
byte active;
int size;
static final int HEADER_SIZE = 5; // 5 bytes
}
class Employee {
String last; // the key
String first;
int id;
int zip;
boolean employed;
String comments;
}
// Keep an index by last name of each emp in the file
public HashMap<String, Long> employeeIdx =
new HashMap<String, Long>();
// Keep an index of LinkedList, where the index key is the size
// of the empty record. The LinkedList contains pointers (offsets) to
// the empty records
public TreeMap<Integer, LinkedList<Long>> emptyIdx =
new TreeMap<Integer, LinkedList<Long>>();
public Main() {
testRandomAccessFile();
}
private long intervalTimerStart = 0;
RandomAccessFile journal = null;
FileChannel channel = null;
MappedByteBuffer mbb = null;
ByteBuffer header = ByteBuffer.allocateDirect(5);
ByteBuffer data = ByteBuffer.allocateDirect(65536);
ByteBuffer[] srcs = { header, data };
int PAGE_SIZE = 1024 * 1024; // 1 megabyte page size
int currentEnd = 0;
int records = 0;
public static final byte IO_MODE = 1;
public static final byte NIO_MODE = 2;
public static final byte MBB_MODE = 3;
public byte fileMode = MBB_MODE;
public void testRandomAccessFile() {
try {
Employee[] emps = new Employee[100000];
for ( int i = 0; i < emps.length; i++ ) {
emps[i] = new Employee();
emps[i].last = "Doe" + i;
emps[i].first = "John";
if ( i % 2 == 0 )
emps[i].employed = true;
else
emps[i].employed = false;
emps[i].id = 2000 + i;
emps[i].zip = 11900 + i;
emps[i].comments = "This is employee number " + i;
}
String userHome = System.getProperty("user.home");
StringBuffer pathname = new StringBuffer(userHome);
pathname.append(File.separator);
pathname.append("employees.ejb");
journal = new RandomAccessFile(pathname.toString(), "rw");
channel = journal.getChannel();
if ( fileMode == MBB_MODE ){
journal.setLength(PAGE_SIZE);
mbb = channel.map(FileChannel.MapMode.READ_WRITE, 0, journal.length() );
}
else {
journal.setLength(0);
}
// Write the employees to the journal
startIntervalTimer(true);
for ( Employee emp : emps ) {
switch ( fileMode ) {
case IO_MODE:
addRecord_IO(emp);
break;
case NIO_MODE:
addRecord_NIO(emp);
break;
case MBB_MODE:
addRecord_MBB(emp);
break;
}
}
endIntervalTimer(true);
// Read the records
startIntervalTimer(true);
switch ( fileMode ) {
case IO_MODE:
readRecords_IO(journal);
break;
case NIO_MODE:
readRecords_NIO(channel);
break;
case MBB_MODE:
readRecords_MBB(mbb);
break;
}
endIntervalTimer(true);
log("Records read=" + records);
journal.close();
}
catch ( Exception e ) {
e.printStackTrace();
}
}
public void readRecords_IO(RandomAccessFile journal) {
// Scan the journal
try {
journal.seek(0);
while ( true ) {
// First check if the record is active (not deleted)
boolean active = false;
if ( journal.readByte() == 1 )
active = true;
// Next read the record's size
int size = journal.readInt();
if ( active ) {
// Active record, read in the data
records++;
Employee emp = new Employee();
byte[] last = new byte[journal.readInt()];
journal.read(last);
emp.last = new String(last);
byte[] first = new byte[journal.readInt()];
journal.read(first);
emp.first = new String(first);
emp.id = journal.readInt();
emp.zip = journal.readInt();
emp.employed = false;
if ( journal.readByte() == 1 )
emp.employed = true;
byte[] comments = new byte[journal.readInt()];
journal.read(comments);
emp.comments = new String(comments);
}
else {
// Inactive (deleted) record. Skip over it
int skipped = journal.skipBytes((int)size);
}
}
}
catch ( EOFException eof ) {
}
catch ( Exception e ) {
e.printStackTrace();
}
}
public void readRecords_NIO(FileChannel channel) {
try {
records = 0;
channel.position(0);
while ( true ) {
// Read a header
header.clear();
int b = channel.read(header);
header.flip();
// Is the record active (not deleted)?
boolean active = false;
if ( header.get() == 1 )
active = true;
// Get the record's size
int size = header.getInt();
if ( active ) {
// Active record, read the data
ByteBuffer filedata = ByteBuffer.allocateDirect((int)size);
channel.read(filedata);
filedata.flip();
records++;
Employee emp = new Employee();
byte[] last = new byte[filedata.getInt()];
filedata.get(last);
emp.last = new String(last);
byte[] first = new byte[filedata.getInt()];
filedata.get(first);
emp.first = new String(first);
emp.id = filedata.getInt();
emp.zip = filedata.getInt();
emp.employed = false;
if ( filedata.get() == 1 )
emp.employed = true;
byte[] comments = new byte[filedata.getInt()];
filedata.get(comments);
emp.comments = new String(comments);
}
else {
// Inactive (deleted) record. Skip over it
long current = channel.position();
channel.position((int)(current + size));
}
}
} catch ( Exception e ) { }
}
public void readRecords_MBB(MappedByteBuffer mbb) {
try {
records = 0;
mbb.position(0);
while ( true ) {
// Is the record active (not deleted)?
boolean active = false;
if ( mbb.get() == 1 )
active = true;
// Get the record's size
int size = mbb.getInt();
if ( active ) {
// Active record, read the data
records++;
Employee emp = new Employee();
byte[] last = new byte[mbb.getInt()];
mbb.get(last);
emp.last = new String(last);
byte[] first = new byte[mbb.getInt()];
mbb.get(first);
emp.first = new String(first);
emp.id = mbb.getInt();
emp.zip = mbb.getInt();
emp.employed = false;
if ( mbb.get() == 1 )
emp.employed = true;
byte[] comments = new byte[mbb.getInt()];
mbb.get(comments);
emp.comments = new String(comments);
}
else {
// Inactive (deleted) record. Skip over it
long current = mbb.position();
mbb.position((int)(current + size));
}
}
} catch ( Exception e ) { }
}
public boolean addRecord_MBB(Employee emp) {
try {
byte[] last = emp.last.getBytes();
byte[] first = emp.first.getBytes();
byte[] comments = emp.comments.getBytes();
int datalen = last.length + first.length + comments.length + 12 + 9;
int headerlen = 5;
int length = headerlen + datalen;
//
// Store the record by key and save the offset
//
long offset = getStorageLocation(datalen);
if ( offset == -1 ) {
// We need to add to the end of the journal. Seek there
// now only if we're not already there
long currentPos = mbb.position();
long journalLen = channel.size();
if ( (currentPos+length) >= journalLen ) {
//log("GROWING FILE BY ANOTHER PAGE");
mbb.force();
journal.setLength(journalLen + PAGE_SIZE);
channel = journal.getChannel();
journalLen = channel.size();
mbb = channel.map(FileChannel.MapMode.READ_WRITE, 0, journalLen);
currentPos = mbb.position();
}
if ( currentEnd != currentPos )
mbb.position(currentEnd);
offset = currentEnd;//journalLen;
}
else {
// Seek to the returned insertion point
mbb.position((int)offset);
}
// write header
mbb.put((byte)1); // 1=active record
mbb.putInt(datalen);
// write data
mbb.putInt(last.length);
mbb.put(last);
mbb.putInt(first.length);
mbb.put(first);
mbb.putInt(emp.id);
mbb.putInt(emp.zip);
byte employed = 0;
if ( emp.employed )
employed = 1;
mbb.put(employed);
mbb.putInt(comments.length);
mbb.put(comments);
currentEnd += length;
// Next, see if we need to append an empty record if we inserted
// this new record at an empty location
if ( newEmptyRecordSize != -1 ) {
// Simply write a header
mbb.put((byte)0);
mbb.putInt(newEmptyRecordSize);
currentEnd += 5;
}
employeeIdx.put(emp.last, offset);
return true;
}
catch ( Exception e ) {
e.printStackTrace();
}
return false;
}
public boolean addRecord_NIO(Employee emp) {
try {
data.clear();
byte[] last = emp.last.getBytes();
byte[] first = emp.first.getBytes();
byte[] comments = emp.comments.getBytes();
data.putInt(last.length);
data.put(last);
data.putInt(first.length);
data.put(first);
data.putInt(emp.id);
data.putInt(emp.zip);
byte employed = 0;
if ( emp.employed )
employed = 1;
data.put(employed);
data.putInt(comments.length);
data.put(comments);
data.flip();
int dataLen = data.limit();
header.clear();
header.put((byte)1); // 1=active record
header.putInt(dataLen);
header.flip();
long headerLen = header.limit();
int length = (int)(headerLen + dataLen);
long offset = getStorageLocation((int)dataLen);
//
// Store the record by key and save the offset
//
if ( offset == -1 ) {
// We need to add to the end of the journal. Seek there
// now only if we're not already there
long currentPos = channel.position();
long jounralLen = channel.size();
if ( jounralLen != currentPos )
channel.position(jounralLen);
offset = jounralLen;
}
else {
// Seek to the returned insertion point
channel.position(offset);
}
// Fist write the header
long written = channel.write(srcs);
// Next, see if we need to append an empty record if we inserted
// this new record at an empty location
if ( newEmptyRecordSize != -1 ) {
// Simply write a header
data.clear();
data.put((byte)0);
data.putInt(newEmptyRecordSize);
data.flip();
channel.write(data);
}
employeeIdx.put(emp.last, offset);
return true;
}
catch ( Exception e ) {
e.printStackTrace();
}
return false;
}
public boolean addRecord_IO(Employee emp) {
try {
byte[] last = emp.last.getBytes();
byte[] first = emp.first.getBytes();
byte[] comments = emp.comments.getBytes();
// Just hard-code the sizes for perfomance
int size = 0;
//size += 1; // active - byte
//size += 4; // size - Integer
size += emp.last.length();
size += 4; // strlen - Integer
size += emp.first.length();
size += 4; // strlen - Integer
size += 4; // emp.id - Integer
size += 4; // emp.zip - Integer
size += 1; // emp.employed - byte
size += emp.comments.length();
size += 4; // strlen - Integer
long offset = getStorageLocation(size);
//
// Store the record by key and save the offset
//
if ( offset == -1 ) {
// We need to add to the end of the journal. Seek there
// now only if we're not already there
long currentPos = journal.getFilePointer();
long jounralLen = journal.length();
if ( jounralLen != currentPos )
journal.seek(jounralLen);
offset = jounralLen;
}
else {
// Seek to the returned insertion point
journal.seek(offset);
}
// Fist write the header
journal.writeByte(1);
journal.writeInt(size);
// Next write the data
journal.writeInt(last.length);
journal.write(last);
journal.writeInt(first.length);
journal.write(first);
journal.writeInt(emp.id);
journal.writeInt(emp.zip);
if ( emp.employed )
journal.writeByte(1);
else
journal.writeByte(0);
journal.writeInt(comments.length);
journal.write(comments);
// Next, see if we need to append an empty record if we inserted
// this new record at an empty location
if ( newEmptyRecordSize != -1 ) {
// Simply write a header
journal.writeByte(0); //inactive record
journal.writeLong(newEmptyRecordSize);
}
employeeIdx.put(emp.last, offset);
return true;
}
catch ( Exception e ) {
e.printStackTrace();
}
return false;
}
private int newEmptyRecordSize;
private long getStorageLocation(int recordLength)
{
long offset = -1; // where to write new record
try {
// Determine if there's an empty location to insert this new record
// There are a few criteria. The empty location must either be
// an exact fit for the new record with its header (replacing the
// existing empty record's header, or if the empty record is larger,
// it must be large enough for the new record's header and data,
// and another header to mark the empty record so the file can
// be traversed at a later time. In other words, the journal
// consists of sequential records, back-to-back, with no gap in
// between, otherwise it cannot be traversed from front to back
// without adding a substantial amount of indexing within the file.
// Therefore, even deleted records still exist within the journal,
// they are simply marked as deleted. But the record size is still
// part of the record so it can be skipped over when read back in
// First locate an appropriate location. It must match exactly
// or be equal in size to the record to write and a minimal record
// which is just a header (5 bytes). So, HEADER + DATA + HEADER,
// or (data length + 10 bytes).
boolean appendEmptyRecord = false;
newEmptyRecordSize = -1;
// Is there an exact match?
LinkedList<Long> records = emptyIdx.get(recordLength);
if ( records != null ) {
offset = records.remove(); // need to check for null!!!!
// No need to write append an empty record, just return offset
appendEmptyRecord = false;
newEmptyRecordSize = -1;
return offset;
}
// No exact size match, find one just large enough
for ( Integer size : this.emptyIdx.keySet() ) {
if ( size >= recordLength+(Header.HEADER_SIZE*2) ) {
records = emptyIdx.get(size);
if ( records == null ) {
// This was the last empty record of this size
// so delete the entry in the index and continue
// searching for a larger empty region (if any)
emptyIdx.remove(size);
continue;
}
offset = records.remove();
// We need to append an empty record after the new record
// taking the size of the header into account
newEmptyRecordSize = (size - recordLength) - Header.HEADER_SIZE;
appendEmptyRecord = true;
// Store the new empty record's offset
storeEmptyRecord(offset + recordLength, newEmptyRecordSize);
return offset;
}
}
}
catch ( Exception e ) {
e.printStackTrace();
}
return offset;
}
public boolean deleteRecord(FileChannel channel, String key) {
try {
// Locate the record's offset within the journal by key
Long offset = employeeIdx.get(key);
if ( offset == null )
return false;
channel.position(offset);
// Set the record as inactive (header field) in the journal
// Mark the record as inactive
header.clear();
header.put((byte)0); // 0=inactive record
header.flip();
channel.write(header);
// Read the empty record's length
header.clear();
channel.read(header);
header.flip();
int length = header.getInt();
// Store the empty record for later use
storeEmptyRecord(offset, length);
// Remove from key index
employeeIdx.remove(key);
return true;
}
catch ( Exception e ) {
e.printStackTrace();
}
return false;
}
private void storeEmptyRecord(long offset, int length) {
// Store the empty record in an index. Look to see if there
// are other records of the same size (in a LinkedList). If
// so, add this one to the end of the linked list
//
LinkedList<Long> emptyRecs = emptyIdx.get(length);
if ( emptyRecs == null ) {
// There are no other records of this size. Add an entry
// in the hash table for this new linked list of records
emptyRecs = new LinkedList<Long>();
emptyIdx.put(length, emptyRecs);
}
// Add the pointer (file offset) to the new empty record
emptyRecs.add(offset);
}
public void startIntervalTimer(boolean print) {
if ( print )
log("START INTERVAL TIMER");
intervalTimerStart = System.currentTimeMillis();
}
public long endIntervalTimer(boolean print) {
long interval = System.currentTimeMillis() - intervalTimerStart;
if ( print )
log("END INTERVAL TIMER: " + interval);
return interval;
}
public static void log(String s) {
System.out.println(s);
}
public static void main(String[] args) {
Main main = new Main();
}
}