강의 : 김영한의 실전 자바 - 고급 2편, I/O, 네트워크, 리플렉션

2. I/O 기본 1

자바 프로세스가 메모리에 가진 데이터를 hello.dat 라는 디스크에 있는 파일에 저장하려면,
출력 스트림을 사용해 hello.dat파일로 메모리에 있는 데이터를 보내고
반대로 프로세스 외부에 있는 데이터를 자바 프로세스 안으로 가져오려면 입력 스트림을 사용한다.

자바에서 외부 자원(파일, 네트워크 등)에 있는 데이터를 읽고 쓸 때는 스트림을 사용한다.
컴퓨터에서 데이터는 문자가 아닌 바이트로 저장되기 때문에 스트림은 바이트를 사용한다.

2-1. 파일 읽기

2-1-1. 파일의 끝까지 읽기

파일의 데이터를 읽을 때 파일의 끝까지 읽어야 한다면 다음과 같이 반복문을 사용하면 된다.
입력 스트림의 read() 메서드는 파일의 끝에 도달하면 -1을 반환한다. 따라서 -1을 반환할 때 까지 반복문을 사용하면 파일의 데이터를 모두 읽을 수 있다.

public class StreamStartMain2 {

    public static void main(String[] args) throws IOException {
        FileOutputStream fos = new FileOutputStream("temp/hello.dat");
        fos.write(65);
        fos.write(66);
        fos.write(67);
        fos.close();

        FileInputStream fis = new FileInputStream("temp/hello.dat");

        int data;
        while((data = fis.read()) != -1) {
            System.out.println(data);
        }
        fis.close();
    }
}

// 실행 결과
65
66
67

참고 - read()가 int를 반환하는 이유

자바의 byte는 -128~127까지 256 종류의 값만 가질 수 있어 EOF를 위한 특별한 값을 할당하기 어렵다.
int 타입을 반환함으로서 0~255 까지 모든 가능한 바이트 값을 부호 없이 표현하고, 여기에 -1로 EOF를 추가해 스트림의 끝을 나타낼 수 있다.
write()의 경우도 비슷한 이유로 int 타입을 입력 받는다.

2-1-2. byte[]을 사용해 원하는 크기 만큼 저장하고 읽는 방법

출력 스트림

write(byte[]): byte[]에 원하는 데이터를 담고 write()에 전달하면 해당 데이터를 한 번에 출력할 수 있다.

입력 스트림

read(byte[], offset, length): byte[]를 미리 만들어두고, 만들어 둔 byte[]로 한 번에 데이터를 읽어올 수 있다.

byte[]: 데이터가 담는 버퍼
offset: 데이터가 기록되는 Byte[]의 인덱스 시작 위치
length: 읽어올 byte의 최대 길이
반환 값: 버퍼에 읽은 총 바이트 수. 스트림의 끝에 도달해 더 이상 데이터가 없는 경우 -1을 반환
offset, length 파라메터를 생략한 read(byte[]) 메서드도 있다. offset: 0, length: byte[].length 로 설정해 호출한다.

public class StreamStartMain3 {

    public static void main(String[] args) throws IOException {
        FileOutputStream fos = new FileOutputStream("temp/hello.dat");
        byte[] input = {65, 66, 67};
        fos.write(input);
        fos.close();

        FileInputStream fis = new FileInputStream("temp/hello.dat");
        byte[] buffer = new byte[10];
        int readCount = fis.read(buffer, 0, 10);

        System.out.println("readCount = " + readCount);
        System.out.println(Arrays.toString(buffer));
        fis.close();
    }
}

2-1-3. 모든 byte 한 번에 읽기

readAllBytes()를 사용하면 파일의 끝에 도달해 스트림이 끝날 때 까지 모든 데이터를 한 번에 읽어올 수 있다.

public class StreamStartMain4 {

    public static void main(String[] args) throws IOException {
        FileOutputStream fos = new FileOutputStream("temp/hello.dat");
        byte[] input = {65, 66, 67};
        fos.write(input);
        fos.close();

        FileInputStream fis = new FileInputStream("temp/hello.dat");
        byte[] readBytes = fis.readAllBytes();
        System.out.println(Arrays.toString(readBytes));
        fis.close();
    }
}

부분으로 나누어 읽기 vs 전체 읽기

read(byte[], offset, length)
- 스트림의 내용을 부분적으로 ㅇ릭거나, 읽은 내용을 처리하면서 트릠을 계속해서 읽어야 할 경우에 적합하다.
- 메모리 사용량을 제어할 수 있다.
- 100M의 파일을 1M 단위로 나누어 읽고 처리하는 방식을 사용하면 한 번에 최대 1M의 메모리만 사용한다.
readAllBytes()
- 한 번의 호출로 모든 데이터를 읽을 수 있어 편리하다.
- 작은 파일이나 메모리에 모든 내용을 올려서 처리해야 하는 경우에 적합하다.
- 메모리 사용량을 제어할 수 없고 큰 파일의 경우 OutOfMemoryError가 발생할 수 있다.

2-2. InputStream, OutputStream

자바 프로세스의 메모리에 있는 데이터를 외부에 있는 파일에 저장하거나 네트워크를 통해 전송할 때 모두 스트림을 사용해 byte 단위로 데이터를 주고 받는다.
파일, 네트워크, 콘솔 등 외부 자원과 데이터를 주고 받는 방식을 통힐하기 위해 자바는 InputStream, OutputStream 이라는 기본 추상 클래스를 제공한다.

InputStream은 read(), read(byte[]), readAllBytes()를 제공한다.
InputStream을 상속 받는 클래스로는 FileInputStream, ByteArrayInputStream, SocketInputStream 등이 있다.

OutputStream은 write(int), write(byte[])를 제공한다.
OutputStream을 상속 받는 클래스로는 FileOutputStream, ByteArrayOutputStream, SocketOutputStream 등이 있다.

2-2-1. 메모리 스트림

메모리에 어떤 데이터를 저장하고 읽을 때는 컬렉션이나 배열을 사용하면 되기 때문에, 이 기능은 잘 사용하지 않는다.
주로 스트림을 간단하게 테스트 하거나 스트림의 데이터를 확인하는 용도로 사용한다.

public class ByteArrayStreamMain {

    public static void main(String[] args) throws IOException {
        byte[] input = {1, 2, 3};

        // 메모리에 쓰기
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(input);

        // 메모리에서 읽기
        ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
        byte[] bytes = bais.readAllBytes();
        System.out.println(Arrays.toString(bytes));
    }
}

2-2-2. 콘솔 스트림

System.out은 OutputStream을 상속 받는 PrintStream이다. 이 스트림은 자바가 시작될 때 자동으로 만들어진다.
println(String)은 PrintStream이 자체적으로 제공하는 추가 기능이다.

import static java.nio.charset.StandardCharsets.UTF_8;
  public class PrintStreamMain {
      public static void main(String[] args) throws IOException {
          PrintStream printStream = System.out;
          byte[] bytes = "Hello!\n".getBytes(UTF_8);
          printStream.write(bytes);
          printStream.println("Print!");
} }

2-3. 파일 입출력과 성능 최적화1 - 하나씩 읽고 쓰기

2-3-1. 하나씩 쓰기

1을 1000만 번(10 * 1024 * 1024) 호출해 10MB의 파일이 만들어진다.

public class BufferedConst {
    public static final String FILE_NAME = "temp/buffered.dat";
    public static final int FILE_SIZE = 10 * 1024 * 1024;
    public static final int BUFFER_SIZE = 8192;
}

public class CreateFileV1 {

    public static void main(String[] args) throws IOException {
        FileOutputStream fos = new FileOutputStream(FILE_NAME);
        long startTime = System.currentTimeMillis();

        for (int i = 0; i < FILE_SIZE; i++) {
            fos.write(1);
        }
        fos.close();

        long endTime = System.currentTimeMillis();
        System.out.println("File created: " + FILE_NAME);
        System.out.println("File size: " + FILE_SIZE / 1024 / 1024 + "MB");
        System.out.println("Time taken: " + (endTime - startTime) + "ms");
    }
}

2-3-2. 하나씩 읽기

1byte씩 데이터를 읽는다. 파일의 크기가 10MB 이므로 fis.read() 메서드를 약 1000만 번 호출한다.

public class ReadFileV1 {

    public static void main(String[] args) throws IOException {

        FileInputStream fis = new FileInputStream(FILE_NAME);
        long startTime = System.currentTimeMillis();

        int fileSize = 0;
        int data;
        while ((data = fis.read()) != -1) {
            fileSize++;
        }
        fis.close();

        long endTime = System.currentTimeMillis();
        System.out.println("File name: " + FILE_NAME);
        System.out.println("File size: " + (fileSize / 1024/ 1024) + "MB");
        System.out.println("Time taken: " + (endTime - startTime) + "ms");
    }
}

10MB 크기의 파일을 하나씩 읽고 쓰면 오랜 시간이 걸린다. 오래 걸린 이유는 자바에서 1byte 씩 디스크에 데이터를 전달하기 때문이다.
디스크는 1byte의 데이터를 받아 1byte의 데이터를 쓰는 과정을 1000만 번 반복한다.

write()나 read()를 호출할 때마다 OS의 시스템 콜을 통해 파일을 읽거나 쓰는 명령어를 전달하는데 시스템 콜은 상대적으로 무거운 작업이다.
HDD, SDD 같은 외부 자원들도 하나의 데이터를 읽고 쓸 때 마다 메모리에 비해 많은 시간이 걸린다.

이런 문제를 해결하려면 한 번의 읽기/쓰기 호출에 많은 데이터를 담아 보낸다.

2-4. 파일 입출력과 성능 최적화2 - 버퍼 활용

byte[]을 통해 배열에 담에서 한 번에 여러 byte를 전달한다.

2-4-1. 버퍼로 쓰기

BufferedConst.BUFFER_SIZE에 정의된 만큼 데이터를 모아서 write()를 호출한다.
BUFFER_SIZE를 8192(8KB)로 설정해 결과를 보면 하나씩 쓰는 것에 비해 1000배 가까이 빨라진 것을 확인할 수 있다.
BUFFER_SIZE 크기를 늘려가며 테스트를 하면 아래와 같이 나오는데, 버퍼의 크기가 커진다고 해서 속도가 계속 줄어들진 않는다.
디스크나 파일 시스템에서 데이터를 읽고 쓰는 기본 단위고 보통 4KB 또는 8KB 이기 때문이다.
많은 데이터를 버퍼에 담아 보내도 디스크나 파일 시스템에서 기본 단위로 나누어 저장하기 때문에 효율에는 한계가 있다.
따라서 버퍼의 크기는 보통 4KB, 8KB 정도로 잡는 것이 효율적이다.
|BUFFER_SIZE|걸린 시간| |:–:|:–:| |1|13s| |10|1s| |100|150ms| |1000|23ms| |2000|17ms| |4000|13ms| |8000|10ms| |80000|10ms|

public class CreateFileV2 {

    public static void main(String[] args) throws IOException {
        FileOutputStream fos = new FileOutputStream(FILE_NAME);
        long startTime = System.currentTimeMillis();

        byte[] buffer = new byte[BUFFER_SIZE];
        int bufferIndex = 0;

        for (int i = 0; i < FILE_SIZE; i++) {
            buffer[bufferIndex++] = 1;

            // 버퍼가 가득 차면 쓰고, 버퍼를 비운다.
            if (bufferIndex == BUFFER_SIZE) {
                fos.write(buffer);
                bufferIndex = 0;
            }
        }

        if (bufferIndex > 0) {
            fos.write(buffer, 0, bufferIndex);
        }

        fos.close();

        long endTime = System.currentTimeMillis();
        System.out.println("File created: " + FILE_NAME);
        System.out.println("File size: " + FILE_SIZE / 1024 / 1024 + "MB");
        System.out.println("Time taken: " + (endTime - startTime) + "ms");
    }
}

2-4-2. 버퍼로 읽기

버퍼를 사용해 읽으면 약 1000배 빨라진 것을 확인할 수 있다.

public class ReadFileV2 {

    public static void main(String[] args) throws IOException {

        FileInputStream fis = new FileInputStream(FILE_NAME);
        long startTime = System.currentTimeMillis();

        byte[] buffer = new byte[BUFFER_SIZE];
        int fileSize = 0;
        int size;
        while ((size = fis.read(buffer)) != -1) {
            fileSize += size;
        }
        fis.close();

        long endTime = System.currentTimeMillis();
        System.out.println("File name: " + FILE_NAME);
        System.out.println("File size: " + (fileSize / 1024/ 1024) + "MB");
        System.out.println("Time taken: " + (endTime - startTime) + "ms");
    }
}

2-5. 파일 입출력과 성능 최적화3 - BufferedOutputStream 읽기/쓰기

BufferedOutputStream은 버퍼 기능을 내부에서 대신 처리해 준다.
BufferedOutputStream은 내부에서 단순히 버퍼 기능만 제공하므로 대상 OutputStream을 파라메터로 전달해야 한다.

FileOutputStream과 같이 단독으로 사용할 수 있는 스트림을 기본 스트림이라 한다.
BufferedOutputStream과 같이 단독으로 사용할 수 없고, 보조 기능을 제공하는 스트림을 보조 스트림이라고 한다.

2-5-1. BufferedOutputStream 쓰기

public class CreateFileV3 {

    public static void main(String[] args) throws IOException {
        FileOutputStream fos = new FileOutputStream(FILE_NAME);
        BufferedOutputStream bos = new BufferedOutputStream(fos, BUFFER_SIZE);
        long startTime = System.currentTimeMillis();

        for (int i = 0; i < FILE_SIZE; i++) {
            bos.write(1);
        }
        fos.close();

        long endTime = System.currentTimeMillis();
        System.out.println("File created: " + FILE_NAME);
        System.out.println("File size: " + FILE_SIZE / 1024 / 1024 + "MB");
        System.out.println("Time taken: " + (endTime - startTime) + "ms");
    }
}

2-5-2. BufferedOutputStream 동작

2-5-2-1. 실행 순서

BufferedOutputStream은 내부에 byte[] buf 라는 버퍼를 갖고 있다.
BufferedOutputStream에 write(byte)를 통해 byte 하나를 전달하면 byte[] buf에 보관된다.
byte[] buf가 가득 차면 FileOutputStream에 있는 write(byte[]) 메서드를 호출한다.
FileOutputStream에서 전달된 모든 byte[]를 시스템 콜로 OS에 전달하고 byte[] buf의 내용을 비운다.
이후에 write(byte)가 호출되면 다시 버퍼를 채운다.

2-5-2-2. flush()

byte[] buf가 다 차지 않아도 버퍼에 남아있는 데이터를 전달하려면 flush() 라는 메서드를 호출하면 된다.
BufferedOutputStream은 flush()를 호출하면 버퍼에 남은 데이터를 FileOutputStream에 전달하고 버퍼를 비운다.

2-4-2-3. close()

byte[] buf에 데이터가 남아있는 상태로 BufferedOutputStream을 close() 하면,
내부에서 flush()를 호출해 버퍼에 남아 있는 데이터를 모두 전달하고 비운 뒤 BufferedOutputStream의 자원을 정리한다.
그 다음 연결된 스트림(FileOutputStream)의 close()를 호출한다.

2-5-3. BufferdOutputStream 읽기

public class ReadFileV3 {

    public static void main(String[] args) throws IOException {

        FileInputStream fis = new FileInputStream(FILE_NAME);
        BufferedInputStream bis = new BufferedInputStream(fis, BUFFER_SIZE);
        long startTime = System.currentTimeMillis();

        int fileSize = 0;
        int data;
        while ((data = bis.read()) != -1) {
            fileSize++;
        }
        bis.close();

        long endTime = System.currentTimeMillis();
        System.out.println("File name: " + FILE_NAME);
        System.out.println("File size: " + (fileSize / 1024/ 1024) + "MB");
        System.out.println("Time taken: " + (endTime - startTime) + "ms");
    }
}

2-5-3-1. 실행 순서

BufferedInputStream의 read()는 1byte만 조회한다.
BufferedInputStream은 FileInputStream에서 read(byte[])를 사용해 버퍼 크기 만큼의 데이터를 불러온다.
불러온 데이터를 버퍼에 보관한다.
버퍼에 있는 데이터 중에 1byte를 반환한다.
버퍼가 비어 있으면 다시 버퍼 크기 만큼의 데이터를 불러와서 담아둔다.

2-5-4. 버퍼를 직접 다루는 것 보다 BufferedInput/OutputStream의 성능이 떨어지는 이유

버퍼를 직접 다루는 것이 BufferedInput/OutputStream을 사용하는 것 보다 더 빠르다.
이 이유는 BufferedOutputStream.write()는 동기화 처리가 되어 있기 때문이다.

자바의 BufferedXxx 클래스는 처음부터 멀티 스레드를 고려해서 만든 클래스로 모두 동기화 처리가 되어있다.
따라서 멀티 스레드에 안전하지만 락을 걸고 푸는 동기화 코드로 인해 성능이 약간 저하될 수 있다.
하지만 시윽ㄹ 스레드 상황에서는 동기화 락이 필요하지 않기 때문에 직접 버퍼를 다룰 때에 비해서 성능이 떨어진다.

@Override
public void write(int b) throws IOException {
    if (lock != null) {
        lock.lock();
        try {
            implWrite(b);
        } finally {
            lock.unlock();
}
} else {
        synchronized (this) {
            implWrite(b);
} }
}

2-5-4. 파일 입출력과 성능 최적화5 - 한 번에 쓰기

파일의 크기가 크기 않다면 간단하게 한 번에 쓰고 읽는 것도 좋은 방법이다.
그러나 메모리를 한 번에 많이 사용하기 때문에 읽으려는 파일의 크기가 작아야한다.

디스크나 파일 시스템에서 데이터를 읽고 쓰는 기본 단위가 보통 4KB, 8KB 이므로,
읽기, 쓰기의 실행 시간은 8KB 버퍼를 직접 사용한 예제2와 오차 범위 정도로 거의 비슷하다.
자바 구현에 따라 다르지만 보통 4KB, 8KB, 16KB 단위로 데이터를 읽어들인다.

2-5-4-1. 쓰기

public class CreateFileV4 {

    public static void main(String[] args) throws IOException {
        FileOutputStream fos = new FileOutputStream(FILE_NAME);
        long startTime = System.currentTimeMillis();

        byte[] buffer = new byte[FILE_SIZE];
        for (int i = 0; i < FILE_SIZE; i++) {
            buffer[i] = 1;
        }
        fos.write(buffer);
        fos.close();

        long endTime = System.currentTimeMillis();
        System.out.println("File created: " + FILE_NAME);
        System.out.println("File size: " + FILE_SIZE / 1024 / 1024 + "MB");
        System.out.println("Time taken: " + (endTime - startTime) + "ms");
    }
}

2-5-4-2. 읽기

public class ReadFileV4 {

    public static void main(String[] args) throws IOException {

        FileInputStream fis = new FileInputStream(FILE_NAME);
        long startTime = System.currentTimeMillis();

        byte[] bytes = fis.readAllBytes();
        fis.close();

        long endTime = System.currentTimeMillis();
        System.out.println("File name: " + FILE_NAME);
        System.out.println("File size: " + (bytes.length / 1024/ 1024) + "MB");
        System.out.println("Time taken: " + (endTime - startTime) + "ms");
    }
}