HHeLiBeX's Diary, Orthodox Edition

A record of, and notes on, day-to-day memories...

Comparing CSV file handling behavior (2)

Last time, I wrote the following article.

hhelibex.hatenablog.jp

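In that article, the possibility came up that a library might fail with an error when reading a CSV file it had written itself, so I actually tested it. Note that for PHP, only 7.4.0 and later is covered this time.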

Preparation

The following data is used as the common test data.

  • CsvTestData.php
<?php

class CsvTestData {
    private static $data = array(
        array(
            array('なんてことない文字列', 'abc!#$%&\'()-=^~@`[]{};+:*,.<>/?_123'),
        ),
        array(
            array('空文字列', ''),
        ),
        array(
            array('カンマ', ','),
        ),
        array(
            array('ダブルクォーテーション', '"'),
        ),
        array(
            array('バックスラッシュ', '\\'),
        ),
        array(
            array('空白文字', ' bbb '),
        ),
        array(
            array('改行', "a\nb\r\nc"),
        ),
        array(
            array('テスト1', 'a"b c,d'),
        ),
        array(
            array('テスト2', 'a"b"c d,e,f'),
        ),
        array(
            array('テスト3', 'a\"b\ c\,d'),
        ),
        array(
            array('テスト4', 'a\"b\"c\ d,e,f'),
        ),
    );
    /**
     * @return int
     */
    public static function size() {
        return count(self::$data);
    }
    /**
     * @param int $idx
     * @return array
     */
    public static function get($idx) {
        return self::$data[$idx];
    }
}

  • CsvTestData.java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvTestData {
    private static String[][][] data = {
        {
            { "なんてことない文字列", "abc!#$%&'()-=^~@`[]{};+:*,.<>/?_123" },
        },
        {
            { "空文字列", "" },
        },
        {
            { "カンマ", "," },
        },
        {
            { "ダブルクォーテーション", "\"" },
        },
        {
            { "バックスラッシュ", "\\" },
        },
        {
            { "空白文字", " bbb " },
        },
        {
            { "改行", "a\nb\r\nc" },
        },
        {
            { "テスト1", "a\"b c,d" },
        },
        {
            { "テスト2", "a\"b\"c d,e,f" },
        },
        {
            { "テスト3", "a\\\"b\\ c\\,d" },
        },
        {
            { "テスト4", "a\\\"b\\\"c\\ d,e,f" },
        },
    };
    public static int size() {
        return data.length;
    }
    public static List<List<String>> get(int idx) {
        List<List<String>> res = new ArrayList<>();
        for (int i = 0; i < data[idx].length; ++i) {
            res.add(new ArrayList<>(Arrays.asList(data[idx][i])));
        }
        return res;
    }
}

For Java, I also prepare a small library equivalent to PHP's file_get_contents.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.IOException;

public class Utils {
    public static String getFileContents(String filename, String fileEncoding) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream) even on error
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(filename), fileEncoding))) {
            char[] buf = new char[1024];
            int len;
            // read() returns -1 at end of stream
            while ((len = reader.read(buf)) != -1) {
                sb.append(buf, 0, len);
            }
        }
        return sb.toString();
    }
}
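
As a side note, the same helper can probably be written more compactly with java.nio.file. The sketch below assumes Java 7 or later; the class name NioUtils is made up for illustration and is not used anywhere else in this article.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical alternative to Utils; the name NioUtils is an assumption.
final class NioUtils {
    // Reads the whole file into memory and decodes it with the named charset.
    static String getFileContents(String filename, String fileEncoding) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filename));
        return new String(bytes, Charset.forName(fileEncoding));
    }

    public static void main(String[] args) throws IOException {
        // Write a small CSV-like file, read it back, then clean up.
        Path tmp = Files.createTempFile("nio-utils", ".csv");
        try {
            Files.write(tmp, "a,\"b\nc\"\r\n".getBytes(StandardCharsets.UTF_8));
            System.out.print(getFileContents(tmp.toString(), "UTF-8"));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Like the Reader-based version, this keeps line endings exactly as they are in the file, which matters for the comparisons below.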

fgetcsv/fputcsv (PHP 7.4.0 and later)

Why 7.4.0 and later? Because I want to pass an empty string as the escape argument of fputcsv. Before 7.4.0, doing so emits a warning saying that escape must be a single character.

<?php

mb_internal_encoding('UTF-8');

require_once('CsvTestData.php');

for ($i = 0; $i < CsvTestData::size(); ++$i) {
    $filename = "test{$i}.csv";

    $rows = CsvTestData::get($i);

    $original = $rows;

    // Write
    $fp = fopen($filename, "w");
    foreach ($rows as $row) {
        foreach ($row as $key => $val) {
            $row[$key] = mb_convert_encoding($val, 'SJIS-win', 'UTF-8');
        }
        fputcsv($fp, $row, ',', '"', '');
    }
    fflush($fp);
    fclose($fp);

    // File contents
    printf("[%d]\n", $i);
    printf("file     = %s\n", var_export(mb_convert_encoding(file_get_contents($filename), 'UTF-8', 'SJIS-win'), true));

    // Read
    $fp = fopen($filename, "r");
    for ($j = 0; ($row = fgetcsv($fp, 0, ',', '"', '')); ++$j) {
        foreach ($row as $key => $val) {
            $row[$key] = mb_convert_encoding($val, 'UTF-8', 'SJIS-win');
        }
        printf("original = %s\n",
            var_export($original[$j], true));
        printf("row      = %s\n",
            var_export($row, true));
        printf("result   = %s\n",
            ($row === $original[$j] ? "O" : "X"));
    }
    fclose($fp);
}

Execution results.

[0]
file     = 'なんてことない文字列,"abc!#$%&\'()-=^~@`[]{};+:*,.<>/?_123"
'
original = array (
  0 => 'なんてことない文字列',
  1 => 'abc!#$%&\'()-=^~@`[]{};+:*,.<>/?_123',
)
row      = array (
  0 => 'なんてことない文字列',
  1 => 'abc!#$%&\'()-=^~@`[]{};+:*,.<>/?_123',
)
result   = O
[1]
file     = '空文字列,
'
original = array (
  0 => '空文字列',
  1 => '',
)
row      = array (
  0 => '空文字列',
  1 => '',
)
result   = O
[2]
file     = 'カンマ,","
'
original = array (
  0 => 'カンマ',
  1 => ',',
)
row      = array (
  0 => 'カンマ',
  1 => ',',
)
result   = O
[3]
file     = 'ダブルクォーテーション,""""
'
original = array (
  0 => 'ダブルクォーテーション',
  1 => '"',
)
row      = array (
  0 => 'ダブルクォーテーション',
  1 => '"',
)
result   = O
[4]
file     = 'バックスラッシュ,\\
'
original = array (
  0 => 'バックスラッシュ',
  1 => '\\',
)
row      = array (
  0 => 'バックスラッシュ',
  1 => '\\',
)
result   = O
[5]
file     = '空白文字," bbb "
'
original = array (
  0 => '空白文字',
  1 => ' bbb ',
)
row      = array (
  0 => '空白文字',
  1 => ' bbb ',
)
result   = O
[6]
file     = '改行,"a
b
c"
'
original = array (
  0 => '改行',
  1 => 'a
b
c',
)
row      = array (
  0 => '改行',
  1 => 'a
b
c',
)
result   = O
[7]
file     = 'テスト1,"a""b c,d"
'
original = array (
  0 => 'テスト1',
  1 => 'a"b c,d',
)
row      = array (
  0 => 'テスト1',
  1 => 'a"b c,d',
)
result   = O
[8]
file     = 'テスト2,"a""b""c d,e,f"
'
original = array (
  0 => 'テスト2',
  1 => 'a"b"c d,e,f',
)
row      = array (
  0 => 'テスト2',
  1 => 'a"b"c d,e,f',
)
result   = O
[9]
file     = 'テスト3,"a\\""b\\ c\\,d"
'
original = array (
  0 => 'テスト3',
  1 => 'a\\"b\\ c\\,d',
)
row      = array (
  0 => 'テスト3',
  1 => 'a\\"b\\ c\\,d',
)
result   = O
[10]
file     = 'テスト4,"a\\""b\\""c\\ d,e,f"
'
original = array (
  0 => 'テスト4',
  1 => 'a\\"b\\"c\\ d,e,f',
)
row      = array (
  0 => 'テスト4',
  1 => 'a\\"b\\"c\\ d,e,f',
)
result   = O

All tests pass.
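
The file contents above show that, with an empty escape, a double quote inside a field is written as two double quotes and a backslash is left alone. A minimal Java sketch of that quoting rule follows (Java because the remaining sections use it). The class name CsvFieldQuoter and the exact set of characters that trigger quoting are assumptions chosen to match the output above; this is not PHP's implementation.

```java
// Hypothetical helper for illustration only; not PHP's actual implementation.
final class CsvFieldQuoter {
    // Assumed rule: quote a field when it contains a comma, a double quote,
    // CR, LF, a tab or a space. Inside quotes, each double quote is doubled.
    static String quote(String field) {
        boolean needsQuotes = false;
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == ',' || c == '"' || c == '\r' || c == '\n' || c == '\t' || c == ' ') {
                needsQuotes = true;
                break;
            }
        }
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Inverse of quote(): strip the surrounding quotes and undo the doubling.
    static String unquote(String raw) {
        if (raw.length() >= 2 && raw.charAt(0) == '"' && raw.charAt(raw.length() - 1) == '"') {
            return raw.substring(1, raw.length() - 1).replace("\"\"", "\"");
        }
        return raw;
    }

    public static void main(String[] args) {
        String[] samples = { "", ",", "\"", "\\", " bbb ", "a\"b c,d", "a\\\"b\\ c\\,d" };
        for (String s : samples) {
            String q = quote(s);
            // Every sample should survive a quote/unquote round trip.
            System.out.println(q + " -> " + unquote(q).equals(s));
        }
    }
}
```

Since the backslash has no special meaning here, a value such as a\"b is stored as "a\""b" and reads back unchanged.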

Super CSV (Java)

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.util.ArrayList;
import java.util.List;

import org.supercsv.io.CsvListReader;
import org.supercsv.io.CsvListWriter;
import org.supercsv.io.ICsvListReader;
import org.supercsv.io.ICsvListWriter;
import org.supercsv.prefs.CsvPreference;

public class Main {
    private static final String FILE_ENCODING = "Windows-31J";

    public static void main(String[] args) throws IOException {
        for (int i = 0; i < CsvTestData.size(); ++i) {
            String filename = "test" + i + ".csv";

            List<List<String>> original = CsvTestData.get(i);

            // Write
            writeCsv(original, filename);

            // File contents
            String fileContents = Utils.getFileContents(filename, FILE_ENCODING);

            // Read
            List<List<String>> rows = new ArrayList<>();
            try {
                rows = readCsv(filename);
            } catch (Exception e) {
                e.printStackTrace();
            }

            System.out.printf("[%d]%n", i);
            System.out.printf("file     = '%s'%n", fileContents);
            if (original.size() == rows.size()) {
                for (int j = 0; j < rows.size(); ++j) {
                    System.out.printf("original = %s%n", original.get(j));
                    System.out.printf("row      = %s%n", rows.get(j));
                    System.out.printf("result   = %s%n", (rows.get(j).equals(original.get(j)) ? "O" : "X"));
                }
            } else {
                System.out.printf("original = %s%n", original);
                System.out.printf("row      = %s%n", rows);
                System.out.printf("result   = %s%n", "X");
            }
        }
    }
    private static void writeCsv(List<List<String>> rows, String filename) throws IOException {
        CsvPreference csvPref = new CsvPreference.Builder(
            CsvPreference.STANDARD_PREFERENCE).surroundingSpacesNeedQuotes(true).build();

        ICsvListWriter writer = new CsvListWriter(
            new OutputStreamWriter(new FileOutputStream(filename), FILE_ENCODING),
            csvPref);
        for (List<String> row : rows) {
            writer.write(row);
        }
        writer.flush();
        writer.close();
    }
    private static List<List<String>> readCsv(String filename) throws IOException {
        CsvPreference csvPref = new CsvPreference.Builder(
            CsvPreference.STANDARD_PREFERENCE).surroundingSpacesNeedQuotes(true).build();
        ICsvListReader reader = new CsvListReader(
            new BufferedReader(new InputStreamReader(new FileInputStream(filename), FILE_ENCODING)), csvPref);

        List<List<String>> res = new ArrayList<>();
        List<String> row;
        while ((row = reader.read()) != null) {
            res.add(row);
        }
        reader.close();

        return res;
    }
}

Execution results.

[0]
file     = 'なんてことない文字列,"abc!#$%&'()-=^~@`[]{};+:*,.<>/?_123"
'
original = [なんてことない文字列, abc!#$%&'()-=^~@`[]{};+:*,.<>/?_123]
row      = [なんてことない文字列, abc!#$%&'()-=^~@`[]{};+:*,.<>/?_123]
result   = O
[1]
file     = '空文字列,
'
original = [空文字列, ]
row      = [空文字列, null]
result   = X
[2]
file     = 'カンマ,","
'
original = [カンマ, ,]
row      = [カンマ, ,]
result   = O
[3]
file     = 'ダブルクォーテーション,""""
'
original = [ダブルクォーテーション, "]
row      = [ダブルクォーテーション, "]
result   = O
[4]
file     = 'バックスラッシュ,\
'
original = [バックスラッシュ, \]
row      = [バックスラッシュ, \]
result   = O
[5]
file     = '空白文字," bbb "
'
original = [空白文字,  bbb ]
row      = [空白文字,  bbb ]
result   = O
[6]
file     = '改行,"a
b
c"
'
original = [改行, a
b
c]
row      = [改行, a
b
c]
result   = X
[7]
file     = 'テスト1,"a""b c,d"
'
original = [テスト1, a"b c,d]
row      = [テスト1, a"b c,d]
result   = O
[8]
file     = 'テスト2,"a""b""c d,e,f"
'
original = [テスト2, a"b"c d,e,f]
row      = [テスト2, a"b"c d,e,f]
result   = O
[9]
file     = 'テスト3,"a\""b\ c\,d"
'
original = [テスト3, a\"b\ c\,d]
row      = [テスト3, a\"b\ c\,d]
result   = O
[10]
file     = 'テスト4,"a\""b\""c\ d,e,f"
'
original = [テスト4, a\"b\"c\ d,e,f]
row      = [テスト4, a\"b\"c\ d,e,f]
result   = O

I would like to say that all tests pass, but it is a pity that, by design, an empty string comes back as null.
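
If the distinction between null and an empty string does not matter to the caller, one possible workaround is to map null back to an empty string after reading. The sketch below is plain Java and does not touch Super CSV itself; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical post-processing helper; not part of Super CSV.
final class NullToEmpty {
    // Returns a copy of the row in which every null column is replaced by "".
    static List<String> normalize(List<String> row) {
        List<String> out = new ArrayList<>(row.size());
        for (String value : row) {
            out.add(value == null ? "" : value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("label", "");
        List<String> readBack = Arrays.asList("label", null); // what the reader returned
        System.out.println(readBack.equals(original));            // false
        System.out.println(normalize(readBack).equals(original)); // true
    }
}
```

The obvious cost is that a column that really was null can no longer be told apart from an empty one.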

The line-ending codes appear to be converted when the data is written to the file.

$ od -cx test6.csv 
0000000 211 374 215   s   ,   "   a  \r  \n   b  \r  \n   c   "  \r  \n
           fc89    738d    222c    0d61    620a    0a0d    2263    0a0d
0000020

The value "a\nb\r\nc" has become "a\r\nb\r\nc". On top of that, because this run was again on CentOS 7, it is converted to "a\nb\nc" when read back.
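
When only the text matters and not the exact line-ending bytes, the comparison can be made after normalizing both sides to LF. This is a minimal sketch under that assumption; the helper name toLf is hypothetical and the three strings are the values discussed above.

```java
// Hypothetical helper for comparing values regardless of line-ending style.
final class LineEndings {
    // Converts CRLF and lone CR to LF.
    static String toLf(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String original = "a\nb\r\nc";  // value in the test data
        String written = "a\r\nb\r\nc"; // value found in the file
        String readBack = "a\nb\nc";    // value returned by the reader
        System.out.println(original.equals(readBack));             // false
        System.out.println(toLf(original).equals(toLf(readBack))); // true
        System.out.println(toLf(written).equals(toLf(readBack)));  // true
    }
}
```

This only hides the difference in the comparison; the file on disk still contains the converted line endings.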

Super CSV Annotation

import com.github.mygreen.supercsv.annotation.CsvBean;
import com.github.mygreen.supercsv.annotation.CsvColumn;

@CsvBean(header=false)
public class HogeBean {
    @CsvColumn(number=1)
    private String fieldA;
    @CsvColumn(number=2)
    private String fieldB;

    public HogeBean() {
    }

    public String getFieldA() {
        return fieldA;
    }
    public void setFieldA(String fieldA) {
        this.fieldA = fieldA;
    }

    public String getFieldB() {
        return fieldB;
    }
    public void setFieldB(String fieldB) {
        this.fieldB = fieldB;
    }

    public boolean equals(Object o) {
        if (o instanceof HogeBean) {
            HogeBean that = (HogeBean)o;
            return (this.fieldA == null && that.fieldA == null
                || this.fieldA != null && this.fieldA.equals(that.fieldA))
                && (this.fieldB == null && that.fieldB == null
                    || this.fieldB != null && this.fieldB.equals(that.fieldB));
        }
        return false;
    }
    public String toString() {
        return "[" + fieldA + "][" + fieldB + "]";
    }
}
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.util.ArrayList;
import java.util.List;

import com.github.mygreen.supercsv.io.CsvAnnotationBeanReader;
import com.github.mygreen.supercsv.io.CsvAnnotationBeanWriter;

import org.supercsv.prefs.CsvPreference;

public class Main {
    private static final String FILE_ENCODING = "Windows-31J";

    public static void main(String[] args) throws IOException {
        for (int i = 0; i < CsvTestData.size(); ++i) {
            String filename = "test" + i + ".csv";

            List<List<String>> tmpOriginal = CsvTestData.get(i);
            List<HogeBean> original = new ArrayList<>();
            for (List<String> list : tmpOriginal) {
                HogeBean hoge = new HogeBean();
                hoge.setFieldA(list.get(0));
                hoge.setFieldB(list.get(1));
                original.add(hoge);
            }

            // Write
            writeCsv(original, filename);

            // File contents
            String fileContents = Utils.getFileContents(filename, FILE_ENCODING);

            // Read
            List<HogeBean> rows = new ArrayList<>();
            try {
                rows = readCsv(filename);
            } catch (Exception e) {
                e.printStackTrace();
            }

            System.out.printf("[%d]%n", i);
            System.out.printf("file     = '%s'%n", fileContents);
            if (original.size() == rows.size()) {
                for (int j = 0; j < rows.size(); ++j) {
                    System.out.printf("original = %s%n", original.get(j));
                    System.out.printf("row      = %s%n", rows.get(j));
                    System.out.printf("result   = %s%n", (rows.get(j).equals(original.get(j)) ? "O" : "X"));
                }
            } else {
                System.out.printf("original = %s%n", original);
                System.out.printf("row      = %s%n", rows);
                System.out.printf("result   = %s%n", "X");
            }
        }
    }
    private static void writeCsv(List<HogeBean> rows, String filename) throws IOException {
        CsvPreference csvPref = new CsvPreference.Builder(
            CsvPreference.STANDARD_PREFERENCE).surroundingSpacesNeedQuotes(true).build();
        CsvAnnotationBeanWriter<HogeBean> writer = new CsvAnnotationBeanWriter<>(
            HogeBean.class,
            new OutputStreamWriter(new FileOutputStream(filename), FILE_ENCODING),
            csvPref);
        for (HogeBean hoge : rows) {
            writer.write(hoge);
        }
        writer.flush();
        writer.close();
    }
    private static List<HogeBean> readCsv(String filename) throws IOException {
        CsvPreference csvPref = new CsvPreference.Builder(
            CsvPreference.STANDARD_PREFERENCE).surroundingSpacesNeedQuotes(true).build();
        CsvAnnotationBeanReader<HogeBean> reader = new CsvAnnotationBeanReader<>(
            HogeBean.class,
            new BufferedReader(new InputStreamReader(new FileInputStream(filename), FILE_ENCODING)),
            csvPref);

        List<HogeBean> res = new ArrayList<>();
        HogeBean hoge;
        while ((hoge = reader.read()) != null) {
            res.add(hoge);
        }
        reader.close();

        return res;
    }
}

Execution results.

[0]
file     = 'なんてことない文字列,"abc!#$%&'()-=^~@`[]{};+:*,.<>/?_123"
'
original = [なんてことない文字列][abc!#$%&'()-=^~@`[]{};+:*,.<>/?_123]
row      = [なんてことない文字列][abc!#$%&'()-=^~@`[]{};+:*,.<>/?_123]
result   = O
[1]
file     = '空文字列,
'
original = [空文字列][]
row      = [空文字列][null]
result   = X
[2]
file     = 'カンマ,","
'
original = [カンマ][,]
row      = [カンマ][,]
result   = O
[3]
file     = 'ダブルクォーテーション,""""
'
original = [ダブルクォーテーション]["]
row      = [ダブルクォーテーション]["]
result   = O
[4]
file     = 'バックスラッシュ,\
'
original = [バックスラッシュ][\]
row      = [バックスラッシュ][\]
result   = O
[5]
file     = '空白文字," bbb "
'
original = [空白文字][ bbb ]
row      = [空白文字][ bbb ]
result   = O
[6]
file     = '改行,"a
b
c"
'
original = [改行][a
b
c]
row      = [改行][a
b
c]
result   = X
[7]
file     = 'テスト1,"a""b c,d"
'
original = [テスト1][a"b c,d]
row      = [テスト1][a"b c,d]
result   = O
[8]
file     = 'テスト2,"a""b""c d,e,f"
'
original = [テスト2][a"b"c d,e,f]
row      = [テスト2][a"b"c d,e,f]
result   = O
[9]
file     = 'テスト3,"a\""b\ c\,d"
'
original = [テスト3][a\"b\ c\,d]
row      = [テスト3][a\"b\ c\,d]
result   = O
[10]
file     = 'テスト4,"a\""b\""c\ d,e,f"
'
original = [テスト4][a\"b\"c\ d,e,f]
row      = [テスト4][a\"b\"c\ d,e,f]
result   = O

Like Super CSV, this one also unfortunately turns empty strings into null. The same goes for the line-ending conversion.

OrangeSignal CSV (Java)

import com.orangesignal.csv.annotation.CsvColumn;
import com.orangesignal.csv.annotation.CsvEntity;

@CsvEntity(header=false)
public class HogeBean {
    @CsvColumn(position=0)
    private String fieldA;
    @CsvColumn(position=1)
    private String fieldB;

    public HogeBean() {
    }

    public String getFieldA() {
        return fieldA;
    }
    public void setFieldA(String fieldA) {
        this.fieldA = fieldA;
    }

    public String getFieldB() {
        return fieldB;
    }
    public void setFieldB(String fieldB) {
        this.fieldB = fieldB;
    }

    public boolean equals(Object o) {
        if (o instanceof HogeBean) {
            HogeBean that = (HogeBean)o;
            return (this.fieldA == null && that.fieldA == null
                || this.fieldA != null && this.fieldA.equals(that.fieldA))
                && (this.fieldB == null && that.fieldB == null
                    || this.fieldB != null && this.fieldB.equals(that.fieldB));
        }
        return false;
    }
    public String toString() {
        return "[" + fieldA + "][" + fieldB + "]";
    }
}
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.util.ArrayList;
import java.util.List;

import com.orangesignal.csv.annotation.CsvColumnException;
import com.orangesignal.csv.CsvConfig;
import com.orangesignal.csv.CsvReader;
import com.orangesignal.csv.CsvWriter;
import com.orangesignal.csv.io.CsvEntityReader;
import com.orangesignal.csv.io.CsvEntityWriter;

public class Main {
    private static final String FILE_ENCODING = "Windows-31J";

    public static void main(String[] args) throws IOException {
        for (int i = 0; i < CsvTestData.size(); ++i) {
            String filename = "test" + i + ".csv";

            List<List<String>> tmpOriginal = CsvTestData.get(i);
            List<HogeBean> original = new ArrayList<>();
            for (List<String> list : tmpOriginal) {
                HogeBean hoge = new HogeBean();
                hoge.setFieldA(list.get(0));
                hoge.setFieldB(list.get(1));
                original.add(hoge);
            }

            // Write
            writeCsv(original, filename);

            // File contents
            String fileContents = Utils.getFileContents(filename, FILE_ENCODING);

            // Read
            List<HogeBean> rows = new ArrayList<>();
            try {
                rows = readCsv(filename);
            } catch (Exception e) {
                e.printStackTrace();
            }

            System.out.printf("[%d]%n", i);
            System.out.printf("file     = '%s'%n", fileContents);
            if (original.size() == rows.size()) {
                for (int j = 0; j < rows.size(); ++j) {
                    System.out.printf("original = %s%n", original.get(j));
                    System.out.printf("row      = %s%n", rows.get(j));
                    System.out.printf("result   = %s%n", (rows.get(j).equals(original.get(j)) ? "O" : "X"));
                }
            } else {
                System.out.printf("original = %s%n", original);
                System.out.printf("row      = %s%n", rows);
                System.out.printf("result   = %s%n", "X");
            }
        }
    }
    private static void writeCsv(List<HogeBean> rows, String filename) throws IOException {
        CsvConfig cfg = new CsvConfig(',', '"', '"');
        CsvEntityWriter<HogeBean> writer = CsvEntityWriter.newInstance(new CsvWriter(
            new OutputStreamWriter(new FileOutputStream(filename), FILE_ENCODING), cfg), HogeBean.class);

        for (HogeBean hoge : rows) {
            writer.write(hoge);
        }
        writer.flush();
        writer.close();
    }
    private static List<HogeBean> readCsv(String filename) throws IOException {
        CsvConfig cfg = new CsvConfig(',', '"', '"');
        CsvEntityReader<HogeBean> reader = CsvEntityReader.newInstance(new CsvReader(
            new BufferedReader(new InputStreamReader(new FileInputStream(filename), FILE_ENCODING)), cfg), HogeBean.class);

        List<HogeBean> res = new ArrayList<>();
        while (true) {
            HogeBean hoge;
            try {
                if ((hoge = reader.read()) == null) {
                    break;
                }
                res.add(hoge);
            } catch (CsvColumnException e) {
                e.printStackTrace();
                continue;
            } catch (RuntimeException e) {
                // Best effort: log any other runtime error and stop reading
                e.printStackTrace();
                break;
            }
        }
        reader.close();

        return res;
    }
}

Execution results.

java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:659)
        at java.util.ArrayList.get(ArrayList.java:435)
        at com.orangesignal.csv.io.CsvEntityReader.convert(CsvEntityReader.java:300)
        at com.orangesignal.csv.io.CsvEntityReader.read(CsvEntityReader.java:198)
        at Main.readCsv(Main.java:82)
        at Main.main(Main.java:42)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
[0]
file     = '"なんてことない文字列","abc!#$%&'()-=^~@`[]{};+:*,.<>/?_123"
'
original = [なんてことない文字列][abc!#$%&'()-=^~@`[]{};+:*,.<>/?_123]
row      = [なんてことない文字列][abc!#$%&'()-=^~@`[]{};+:*,.<>/?_123]
result   = O
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:659)
        at java.util.ArrayList.get(ArrayList.java:435)
        at com.orangesignal.csv.io.CsvEntityReader.convert(CsvEntityReader.java:300)
        at com.orangesignal.csv.io.CsvEntityReader.read(CsvEntityReader.java:198)
        at Main.readCsv(Main.java:82)
        at Main.main(Main.java:42)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
[1]
file     = '"空文字列",""
'
original = [空文字列][]
row      = [空文字列][]
result   = O
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:659)
        at java.util.ArrayList.get(ArrayList.java:435)
        at com.orangesignal.csv.io.CsvEntityReader.convert(CsvEntityReader.java:300)
        at com.orangesignal.csv.io.CsvEntityReader.read(CsvEntityReader.java:198)
        at Main.readCsv(Main.java:82)
        at Main.main(Main.java:42)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
[2]
file     = '"カンマ",","
'
original = [カンマ][,]
row      = [カンマ][,]
result   = O
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:659)
        at java.util.ArrayList.get(ArrayList.java:435)
        at com.orangesignal.csv.io.CsvEntityReader.convert(CsvEntityReader.java:300)
        at com.orangesignal.csv.io.CsvEntityReader.read(CsvEntityReader.java:198)
        at Main.readCsv(Main.java:82)
        at Main.main(Main.java:42)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
[3]
file     = '"ダブルクォーテーション",""""
'
original = [ダブルクォーテーション]["]
row      = [ダブルクォーテーション]["]
result   = O
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:659)
        at java.util.ArrayList.get(ArrayList.java:435)
        at com.orangesignal.csv.io.CsvEntityReader.convert(CsvEntityReader.java:300)
        at com.orangesignal.csv.io.CsvEntityReader.read(CsvEntityReader.java:198)
        at Main.readCsv(Main.java:82)
        at Main.main(Main.java:42)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
[4]
file     = '"バックスラッシュ","\"
'
original = [バックスラッシュ][\]
row      = [バックスラッシュ][\]
result   = O
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:659)
        at java.util.ArrayList.get(ArrayList.java:435)
        at com.orangesignal.csv.io.CsvEntityReader.convert(CsvEntityReader.java:300)
        at com.orangesignal.csv.io.CsvEntityReader.read(CsvEntityReader.java:198)
        at Main.readCsv(Main.java:82)
        at Main.main(Main.java:42)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
[5]
file     = '"空白文字"," bbb "
'
original = [空白文字][ bbb ]
row      = [空白文字][ bbb ]
result   = O
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:659)
        at java.util.ArrayList.get(ArrayList.java:435)
        at com.orangesignal.csv.io.CsvEntityReader.convert(CsvEntityReader.java:300)
        at com.orangesignal.csv.io.CsvEntityReader.read(CsvEntityReader.java:198)
        at Main.readCsv(Main.java:82)
        at Main.main(Main.java:42)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
[6]
file     = '"改行","a
b
c"
'
original = [改行][a
b
c]
row      = [改行][a
b
c]
result   = O
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:659)
        at java.util.ArrayList.get(ArrayList.java:435)
        at com.orangesignal.csv.io.CsvEntityReader.convert(CsvEntityReader.java:300)
        at com.orangesignal.csv.io.CsvEntityReader.read(CsvEntityReader.java:198)
        at Main.readCsv(Main.java:82)
        at Main.main(Main.java:42)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
[7]
file     = '"テスト1","a""b c,d"
'
original = [テスト1][a"b c,d]
row      = [テスト1][a"b c,d]
result   = O
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:659)
        at java.util.ArrayList.get(ArrayList.java:435)
        at com.orangesignal.csv.io.CsvEntityReader.convert(CsvEntityReader.java:300)
        at com.orangesignal.csv.io.CsvEntityReader.read(CsvEntityReader.java:198)
        at Main.readCsv(Main.java:82)
        at Main.main(Main.java:42)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
[8]
file     = '"テスト2","a""b""c d,e,f"
'
original = [テスト2][a"b"c d,e,f]
row      = [テスト2][a"b"c d,e,f]
result   = O
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:659)
        at java.util.ArrayList.get(ArrayList.java:435)
        at com.orangesignal.csv.io.CsvEntityReader.convert(CsvEntityReader.java:300)
        at com.orangesignal.csv.io.CsvEntityReader.read(CsvEntityReader.java:198)
        at Main.readCsv(Main.java:82)
        at Main.main(Main.java:42)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
[9]
file     = '"テスト3","a\""b\ c\,d"
'
original = [テスト3][a\"b\ c\,d]
row      = [テスト3][a\"b\ c\,d]
result   = O
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:659)
        at java.util.ArrayList.get(ArrayList.java:435)
        at com.orangesignal.csv.io.CsvEntityReader.convert(CsvEntityReader.java:300)
        at com.orangesignal.csv.io.CsvEntityReader.read(CsvEntityReader.java:198)
        at Main.readCsv(Main.java:82)
        at Main.main(Main.java:42)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
[10]
file     = '"テスト4","a\""b\""c\ d,e,f"
'
original = [テスト4][a\"b\"c\ d,e,f]
row      = [テスト4][a\"b\"c\ d,e,f]
result   = O

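Looking only at the results, everything appears OK, but the biggest drawback is that an IndexOutOfBoundsException is thrown. What kind of library fails with an error when reading a CSV file that it wrote itself?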
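
The stack traces show ArrayList.get being called with index 1 on a list of size 1 inside CsvEntityReader.convert, which suggests that a row with a single value reached the positional lookup. Why such a row appears is not confirmed here. As a rough sketch of the kind of guard that avoids this class of error when mapping columns by position in one's own code (the names below are hypothetical and not part of OrangeSignal CSV):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of OrangeSignal CSV.
final class SafeColumns {
    // Returns the value at the given position, or null when the row is too short.
    static String valueAt(List<String> values, int position) {
        return (position >= 0 && position < values.size()) ? values.get(position) : null;
    }

    public static void main(String[] args) {
        List<String> shortRow = Arrays.asList(""); // a row with only one value
        System.out.println(valueAt(shortRow, 0).isEmpty()); // true
        System.out.println(valueAt(shortRow, 1));           // null
    }
}
```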

opencsv (Java)

  • HogeBean.java
import com.opencsv.bean.CsvBindByPosition;

public class HogeBean {
    @CsvBindByPosition(position=0)
    private String fieldA;
    @CsvBindByPosition(position=1)
    private String fieldB;

    public HogeBean() {
    }

    public String getFieldA() {
        return fieldA;
    }
    public void setFieldA(String fieldA) {
        this.fieldA = fieldA;
    }

    public String getFieldB() {
        return fieldB;
    }
    public void setFieldB(String fieldB) {
        this.fieldB = fieldB;
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof HogeBean) {
            HogeBean that = (HogeBean)o;
            // Null-safe comparison of both fields.
            return java.util.Objects.equals(this.fieldA, that.fieldA)
                && java.util.Objects.equals(this.fieldB, that.fieldB);
        }
        return false;
    }
    // Keep hashCode consistent with equals.
    @Override
    public int hashCode() {
        return java.util.Objects.hash(fieldA, fieldB);
    }
    @Override
    public String toString() {
        return "[" + fieldA + "][" + fieldB + "]";
    }
}
  • Main.java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

import com.opencsv.bean.CsvToBean;
import com.opencsv.bean.CsvToBeanBuilder;
import com.opencsv.bean.StatefulBeanToCsv;
import com.opencsv.bean.StatefulBeanToCsvBuilder;
import com.opencsv.exceptions.CsvDataTypeMismatchException;
import com.opencsv.exceptions.CsvRequiredFieldEmptyException;

public class Main {
    private static final String FILE_ENCODING = "Windows-31J";

    public static void main(String[] args) throws IOException {
        for (int i = 0; i < CsvTestData.size(); ++i) {
            String filename = "test" + i + ".csv";

            List<List<String>> tmpOriginal = CsvTestData.get(i);
            List<HogeBean> original = new ArrayList<>();
            for (List<String> list : tmpOriginal) {
                HogeBean hoge = new HogeBean();
                hoge.setFieldA(list.get(0));
                hoge.setFieldB(list.get(1));
                original.add(hoge);
            }

            // Write
            writeCsv(original, filename);

            // File contents
            String fileContents = Utils.getFileContents(filename, FILE_ENCODING);

            // Read
            List<HogeBean> rows = new ArrayList<>();
            try {
                rows = readCsv(filename);
            } catch (Exception e) {
                e.printStackTrace();
            }

            System.out.printf("[%d]%n", i);
            System.out.printf("file     = '%s'%n", fileContents);
            if (original.size() == rows.size()) {
                for (int j = 0; j < rows.size(); ++j) {
                    System.out.printf("original = %s%n", original.get(j));
                    System.out.printf("row      = %s%n", rows.get(j));
                    System.out.printf("result   = %s%n", (rows.get(j).equals(original.get(j)) ? "O" : "X"));
                }
            } else {
                System.out.printf("original = %s%n", original);
                System.out.printf("row      = %s%n", rows);
                System.out.printf("result   = %s%n", "X");
            }
        }
    }
    private static void writeCsv(List<HogeBean> rows, String filename) throws IOException {
        Writer w = new OutputStreamWriter(new FileOutputStream(filename), FILE_ENCODING); // Kept in a variable so that it can be flushed.
        StatefulBeanToCsv<HogeBean> writer = new StatefulBeanToCsvBuilder<HogeBean>(
                w
            ).build();

        for (HogeBean hoge : rows) {
            try {
                writer.write(hoge);
            } catch (CsvDataTypeMismatchException|CsvRequiredFieldEmptyException e) {
                e.printStackTrace();
            }
        }
        
        w.flush();
        w.close();
    }
    private static List<HogeBean> readCsv(String filename) throws IOException {
        Reader r = new BufferedReader(new InputStreamReader(new FileInputStream(filename), FILE_ENCODING));
        CsvToBean<HogeBean> reader = new CsvToBeanBuilder<HogeBean>(
                r
            ).withType(HogeBean.class).build();

        List<HogeBean> res = new ArrayList<>();
        try {
            for (HogeBean hoge : reader) {
                res.add(hoge);
            }
        } catch (RuntimeException e) {
            e.printStackTrace();
        } finally {
            r.close();
        }

        return res;
    }
}

Execution result:

[0]
file     = '"なんてことない文字列","abc!#$%&'()-=^~@`[]{};+:*,.<>/?_123"
'
original = [なんてことない文字列][abc!#$%&'()-=^~@`[]{};+:*,.<>/?_123]
row      = [なんてことない文字列][abc!#$%&'()-=^~@`[]{};+:*,.<>/?_123]
result   = O
[1]
file     = '"空文字列",""
'
original = [空文字列][]
row      = [空文字列][]
result   = O
[2]
file     = '"カンマ",","
'
original = [カンマ][,]
row      = [カンマ][,]
result   = O
[3]
file     = '"ダブルクォーテーション",""""
'
original = [ダブルクォーテーション]["]
row      = [ダブルクォーテーション]["]
result   = O
java.lang.RuntimeException: Error capturing CSV header!
        at com.opencsv.bean.CsvToBean.prepareToReadInput(CsvToBean.java:304)
        at com.opencsv.bean.CsvToBean.iterator(CsvToBean.java:322)
        at Main.readCsv(Main.java:89)
        at Main.main(Main.java:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.opencsv.exceptions.CsvMalformedLineException: Unterminated quoted field at end of CSV line. Beginning of lost text: ["
]
        at com.opencsv.CSVReader.primeNextRecord(CSVReader.java:245)
        at com.opencsv.CSVReader.flexibleRead(CSVReader.java:598)
        at com.opencsv.CSVReader.peek(CSVReader.java:574)
        at com.opencsv.bean.ColumnPositionMappingStrategy.captureHeader(ColumnPositionMappingStrategy.java:72)
        at com.opencsv.bean.CsvToBean.prepareToReadInput(CsvToBean.java:302)
        ... 9 more
[4]
file     = '"バックスラッシュ","\"
'
original = [[バックスラッシュ][\]]
row      = []
result   = X
[5]
file     = '"空白文字"," bbb "
'
original = [空白文字][ bbb ]
row      = [空白文字][ bbb ]
result   = O
[6]
file     = '"改行","a
b
c"
'
original = [改行][a
b
c]
row      = [改行][a
b
c]
result   = X
[7]
file     = '"テスト1","a""b c,d"
'
original = [テスト1][a"b c,d]
row      = [テスト1][a"b c,d]
result   = O
[8]
file     = '"テスト2","a""b""c d,e,f"
'
original = [テスト2][a"b"c d,e,f]
row      = [テスト2][a"b"c d,e,f]
result   = O
java.lang.RuntimeException: Error capturing CSV header!
        at com.opencsv.bean.CsvToBean.prepareToReadInput(CsvToBean.java:304)
        at com.opencsv.bean.CsvToBean.iterator(CsvToBean.java:322)
        at Main.readCsv(Main.java:89)
        at Main.main(Main.java:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.opencsv.exceptions.CsvMalformedLineException: Unterminated quoted field at end of CSV line. Beginning of lost text: [a""b c,d
]
        at com.opencsv.CSVReader.primeNextRecord(CSVReader.java:245)
        at com.opencsv.CSVReader.flexibleRead(CSVReader.java:598)
        at com.opencsv.CSVReader.peek(CSVReader.java:574)
        at com.opencsv.bean.ColumnPositionMappingStrategy.captureHeader(ColumnPositionMappingStrategy.java:72)
        at com.opencsv.bean.CsvToBean.prepareToReadInput(CsvToBean.java:302)
        ... 9 more
[9]
file     = '"テスト3","a\""b\ c\,d"
'
original = [[テスト3][a\"b\ c\,d]]
row      = []
result   = X
[10]
file     = '"テスト4","a\""b\""c\ d,e,f"
'
original = [テスト4][a\"b\"c\ d,e,f]
row      = [テスト4][a""b""c d,e,f]
result   = X

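With opencsv, every case that contains a backslash (the escape character) fails: [4] and [9] cannot be read back at all, and in [10] the backslashes are lost. Case [6] (line breaks) is also marked X even though the printed values look identical, so the difference is presumably in the line-break characters ("\n" versus "\r\n").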
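To illustrate how a backslash can break such a round trip, here is a minimal sketch that uses only the standard library. It is not opencsv's actual implementation; `EscapeDemo`, `quote`, `parseField`, and the `backslashIsEscape` flag are hypothetical names chosen for this example. The writer side doubles embedded quotes but leaves backslashes untouched, while the reader side, when it treats the backslash as an escape character, swallows the character that follows it, including a closing quote. The parser here is deliberately strict, so a real library may return a partially mangled value where this sketch simply reports a failure.

```java
import java.util.Optional;

final class EscapeDemo {

    /** Writer side: wrap in quotes and double embedded quotes; backslashes are left as-is. */
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /**
     * Reader side: parse a single quoted field.
     * Returns Optional.empty() when no valid closing quote is found.
     */
    static Optional<String> parseField(String quoted, boolean backslashIsEscape) {
        if (quoted.isEmpty() || quoted.charAt(0) != '"') {
            return Optional.empty();
        }
        StringBuilder sb = new StringBuilder();
        int i = 1;
        while (i < quoted.length()) {
            char c = quoted.charAt(i);
            boolean hasNext = i + 1 < quoted.length();
            if (backslashIsEscape && c == '\\' && hasNext) {
                // The escape character is dropped and the next character is taken
                // literally, even if that character is the closing quote.
                sb.append(quoted.charAt(i + 1));
                i += 2;
            } else if (c == '"' && hasNext && quoted.charAt(i + 1) == '"') {
                // A doubled quote stands for one literal quote.
                sb.append('"');
                i += 2;
            } else if (c == '"') {
                // A single quote closes the field; it is only valid at the very end.
                return hasNext ? Optional.empty() : Optional.of(sb.toString());
            } else {
                sb.append(c);
                i++;
            }
        }
        // Ran off the end without seeing a closing quote.
        return Optional.empty();
    }

    public static void main(String[] args) {
        String[] samples = { "\\", "a\"b c,d", "a\\\"b\\ c\\,d" };
        for (String s : samples) {
            String q = quote(s);
            System.out.println(q + " escape=on  -> " + parseField(q, true));
            System.out.println(q + " escape=off -> " + parseField(q, false));
        }
    }
}
```

In this sketch, the single-backslash value is written as `"\"`; with the escape flag on, the closing quote is consumed and the field never terminates, while with the flag off the value round-trips. opencsv also provides an `RFC4180Parser` class that, as far as I know, does not use an escape character; whether switching to it resolves the cases above was not verified here.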

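Summary

  • PHP
    • No particular problems
  • Super CSV / Super CSV Annotation
    • Unfortunately, empty strings are returned as null
    • Line endings are unified to "\r\n" when writing, and are converted to the platform's line ending when reading
  • OrangeSignal CSV
    • It all comes down to one drawback: an IndexOutOfBoundsException is thrown
  • opencsv
    • The handling of the escape character (backslash by default) is sloppy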
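
If the null-for-empty-string and line-ending differences are acceptable for a given use case, the round-trip check could compare values after normalizing them. The following is only a sketch under that assumption, using the standard library; `RoundTripCompare`, `normalize`, `sameValue`, and `visible` are hypothetical names and not part of any of the libraries tested. The `visible` helper makes CR and LF printable, which helps when two values look identical in the output but still compare as different.

```java
final class RoundTripCompare {

    /** Treat null as "" and unify "\r\n" and a lone "\r" to "\n" (an assumption of this sketch). */
    static String normalize(String s) {
        if (s == null) {
            return "";
        }
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Lenient comparison: equal after normalization. */
    static boolean sameValue(String original, String read) {
        return normalize(original).equals(normalize(read));
    }

    /** Replace CR and LF with the two-character sequences \r and \n for display. */
    static String visible(String s) {
        if (s == null) {
            return "<null>";
        }
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String original = "a\nb\r\nc";
        String read = "a\nb\nc";
        System.out.println(visible(original) + " vs " + visible(read));
        System.out.println("strict equal : " + original.equals(read));
        System.out.println("lenient equal: " + sameValue(original, read));
        System.out.println("null vs empty: " + sameValue("", null));
    }
}
```

Whether such leniency is appropriate depends on the data: if the exact line ending or the distinction between null and an empty string matters, the strict comparison used in the tests above is the right one.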