How to read and parse CSV file in Java
A Comma-Separated Values (CSV) file is just a normal plain-text file, store data in column by column, and split it by a separator (e.g normally it is a comma “,”).
For example :
1,US,United States 2,MY,Malaysia 3,AU,Australia
or
"1","US","United States" "2","MY","Malaysia" "3","AU","Australia"
Note
Read this RFC4180 document for Comma-Separated Values (CSV) format.
In a CSV file, normally there are two issues:
1. The field containing separator, for example, separator is a comma, and the field containing commas:
"aaa","b,bb","ccc"
2. The double-quotes are used to enclose fields, and the field containing double-quotes. To solve it, a double-quote appearing inside a field must be escaped by preceding it with another double quote (RFC4180)
"aaa","b""bb","ccc"
In this tutorial, we show you three examples to read, parse and print out the values from a CSV file.
Simple solution to parse a simple formatted CSV file.
Advance solution to parse a weird formatted CSV file (field containing separator or double-quotes)
Third party solution, OpenCSV example.
1. Simple Solution
If you are sure the CSV files doesn’t contain “separator or double-quotes”, just use the standard split() to parse the CSV file.
1.1 Review a simple CSV file
/Users/mkyong/csv/country.csv
"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia" "1.0.1.0","1.0.3.255","16777472","16778239","CN","China" "1.0.4.0","1.0.7.255","16778240","16779263","AU","Australia" "1.0.8.0","1.0.15.255","16779264","16781311","CN","China" "1.0.16.0","1.0.31.255","16781312","16785407","JP","Japan" "1.0.32.0","1.0.63.255","16785408","16793599","CN","China" "1.0.64.0","1.0.127.255","16793600","16809983","JP","Japan" "1.0.128.0","1.0.255.255","16809984","16842751","TH","Thailand"
1.2 No magic, just read above text file, and splits it by a comma separator.
CSVReader.csv
package com.mkyong.csv; import java.io.BufferedReader; import java.io.FileNotFoundException; import java.io.FileReader; import java.io.IOException; public class CSVReader { public static void main(String[] args) { String csvFile = "/Users/mkyong/csv/country.csv"; BufferedReader br = null; String line = ""; String cvsSplitBy = ","; try { br = new BufferedReader(new FileReader(csvFile)); while ((line = br.readLine()) != null) { // use comma as separator String[] country = line.split(cvsSplitBy); System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]"); } catch (FileNotFoundException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } finally { if (br != null) { try { br.close(); } catch (IOException e) { e.printStackTrace();
1.3 For JDK 7 and above, uses try-resources.
CSVReader.csv
package com.mkyong.csv; import java.io.BufferedReader; import java.io.FileReader; import java.io.IOException; public class CSVReader { public static void main(String[] args) { String csvFile = "/Users/mkyong/csv/country.csv"; String line = ""; String cvsSplitBy = ","; try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) { while ((line = br.readLine()) != null) { // use comma as separator String[] country = line.split(cvsSplitBy); System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]"); } catch (IOException e) { e.printStackTrace();
Output
Country [code= "AU" , name="Australia"] Country [code= "CN" , name="China"] Country [code= "AU" , name="Australia"] Country [code= "CN" , name="China"] Country [code= "JP" , name="Japan"] Country [code= "CN" , name="China"] Country [code= "JP" , name="Japan"] Country [code= "TH" , name="Thailand"]
2. Advance Solution
This solution will solve the field containing “separator or double-quotes” issue, and also support the custom separator and custom enclosed field. Review the following CSV parsing example and also the JUnit test cases to understand how it works.
Note
Again, if a double-quote appearing inside a field must be escaped by preceding it with another double quote, for example
"aaa","b""bb","ccc"
2.1 Review another CSV file
/Users/mkyong/csv/country2.csv
10,AU,Australia 11,AU,Aus""tralia "12","AU","Australia" "13","AU","Aus""tralia" "14","AU","Aus,tralia"
2.2 The below example is inspired by this article – “Only Class You Need for CSV Files” (with some fixed to support other features, read “fixed comments”) and also this third party OpenCSV library.
CSVUtils.java
package com.mkyong.utils; import java.io.File; import java.util.ArrayList; import java.util.List; import java.util.Scanner; public class CSVUtils { private static final char DEFAULT_SEPARATOR = ','; private static final char DEFAULT_QUOTE = '"'; public static void main(String[] args) throws Exception { String csvFile = "/Users/mkyong/csv/country2.csv"; Scanner scanner = new Scanner(new File(csvFile)); while (scanner.hasNext()) { List<String> line = parseLine(scanner.nextLine()); System.out.println("Country [id= " + line.get(0) + ", code= " + line.get(1) + " , name=" + line.get(2) + "]"); scanner.close(); public static List<String> parseLine(String cvsLine) { return parseLine(cvsLine, DEFAULT_SEPARATOR, DEFAULT_QUOTE); public static List<String> parseLine(String cvsLine, char separators) { return parseLine(cvsLine, separators, DEFAULT_QUOTE); public static List<String> parseLine(String cvsLine, char separators, char customQuote) { List<String> result = new ArrayList<>(); //if empty, return! if (cvsLine == null && cvsLine.isEmpty()) { return result; if (customQuote == ' ') { customQuote = DEFAULT_QUOTE; if (separators == ' ') { separators = DEFAULT_SEPARATOR; StringBuffer curVal = new StringBuffer(); boolean inQuotes = false; boolean startCollectChar = false; boolean doubleQuotesInColumn = false; char[] chars = cvsLine.toCharArray(); for (char ch : chars) { if (inQuotes) { startCollectChar = true; if (ch == customQuote) { inQuotes = false; doubleQuotesInColumn = false; } else { //Fixed : allow "" in custom quote enclosed if (ch == '\"') { if (!doubleQuotesInColumn) { curVal.append(ch); doubleQuotesInColumn = true; } else { curVal.append(ch); } else { if (ch == customQuote) { inQuotes = true; //Fixed : allow "" in empty quote enclosed if (chars[0] != '"' && customQuote == '\"') { curVal.append('"'); //double quotes in column will hit this! if (startCollectChar) { curVal.append('"'); } else if (ch == separators) { result.add(curVal.toString()); curVal = new StringBuffer(); startCollectChar = false; } else if (ch == '\r') { //ignore LF characters continue; } else if (ch == '\n') { //the end, break! break; } else { curVal.append(ch); result.add(curVal.toString()); return result;
Output
Country [id= 10, code= AU , name=Australia] Country [id= 11, code= AU , name=Aus"tralia] Country [id= 12, code= AU , name=Australia] Country [id= 13, code= AU , name=Aus"tralia] Country [id= 14, code= AU , name=Aus,tralia]
3.3 Review the following unit test, it test the “commas and double quotes” issue.
CSVUtilsTest.java
package com.mkyong.csv; import com.mkyong.utils.CSVUtils; import org.hamcrest.core.IsNull; import org.junit.Test; import java.util.List; import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.core.Is.is; public class CSVUtilsTest { @Test public void test_no_quote() { String line = "10,AU,Australia"; List<String> result = CSVUtils.parseLine(line); assertThat(result, IsNull.notNullValue()); assertThat(result.size(), is(3)); assertThat(result.get(0), is("10")); assertThat(result.get(1), is("AU")); assertThat(result.get(2), is("Australia")); @Test public void test_no_quote_but_double_quotes_in_column() throws Exception { String line = "10,AU,Aus\"\"tralia"; List<String> result = CSVUtils.parseLine(line); assertThat(result, IsNull.notNullValue()); assertThat(result.size(), is(3)); assertThat(result.get(0), is("10")); assertThat(result.get(1), is("AU")); assertThat(result.get(2), is("Aus\"tralia")); @Test public void test_double_quotes() { String line = "\"10\",\"AU\",\"Australia\""; List<String> result = CSVUtils.parseLine(line); assertThat(result, IsNull.notNullValue()); assertThat(result.size(), is(3)); assertThat(result.get(0), is("10")); assertThat(result.get(1), is("AU")); assertThat(result.get(2), is("Australia")); @Test public void test_double_quotes_but_double_quotes_in_column() { String line = "\"10\",\"AU\",\"Aus\"\"tralia\""; List<String> result = CSVUtils.parseLine(line); assertThat(result, IsNull.notNullValue()); assertThat(result.size(), is(3)); assertThat(result.get(0), is("10")); assertThat(result.get(1), is("AU")); assertThat(result.get(2), is("Aus\"tralia")); @Test public void test_double_quotes_but_comma_in_column() { String line = "\"10\",\"AU\",\"Aus,tralia\""; List<String> result = CSVUtils.parseLine(line); assertThat(result, IsNull.notNullValue()); assertThat(result.size(), is(3)); assertThat(result.get(0), is("10")); assertThat(result.get(1), is("AU")); assertThat(result.get(2), is("Aus,tralia"));
3.4 Review another unit test, it test the custom separator and custom enclosed field.
CSVUtilsTestCustom.java
package com.mkyong.csv; import com.mkyong.utils.CSVUtils; import org.hamcrest.core.IsNull; import org.junit.Test; import java.util.List; import static org.hamcrest.MatcherAssert.assertThat; import static org.hamcrest.core.Is.is; public class CSVUtilsTestCustom { @Test public void test_custom_separator() { String line = "10|AU|Australia"; List<String> result = CSVUtils.parseLine(line, '|'); assertThat(result, IsNull.notNullValue()); assertThat(result.size(), is(3)); assertThat(result.get(0), is("10")); assertThat(result.get(1), is("AU")); assertThat(result.get(2), is("Australia")); @Test public void test_custom_separator_and_quote() { String line = "'10'|'AU'|'Australia'"; List<String> result = CSVUtils.parseLine(line, '|', '\''); assertThat(result, IsNull.notNullValue()); assertThat(result.size(), is(3)); assertThat(result.get(0), is("10")); assertThat(result.get(1), is("AU")); assertThat(result.get(2), is("Australia")); @Test public void test_custom_separator_and_quote_but_custom_quote_in_column() { String line = "'10'|'AU'|'Aus|tralia'"; List<String> result = CSVUtils.parseLine(line, '|', '\''); assertThat(result, IsNull.notNullValue()); assertThat(result.size(), is(3)); assertThat(result.get(0), is("10")); assertThat(result.get(1), is("AU")); assertThat(result.get(2), is("Aus|tralia")); @Test public void test_custom_separator_and_quote_but_double_quotes_in_column() { String line = "'10'|'AU'|'Aus\"\"tralia'"; List<String> result = CSVUtils.parseLine(line, '|', '\''); assertThat(result, IsNull.notNullValue()); assertThat(result.size(), is(3)); assertThat(result.get(0), is("10")); assertThat(result.get(1), is("AU")); assertThat(result.get(2), is("Aus\"tralia"));
3. OpenCSV Example
If you are not comfortable with above simple and advance solution, try using third party CSV library – OpenCSV.
3.1 Maven.
pom.xml
<dependency> <groupId>com.opencsv</groupId> <artifactId>opencsv</artifactId> <version>3.8</version> </dependency>
3.2 Review a CSV file.
/Users/mkyong/csv/country3.csv
10,AU,Australia 11,AU,Aus""tralia "12","AU","Australia" "13","AU","Aus""tralia" "14","AU","Aus,tralia"
3.2 OpenCSV example to parse above CSV file.
CSVReaderExample.java
package com.mkyong.csv; import com.opencsv.CSVReader; import java.io.FileReader; import java.io.IOException; public class CSVReaderExample { public static void main(String[] args) { String csvFile = "/Users/mkyong/csv/country3.csv"; CSVReader reader = null; try { reader = new CSVReader(new FileReader(csvFile)); String[] line; while ((line = reader.readNext()) != null) { System.out.println("Country [id= " + line[0] + ", code= " + line[1] + " , name=" + line[2] + "]"); } catch (IOException e) { e.printStackTrace();
Output
Country [id= 10, code= AU , name=Australia] Country [id= 11, code= AU , name=Aus"tralia] Country [id= 12, code= AU , name=Australia] Country [id= 13, code= AU , name=Aus"tralia] Country [id= 14, code= AU , name=Aus,tralia]
Note
Please refer to this OpenCSV official documentation for more examples.
Done.
References
From:一号门
COMMENTS