CSVeed

Easy-to-use CSV to Java Bean utility


Annotations

CSVeed currently has the following annotations:

  • CsvFile; generic instructions for parsing the CSV file and converting to Rows and Beans
  • CsvCell; custom instructions for properties, allowing mappings to a column index or name, and specifying whether a value is required
  • CsvIgnore; instructs CSVeed to ignore a property
  • CsvDate; allows a custom date format to be applied to a property
  • CsvConverter; sets a custom Converter for converting text to a property

For the annotations to work, the Bean class must be passed to CsvClient:

  CsvClient<Bean> csvReader = new CsvClientImpl<Bean>(reader, Bean.class);

CsvFile

This annotation is set on the Bean class and contains the generic instructions for parsing the CSV file and converting it to Rows and Beans. CsvFile supports the following settings:

  • parse instructions; escape, quote, separator, end-of-line and comment. Together these determine the format of your CSV file.
  • use header; whether the CSV file contains a header that must be read as such. The header is essential for the ColumnNameMapper strategy.
  • start row; the line from which to start reading the CSV file, zero-based
  • skip lines; whether empty lines and comment lines must be skipped or parsed
  • mapping strategy; by default this is ColumnIndexMapper, which maps cells to Bean properties on the basis of the column index. Alternatively, ColumnNameMapper maps cells to Bean properties on the basis of the column name (i.e., the header name).

Parse instructions

Parse instructions help CsvClient to read and interpret the CSV file. Assume the following CSV:

  first name, surname, street, city, trademark
  % First a line on mr Hawking
  'Stephen', 'Hawking', '110th Avenue', 'New York', 'History of the \'Universe\''
  % Then on mr Einstein
  'Albert', 'Einstein', 'Leipzigerstrasse', 'Berlin', '\'E=mc2\''

The Bean class can be annotated as follows:

  @CsvFile(comment = '%', quote = '\'', escape = '\\', separator = ',')
  public class Bean {
      // properties with public getters and setters
  }

The following parse instructions are available:

  • separator; the character that separates two cells. Common choices are ';' (northern Europe), ',' (USA), the tab character and the pipe '|'. Default is ';'.
  • quote; the character that marks the start and the end of a quoted cell. Within a quoted cell, newlines are allowed, and the quote character itself may appear if it is escaped. Default is '"'.
  • escape; the character used to escape a quote character within a quoted cell. This one is contentious: RFC 4180 uses the quote character itself as the escape, so a literal quote is written as two quotes. Sometimes a custom escape character is desirable, which you can set here. Default is '"'.
  • end of line; one or more characters that mark the end of a line. Default is '\r' and '\n'.
  • comment; a line that starts with the comment character is treated as a comment line. Only used if skipCommentLines is true (the default). Default is '#'.
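The effect of separator, quote and escape can be illustrated with a minimal sketch that splits a single line. This is not CSVeed's parser: the class `LineSplitter` is hypothetical, it handles one line at a time (so no newlines inside cells), and it skips spaces after a separator so that the sample file above, with its `, ` separators, splits cleanly.

```java
import java.util.ArrayList;
import java.util.List;

class LineSplitter {

    // Splits one line into cells (no multi-line cells in this sketch).
    // Outside quotes: the separator ends a cell, spaces at the start of a cell
    // are skipped, and a quote at the start of a cell opens a quoted cell.
    // Inside quotes: escape followed by quote yields a literal quote, and an
    // unescaped quote closes the cell.
    static List<String> split(String line, char separator, char quote, char escape) {
        List<String> cells = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        boolean inQuotes = false;
        boolean atCellStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == escape && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    cell.append(quote); // escaped quote, kept literally
                    i++;                // skip the quote just consumed
                } else if (c == quote) {
                    inQuotes = false;   // closing quote
                } else {
                    cell.append(c);
                }
            } else if (c == separator) {
                cells.add(cell.toString());
                cell.setLength(0);
                atCellStart = true;
            } else if (atCellStart && c == ' ') {
                // padding after a separator: ignore it
            } else if (atCellStart && c == quote) {
                inQuotes = true;
                atCellStart = false;
            } else {
                cell.append(c);
                atCellStart = false;
            }
        }
        cells.add(cell.toString());
        return cells;
    }
}
```

Calling `LineSplitter.split(line, ',', '\'', '\\')` on the Hawking line yields the cells Stephen, Hawking, 110th Avenue, New York and History of the 'Universe'. With RFC 4180 style settings (`';'`, `'"'`, `'"'`), the input `"a""b";c` yields `a"b` and `c`, because an escape that equals the quote only counts as an escape when another quote follows it.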

Use header

Suppose your CSV file does not have a header:

  "line 1";1
  "line 2";2
  "line 3";3

You need to disable useHeader in @CsvFile:

  @CsvFile(useHeader = false)
  public class Bean {
      // properties with public getters and setters
  }

Note: ColumnNameMapper can no longer be used, since there is no header to supply the column names.

Start row

Some CSV files contain a lot of non-essential information before the actual content starts, without marking it as comment lines:

  Roses are red,
  Violets are blue,
  And some more of that
  "Here";"We";"Go"

If you can identify the exact start row, you can pass it on to @CsvFile:

  @CsvFile(startRow = 3)
  public class Bean {
      // properties with public getters and setters
  }

Skip lines

There are two skip instructions:

  • skip empty lines; disabling this is useful when you want empty lines converted into single-column rows. By default, empty lines are skipped.
  • skip comment lines; disabling this is useful when the comment character can legitimately start a line in your CSV file. By default, comment lines are skipped.

Example of a file where you may want to include empty lines:

  Alpha

  Beta
  Gamma

Example of a file where lines starting with the comment character '#' are actual data:

  issue number; description
  #12;Some error somewhere
  #31;NPE

In these cases, make sure to instruct @CsvFile properly:

  @CsvFile(skipCommentLines = false, skipEmptyLines = false)
  public class Bean {
      // properties with public getters and setters
  }
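The combined effect of startRow and the two skip settings can be sketched as a simple line filter. This is an illustration under assumptions, not CSVeed's code: the class `LineFilter` is hypothetical, it assumes startRow counts physical lines and is applied before skipping, and it treats a line as empty only if it has no characters at all.

```java
import java.util.ArrayList;
import java.util.List;

class LineFilter {

    // Drops the lines before startRow (zero-based), then drops empty lines
    // and comment lines when the corresponding skip flag is set.
    static List<String> filter(List<String> lines, int startRow, char comment,
                               boolean skipEmptyLines, boolean skipCommentLines) {
        List<String> kept = new ArrayList<>();
        for (int row = startRow; row < lines.size(); row++) {
            String line = lines.get(row);
            boolean empty = line.isEmpty();
            boolean isComment = !empty && line.charAt(0) == comment;
            if ((empty && skipEmptyLines) || (isComment && skipCommentLines)) {
                continue;
            }
            kept.add(line);
        }
        return kept;
    }
}
```

With the issue-tracker file above and skipCommentLines set to false, all three lines survive; with the default, only the header does. Likewise, startRow = 3 on the poem example leaves only the "Here";"We";"Go" line.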

Mapping strategy

For converting Rows to Beans, this is the most important setting of @CsvFile. There are two mapping strategies currently supported:

  • ColumnIndexMapper; maps cells based on their column index to Bean properties
  • ColumnNameMapper; maps cells based on their column name (i.e., the header name) to Bean properties

ColumnIndexMapper

This is the default strategy if none is specified. Cells are mapped to Bean properties by their column index. When a property has no explicit index (set with @CsvCell#columnIndex), CSVeed uses the order in which the properties are declared to assemble the index automatically.

The following Bean properties (assuming they have public getters and setters):

  private String name;
  private Date birthdate;
  private Integer creditRating;

Will lead to the following index:

  0 -> name
  1 -> birthdate
  2 -> creditRating
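A minimal reflection-based sketch of how such an index could be assembled from the declared fields. It is not CSVeed's code; `IndexSketch` and `Person` are hypothetical. Note that the Javadoc of `Class.getDeclaredFields()` says the returned fields are not in any particular order; common JVMs such as HotSpot do return declaration order, and a sketch like this relies on that behaviour.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

class IndexSketch {

    // Example Bean; getters and setters omitted for brevity.
    static class Person {
        private String name;
        private Date birthdate;
        private Integer creditRating;
    }

    // Assigns column 0, 1, 2, ... to the instance fields in the order returned
    // by getDeclaredFields(), skipping static and compiler-generated fields.
    static Map<Integer, String> buildIndex(Class<?> beanClass) {
        Map<Integer, String> index = new LinkedHashMap<>();
        int column = 0;
        for (Field field : beanClass.getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            index.put(column++, field.getName());
        }
        return index;
    }
}
```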

ColumnNameMapper

Cells are mapped to Bean properties by their column name (i.e., the header name). When a property has no explicit column name (set with @CsvCell#columnName), CSVeed uses the property name to assemble the index automatically.

The following Bean properties (assuming public getters/setters):

  private String name;
  private Date birthdate;
  private Integer creditRating;

Will lead to the following index:

  name -> name
  birthdate -> birthdate
  creditrating -> creditRating

Note that the key 'creditrating' is all lower-case: property names are lower-cased before they are stored in the index, and lookup keys are lower-cased as well. This makes ColumnNameMapper case-insensitive.
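The case-insensitive lookup can be sketched with a plain map whose keys are lower-cased on both insert and lookup. `NameIndexSketch` is hypothetical, and the use of `Locale.ROOT` is this sketch's own choice (it avoids locale-specific lower-casing such as the Turkish dotless i); which locale CSVeed uses is not stated here. The two-argument `register` stands in for an explicit @CsvCell#columnName.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class NameIndexSketch {

    // Lower-cased column name -> property name.
    private final Map<String, String> index = new HashMap<>();

    // Default mapping: the property name doubles as the column name.
    void register(String propertyName) {
        register(propertyName, propertyName);
    }

    // Explicit mapping, comparable to @CsvCell(columnName = ...).
    void register(String columnName, String propertyName) {
        index.put(columnName.toLowerCase(Locale.ROOT), propertyName);
    }

    // Header names are lower-cased before lookup, so matching ignores case.
    // Returns null when no property is mapped to the column.
    String propertyFor(String headerName) {
        return index.get(headerName.toLowerCase(Locale.ROOT));
    }
}
```

After registering name, birthdate and creditRating, a header cell "CreditRating" or "CREDITRATING" resolves to the creditRating property.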

CsvCell

This annotation is set on a Bean property. @CsvCell offers three settings:

  • columnIndex; maps a column index to this Bean property
  • columnName; maps a column name to this Bean property
  • required; when this value is true, the cell must not be empty

columnIndex

columnIndex is useful when you need only some of the columns in a CSV file. Consider this CSV file with multiple columns and no header:

  L1C1;L1C2;L1C3;L1C4;valuable info 1;l1C6
  L2C1;L2C2;L2C3;L2C4;valuable info 2;l2C6

The Bean property can now be annotated as follows:

  @CsvCell(columnIndex = 4)
  private String valuableInfo;

Note that columnIndex is zero-based. Also note that properties declared after valuableInfo continue counting from this index: the next property is automatically mapped to column 5.
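The counting rule can be modelled without reflection. In this sketch (the class `ColumnIndexSketch` and the property name `lastColumn` are hypothetical), each property carries an optional explicit index; an explicit index sets the column, and a property without one takes the column after the previous property. How CSVeed treats two properties claiming the same column is not covered here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ColumnIndexSketch {

    // A property name plus an optional explicit column index (null = none).
    static final class Property {
        final String name;
        final Integer columnIndex;

        Property(String name, Integer columnIndex) {
            this.name = name;
            this.columnIndex = columnIndex;
        }
    }

    // Walks the properties in declaration order. An explicit index sets the
    // column; a property without one takes the column after the previous one.
    static Map<Integer, String> assign(List<Property> properties) {
        Map<Integer, String> index = new LinkedHashMap<>();
        int next = 0;
        for (Property p : properties) {
            int column = (p.columnIndex != null) ? p.columnIndex : next;
            index.put(column, p.name);
            next = column + 1;
        }
        return index;
    }
}
```

For valuableInfo with index 4 followed by an unannotated lastColumn, the result is 4 -> valuableInfo and 5 -> lastColumn, matching the L1C6/l1C6 column of the example file.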

columnName

You can set up your own mapping for ColumnNameMapper. This is especially useful when the CSV header is verbose, contains many special characters or uses names you do not want to reuse, i.e. names that translate badly to property names. Take this header:

  the first column; my my, how verbose; isn't it?

The properties can then be mapped as follows:

  @CsvCell(columnName = "the first column")
  private String first;
  @CsvCell(columnName = "my my, how verbose")
  private String second;
  @CsvCell(columnName = "isn't it?")
  private String third;

required

Although validation is not the province of CSVeed, this annotation offers a little help. When a property marked as required turns out to be null or "" (empty), an exception is thrown. Consider this file:

  first name, surname, street, city, trademark
  'Stephen', 'Hawking', '110th Avenue', 'New York', 'History of the \'Universe\''
  'Albert', 'Einstein', '', 'Berlin', '\'E=mc2\''

Note how Einstein's street cell is empty.

  @CsvCell(required = true)
  private String street;

This will result in the following error:

  Exception in thread "main" nl.tweeenveertig.csveed.report.CsvException: Bean property "street" is required and may not be empty or null
  2: 'Albert', 'Einstein', '', 'Berlin', '\'E=mc2\''[EOF]
  2:                         ^
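The rule itself is simple: a required value that is null or empty fails. A minimal sketch, where `RequiredCheck` is hypothetical and `IllegalArgumentException` stands in for CSVeed's `CsvException`; the message text is borrowed from the error above, but the line and column report is not reproduced.

```java
class RequiredCheck {

    // Fails when a required value is null or empty ("").
    // A value consisting only of spaces passes this check.
    static void checkRequired(String property, String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Bean property \"" + property
                    + "\" is required and may not be empty or null");
        }
    }
}
```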

CsvIgnore

Properties marked with @CsvIgnore are not picked up for indexing, by neither ColumnIndexMapper nor ColumnNameMapper; the Bean property is ignored completely. For example, the following properties:

  private String name;
  @CsvIgnore
  private Date birthdate;
  private Integer creditRating;

Will lead to the following index:

  0 -> name
  1 -> creditRating
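To show how an annotation can exclude a field at runtime, here is a sketch with a hypothetical stand-in annotation `@Ignored` (not CSVeed's `@CsvIgnore`). The annotation needs `RetentionPolicy.RUNTIME`, otherwise reflection cannot see it. Like any field-order-based indexing, it assumes `getDeclaredFields()` returns fields in declaration order, which the JDK does not guarantee.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical stand-in for @CsvIgnore. RUNTIME retention is required,
// otherwise the annotation is not visible through reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Ignored {
}

class IgnoreSketch {

    // Example Bean; getters and setters omitted for brevity.
    static class Person {
        private String name;
        @Ignored
        private Date birthdate;
        private Integer creditRating;
    }

    // Returns the indexed fields; a field's position in the list is its
    // column index. Fields marked @Ignored are left out entirely.
    static List<String> indexedFields(Class<?> beanClass) {
        List<String> names = new ArrayList<>();
        for (Field field : beanClass.getDeclaredFields()) {
            if (!field.isSynthetic() && !field.isAnnotationPresent(Ignored.class)) {
                names.add(field.getName());
            }
        }
        return names;
    }
}
```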

CsvDate

Converting a String to java.util.Date brings its own challenges. This annotation lets you specify the date format to use. The default format is "yyyy-MM-dd" (for example: 2013-02-28), which also has the advantage of sorting correctly as plain text. Consider this file:

  name;date
  Jane;21-03-2011
  Jill;03-11-2013

Here the date format is day-month-year, or "dd-MM-yyyy":

  @CsvDate(format = "dd-MM-yyyy")
  private Date date;

See the Javadoc of java.text.SimpleDateFormat for the full pattern syntax.
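Parsing the dates from the example with `SimpleDateFormat` directly could look like this. `DateSketch` is hypothetical, and calling `setLenient(false)` is this sketch's own choice, so that an impossible date such as 31-02-2013 is rejected instead of rolling over into March; whether CSVeed does the same is not stated here.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class DateSketch {

    // Parses a cell value with the given SimpleDateFormat pattern.
    // Non-lenient parsing rejects impossible dates such as 31-02-2013
    // instead of silently rolling them over into March.
    static Date parse(String text, String pattern) throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false);
        return format.parse(text);
    }
}
```

A new SimpleDateFormat is created per call because the class is not thread-safe; sharing one instance across threads would need external synchronization.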

CsvConverter

You may have your own String-to-property conversion needs, which is why this annotation exists. First create or supply your converter, based on the Converter pattern, similar to how Spring does it:

  public class BeanSimpleConverter extends AbstractConverter<BeanSimple> {

      // Builds the target object from the raw cell text
      @Override
      public BeanSimple fromString(String text) throws Exception {
          return new BeanSimple(text);
      }

      // The type this converter produces
      @Override
      public Class getType() {
          return BeanSimple.class;
      }

  }

As you can see, it is basically a matter of supplying a conversion from a String to an instance of the target class and declaring that class in getType(). Nothing much to it, really.
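The same pattern can be shown in a self-contained form that does not depend on CSVeed. `TextConverter`, this `BeanSimple` class and `ConverterDemo` are hypothetical stand-ins, not CSVeed's `AbstractConverter` API. The sketch also shows one reason a converter might report its type: a caller can check that the converter matches the property type before using it.

```java
// Hypothetical converter contract: text in, typed value out.
interface TextConverter<T> {
    T fromString(String text) throws Exception;

    Class<T> getType();
}

// Hypothetical target type holding a single text value.
class BeanSimple {
    private final String name;

    BeanSimple(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

class BeanSimpleConverter implements TextConverter<BeanSimple> {
    @Override
    public BeanSimple fromString(String text) {
        return new BeanSimple(text);
    }

    @Override
    public Class<BeanSimple> getType() {
        return BeanSimple.class;
    }
}

class ConverterDemo {
    // Verifies via getType() that the converter produces the property's type,
    // then converts the text and returns it with the property's static type.
    static <T> T convertFor(Class<T> propertyType, TextConverter<?> converter, String text)
            throws Exception {
        if (!propertyType.isAssignableFrom(converter.getType())) {
            throw new IllegalArgumentException(
                    "Converter produces " + converter.getType() + ", not " + propertyType);
        }
        return propertyType.cast(converter.fromString(text));
    }
}
```

Without getType(), the caller would only learn of a mismatch when the converted value is assigned, because the generic type argument is erased at runtime.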