Ingester

The job of the ingester is to import event information from schedule data files into the database.

The ingester looks in the service directory for schedule files and extracts the event information from them. Because several file formats for exchanging event information exist, cherryEPG provides a set of parsers that do the actual extraction.

When importing, the ingester first scans the service directory for source files, skipping any file with the extension .md5. For each source file found, it checks whether a file with the same name plus the extension .md5 exists. If that file does not exist or is older than the source file, the ingester calculates the MD5 checksum of the source file and saves it to the .md5 file. Next it looks for a file with the extension .md5.parsed and compares its content with the .md5 file. If the contents are equal, nothing is done and the ingester continues with the next source file. If the contents differ, or the .md5.parsed file cannot be found, the source file is passed to the corresponding parser. Afterwards the .md5 file is copied to the .md5.parsed file.

This procedure prevents the same schedule file from being ingested over and over. Files are parsed and ingested only when they have been modified and hold different data.
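The checksum procedure described above can be sketched in Python as follows. This is a minimal illustration, not cherryEPG's own code: the function names (md5_of, needs_ingest, mark_parsed, ingest_directory) and the parse callback are hypothetical, and the sketch additionally assumes that .md5.parsed files are skipped when scanning for source files.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file's content."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_ingest(source: Path) -> bool:
    """Refresh the .md5 file of a source and report whether it must be parsed."""
    md5_file = source.with_name(source.name + ".md5")
    parsed_file = source.with_name(source.name + ".md5.parsed")

    # Recreate the checksum file when it is missing or older than the source.
    if not md5_file.exists() or md5_file.stat().st_mtime < source.stat().st_mtime:
        md5_file.write_text(md5_of(source))

    # Equal checksums mean this exact content has already been ingested.
    if parsed_file.exists() and parsed_file.read_text() == md5_file.read_text():
        return False
    return True


def mark_parsed(source: Path) -> None:
    """Copy the .md5 file to the .md5.parsed file after parsing."""
    md5_file = source.with_name(source.name + ".md5")
    shutil.copyfile(md5_file, source.with_name(source.name + ".md5.parsed"))


def ingest_directory(service_dir: Path, parse) -> list:
    """Check every source file; return the names of the files that were parsed."""
    parsed = []
    for source in sorted(service_dir.iterdir()):
        # Assumption: both kinds of bookkeeping files are skipped here.
        if not source.is_file() or source.name.endswith((".md5", ".md5.parsed")):
            continue
        if needs_ingest(source):
            parse(source)  # hypothetical stand-in for the configured parser
            mark_parsed(source)
            parsed.append(source.name)
    return parsed
```

Calling ingest_directory twice on an unchanged directory parses each file only on the first call. A file is parsed again only after its modification time has moved past that of its .md5 file and its content yields a different checksum.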

During the ingest process all event data is encoded as UTF-8 and then stored in the database.
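As an illustration of what the conversion to UTF-8 means for text that arrives in another encoding, here is a small Python sketch. The function name to_utf8 and the choice of ISO-8859-2 as the source encoding are assumptions made for the example only. How each cherryEPG parser detects the encoding of its input is not covered here.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from the given source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Example: an event title delivered as ISO-8859-2 bytes (assumed encoding).
title_latin2 = "Poročila".encode("iso-8859-2")
title_utf8 = to_utf8(title_latin2, "iso-8859-2")
print(title_utf8.decode("utf-8"))
```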

The ingester is configured through the scheme file. The only columns relevant to the ingester are Parser and Parser option.

Parser

This field contains the name of the parser used to extract event information from the schedule data files. For easier handling, parser names include the extension of the source files they can parse, but this is only a naming recommendation. The default parsers in the public release are:

Parser name       Schedule format
TVXML             XMLTV
SimpleXLS         Excel spreadsheet - simple format
AMCXML            XML format
BarcoNetXML       XML format used by BarcoNet
DivaXLS           Excel spreadsheet
MezzoXLS          Excel spreadsheet
N1HTML            HTML format
Nova24XML         XML format
OBNHTML           HTML format
PlanetEarthXML    XML format
PlanetXML         XML format
ProPlusXML        XML format
RtvSloJSON        JSON format
SimpleXLS         Excel format
SportTVXML        XML format
STNHTML           HTML format
TVAnytimeXML      XML format

Parser option

This field can be used to pass extra parameters to the parser. Not all parsers use this feature.

Control from Command-line

The ingester is invoked from the command line with the cherryTool script.

To ingest all files in the directory of the service with a given SID (42 in the examples), use the -i switch. Only modified files will be ingested (see the MD5 handling above):

cherryTool -i 42

All files with the extension .md5 can be removed with the -r switch. This resets the source files for ingesting:

cherryTool -r 42

After that all source files can be ingested again.
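What such a reset amounts to on disk can be sketched in Python. The function name reset_service is hypothetical. The sketch assumes that the .md5.parsed files are removed together with the .md5 files, because a leftover .md5.parsed file would still match a freshly recreated checksum. Whether cherryTool -r does exactly this is not stated in this section.

```python
from pathlib import Path


def reset_service(service_dir: Path) -> int:
    """Delete the checksum bookkeeping files; return how many were removed."""
    removed = 0
    for path in service_dir.iterdir():
        # Assumption: both .md5 and .md5.parsed files count as bookkeeping.
        if path.is_file() and path.name.endswith((".md5", ".md5.parsed")):
            path.unlink()
            removed += 1
    return removed
```

The source files themselves are left untouched, so the next ingest run treats every one of them as new.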

Command-line switches can be combined, for example reset and ingest, with the verbose flag added for extended information:

cherryTool -riv 42
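Combining single-letter switches in this way is the usual short-option bundling convention. The Python sketch below reproduces it with argparse purely as an illustration. The parser is a hypothetical stand-in, not the cherryTool implementation, and treating the SID as a positional argument is an assumption.

```python
import argparse


def parse_args(argv):
    """Parse cherryTool-like switches (illustrative stand-in only)."""
    parser = argparse.ArgumentParser(prog="cherryTool-sketch")
    parser.add_argument("-r", dest="reset", action="store_true",
                        help="remove the .md5 files of the service")
    parser.add_argument("-i", dest="ingest", action="store_true",
                        help="ingest modified files of the service")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="print extended information")
    # Assumption: the service id is given as a positional argument.
    parser.add_argument("sid", type=int, help="service id")
    return parser.parse_args(argv)


# "-riv 42" is read the same way as "-r -i -v 42".
bundled = parse_args(["-riv", "42"])
separate = parse_args(["-r", "-i", "-v", "42"])
print(vars(bundled) == vars(separate))
```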