Ingester
The job of the ingester is to import event information from schedule data files into the database.
The ingester looks in the service directory for source files and extracts the event information from them. Because several file formats for exchanging event information exist, cherryEPG provides a set of parsers that do the actual extraction.
When importing, the ingester first searches the service directory for source files, that is, all files without the extension .md5. For each source file found, it checks whether a file with the same name plus the extension .md5 exists. If that file does not exist or is older than the source file, it is created anew: the MD5 checksum of the source file is calculated and saved to the file with the extension .md5. Next, the ingester looks for a file with the extension .md5.parsed and compares its content with the .md5 file. If the content is equal, nothing is done and the ingester continues with the next source file. If the content differs or the .md5.parsed file cannot be found, the source file is passed to the corresponding parser. Afterwards the .md5 file is copied to the .md5.parsed file.
This procedure prevents the same schedule file from being ingested over and over. Files are parsed and ingested only when they have been modified and hold different data.
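The checksum bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the procedure, not the code cherryEPG uses. The function names (`needs_ingest`, `mark_parsed`, `ingest_directory`) and the `parse` callback are hypothetical, and the sketch assumes that .md5.parsed files are skipped together with the .md5 files when searching for sources.

```python
import hashlib
import shutil
from pathlib import Path


def md5_hex(path: Path) -> str:
    """Return the MD5 checksum of a file as a hex string."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_ingest(source: Path) -> bool:
    """Refresh <source>.md5 if needed and report whether the source must be parsed."""
    md5_file = source.with_name(source.name + ".md5")
    parsed_file = source.with_name(source.name + ".md5.parsed")

    # Recreate the checksum file when it is missing or older than the source.
    if not md5_file.exists() or md5_file.stat().st_mtime < source.stat().st_mtime:
        md5_file.write_text(md5_hex(source))

    # Equal content means exactly this data has already been ingested.
    if parsed_file.exists() and parsed_file.read_text() == md5_file.read_text():
        return False
    return True


def mark_parsed(source: Path) -> None:
    """Copy <source>.md5 to <source>.md5.parsed after parsing."""
    md5_file = source.with_name(source.name + ".md5")
    shutil.copyfile(md5_file, source.with_name(source.name + ".md5.parsed"))


def ingest_directory(service_dir: Path, parse) -> list:
    """Parse every changed source file; `parse` is a hypothetical parser callback."""
    ingested = []
    for source in sorted(service_dir.iterdir()):
        # Assumption: checksum bookkeeping files are never treated as sources.
        if not source.is_file() or source.name.endswith((".md5", ".md5.parsed")):
            continue
        if needs_ingest(source):
            parse(source)
            mark_parsed(source)
            ingested.append(source.name)
    return ingested
```

Calling `ingest_directory` twice in a row on an unchanged directory parses each file only on the first call. A file that is touched without a content change gets a fresh .md5 file, but it is not parsed again because the checksum still matches the .md5.parsed file.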
During the ingest process all event data is encoded as UTF-8 and then stored in the database.
The ingester is configured through the scheme file. The only columns relevant to the ingester are Parser and Parser option.
Parser
This field contains the name of the parser used to extract event information from the schedule data files. To make them easier to recognize, parser names contain the extension of the source file they can parse. This is only a recommendation. The default parsers in the public release are:
Parser name | Schedule format |
---|---|
TVXML | XMLTV |
SimpleXLS | Excel spreadsheet - simple format |
AMCXML | XML format |
BarcoNetXML | XML format used by BarcoNet |
DivaXLS | Excel spreadsheet |
MezzoXLS | Excel spreadsheet |
N1HTML | HTML format |
Nova24XML | XML format |
OBNHTML | HTML format |
PlanetEarthXML | XML format |
PlanetXML | XML format |
ProPlusXML | XML format |
RtvSloJSON | JSON format |
SportTVXML | XML format |
STNHTML | HTML format |
TVAnytimeXML | XML format |
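The naming recommendation can be illustrated with a small helper that reads the format hint from a parser name. This is a sketch only. The function `format_hint` is hypothetical and not part of cherryEPG, and the list of suffixes is taken from the default parser names in the table above.

```python
import re
from typing import Optional

# Suffixes that occur in the default parser names listed in the table.
_SUFFIX = re.compile(r"(XML|XLS|HTML|JSON)$")


def format_hint(parser_name: str) -> Optional[str]:
    """Return the lowercase file extension hinted at by a parser name, if any."""
    match = _SUFFIX.search(parser_name)
    return match.group(1).lower() if match else None
```

Because the naming is only a recommendation, a parser whose name carries no known suffix yields `None` rather than an error.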
Parser option
This field can be used to pass extra parameters to the parser. Not all parsers use this feature.
Control from Command-line
The ingester is invoked from the command line with the cherryTool script.
Ingest all files in the directory of the service with the given SID (42 in the examples below). Only modified files are ingested (see the MD5 handling above):
cherryTool -i 42
It is possible to remove all files with the .md5 extension. This resets the source files for ingesting:
cherryTool -r 42
After that all source files can be ingested again.
Command-line switches can be combined, e.g. reset and ingest in one call (add the verbose flag for extended information):
cherryTool -riv 42