Chapter 4. Data Sources

4. Data Sources

In ReportServer data sources serve as principal data basis for report objects, i.e. a report draws the data to be displayed from a defined data source. As with most other objects, data sources are maintained in a hierarchical structure. The data source management module is to be found in the Administration module under Data sources. The following object types can be created in the tree:

  • Folder Serve to structure data sources.
  • Datasource Here various data sources are optionally available which we will discuss in more detail in the following.

Data sources will be configured in two steps. In Data source management, data sources will be created and the basic settings made. For relational databases, here, for instance, user name, password and access URL are stored. However, the specific configuration per use will be set at the point where the data source will be used (this is mostly with the respective report). Here, for instance, for a relational database the SQL query can be set on which the report is based.

ReportServer supports the following data sources.

4.1. Relational Databases

It is possible to access common relational databases via the data source type "Relational databases". Use the option "database" to control the SQL dialect created by ReportServer. At present, ReportServer supports the following SQL dialects :

  • DB2
  • H2
  • HSQL
  • Informix
  • MSSQL
  • MySQL
  • Postgre SQL
  • Oracle
Make sure to integrate the respective JDBC database driver prior to use. (For more detailed information refer to the database driver description.) Information on how to set user name and password as well as the URL is given in the data- base manual. In the following example we will demonstrate how to configure a MySQL data source.

After the initial installation, MySQL can usually be started by entering user name "root" and password "root". A JDBC URL could be as follows:

jdbc:mysql://127.0.0.1:3306/ClassicModels

After having transferred the data, and you wish to test whether ReportServer can establish a database connection, apply the button Test connection. Be aware to always test the saved connection.

The query will actually be configured when selecting the data source, for instance, if you want to create a dynamic list on the basis of this data source.

4.2. Storage of Database Passwords

In the ReportServer development we particularly emphasized the safety of the system to the greatest possible extent. One of the main issues was to store sensitive data as securely as possible. Therefore, the data source settings are of special importance as they provide the potential to access your data warehouse. To store passwords as securely as possible, we follow a two-way strategy. Firstly, database passwords should never be transferred to the client (to your web browser), but only be used to establish database connections. This results in the data source password field always being empty upon reloading the form. However, you may safely change the data source, the password will only be reset when you add an entry to the password field. The second safety measure is that ReportServer database passwords will be encrypted when saving. For further information, please refer to the configuration instructions.

4.3. The Internal Data Base

Non-relational database data (such as CSV lists) will be cached in an internal H2 database to be able to quickly and comfortably access them. For this purpose, H2 stores some files on your local hard disk. By default this location points to the folder dbtmp in your ReportServer installation. For setting the location as well as other properties please refer to the configuration instructions. You may access the internal database in just the same way as you do with all other data sources. To do so, create a new data source of type "relational database", set the database type to H2 and configure as follows.

Username: sa
Password: Refer to the configuration file reportserver.properties. If you activated the encryption of the internal data- base, the password is "encryption password user password". If encryption is deactivated, the password is the user password set in reportserver.properties.
URL: jdbc:h2:VERZEICHNIS/rsdb;CIPHER=AES (however, omit ";CIPHER=AES" if database encryption was deactivated).
4.3.1. Demo Data

If desired, you can load the demo data on the fictitious organization "1 to 87" as soon as you have started ReportServer. They will then become available in the internal H2 database. For further information see Section 1.3. as well as the configuration guide.

You can easily configure a data source with access to the internal demo data. To do this, create a new H2 data source with the following settings:

Username: demo
Password: demo
URL: jdbc:h2:DIRECTORY/rsdemodb

Be aware that the demo database will never be encrypted, therefore, there is no need to configure any encryption in the URL.

4.4. Datasource Pool

Data source connections can be provided in a pool in ReportServer. This can clearly increase the performance and enables to better control the individual connections. If pooling is activated, ReportServer keeps a pre-defined number of connections open per relational database integrated. ReportServer will recycle these connections by user to save the costs incurred for setting up the connection. In the configuration guide you will find detailed information on how to pool databases exactly and which settings to enter.

4.5. CSV Lists

Apart from relational databases, ReportServer can provide data in form of CSV files (Comma-Separated Values; further information on CSV you will find, for example, at http:// en.wikipedia.org/wiki/Comma-separated_values) as data basis for reports. To work with a CSV List, create one. To set the format of your CSV file, use the fields

Quotes: Delimiting characters for an individual data record.
Separator: Separating character between data records

Use the Connector setting to define the location of the CSV data. The following connectors are presently supported:

Text-Connector: Allows to directly enter data in a text field at the data source.
Argument-Connector: Allows to directly enter data in a text field when selecting the data source. This enables, for instance, to easily simulate static lists for report parameters (refer to the section on datasource parameters).
URL-Connector: Allows to load data by using an URL. Please observe that you have the option to load the data from the internal ReportServer file system (refer here to the Section File Server).
4.5.1. Database Cache

As described before, prior to their use CSV data will be loaded to an internal database. This may take some time if you load a larger data volume for the first time. Therefore, it is quite often reasonable not to continuously load the data. Use the Cache database setting to determine after how many minutes the data should be reloaded from the source to the internal database. If you set -1, the data will be loaded only one time. If you set 0, the data will be reloaded every time you use them.

If you wish to load the data manually (because they were changed) it is sufficient to simply save the data source again. After every saving process, ReportServer will initiate that the data will be removed from the internal cache and reloaded again.

Please observe that in case of the Argument Connector, ReportServer will ignore the cache setting, so the data will not be cached.

4.5.2. Configuration at the Object

Similar to relational databases, you can make additional settings for CSV data sources at the location where the data source will be selected (e.g. at the report or parameter). As already explained, CSV data will be buffered to an internal database. The query type to be used is

SELECT * FROM TMP_TABLENAME

where TMP_TABLENAME is a temporary table name assigned by ReportServer. Using the Query Wrapper setting, now you can extend the query created automatically. Here, use the syntax for parameters (see section on report parameters). The following replacements will be available to you:

TMP_TABLENAME: The name of the table
query: The basic query.

For instance, by using the following query you could limit the data volume to all those data records where the attribute REGION has value 3.

SELECT * FROM ($!{TMP_TABLENAME}) WHERE REGION = "3"

Please observe that any CSV data will generally be treated as if it were of type string. In addition, we want to point out that if you use replacements, you need to use $!{} instead of ${} as replacements need to be directly written into the query (for further information on the replacement syntax refer to Section 6.3. Working with Parameters).

4.6. Script Data Sources

If you wish to load data which are in a format that has not been supported so far, or if you wish to perform complex pre-processing of data, it is advisable to use script data sources. Script data sources provide data by running a ReportServer script and, therefore, they can be applied very flexibly. Similar to CSV lists, the result of a script data source will first be buffered in the internal database. Here as well, you will have the option to define in the Cache database setting how often the data will be reloaded.

Script data sources run a ReportServer script whenever the datasource is accessed. This script will be filed with the data source and, if selected, it can be parameterized with a report (i.e. parameters can be transferred to the script). The return value of the script must be an object of type RSTableModel (included in the package net.datenwerke.rs.base.service.reportengines.table.output.object). In the following we will give a simple example script which builds up a static table consisting of 3 columns.

import net.datenwerke.rs.base.service.reportengines.table.output.object.*;

def definition = new TableDefinition(['a','b','c'],
					[Integer.class,Integer.class,Integer. class]);

def model = new RSTableModel(definition);
model.addDataRow(1,2,3);
model.addDataRow(4,5,6);
model.addDataRow(7,8,9);

return model;
4.6.1. Configuration at the Object

You can further reduce the data volume by using Query wrapper in the same way as you proceeded with CSV data sources. In addition, you can pass arguments to the script.

4.7. BIRT Report Data Source

The BIRT report engine enables to define data records within BIRT Reports. For instance, they can be used for feeding parameters. By using BIRT Report data sources, you can access this data in ReportServer.

As the most frequent application case for BIRT Report data sources will surely be the reading out of parameters, the BIRT data source will directly be configured at the report. This means you only have to create a data source in the data source tree, any further configuration will be entered at the location where it is used. When using the data source, you eventually have to select the respective BIRT report and enter the name of the data set. Please observe that BIRT provides the option to access so-called "data sets" as well as parameter data. Depending on the origin of the data, you have to set the respective type.

In the same way as you proceeded with CSV and script data sources, you can modify the query by using the Query wrapper configuration.

4.8. Mondrian Datasource

Mondrian is a java-based OLAP engine (Online Analytical Processing) developed by Pentaho (http://mondrian.pentaho.com) allowing you to perform multi-dimensional analysis on your data. For an introduction to OLAP and Mondrian we refer to the Mondrian documentation available online at http://mondrian.pentaho.com/documentation and we assume basic familiarity with Mondrian and OLAP for the following discussion.

Mondrian datasources are used to define so called Mondrian schemas which can then be used by the Mondrian backed Saiku reporting format within ReportServer (see Chapter 6.7. Saiku / Mondrian Reports). The main configuration options of Mondrian datasources are

Properties: The properties define the connection to the underlying relational database.
Schema: The schema describes the data warehouse semantics.

Note: in order to configure your Mondrian instance you can create a "mondrian.properties" file in your WEB-INF/classes directory or modify it if it already exists. In this file you can set the Mondrian properties needed, e.g. "mondrian.rolap.queryTimeout=3" (without the quotes). This property gives you an error if your query runs more than 3 seconds. Refer to https://mondrian.pentaho.com/documentation/configuration.php for all Mondrian configuration options.

When you create a new Mondrian datasource, ReportServer already specifies an example definition for the properties pointing at a MySQL database called foodmart .

type=OLAP
name=Foodmart
driver=mondrian.olap4j.MondrianOlap4jDriver
location=jdbc:mondrian:Jdbc=jdbc:mysql://localhost/foodmart
jdbcDrivers=com.mysql.jdbc.Driver
jdbcUser=
jdbcPassword=

A Mondrian schema defines multi-dimensional data warehouses on top of relational databases that are usually assumed to be managed in a star schema like form. Following is a simple schema definition based on the foodmart demo data and taken from the Mondrian documentation . The schema consists of a single cube called Sales which is made up of two dimensions (Gender and Time) and four measures.

<Schema>
	<Cube name="Sales">
		<Table name="sales_fact_1997"/>
			<Dimension name="Gender" foreignKey="customer_id">
				<Hierarchy hasAll="true" allMemberName="All Genders" primaryKey="customer_id">
					<Table name="customer"/>
					<Level name="Gender" column="gender" uniqueMembers="true"/>
				</Hierarchy>
			</Dimension>
			<Dimension name="Time" foreignKey="time_id">
				<Hierarchy hasAll="false" primaryKey="time_id">
					<Table name="time_by_day"/>
					<Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
					<Level name="Quarter" column="quarter" uniqueMembers="false"/>
					<Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/>
				</Hierarchy>
			</Dimension>
			<Measure name="Unit Sales" column="unit_sales" aggregator="sum" formatString="#,###"/>
			<Measure name="Store Sales" column="store_sales" aggregator="sum" formatString="#,###.##"/>
			<Measure name="Store Cost" column="store_cost" aggregator="sum" formatString="#,###.00"/>
			<CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales] - [Measures].[Store Cost]">
				<CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/>
			</CalculatedMember>
	</Cube>
</Schema>

Schema and properties are sufficient for the Mondrian server to query the defined ware house using queries written in the MDX language, an SQL like query language first introduced by Microsoft in 1997 (see, e.g., http://en.wikipedia.org/wiki/Multidimensional_Expressions). In order to use Mondrian within ReportServer you will need to create so called Saiku reports (see Chapter 6.7. Saiku / Mondrian Reports) which provide a beautiful user interface to access a data specified in a cube in a Pivot like fashion.

4.9. Data Source Bundle

The datasource bundle allows you to use the same report for different datasources an have the users select which database to use. To use this feature you first have to define sets or bundles of similar datasources from which a selection can be made. For this, in the datasource manager create a new datasource of type database bundle. After you configured the bundle, instead of using a specific datasource, you use the bundle as the datasource for your report.

The database bundle needs two options to be configured: The Key Provider defines where the key used for the lookup of the actual datasource (the datasource that is selected for a single execution) is taken from. There are two key providers:

Login Key Provider The login dialog contains a dropdown list that allows the user to select the key the bundle uses to lookup the assigned datasource. For this to work properly you have to configure the available values in the /etc/datasources/databasebundle.cf file. Please refer to the Configuration guide for additional information.
Report Parameter Provider One of the report parameters is used to provide the key the bundle uses to lookup the assigned datasource. You also have to enter the parameter-key of the parameter that will be used.

The Mapping provider defines how a datasource gets selected from the key. There are three providers to choose from:

Static Mapping The static mapping allows you to manualy specify a map of keys and associated datasources.
Auto: Ds-Node (by ID) Instead of manually adding all the datasources for the bundle to your mapping this mapping provider automatically chooses the datasource that has an id matching the provided key. The mapping table is used to specify the search path. You can add single datasources or whole folders to your mapping. If you add a folder to your mapping, the datasources must be direct children of the given folder. The key column is ignored in this configuration.
Auto: Ds-Node (by Name) Similar to the previous strategy this mapping provider automatically chooses from a set of datasource without explicitly defining a key for each datasource. Instead of using the id to find a matching datasource, the datasources name is used. If your bundle contains multiple datasources with the same name, the result is undefined.
You can also use a datasource bundle as the datasource for a database parameter. If you use the Report Parameter Key Provider you have to make the parameter that uses the bundle dependent on the paramter that is used as the key source.
4.10. Configuration of a Standard Data Source

For a quick configuration of reports, ReportServer allows to define a default data source. It can then be configured by a single click at the locations where data sources can be selected. The default data source can be set up by using the configuration file etc/datasources/data- sources.cf (in the internal file system, refer also to the configuration guide). In the following please find a sample configuration selecting the data source by name. It can optionally be selected by its ID.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
	<datasource> 
		<defaultDatasourceName>Demodaten</defaultDatasourceName>
	</datasource>
</configuration>

Please observe to run the terminal command config reload when configuration files have been modified. For further information see the ReportServer configuration guide as well as Chapter 12. Terminal.