Chapter 4. Datasources

4. Datasources

In ReportServer datasources serve as principal data basis for report objects, i.e. a report draws the data to be displayed from a defined datasource. As with most other objects, datasources are maintained in a hierarchical structure. The datasource management module is to be found in the Administration module under Datasources. The following object types can be created in the tree:

  • Folder Serve to structure datasources.
  • Datasource Here various datasources are optionally available which we will discuss in more detail in the following.

datasources will be configured in two steps. In datasource management, datasources will be created and the basic settings made. For relational databases, here, for instance, user name, password and access URL are stored. However, the specific configuration per use will be set at the point where the datasource will be used (this is mostly with the respective report). Here, for instance, for a relational database the SQL query can be set on which the report is based.

ReportServer supports the following datasources.

4.1. Relational Databases

It is possible to access common relational databases via the datasource type "Relational databases". Use the option "database" to control the SQL dialect created by ReportServer. Currently, ReportServer supports the following SQL dialects :

  • Amazon Redshift
  • DB2
  • Firebird
  • Google BigQuery
  • H2
  • HSQL
  • Informix
  • MariaDB
  • MSSQL
  • MonetDB
  • MySQL
  • Oracle
  • PostgreSQL
  • Sybase
  • Teradata
  • Vertica
Make sure to integrate the respective JDBC database driver prior to use. (For more detailed information refer to the database driver description.) Information on how to set user name and password as well as the URL is given in the data- base manual. In the following example we will demonstrate how to configure a MySQL datasource.

After the initial installation, MySQL can usually be started by entering user name "root" and password "root". A JDBC URL could be as follows:

jdbc:mysql://127.0.0.1:3306/ClassicModels

After having transferred the data, and you wish to test whether ReportServer can establish a database connection, apply the button Test connection. Be aware to always test the saved connection.

The query will actually be configured when selecting the datasource, for instance, if you want to create a dynamic list on the basis of this datasource.

4.2. Amazon Redshift

Amazon Redshift https://aws.amazon.com/redshift/ is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data. It allows you to run complex analytic queries against petabytes of structured data.

ReportServer supports Amazon Redshift by its official JDBC driver, downloadable from https://docs.aws.amazon.com/redshift/latest/mgmt/configure-jdbc-connection.html. Download the JDBC 4.2-compatible version and install it in your ReportServer lib directory. We strongly recommend using the external Configdir for this, as explained in the Configuration Guide. In such a way, the driver will be separated from the ReportServer libraries and will not get overwritten with your next ReportServer upgrade. At the moment of the writing, the version of the JDBC driver was 1.2.32.1056 (RedshiftJDBC42-no-awssdk-1.2.32.1056.jar).

As stated in the JDBC documentation, you have to obtain your JDBC URL from your AWS. An example URL is jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev. In the ReportServer datasource user and password fields, put your database user and password previously configured in AWS. More information can be found in the Amazon Redshift documentation.

Note: If you get the following error: ''Error setting/closing connection: Not Connected'' try adding to your connection pool's configuration (pool.cf) the following property:

<idleConnectionTestPeriod>5</idleConnectionTestPeriod>
With this property set, c3p0 will test all idle, pooled but unchecked-out connections, every this number of seconds. Add this property to your Redshift datasource. You can find more information on the connection pool in the Configuration Guide.

4.3. Google BigQuery

Google BigQuery https://cloud.google.com/bigquery/ is a serverless, highly-scalable, and cost-effective cloud data warehouse with an in-memory BI Engine and machine learning built in.

ReportServer supports Google BigQuery datasources by the official Simba JDBC driver. Download the JDBC 4.2-compatible version from the Simba web site (https://cloud.google.com/bigquery/providers/simba-drivers/?hl=en) and add the all .jars in the downloaded zip to your lib directory exluding the following libraries, since these are already included in ReportServer:

  • commons-codec-1.10.jar
  • commons-lang3-3.5.jar
  • guava-26.0-android.jar
  • jackson-core-2.9.6.jar
  • jackson-core-asl-1.9.13.jar
  • jackson-mapper-asl-1.9.13.jar
  • javax.annotation-api-1.2.jar
  • jsr305-3.0.2.jar
  • slf4j-api-1.7.7.jar

At the moment of writing, the Simba JDBC driver had version 1.2.0.1000.

We strongly recommend using the external Configdir for this, as explained in the Configuration Guide. In such a way, the libraries will be separated from the ReportServer libraries and will not get overwritten with your next ReportServer upgrade.

The Simba JDBC driver download includes a configuration guide (Simba JDBC Driver for Google BigQuery Install and Configuration Guide.pdf). The ''Configuring Authentication'' section includes some authentication options; use e.g. the ''Using a Google Service Account'' option.

Your URL should look similar to this:

jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=MyBigQueryProject;OAuthType=0;OAuthServiceAcctEmail=bqtest1@data-driver- testing.iam.gserviceaccount.com;OAuthPvtKeyPath=/SecureFiles/ServiceKeyFile.json;

Your username and password fields should be empty in ReportServer, since they are not being used.

In some cases, you may also want to adapt the timeout parameter, you can change it by appending it into the URL:

jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=MyBigQueryProject;OAuthType=0;OAuthServiceAcctEmail=bqtest1@data-driver- testing.iam.gserviceaccount.com;OAuthPvtKeyPath=/SecureFiles/ServiceKeyFile.json;Timeout=3600

For more details refer to the Simba JDBC Driver Configuration Guide.

4.4. Teradata

Teradata https://www.teradata.com/ is a fully scalable relational database management system produced by Teradata Corp. It is widely used to manage large data warehousing operations.

ReportServer supports Teradata datasources by the official Teradata JDBC driver. Download the driver from https://downloads.teradata.com/download/connectivity/jdbc-driver and add it to your libs directory. We strongly recommend using the external Configdir for this, as explained in the Configuration Guide. In such a way, the libraries will be separated from the ReportServer libraries and will not get overwritten with your next ReportServer upgrade.

At the moment of writing, the Teradata JDBC driver had version 16.10.00.07.

Your URL should look similar to this: jdbc:teradata://IP/DATABASE=myDatabase

4.5. Storage of Database Passwords

In the ReportServer development we particularly emphasized the safety of the system to the greatest possible extent. One of the main issues was to store sensitive data as securely as possible. Therefore, the datasource settings are of special importance as they provide the potential to access your data warehouse. To store passwords as securely as possible, we follow a two-way strategy. Firstly, database passwords should never be transferred to the client (to your web browser), but only be used to establish database connections. This results in the datasource password field always being empty upon reloading the form. However, you may safely change the datasource, the password will only be reset when you add an entry to the password field. The second safety measure is that ReportServer database passwords will be encrypted when saving. For further information, please refer to the configuration instructions.

4.6. Datasource Pool

datasource connections can be provided in a pool in ReportServer. This can clearly increase the performance and enables to better control the individual connections. If pooling is activated, ReportServer keeps a pre-defined number of connections open per relational database integrated. ReportServer will recycle these connections by user to save the costs incurred for setting up the connection. In the configuration guide you will find detailed information on how to pool databases exactly and which settings to enter.

4.7. CSV Lists

Apart from relational databases, ReportServer can provide data in form of CSV files (Comma-Separated Values; further information on CSV you will find, for example, at http:// en.wikipedia.org/wiki/Comma-separated_values) as data basis for reports. To work with a CSV List, create one. To set the format of your CSV file, use the fields

Quotes: Delimiting characters for an individual data record.
Separator: Separating character between data records

Use the Connector setting to define the location of the CSV data. The following connectors are presently supported:

Text-Connector: Allows to directly enter data in a text field at the datasource.
Argument-Connector: Allows to directly enter data in a text field when selecting the datasource. This enables, for instance, to easily simulate static lists for report parameters (refer to the section on datasource parameters).
URL-Connector: Allows to load data by using an URL. Please observe that you have the option to load the data from the internal ReportServer file system (refer here to the Section File Server).
4.7.1. Database Cache

As described before, prior to its use, CSV data will be loaded to internal temporary tables. This may take some time if you load a larger data volume for the first time. Therefore, it is quite often reasonable not to continuously load the data. Use the Cache database setting to determine after how many minutes the data should be reloaded from the source to the internal temporary tables. If you set -1, the data will be loaded only one time. If you set 0, the data will be reloaded every time you use it.

If you wish to load the data manually (because they were changed) it is sufficient to simply save the datasource again. After every saving process, ReportServer will initiate that the data will be removed from the internal cache and reloaded again.

Please observe that in case of the Argument Connector, ReportServer will ignore the cache setting, so the data will not be cached.

4.7.2. Configuration at the Object

Similar to relational databases, you can make additional settings for CSV datasources at the location where the datasource will be selected (e.g. at the report or parameter). As already explained, CSV data will be buffered to internal, temporary tables. The query type to be used is

SELECT * FROM _RS_TMP_TABLENAME

where _RS_TMP_TABLENAME is a temporary table name assigned by ReportServer. Using the Query Wrapper setting, now you can extend the query created automatically. Here, use the syntax for parameters (see section on report parameters). The following replacements will be available to you:

_RS_TMP_TABLENAME: The name of the table
_RS_QUERY: The basic query.

For instance, by using the following query you could limit the data volume to all those data records where the attribute REGION has value 3.

SELECT * FROM $!{_RS_TMP_TABLENAME} WHERE REGION = "3"

Please observe that any CSV data will generally be treated as if it were of type string. In addition, we want to point out that if you use replacements, you need to use $!{} instead of ${} as replacements need to be directly written into the query (for further information on the replacement syntax refer to Section 6.3. Working with Parameters).

If you wish to use CSV datasources together with Jasper, BIRT, or JXLS reports, refer to the Scripting Documentation, Chapter ''Script Datasources'', as this can be done in an analogous way.

4.8. Script datasources

If you wish to load data which are in a format that has not been supported so far, or if you wish to perform complex pre-processing of data, it is advisable to use script datasources. Script datasources provide data by running a ReportServer script and, therefore, they can be applied very flexibly. Similar to CSV lists, the result of a script datasource will first be buffered in the internal temporary tables. Here as well, you will have the option to define in the Cache database setting how often the data will be reloaded.

Script datasources run a ReportServer script whenever the datasource is accessed. This script will be filed with the datasource and, if selected, it can be parameterized with a report (i.e. parameters can be transferred to the script). The return value of the script must be an object of type RSTableModel (included in the package net.datenwerke.rs.base.service.reportengines.table.output.object). In the following we will give a simple example script which builds up a static table consisting of 3 columns.

import net.datenwerke.rs.base.service.reportengines.table.output.object.*;

def definition = new TableDefinition(['a','b','c'],
					[Integer.class,Integer.class,Integer.class]);

def model = new RSTableModel(definition);
model.addDataRow(1,2,3);
model.addDataRow(4,5,6);
model.addDataRow(7,8,9);

return model;
4.8.1. Configuration at the Object

You can further reduce the data volume by using Query wrapper in the same way as you proceeded with CSV datasources. In addition, you can pass arguments to the script, which can be referred in the script with the args variable. E.g., refer to the following example:

import net.datenwerke.rs.base.service.reportengines.table.output.object.*;

def definition = new TableDefinition(['a','b','c'],
					[String.class,String.class,String.class]);

def model = new RSTableModel(definition);
model.addDataRow(args[0],"2","3");
model.addDataRow("4","5","6");
model.addDataRow("7","8","9");

return model;

The args[0] prints the 0th argument passed to the script. You can either pass a text, e.g.''myValue'', or the value of a given report parameter, e.g. ${myParam} for a ''myParam'' parameter. Note that if the value contains blank spaces, quotation marks are needed.

Refer to the Script Guide, Chapter ''Script Datasources'' for more details. Further, if you wish to use script datasources together with Jasper, BIRT, or JXLS reports, also refer to the Scripting Documentation, Chapter ''Script Datasources''.

4.9. BIRT Report datasource

The BIRT report engine enables to define data records within BIRT Reports. For instance, they can be used for feeding parameters. By using BIRT Report datasources, you can access this data in ReportServer.

As the most frequent application case for BIRT Report datasources will surely be the reading out of parameters, the BIRT datasource will directly be configured at the report. This means you only have to create a datasource in the datasource tree, any further configuration will be entered at the location where it is used. When using the datasource, you eventually have to select the respective BIRT report and enter the name of the data set. Please observe that BIRT provides the option to access so-called "data sets" as well as parameter data. Depending on the origin of the data, you have to set the respective type.

In the same way as you proceeded with CSV and script datasources, you can modify the query by using the Query wrapper configuration.

4.10. Mondrian Datasource

Mondrian is a java-based OLAP engine (Online Analytical Processing) developed by Pentaho (http://mondrian.pentaho.com) allowing you to perform multi-dimensional analysis on your data. For an introduction to OLAP and Mondrian we refer to the Mondrian documentation available online at http://mondrian.pentaho.com/documentation and we assume basic familiarity with Mondrian and OLAP for the following discussion.

Mondrian datasources are used to define so called Mondrian schemas which can then be used by the Mondrian backed Saiku reporting format within ReportServer (see Chapter 6.7. Saiku / Mondrian Reports). The main configuration options of Mondrian datasources are

Properties: The properties define the connection to the underlying relational database.
Schema: The schema describes the data warehouse semantics.

Note: in order to configure your Mondrian instance you can create a "mondrian.properties" file in your WEB-INF/classes directory or modify it if it already exists. In this file you can set the Mondrian properties needed, e.g. "mondrian.rolap.queryTimeout=3" (without the quotes). This property gives you an error if your query runs more than 3 seconds. Refer to https://mondrian.pentaho.com/documentation/configuration.php for all Mondrian configuration options.

When you create a new Mondrian datasource, ReportServer already specifies an example definition for the properties pointing at a MySQL database called foodmart .

type=OLAP
name=Foodmart
driver=mondrian.olap4j.MondrianOlap4jDriver
location=jdbc:mondrian:Jdbc=jdbc:mysql://localhost/foodmart
jdbcDrivers=com.mysql.cj.jdbc.Driver
jdbcUser=
jdbcPassword=

A Mondrian schema defines multi-dimensional data warehouses on top of relational databases that are usually assumed to be managed in a star schema like form. Following is a simple schema definition based on the foodmart demo data and taken from the Mondrian documentation . The schema consists of a single cube called Sales which is made up of two dimensions (Gender and Time) and four measures.

<Schema>
	<Cube name="Sales">
		<Table name="sales_fact_1997"/>
			<Dimension name="Gender" foreignKey="customer_id">
				<Hierarchy hasAll="true" allMemberName="All Genders" primaryKey="customer_id">
					<Table name="customer"/>
					<Level name="Gender" column="gender" uniqueMembers="true"/>
				</Hierarchy>
			</Dimension>
			<Dimension name="Time" foreignKey="time_id">
				<Hierarchy hasAll="false" primaryKey="time_id">
					<Table name="time_by_day"/>
					<Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
					<Level name="Quarter" column="quarter" uniqueMembers="false"/>
					<Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/>
				</Hierarchy>
			</Dimension>
			<Measure name="Unit Sales" column="unit_sales" aggregator="sum" formatString="#,###"/>
			<Measure name="Store Sales" column="store_sales" aggregator="sum" formatString="#,###.##"/>
			<Measure name="Store Cost" column="store_cost" aggregator="sum" formatString="#,###.00"/>
			<CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales] - [Measures].[Store Cost]">
				<CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/>
			</CalculatedMember>
	</Cube>
</Schema>

Schema and properties are sufficient for the Mondrian server to query the defined ware house using queries written in the MDX language, an SQL like query language first introduced by Microsoft in 1997 (see, e.g., http://en.wikipedia.org/wiki/Multidimensional_Expressions). In order to use Mondrian within ReportServer you will need to create so called Saiku reports (see Chapter 6.7. Saiku / Mondrian Reports) which provide a beautiful user interface to access a data specified in a cube in a Pivot like fashion.

4.11. Datasource Bundle

The datasource bundle allows you to use the same report for different datasources an have the users select which database to use. To use this feature you first have to define sets or bundles of similar datasources from which a selection can be made. For this, in the datasource manager create a new datasource of type database bundle. After you configured the bundle, instead of using a specific datasource, you use the bundle as the datasource for your report.

The database bundle needs two options to be configured: The Key Provider defines where the key used for the lookup of the actual datasource (the datasource that is selected for a single execution) is taken from. There are two key providers:

Login Key Provider The login dialog contains a dropdown list that allows the user to select the key the bundle uses to lookup the assigned datasource. For this to work properly you have to configure the available values in the /etc/datasources/databasebundle.cf file. Please refer to the Configuration guide for additional information.
Report Parameter Provider One of the report parameters is used to provide the key the bundle uses to lookup the assigned datasource. You also have to enter the parameter-key of the parameter that will be used.

The Mapping provider defines how a datasource gets selected from the key. There are three providers to choose from:

Static Mapping The static mapping allows you to manualy specify a map of keys and associated datasources.
Auto: Ds-Node (by ID) Instead of manually adding all the datasources for the bundle to your mapping this mapping provider automatically chooses the datasource that has an id matching the provided key. The mapping table is used to specify the search path. You can add single datasources or whole folders to your mapping. If you add a folder to your mapping, the datasources must be direct children of the given folder. The key column is ignored in this configuration.
Auto: Ds-Node (by Name) Similar to the previous strategy this mapping provider automatically chooses from a set of datasource without explicitly defining a key for each datasource. Instead of using the id to find a matching datasource, the datasources name is used. If your bundle contains multiple datasources with the same name, the result is undefined.
You can also use a datasource bundle as the datasource for a database parameter. If you use the Report Parameter Key Provider you have to make the parameter that uses the bundle dependent on the paramter that is used as the key source.
4.12. Configuration of a Standard Datasource

For a quick configuration of reports, ReportServer allows to define a default datasource. It can then be configured by a single click at the locations where datasources can be selected. The default datasource can be set up by using the configuration file etc/datasources/data- sources.cf (in the internal file system, refer also to the configuration guide). In the following please find a sample configuration selecting the datasource by name. It can optionally be selected by its ID.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
	<datasource> 
		<defaultDatasourceName>Demodaten</defaultDatasourceName>
	</datasource>
</configuration>

Please observe to run the terminal command config reload when configuration files have been modified. For further information see the ReportServer configuration guide as well as Chapter 12. Terminal.