Magic Pivot Tables

This short video explains how business users can create powerful Pivot Tables using Pentaho, free of IT involvement.

Watch movie

Pentaho and MS Dynamics Navision

This video shows the outcome of a proof of concept: invoice line items from Dynamics Navision made visible within the Pentaho Analyser:

Watch movie

Pentaho Data Integration 4.0

Pentaho Data Integration 4.0 breaks the Business Intelligence mold by successfully combining data transformation with data interrogation in one seamless product.

Use the graphical tool to pull data from operational data stores (databases, flat files, spreadsheets, web services), transform and link records with pre-defined components, then view the results using the auto-generated Pivot Table. If you like the results, just publish them for all to see on the Pentaho BI Server. Simple!

Watch the Pentaho Data Integration Movie

Excel with Mondrian, Just Crunch IT

An introduction to using Excel with Mondrian, covering the key features and benefits. This demonstration uses the Greenplum database to hold the cubes, but virtually any database can be used. Millersoft Ltd is happy to help companies with cube design and ETL, as well as Mondrian optimisation and configuration.

Open Source Resources

Library

Make sure you follow our Twitter feed, as we will be posting regular open source reviews and demonstrations in our resource centre.

Open source OLAP

At the heart of both Pentaho and Jaspersoft is a powerful open source OLAP engine called Mondrian, which enables OLAP capabilities on top of any database. Using this tool you can create Pivot Tables to slice and dice terabytes of data without the huge costs associated with high-end business intelligence applications like Cognos or MicroStrategy.

Millersoft Ltd has long experience with Mondrian, running it against both small and huge databases. If you need help realising its capabilities, give us a call.

Greenplum under VMware

“This new, free version of Greenplum Database gives data analysts access to Greenplum’s high-performance database for large-scale analytical projects outside the enterprise data warehouse (EDW). The Single-Node Edition is a state-of-the-art parallel analytic database, and can participate as a distributed node of Greenplum’s Enterprise Data Cloud — allowing centralized management, data discovery and data sharing across databases.”

The new stand-alone version should be a boon for anyone using Postgres for OLAP, as it will allow the database to utilise every CPU core. What follows are brief notes to help users create a VMware build running with Jaspersoft (email for further help or assistance and I'll update this post):

You can download a copy from:

http://www.greenplum.com/community/downloads/

I used the Red Hat Enterprise Linux 5.x / CentOS 5.x (x86, 64-bit) version, fetched with wget.
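For example, something along these lines (the installer filename below is illustrative only; use whatever the downloads page actually offers):

# fetch the installer and make it executable (hypothetical filename)
wget http://www.greenplum.com/community/downloads/greenplum-db-SNE-RHEL5-x86_64.bin
chmod +x greenplum-db-SNE-RHEL5-x86_64.bin
./greenplum-db-SNE-RHEL5-x86_64.bin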

You can download a virgin VMware CentOS 5 image here using a torrent client:

http://torrents.thoughtpolice.co.uk/centos-5.3-x86_64-server.zip.torrent

Remember to open ports 5432 (the Greenplum master) and 8080 (Jaspersoft's Tomcat) during the install.
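On CentOS 5 a quick way to do this is with iptables; a minimal sketch, assuming the default firewall setup:

# open the Greenplum master port and the Jaspersoft/Tomcat port
iptables -I INPUT -p tcp --dport 5432 -j ACCEPT
iptables -I INPUT -p tcp --dport 8080 -j ACCEPT
# persist the rules across reboots (CentOS 5)
service iptables save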

You will also need to increase the CPU and memory settings for this VMware appliance, otherwise you will run out of resources.

Create a Linux user called gpadmin and set this user up so that they can log in on the same box without a password, otherwise the installation process will ask you to log in 100 times:

http://www.fnode.com/2009/09/how-to-enable-ssh-key-authentication-ssh-login-without-a-password/
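The short version, run as the gpadmin user (a sketch of the steps from the link above):

# generate a passphrase-less key pair and authorise it for local logins
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# this should now connect without prompting for a password
ssh gpadmin@localhost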

Add your hostname to /etc/sysconfig/network

Mine is greenplumx

Add your IP address to /etc/hosts

Mine is 172.16.245.133  greenplumx

Restart networking
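Put together, using my values (substitute your own hostname and IP address):

# set the hostname; edit the existing HOSTNAME= entry instead if one is present
echo "HOSTNAME=greenplumx" >> /etc/sysconfig/network
# map the VM's IP address to the hostname
echo "172.16.245.133  greenplumx" >> /etc/hosts
# restart networking so the changes take effect
service network restart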

Follow the installation instructions for Greenplum; a copy of my gp_init_config file is listed below.

You can install the standard open source version of Jaspersoft (jasperserver-3.5.0-linux-installer.bin) directly from SourceForge. You will need to run it against the default MySQL database for its own configuration, as Greenplum has DML restrictions. Once installed, you can add the Greenplum data source and start your analysis. Use this Ant task to load the FoodMart demo data into Greenplum:

<target name="CopyFoodmartFromFile"
        description="Loads the FoodMart demo data into Greenplum.">
  <java classpathref="project.classpath" classname="mondrian.test.loader.MondrianFoodMartLoader" fork="no">
    <arg value="-verbose" />
    <arg value="-indexes" />
    <arg value="-jdbcDrivers=org.postgresql.Driver" />
    <arg value="-inputFile=FoodMartCreateData.sql" />
    <arg value="-outputJdbcURL=jdbc:postgresql://172.16.245.133/foodmart" />
    <arg value="-outputJdbcUser=gpadmin" />
    <arg value="-outputJdbcPassword=YOURPASSWORD" />
  </java>
</target>
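Assuming the target sits in the Mondrian build.xml (so that project.classpath is defined), the load is then two commands:

# create the target database on the Greenplum master first
createdb foodmart
# then run the loader target
ant CopyFoodmartFromFile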

Copy of my gp_init_config file

# FILE NAME: gp_init_config

# A configuration file is needed by the gpinitsystem
# script to tell it how to configure the master and segment
# instances in your Greenplum Database system. This file can be named
# whatever you like, and is referenced when you run gpinitsystem.

################################################
# REQUIRED PARAMETERS
################################################

# A name for the array you are configuring. You can use any name you
# like. Enclose the name in quotes if the name contains spaces.

ARRAY_NAME="Greenplum Database"

# This specifies the file that contains the list of segment host names
# that comprise the Greenplum system; the master host is assumed to
# be the host from which you are running the script. If the host list
# file does not reside in the same directory where the gpinitsystem
# script is executed, specify the absolute path to the file.

MACHINE_LIST_FILE=/home/gpadmin/single_host_file
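
# NOTE: for a single-node install, this file contains just the one hostname.
# A quick way to create it (using the greenplumx hostname from earlier):
#   echo "greenplumx" > /home/gpadmin/single_host_file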

# This specifies a prefix that will be used to name the data directories
# of the master and segment instances. The naming convention for data
# directories in a Greenplum Database system is SEG_PREFIX<number>
# where <number> starts with 0 for segment instances and the master
# is always -1. So for example, if you choose the prefix gp, your
# master instance data directory would be named gp-1, and the segment
# instances would be named gp0, gp1, gp2, gp3, and so on.

SEG_PREFIX=gp

# Base port number on which primary segment instances will be
# started on a segment host. If a host has multiple primary segment
# instances, the base port number will be incremented by one for each
# additional segment instance started on that host.

PORT_BASE=50000

# This specifies the data storage location(s) where the script will
# create the primary segment data directories. The script creates a
# unique data directory for each segment instance. If you want multiple
# segment instances per host, list a data storage area for each primary
# segment you want created. The recommended number is one primary segment
# per CPU. It is OK to list the same data storage area multiple times
# if you want your data directories created in the same location. The
# number of data directory locations specified will determine the number
# of primary segment instances created per host.
# You must make sure that the user who runs gpinitsystem (for example,
# the gpadmin user) has permissions to write to these directories. You
# may want to create these directories on the segment hosts before running
# gpinitsystem and chown them to the appropriate user.

declare -a DATA_DIRECTORY=(/dbfast1 )
# declare -a DATA_DIRECTORY=(/gp_primary /gp_primary)

# The host name of the Greenplum Database master instance.

MASTER_HOSTNAME=greenplumx

# The location where the data directory will be created on the
# Greenplum master host.
# You must make sure that the user who runs gpinitsystem
# has permissions to write to this directory. You may want to
# create this directory on the master host before running
# gpinitsystem and chown it to the appropriate user.

MASTER_DIRECTORY=/master

# The port number for the master instance. This is the port number
# that users and client connections will use when accessing the
# Greenplum Database system.

MASTER_PORT=5432

# The shell the gpinitsystem script uses to execute
# commands on remote hosts. Allowed value is ssh. You must set up
# your trusted host environment before running the gpinitsystem
# script. You can use gpssh-exkeys to do this.

TRUSTED_SHELL=ssh

# Maximum distance between automatic write ahead log (WAL)
# checkpoints, in log file segments (each segment is normally 16
# megabytes). This will set the checkpoint_segments parameter
# in the postgresql.conf file for each segment instance in the
# Greenplum Database system.

CHECK_POINT_SEGMENTS=8

# The character set encoding to use. Greenplum supports the
# same character sets as PostgreSQL. See 'Character Set Support'
# in the PostgreSQL documentation for allowed character sets.
# Should correspond to the OS locale specified with the
# gpinitsystem -n option.

ENCODING=UNICODE

################################################
# OPTIONAL PARAMETERS
################################################

# Optional. Uncomment to create a database of this name after the
# system is initialized. You can always create a database later using
# the CREATE DATABASE command or the createdb script.

DATABASE_NAME=warehouse

################################################
# OPTIONAL PARAMETERS FOR SEGMENT MIRRORING
################################################

# Uncomment these parameters to set up mirroring at initialization.
# If you are using multiple network interfaces per segment host,
# do NOT set up mirrors at initialization time. Do so afterwards
# using gpaddmirrors

# The base port number on which mirror segment instances will be
# started on a segment host. If a host has multiple mirror segment
# instances, the base port number will be incremented by one for
# each additional mirror segment instance started on that host.
# Be sure to use a different number than the primary PORT_BASE.

#MIRROR_PORT_BASE=60000

# The data directory where mirror segment instances will be
# created on a host. There must be the same number of data directories
# declared for mirror segment instances as for primary segment instances
# (see the DATA_DIRECTORY parameter for more information).

# You must make sure that the user who runs gpinitsystem
# has permissions to write to these directories. You may want to
# create these directories on the segment hosts before running
# gpinitsystem and chown them to the appropriate user.

#declare -a MIRROR_DATA_DIRECTORY=(/dbfast3 /dbfast4)
#declare -a MIRROR_DATA_DIRECTORY=(/gp_mirror /gp_mirror)
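
With the config file saved, the last steps are to create the directories it references and run gpinitsystem against it (a sketch; check the Greenplum installation guide for your version's exact flags):

# create the data directories named above and hand them to gpadmin
mkdir /dbfast1 /master
chown gpadmin /dbfast1 /master
# initialise the Greenplum system as the gpadmin user
gpinitsystem -c /home/gpadmin/gp_init_config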