Thursday, March 31, 2011

Fedora Core 14 Mahout Installation



Background:
This blog post was prompted by a coworker's need to get Mahout up and running on one of our development machines (OS: FC14). All of the content here can be found elsewhere. In particular, you can find more detailed information on the Hadoop Common documentation page: http://hadoop.apache.org/common/docs/current/single_node_setup.html
and on the Mahout project page: http://mahout.apache.org/

We are starting with a Fedora Core 14 32bit guest machine running in VMWare Workstation. Host is Windows XP SP3. All libraries required to get VMWare tools installed and running have been pre-installed so there may be some dependencies already in place that are not listed. The Fedora core instance is fully updated as of today. SELinux has also already been disabled.

First Step: Install Hadoop Common in Single Node Distributed Mode:
The end goal is to run Mahout but we will need to first install an instance of Hadoop that our Mahout can run on top of.

Looking at the Hadoop system requirements, I can see that Java 1.6.x or greater is required, as well as ssh and the sshd daemon. Let's get those installed and set up first.

Java 1.6.x:
sudo yum install java-1.6.0-openjdk

ssh and sshd:
sudo yum install openssh openssh-server

Let's start the ssh daemon and make sure that sshd starts up automatically:
sudo /etc/init.d/sshd start
sudo chkconfig --add sshd
sudo chkconfig --levels 2345 sshd on

Next let's set up the JAVA_HOME environment variable so that it is always available for all users. The first thing we need to do is figure out where JAVA_HOME should point:
which java
/usr/bin/java


K, dimes to dollars says that this is a link so let's find out where it is pointing.
ls -la /usr/bin/java
/usr/bin/java -> /etc/alternatives/java


K, dimes to dollars this is also a link, one more time:
ls -la /etc/alternatives/java
/etc/alternatives/java -> /usr/lib/jvm/jre-1.6.0-openjdk/bin/java


OK, so now we know that our JAVA_HOME environment variable should be set to: /usr/lib/jvm/jre-1.6.0-openjdk/ . Let's get this done in a friendly way for all users.
sudo touch /etc/profile.d/java_home.sh
sudo su
echo "export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk/" >> /etc/profile.d/java_home.sh
exit
source /etc/profile.d/java_home.sh
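
If you would rather not chase the links by hand, here is a minimal Python sketch of the same idea. The function name is my own, and it assumes the usual layout where the java binary lives in a bin/ directory directly under the Java home. Because it resolves every link in the chain, it may report a longer, fully resolved path than the one shown above if the jre-1.6.0-openjdk directory is itself a link.

```python
import os


def java_home_from_binary(java_path="/usr/bin/java"):
    """Follow every symlink from java_path and return the directory above bin/."""
    # realpath() resolves the whole chain, e.g.
    # /usr/bin/java -> /etc/alternatives/java -> <java home>/bin/java
    real_binary = os.path.realpath(java_path)
    # Strip the trailing "bin/java" to get the candidate JAVA_HOME.
    return os.path.dirname(os.path.dirname(real_binary))
```

Calling java_home_from_binary() with no arguments and printing the result gives a value you can compare against what you put in /etc/profile.d/java_home.sh.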


Alright, now go ahead and download hadoop from the following location:
http://hadoop.apache.org/common/releases.html . I am going to install Hadoop version 0.20.2. A fair warning: as of right now Mahout is at version 0.4 - 0.5 and Hadoop is at version 0.21.0. Although the Mahout documentation states that it works with anything past Hadoop version 0.20.2, this is not true in my experience. I tried it with version 0.21.0 and received a linking error when I tried to run the examples. Hadoop version 0.20.2 does work with this version of Mahout.
You can install Hadoop anywhere you like. I am going to install it in /usr/local/ as a matter of personal preference. If you install it in another location, just make sure to change the paths below consistently.
OK, once you have the package downloaded, change to the download directory and run:
tar xof hadoop-0.20.2.tar.gz
sudo mv hadoop-0.20.2 /usr/local/


Now we need to set the HADOOP_HOME variable as before:
sudo touch /etc/profile.d/hadoop.sh
sudo su
echo "export HADOOP_HOME=/usr/local/hadoop-0.20.2" >> /etc/profile.d/hadoop.sh
exit
source /etc/profile.d/hadoop.sh


One more thing we want to do before handing you off to the official documentation is to setup passphraseless ssh login (following borrows heavily from the official documents):
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
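
If the passphraseless login still prompts for a password, file permissions are a common culprit. Below is a small sketch of a check, written for Python 3. The function name and the 700/600 thresholds are my assumptions (they are the commonly recommended modes, not something taken from the Hadoop docs).

```python
import os
import stat


def permission_problems(ssh_dir):
    """Return a list of problems found; an empty list means every check passed."""
    # Assumed limits: ~/.ssh no wider than 700, authorized_keys no wider than 600.
    expected = [
        (ssh_dir, 0o700),
        (os.path.join(ssh_dir, "authorized_keys"), 0o600),
    ]
    problems = []
    for path, allowed in expected:
        if not os.path.exists(path):
            problems.append("%s is missing" % path)
            continue
        mode = stat.S_IMODE(os.stat(path).st_mode)
        # Any permission bit outside the allowed mask counts as a problem.
        if mode & ~allowed:
            problems.append("%s has mode %o, expected at most %o" % (path, mode, allowed))
    return problems
```

Run it as permission_problems(os.path.expanduser("~/.ssh")) and fix whatever it lists before trying ssh localhost again.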


At this point you are safe going to the Hadoop Common installation documentation and configuring your system in 'Pseudo-Distributed Mode'.
Please follow the instructions available here: http://hadoop.apache.org/common/docs/current/single_node_setup.html

Now on to Mahout!:

Mahout documentation states that it requires Maven2, and it looks like it will also require a copy of Subversion (no git?) to check out the project from source control. Let's get these requirements out of the way first.
sudo yum install maven2
sudo yum install subversion


Now let's check out the Mahout project and move it to the /usr/local/ directory (same disclaimer as above).
svn co http://svn.apache.org/repos/asf/mahout/trunk mahout
sudo mv mahout /usr/local


Same as above, we need to now set the MAHOUT_HOME environment variable.
sudo su
echo "export MAHOUT_HOME=/usr/local/mahout" >> /etc/profile.d/hadoop.sh
exit
source /etc/profile.d/hadoop.sh
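
With three *_HOME variables now spread over two profile scripts, a quick sanity check does not hurt. This is only a sketch (the function name and the report wording are mine); it reports whether each variable is set and points at a real directory.

```python
import os

# The three variables set in this post.
REQUIRED = ("JAVA_HOME", "HADOOP_HOME", "MAHOUT_HOME")


def check_homes(env=None, names=REQUIRED):
    """Map each variable name to 'ok', 'not set', or a note about a bad path."""
    env = os.environ if env is None else env
    report = {}
    for name in names:
        value = env.get(name)
        if not value:
            report[name] = "not set"
        elif not os.path.isdir(value):
            report[name] = "set to %s, which is not a directory" % value
        else:
            report[name] = "ok"
    return report
```

Calling check_homes() in a fresh login shell shows whether the /etc/profile.d scripts are actually being picked up.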


OK, now let's run the compiler and installers. A note here: each compile process has optional unit tests that run. For me some of these unit tests are failing, and if this happens for you, you may want to investigate. Even when the unit tests do not fail they can take a very long time (we are talking go-get-coffee-and/or-lunch long). If you would like to make sure the unit tests do not run for whatever reason, just pass -DskipTests=true to Maven. Enough talking, let's go.
cd $MAHOUT_HOME
sudo mvn -DskipTests=true install
cd core
sudo mvn compile
sudo mvn install

Validate Installation, Run an Example:

To validate that everything is running correctly we are going to run the example listed here:
https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data

<Section to be finished later>

One More Gotcha!:
If you are like me then you are used to retrieving Hadoop results directly as text files. It turns out that Mahout natively stores results in a binary format (Hadoop SequenceFiles) that is not human-readable. To get the data into a text format that you can open in the editor of your choice, you will first need to convert these files. More information on this process can be found here: https://cwiki.apache.org/MAHOUT/cluster-dumper.html
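
To tell quickly whether a result file you copied out of HDFS needs converting, you can look at its first bytes. This sketch assumes the output is a Hadoop SequenceFile, which starts with the three bytes SEQ. The function name is made up for illustration, and it only inspects a local file (it does not talk to HDFS).

```python
def looks_like_sequence_file(path):
    """Return True if the local file starts with the SequenceFile magic bytes."""
    with open(path, "rb") as handle:
        # SequenceFiles begin with b"SEQ" followed by a version byte.
        return handle.read(3) == b"SEQ"
```

If this returns True for a part file, opening it in a text editor will not be useful and it needs to go through the conversion described at the link above.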

Fin:
We now have a running instance of mahout that can be used for development level machine learning. As always if anyone has any questions or comments please make sure to post them.
Thank you.

Monday, March 7, 2011

Teradata ODBC, Python pyodbc on Fedora Core 14 x32

Background:


This posting is an extension of an earlier posting and contains much of the same content starting out. Instead of trying to get the example programs to compile and run our objective is to get python working with ODBC through the pyodbc project.


The motivation of this project has to do with a desire of a colleague of mine to do development work on an installation of Fedora Core 14.

Keep in mind that FC is not at this time a supported distro for Teradata!

We are starting with a Fedora Core 14 32bit guest machine running in VMWare Workstation. Host is Windows XP SP3. All libraries required to get VMWare tools installed and running have been pre-installed so there may be some dependencies already in place that are not listed. The Fedora core instance is fully updated as of today. SELinux has also already been disabled.



Lets Go! - Download Driver Package:

Alright, first things first: let's start out by downloading the Teradata drivers. Go to:
www.teradata.com/downloadcenter/
Follow ODBC -> Linux and download the correct package (TTU 13.10 LINUX-INDEP tdodbc.13.10.00.01 in my case)

Keep going until you download the .tar.gz package (tdodbc__LINUX_INDEP.13.10.00.01-1.tar.gz)

Make sure to read the accompanying README file before proceeding. It will detail any dependencies that *may* need to be installed on your OS. Since FC is not supported we cannot rely on this documentation to be all-inclusive, but it can provide great hints.

Once you have downloaded the drivers, move into the directory holding the downloaded file. Create a directory to hold expanded files and then untar the archive:


[cj@fc14 Downloads]$ mkdir /tmp/td
[cj@fc14 Downloads]$ mv tdodbc__LINUX_INDEP.13.10.00.01-1.tar.gz /tmp/td/
[cj@fc14 Downloads]$ cd /tmp/td
[cj@fc14 td]$ tar xof tdodbc__LINUX_INDEP.13.10.00.01-1.tar.gz

You should now have 3 component archives and readme files in your working directory.
tdodbc*
tdicu*
TeraGSS*

TeraGSS has redhat and suse versions.

Take a moment out to remember that FC is a Redhat OS and expand each archive:

[cj@fc14 td]$ tar xof tdicu__linux_indep.13.10.00.00-1.tar.gz
[cj@fc14 td]$ tar xof TeraGSS_redhatlinux-i386__linux_i386.13.10.00.02-1.tar.gz
[cj@fc14 td]$ tar xof tdodbc__linux_x64.13.10.00.01.tar.gz


Installing Teradata Driver Packages:

Now install the tdicu package:

[cj@fc14 td]$ cd tdicu
[cj@fc14 tdicu]$ sudo rpm -ihv tdicu-13.10.00.00-1.noarch.rpm
 


Output:
Adding TD_ICU_DATA environment variable to /etc/profile file.
Adding TD_ICU_DATA environment variable to /etc/csh.login file.

Since the package updated /etc/profile, let's first load the profile changes into our shell (just in case):
[cj@fc14 tdicu]$ source /etc/profile

Next let's install TeraGSS:
[cj@fc14 tdicu]$ cd ../TeraGSS/
[cj@fc14 TeraGSS]$ sudo rpm -ihv TeraGSS_redhatlinux-i386-13.10.00.02-1.i386.rpm

Output:
Preparing...                ########################################### [100%]
   1:TeraGSS_redhatlinux-i38########################################### [100%]
/usr/teragss/redhatlinux-i386/13.10.00.02/bin/tdgssconfig: error while loading shared libraries: libstdc++-libc6.2-2.so.3: cannot open shared object file: No such file or directory

Alright, so we are missing a dependency. Let's use yum to find and install it:
[cj@fc14 TeraGSS]$ sudo yum provides libstdc++-libc6.2-2.so.3

Output:
compat-libstdc++-296-2.96-143.i686 : Compatibility 2.96-RH standard C++
                                   : libraries
Repo        : fedora
Matched from:
Other       : libstdc++-libc6.2-2.so.

[cj@fc14 TeraGSS]$ sudo yum install compat-libstdc++-296-2.96-143.i686

Finally let's install the tdodbc package:
[cj@fc14 td]$ cd TeraGSS
[cj@fc14 TeraGSS]$ cd ../tdodbc
[cj@fc14 tdodbc]$ sudo rpm -ihv tdodbc-13.10.00.01-1.noarch.rpm

Output:
/var/tmp/rpm-tmp.9paHwf: /opt/teradata/client/13.10/odbc_32/bin/set_default_version: /usr/bin/ksh: bad interpreter: No such file or directory

Hmmm... it's looking for the Korn shell. Let's install that package from yum:
[cj@fc14 tdodbc]$ sudo yum install ksh

Really quick, let's see where FC has put ksh:
[cj@fc14 tdodbc]$ which ksh

Output:
/bin/ksh

That is not going to work for us because the package is looking for ksh at /usr/bin/ksh. Let's go ahead and create a link so that the Teradata installer can work:
[cj@fc14 tdodbc]$ sudo ln -s /bin/ksh /usr/bin/ksh

Now let's uninstall the tdodbc package and reinstall it to make sure we don't run into any more problems:
[cj@fc14 tdodbc]$ sudo rpm -e tdodbc
[cj@fc14 tdodbc]$ sudo rpm -ihv tdodbc-13.10.00.01-1.noarch.rpm

Perfect!



Python ODBC, PyODBC installation:


Looking through the documentation we can see that we will need to pre-install the gcc compiler and the unixODBC-devel package. Let's get that done.
[cj@fc14 ~]$ sudo yum install gcc unixODBC-devel



Next we need to install the pyodbc package so that we can access our database natively from within Python. The pyodbc project can be accessed from here: http://code.google.com/p/pyodbc/


Make sure you go to the downloads section and get the latest pyodbc package. At the time of this writing that is pyodbc-2.1.8.


For Linux you will want to download the source installation package (for me this is: pyodbc-2.1.8.zip). From the command line move to the directory you have saved the pyodbc archive to and unzip the archive.

[cj@fc14 ~]$ cd ~/Downloads/
[cj@fc14 Downloads]$ unzip pyodbc-2.1.8.zip


Now change into the pyodbc directory and attempt to build and install pyodbc:

[cj@fc14 Downloads]$ cd pyodbc-2.1.8
[cj@fc14 pyodbc-2.1.8]$ sudo python setup.py build

Output:
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
error: command 'gcc' failed with exit status 1

Oops, we need g++ installed. Let's do that:
[cj@fc14 pyodbc-2.1.8]$ sudo yum install gcc-c++

Let's try and build that pyodbc package again:
[cj@fc14 pyodbc-2.1.8]$ sudo python setup.py build

Output:
fatal error: Python.h: No such file or directory

K, we need to install the Python development package. Let's do that also:
[cj@fc14 pyodbc-2.1.8]$ sudo yum install python-devel

Awesome, that works. Now let's go ahead and install the package:
[cj@fc14 pyodbc-2.1.8]$ sudo python setup.py install

Good, that all worked straight off.
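
The two build failures above (no C++ compiler, no Python.h) can be checked for up front. Here is a rough pre-flight sketch for Python 3. The function name is mine, and it only covers the pieces that bit me here, not every dependency of pyodbc.

```python
import os
import shutil
import sysconfig


def build_prerequisites():
    """Report whether the compilers and the Python headers appear to be present."""
    # Directory where this interpreter expects its C headers (from python-devel).
    include_dir = sysconfig.get_paths()["include"]
    return {
        "gcc": shutil.which("gcc") is not None,
        "g++": shutil.which("g++") is not None,
        "Python.h": os.path.isfile(os.path.join(include_dir, "Python.h")),
    }
```

Any False value in the returned dictionary points at a package (gcc, gcc-c++, python-devel) that is probably still missing.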

odbc.ini and odbcinst.ini:
Before going further we need to configure ODBC so that it can find the Teradata drivers. There are 2 main configuration files that you can modify to do this: odbcinst.ini and odbc.ini. There are plenty of resources online that describe the breadth of these configuration options better than I can, and I would encourage anyone to look into them a little bit.

For our purpose though Teradata conveniently generates samples of these files that work well out of the box. Go ahead and run the following commands:
[cj@fc14 ~]$ sudo updatedb
[cj@fc14 ~]$ locate odbc.ini

Output:
/opt/teradata/client/13.10/odbc_32/odbc.ini
/opt/teradata/client/ODBC_32/odbc.ini

Go ahead and copy the first sample file found into your home directory as .odbc.ini (a '.' in front of the file name makes it 'private' or 'invisible'). Similarly copy the odbcinst.ini sample file to .odbcinst.ini within your home directory:
[cj@fc14 ~]$ cp /opt/teradata/client/13.10/odbc_32/odbc.ini ~/.odbc.ini
[cj@fc14 ~]$ cp /opt/teradata/client/13.10/odbc_32/odbcinst.ini ~/.odbcinst.ini
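
Since a wrong driver path in these files tends to surface later as a confusing "Can't open lib" error, it can help to list the driver paths they declare. The sketch below is for Python 3 and assumes the files are plain INI text with a Driver= key per driver section, which is the usual unixODBC-style layout. The function names are hypothetical.

```python
import configparser
import os


def driver_paths(ini_text):
    """Return {section name: driver library path} for sections with a Driver key."""
    parser = configparser.ConfigParser(
        interpolation=None, allow_no_value=True, strict=False
    )
    parser.optionxform = str  # keep the original key case
    parser.read_string(ini_text)
    found = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if key.lower() == "driver" and value:
                found[section] = value
    return found


def missing_drivers(ini_text):
    """Return only the entries whose driver library does not exist on disk."""
    return {
        name: path
        for name, path in driver_paths(ini_text).items()
        if not os.path.isfile(path)
    }
```

Feed it the contents of ~/.odbcinst.ini (for example via open(os.path.expanduser("~/.odbcinst.ini")).read()). An empty result from missing_drivers means every declared library file is at least present.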

Connecting to Teradata DB using PyODBC:

To test pyodbc we are going to create a small sample file (test.py) that will simply make a connection to our Teradata database and perform a select statement.

Let's create and edit our sample file (insert your own username and password values):
[cj@fc14 ~]$ touch test.py
[cj@fc14 ~]$ chmod +x test.py
[cj@fc14 ~]$ vim test.py
The contents of the sample (test.py) file will look like this:

#!/usr/bin/env python
import pyodbc
from pprint import pprint
cnx = pyodbc.connect("DRIVER={Teradata};DBCNAME=ip_or_fqdn;UID=user;PWD=password")
cursor = cnx.cursor()
cursor.execute('SELECT * FROM dbc.dbcinfo')
rows = cursor.fetchall()
for row in rows:
        pprint(row)

OK, let's try running our sample program:
[cj@fc14 ~]$ ./test.py

Output:
pyodbc.Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib '/opt/teradata/client/13.10/odbc_32/lib/tdata.so

K. Once again in my life I am not going to reveal how long this took to figure out, but I will tell you that tdata.so is present and accessible. The problem being reported is that tdata.so itself has a missing dependency. Let's figure out what is missing and install it:
[cj@fc14 ~]$ ldd /opt/teradata/client/13.10/odbc_32/lib/tdata.so

Output:
libstdc++.so.5 => not found

(Use 'yum provides' to find the package that will install that library)
[cj@fc14 ~]$ sudo yum install compat-libstdc++-33-3.2.3-68.i686
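
When a library has more than one unresolved dependency, picking the "not found" lines out of the ldd output by eye gets tedious. A small sketch that does the filtering (Python 3, function names are my own):

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # The library name is everything before the arrow.
            missing.append(line.split("=>")[0].strip())
    return missing


def ldd_missing(path):
    """Run ldd on a shared object and return its unresolved dependencies."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)
```

Each name it returns can then be passed to 'yum provides' to find the package that supplies it.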

Alright, once more let's go ahead and run our test program to see if we have it working:
[cj@fc14 ~]$ ./test.py

Output:
pyodbc.Error: ('200', '[200] [unixODBC][eaaa[DCTrdt rvr nbet e aao tig (0) (SQLDriverConnectW)')

OK, that's not going to work. That doesn't even look like English; it pretty much looks like random memory being printed out to the screen....
Again, we are going to skip over an hour or 2 of investigation into how this was tracked down. It turns out that Teradata has its own version of the ODBC libraries and requires them in order to function correctly. If you remember, we installed the unixODBC-devel package earlier, and those are the libraries that pyodbc is currently trying to read from. The difference between the expected libraries and the linked libraries is what is causing the problem. To fix this we need to remove the existing links within our /usr/lib directory and create links to our [Teradata]/lib directory:

[cj@fc14 lib]$ sudo rm libodbc.so
[cj@fc14 lib]$ sudo rm libodbc.so.2
[cj@fc14 lib]$ sudo rm libodbcinst.so

[cj@fc14 lib]$ sudo ln -s /opt/teradata/client/13.10/odbc_32/lib/libodbc.so libodbc.so
[cj@fc14 lib]$ sudo ln -s /opt/teradata/client/13.10/odbc_32/lib/libodbc.so libodbc.so.2
[cj@fc14 lib]$ sudo ln -s /opt/teradata/client/13.10/odbc_32/lib/libodbcinst.so libodbcinst.so


Alright, let's run that test program once more:
[cj@fc14 ~]$ ./test.py

Output:
Fatal Python error: Unable to set SQL_ATTR_CONNECTION_POOLING attribute.
Aborted (core dumped)

Well at least we are back in english... :-(
So we need to turn off connection pooling, in pyodbc we can accomplish this by adding the following line:
pyodbc.pooling = False
to the beginning of our test program. test.py now looks like this:

#!/usr/bin/env python
import pyodbc
from pprint import pprint
pyodbc.pooling = False
cnx = pyodbc.connect("DRIVER={Teradata};DBCNAME=ip_or_fqdn;UID=user;PWD=password")
cursor = cnx.cursor()
cursor.execute('SELECT * FROM dbc.dbcinfo')
rows = cursor.fetchall()
for row in rows:
        pprint(row)

Running our test program once more:
[cj@fc14 ~]$ ./test.py

Output:
pyodbc.Error: ('HY000', '[HY000] [DataDirect][ODBC lib] Unicode converter buffer overflow (0) (SQLDriverConnectW)')

Hmm, it doesn't like Unicode either. We can work around this by adding the 'ansi' flag to our pyodbc.connect() call. We need to set 'ansi' to True in this case to force pyodbc to connect using the non-Unicode connection calls.

Our updated test.py file looks like this:

#!/usr/bin/env python
import pyodbc
from pprint import pprint
pyodbc.pooling = False
cnx = pyodbc.connect("DRIVER={Teradata};DBCNAME=ip_or_fqdn;UID=user;PWD=password", ansi=True)
cursor = cnx.cursor()
cursor.execute('SELECT * FROM dbc.dbcinfo')
rows = cursor.fetchall()
for row in rows:
        pprint(row)

OK, let's run our test program once more:
[cj@fc14 ~]$ ./test.py

Output:
pyodbc.Error: ('HY000', '[HY000] [Teradata][ODBC Teradata Driver] Major Status=0x04bd Minor Status=0x20800002-[terasso]Cannot load TDGSS library. (0) (SQLDriverConnect)')

From a previous post where we got the C/C++ samples up and running we know to do the following in order to clear this error:
[cj@fc14 ~]$ sudo /opt/teradata/teragss/redhatlinux-i386/13.10.00.02/bin/run_tdgssconfig

Once that has finished executing we try once more to get our test program working:
[cj@fc14 ~]$ ./test.py

Output:
('RELEASE', '12.00.03.15')
('VERSION', '12.00.03.17d')
('LANGUAGE SUPPORT MODE', 'Standard')
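
To keep the two workarounds (pooling off, ansi on) from being forgotten in the next script, they can be wrapped in a helper. This is just a sketch of how I might do it. Both function names are made up, and the connection string keywords are simply the ones used in test.py above.

```python
def teradata_connection_string(host, user, password, driver="Teradata"):
    """Build the same style of connection string used in test.py."""
    return "DRIVER={%s};DBCNAME=%s;UID=%s;PWD=%s" % (driver, host, user, password)


def connect_teradata(host, user, password):
    """Open a connection with the two workarounds from this post applied."""
    # Imported here so the string helper above is usable without pyodbc installed.
    import pyodbc

    # Pooling is switched off before the connection is opened, as in test.py.
    pyodbc.pooling = False
    # ansi=True asks pyodbc to use the non-Unicode connection call.
    return pyodbc.connect(teradata_connection_string(host, user, password), ansi=True)
```

With this in place the body of test.py shrinks to a call to connect_teradata("ip_or_fqdn", "user", "password") followed by the cursor code.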

Fin:

Well there we have it, we have verified that we can at least execute SELECT statements against our Teradata database using Python and the pyodbc package.
Please feel free to leave a comment if you run into any issues not presented in this post. Assuming I have the time, I am usually more than happy to help figure something out.

Wednesday, March 2, 2011

Teradata Client - Fedora Core 14 Build Example C++

Background:
The motivation of this project has to do with a desire of a colleague of mine to do development work on an installation of Fedora Core 14. The headache of this process has been left out. This is the abbreviated version that will show the process involved in getting a connection up and going as fast as possible.

Keep in mind that FC is not at this time a supported distro for Teradata!

We are starting with a Fedora Core 14 32bit guest machine running in VMWare Workstation. Host is Windows XP SP3. All libraries required to get VMWare tools installed and running have been pre-installed so there may be some dependencies already in place that are not listed. The Fedora core instance is fully updated as of today. SELinux has also already been disabled.

It's worth mentioning that we are listening to many Jay-Z albums while writing / figuring this out.

Lets Go! - Download Driver Package

Alright, first things first: let's start out by downloading the Teradata drivers. Go to:
www.teradata.com/downloadcenter/
Follow ODBC -> Linux and download the correct package (TTU 13.10 LINUX-INDEP tdodbc.13.10.00.01 in my case)

Keep going until you download the .tar.gz package (tdodbc__LINUX_INDEP.13.10.00.01-1.tar.gz)

Make sure to read the accompanying README file before proceeding. It will detail any dependencies that *may* need to be installed on your OS. Since FC is not supported we cannot rely on this documentation to be all-inclusive, but it can provide great hints.

Once you have downloaded the drivers, move into the directory holding the downloaded file. Create a directory to hold expanded files and then untar the archive:


[cj@fc14 Downloads]$ mkdir /tmp/td
[cj@fc14 Downloads]$ mv tdodbc__LINUX_INDEP.13.10.00.01-1.tar.gz /tmp/td/
[cj@fc14 Downloads]$ cd /tmp/td
[cj@fc14 td]$ tar xof tdodbc__LINUX_INDEP.13.10.00.01-1.tar.gz

You should now have 3 component archives and readme files in your working directory.
tdodbc*
tdicu*
TeraGSS*

TeraGSS has redhat and suse versions.

Take a moment out to remember that FC is a Redhat OS and expand each archive:

[cj@fc14 td]$ tar xof tdicu__linux_indep.13.10.00.00-1.tar.gz 
[cj@fc14 td]$ tar xof TeraGSS_redhatlinux-i386__linux_i386.13.10.00.02-1.tar.gz
[cj@fc14 td]$ tar xof tdodbc__linux_x64.13.10.00.01.tar.gz

Installing Teradata Driver Packages:

Now install the tdicu package:

[cj@fc14 td]$ cd tdicu
[cj@fc14 tdicu]$ sudo rpm -ihv tdicu-13.10.00.00-1.noarch.rpm 

Output:
Adding TD_ICU_DATA environment variable to /etc/profile file.
Adding TD_ICU_DATA environment variable to /etc/csh.login file.

Since the package updated /etc/profile, let's first load the profile changes into our shell (just in case):
[cj@fc14 tdicu]$ source /etc/profile

Next let's install TeraGSS:
[cj@fc14 tdicu]$ cd ../TeraGSS/
[cj@fc14 TeraGSS]$ sudo rpm -ihv TeraGSS_redhatlinux-i386-13.10.00.02-1.i386.rpm

Output:
Preparing...                ########################################### [100%]
   1:TeraGSS_redhatlinux-i38########################################### [100%]
/usr/teragss/redhatlinux-i386/13.10.00.02/bin/tdgssconfig: error while loading shared libraries: libstdc++-libc6.2-2.so.3: cannot open shared object file: No such file or directory

Alright, so we are missing a dependency. Let's use yum to find and install it:
[cj@fc14 TeraGSS]$ sudo yum provides libstdc++-libc6.2-2.so.3

Output:
compat-libstdc++-296-2.96-143.i686 : Compatibility 2.96-RH standard C++
                                   : libraries
Repo        : fedora
Matched from:
Other       : libstdc++-libc6.2-2.so.

[cj@fc14 TeraGSS]$ sudo yum install compat-libstdc++-296-2.96-143.i686

Finally let's install the tdodbc package:
[cj@fc14 td]$ cd TeraGSS
[cj@fc14 TeraGSS]$ cd ../tdodbc
[cj@fc14 tdodbc]$ sudo rpm -ihv tdodbc-13.10.00.01-1.noarch.rpm

Output:
/var/tmp/rpm-tmp.9paHwf: /opt/teradata/client/13.10/odbc_32/bin/set_default_version: /usr/bin/ksh: bad interpreter: No such file or directory

Hmmm... it's looking for the Korn shell. Let's install that package from yum:
[cj@fc14 tdodbc]$ sudo yum install ksh

Really quick, let's see where FC has put ksh:
[cj@fc14 tdodbc]$ which ksh

Output:
/bin/ksh

That is not going to work for us because the package is looking for ksh at /usr/bin/ksh. Let's go ahead and create a link so that the Teradata installer can work:
[cj@fc14 tdodbc]$ sudo ln -s /bin/ksh /usr/bin/ksh

Now let's uninstall the tdodbc package and reinstall it to make sure we don't run into any more problems:
[cj@fc14 tdodbc]$ sudo rpm -e tdodbc
[cj@fc14 tdodbc]$ sudo rpm -ihv tdodbc-13.10.00.01-1.noarch.rpm

Perfect!

Building Example C++ Program:

Now, let's skip a bunch of explanation and get to the point.
The ODBC driver does not work in FC (again, you can read more if you would like once I post the blog). But we can reach the Teradata database using the C++ example provided by the good folks at Teradata. Let's go to the samples directory provided by the ODBC package and look at the C++ example:
[cj@fc14 tdodbc]$ cd /opt/teradata/client/13.10/odbc_32/samples/C++/

A makefile is provided, so let's build the example using make:
[cj@fc14 C++]$ sudo make

Output:
/usr/bin/g++   -m32 -Wno-deprecated  -DLINUX -DVG_UNIX  -DODBCVER=0x0350    -I/opt/teradata/client/13.10/odbc_32/include -c -o ./adhoc.o adhoc.cpp
make: /usr/bin/g++: Command not found

K, we need to install the gnu c++ compiler:
[cj@fc14 C++]$ sudo yum install gcc-c++

One more time, let's try running make:
[cj@fc14 C++]$ sudo make

Output:
g++: /usr/lib/libstdc++.so.5: No such file or directory

Using the method above again we find out that compat-libstdc++-33-3.2.3-68.i686 provides the libstdc++.so.5 library required:
[cj@fc14 C++]$ sudo yum install compat-libstdc++-33-3.2.3-68.i686

One more time on that make:
[cj@fc14 C++]$ sudo make

All looks good: the 'adhoc' executable has been built. Let's go ahead and run it and see if we can connect (fill in all required fields when prompted):

Output:
{error} STATE=HY000, CODE=0, MSG=[Teradata][ODBC Teradata Driver] Major Status=0x04bd Minor Status=0x20800002-[terasso]Cannot load TDGSS library.

Hmmm.... Let's take a look at what this executable is trying to access, since 'TDGSS library' is not inspiring my mind to a solution.

We are going to need to try and get a better look at what the adhoc program is trying to access. The easiest way to do this is to install strace and watch program accesses:

[cj@fc14 C++]$ sudo yum install strace
Once the install is finished re-run the adhoc program with strace:
[cj@fc14 C++]$ strace ./adhoc 2>&1 | less

Output:
stat64("/usr/teragss/redhatlinux-i386/client/etc/tdgssconfig.bin", 0xbf77613c) = -1 ENOENT (No such file or directory)

Ahh, OK so adhoc is looking for the presence of a file called tdgssconfig.bin that does not exist on the OS. I'm not really going to go into detail regarding how long this took me to figure out but the solution is to do the following:
[cj@fc14 C++]$ sudo /opt/teradata/teragss/redhatlinux-i386/13.10.00.02/bin/run_tdgssconfig

Output:
 Output has been written to Binary file "/opt/teradata/teragss/redhatlinux-i386/13.10.00.02/bin/../etc/tdgssconfig.bin"
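
Paging through strace output in less works, but most of it is noise. Here is a rough Python 3 filter that keeps only the paths a program tried and failed to find (ENOENT). The function name is mine, and the pattern assumes the common strace line shape syscall("path", ...) = -1 ENOENT, so treat it as a starting point.

```python
import re

# Matches lines such as: stat64("/some/path", 0x...) = -1 ENOENT (No such file ...)
_ENOENT_LINE = re.compile(r'\w+\("([^"]+)".*=\s*-1 ENOENT')


def missing_paths(strace_output):
    """Return the distinct paths that failed with ENOENT, in first-seen order."""
    seen = []
    for line in strace_output.splitlines():
        match = _ENOENT_LINE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Save the trace to a file with something like 'strace ./adhoc 2> trace.txt', read it in, and pass the text to missing_paths. Many of the reported paths are harmless library search misses, so look for the ones under the vendor's own directories.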

Awesome! Now let's go ahead and run the adhoc program again:
[cj@fc14 C++]$ ./adhoc

Output:
...ODBC connection successful.
ODBC version        = -03.52.0000-
DBMS name           = -Teradata-
DBMS version        = -12.00.0317  12.00.03.17D-
Driver name         = -tdata.so-
Driver version      = -13.10.00.01-
Driver ODBC version = -03.51-
Enter SQL string:
 
FIN:

Congratulations to us, this is awesome: we now have an executable on an FC 14 machine that is capable of talking with our Teradata SQL server.

A big thank you to my girlfriend for giving me the time to share this garbage this evening, and a thank you to Kanye West for offering me the Jam for the whole process.

Monday, February 28, 2011

Installing MySQL ODBC Drivers on Fedora Core 14 (32 bit) Python

Background:
We are working with a virtual machine (VMWare Workstation) running Fedora Core 14 as a Guest Operating System (Windows XP SP Whatever is host).
VMWare tools has already been installed to make my life less tedious, but you should keep in mind that there are possible requirements covered by that installation process that won't show up here.

MySQL:

First let's install an instance of MySQL and get it running:
( Always assume that I'm allowing yum to install all dependencies).
> sudo yum install mysql mysql-server

Now that we have MySQL installed, let's start the server:
> sudo /etc/init.d/mysqld start

Now if you were paying attention you should have caught the following snippet coming out to terminal:

PLEASE REMEMBER TO SET A PASSWORD FOR THE MySQL root USER !
To do so, start the server, then issue the following commands:
/usr/bin/mysqladmin -u root password 'new-password'

So we need to do that next:
> sudo mysqladmin -u root password 'testing'

Now log into mysql:
> mysql -u root -p


Let's create a test database with a test table and some testing data:
mysql> CREATE DATABASE testing;
mysql> USE testing;
mysql> CREATE TABLE IF NOT EXISTS testing(c1 int, c2 text);
mysql> INSERT INTO testing VALUES(1, 'monkey soup');
mysql> \q

PyODBC:
Next we need to install the pyodbc libraries for Python. Check out the pyodbc site: http://code.google.com/p/pyodbc/
Looking through the documentation we can see that we will need to pre-install the gcc compiler and the unixODBC-devel package. Let's get that done.
> sudo yum install gcc unixODBC-devel

RightO! We have that out of the way, so now let's get the source (pyodbc-2.1.8.zip in my case) and try to build/install pyodbc:


> unzip pyodbc-[version].zip
> cd pyodbc-[version]
> sudo python setup.py build


Crap:
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
error: command 'gcc' failed with exit status 1

No sweat, googling the error, we need gcc for c++, simple enough:
> sudo yum install gcc-c++

Try this ish again!
> sudo python setup.py build

DAMNIT!:

fatal error: Python.h: No such file or directory

Easy enough, install python-devel (just from experience, just do it):
> sudo yum install python-devel

One more time baby!:
> sudo python setup.py build

Nice, one weird thing down in a surely painful journey... Now install:
> sudo python setup.py install
Nice!

Python, ODBC, PyODBC:
Alright, now we need to create a test script so that we can make sure everything is working as expected.
> touch test.py
> chmod +x test.py
> vim test.py

test.py is going to look like this:

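#!/usr/bin/env python
import pyodbc
from pprint import pprint

# Connect through the MySQL ODBC driver using the root account set up above
cnxn = pyodbc.connect("DRIVER={MySQL};SERVER=127.0.0.1;DATABASE=testing;UID=root;PWD=testing")
# The cursor comes from the connection object, not from the pyodbc module
cursor = cnxn.cursor()
cursor.execute("SELECT * FROM testing")
rows = cursor.fetchall()
for r in rows:
        pprint(r)
cursor.execute("INSERT INTO testing VALUES(2,'cheddar cheese')")
# pyodbc does not autocommit by default, so commit to persist the INSERT
cnxn.commit()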
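
One thing worth knowing about writes through pyodbc: connections start with autocommit off by default, so an INSERT only sticks once commit() is called on the connection. Here is a small sketch of an insert helper that also uses '?' placeholders instead of pasting values into the SQL. The function name is made up, and the table is the testing table created earlier.

```python
def insert_row(cnxn, c1, c2):
    """Insert one row into the testing table and commit it."""
    cursor = cnxn.cursor()
    # '?' placeholders let the driver handle quoting of the values.
    cursor.execute("INSERT INTO testing VALUES (?, ?)", (c1, c2))
    # Without this commit the row is discarded when the connection closes.
    cnxn.commit()
```

In the script above this would be called as insert_row(cnxn, 2, 'cheddar cheese') after the SELECT loop.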



Alright, let's go!
> ./test.py

Of course it would be too much to ask for that to work. Let's see what we have:

pyodbc.Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib '/usr/lib/libmyodbc5.so' : file not found (0) (SQLDriverConnectW)")

This is gonna take a second ... ... ... (30 minutes-ish later) ...
K, so we need to install the mysql-connector-odbc package
> sudo yum install mysql-connector-odbc

So let's run test.py again ...
> ./test.py

There we go, everything works for MySQL. This is good: we have verified that, in some sense, ODBC is up and running on our system and that we can connect to a MySQL database.