Talend is an open source data integration platform widely used in today’s world of big data. It helps data scientists turn large amounts of raw data into business insights with little effort, and its freely available tools help automate tasks to speed up their processes. Talend is a cross-platform application available for Linux, Windows, Mac and Solaris. It comes in an Enterprise edition and a Community edition, backed by a strong community of users.
In this article, we will show you how to install and use it on a Linux operating system, in our case CentOS 7 with a GUI.
Prerequisites:
Before starting, make sure you have a Linux system running CentOS 7 Desktop, with the latest updates installed and the system security-hardened.
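If you want to confirm the first prerequisite from a script, the sketch below reads /etc/os-release, which on CentOS 7 sets ID="centos" and VERSION_ID="7". The function name `is_centos7` is hypothetical, and its optional file argument exists only so the check can be tried against a sample file instead of the real one.

```shell
# Hypothetical check (not part of Talend): succeed only on CentOS 7.
# The optional argument lets you point it at another os-release file.
is_centos7() {
  local file="${1:-/etc/os-release}"
  [ -r "$file" ] || return 1
  # Source the file in a subshell so ID and VERSION_ID do not leak into the caller
  ( . "$file" && [ "${ID:-}" = "centos" ] && [ "${VERSION_ID:-}" = "7" ] )
}

# Example:
# is_centos7 && echo "CentOS 7 detected" || echo "This is not CentOS 7"
```

Sourcing the file in a subshell keeps the check side-effect free, so it can be dropped into a larger setup script.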
To install the system updates, run the command below as a user with ‘sudo’ rights.
$ sudo yum update -y
Once the updates are complete, we can move to the next step to get the Talend package.
Download Talend Package:
To get the latest Talend Open Studio package, go to https://www.talend.com/free-trial/ and register by providing some information about yourself. Then click to get the free trial.
You will receive the Talend download link at the email address you provided during registration.
Click the link (or simply click Access Software) to download the Talend package. Copy it to your system and extract the archive using the ‘unzip’ command as below.
$ cd /TOS
$ unzip TOS_DI-20200219_1130-V7.3.1.zip
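The extraction step can also be scripted. The sketch below is a hypothetical helper, `extract_talend`, that first tests the archive with `unzip -tq`, then unpacks it into a target directory (default /TOS, as above; creating /TOS itself may require sudo) and prints the folder it expects the archive to create. That last part assumes, as in this article, that the archive unpacks into a folder named after the .zip file.

```shell
# Hypothetical helper (not part of Talend): test and unpack the Talend archive.
# Usage: extract_talend ARCHIVE [DEST]   (DEST defaults to /TOS)
extract_talend() {
  local archive="$1" dest="${2:-/TOS}"
  # Verify the archive's integrity before unpacking it
  unzip -tq "$archive" >/dev/null || { echo "Corrupt or missing archive: $archive" >&2; return 1; }
  mkdir -p "$dest" || return 1
  # -o overwrites existing files without prompting, -d sets the target folder
  unzip -q -o "$archive" -d "$dest" || return 1
  # Print the folder the archive is expected to create,
  # e.g. /TOS/TOS_DI-20200219_1130-V7.3.1
  printf '%s/%s\n' "$dest" "$(basename "$archive" .zip)"
}
```

For example, `extract_talend TOS_DI-20200219_1130-V7.3.1.zip` would unpack into /TOS and print the folder to `cd` into in the next step.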
Configure Talend JVM Parameters:
To use Talend, you need Java installed on your system. If it is not already installed, you can install it with the ‘yum’ command.
$ sudo yum install java -y
If Java is already installed, you can check its version with the command below.
$ java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)
In our case, the installed version is 1.8.0_292.
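Note that `java -version` writes to standard error rather than standard output, so a script has to redirect it before parsing it. The hypothetical `java_major_version` function below turns both the legacy `1.8.0_292` style and the newer `11.0.11` style into a major version number, and the equally hypothetical `check_java` compares it against the 1.8 minimum that the Talend .ini file requests through `-Dosgi.requiredJavaVersion=1.8`.

```shell
# Hypothetical helpers (not part of Talend or the JDK).
# Print the major Java version, e.g. 8 for "1.8.0_292" or 11 for "11.0.11".
# With no argument, parse the output of `java -version`, which goes to stderr.
java_major_version() {
  local output version
  output="${1:-$(java -version 2>&1)}"
  # Take the quoted string after "version", e.g. 1.8.0_292
  version=$(printf '%s\n' "$output" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  # Java 8 and earlier use the legacy "1.x" numbering
  case "$version" in
    1.*) version="${version#1.}" ;;
  esac
  # Keep everything before the first '.', '_' or '-'
  printf '%s\n' "${version%%[._-]*}"
}

# Succeed if the installed (or given) Java meets the 1.8 minimum.
check_java() {
  local major
  major=$(java_major_version "$@")
  case "$major" in
    ''|*[!0-9]*) echo "Could not determine the Java version" >&2; return 1 ;;
  esac
  if [ "$major" -lt 8 ]; then
    echo "Java $major found; the .ini file requires at least 1.8" >&2
    return 1
  fi
  echo "Java $major found"
}
```

Running `check_java` with no argument inspects the `java` on your PATH; passing a captured version string, as in `check_java "$(java -version 2>&1)"`, gives the same result.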
Next, switch to the extracted Talend folder and adjust the JVM -Xms and -Xmx values as required in its Linux .ini file.
$ cd TOS_DI-20200219_1130-V7.3.1
$ vim TOS_DI-linux-gtk-x86_64.ini

-vmargs
-Xms512m
-Xmx4096m
-Dfile.encoding=UTF-8
-Dosgi.requiredJavaVersion=1.8
-XX:+UseG1GC
-XX:+UseStringDeduplication
-XX:MaxMetaspaceSize=1024m
Here we have set Xmx to 4 GB (4096m) and MaxMetaspaceSize to 1 GB (1024m). Save and close the file, then start the Talend application as described in the next step.
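If you prefer not to edit the file in vim, for example when preparing several machines, the same change can be scripted. The sketch below uses a hypothetical `set_jvm_option` function built on GNU `sed -i`: it rewrites any line that starts with the given option prefix, or appends the option if it is missing. Appending at the end assumes the layout shown above, where -vmargs comes first so that every later line is passed to the JVM.

```shell
# Hypothetical helper (not a Talend tool): set a JVM option in the Talend .ini file.
# Usage: set_jvm_option FILE PREFIX VALUE
set_jvm_option() {
  local ini="$1" prefix="$2" value="$3"
  if grep -q -- "^${prefix}" "$ini"; then
    # Replace every line starting with the prefix (GNU sed in-place edit)
    sed -i "s|^${prefix}.*|${prefix}${value}|" "$ini"
  else
    # Option missing: append it after the existing -vmargs entries
    printf '%s%s\n' "$prefix" "$value" >> "$ini"
  fi
}

# Example, run inside the extracted Talend folder:
# set_jvm_option TOS_DI-linux-gtk-x86_64.ini -Xmx 4096m
# set_jvm_option TOS_DI-linux-gtk-x86_64.ini -XX:MaxMetaspaceSize= 1024m
```

The prefix is used as a basic regular expression, which is fine for options such as -Xmx or -XX:MaxMetaspaceSize= but would need escaping for prefixes containing characters like `|` or `&`.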
Starting Talend Application:
To start the Talend application, execute its shell script from the extracted folder, and its initial setup will start running as shown below.
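The launcher script is not named above. The Linux build of TOS 7.x is expected to contain a script matching TOS_DI-linux-*.sh (for example TOS_DI-linux-gtk-x86_64.sh) next to the .ini file, but treat that name as an assumption and check your extracted folder. The hypothetical helpers below locate such a script, make it executable and start it from its own directory.

```shell
# Hypothetical helpers; the launcher name pattern is an assumption.
# Print the path of the first TOS_DI-linux-*.sh script in the given folder.
find_talend_launcher() {
  local dir="${1:-.}" f
  for f in "$dir"/TOS_DI-linux-*.sh; do
    # Without nullglob, an unmatched pattern stays literal, so test the file
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}

# Make the launcher executable and start it from its own folder.
launch_talend() {
  local launcher
  launcher=$(find_talend_launcher "$1") || {
    echo "No TOS_DI-linux-*.sh launcher found in ${1:-.}" >&2
    return 1
  }
  chmod +x "$launcher"
  (cd "$(dirname "$launcher")" && "./$(basename "$launcher")")
}

# Example:
# launch_talend /TOS/TOS_DI-20200219_1130-V7.3.1
```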
Accept the license agreement to move forward and create a new Talend project, as shown below.
Once you click the Finish button, Talend sets up its libraries and VM in the background and initializes a fresh workspace for you to start using it.
If you are new to Talend Open Studio, take its quick tour to learn how to use it.
Using Talend Open Studio for Data Integration:
Now that the Talend application is configured and running to make your life as a data scientist easier, you can create and load your own custom jobs using its Job Design tab, as shown below.
Conclusion:
By now, you should be familiar with installing and configuring Talend Open Studio for Data Integration. We used TOS 7.3.1, the latest version available at the time of writing, which is widely used by data scientists and data integration engineers. It makes it easy to access and manage large amounts of data and to store it in an organized way. I hope you found this article helpful, thank you.