top of page

Pivoting With Knime



Sometimes the data we have is not in the format we want and we may need to do an ETL study. For example, as in the data below. In this data, monthly Gross Rent and Turnover values are moving sideways contrary to what they should be. However, in order to use the data efficiently, there must be a separate month column and the Gross Rent and Turnover values must flow down the page.


We use Knime to organize our sample data. Knime is a free, open source data analytics, reporting and integration platform. You can easily download and install it from the internet.

When we open Knime, we will first see a blank screen. You can open a new worksheet by clicking the Blank Screen icon under File at the top left of this screen. We choose New KNIME Workflow.



On the screen that opens, we name our project and specify where we want to position it.


On the left side there is a window consisting of 3 parts. The part I indicated with the blue arrow is where your saved works are located. From here you can easily open your current work by selecting it. The part I indicated with the purple arrow is the list of the most used nodes, the part I indicated with the orange arrow is where all the nodes are located. We will use the part indicated by the orange arrow the most.



Since our data is in Excel, we find the Excel Reader node in the Node Repository and take it to our workspace by double-clicking or drag-and-drop. If you know the name of the node you will use, you can quickly find it by typing its name in the search field, if you do not know, you can search for your node under the relevant category. In addition; When you click on any node, a description box about the node you selected appears on the right side of the page.

We enter our node by right-clicking on our node and saying Configure.



We add our file by saying browse to the "Select file to read" field at the top of the screen that opens. After adding, we can see the contents of our file under Preview at the bottom. But as you can see, it did not detect our column names and assigned names as Col0, Col1 instead. To fix this, we tick the Table contains column names in row number box just above. In our sample report, I leave it as 1 because the column names are on the 1st line.

By the way, if your excel consists of more than one sheet, you can select your relevant sheet from the Select the sheet to read field. You can also specify how much of your data you want to read from the Select the columns and rows to read field..



When we complete our process by saying "Ok", the red circle in our node turns yellow. This shows us that we did everything right and the node is ready to run. Right click, Execute or select our node and click the green execute button at the top to run our node. The yellow circle turns green when the process is completed without any problems. Now we can see the data we have obtained by right-clicking, clicking Output.




Additional note: You may receive a warning when you click OK after selecting the Table contains column names in row number box. When this happens, all you have to do is refresh the table below by saying Reload.

We continue our process by correcting the sideways month data. We are searching for pivot in Node Repository. We choose Unpivoting from the resulting nodes. When we double click it, it will connect directly to our previous node, Excel Reader.



Every time we add a new node, our operations will proceed as Configure, Execute, then Output.

We make our arrangement by saying Configure again. First, we'll edit the Gross Rent column. In the Value columns section at the top, we select the Gross Rent columns to be unpivot. While doing this, instead of selecting one by one, I searched for the word "gross" and selected all columns containing the word I was looking for by saying Select all search hits. I add them to the Include section by saying Add. In the Retained section below, we select our remaining columns.



When we run it, the data we get is as follows. As you can see the Gross Lease is no longer positioned sideways but downwards. Now we will repeat the same process for Turnover. We repeat the above operations with a new Unpivoting node.




We edited Gross Rent and Turnover, but now we need to combine it. To do this, we need a common date column. We will separate our date value with String Manipulation. I use the substr function because the first 6 characters specify the date. Here, we say get 6 characters starting from the 0th character in the relevant column.


In the Append Column section, we set the name of our new column as Date..



We now have a separate date column. Before moving on to the merge process, we name our columns. The Column Rename node will help us with this.

Column Values indicates our Gross Rent values. We're changing its name to Gross Rent. Since our values are numeric, I leave them as IntValue, but you can change the data type in this section according to your data and choose other numeric values such as String or double, long.






After this point, we can combine our tables. We select the Joiner node. Joiner node works with sql logic. We are connecting the tables that we will join to the two existing entries.

You can choose which join type to use in the Join mode section. In this example I am using Inner Join. Now you need to specify how you will do the join. For this, we select the columns that we will match by saying Add Row. We aggregate rows with a common date and store name.



The result of the join is as follows.




We have apparently accomplished what we wanted to do, but we need to keep editing. Now we will convert Date value from string to date. But first we need to make our value conform to date format. Again we add String Manipulation.

Currently we have month and year value but no day value. For this, we added "01" to the beginning of each date value with the join function.



Now it's time to convert our string to date. We make the following adjustments to the String to Date&Time node. The important point is to fill in the Date format part correctly.



After getting a proper date format, I want to remove the m2 suffix in the Square meter column. Because when you import your data to a reporting tool such as Tableau, it will want to see numeric values as integers. But in this way, our square meter value is in string form.

I'm getting String Manipulation on the screen again. I use the removeChars function to remove the "m2" suffix from my related column.


Our result is as follows.



As you can see, there are too many unnecessary columns. We add the Column Filter node to clear them. We collect the columns we want to remain on the right, and the ones we do not want on the left.



Finally we got the painting we wanted.


Now let's write our table back to Excel. If we wish, we can also write our table to a database.

We add the Excel Writer node. At the top, in the Output location section, we specify where and with what name we will save our file. In the Sheet name section, we name our sheet. The important point here is not to forget to tick the add column headers box. If we are going to overwrite an existing file, we must tick the Overwrite existing file box.



Our Workflow has been completed as follows.


 
 
 
kara_beyaz_v2-arkaplansız.png

As Kara Consultancy, we provide Business Intelligence, Big Data, Data Warehouse solutions in fast, easy and scalable way with our products and services together with more than 25 years experience.

 

Kara Danışmanlık olarak müşterilerimizin iş zekası, veri tabanı teknolojileri, büyük veri ve veri ambarı ihtiyaçlarına yeni nesil, hızlı, kolay ve ölçeklenebilir çözümler sunuyoruz.

Teknolojiler

Exasol

MongoDB

Tableau

İletişim

Cevizli Mah. Zuhal Cad.  Ritim İstanbul No.44/B Blok Kat:16 D.163 34846 Maltepe/İstanbul 

Telefon: +90 (542) 717 35 57

Mail: survey@karadanismanlik.com.tr

Pazartesi - Cuma: 09.00 - 18.00 Cumartesi - Pazar: Tatil

Monday - Friday: 09.00 - 18.00 Saturday - Sunday: Closed

  • Youtube
  • Twitter
  • Instagram
  • Linkedin
bottom of page