Weka Tutorial 01: ARFF 101 (Data Preprocessing)

Weka Machine Learning Tutorial on how to prepare an arff file
javad bolboli (8 days ago)
Hi. Thank you for sharing this valuable information. Would you tell me how I can prepare my data that is in xlsx format? I know I can change it to CSV format, but I don't know how to define which of my columns is my class and which rows are my data.
Ahmed Hamdy (1 month ago)
Can you arrange these tutorials in numerical order to make them easier to watch?
Nethra Raj (2 months ago)
Hi sir, I'm a beginner and this video is really helpful for me.
Jesse Wang (11 months ago)
Thank you for these videos!
Cody Curnutte (11 months ago)
life saver. Thanks for the quick and easy walk through of how to prep data for weka.
Random-access Memory (1 year ago)
I really, really need these tutorials. Thanks.
How do I convert a CSV file to an ARFF file? How can we use the string data type in an ARFF file?
sabrina hammadi (11 months ago)
Please, in which version of Weka can I find the CHAID algorithm?
Maha Shawky (1 year ago)
I have the data in a Word file, in paragraphs. How can I get the attributes that appear in it? I don't think that creating an ARFF file will give me the attributes. I think there is a way to extract them, and I'm stuck at this step. Any suggestions?
Maha Shawky (1 year ago)
My question is about the very first step of loading files into Weka. In all the tutorials I see people uploading files that contain numbers plus the text, like this ( 0,"1467814119","Mon Apr 06 22:20:40 PDT 2009","NO_QUERY","cooliodoc","@angry_barista I baked you a cake but I ated it ") . I think they do a step to get this format before they start with Weka, but I do not know how they do it. I guess this is a tweet, but what is this serial number 0,"1467814119"? How do they get this labeling?
Rafal Jacyna (1 year ago)
I've never done this before, but maybe something here will help you: https://www.quora.com/How-can-I-do-sentiment-analysis-on-weka
Maha Shawky (1 year ago)
For sentiment analysis
Rafal Jacyna (1 year ago)
so how do you want to use Weka with your news article?
Ali Athar (1 year ago)
Hi sir, how do I increase the heap size in Weka so that we can import any large file into it?
deshanunit (2 years ago)
Hi, which of your tutorials talks about decision trees using Weka? Thanks a lot.
Francisco Rabanne (2 years ago)
Hello, how can I do a practice exercise to check this knowledge, or try a new instance: OUTLOOK = sunny, TEMP = cool, HUMIDITY = high, WINDY = true, CLASS = ? Thank you!
Seif Juventino (2 years ago)
How do I convert the MNIST dataset to ARFF? :( I need help, please (it's in idx format).
Beldi Moss (2 years ago)
+Seif Juventino You see, don't you, haha
opafem (2 years ago)
How do I convert a CSV file to an ARFF file?
Michel Shumakar (2 years ago)
Sir, I have a large number of features in a text file. How can I make an ARFF file for that? Do I have to add classes?
Darshan Koundinya (3 years ago)
Hello sir, can you please explain how to get results for the training set using Knowledge Flow?
Aishwarya Kannan (3 years ago)
How do I create an ARFF dataset for SQL injection? Can anyone provide me with a sample?
Andres Alvarez (3 years ago)
Hello thank you so much for your video. I would like to know how to convert a .pcap file of large network traffic data to .ARFF to use in Weka? I have heard of a tool called Netmate but I can't seem to make it work, it is so frustrating. Do you know of any other tool/method?
Olexandr (3 years ago)
Where can I download this file?
Rushdi Shams (3 years ago)
+Олександр Лихенко In the weka/data/ directory there are some default files that you can play with.
Nivedha Padmanabhan (3 years ago)
How do I convert an Excel file to an .arff file?
Rushdi Shams (3 years ago)
+Nivedha Padmanabhan This is a very general question and there is no explicit answer. This is mostly due to the variation in performances of the algorithms for different classification tasks and feature set. 
Nivedha Padmanabhan (3 years ago)
Yes, I went through it and got it, thank you so much. Given, say, 1000 samples of data, I want to know how to choose the correct classification algorithm that will predict the result accurately.
Rushdi Shams (3 years ago)
Hi. This might come in handy: http://sourceforge.net/projects/exceltoarffconv/
fatihah aziz (4 years ago)
Hi, for the @data part, do I need to put in all the data that I need to run? What if I had 1000 rows of data? Do I need to put in all of them?
Rushdi Shams (4 years ago)
+fatihah aziz Hi. It depends on how you are going to experiment with and use your data. If you want to do cross-validation, then you need to put all your data in the @data section and then choose the cross-validation option in the GUI or in the programming parameters. If you want to train and test, then you should separate your training and test data, for example by putting 600 instances in training.arff and 400 in test.arff.
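The train/test option described in this reply can be sketched in Python. This is only an illustration, not the video author's method: the function name `split_arff` and the optional `seed` parameter are made up for the example, and it assumes a dense ARFF file with one instance per line after `@data`. Both output texts keep the full header so that Weka sees the same attributes in each file.

```python
import random


def split_arff(arff_text, n_train, seed=None):
    """Split ARFF text into (train_text, test_text), repeating the header in both."""
    lines = arff_text.splitlines()
    # ARFF keywords are case-insensitive, so compare in lower case.
    data_idx = next(i for i, line in enumerate(lines)
                    if line.strip().lower() == "@data")
    header = lines[:data_idx + 1]
    # Keep only instance rows: skip blank lines and % comments.
    rows = [line for line in lines[data_idx + 1:]
            if line.strip() and not line.lstrip().startswith("%")]
    if seed is not None:
        # Optional shuffle, useful when the file is sorted by class.
        random.Random(seed).shuffle(rows)
    train = "\n".join(header + rows[:n_train]) + "\n"
    test = "\n".join(header + rows[n_train:]) + "\n"
    return train, test
```

With 1000 instances, `split_arff(text, 600)` would give the 600/400 split mentioned above; the two returned strings can then be written to training.arff and test.arff.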
Suneetha Uppu (4 years ago)
Hi, can you please tell me how to convert a .txt file to an .arff file? Thank you, Suneetha
Pasan Fernando (1 year ago)
Write some code to do that.
Alfonso Vergara (2 years ago)
How do I convert a txt file with 10 million rows?
Rushdi Shams (4 years ago)
+Suneetha Uppu You just need to create a txt file in this format and then change the extension to arff.
Ramanuja Rao (4 years ago)
+Rushdi Shams Hi, could you tell me how to handle dates stored as strings in CSV format? The strings look like 3/3/2014.
Rushdi Shams (4 years ago)
It is a date format and should be treated as date.
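One way to follow this advice, sketched in Python under some assumptions: the ARFF format has a `date` attribute type that takes an optional Java-style date pattern, so the CSV values can be rewritten in one unambiguous form and declared with a matching pattern. The helper name `to_iso_date` and the attribute name `visit_date` are hypothetical, and the sketch assumes that 3/3/2014 means month/day/year, which the comment does not say.

```python
from datetime import datetime


def to_iso_date(value, source_format="%m/%d/%Y"):
    """Rewrite a date string such as 3/3/2014 as 2014-03-03."""
    # Assumption: the source is month/day/year; pass "%d/%m/%Y" for day/month/year.
    return datetime.strptime(value, source_format).strftime("%Y-%m-%d")


# Matching ARFF header line; the quoted pattern describes the rewritten values.
DATE_ATTRIBUTE = '@attribute visit_date date "yyyy-MM-dd"'
```

The rewritten values then go in the @data section in the column that corresponds to the date attribute.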
Toàn Nguyễn (4 years ago)
Hi. I have a problem. I am using the data at archive.ics.uci.edu/ml/datasets/Artificial+Characters, but I can't open the data with Weka and don't know how to use the learning data file (in character.tar.Z). How can I convert it to Weka data? Please help me.
Rushdi Shams (4 years ago)
You need to use some programming language to read the web page content, extract the features you desire and format the data into arff file.
Rushdi Shams (4 years ago)
Just format it the way shown in the tutorial using a text editor such as Notepad, and then rename the file from .txt to .arff. It should work if you do everything correctly.
Selva Perumal (4 years ago)
How do I convert a text file into an ARFF file, please?
Selva Perumal (4 years ago)
Hi, could you please help me with my project? I want to classify web pages. How do I convert web page content to the ARFF file format?
areeg abu-zaid (5 years ago)
Last time I had a problem with the ARFF format, but now it's OK. Thanks a lot :)
Rushdi Shams (5 years ago)
Either way, you have to figure out how to represent this matrix data in ARFF, perhaps by using some programming language.
Rushdi Shams (5 years ago)
In which way can I help you?
areeg abu-zaid (5 years ago)
Hi, can you help me?
Ajay Verma (5 years ago)
Where did you write this code? I mean, in Notepad or somewhere else?
dewi sri rahma (5 years ago)
Please help me use Weka for a polemical analysis of divorce in society using the k-means clustering method. I am confused about how to start. Thanks for the help.
Brahim laarif (5 years ago)
Very helpful videos! But I have a question. How can I represent an array attribute? I have, for example, a matrix like this: [{70, 25, 5}, {40, 25, 35}, {80, 15, 5}], and I need to be able to add it to the other features. If anyone has a solution for this, please let me know.
KuthaJuice (5 years ago)
@ = "at the rate", for those wondering what he meant.
zaft1g337 (5 years ago)
Hi. I have a regression problem I want to solve, but I don't know how to tell Weka what my target output is, the way one does with Cubist in the .names file. Thanks
Rushdi Shams (5 years ago)
Yeah, there are different ways of doing it.
Rushdi Shams (5 years ago)
Not yet.
Rushdi Shams (5 years ago)
See the answer above.
Rushdi Shams (5 years ago)
Hmm. I think it is a good idea to use a piece of code to join the results into one file, and then create an ARFF file from it.
Taniya Tom (5 years ago)
Hi, I have a text file in which data is stored in 3 columns. The 1st column is the datapoint ID, the 2nd column is the attribute name and the 3rd column is the attribute value. Something like this: 1 23 5 1 25 7 2 20 3 (1 and 2 are datapoint IDs; 23, 25, 27 are attribute names; 5, 7, 3 are attribute values). I have 12000 datapoints, 1 lakh (100,000) attributes and 20 class labels. The class labels are in another text file. How do I preprocess this? Kindly give me an idea.
lamp2k06 (5 years ago)
Hi! I have 3 .arff files. The structure is like this: File 1 contains a list of items including several attributes and, of course, the attribute I want to predict. File 2 contains a list of DIFFERENT items and their attributes. File 3 links the attributes of the first two files (many-to-many); each row just contains the IDs of the attributes that are linked. How do I use the information from file 2 for the prediction of the attribute in file 1? Or at least: how do I merge the data? Thanks
Laura Ranta (5 years ago)
Hi, is there a tutorial with one class classification using libsvm or WEKA?
Adhanom Efrem (5 years ago)
Thanks, man. But I used Notepad to type the program and saved it as an .arff file. It worked, though.
Teja sree (5 years ago)
good video...thank you..
Mariam Tamimi (6 years ago)
VERY HELPFUL VIDEOS, Thank you so much.
Abdul -ur- Rehman Ali (6 years ago)
If it is still causing a problem, search for it on Google. Thanks
Abdul -ur- Rehman Ali (6 years ago)
Write h t t p with no spaces... Thanks for your reply
Abdul -ur- Rehman Ali (6 years ago)
h t t p://archive.ics.uci.edu/ml/datasets/p53+Mutants
Rushdi Shams (6 years ago)
Okay, it is a capital P, not a small p. I got it. To make it readable for Weka, you first need to find out the feature names and their data types. Once you have those, list them in order in the @attribute section, and then place the feature values in the @data section in that same order.
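The ordering rule in this reply can be illustrated with a small Python sketch. It is a generic example, not code for the p53 dataset: `build_arff`, the relation name and the attribute names are all hypothetical, and values are assumed to need no quoting (no spaces or commas). The point is that each @data row lists its values in exactly the order of the @attribute declarations, with `?` (the ARFF marker for a missing value) where a feature is absent.

```python
def build_arff(relation, attributes, instances):
    """Build ARFF text from (name, type) pairs and a list of dict instances."""
    lines = ["@relation " + relation, ""]
    for name, arff_type in attributes:
        lines.append("@attribute %s %s" % (name, arff_type))
    lines += ["", "@data"]
    for instance in instances:
        # Values follow the declaration order, whatever the dict order is.
        values = [str(instance.get(name, "?")) for name, _ in attributes]
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


example = build_arff(
    "weather",
    [("temperature", "numeric"), ("play", "{yes,no}")],
    [{"play": "no", "temperature": 85}, {"temperature": 70}],
)
```

Here the first instance is written as `85,no` even though `play` comes first in its dict, and the second becomes `70,?`.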
Rushdi Shams (6 years ago)
The link is showing 404 not found!
Rushdi Shams (6 years ago)
No problem, it is a great tool to discover....
Abdul -ur- Rehman Ali (6 years ago)
Hi. I need help from you. I have to read the dataset from the following link, archive.ics.uci.edu/ml/datasets/p53+Mutants, and make it readable for Weka. How do I start? I'm new to Weka. Waiting for your reply, thanks
Yan Liu (6 years ago)
Thank you so so so much! I almost decided to quit learning WEKA before I found your videos. My confidence is back :))))
stelarophie (6 years ago)
@rushdishams thanks a lot
Rushdi Shams (6 years ago)
@stelarophie Thank you. In order to analyze your data with Weka, you have to create the ARFF file (it is an ASCII file). You can, however, first put your data in Excel and then export it as a CSV (comma-separated values) file. Then you need to add the @relation section, the @attribute section and the @data line above the CSV rows you exported from the Excel file.
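The CSV route described in this reply can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a complete converter: `csv_to_arff` is a made-up name, the first CSV row is assumed to hold the column names, values are assumed to contain no spaces, commas or quotes, and there are no missing values. Columns named in `nominal` are declared with the set of labels found in the data, all-numeric columns become `numeric`, and the rest become `string`.

```python
import csv
import io


def _is_number(value):
    """Return True if the text parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation, nominal=()):
    """Put @relation, @attribute and @data lines in front of the CSV rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(header):
        values = [row[i] for row in data]
        if name in nominal:
            labels = sorted(set(values))
            lines.append("@attribute %s {%s}" % (name, ",".join(labels)))
        elif all(_is_number(v) for v in values):
            lines.append("@attribute %s numeric" % name)
        else:
            lines.append("@attribute %s string" % name)
    lines += ["", "@data"]
    # The header row is dropped; the data rows are copied through unchanged.
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"
```

For example, `csv_to_arff("outlook,temperature,play\nsunny,85,no\novercast,83,yes\n", "weather", nominal=("outlook", "play"))` declares `outlook` and `play` as nominal and `temperature` as numeric, followed by the two data rows.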

