CSV - remove rows in which any column is empty
I'm playing with the Titanic data set from Kaggle. I'd like to remove the rows in train.csv that have an empty column (I know this isn't the best way to deal with missing data, but the question is interesting to me regardless).
I'd like to do this in a Unix-type way (using awk, sed, or grep), because I'm trying to get better at those tools, but I'm not sure where to start.
Example of the data:
    passengerid,survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
    1,0,3,"braund, mr. owen harris",male,22,1,0,a/5 21171,7.25,,s
    2,1,1,"cumings, mrs. john bradley (florence briggs thayer)",female,38,1,0,pc 17599,71.2833,c85,c
    3,1,3,"heikkinen, miss. laina",female,26,0,0,ston/o2. 3101282,7.925,,s
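In the second row (the first data row after the header), cabin is empty, so I want to remove that row from the file.
Note that the fourth column contains commas, but the column is contained in double quotes.
I'd also like to know how to do this for specific columns, but I can ask a separate question if the answer to this question doesn't help me answer that one.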
I would stick with a language that has a CSV parser, because commas inside double quotes can be problematic. It is also easier to extend to compare specific columns. Here is a Python example. It takes the number of fields from the header and compares it with the number of non-empty fields in each line to decide whether to print the line or not:
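    import sys
    import csv

    with open(sys.argv[1], 'r', newline='') as csvfile:
        csvreader = csv.reader(csvfile)
        csvwriter = csv.writer(sys.stdout)
        # The header sets the expected number of fields and is always printed.
        row = next(csvreader)
        fields = len(row)
        csvwriter.writerow(row)
        for row in csvreader:
            # Count the fields that still have content after stripping whitespace.
            non_empty = len(list(filter(str.strip, row)))
            if non_empty < fields:
                continue
            csvwriter.writerow(row)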
Assuming the code is inside a file named script.py, run it like:

    python script.py infile
That yields:

    passengerid,survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
    2,1,1,"cumings, mrs. john bradley (florence briggs thayer)",female,38,1,0,pc 17599,71.2833,c85,c
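
For the follow-up about specific columns, the same idea might be extended roughly as below. This is only a sketch under a few assumptions: the function name `drop_rows_with_empty`, its `columns` parameter, and the `SAMPLE` string are made up for illustration, and `SAMPLE` is a trimmed-down copy of the rows from the question (fewer columns, same values). It looks up the positions of the named columns in the header and keeps a row only when each of those fields is non-empty; with `columns=None` it checks every field.

```python
import csv
import io


def drop_rows_with_empty(lines, columns=None):
    """Yield the header, then each row whose checked columns are all non-empty.

    `columns` is a list of header names to check; None means check every column.
    (Hypothetical helper, not part of the csv module.)
    """
    reader = csv.reader(lines)
    header = next(reader)
    # Translate the requested column names into positions within a row.
    if columns is None:
        indexes = range(len(header))
    else:
        indexes = [header.index(name) for name in columns]
    yield header
    for row in reader:
        # Keep the row only if every checked field has non-whitespace content.
        if all(i < len(row) and row[i].strip() for i in indexes):
            yield row


# Illustrative sample: a subset of the columns from the question's data.
SAMPLE = (
    'passengerid,name,age,cabin\n'
    '1,"braund, mr. owen harris",22,\n'
    '2,"cumings, mrs. john bradley (florence briggs thayer)",38,c85\n'
    '3,"heikkinen, miss. laina",26,\n'
)

# Only check the cabin column: rows 1 and 3 are dropped.
kept_cabin = list(drop_rows_with_empty(io.StringIO(SAMPLE), columns=["cabin"]))
# Only check the age column: no row is dropped.
kept_age = list(drop_rows_with_empty(io.StringIO(SAMPLE), columns=["age"]))
```

To write the result back out as CSV, the rows can be passed to `csv.writer(sys.stdout).writerows(...)`, as in the script above.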
