Unix - CSV - remove rows in which any column is empty
I'm playing with the Titanic data set from Kaggle. I'd like to remove the rows in train.csv that have an empty column (I know this isn't the best way to deal with missing data, but the question is interesting to me regardless).
I'd like to do this the Unix way (using awk, sed, or grep), because I'm trying to get better at those tools, but I'm not sure where to start.
Example of the data:
passengerid,survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
1,0,3,"braund, mr. owen harris",male,22,1,0,a/5 21171,7.25,,s
2,1,1,"cumings, mrs. john bradley (florence briggs thayer)",female,38,1,0,pc 17599,71.2833,c85,c
3,1,3,"heikkinen, miss. laina",female,26,0,0,ston/o2. 3101282,7.925,,s
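In the second row, for example, the cabin is empty, so I want to remove that row from the file (the same applies to the last row).
Note that the fourth column contains commas, but that column is enclosed in double quotes.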
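To see why the quoting matters, here is a small sketch (assuming Python 3, using the header and first two passengers from the sample above) comparing a naive split on commas with Python's csv module. The naive split breaks the quoted name into two fields, so any approach that counts fields that way miscounts; csv.reader keeps the name as one field and still reports the empty cabin.

```python
import csv
import io

# Header plus the first two passengers from the sample data
sample = (
    'passengerid,survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked\n'
    '1,0,3,"braund, mr. owen harris",male,22,1,0,a/5 21171,7.25,,s\n'
    '2,1,1,"cumings, mrs. john bradley (florence briggs thayer)",female,38,1,0,pc 17599,71.2833,c85,c\n'
)

lines = sample.splitlines()

# Naive approach: splitting on every comma also splits the quoted name
naive_fields = lines[1].split(',')
print(len(naive_fields))  # 13, one more than the 12 header columns

# CSV-aware approach: the quoted name stays a single field
rows = list(csv.reader(io.StringIO(sample)))
print(len(rows[1]))  # 12

# A data row should be removed if any of its fields is empty
for row in rows[1:]:
    has_empty = any(field.strip() == '' for field in row)
    print(row[0], 'remove' if has_empty else 'keep')
```

The same pitfall applies to awk with a plain comma field separator, which is why a real CSV parser is the safer choice here.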
Aside:
I'd also like to know how to do this for specific columns only, but I can ask a separate question if the answer to this one doesn't give me the answer to that.
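I would stick with a language that has a CSV parser, because commas inside double quotes are hard to handle correctly with awk, sed, or grep. It is also easier to extend such a script to compare specific columns. Here is a Python example. It extracts the number of fields from the header and compares it with the number of non-empty fields in each line to decide whether to print that line: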
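import sys
import csv

with open(sys.argv[1], 'r', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    csvwriter = csv.writer(sys.stdout)
    # The header tells us how many fields every row should have
    row = next(csvreader)
    fields = len(row)
    csvwriter.writerow(row)
    for row in csvreader:
        # Count fields that are neither empty nor whitespace-only
        non_empty = len(list(filter(str.strip, row)))
        # Skip rows with any empty (or missing) field
        if non_empty < fields:
            continue
        csvwriter.writerow(row)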
Assuming the code is in a file named script.py, run it like this:
python script.py infile
That yields:
passengerid,survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
2,1,1,"cumings, mrs. john bradley (florence briggs thayer)",female,38,1,0,pc 17599,71.2833,c85,c
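For the aside about specific columns, the script can be adapted to check only some columns. The following is a minimal sketch, assuming Python 3: it uses csv.DictReader so columns can be referred to by their header names, and the list REQUIRED_COLUMNS (cabin and age here) is a hypothetical choice for illustration. It also reads standard input when no file name is given, so it can sit in a Unix pipeline (for example, cat train.csv | python script.py).

```python
import csv
import sys

# Hypothetical choice of columns that must be non-empty; adjust as needed
REQUIRED_COLUMNS = ['cabin', 'age']


def filter_rows(infile, outfile, required):
    """Copy CSV data from infile to outfile, dropping every row in which
    any of the required columns is empty or whitespace-only."""
    reader = csv.DictReader(infile)
    # extrasaction='ignore' drops any surplus fields beyond the header
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames,
                            extrasaction='ignore')
    writer.writeheader()
    for row in reader:
        # DictReader uses None for fields missing from a short row
        if all((row.get(col) or '').strip() for col in required):
            writer.writerow(row)


if __name__ == '__main__':
    if len(sys.argv) > 1:
        with open(sys.argv[1], newline='') as f:
            filter_rows(f, sys.stdout, REQUIRED_COLUMNS)
    else:
        # No file name given: read the CSV from standard input
        filter_rows(sys.stdin, sys.stdout, REQUIRED_COLUMNS)
```

Note that a column name missing from the header is treated as empty, so a typo in REQUIRED_COLUMNS would remove every row.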