Dataset : https://www.kaggle.com/stackoverflow/stacksample
Size: 3 GB (zipped)
Loading : loading the dataset with Pandas is a mistake unless you have 5+ GB of free RAM, because Questions.csv and Answers.csv alone take about 5 GB once loaded into DataFrames. Instead, read the files with Python's csv module, opening them with encoding='latin1'.
import csv

# Stream the file row by row so it never has to fit in memory.
with open('../input/stacksample/Questions.csv', encoding='latin1', newline='') as csv_file:
    read_csv = csv.reader(csv_file, delimiter=',')
    cnt = 0
    next(read_csv, None)  # skip the header row
    for row in read_csv:
        cnt += 1
    print(cnt, len(row))  # number of rows: 1264216, number of columns: 7
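The same streaming approach can also be used to pull out only the columns you need, so the large text fields are never kept in memory. The following is a minimal sketch, not part of the original notebook: `iter_columns` is a hypothetical helper name, and it assumes the first row of the file is a header.

```python
import csv
from typing import Iterator, Sequence, Tuple


def iter_columns(path: str, columns: Sequence[str],
                 encoding: str = "latin1") -> Iterator[Tuple[str, ...]]:
    """Yield only the requested columns, one row at a time.

    Hypothetical helper for illustration; assumes the file has a header row.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, encoding=encoding, newline="") as csv_file:
        reader = csv.DictReader(csv_file)  # first row is used as the header
        for row in reader:
            # Keep just the selected fields; the rest of the row is discarded.
            yield tuple(row[name] for name in columns)
```

For example, `for qid, title in iter_columns('../input/stacksample/Questions.csv', ['Id', 'Title']): ...` would process one question at a time, assuming the header contains columns named Id and Title. A missing column name raises a KeyError on the first row.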