I have a data frame (derived from a CSV file) with about 100M entries that looks like this:
df1:
        var1     var2
0          1        2
1          2        1
2          1  {3,4,5}
3          5        6
4  {4,5,6,7}        8
I need to convert this into a new data frame in which, for every row, each element inside the braces is paired with the element in the other column, i.e.:
df2:
   var1  var2
0     1     2
1     2     1
2     1     3
3     1     4
4     1     5
5     5     6
6     4     8
7     5     8
8     6     8
9     7     8
Every element is a string, including the brace entries themselves. Note that the brace entry can appear in either column, which distinguishes this from a related question (https://stackoverflow.com/questions/12680754/split-pandas-dataframe-string-entry-to-separate-rows?rq=1). Does anyone know how I can achieve this efficiently for a dataset of about 100M entries? Thanks in advance.
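Put another way, each output row comes from the Cartesian product of the two cells of an input row, where a plain value counts as a one-element set. A minimal per-row sketch of that rule, assuming braces never nest and elements are separated by commas; `parse_cell` and `expand_row` are hypothetical names, and the question never shows a row with braces in both columns, so pairing every combination in that case is an assumption:

```python
from itertools import product
from typing import List, Tuple


def parse_cell(cell: str) -> List[str]:
    """Hypothetical helper: '{3,4,5}' -> ['3', '4', '5']; a plain '1' -> ['1']."""
    cell = cell.strip()
    if cell.startswith("{") and cell.endswith("}"):
        # Assumes braces never nest and elements are comma-separated.
        return [item.strip() for item in cell[1:-1].split(",")]
    return [cell]


def expand_row(var1: str, var2: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair every element of var1 with every element of var2."""
    return list(product(parse_cell(var1), parse_cell(var2)))


print(expand_row("1", "{3,4,5}"))    # [('1', '3'), ('1', '4'), ('1', '5')]
print(expand_row("{4,5,6,7}", "8"))  # [('4', '8'), ('5', '8'), ('6', '8'), ('7', '8')]
```

Looping over 100M rows in pure Python like this would likely be slow; the sketch only pins down the pairing rule.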
Python example:
import pandas as pd

# Input: every cell is a string, including the brace entries.
df1 = pd.DataFrame([{'var1': '1', 'var2': '2'},
                    {'var1': '2', 'var2': '1'},
                    {'var1': '1', 'var2': '{3,4,5}'},
                    {'var1': '5', 'var2': '6'},
                    {'var1': '{4,5,6,7}', 'var2': '8'}])

# Desired result: one row per element of each brace entry.
df2 = pd.DataFrame([{'var1': '1', 'var2': '2'},
                    {'var1': '2', 'var2': '1'},
                    {'var1': '1', 'var2': '3'},
                    {'var1': '1', 'var2': '4'},
                    {'var1': '1', 'var2': '5'},
                    {'var1': '5', 'var2': '6'},
                    {'var1': '4', 'var2': '8'},
                    {'var1': '5', 'var2': '8'},
                    {'var1': '6', 'var2': '8'},
                    {'var1': '7', 'var2': '8'}])
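One vectorized way to get from df1 to df2 is to turn each cell into a list with pandas string methods and then call `DataFrame.explode` once per column. This is a sketch, not a benchmarked solution; it assumes braces appear only at the ends of a cell and never nest, and `expand_braces` and `expand_csv` are hypothetical helper names. The chunked CSV reader additionally assumes the file has a `var1,var2` header and that brace cells containing commas are quoted (e.g. `"{3,4,5}"`).

```python
import pandas as pd


def expand_braces(df: pd.DataFrame, cols=("var1", "var2")) -> pd.DataFrame:
    """Hypothetical helper: one row per (var1, var2) pair, splitting '{a,b,...}' cells."""
    out = df.copy()
    for col in cols:
        # '{3,4,5}' -> ['3', '4', '5']; a plain '1' -> ['1'].
        # Assumes braces only appear at the ends of a cell and never nest.
        out[col] = out[col].str.strip("{}").str.split(",")
        # One row per list element; the other column's value is repeated.
        # Exploding the columns one after another yields every combination
        # if a row ever has braces in both columns.
        out = out.explode(col)
    # Renumber the rows 0..n-1, as in df2.
    return out.reset_index(drop=True)


def expand_csv(path_in: str, path_out: str, chunksize: int = 1_000_000) -> None:
    """Hypothetical helper: stream a large CSV through expand_braces in chunks."""
    # dtype=str keeps every cell as a string; brace cells must be quoted
    # in the CSV because they contain commas.
    reader = pd.read_csv(path_in, dtype=str, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        expand_braces(chunk).to_csv(
            path_out,
            mode="w" if i == 0 else "a",  # overwrite on the first chunk, then append
            header=(i == 0),              # write the header only once
            index=False,
        )
```

With `df1` from the example above, `expand_braces(df1)` should reproduce `df2`. The columns are exploded one at a time because a single multi-column `explode` (pandas 1.3+) requires equal-length lists in each row and would zip the elements rather than combine them. Whether this is fast enough for 100M rows is untested, and the chunk size is a placeholder to tune to the available memory.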