
TL;DR

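It takes a lot of effort to find a marginally more efficient solution, and the added complexity is difficult to justify when it sacrifices the simplicity of df.drop(dlst, axis=1, errors='ignore'). The marginally faster option is:

          df.reindex_axis(np.setdiff1d(df.columns.values, dlst), 1)

On pandas 1.0 and later, where reindex_axis no longer exists, the equivalent is:

          df.reindex(columns=np.setdiff1d(df.columns.values, dlst))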

Preamble
Deleting a column is semantically the same as selecting the other columns. I'll show a few additional methods to consider.

I'll also focus on the general case of deleting multiple columns at once while tolerating attempts to delete columns that are not present.

These solutions are general and work for the simple case of a single column as well.
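A minimal sketch of that equivalence, using a small three-column frame made up for illustration (it is not the setup used below): dropping B gives exactly the same frame as selecting the remaining columns A and C.

```python
import pandas as pd

# Small illustrative frame (not the df used in the rest of this answer)
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

dropped = df.drop(columns=['B'])   # delete column B
selected = df[['A', 'C']]          # select every column except B

# Both routes produce identical frames (same labels, values, and dtypes)
print(dropped.equals(selected))    # True
```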

Setup
Consider the pd.DataFrame df and the list of column labels to delete, dlst. Note that K, L and M are not columns of df:

          import numpy as np
          import pandas as pd

          df = pd.DataFrame(dict(zip('ABCDEFGHIJ', range(1, 11))), range(3))
          dlst = list('HIJKLM')

          df

             A  B  C  D  E  F  G  H  I   J
          0  1  2  3  4  5  6  7  8  9  10
          1  1  2  3  4  5  6  7  8  9  10
          2  1  2  3  4  5  6  7  8  9  10

          dlst

          ['H', 'I', 'J', 'K', 'L', 'M']

The result should look like:

          df.drop(dlst, axis=1, errors='ignore')

             A  B  C  D  E  F  G
          0  1  2  3  4  5  6  7
          1  1  2  3  4  5  6  7
          2  1  2  3  4  5  6  7
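Why errors='ignore' matters here: dlst contains K, L and M, which are not columns of df. This short sketch rebuilds the setup and shows that drop raises a KeyError for missing labels under the default errors='raise', and silently skips them with errors='ignore'.

```python
import pandas as pd

df = pd.DataFrame(dict(zip('ABCDEFGHIJ', range(1, 11))), range(3))
dlst = list('HIJKLM')  # K, L, M are not columns of df

# Default errors='raise': any missing label is an error
try:
    df.drop(dlst, axis=1)
    raised = False
except KeyError:
    raised = True

# errors='ignore': drop the labels that exist, skip the rest
result = df.drop(dlst, axis=1, errors='ignore')

print(raised)                # True
print(list(result.columns))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```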

Since I'm equating deleting a column to selecting the other columns, I'll break it into two types:

Label selection
Boolean selection

Label Selection

We start by building the list/array of labels for the columns we want to keep, i.e., without the columns we want to delete.

df.columns.difference(dlst)

              Index(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='object')

np.setdiff1d(df.columns.values, dlst)

              array(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype=object)

df.columns.drop(dlst, errors='ignore')

              Index(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='object')

list(set(df.columns.values.tolist()).difference(dlst))

              # does not preserve order; set order is arbitrary and can differ between runs
              ['E', 'D', 'B', 'F', 'G', 'A', 'C']

[x for x in df.columns.values.tolist() if x not in dlst]

              ['A', 'B', 'C', 'D', 'E', 'F', 'G']
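These label builders only agree here because df's columns are already sorted. Index.difference sorts its result by default and np.setdiff1d always returns sorted values, while Index.drop and the list comprehension keep the original column order. A small sketch with a deliberately unsorted column order (chosen purely for illustration):

```python
import numpy as np
import pandas as pd

# Columns deliberately out of alphabetical order
df = pd.DataFrame([[1, 2, 3, 4]], columns=['D', 'B', 'C', 'A'])
dlst = ['C', 'X']  # 'X' is not a column

by_difference = list(df.columns.difference(dlst))             # sorted
by_setdiff1d = list(np.setdiff1d(df.columns.values, dlst))     # sorted
by_drop = list(df.columns.drop(dlst, errors='ignore'))         # original order
by_comprehension = [x for x in df.columns.values.tolist() if x not in dlst]  # original order

print(by_difference, by_setdiff1d)  # ['A', 'B', 'D'] ['A', 'B', 'D']
print(by_drop, by_comprehension)    # ['D', 'B', 'A'] ['D', 'B', 'A']
```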

Columns from Labels
For the sake of comparing the selection process, assume:

                      cols = [x for x in df.columns.values.tolist() if x not in dlst]

Then we can evaluate any of:

df.loc[:, cols]
df[cols]
df.reindex(columns=cols)
df.reindex_axis(cols, 1)  # removed in pandas 1.0

All of which evaluate to:

             A  B  C  D  E  F  G
          0  1  2  3  4  5  6  7
          1  1  2  3  4  5  6  7
          2  1  2  3  4  5  6  7
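A sketch that rebuilds the setup and checks that these selectors agree with drop. It leaves out reindex_axis, since that method no longer exists in pandas 1.0 and later.

```python
import pandas as pd

df = pd.DataFrame(dict(zip('ABCDEFGHIJ', range(1, 11))), range(3))
dlst = list('HIJKLM')
cols = [x for x in df.columns.values.tolist() if x not in dlst]

candidates = [
    df.loc[:, cols],
    df[cols],
    df.reindex(columns=cols),
]

expected = df.drop(dlst, axis=1, errors='ignore')
same = all(c.equals(expected) for c in candidates)
print(same)  # True

# Design difference: reindex fills unknown labels with NaN,
# whereas df[...] and df.loc[:, ...] raise KeyError for them
filled = df.reindex(columns=['A', 'Z'])
print(filled['Z'].isna().all())  # True
```

Because cols only contains existing labels, all three give the same frame; the reindex behavior only matters when the keep-list might contain labels that are not in df.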

Boolean Slice

We can construct an array/list of booleans for slicing

~df.columns.isin(dlst)
~np.in1d(df.columns.values, dlst)
[x not in dlst for x in df.columns.values.tolist()]
(df.columns.values[:, None] != dlst).all(1)
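A quick check that the four masks agree. This sketch uses np.isin in place of np.in1d, because np.in1d is deprecated as of NumPy 2.0 and np.isin is its replacement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(zip('ABCDEFGHIJ', range(1, 11))), range(3))
dlst = list('HIJKLM')

masks = [
    ~df.columns.isin(dlst),
    ~np.isin(df.columns.values, dlst),
    np.array([x not in dlst for x in df.columns.values.tolist()]),
    # Broadcasting: (10, 1) labels vs (6,) dlst -> (10, 6), then all along axis 1
    (df.columns.values[:, None] != dlst).all(1),
]

# Every mask is True for A..G and False for H, I, J
agree = all(np.array_equal(m, masks[0]) for m in masks)
print(agree)  # True
```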

Columns from Boolean
For the sake of comparison, assume:

          bools = [x not in dlst for x in df.columns.values.tolist()]

df.loc[:, bools]

Which evaluates to:

             A  B  C  D  E  F  G
          0  1  2  3  4  5  6  7
          1  1  2  3  4  5  6  7
          2  1  2  3  4  5  6  7
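A sketch confirming that boolean selection lands on the same frame as drop. The comma in df.loc[:, bools] separates the row selector (all rows) from the column mask.

```python
import pandas as pd

df = pd.DataFrame(dict(zip('ABCDEFGHIJ', range(1, 11))), range(3))
dlst = list('HIJKLM')
bools = [x not in dlst for x in df.columns.values.tolist()]

selected = df.loc[:, bools]  # all rows, only the columns whose mask is True
expected = df.drop(dlst, axis=1, errors='ignore')
print(selected.equals(expected))  # True
```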

Robust Timing

Functions

          # Build the labels to keep
          setdiff1d = lambda df, dlst: np.setdiff1d(df.columns.values, dlst)
          difference = lambda df, dlst: df.columns.difference(dlst)
          columndrop = lambda df, dlst: df.columns.drop(dlst, errors='ignore')
          setdifflst = lambda df, dlst: list(set(df.columns.values.tolist()).difference(dlst))
          comprehension = lambda df, dlst: [x for x in df.columns.values.tolist() if x not in dlst]

          # Select columns given the labels (or boolean mask) to keep
          loc = lambda df, cols: df.loc[:, cols]
          slc = lambda df, cols: df[cols]
          ridx = lambda df, cols: df.reindex(columns=cols)
          ridxa = lambda df, cols: df.reindex_axis(cols, 1)  # pandas < 1.0 only

          # Build a boolean mask of the columns to keep
          isin = lambda df, dlst: ~df.columns.isin(dlst)
          in1d = lambda df, dlst: ~np.in1d(df.columns.values, dlst)
          comp = lambda df, dlst: [x not in dlst for x in df.columns.values.tolist()]
          brod = lambda df, dlst: (df.columns.values[:, None] != dlst).all(1)

Testing

          from timeit import timeit

          # One row per (selector, label builder) pair, one column per frame size
          res1 = pd.DataFrame(
              index=pd.MultiIndex.from_product([
                  'loc slc ridx ridxa'.split(),
                  'setdiff1d difference columndrop setdifflst comprehension'.split(),
              ], names=['Select', 'Label']),
              columns=[10, 30, 100, 300, 1000],
              dtype=float
          )

          res2 = pd.DataFrame(
              index=pd.MultiIndex.from_product([
                  'loc'.split(),
                  'isin in1d comp brod'.split(),
              ], names=['Select', 'Label']),
              columns=[10, 30, 100, 300, 1000],
              dtype=float
          )

          res = pd.concat([res1, res2]).sort_index()

          # Baseline: df.drop for each frame size
          dres = pd.Series(index=res.columns, name='drop', dtype=float)

          for j in res.columns:
              dlst = list(range(j))
              cols = list(range(j // 2, j + j // 2))
              d = pd.DataFrame(1, range(10), cols)
              dres.at[j] = timeit('d.drop(dlst, axis=1, errors="ignore")', 'from __main__ import d, dlst', number=100)
              for s, l in res.index:
                  stmt = '{}(d, {}(d, dlst))'.format(s, l)
                  setp = 'from __main__ import d, dlst, {}, {}'.format(s, l)
                  res.at[(s, l), j] = timeit(stmt, setp, number=100)

          # Time of each method relative to drop
          rs = res / dres
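The harness above relies on `from __main__ import ...` setup strings, so it only works when run as a script. A minimal sketch of the same relative-time idea using timeit's `globals` parameter (Python 3.5+) instead; `relative_time` is a hypothetical helper name, and the single frame size of 100 is an arbitrary choice for illustration.

```python
import timeit

import numpy as np
import pandas as pd


def relative_time(stmt, baseline, env, number=100):
    """Hypothetical helper: return time(stmt) / time(baseline).

    `env` is passed to timeit as `globals`, so both statements can refer to
    names defined in it without a `from __main__ import ...` setup string.
    """
    t_stmt = timeit.timeit(stmt, number=number, globals=env)
    t_base = timeit.timeit(baseline, number=number, globals=env)
    return t_stmt / t_base


j = 100  # frame size, mirroring one column of the table
env = {
    'np': np,
    'dlst': list(range(j)),
    'd': pd.DataFrame(1, range(10), list(range(j // 2, j + j // 2))),
}

ratio = relative_time(
    'd.reindex(columns=np.setdiff1d(d.columns.values, dlst))',
    'd.drop(dlst, axis=1, errors="ignore")',
    env,
)
print(ratio)  # below 1 means faster than drop; the value depends on the machine
```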

          rs

                                      10        30       100       300       1000
          Select Label
          loc    brod           0.747373  0.861979  0.891144  1.284235   3.872157
                 columndrop     1.193983  1.292843  1.396841  1.484429   1.335733
                 comp           0.802036  0.732326  1.149397  3.473283  25.565922
                 comprehension  1.463503  1.568395  1.866441  4.421639  26.552276
                 difference     1.413010  1.460863  1.587594  1.568571   1.569735
                 in1d           0.818502  0.844374  0.994093  1.042360   1.076255
                 isin           1.008874  0.879706  1.021712  1.001119   0.964327
                 setdiff1d      1.352828  1.274061  1.483380  1.459986   1.466575
                 setdifflst     1.233332  1.444521  1.714199  1.797241   1.876425
          ridx   columndrop     0.903013  0.832814  0.949234  0.976366   0.982888
                 comprehension  0.777445  0.827151  1.108028  3.473164  25.528879
                 difference     1.086859  1.081396  1.293132  1.173044   1.237613
                 setdiff1d      0.946009  0.873169  0.900185  0.908194   1.036124
                 setdifflst     0.732964  0.823218  0.819748  0.990315   1.050910
          ridxa  columndrop     0.835254  0.774701  0.907105  0.908006   0.932754
                 comprehension  0.697749  0.762556  1.215225  3.510226  25.041832
                 difference     1.055099  1.010208  1.122005  1.119575   1.383065
                 setdiff1d      0.760716  0.725386  0.849949  0.879425   0.946460
                 setdifflst     0.710008  0.668108  0.778060  0.871766   0.939537
          slc    columndrop     1.268191  1.521264  2.646687  1.919423   1.981091
                 comprehension  0.856893  0.870365  1.290730  3.564219  26.208937
                 difference     1.470095  1.747211  2.886581  2.254690   2.050536
                 setdiff1d      1.098427  1.133476  1.466029  2.045965   3.123452
                 setdifflst     0.833700  0.846652  1.013061  1.110352   1.287831

          import matplotlib.pyplot as plt

          fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharey=True)
          # One panel per selection method (loc, ridx, ridxa, slc)
          for i, (n, g) in enumerate([(n, g.xs(n)) for n, g in rs.groupby('Select')]):
              ax = axes[i // 2, i % 2]
              g.plot.bar(ax=ax, title=n)
              ax.legend_.remove()
          fig.tight_layout()

All times are relative to the time it takes to run df.drop(dlst, axis=1, errors='ignore'), so values below 1 beat drop. After all that effort, the best combinations still take roughly 65% to 95% of drop's time: only a modest improvement.

[Figure: bar charts of the relative times in rs, one panel per selection method: loc, ridx, ridxa, slc]

In fact, the best solutions use reindex or reindex_axis on the hack list(set(df.columns.values.tolist()).difference(dlst)), which does not preserve column order. A close second, and still marginally better than drop, is np.setdiff1d.
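The winning set-difference hack scrambles column order, and the comprehension's blowup at 300 and 1000 columns is consistent with `x not in dlst` scanning a list for every column. A sketch that addresses both, under the assumption that column labels are unique: convert the labels to a set once (average O(1) membership tests) and keep the original order with a comprehension before reindexing. `drop_columns` is a hypothetical helper name, and this variant was not part of the timings above; whether it beats drop on your data would need to be measured with the same harness.

```python
import pandas as pd


def drop_columns(df, labels):
    """Hypothetical helper: drop `labels` from df's columns, ignoring missing ones.

    `labels` is converted to a set once so each membership test is O(1) on
    average, and the comprehension keeps the original column order
    (unlike list(set(...).difference(...))).
    """
    drop = set(labels)
    keep = [c for c in df.columns if c not in drop]
    return df.reindex(columns=keep)


df = pd.DataFrame(dict(zip('ABCDEFGHIJ', range(1, 11))), range(3))
result = drop_columns(df, list('HIJKLM'))
print(list(result.columns))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```

If df can have duplicate column labels, reindex raises a ValueError; a boolean mask with df.loc[:, mask] handles that case instead.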

          rs.idxmin().pipe(
              lambda x: pd.DataFrame(
                  dict(idx=x.values, val=rs.lookup(x.values, x.index)),
                  x.index
              )
          )

                                idx       val
          10     (ridx, setdifflst)  0.653431
          30    (ridxa, setdifflst)  0.746143
          100   (ridxa, setdifflst)  0.816207
          300    (ridx, setdifflst)  0.780157
          1000  (ridxa, setdifflst)  0.861622
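DataFrame.lookup, used in the summary above, was removed in pandas 2.0. Since the value at each column's idxmin is simply that column's minimum, a sketch of the same summary can pair idxmin with min. `best_per_size` is a hypothetical helper name, and the small rs below holds toy numbers invented only to exercise the function; they are not measurements.

```python
import pandas as pd


def best_per_size(rs):
    """For each column (frame size): the fastest (Select, Label) pair and its ratio."""
    return pd.DataFrame({'idx': rs.idxmin(), 'val': rs.min()})


# Toy relative timings, invented purely for illustration
rs = pd.DataFrame(
    [[0.9, 1.2], [0.7, 1.1], [1.0, 0.8]],
    index=pd.MultiIndex.from_tuples(
        [('ridx', 'setdiff1d'), ('ridx', 'setdifflst'), ('ridxa', 'setdifflst')],
        names=['Select', 'Label'],
    ),
    columns=[10, 30],
)

summary = best_per_size(rs)
print(summary)
```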