The conventional answer is to do feature selection after splitting, because doing it before the split lets information from the test set leak into the selection. The counterargument is that if only the training set drawn from the whole dataset is used for feature selection, then the selected features, or the ordering of the feature importance scores, are likely to change whenever the random_state of train_test_split changes.
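One way to take both concerns into account is to keep the selection inside the training data and then check how stable it is across several splits. The sketch below is only an illustration under assumptions that are not in the question: it uses synthetic data from make_classification, SelectKBest with f_classif as the selector, k=5, logistic regression as the model, and ten seeds. It fits the selector on the training set only (through a Pipeline) and counts how often each feature is chosen as random_state varies.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic data, assumed purely for illustration: 20 features, 5 informative.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

selection_counts = Counter()  # how many splits selected each feature index
test_scores = []              # held-out accuracy for each split

for seed in range(10):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )

    # The selector is a pipeline step, so it is fitted on X_train only
    # and the test set never influences which features are kept.
    pipe = Pipeline([
        ("select", SelectKBest(score_func=f_classif, k=5)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    pipe.fit(X_train, y_train)

    selected = pipe.named_steps["select"].get_support(indices=True)
    selection_counts.update(selected.tolist())
    test_scores.append(pipe.score(X_test, y_test))

# Features picked in every split are stable; rarely picked ones are split-dependent.
for feature, count in selection_counts.most_common():
    print(f"feature {feature}: selected in {count} of 10 splits")
print(f"mean test accuracy: {sum(test_scores) / len(test_scores):.3f}")
```

Features that are selected in nearly every split can be treated as robust, while those that come and go with the seed are the ones the counterargument is worried about. The test accuracy stays an honest estimate because the test rows are never seen during selection.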