Assessing the Performance of Clinical Natural Language Processing Systems