Each year, an estimated 15 million people worldwide suffer a stroke. About one third of stroke survivors are left with some form of long-term language deficit, known as aphasia. The study of post-stroke aphasia is relevant both scientifically, to understand the functional organization of language, and clinically, to predict the potential for recovery. Despite the wide availability of patients, neuroimaging studies are often limited by the lack of proper analysis tools. For example, the current standard for lesion identification is manual tracing, and the availability of multiple imaging modalities is not matched by equally powerful analytic techniques. In this talk, I will present an automated method for lesion identification with neighborhood data analysis (LINDA, http://dorianps.github.io/LINDA/). LINDA uses a single T1-weighted MRI volume to build additional image features, which serve as inputs to a machine learning algorithm that learns the relationship between a voxel's signal and its classification (healthy/lesioned). LINDA achieves state-of-the-art accuracy (Dice = 0.7, average displacement 2.5 mm) and has been validated with data from another institution.

In the second part of the talk, I will describe a method for predicting aphasia scores from multiple imaging modalities (tissue damage, resting BOLD, virtual tractography lesions). The method uses random forests to build mini-predictions of aphasia scores from each modality separately; the mini-predictions are then combined into a final predictive model. Preliminary results show that prediction stacking is more accurate than simply combining all variables in a single model, and exceeds the accuracy of any individual mini-prediction. These findings suggest that, with the advent of big data science, prediction stacking may become a useful tool.
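The voxel-wise classification idea behind LINDA can be illustrated with a minimal sketch: derive neighborhood features from an intensity volume and train a classifier to label each voxel healthy or lesioned. This is not LINDA's actual pipeline; the volume, feature choices, and classifier here are all synthetic stand-ins for illustration.

```python
# Illustrative sketch of voxel-wise lesion classification from
# neighborhood features. All data here are synthetic; LINDA's real
# pipeline (registration, hierarchical random forests, etc.) differs.
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
vol = rng.normal(loc=100, scale=10, size=(16, 16, 16))  # fake T1 intensities
lesion = np.zeros(vol.shape, dtype=bool)
lesion[4:9, 4:9, 4:9] = True
vol[lesion] -= 40  # lesioned tissue appears darker on T1

# Per-voxel features: raw intensity plus local neighborhood means,
# the kind of derived image feature the abstract alludes to.
features = np.stack(
    [vol, uniform_filter(vol, size=3), uniform_filter(vol, size=5)],
    axis=-1,
)
X = features.reshape(-1, 3)          # one row per voxel
y = lesion.ravel().astype(int)       # 1 = lesioned, 0 = healthy

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
acc = clf.score(X, y)
print(f"training accuracy = {acc:.2f}")
```

In a real setting the model would be trained on manually traced lesions from many patients and evaluated on held-out subjects, with overlap measured by the Dice coefficient rather than voxel accuracy.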
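The two-stage stacking scheme can also be sketched briefly: one random forest per modality produces out-of-fold mini-predictions, which a final model then combines. The three "modalities" below are random synthetic feature sets, not real imaging data, and the linear combiner is one simple choice for the second stage.

```python
# Illustrative sketch of prediction stacking with synthetic data.
# Three feature sets stand in for tissue damage, resting BOLD, and
# virtual tractography lesion measures.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 200
modalities = [rng.normal(size=(n, 10)) for _ in range(3)]
# Synthetic "aphasia score": each modality carries part of the signal.
y = sum(m[:, 0] for m in modalities) + 0.5 * rng.normal(size=n)

# Stage 1: one mini-prediction per modality, out-of-fold to avoid
# leaking training labels into the stacking stage.
mini_preds = np.column_stack([
    cross_val_predict(
        RandomForestRegressor(n_estimators=100, random_state=0),
        X, y, cv=5,
    )
    for X in modalities
])

# Stage 2: combine the mini-predictions into a final predictive model.
stacker = LinearRegression().fit(mini_preds, y)
stacked_r2 = stacker.score(mini_preds, y)
print(f"stacked R^2 = {stacked_r2:.2f}")
```

The out-of-fold step in stage 1 matters: fitting the combiner on in-sample mini-predictions would overstate the benefit of stacking.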