LIBSVM-Promed experiments

From Humanitarian FOSS Summer Institute 2008


Experiments

  • Using articles from the Pro-Med database, we built a vector for each article that holds the weight of every word in that article. To do this we:
    • Parsed the articles into a readable format.
    • Created a dictionary of all the words in the list of articles, along with the frequency of each word in each document.
    • Computed the inverse document frequency of each word: log(number of documents / number of documents containing the word).
    • Computed the weight of each word in each document by multiplying its frequency in that document by its inverse document frequency.
    • For each document, created a vector in the format read by libSVM.
    • Separated the vectors into a training group and a testing group, and fed them to libSVM.
    • Calculated sensitivity and specificity from the results returned by libSVM.
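The weighting and export steps above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: it assumes "frequency" means the raw count of a word in a document, that the logarithm is the natural log, and that labels are 1 and -1. The function names `tfidf_vectors` and `to_libsvm_line` are hypothetical. The output line format (`label index:value ...` with ascending, 1-based feature indices) is the sparse text format that libSVM reads.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight every word in every document.

    docs is a list of token lists, one per article.
    Returns (index, vectors): index maps each word to a 1-based feature
    number, and vectors holds one {feature number: weight} dict per document.
    """
    n_docs = len(docs)
    # Frequency of each word in each document (assumed: raw counts).
    term_counts = [Counter(tokens) for tokens in docs]

    # Number of documents that contain each word.
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())

    # libSVM feature indices start at 1.
    index = {word: i for i, word in enumerate(sorted(doc_freq), start=1)}

    # Inverse document frequency (assumed: natural logarithm).
    idf = {word: math.log(n_docs / df) for word, df in doc_freq.items()}

    vectors = [
        {index[word]: tf * idf[word] for word, tf in counts.items()}
        for counts in term_counts
    ]
    return index, vectors


def to_libsvm_line(label, vector):
    """Format one document as a libSVM line, omitting zero weights."""
    features = " ".join(
        f"{i}:{weight:.6g}" for i, weight in sorted(vector.items()) if weight != 0.0
    )
    return f"{label} {features}".rstrip()


if __name__ == "__main__":
    docs = [["outbreak", "cholera", "cholera"], ["outbreak", "meeting"]]
    _, vectors = tfidf_vectors(docs)
    for label, vector in zip([1, -1], vectors):
        print(to_libsvm_line(label, vector))
```

Note that a word appearing in every document gets an inverse document frequency of log(1) = 0, so it contributes nothing to any vector and is dropped from the sparse output.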


  • To obtain accurate results we took the following precaution:
    • Kept only words that are more than three characters long.
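A word-length filter of this kind might look like the sketch below. The function name `tokenize`, the lowercasing, and the letters-only word pattern are assumptions made for illustration; the source only states that words must be more than three characters long.

```python
import re


def tokenize(text, min_length=4):
    """Split text into lowercase words and keep those with at least
    min_length characters (4 means "more than three")."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if len(word) >= min_length]


if __name__ == "__main__":
    print(tokenize("The flu outbreak in Peru was severe"))
```

Dropping very short words removes most articles and prepositions, which carry little information about a document's topic, at the cost of also dropping short content words such as "flu".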


Results
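Sensitivity is the fraction of truly positive test articles that the classifier labels positive, and specificity is the fraction of truly negative test articles that it labels negative. A minimal sketch of that calculation is given here, assuming the labels are 1 (positive) and -1 (negative); which class the experiments treated as positive is not stated in the source, and the function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(actual, predicted, positive=1):
    """Return (sensitivity, specificity) for two equal-length label lists."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(actual, predicted):
        if truth == positive:
            if guess == positive:
                tp += 1  # positive article correctly detected
            else:
                fn += 1  # positive article missed
        else:
            if guess == positive:
                fp += 1  # negative article wrongly flagged
            else:
                tn += 1  # negative article correctly rejected

    # Undefined when a class has no test examples.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    actual = [1, 1, 1, -1, -1, -1, -1]
    predicted = [1, 1, -1, -1, -1, -1, 1]
    print(sensitivity_specificity(actual, predicted))
```

A classifier that labels every test article negative scores a sensitivity of 0 and a specificity of 1, which is the pattern seen in some rows with few training vectors.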

  • Test 1
Sensitivity and Specificity results for Test 1


Training Vectors   Sensitivity   Specificity
2                  1             0.7
5                  0             1
10                 0             1
15                 0.62          0.98
20                 0.92          0.97
30                 0.67          1
40                 1             1
50                 1             1
60                 1             1



  • Test 2
Sensitivity and Specificity results for Test 2


Training Vectors   Sensitivity   Specificity
2                  1             0.69
5                  0.71          1
10                 1             0.98
15                 1             0.98
20                 0.88          0.98
30                 0.86          0.97
40                 0.75          0.96
50                 0.75          1
60                 1             1




  • Test 3
Sensitivity and Specificity results for Test 3


Training Vectors   Sensitivity   Specificity
2                  1             0.55
5                  0             1
10                 0             1
15                 0.85          0.98
20                 0.9           0.97
30                 0.83          0.97
40                 1             0.96
50                 1             1
60                 1             1





  • Final Average Result
Sensitivity and Specificity average results


Training Vectors   Sensitivity   Specificity
2                  1             0.64
5                  0.24          1
10                 0.33          0.99
15                 0.82          0.98
20                 0.9           0.97
30                 0.79          0.98
40                 0.92          0.97
50                 0.92          1
60                 1             1