About This Project
MarcXimiL is a free, flexible, fully standards-compliant and efficient framework for bibliographic similarity analysis. Out of the box, it provides generic predefined similarity strategies. These strategies can also be customized in the following respects:
- the comparison method, either between collections or within a single collection
- ways to skip comparisons that are likely to be useless
- for each field, the choice of a parsing function (fields may be indexed in several ways: words, digrams, soundex, initials, shingles; they can also be concatenated, regrouped, or conditionally extracted)
- for each field, the choice of a comparison function from a wide selection: vectorial (Dice, Jaccard, Salton's cosine), probabilistic (Okapi BM25), Levenshtein-based, author and date comparisons, and others
- the way field similarities are combined into a record similarity (various weighted means and ad hoc functions)
- the output format (XML, spreadsheet)
- thresholds at different levels
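To give a rough idea of what a per-field parsing function does, the Python sketch below turns a field value into tokens as words, character digrams, word shingles, or initials. The function names and the exact rules (lower-casing, splitting on `\w+`, taking digrams inside each word, shingles of `k` consecutive words) are assumptions chosen for illustration, not MarcXimiL's own parsing functions.

```python
import re


def words(text: str) -> list[str]:
    """Lower-cased word tokens; \\w+ matches runs of letters, digits and underscores."""
    return re.findall(r"\w+", text.lower())


def digrams(text: str) -> list[str]:
    """Overlapping two-character substrings, taken inside each word (assumption)."""
    grams = []
    for w in words(text):
        grams.extend(w[i:i + 2] for i in range(len(w) - 1))
    return grams


def shingles(text: str, k: int = 3) -> list[str]:
    """Overlapping runs of k consecutive words, joined by single spaces."""
    ws = words(text)
    return [" ".join(ws[i:i + k]) for i in range(len(ws) - k + 1)]


def initials(text: str) -> list[str]:
    """First character of every word, e.g. for matching abbreviated names."""
    return [w[0] for w in words(text)]


print(words("The Higgs boson, revisited"))  # ['the', 'higgs', 'boson', 'revisited']
print(digrams("Higgs"))                      # ['hi', 'ig', 'gg', 'gs']
```

Whichever representation is chosen, the resulting token list is what a comparison function then consumes, so the parsing choice directly shapes how tolerant a field comparison is to spelling or word-order differences.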
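The vectorial measures named above can be sketched in a few lines of Python. This is a minimal illustration, assuming fields have already been split into word tokens by a simple hypothetical `tokens` helper; Dice and Jaccard are computed on token sets, and Salton's cosine on term-frequency vectors. It is not MarcXimiL's implementation, and empty inputs are scored 0.0 here by choice.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Hypothetical tokenizer: lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def dice(a, b):
    """Dice coefficient on token sets: 2|A & B| / (|A| + |B|)."""
    sa, sb = set(a), set(b)
    denom = len(sa) + len(sb)
    return 2 * len(sa & sb) / denom if denom else 0.0


def jaccard(a, b):
    """Jaccard index on token sets: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Salton's cosine between term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)  # missing keys in a Counter count as 0
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


a = tokens("Search for the Higgs boson")
b = tokens("The Higgs boson search")
print(dice(a, b), jaccard(a, b), cosine(a, b))
```

For these two titles, four of the five distinct words are shared, so Jaccard gives 0.8, Dice 8/9, and the cosine about 0.894.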
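A Levenshtein-based comparison typically turns the edit distance between two strings into a score between 0 and 1. The sketch below uses the classic two-row dynamic-programming distance and normalizes by the length of the longer string; that normalization, and the names `levenshtein` and `levenshtein_similarity`, are assumptions for illustration rather than MarcXimiL's exact formula.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def levenshtein_similarity(s: str, t: str) -> float:
    """1 - distance / length of the longer string; two empty strings count as identical."""
    if not s and not t:
        return 1.0
    return 1 - levenshtein(s, t) / max(len(s), len(t))


print(levenshtein("kitten", "sitting"))             # 3
print(levenshtein_similarity("kitten", "sitting"))  # about 0.571
```

Keeping only two rows of the distance table makes memory linear in the length of the second string, which matters little for short bibliographic fields but is the idiomatic form.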
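Combining field similarities into a record similarity, then applying thresholds, might look like the following sketch. It assumes a weighted arithmetic mean (one of several possible weighted means) and two hypothetical thresholds: `field_min`, below which any single field rejects the pair, and `record_min`, applied to the combined score. The function names, weights and threshold values are illustrative assumptions, not MarcXimiL's configuration.

```python
def combine(field_sims: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted arithmetic mean over fields that have both a score and a weight."""
    common = [f for f in weights if f in field_sims]
    total = sum(weights[f] for f in common)
    if total == 0:
        return 0.0
    return sum(weights[f] * field_sims[f] for f in common) / total


def is_match(field_sims: dict[str, float], weights: dict[str, float],
             field_min: float = 0.2, record_min: float = 0.75) -> bool:
    """Hypothetical two-level thresholding: field level first, then record level."""
    # Field level: one very dissimilar field is enough to reject the pair.
    if any(s < field_min for s in field_sims.values()):
        return False
    # Record level: the combined score must reach the record threshold.
    return combine(field_sims, weights) >= record_min


weights = {"title": 3.0, "authors": 2.0, "year": 1.0}
sims = {"title": 0.9, "authors": 0.8, "year": 1.0}
print(combine(sims, weights))   # about 0.883
print(is_match(sims, weights))  # True
```

Checking the cheap field-level threshold first lets obviously different records be rejected without computing the full weighted score.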