RMonto - Semantic Data Mining

RMonto is an ontological extension to RapidMiner that makes it possible to perform machine learning with formal ontologies and Semantic Web languages. RMonto is an easily extendable framework, currently providing support for unsupervised clustering with kernel methods and (frequent) pattern mining in knowledge bases. One important feature of RMonto is that it enables working directly on structured, relational data. Additionally, its custom algorithm implementations can be combined with the power of RapidMiner through transformation/extraction from the ontological data to attribute-value data.

For more information go to http://semantic.cs.put.poznan.pl/RMonto/.

RMonto can be downloaded from the Rapid-I Marketplace from within RapidMiner. Details are available here. Once it is installed, download one or more PutAPI plugins and put them into the lib/plugins directory of your RapidMiner installation directory.

Pattern mining and regression/ranking problems

Available operators:

  • Fr-ONT-Qu Pattern mining operator, implementing an algorithm similar to Fr-ONT. Input: knowledge base (from “Build knowledge base”) and learning examples (with or without a label, depending on the selected measure). Parameters: a list of classes, a list of abstract properties (optionally with concrete fillers as URIs) and a list of concrete properties (with fillers), as well as the number of patterns to be kept at each level (k), the search depth (level), and the quality measure (how to validate patterns; currently only support (unsupervised!) is available).
  • Propositionalisation Converts a list of patterns (typically from Fr-ONT-Qu) and examples (e.g. from SPARQL selector) into attribute-value format, where patterns are attributes and each value is either 0 (the example does not satisfy the pattern) or 1 (the example satisfies the pattern); a sketch of this 0/1 semantics follows this list. Input: patterns, examples and knowledge base. Output: example set.
  • Add label from KB Extends a URI vector (e.g. from SPARQL selector) with the value of a specified concrete role as a label. Input: unlabeled example set, knowledge base. Output: labeled and unlabeled example sets. Parameters: URI of the role.
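
To make the 0/1 values produced by Propositionalisation more concrete, here is a minimal sketch of how satisfaction of a single pattern by a single example could be phrased as a SPARQL ASK query. The namespace, individual and pattern below are hypothetical, and RMonto evaluates patterns through its reasoner rather than by issuing such queries; this is only an illustration.

    # Hypothetical sketch only: does ex:train1 have a car carrying a
    # triangle-shaped load? true corresponds to 1 in the propositionalised
    # example set, false to 0. All URIs below are made up for illustration.
    PREFIX ex: <http://example.org/trains#>
    ASK {
      ex:train1 ex:hasCar ?car .
      ?car ex:hasLoad ?load .
      ?load a ex:Triangle .
    }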

Workflow example (trains going west). A screenshot of this workflow is shown below.

Using the RMonto Extension for pattern mining

Billionaires

Using the RMonto Extension for Clustering
  1. Run RapidMiner with the ontological extension and create a new project.
  2. Add the Load file operator and set its parameters:
    • File to the DBpedia ontology, which can be downloaded from http://downloads.dbpedia.org/3.6/dbpedia_3.6.owl.bz2. Remember to decompress it after downloading; RMonto cannot load bzipped files!
    • Language (an expert parameter) to RDFXML. This is the default value, so you probably do not have to change anything.
  3. Add the Load from SPARQL endpoint operator and set its parameters:
    • Query to
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
      PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
      CONSTRUCT {
      ?x rdf:type ?z .
      ?z rdfs:subClassOf <http://dbpedia.org/class/yago/Billionaire110529684> .
      ?x rdf:type dbpedia-owl:Person .
      ?x dbpedia-owl:residence ?loc .
      ?loc geo:lat ?lat .
      ?loc geo:long ?long
      }
      WHERE {
      ?x rdf:type ?z .
      ?z rdfs:subClassOf <http://dbpedia.org/class/yago/Billionaire110529684> .
      ?x rdf:type dbpedia-owl:Person .
      ?x dbpedia-owl:residence ?loc .
      ?loc geo:lat ?lat .
      ?loc geo:long ?long
      }
  4. Add Build knowledge base and connect the outputs of Load file and Load from SPARQL endpoint to the inputs of Build knowledge base. If you want, you can change the Reasoner ID parameter (it is also an expert parameter).
  5. Add SPARQL selector and connect the Build knowledge base output to its input. Set the Variable parameter to x and Query to the following SPARQL snippet. The Variable parameter specifies which variable from the query will be used to extract information about individuals, and we add limit 100 to get results faster.
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    select ?x
    where
    { ?x rdf:type ?z .
    ?z rdfs:subClassOf <http://dbpedia.org/class/yago/Billionaire110529684> .
    ?x rdf:type dbpedia-owl:Person .
    ?x dbpedia-owl:residence ?loc .
    ?loc geo:lat ?lat .
    ?loc geo:long ?long
    }
    limit 100
  6. Insert the Common classes operator, connect its reasoner input to one of the Build knowledge base outputs and the SPARQL selector output to the examples input (the sketch after these steps illustrates the intuition behind this dissimilarity measure).
  7. Insert the Agglomerative hierarchical clustering operator and connect:
    • Examples to the output of SPARQL selector; when RapidMiner shows the Cannot connect dialog, select Insert IO multipliers as needed. from the drop-down list and press OK.
    • Dissimilarity matrix to the output of Common classes.
    • Reasoner to the output of Build knowledge base.
    • Both outputs to the RapidMiner workflow outputs.
  8. Run the workflow and analyse the results.
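
As mentioned in step 6, the intuition behind the Common classes dissimilarity can be illustrated with a plain SPARQL query listing the classes that two individuals share: the more classes they have in common, the more similar they are. The query below is only an illustration with two example DBpedia resources; RMonto computes the actual measure internally through the configured reasoner.

    # Illustration only: classes asserted for both individuals in the
    # knowledge base. The two resource URIs are examples and are not part
    # of the workflow above.
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT ?class
    WHERE {
      <http://dbpedia.org/resource/Bill_Gates> rdf:type ?class .
      <http://dbpedia.org/resource/Warren_Buffett> rdf:type ?class .
    }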

Importing workflows to RDF store

With data prepared in the following way you can run metamining experiments as easily as with the Trains toy example.

  1. Make sure that you have properly installed the RMonto and PutAPI_Sesame plugins.
  2. Add the Import WF and Build knowledge base operators.
  3. In the Import WF properties click the Import wizard button; you will see a window similar to the one presented in the screenshot below.
    [Screenshot: the Import wizard window]
  4. In the repository tree select the repository you want to import into the store. Double-click it, or select it and click the Do it button.
  5. All available workflows and files with performance vectors are discovered automatically. Other files, such as datasets or saved models, are ignored. Matching workflows to their performance vectors is based on string-similarity measures, so make sure their names are similar.
  6. Check whether everything has been matched correctly; if so, click the Ok button.
  7. Connect the output of Import WF to the input of Build knowledge base.
  8. In the Build knowledge base parameters, set
    • Reasoner ID to the Sesame Reasoner value.
    • SesameReasoner:ruleset to a ruleset supporting OWL 2 property chains, i.e. owl2-rl-conf.
    • If you are going to use a local OWLim repository, set SesameReasoner:type to local and enter the path to the directory you want to use as the repository in the address field.
    • Otherwise, set SesameReasoner:type to remote, SesameReasoner:id to the repository ID and SesameReasoner:address to the repository URL. Both values can be obtained from the Sesame Workbench or from your repository administrator.
  9. With that we are almost done. Make sure you did not miss any of the following steps:
    1. setting up parameters of Import WF;
    2. setting up parameters of Build knowledge base;
    3. connecting both operators.
  10. Now you can click the Run button in RapidMiner and wait while your workflows are imported. The time this takes may vary, as it depends on the chosen ruleset, the repository size, the store's server performance and the network performance.
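
Once the import has finished, the workflows are ordinary RDF data in the repository and can be queried like any other triples, which is the basis for the metamining experiments mentioned above. The query below is only a placeholder sketch: the class and property URIs are made up, since the actual vocabulary depends on the ontology used by Import WF.

    # Placeholder vocabulary -- replace ex:Workflow and ex:hasName with the
    # terms actually used by the ontology behind Import WF in your store.
    PREFIX ex: <http://example.org/workflows#>
    SELECT ?workflow ?name
    WHERE {
      ?workflow a ex:Workflow ;
                ex:hasName ?name .
    }
    LIMIT 10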

Some example RMonto workflows can be found in the RMonto pack at myExperiment.