INSTALL SOLR ON UBUNTU 14.04 OR 16.04 WITHOUT DATASTAX DSE

Installing Solr on its own and integrating it with Cassandra, without any DataStax tooling, is quite tricky.

Let’s assume you have installed Cassandra standalone and just want to integrate it with Solr without DataStax tools such as DSE.

INSTALL SOLR

Let’s install Solr under /opt/, keeping in mind that the data folder will later be mapped to /var/solr/data/.

$ cd /opt/
$ sudo wget http://apache.mirror1.spango.com/lucene/solr/6.6.0/solr-6.6.0.tgz
$ sudo tar xzf solr-6.6.0.tgz solr-6.6.0/bin/install_solr_service.sh --strip-components=2
$ sudo bash ./install_solr_service.sh solr-6.6.0.tgz
$ sudo service solr status
$ sudo service solr start

This is the most important part: we create a collection named “mycollection1” that will have its own config files.

$ sudo su - solr -c "/opt/solr/bin/solr create -c mycollection1 -n data_driven_schema_configs"

Go to the admin console:
http://yourip.com:8983/solr/

You should see the Solr admin screen.
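
If you would rather check from the command line than from the browser, the sketch below asks Solr's CoreAdmin API for the status of the new core. The function names solr_status_url and solr_core_status are made up for this example, and it assumes curl is installed and Solr listens on its default port 8983.

```shell
# Build the CoreAdmin STATUS URL for one core.
# Arguments: host, core name, optional port (defaults to 8983).
solr_status_url() {
  local host="$1" core="$2" port="${3:-8983}"
  printf 'http://%s:%s/solr/admin/cores?action=STATUS&core=%s&wt=json\n' \
    "$host" "$port" "$core"
}

# Fetch the status document; -f makes curl exit non-zero on HTTP errors.
solr_core_status() {
  curl -fsS "$(solr_status_url "$@")"
}
```

Calling solr_core_status yourip.com mycollection1 should print a JSON document describing the core if the create step worked.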

The command copied the data_driven_schema_configs configuration set into /var/solr/data/mycollection1/conf, the folder that holds the collection’s configuration files.

Let’s see which config files we have here:

$ ls /var/solr/data/mycollection1/conf

currency.xml dataimport.properties elevate.xml lang managed-schema params.json protwords.txt solrconfig.xml stopwords.txt synonyms.txt

We’ll only touch the managed-schema and solrconfig.xml files to integrate Cassandra with Solr. Later we’ll also create a new file called db-data-config.xml.

INTEGRATE CASSANDRA TO SOLR

Our Cassandra userplaces table was created like this, but we only need to index a few of its columns:

CREATE TABLE userplaces (
 placeid text,
 placename text,
 formattedaddress text,
 profilestatus text,
 job text,
 emp text,
 worktype text,
 ratevalue double,
 ratetype text,
 resumehidden Boolean,
 userid double,
 fullname text,
 userurl text,
 latitude double,
 longitude double,
 postdate timestamp,
 city text,
 state text,
 country text,
 PRIMARY KEY(placeid, postdate)
 ) WITH CLUSTERING ORDER BY (postdate DESC) and ID= '3a9c314e-b91f-1be5-7f77-de0ac0486b18';

First we need to download the latest versions of the following libraries

cassandra-all-3.11.0.jar
libthrift-0.10.0.jar
cassandra-thrift-3.11.0.jar
mysql-connector-java-6.0.6.jar
cassandra-jdbc-1.2.5.jar
cassandra-jdbc-driver-0.6.4-shaded.jar
solr-dataimporthandler-extras-6.6.0.jar
solr-dataimporthandler-6.6.0.jar

Go to the Solr dist folder and download the jar files:

$ cd /opt/solr/dist/
$ sudo wget https://repo1.maven.org/maven2/org/apache/cassandra/cassandra-all/3.11.0/cassandra-all-3.11.0.jar
$ sudo wget https://repo1.maven.org/maven2/org/apache/thrift/libthrift/0.10.0/libthrift-0.10.0.jar
$ sudo wget https://repo1.maven.org/maven2/org/apache/cassandra/cassandra-thrift/3.11.0/cassandra-thrift-3.11.0.jar
$ sudo wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/6.0.6/mysql-connector-java-6.0.6.jar
$ sudo wget https://repo1.maven.org/maven2/org/apache-extras/cassandra-jdbc/cassandra-jdbc/1.2.5/cassandra-jdbc-1.2.5.jar
$ sudo wget https://github.com/zhicwu/cassandra-jdbc-driver/releases/download/0.6.4/cassandra-jdbc-driver-0.6.4-shaded.jar

However, the cassandra-jdbc-1.2.5 library might not work and may throw errors.
That’s why we also downloaded another library, cassandra-jdbc-driver-0.6.4-shaded.jar, from https://github.com/zhicwu/cassandra-jdbc-driver

Later we’ll use com.github.cassandra.jdbc.CassandraDriver as the JDBC driver class name.

The solr-dataimporthandler-extras-6.6.0.jar and solr-dataimporthandler-6.6.0.jar files are already in the dist folder.
In older Solr versions you had to download them as well.
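
Before touching the config files, it can help to confirm that every jar really landed in the dist folder. Here is a small sketch; missing_jars is a hypothetical helper name, not something Solr ships, and the file names in the usage comment are simply the ones listed above.

```shell
# Print every expected jar that is NOT present in the given directory.
# Usage: missing_jars DIR JAR [JAR...]
missing_jars() {
  local dir="$1" jar
  shift
  for jar in "$@"; do
    # Only report names that do not exist as regular files.
    [ -f "$dir/$jar" ] || printf '%s\n' "$jar"
  done
}

# Example call (prints nothing when all files are in place):
#   missing_jars /opt/solr/dist \
#     cassandra-all-3.11.0.jar libthrift-0.10.0.jar \
#     cassandra-thrift-3.11.0.jar mysql-connector-java-6.0.6.jar \
#     cassandra-jdbc-1.2.5.jar cassandra-jdbc-driver-0.6.4-shaded.jar \
#     solr-dataimporthandler-6.6.0.jar solr-dataimporthandler-extras-6.6.0.jar
```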

So the libraries are in place, but we still need to reference them properly in the config files.

Remember that we created the config files for mycollection1 earlier.

First become root, then go to the config folder:

$ sudo su
$ cd /var/solr/data/mycollection1/conf

solrconfig.xml
Let’s point solrconfig.xml at the libraries:
$ sudo vi solrconfig.xml

<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-extras-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="mysql-connector-java-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="cassandra-all-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="cassandra-jdbc-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="cassandra-thrift-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="libthrift-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="cassandra-jdbc-driver-\d.*-shaded\.jar" />

Edit solrconfig.xml again to add a new request handler:

$ sudo vi solrconfig.xml

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
        <lst name="defaults">
           <str name="config">db-data-config.xml</str>
        </lst>
</requestHandler>

db-data-config.xml
Let’s create the db-data-config.xml file from scratch with the following content:

$ sudo vi db-data-config.xml

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.github.cassandra.jdbc.CassandraDriver" url="jdbc:c*://yourip:9042/oyvent"
 user="yourusername" password="yourpassword" autoCommit="true"/>
<document name="content">
     <entity name="userplaces" query="select placeid, profilestatus, postdate, fullname from userplaces" autoCommit="true"> 
            <field column="postdate" name="postdate" />
            <field column="placeid" name="placeid" />
            <field column="fullname" name="fullname" />
            <field column="profilestatus" name="profilestatus" />
    </entity>
</document>
</dataConfig>

Don’t forget to make it writable:

$ sudo chmod 777 db-data-config.xml

Now add the following fields to the managed-schema file:

$ sudo vi managed-schema

<field name="placeid" type="string" indexed="true" stored="true" required="true" />
<field name="postdate" type="date"  indexed="true" stored="true" required="true" />
<field name="fullname" type="text_general" indexed="true" stored="true" required="true" />
<field name="profilestatus" type="text_general" indexed="true" stored="true" required="true" />

Restart Solr:

$ sudo service solr restart

string vs text_general

A string field stores the whole value as one exact, untokenized term. Let’s say the fullname “Mehmet Sen” is a string. In this case a query has to match the entire value, including case: “Mehmet Sen” matches, while ‘sen’ or ‘Sen’ on its own does not unless you add wildcards (for example *Sen*). The string type is mostly useful for exact matches.

[fullname] => Mehmet Sen

text_general, however, tokenizes the value and applies analysis filters such as lower-casing. You can search for ‘sen’ in this case.

[fullname] => Array
 (
     [0] => Mehmet Sen
 )

See here for a detailed explanation: https://stackoverflow.com/questions/7175619/apache-solr-string-or-text
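
To see the difference yourself once some documents are indexed, you can send the queries straight to the select handler. This is only a sketch: solr_query is a name invented for this example, and it assumes curl is installed and Solr listens on port 8983.

```shell
# Send a query to a core's /select handler and print the JSON response.
# Arguments: host, core name, query string.
solr_query() {
  local host="$1" core="$2" q="$3"
  # -G turns the --data-urlencode pairs into URL query parameters,
  # so spaces and quotes in the query are encoded for us.
  curl -fsS -G "http://$host:8983/solr/$core/select" \
    --data-urlencode "q=$q" --data-urlencode "wt=json"
}

# Should match when fullname is text_general (tokenized, lower-cased):
#   solr_query yourip mycollection1 'fullname:sen'
# With a string field only the exact value matches:
#   solr_query yourip mycollection1 'fullname:"Mehmet Sen"'
```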

HOW TO USE SOLR

Go back to http://yourip:8983/solr
mycollection1->DataImport->Entity->userplaces
Click Execute

Or go to Query->Execute Query

Check the logs; you might see a Solr exception like this:

Document is missing mandatory uniqueKey field: id

We decided to fill the required id field with a random UUID so that every document has an id.

We follow the same steps as the solution here:
https://stackoverflow.com/questions/41143842/solr-uuid-with-error-document-is-missing-mandatory-uniquekey-field-id

Make the following changes in the solrconfig.xml file:

$ vi solrconfig.xml

......
<updateRequestProcessorChain name="uuid">
<!-- UUIDUpdateProcessorFactory will generate an id if none is present in the incoming document -->
<processor class="solr.UUIDUpdateProcessorFactory" >
<str name="fieldName">id</str>
</processor>
.......
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">db-data-config.xml</str>
<str name="update.chain">uuid</str>
</lst>
</requestHandler>

Finally, restart Solr:

$ sudo service solr restart

To run the import handler every 5 minutes:

$ crontab -e
*/5 * * * * curl "http://yourip:8983/solr/mycollection1/dataimport?command=full-import"
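
A full import that takes longer than five minutes would overlap with the next cron run. One way to guard against that is to ask the DataImportHandler for its status first and only start an import when it reports idle. The sketch below assumes the JSON status response contains a "status":"idle" or "status":"busy" entry; the helper names dih_url, dih_state and run_import_if_idle are made up for this example.

```shell
# Build a DataImportHandler URL for the given command (status, full-import, ...).
dih_url() {
  printf 'http://%s:8983/solr/%s/dataimport?command=%s&wt=json\n' "$1" "$2" "$3"
}

# Read a DIH JSON response on stdin and print the handler state (idle/busy).
# The numeric "status":0 in the response header is skipped because it is unquoted.
dih_state() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Start a full import only when the handler is idle.
# Arguments: host, core name.
run_import_if_idle() {
  local host="$1" core="$2" state
  state="$(curl -fsS "$(dih_url "$host" "$core" status)" | dih_state)"
  if [ "$state" = "idle" ]; then
    curl -fsS "$(dih_url "$host" "$core" full-import)"
  else
    echo "DIH state is '${state:-unknown}', skipping import" >&2
    return 1
  fi
}
```

Saved as a script that ends with run_import_if_idle yourip mycollection1, this could replace the plain curl call in the crontab entry.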

PHP CLIENT FOR SOLR – SOLARIUM

Home page: http://www.solarium-project.org/
Tutorial: http://solarium.readthedocs.io/en/stable/

Assuming you have set up your Solr server and now need to query it through a PHP client:

For macOS Sierra (10.12.6)
Go to the project folder. It’s a standard plain PHP project.

$ cd  /Users/mehmetsen/PhpstormProjects/xtalkweb

First make sure you have Composer installed:

$ curl -sS https://getcomposer.org/installer | php

All settings correct for using Composer
 Downloading...

Composer (version 1.4.2) successfully installed to: /Users/mehmetsen/PhpstormProjects/xtalkweb/composer.phar
 Use it: php composer.phar

Finally, install Solarium:

$ sudo php composer.phar require solarium/solarium

Using version ^3.8 for solarium/solarium
 ./composer.json has been updated
 Loading composer repositories with package information
 Updating dependencies (including require-dev)
 Package operations: 2 installs, 0 updates, 0 removals
 - Installing symfony/event-dispatcher (v3.3.6): Downloading (100%)
 - Installing solarium/solarium (3.8.1): Downloading (100%)
 symfony/event-dispatcher suggests installing symfony/dependency-injection ()
 symfony/event-dispatcher suggests installing symfony/http-kernel ()
 solarium/solarium suggests installing minimalcode/search (Query builder compatible with Solarium, allows simplified solr-query handling)

Since it suggests installing minimalcode/search, install that as well:

$ sudo php composer.phar require minimalcode/search

You can check your composer.json file:

$ vi composer.json

For Ubuntu 14.04 or 16.04

Go to the project folder. It’s a standard plain PHP project.

$ cd /usr/share/nginx/html/xtalk/

Install Solarium and the suggested minimalcode/search:

$ sudo composer require solarium/solarium
$ sudo composer require minimalcode/search

Here is the unit test code that shows how we use Solarium:

<?php
/**
 * Created by PhpStorm.
 * User: mehmetsen
 * Date: 8/2/17
 * Time: 12:54 PM
 */

namespace xtalk\test;

use Solarium\Exception\InvalidArgumentException;

require_once(dirname(dirname(__FILE__)) . "/vendor/autoload.php");

define('EOL',(PHP_SAPI == 'cli') ? PHP_EOL : '<br />');

class TestSolarium extends \PHPUnit_Framework_TestCase
{
    public $config = null;

    protected function setUp()
    {
        // create the config
        $this->config = array(
            'endpoint' => array(
                'localhost' => array(
                    'host' => 'yourip.com',
                    'port' => 8983,
                    'path' => '/solr/mycollection1/',
                )
            )
        );
    }

    public function ignore_testVersion()
    {
        // check solarium version available
        echo 'Solarium library version: ' . \Solarium\Client::VERSION . ' - ';
    }

    public function ignore_testPing()
    {
        $client = new \Solarium\Client($this->config);

        $ping = $client->createPing();

        // execute the ping query; a failed ping surfaces as an HttpException
        try {
            $result = $client->ping($ping);
            echo 'Ping query successful';
            echo '<br/><pre>';
            var_dump($result->getData());
            echo '</pre>';
        } catch (\Solarium\Exception\HttpException $e) {
            echo 'Ping query failed';
        }
    }

    public function ignore_testBasicSelectQuery()
    {
        // create a client instance
        $client = new \Solarium\Client($this->config);

        // get a select query instance
        $query = $client->createQuery($client::QUERY_SELECT);
        // or you can use the following to create a select query
        // $query = $client->createSelect();

        // this executes the query and returns the result
        $resultset = $client->execute($query);

        // print the resultset
        $this->printResultSet($resultset);
    }

    public function ignore_testSortQuery()
    {
        // create a client instance
        $client = new \Solarium\Client($this->config);

        // get a select query instance
        $query = $client->createSelect();

        // sort the results by postdate, newest first
        $query->addSort('postdate', $query::SORT_DESC);

        // this executes the query and returns the result
        $resultset = $client->execute($query);

        // print the resultset
        $this->printResultSet($resultset);
    }

    public function testFilterFullnameQuery()
    {
        $client = new \Solarium\Client($this->config);

        // get a select query instance
        $query = $client->createSelect();

        // sort the results by postdate, newest first
        $query->addSort('postdate', $query::SORT_DESC);

        // create a filterquery using the API; matches e.g. "Peerit Admin"
        $fq = $query->createFilterQuery('fullname')->setQuery('fullname: *Admin*');
        // add filterquery
        $query->addFilterQuery($fq);

        // create another filterquery; matches e.g. "Himmmm bla" and "Bla"
        $fq = $query->createFilterQuery('profilestatus')->setQuery('profilestatus: *bla*');
        // add filterquery
        $query->addFilterQuery($fq);

        // this executes the query and returns the result
        $resultset = $client->execute($query);

        // print the resultset
        $this->printResultSet($resultset);

        /********** The Output is *********
         profilestatus: Himmmm bla
         Bla
         placeid: ChIJC0Qj2L3nQIYRRPJpOAxqjIY
         postdate: 2017-08-02T19:44:06.137Z
         fullname: Peerit Admin
         id: f6e2c7f4-d644-475f-bb96-ca7e198f5d03
         _version_: 1574649836711444480
         score: 1
         **/
    }

    // General purpose print resultset
    public function printResultSet($resultset)
    {
        // display the total number of documents found by solr
        echo 'NumFound: ' . $resultset->getNumFound(), EOL, EOL;

        // show documents using the resultset iterator
        foreach ($resultset as $result) {
            // the documents are also iterable, to get all fields
            foreach ($result as $field => $value) {
                // this converts multivalue fields to a comma-separated string
                if (is_array($value)) {
                    $value = implode(', ', $value);
                }

                echo $field . ': ' . $value . ' ', EOL;
            }

            echo EOL, EOL;
        }
    }
}
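
When a Solarium query does not return what you expect, it can be handy to send roughly the same request to Solr by hand. The sketch below mirrors testFilterFullnameQuery with curl: a match-all query, a sort, and one fq parameter per filter. The function name solr_filtered_select is invented for this example, and the match-all q=*:* is an assumption about what the select query sends when no query string is set.

```shell
# Run a match-all select with a sort order and any number of filter queries.
# Arguments: host, core name, sort expression, then one or more filters.
solr_filtered_select() {
  local host="$1" core="$2" sort="$3" fq
  shift 3
  # Turn every remaining argument into a "--data-urlencode fq=..." pair.
  # Each pass appends the pair and drops the original argument.
  for fq in "$@"; do
    set -- "$@" --data-urlencode "fq=$fq"
    shift
  done
  curl -fsS -G "http://$host:8983/solr/$core/select" \
    --data-urlencode "q=*:*" --data-urlencode "sort=$sort" "$@"
}

# Roughly the request behind testFilterFullnameQuery:
#   solr_filtered_select yourip.com mycollection1 'postdate desc' \
#     'fullname:*Admin*' 'profilestatus:*bla*'
```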

 

 
