I have a client table in Oracle with more than 3 million rows. One of the columns is the client name, on which our users want full-text search support.
Originally, the Oracle Text (CONTEXT) search feature was chosen by a group of consultants. Maybe we weren't maintaining the domain index correctly, but its performance was terrible.
I spent most of my time troubleshooting this issue and found out that we need to run "optimize" on the index regularly to make it perform well. Anyway, I got tired of hearing users' complaints every day, so I got rid of Oracle full-text search and refactored the search service to use Apache Lucene.
Here is how it's done:
Connect to the client table and fetch the rows that need to be indexed. Of course I did this in batches of 20K rows each.
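The batch boundaries can be computed up front from the row count and the step size. A minimal sketch in plain Java (the 20K step mirrors the script, the method name is mine):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchRanges {
    // Build cut points 0, step, 2*step, ..., maxCount so each adjacent
    // (low, hi] pair can be fed to a rownum-paginated query.
    static List<Long> cutPoints(long maxCount, long step) {
        List<Long> points = new ArrayList<>();
        for (long p = 0; p <= maxCount; p += step) {
            points.add(p);
        }
        // Make sure the last partial batch is covered.
        if (points.get(points.size() - 1) < maxCount) {
            points.add(maxCount);
        }
        return points;
    }

    public static void main(String[] args) {
        // 50,000 rows in batches of 20,000 -> 0, 20000, 40000, 50000
        System.out.println(cutPoints(50_000, 20_000));
    }
}
```

Each consecutive pair of cut points becomes one batch, so the last batch may be smaller than the step.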
You can gain significant performance in this data loading by using a database connection pool and pulling data in parallel.
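The parallel-pull pattern boils down to submitting one Callable per batch to a small thread pool and draining results in completion order. Here is a Java sketch with a stubbed fetch in place of the real paginated query (names and fake rows are illustrative):

```java
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelFetch {
    // Stand-in for a real batch query; returns fake "rows" for a range.
    static List<String> fetchBatch(int low, int hi) {
        return List.of("rows " + low + ".." + hi);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        CompletionService<List<String>> ecs = new ExecutorCompletionService<>(pool);

        int[][] batches = {{0, 20000}, {20000, 40000}, {40000, 50000}};
        for (int[] b : batches) {
            final int low = b[0], hi = b[1];
            ecs.submit(() -> fetchBatch(low, hi)); // runs on a pool thread
        }
        // take() hands back futures in completion order, so consuming
        // can start on whichever batch finishes first.
        for (int i = 0; i < batches.length; i++) {
            System.out.println(ecs.take().get());
        }
        pool.shutdown();
    }
}
```

The consumer side stays single-threaded, which matters here because an IndexWriter should be fed from one place.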
Use Lucene to index the client name, and store the client IDs for DB retrieval later on.
```groovy
import org.apache.lucene.index.IndexWriter
import org.apache.lucene.document.Document
import org.apache.lucene.document.Field
import org.apache.lucene.analysis.standard.StandardAnalyzer
import groovy.sql.Sql
import com.mchange.v2.c3p0.ComboPooledDataSource
import java.util.concurrent.*

// c3p0 connection pool against the Oracle database
ComboPooledDataSource ds = new ComboPooledDataSource()
ds.driverClass = 'oracle.jdbc.driver.OracleDriver'
ds.jdbcUrl = ''
ds.user = ''
ds.password = ''
ds.minPoolSize = 2
ds.maxPoolSize = 4
ds.acquireIncrement = 1

def stepCount = 20000
def pool = Executors.newFixedThreadPool(3)
def ecs = new ExecutorCompletionService(pool)
def submit = { c -> ecs.submit(c as Callable) }

// rownum pagination: the inner query caps rows at the upper bound,
// the outer query trims rows at or below the lower bound.
def sql = """
    select id, name
      from (select /*+ FIRST_ROWS(${stepCount}) */ id, name, rownum rnum
              from atlas_client
             where id > 0
               and rownum <= ?
             order by id asc)
     where rnum > ?
"""

def maxCount = new Sql(ds).firstRow('select count(1) as maxCount from atlas_client where id > 0').maxCount
println "Max count: ${maxCount}"

def range = (0..maxCount).step(stepCount)
if (range[-1] < maxCount) range << maxCount

def indexDir = new File('atlas_client_index')
IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true)
writer.setUseCompoundFile(false)

def query = { low, hi ->
    println "Querying from ${low} to ${hi}"
    def rows = []
    new Sql(ds).eachRow(sql, [hi, low]) { row -> rows << row.toRowResult() }
    println "Finishing query. Return ${rows.size()} rows from ${low} to ${hi}"
    return rows
}

def createIndex = { array ->
    array.each {
        Document doc = new Document()
        // id is stored for DB retrieval later but not searchable;
        // name is analyzed so it can be full-text searched
        doc.add(new Field('id', it.id.toString(), Field.Store.YES, Field.Index.NO))
        doc.add(new Field('name', it.name, Field.Store.YES, Field.Index.ANALYZED))
        writer.addDocument(doc)
    }
}

// submit all batch queries to the pool...
for (int i = 0; i < range.size() - 1; i++) {
    println "Submitting query for clients record from ${range[i]} to ${range[i + 1]}"
    def low = range[i]
    def hi = range[i + 1]
    submit { query(low, hi) }
}

// ...then index each batch as soon as its query completes
for (int i = 0; i < range.size() - 1; i++) {
    def results = ecs.take().get()
    println "Results (${results.size()}) ready for clients record query, indexing..."
    createIndex(results)
    println "Indexing done."
    writer.commit()
}

writer.close()
pool.shutdown()
```
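On the search side, the stored `id` fields come back from Lucene and still have to be turned into a database lookup. One way to do that (the helper and table column names here are my own sketch) is to chunk the hit IDs into IN-list queries, since Oracle rejects IN lists longer than 1,000 items:

```java
import java.util.ArrayList;
import java.util.List;

public class IdLookup {
    // Split Lucene hit IDs into chunks that fit Oracle's 1000-item IN
    // limit and render one "select ... in (...)" statement per chunk.
    static List<String> inClauses(List<Long> ids, int chunkSize) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += chunkSize) {
            List<Long> chunk = ids.subList(i, Math.min(i + chunkSize, ids.size()));
            StringBuilder sb = new StringBuilder("select id, name from atlas_client where id in (");
            for (int j = 0; j < chunk.size(); j++) {
                if (j > 0) sb.append(", ");
                sb.append(chunk.get(j));
            }
            stmts.add(sb.append(")").toString());
        }
        return stmts;
    }

    public static void main(String[] args) {
        System.out.println(inClauses(List.of(1L, 2L, 3L), 2));
    }
}
```

In practice a search rarely needs more than the first page of hits, so a single small IN list usually suffices; the chunking only matters for bulk exports.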