Login | Register   
LinkedIn
Google+
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


advertisement
 

Create a Robust Document Management System with Oracle Text : Page 3

Oracle Text has a powerful text-search engine that allows users to search and manage documents and URLs as quickly and easily as any other data. Learn to use these features to deliver a robust content management search engine.


advertisement
Step 2: Create Table and Import Data
As user CMDEMO, create the main table to house your content:

CREATE TABLE CMDEMO.CONTENT_INVENTORY ( CONTENT_INVENTORY_ID NUMBER NOT NULL, FILE_NAME VARCHAR2(1024), KEYWORDS VARCHAR2(2048), URL SYS.HTTPURITYPE, TEXT BLOB DEFAULT empty_blob(), TEXT_LOADED CHAR(1) DEFAULT 'N' NOT NULL, CONSTRAINT UNQ_CONTENT_INVENTORY UNIQUE(CONTENT_INVENTORY_ID) USING INDEX TABLESPACE USERS );

Several fields within this routine are worth noting and understanding:

  • Notice that you set your BLOB field to default to empty_blob(). This avoids uninitialized blob errors.
  • The FILE_NAME field is the name of your binary documents (e.g., Word docs, PDFs, XLS, etc.).
  • The KEYWORDS field, as mentioned earlier, gives you the ability to add words that are synonymous with the document but are not part of the text within the document (an example of using this comes later).
  • The URL field allows you to harvest text from internal and external Web sites.
  • The TEXT field is where the binary documents are loaded into the database via your PL/SQL routine, which I will cover in just a moment.
  • The TEXT_LOADED field is optional in this example, but it is useful in determining whether the document has been loaded into the database yet.
Load several rows into your CONTENT_MANAGEMENT table. This may seem redundant, but you are loading different file types in each row for demonstration purposes:


--PDF Document INSERT INTO CMDEMO.CONTENT_INVENTORY (CONTENT_INVENTORY_ID, FILE_NAME, KEYWORDS) VALUES ('1', 'install.pdf', 'encompassing'); --Word Document INSERT INTO CMDEMO.CONTENT_INVENTORY (CONTENT_INVENTORY_ID, FILE_NAME, KEYWORDS) VALUES ('2', 'Readme.doc', ''); --Excel Spreadsheet INSERT INTO CMDEMO.CONTENT_INVENTORY (CONTENT_INVENTORY_ID, FILE_NAME, KEYWORDS) VALUES ('3', 'OO4O_ORA.XLS', ''); --Powerpoint Presentation INSERT INTO CMDEMO.CONTENT_INVENTORY (CONTENT_INVENTORY_ID, FILE_NAME, KEYWORDS) VALUES ('4', ' SalesPresentation.ppt ', ''); --Internal website (Intranet) INSERT INTO CMDEMO.CONTENT_INVENTORY (CONTENT_INVENTORY_ID, URL, KEYWORDS) VALUES ('5', sys.httpuritype.createuri('http://www.your-intranet.com'), ''); --External website INSERT INTO CMDEMO.CONTENT_INVENTORY (CONTENT_INVENTORY_ID, URL, KEYWORDS) VALUES ('6', sys.httpuritype.createuri('http://otn.oracle.com/index.html'), ''); COMMIT;

Notice you gave an additional keyword in the KEYWORDS column on the first record. You will use this as additional metadata searching later in the example.

Now that you have some rows of data, you must load the files specified in FILE_NAME into the TEXT column of your CONTENT_INVENTORY table. You can load this data in a variety of methods, including:

  • DBMS_LOB.LOADFROMFILE – the method you use in this article
  • SQL INSERT statements
  • OCI – Oracle Call Interface
  • Ctxload executable
  • SQL*Loader

The following routine creates a cursor of your data in the CONTENT_MANAGEMENT table and loads the documents into the BLOB field (only for the records that have a FILE_NAME; you'll see how to index records that have URLs defined separately in the next step):

CREATE OR REPLACE PROCEDURE loadLOBFromBFILE IS Dest_loc BLOB; Src_loc BFILE := BFILENAME('DIR_TESTCASE', ''); Amount INTEGER := 0; cursor ContentInv_cursor is --create a recordset of the file names --that we need to load from the filesystem into the --Oracle database. SELECT CONTENT_INVENTORY_ID, FILE_NAME FROM CMDEMO.CONTENT_INVENTORY WHERE FILE_NAME IS NOT NULL; varContentInv_id CONTENT_INVENTORY.CONTENT_INVENTORY_ID%TYPE; varFileName CONTENT_INVENTORY.FILE_NAME%TYPE; BEGIN open ContentInv_cursor; loop varFileName := NULL; varContentInv_id := NULL; fetch ContentInv_cursor into varContentInv_id, varFileName; exit when ContentInv_cursor%notfound; Src_loc := BFILENAME('DIR_TESTCASE', varFileName); IF DBMS_LOB.FILEEXISTS (Src_loc) = 1 THEN SELECT text INTO Dest_loc FROM CONTENT_INVENTORY WHERE CONTENT_INVENTORY_ID = varContentInv_id FOR UPDATE; /* Opening the LOB is mandatory */ DBMS_LOB.OPEN(Src_loc, DBMS_LOB.LOB_READONLY); DBMS_LOB.LOADFROMFILE(Dest_loc, Src_loc, dbms_lob.getlength(Src_loc)); /* Closing the LOB is mandatory */ DBMS_LOB.CLOSE(Src_loc); /* Optional update statement */ UPDATE CMDEMO.CONTENT_INVENTORY SET TEXT_LOADED = 'Y' WHERE CONTENT_INVENTORY_ID = varContentInv_id; COMMIT; END IF; end loop; EXCEPTION WHEN OTHERS THEN RAISE_APPLICATION_ERROR (-20001, 'Error occurred while loading LOBs.'); close ContentInv_cursor; END; / That created the procedure. Now, execute it and load the documents into the database: SQL> set serveroutput on; SQL> EXECUTE loadLOBFromBFILE; PL/SQL procedure successfully completed.



Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap
Thanks for your registration, follow us on our social networks to keep up-to-date