Saturday, February 07, 2015

ORA-06512 and ORA-29283 in RAC


Came across an interesting scenario in UTL_FILE operation, here it goes 

Following error happens 

ERROR at line 1:
ORA-29283: invalid file operation
ORA-06512: at "SYS.UTL_FILE", line 536
ORA-29283: invalid file operation
ORA-06512: at line 5

only when
  1. You have created a directory to which the file operation should happen,  under a different user, 
  2. Permission is granted to oracle aka rdbms owner,  via addition of group,  which has write permission to the above directory  
  3. DB is on Grid Infrastructure
  4. Database instance and listener is managed via srvctl/ and started via srvctl. 
  5. Cluster has not been bounced since #2 
  6. Connection is made from a remote client.

--- > Creating the directory 

[root@myexaadm011 ~]# dcli -g dbs_group -l root mkdir -p /pncsni/apps
[root@myexaadm011 ~]# dcli -g dbs_group -l root chown apps:oinstall /pncsni/apps
[root@myexaadm011 ~]# dcli -g dbs_group -l root chmod -R 770 /pncsni/apps
[root@myexaadm011 ~]# dcli -g dbs_group -l root usermod -g dborgrid -G oinstall,dba,dborgrid  orpncsni
[root@myexaadm011 ~]# dcli -g dbs_group -l root id orpncsni
myexaadm011uid=3982(orpncsni) gid=11001(dborgrid) groups=11001(dborgrid),10005(oinstall),8500(dba)
myexaadm012: uid=3982(orpncsni) gid=11001(dborgrid) groups=11001(dborgrid),10005(oinstall),8500(dba)
myexaadm013: uid=3982(orpncsni) gid=11001(dborgrid) groups=11001(dborgrid),10005(oinstall),8500(dba)
myexaadm014: uid=3982(orpncsni) gid=11001(dborgrid) groups=11001(dborgrid),10005(oinstall),8500(dba)

[root@myexaadm011 ~]# su - orpncsni

--- > Verifying  we can write to this Directory 

[orpncsni@myexaadm011 ]$  cd /pncsni/apps
[orpncsni@myexaadm011 ]$  touch a.txt; cat a.txt 
[orpncsni@myexaadm011 ]$  

--- > Bouncing  Database and Listener using srvctl 

[orpncsni@myexaadm011 ]$  srvctl stop database -d PNCSNI
[orpncsni@myexaadm011 ]$  srvctl stop listener -l PNCSNI
[orpncsni@myexaadm011 ]$  srvctl start listener -l PNCSNI
[orpncsni@myexaadm011 ]$  srvctl start database -d PNCSNI


--- >  Creating the directory object in the Database 

SQL> show con_name

CON_NAME
------------------------------
PDBCST001

SQL> create directory TESTLIST as  '/pncsni/apps'; 
SQL> grant read,write on directory TESTLIST to public; 



--> Connected locally from DB Server itself 



SQL> DECLARE
  f1 utl_file.file_type;
  buf_out VARCHAR2(20000) := 'abcdefg';
BEGIN
  f1 := utl_file.fopen('TESTLIST','navwriting1.txt','w',32767);
  utl_file.put_line (f1, buf_out);
  utl_file.fclose(f1);
END;
/  SQL>   2    3    4    5    6    7    8    9

PL/SQL procedure successfully completed.

SQL>


--> Usually DBAs are done here and they leave it for Developers / Application Admins / Users 

--> Connected from remote client (Developers or Users) 

SQL> DECLARE
  f1 utl_file.file_type;
  buf_out VARCHAR2(20000) := 'abcdefg';
BEGIN
  f1 := utl_file.fopen('TESTLIST','navwriting1.txt','w',32767);
  utl_file.put_line (f1, buf_out);
  utl_file.fclose(f1);
END;
/   2    3    4    5    6    7    8    9
DECLARE
*
ERROR at line 1:
ORA-29283: invalid file operation
ORA-06512: at "SYS.UTL_FILE", line 536
ORA-29283: invalid file operation
ORA-06512: at line 5

-- > now its the time for endless email chains, and tickets :) 

Usual things that DBA checks here - 
  1. Physical path / directory exists on all DB nodes , 
  2. oracle aka rdbms owner can write to it on all nodes in RAC 
  3. Database and Listeners are bounced many a times 
  4. Do the same execution from different DB Tier nodes multiple times  - All Success !!  but still clients cannot use the code  
Its been observed while doing all this troubleshooting  that it works fine when the listeners are started manually by lsnrctl on each node. 

Lets look the listener process

-- > When started via srvctl 
[orpncsni@myexaadm011 ]$ ps -ef |grep tns|grep PNCSNI
orpncsni  43740      1  0 18:32 ?        00:00:00 /pncsni/oracle/product/12102/bin/tnslsnr PNCSNI -no_crs_notify -inherit
[orpncsni@myexaadm011 ]$ grep Groups /proc/43740/status
Groups: 8500 11001

--> When started manually by lsnrctl 

[orpncsni@myexaadm011 ]$ ps -ef |grep tns|grep PNCSNI
orpncsni 111991      1  0 19:18 ?        00:00:00 /pncsni/oracle/product/12102/bin/tnslsnr PNCSNI -inherit

[orpncsni@myexaadm011 ]$ grep Groups /proc/111991/status
Groups: 8500 10005 11001

Here you can see that when listener is started via srvctl, the process does not have the group attributes which was recently added.  Hence the server processes it spawns also wont have the same and it cannot write to the directory
If the listener is started using SRVCTL then it does not inherit groupid 10005 (oinstall)  which has the write permission to the directory . If the listener is started using LSNRCTL then it does , and everything works.

The reasoning for thi is that the CRSD processes caches all the user information about all its resource owners at startup and is not refreshed in the middle.

This issue can be fixed via a quick CRSD bounce

[orgrid@myexaadm011 ~]$ crsctl stop resource ora.crsd -init
[orgrid@myexaadm011 ~]$ crsctl start resource ora.crsd -init

This can be done rolling and very quickly , does not affect other resources running on the server.  This bounce repopulates the user information about all  its resource owners.


No comments: