https://jhdba.wordpress.com/2009/05/19/purging-statistics-from-the-sysaux-tablespace/
This post shows how to reduce the space taken up in the SYSAUX tablespace when many more AWR snapshots are being retained than you think.
First, let's take an example database, where we can see that we are using 92GB of space:
set linesize 120
set pagesize 100
col ash form a30
col retention form a30
col snap form a30
COLUMN "Item" FORMAT A25
COLUMN "Space Used (GB)" FORMAT 999.99
COLUMN "Schema" FORMAT A25
COLUMN "Move Procedure" FORMAT A40
SELECT occupant_name "Item",
       space_usage_kbytes/1048576 "Space Used (GB)",
       schema_name "Schema",
       move_procedure "Move Procedure"
FROM v$sysaux_occupants
WHERE occupant_name = 'SM/AWR'
ORDER BY 1
/
Item Space Used (GB) Schema Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR 91.88 SYS
Now let's look at how long we should be retaining AWR snapshots for, and what we actually have:
select sysdate - a.sample_time ash,
       sysdate - s.begin_interval_time snap,
       c.retention
from   sys.wrm$_wr_control c,
       (select db.dbid, min(w.sample_time) sample_time
        from   sys.v_$database db, sys.wrh$_active_session_history w
        where  w.dbid = db.dbid
        group  by db.dbid) a,
       (select db.dbid, min(r.begin_interval_time) begin_interval_time
        from   sys.v_$database db, sys.wrm$_snapshot r
        where  r.dbid = db.dbid
        group  by db.dbid) s
where  a.dbid = s.dbid
and    c.dbid = a.dbid;
ASH                            SNAP                           RETENTION
------------------------------ ------------------------------ ------------------------------
+000000875 22:22:41.045        +000000030 12:22:28.323        +00030 00:00:00.0
We want to keep 30 days' worth of snapshots, and we have set the retention period to that, but we have 875 days' worth of active session history.
I rather smugly said let's show an example, and then I produce a schema with nearly 3 years' worth of snapshots being maintained, but there is no trickery involved. Looking around at other systems I can find a few similar examples, and it is always down to the same issue. Let's look further.
select table_name, count(*)
from   dba_tab_partitions
where  table_name like 'WRH$%'
and    table_owner = 'SYS'
group  by table_name
order  by 1;
TABLE_NAME                       COUNT(*)
------------------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY             2
WRH$_DB_CACHE_ADVICE                    2
WRH$_DLM_MISC                           2
WRH$_EVENT_HISTOGRAM                    2
WRH$_FILESTATXS                         2
WRH$_INST_CACHE_TRANSFER                2
WRH$_INTERCONNECT_PINGS                 2
WRH$_LATCH                              2
WRH$_LATCH_CHILDREN                     2
WRH$_LATCH_MISSES_SUMMARY               2
WRH$_LATCH_PARENT                       2
WRH$_OSSTAT                             2
WRH$_PARAMETER                          2
WRH$_ROWCACHE_SUMMARY                   2
WRH$_SEG_STAT                           2
WRH$_SERVICE_STAT                       2
WRH$_SERVICE_WAIT_CLASS                 2
WRH$_SGASTAT                            2
WRH$_SQLSTAT                            2
WRH$_SYSSTAT                            2
WRH$_SYSTEM_EVENT                       2
WRH$_SYS_TIME_MODEL                     2
WRH$_TABLESPACE_STAT                    2
WRH$_WAITSTAT                           2

24 rows selected.
There is the problem: all the WRH$ data is held in 2 partitions, and the overnight housekeeping job does not have time to finish its tidy-up before its 15 minutes of fame is over again for another 24 hours.
Two components of Server Manageability (SM) that reside in the SYSAUX tablespace cause the problem: the Automatic Workload Repository (AWR) and Optimizer Statistics History (OPTSTAT). Both of these components have retention periods associated with their data, and the MMON process should run nightly, as part of the scheduled maintenance tasks, to purge data that exceeds these retention periods. From version 11g onwards, the MMON purging process has been constrained to a time-limited window for each of the purges; if this window is exceeded, the purging stops and an ORA-12751 error is written to an m000 trace file.
For the AWR data, held in tables with names commencing with WRH$, the probable cause is the fact that a number of the tables are partitioned. New partitions are created for these tables as part of the MMON process. Unfortunately, it seems that the partition splitting process is the final task in the purge process. As the later partitions are not split, they end up containing more data. This results in partition pruning within the purge process becoming less effective.
For the OPTSTAT data, held in tables with names commencing with WRI$, the cause is more likely to be related to the volume of data held in the tables. WRI$ tables hold historical statistical data for all segments in the database for as long as specified by the stats history retention period. Thus, if the database contains a large number of tables with a long retention period (say 30 days), then the purge process will have an issue trying to purge all of the old statistics within the specified window.
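Where the OPTSTAT history is the culprit, one hedged option (not covered in the original post, but using the documented DBMS_STATS API) is to check and reduce the statistics history retention and purge the old history manually; the 8-day value here is purely an example:

```sql
-- check how far back statistics history is currently kept (in days)
select dbms_stats.get_stats_history_retention from dual;
select dbms_stats.get_stats_history_availability from dual;

-- reduce the retention period so less history accumulates (example: 8 days)
exec dbms_stats.alter_stats_history_retention(8);

-- manually purge statistics history older than the new retention
exec dbms_stats.purge_stats(sysdate - 8);
```

Note that a large manual purge can itself run for a long time, so it may be best done in quiet periods.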
I also think that two scenarios lead to this problem:
1) SYSAUX running out of space, which I know has happened on at least one occasion; the volume of data left in a partition is then too much to handle in a constrained time window, and so it continues to grow in the same partition.
2) The database being shut down over the period of the maintenance task; again, the volume in the partition becomes too large to handle in a short time.
The resolution to this problem is split into two stages.
Firstly, a temporary manual fix is required to rectify the partition splitting failure. A job needs to be scheduled on the database, run as the SYS user, which executes the following code. This job will force the splitting of partitions for the WRH$ tables, and should be scheduled to run on a daily basis until the number of partitions per table exceeds the AWR retention period.
begin
  execute immediate 'alter session set "_swrf_test_action" = 72';
end;
/
What I do is drop the retention period to 8 days (or maybe less) and then run this procedure once a day. It can take up to an hour, but afterwards you can see that additional partitions have been created.
TABLE_NAME                       COUNT(*)
------------------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY             3
WRH$_DB_CACHE_ADVICE                    3
WRH$_DLM_MISC                           3
WRH$_EVENT_HISTOGRAM                    3
WRH$_FILESTATXS                         3
WRH$_INST_CACHE_TRANSFER                3
WRH$_INTERCONNECT_PINGS                 3
WRH$_LATCH                              3
WRH$_LATCH_CHILDREN                     2
WRH$_LATCH_MISSES_SUMMARY               3
WRH$_LATCH_PARENT                       2
WRH$_OSSTAT                             3
WRH$_PARAMETER                          3
WRH$_ROWCACHE_SUMMARY                   3
WRH$_SEG_STAT                           3
WRH$_SERVICE_STAT                       3
WRH$_SERVICE_WAIT_CLASS                 3
WRH$_SGASTAT                            3
WRH$_SQLSTAT                            3
WRH$_SYSSTAT                            3
WRH$_SYSTEM_EVENT                       3
WRH$_SYS_TIME_MODEL                     3
WRH$_TABLESPACE_STAT                    3
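To run the split once a day, a DBMS_SCHEDULER job is one option; this is a sketch only, where the job name and the 02:00 schedule are my own choices rather than anything from the original post:

```sql
begin
  dbms_scheduler.create_job(
    job_name        => 'FORCE_AWR_PARTITION_SPLIT',   -- hypothetical job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin execute immediate ''alter session set "_swrf_test_action" = 72''; end;',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',         -- once a day, e.g. at 02:00
    enabled         => true,
    comments        => 'Force one split of all partitioned AWR tables');
end;
/
```

Remember to drop the job once enough partitions exist, since each run performs another split.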
Once the correct number of partitions has been created, the old data may be automatically purged by the MMON process. If this does not occur, it will be necessary to manually purge the data from the AWR tables.
Firstly turn off the production of snapshots for the duration of this activity:
-- get the dbid for this database
select dbid from v$database;

-- get current snap interval (in minutes)
select extract(day from snap_interval) * 24 * 60 +
       extract(hour from snap_interval) * 60 +
       extract(minute from snap_interval) snap_interval
from   wrm$_wr_control
where  dbid = <dbid>;

-- disable snaps by setting interval to 0
exec dbms_workload_repository.modify_snapshot_settings(interval => 0, dbid => <dbid>)
Then determine the maximum snapshot to be purged from the database by querying the maximum snap-id outside of the retention period:
select max(snap_id) max_snap
from   wrm$_snapshot
where  begin_interval_time < (sysdate - (select retention
                                         from   wrm$_wr_control
                                         where  dbid = <dbid>));
This returns the maximum snapshot ID to be purged.
The AWR tables will not purge if baselines exist, so check and remove as required. The following should return only 1 row called SYSTEM_MOVING_WINDOW:
select baseline_name, creation_time
from   dba_hist_baseline;
If other baselines are found they will need to be removed:
exec dbms_workload_repository.drop_baseline(baseline_name => '<baseline>');
Looking at the current retention period in this code, it is set to 15 days, so we reduce it to 8 days (11520 minutes):
select d.dbid, w.snap_interval, w.retention
from   dba_hist_wr_control w, v$database d
where  w.dbid = d.dbid;

-- set retention to 8 days (11520 minutes)
begin
  dbms_workload_repository.modify_snapshot_settings(11520, 0);
end;
/

-- or, to set retention to 90 days (129600 minutes):
begin
  dbms_workload_repository.modify_snapshot_settings(129600, 0);
end;
/
Then we purge the snapshots (in ranges if there are a lot, as in this example). This can take several hours:
EXEC dbms_workload_repository.drop_snapshot_range(low_snap_id => 100000, high_snap_id => 114159);
EXEC dbms_workload_repository.drop_snapshot_range(low_snap_id => 116261, high_snap_id => 116265);
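Once the purge has finished, snapshot generation needs to be re-enabled by restoring the interval captured earlier; the 60-minute value below is an assumed example, not taken from the post:

```sql
-- restore the snapshot interval saved before the purge (assumed: 60 minutes)
exec dbms_workload_repository.modify_snapshot_settings(interval => 60, dbid => <dbid>)
```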
Once all that is complete, it is just a matter of tidying up the indexes.
ASH                            SNAP                           RETENTION
------------------------------ ------------------------------ ------------------------------
+000000008 23:47:01.421        +000000008 23:46:55.865        +00008 00:00:00.0
Item Space Used (GB) Schema Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR 93.70 SYS
select 'alter index '||segment_name||' rebuild online parallel (degree 4);'
from   dba_segments
where  tablespace_name = 'SYSAUX'
and    segment_name like 'WRH$_%'
and    segment_type = 'INDEX'
order  by segment_name;
Double-check for unusable indexes:
select 'alter index '||s.segment_name||' rebuild online parallel (degree 4);'
from   dba_segments s, dba_indexes i
where  s.tablespace_name = 'SYSAUX'
and    s.segment_name like 'WRH$_%'
and    s.segment_type = 'INDEX'
and    i.status = 'UNUSABLE'
and    i.index_name = s.segment_name
order  by s.segment_name;

select 'alter table '||segment_name||' move tablespace sysaux;'
from   dba_segments
where  tablespace_name = 'SYSAUX'
and    segment_name like 'WRH$_%'
and    segment_type = 'TABLE'
order  by segment_name;
Item Space Used (GB) Schema Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR 10.05 SYS
I hope that has been useful, and I would be pleased to hear if others see the same issue on some databases, whereby the actual data retained is much longer than the snapshot retention period would suggest.
Source: https://jhdba.wordpress.com/tag/purge-wrh-tables/
SYSAUX tablespace maintenance
Problem description
The SYSAUX tablespace in our IPCC environments is growing continuously. Analysis of the problem shows that AWR snapshots are deleted after the retention period as designed. The problematic objects are the history tables, WRH$_ACTIVE_SESSION_HISTORY and WRH$_EVENT_HISTOGRAM, among others. Older snapshot history is kept by Oracle, and the data is stored in only one single partition. The partition is split once its size reaches a threshold, which never happens in our environment.
The reason for the problem, as described by Oracle:
Oracle decides what rows need to be purged based on the retention policy. There is a special mechanism which is used in the case of the large AWR tables where we store the snapshot data in partitions. One method of purging data from these tables is by removing partitions that only contain rows that have exceeded the retention criteria. During the nightly purge task, we only drop the partition if all the data in the partition has expired. If the partition contains at least one row which, according to the retention policy shouldn’t be removed, then the partition won’t be dropped and as such the table will contain old data.
If partition splits do not occur (for whatever reason), then we can end up with a situation where we have to wait for the latest entries to expire before the partition that they sit in can be removed. This can mean that some of the older entries can be retained significantly past their expiry date. The result of this is that the data is not purged as expected.
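One way to gauge how far past their expiry date old entries have been retained is to count ASH history rows whose snapshots no longer exist in the snapshot control table; this "orphan" check is my own sketch, not part of the Oracle note:

```sql
-- rows in the ASH history table with no matching registered snapshot
select count(*) orphaned_rows
from   wrh$_active_session_history a
where  not exists (select 1
                   from   wrm$_snapshot s
                   where  s.snap_id = a.snap_id
                   and    s.dbid = a.dbid
                   and    s.instance_number = a.instance_number);
```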
Solutions
The workaround suggested by Oracle, not officially documented
A potential solution to this issue is to manually split the partitions of the partitioned AWR objects so that there is more chance of the split partitions being purged. You will still have to wait for all the rows in the new partitions to reach their retention time, but with split partitions there is more chance of this happening. You can manually split the partitions using the following undocumented command:
alter session set "_swrf_test_action" = 72;
to perform a single split of all the AWR partitions.
Test in our environment:
SELECT owner, segment_name, partition_name, segment_type,
       bytes/1024/1024/1024 size_gb
FROM   dba_segments
WHERE  segment_name = 'WRH$_ACTIVE_SESSION_HISTORY';

SEGMENT_NAME                  PARTITION_NAME                 SEGMENT_TYPE          SIZE_GB
----------------------------- ------------------------------ ------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY   WRH$_ACTIVE_3551044952_19447   TABLE PARTITION    .010742188
WRH$_ACTIVE_SESSION_HISTORY   WRH$_ACTIVE_SES_MXDB_MXSN      TABLE PARTITION    6.1035E-05

2 rows selected.
The table has only two partitions: the MAXVALUE partition and a single data partition containing all the data.
alter session set "_swrf_test_action" = 72;
The execution can take a few minutes. Re-running the previous select shows that one more partition has been created:
SEGMENT_NAME                  PARTITION_NAME                 SEGMENT_TYPE          SIZE_GB
----------------------------- ------------------------------ ------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY   WRH$_ACTIVE_3551044952_19447   TABLE PARTITION    .010742188
WRH$_ACTIVE_SESSION_HISTORY   WRH$_ACTIVE_3551044952_25147   TABLE PARTITION    6.1035E-05
WRH$_ACTIVE_SESSION_HISTORY   WRH$_ACTIVE_SES_MXDB_MXSN      TABLE PARTITION    6.1035E-05

3 rows selected.
Note that this command will split partitions for ALL partitioned AWR objects. It also initiates a single split; it does not need to be disabled and will need to be repeated if multiple splits are required.
After the partition split we can only hope that the partitions will age out and be dropped by the nightly job.
The workaround suggested by Oracle, officially documented
We can see that the history tables contain many more old snapshots than the snapshot tables. In the current tables:
select count(*), max(snap_id), min(snap_id)
from   dba_hist_snapshot;

  COUNT(*) MAX(SNAP_ID) MIN(SNAP_ID)
---------- ------------ ------------
       180        25144        24965
In the history tables:
select count(distinct snap_id), max(snap_id), min(snap_id)
from   wrh$_active_session_history;

COUNT(DISTINCT(SNAP_ID)) MAX(SNAP_ID) MIN(SNAP_ID)
------------------------ ------------ ------------
                    5674        25144        19447

1 row selected.
To get rid of the old entries we use the DBMS package:
begin
  dbms_workload_repository.drop_snapshot_range(
    low_snap_id  => 19447,
    high_snap_id => 24964);
end;
/
After about 32 minutes of processing, the result is:
select count(distinct snap_id), max(snap_id), min(snap_id)
from   wrh$_active_session_history;

COUNT(DISTINCT(SNAP_ID)) MAX(SNAP_ID) MIN(SNAP_ID)
------------------------ ------------ ------------
                     180        25145        24965

1 row selected.
For this solution we have to know:
* The old entries will be deleted from ALL AWR tables.
* The space will not be freed, because of the high-water mark.
* The growth of the tablespace will be stopped for a while.
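If the space below the high-water mark actually needs to be reclaimed, the WRH$ segments can be moved within SYSAUX and their indexes rebuilt, as in the first post above; this is a hedged sketch, and note that partitioned tables would need ALTER TABLE ... MOVE PARTITION instead:

```sql
-- generate MOVE statements for the non-partitioned SYS-owned WRH$ tables
select 'alter table '||table_name||' move tablespace sysaux;'
from   dba_tables
where  owner = 'SYS'
and    table_name like 'WRH$%'
and    partitioned = 'NO';

-- afterwards, rebuild any indexes left UNUSABLE by the moves
select 'alter index '||index_name||' rebuild online;'
from   dba_indexes
where  owner = 'SYS'
and    index_name like 'WRH$%'
and    status = 'UNUSABLE';
```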
The permanent Oracle bugfix, officially documented
The problem has been reported as a bug on Oracle Metalink, and there is an interim patch for the Linux Oracle version. The patch number is 14084247 and it can be found on Metalink.
The test follows.
Results
* Both workarounds have to be repeated after a reasonable period.
* They can be combined, but do not have to be.
The workaround with partition splitting
The workaround with partition splitting works fine. The partitions are dropped in the weekend maintenance window, on Friday evening.
The test database results look as follows:
Y database:
THEDATE | GBSIZE | PREV_GBSIZE | DIFF |
---|---|---|---|
02.09.2014 | 1,86 | ||
03.09.2014 | 1,86 | 1,86 | 0 |
04.09.2014 | 1,87 | 1,86 | 0,01 |
05.09.2014 | 0,5 | 1,87 | -1,37 |
06.09.2014 | 0,5 | 0,5 | 0 |
07.09.2014 | 0,51 | 0,5 | 0,01 |
08.09.2014 | 0,52 | 0,51 | 0,01 |
09.09.2014 | 0,52 | 0,52 | 0 |
Database X:
THEDATE | GBSIZE | PREV_GBSIZE | DIFF |
---|---|---|---|
02.09.2014 | 1,91 | ||
03.09.2014 | 1,91 | 1,91 | 0 |
04.09.2014 | 1,92 | 1,91 | 0,01 |
05.09.2014 | 0,39 | 1,92 | -1,53 |
06.09.2014 | 0,4 | 0,39 | 0,01 |
07.09.2014 | 0,41 | 0,4 | 0,01 |
08.09.2014 | 0,42 | 0,41 | 0,01 |
09.09.2014 | 0,42 | 0,42 | 0 |
Workaround with deleting data
Since the stored procedure was run, the SYSAUX tablespace has not grown. It was tested on the Y database:
THEDATE | GBSIZE | PREV_GBSIZE | DIFF |
---|---|---|---|
01.09.2014 | 1,99 | ||
02.09.2014 | 1,99 | 1,99 | 0 |
03.09.2014 | 2,01 | 1,99 | 0,02 |
04.09.2014 | 2,01 | 2,01 | 0 |
05.09.2014 | 2,01 | 2,01 | 0 |
06.09.2014 | 2,01 | 2,01 | 0 |
07.09.2014 | 2,01 | 2,01 | 0 |
08.09.2014 | 2,01 | 2,01 | 0 |
Permanent Patch
Test results from after the weekend maintenance window will follow.
Source: http://deputydba.kosherdev.com/?p=9