
Tidying up SYSAUX – removing old snapshots which you didn’t know existed

Posted by John Hallas on July 9, 2014

This post runs in parallel with a post I made five years ago, which is still the most read one on this blog:

https://jhdba.wordpress.com/2009/05/19/purging-statistics-from-the-sysaux-tablespace/

It will show how to reduce the space taken up in the SYSAUX tablespace when many more AWR snapshots are being retained than you think.

Firstly, let's take an example database; we can see that we are using 92 GB of space.

 

set linesize 120
set pagesize 100
col ash form a30
col retention form a30
col snap form a30
 
COLUMN "Item" FORMAT A25
COLUMN "Space Used (GB)" FORMAT 999.99
COLUMN "Schema" FORMAT A25
COLUMN "Move Procedure" FORMAT A40
 
SELECT  occupant_name "Item",
space_usage_kbytes/1048576 "Space Used (GB)",
schema_name "Schema",
move_procedure "Move Procedure"
FROM v$sysaux_occupants
WHERE occupant_name = 'SM/AWR'
ORDER BY 1
/

 

Item                      Space Used (GB) Schema                    Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                              91.88 SYS

Now we look at how long we should be retaining AWR snapshots for, and what we actually have:

select sysdate - a.sample_time ash,
sysdate - s.begin_interval_time snap,
c.RETENTION
from sys.wrm$_wr_control c,
(
select db.dbid,
min(w.sample_time) sample_time
from sys.v_$database db,
sys.Wrh$_active_session_history w
where w.dbid = db.dbid group by db.dbid
) a,
(
select db.dbid,
min(r.begin_interval_time) begin_interval_time
from sys.v_$database db,
sys.wrm$_snapshot r
where r.dbid = db.dbid
group by db.dbid
) s
where a.dbid = s.dbid
and c.dbid = a.dbid;

 

ASH                            SNAP                           RETENTION
------------------------------ ------------------------------ ------------------------------
+000000875 22:22:41.045        +000000030 12:22:28.323        +00030 00:00:00.0

We want to keep 30 days' worth of snapshots, and we have set the retention period to that, but we have 875 days' worth of active session history.

I rather smugly said let's show an example, and then I produce a schema with nearly three years' worth of snapshots being maintained, but there is no trickery involved. Looking around at other systems I can find a few similar examples, and it is always down to the same issue. Let's look further.

select table_name, count(*)
from dba_tab_partitions
where table_name like 'WRH$%'
and table_owner = 'SYS'
group by table_name
order by 1;

 

TABLE_NAME                       COUNT(*)
------------------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY             2
WRH$_DB_CACHE_ADVICE                    2
WRH$_DLM_MISC                           2
WRH$_EVENT_HISTOGRAM                    2
WRH$_FILESTATXS                         2
WRH$_INST_CACHE_TRANSFER                2
WRH$_INTERCONNECT_PINGS                 2
WRH$_LATCH                              2
WRH$_LATCH_CHILDREN                     2
WRH$_LATCH_MISSES_SUMMARY               2
WRH$_LATCH_PARENT                       2
WRH$_OSSTAT                             2
WRH$_PARAMETER                          2
WRH$_ROWCACHE_SUMMARY                   2
WRH$_SEG_STAT                           2
WRH$_SERVICE_STAT                       2
WRH$_SERVICE_WAIT_CLASS                 2
WRH$_SGASTAT                            2
WRH$_SQLSTAT                            2
WRH$_SYSSTAT                            2
WRH$_SYSTEM_EVENT                       2
WRH$_SYS_TIME_MODEL                     2
WRH$_TABLESPACE_STAT                    2
WRH$_WAITSTAT                           2

24 rows selected.

There is the problem: all the WRH$ data is held in 2 partitions, and the overnight housekeeping job does not have time to finish its tidy-up before its 15 minutes of fame is over again for another 24 hours.

Two components of Server Manageability (SM) that reside in the SYSAUX tablespace cause the problem: the Automatic Workload Repository (AWR) and Optimizer Statistics History (OPTSTAT). Both of these components have retention periods associated with their data, and the MMON process should run nightly, as part of the scheduled maintenance tasks, to purge data that exceeds these retention periods. From version 11g onwards, the MMON purging process has been constrained to a time-limited window for each of the purges; if this window is exceeded, the purging stops and an ORA-12751 error is written to an m000 trace file.
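If you suspect the purge is hitting this limit, one quick check (just a sketch) is to locate the trace directory and search the m000 trace files for ORA-12751; V$DIAG_INFO gives the path from 11g onwards:

-- sketch: find the trace directory so the m000 trace files can be searched for ORA-12751
select value from v$diag_info where name = 'Diag Trace';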

For the AWR data, held in tables with names commencing with WRH$, the probable cause is the fact that a number of the tables are partitioned. New partitions are created for these tables as part of the MMON process. Unfortunately, it seems that the partition splitting process is the final task in the purge process. As the later partitions are not split, they end up containing more data. This results in partition pruning within the purge process becoming less effective.

For the OPTSTAT data, held in tables with names commencing with WRI$, the cause is more likely to be related to the volume of data held in the tables. WRI$ tables hold historical statistical data for all segments in the database for as long as specified by the stats history retention period. Thus, if the database contains a large number of tables with a long retention period, say 30 days, then the purge process will have an issue trying to purge all of the old statistics within the specified window.
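As a quick check on the OPTSTAT side (a sketch only), DBMS_STATS can report both the configured history retention and how far back the statistics history actually goes:

-- sketch: configured stats history retention (days) versus the oldest history available
select dbms_stats.get_stats_history_retention    from dual;
select dbms_stats.get_stats_history_availability from dual;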

I also think that two scenarios lead to this problem:

1) SYSAUX running out of space, which I know has happened on at least one occasion; the volume of data left in a partition is then too much to handle in a constrained time window and so continues to grow in the same partition (a quick free-space check is sketched below).

2) The database being shut down over the period of the maintenance task, so that again the volume in the partition becomes too large to handle in a short time.
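For scenario 1, a quick free-space check on SYSAUX (a sketch, nothing more) is enough to see whether the tablespace itself is close to filling up:

-- sketch: free space currently available in SYSAUX
select tablespace_name, round(sum(bytes)/1024/1024/1024, 2) free_gb
from   dba_free_space
where  tablespace_name = 'SYSAUX'
group by tablespace_name;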

 

The resolution to this problem is split into two stages.

Firstly, a temporary manual fix is required to rectify the partition splitting failure. A job needs to be scheduled on the database, to be run as the SYS user, which executes the following code. This job will force the splitting of partitions for the WRH$ tables, and should be scheduled to run on a daily basis until the number of partitions per table exceeds the AWR retention period.

begin
  execute immediate 'alter session set "_swrf_test_action" = 72';
end;
/

What I do is drop the retention period to 8 days (or maybe less) and then run this procedure once a day. It can take up to an hour, but afterwards you can see that additional partitions have been created.
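If you want to automate that, something along these lines can be scheduled with DBMS_SCHEDULER. This is only a sketch: the job name AWR_PARTITION_SPLIT and the 02:00 run time are illustrative, and the retention change mirrors the 8-day (11520-minute) value mentioned above.

-- sketch: drop the AWR retention to 8 days, then split one set of partitions per day
exec dbms_workload_repository.modify_snapshot_settings(retention => 11520)

begin
  dbms_scheduler.create_job(
    job_name        => 'AWR_PARTITION_SPLIT',   -- illustrative name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin execute immediate ''alter session set "_swrf_test_action" = 72''; end;',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',
    enabled         => TRUE);
end;
/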

TABLE_NAME                       COUNT(*)
------------------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY             3
WRH$_DB_CACHE_ADVICE                    3
WRH$_DLM_MISC                           3
WRH$_EVENT_HISTOGRAM                    3
WRH$_FILESTATXS                         3
WRH$_INST_CACHE_TRANSFER                3
WRH$_INTERCONNECT_PINGS                 3
WRH$_LATCH                              3
WRH$_LATCH_CHILDREN                     2
WRH$_LATCH_MISSES_SUMMARY               3
WRH$_LATCH_PARENT                       2
WRH$_OSSTAT                             3
WRH$_PARAMETER                          3
WRH$_ROWCACHE_SUMMARY                   3
WRH$_SEG_STAT                           3
WRH$_SERVICE_STAT                       3
WRH$_SERVICE_WAIT_CLASS                 3
WRH$_SGASTAT                            3
WRH$_SQLSTAT                            3
WRH$_SYSSTAT                            3
WRH$_SYSTEM_EVENT                       3
WRH$_SYS_TIME_MODEL                     3
WRH$_TABLESPACE_STAT                    3


Once the correct number of partitions has been created, the old data may be automatically purged by the MMON process. If this does not occur, it will be necessary to manually purge the data from the AWR tables.

Firstly turn off the production of snapshots for the duration of this activity:

 

-- get the dbid for this database
select dbid from v$database;

-- get current snap interval (in minutes)
select extract(day from snap_interval) *24 *60
     + extract(hour from snap_interval) *60
     + extract(minute from snap_interval) snap_interval
from   wrm$_wr_control where dbid = <dbid>;

-- disable snaps by setting interval to 0
exec dbms_workload_repository.modify_snapshot_settings(interval => 0, dbid => <dbid>)

Then determine the maximum snapshot to be purged from the database by querying the maximum snap-id outside of the retention period:

select max(snap_id) max_snap
from   wrm$_snapshot
where  begin_interval_time < (sysdate - (select retention
                                         from   wrm$_wr_control
                                         where  dbid = <dbid>));

This will return the maximum snapshot to be purged.

The AWR tables will not purge if baselines exist, so check and remove as required. The following should return only 1 row called SYSTEM_MOVING_WINDOW:

select baseline_name, creation_time
from   dba_hist_baseline;

If other baselines are found they will need to be removed:

exec dbms_workload_repository.drop_baseline(baseline_name => <baseline>);

If we look at the current retention period, in this case it is set to 15 days, so we set it to 8 days (11520 minutes; the second call below shows the same command with a 90-day value of 129600 minutes):

select d.dbid, w.snap_interval, w.retention
from   DBA_HIST_WR_CONTROL w, v$database d
where  w.dbid = d.dbid;

begin DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(11520, 0); end;
/
begin DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(129600, 0); end;
/

Then we purge the snapshots (in ranges if there are a lot, as in this example). This can take several hours.

EXEC dbms_workload_repository.drop_snapshot_range(low_snap_id=>100000, high_snap_id=>114159);
 
EXEC dbms_workload_repository.drop_snapshot_range(low_snap_id=>116261, high_snap_id=>116265);

Once all that is complete, it is just a matter of tidying up the indexes. At this point the ASH history matches the retention period, but the space has not yet been released:

ASH                            SNAP                           RETENTION
------------------------------ ------------------------------ ------------------------------
+000000008 23:47:01.421        +000000008 23:46:55.865        +00008 00:00:00.0

Item                     Space Used (GB) Schema                   Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                             93.70 SYS
select 'alter index '||segment_name||' rebuild online parallel (degree 4);'
from   dba_segments
where  tablespace_name = 'SYSAUX'
and    segment_name like 'WRH$_%'
and    segment_type = 'INDEX'
order by segment_name;

Double-check for unusable indexes, and finally move the tables to release the space:

 

select 'alter index '||s.segment_name||' rebuild online parallel (degree 4);'
from   dba_segments s, dba_indexes i
where  s.tablespace_name = 'SYSAUX'
and    s.segment_name like 'WRH$_%'
and    s.segment_type = 'INDEX'
and    i.status = 'UNUSABLE'
and    i.index_name = s.segment_name
order by s.segment_name;

select 'alter table '||segment_name||' move tablespace sysaux;'
from   dba_segments
where  tablespace_name = 'SYSAUX'
and    segment_name like 'WRH$_%'
and    segment_type = 'TABLE'
order by segment_name;

Item                      Space Used (GB) Schema                    Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                              10.05 SYS

 

I hope that has been useful, and I would be pleased to hear if others see the same issue on some databases, whereby the data actually retained covers a much longer period than the snapshot retention setting would suggest.

Source: https://jhdba.wordpress.com/tag/purge-wrh-tables/



SYSAUX tablespace maintenance

September 10, 2014

Based on Metalink note 387914.1:

Problem description

The SYSAUX tablespace in our IPCC environments is growing continuously. Analysis of the problem shows that the AWR snapshots are deleted after the retention period, as designed. The problematic objects are the history tables, WRH$_ACTIVE_SESSION_HISTORY and WRH$_EVENT_HISTOGRAM, among others. Oracle keeps the older snapshot history: the data is stored in a single partition, and the partition is only split after its size reaches a threshold, which never happened in our environment.

The reason for the problem, as described by Oracle:

Oracle decides what rows need to be purged based on the retention policy. There is a special mechanism which is used in the case of the large AWR tables where we store the snapshot data in partitions. One method of purging data from these tables is by removing partitions that only contain rows that have exceeded the retention criteria. During the nightly purge task, we only drop the partition if all the data in the partition has expired. If the partition contains at least one row which, according to the retention policy, shouldn't be removed, then the partition won't be dropped and as such the table will contain old data.

If partition splits do not occur (for whatever reason), then we can end up with a situation where we have to wait for the latest entries to expire before the partition that they sit in can be removed. This can mean that some of the older entries can be retained significantly past their expiry date. The result of this is that the data is not purged as expected.
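A quick way to see this for yourself (a sketch along the same lines as the check used later) is to list the partitions of one of the large history tables and compare them with the snapshots you expect to keep:

-- sketch: how many partitions does the ASH history table actually have?
select table_name, partition_name
from   dba_tab_partitions
where  table_owner = 'SYS'
and    table_name  = 'WRH$_ACTIVE_SESSION_HISTORY'
order by partition_position;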

Solutions

The workaround solution suggested by Oracle, not documented officially

A potential solution to this issue is to manually split the partitions of the partitioned AWR objects so that there is more chance of a split partition being purged. You will still have to wait for all the rows in the new partitions to reach their retention time, but with split partitions there is more chance of this happening. You can manually split the partitions using the following undocumented command:

alter session set "_swrf_test_action" = 72;

This performs a single split of all the AWR partitions.

Test in our environment:

SELECT owner,
  segment_name,
  partition_name,
  segment_type,
  bytes/1024/1024/1024 Size_GB
FROM dba_segments
WHERE segment_name='WRH$_ACTIVE_SESSION_HISTORY';

SEGMENT_NAME                  PARTITION_NAME                 SEGMENT_TYPE          SIZE_GB
----------------------------- ------------------------------ ------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY   WRH$_ACTIVE_3551044952_19447   TABLE PARTITION    .010742188
WRH$_ACTIVE_SESSION_HISTORY   WRH$_ACTIVE_SES_MXDB_MXSN      TABLE PARTITION    6.1035E-05

2 rows selected.

The table has only two partitions: the MAXVALUE partition and a single data partition containing all the data.

alter session set "_swrf_test_action" = 72;

The execution can take a few minutes. Re-running the previous select shows that one more partition has been created:

SEGMENT_NAME                  PARTITION_NAME                 SEGMENT_TYPE          SIZE_GB
----------------------------- ------------------------------ ------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY   WRH$_ACTIVE_3551044952_19447   TABLE PARTITION    .010742188
WRH$_ACTIVE_SESSION_HISTORY   WRH$_ACTIVE_3551044952_25147   TABLE PARTITION    6.1035E-05
WRH$_ACTIVE_SESSION_HISTORY   WRH$_ACTIVE_SES_MXDB_MXSN      TABLE PARTITION    6.1035E-05

3 rows selected.

Note that this command will split partitions for ALL partitioned AWR objects. It also initiates a single split; it does not need to be disabled and will need to be repeated if multiple splits are required.
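If multiple splits are required, the statement can simply be issued several times; a minimal sketch (the loop count of 5 is arbitrary):

-- sketch: perform several single splits in a row
begin
  for i in 1 .. 5 loop
    execute immediate 'alter session set "_swrf_test_action" = 72';
  end loop;
end;
/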

After the partition split we can pray that the partitions will be aged out and dropped by the nightly job.

The workaround solution suggested by Oracle, documented officially

We can see that the history tables contain much older snapshots than the snapshot table. In the current snapshot table:

select count(*), max(snap_id), min(snap_id) from DBA_HIST_SNAPSHOT;


  COUNT(*) MAX(SNAP_ID) MIN(SNAP_ID)
---------- ------------ ------------
       180        25144        24965

In the history tables:

select count(distinct(snap_id)), max(snap_id), min(snap_id) from WRH$_ACTIVE_SESSION_HISTORY  ;

COUNT(DISTINCT(SNAP_ID)) MAX(SNAP_ID) MIN(SNAP_ID)
------------------------ ------------ ------------
                    5674        25144        19447
1 row selected.

To get rid of the old entries we use the DBMS_WORKLOAD_REPOSITORY package:

begin
  DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE( 
  low_snap_id => 19447, 
  high_snap_id => 24964 
 );
end;
/

After about 32 minutes of processing, the result is:

select count(distinct(snap_id)), max(snap_id), min(snap_id) from WRH$_ACTIVE_SESSION_HISTORY  ;

COUNT(DISTINCT(SNAP_ID)) MAX(SNAP_ID) MIN(SNAP_ID)
------------------------ ------------ ------------
                     180        25145        24965
1 row selected.

For this solution we have to know:

* The old entries will be deleted in ALL AWR tables.
* The space will not be freed, because of the high water mark (see the sketch below).
* The growth of the tablespace will only be stopped for a while.
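To actually release the space below the high water mark, the segments have to be reorganised afterwards. A sketch along the lines of the first article: generate move statements for the non-partitioned WRH$ tables, then rebuild any indexes left UNUSABLE by the moves.

-- sketch: generate move statements for the non-partitioned WRH$ tables in SYSAUX
select 'alter table '||segment_name||' move tablespace sysaux;'
from   dba_segments
where  tablespace_name = 'SYSAUX'
and    segment_name like 'WRH$%'
and    segment_type  = 'TABLE'
order by segment_name;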

The Oracle permanent bugfix, documented officially

The problem has been reported as a bug in Oracle Metalink, and there is an interim patch for the Linux Oracle version.
The patch number is 14084247 and can be found in Metalink.
The test follows.

Results

* Both options have to be repeated after a reasonable period.
* They can be combined, but do not have to be.

The workaround with splitting partitions

The workaround with splitting partitions works fine. The partitions are dropped in the weekend maintenance window on Friday evening.
The test database results look as follows:

Y database:

THEDATE      GBSIZE   PREV_GBSIZE   DIFF
02.09.2014   1,86
03.09.2014   1,86     1,86          0
04.09.2014   1,87     1,86          0,01
05.09.2014   0,5      1,87          -1,37
06.09.2014   0,5      0,5           0
07.09.2014   0,51     0,5           0,01
08.09.2014   0,52     0,51          0,01
09.09.2014   0,52     0,52          0

Database X:

THEDATE      GBSIZE   PREV_GBSIZE   DIFF
02.09.2014   1,91
03.09.2014   1,91     1,91          0
04.09.2014   1,92     1,91          0,01
05.09.2014   0,39     1,92          -1,53
06.09.2014   0,4      0,39          0,01
07.09.2014   0,41     0,4           0,01
08.09.2014   0,42     0,41          0,01
09.09.2014   0,42     0,42          0

Workaround with deleting data

After the stored procedure has been run, the SYSAUX tablespace stops growing for a while. This was tested on the Y database:

THEDATE      GBSIZE   PREV_GBSIZE   DIFF
01.09.2014   1,99
02.09.2014   1,99     1,99          0
03.09.2014   2,01     1,99          0,02
04.09.2014   2,01     2,01          0
05.09.2014   2,01     2,01          0
06.09.2014   2,01     2,01          0
07.09.2014   2,01     2,01          0
08.09.2014   2,01     2,01          0

Permanent Patch

The test results after the weekend maintenance window will follow.

Source: http://deputydba.kosherdev.com/?p=9



