
Monday, April 25, 2011

TSM SQL

Tivoli Storage Manager

Tivoli is part of IBM, and they provide a suite of management products for distributed systems. TSM used to be called ADSM and works on the 'incremental backup forever' principle, which takes a bit of getting used to, compared to the more traditional 'weekly full backup and daily incremental' system. With TSM, you back up all your clients to one central data backup server. This makes administration a lot easier, but means you have to limit the amount of data you send over the network. That's why TSM needs to be incremental. A central database records all backup information, and can be used to recreate a whole server if necessary. TSM always keeps at least one backup of every on-line file.

TSM excels at 3 functions
1. File restores
2. Disaster recovery planning
3. Archiving

As TSM works on the 'incremental forever' principle, it is not as fast at recovering whole servers or disks as a local tape backup would be. How important is this?
Even as recently as five years ago, disk failures or full server failures were quite common, and file restores quite rare.
Today, disks are very reliable, and if configured as RAID, hardly ever fail. However, as more and more critical applications run from file servers, file restore requests are now very common. Typically, you can expect to restore several files a day, and a disk once a year if you are unlucky.
Products which backup each server individually to a local tape will recover a full server faster than TSM, but file recoveries will take longer, and take a lot more effort.
This pageset looks at a few hints and tips, wormholes through the black hole of TSM performance, and at new features available in later releases of TSM.
1. TSM Tips
2. TSM Performance Tuning
3. TSM tape management
4. New features in TSM 5.3 and 5.2
5. TSM SHOW commands
6. TSM TDP for MSSQL databases
It is possible to get a qualification that demonstrates your TSM skills. Quoting IBM, 'An IBM Certified Deployment Professional - Tivoli Storage Manager V5.3 is an individual who has demonstrated the ability to implement and support IBM Tivoli Storage Manager. It is expected that this person is able to perform the following tasks independently a majority of the time, and in some situations, take leadership and provide mentoring to peers. It is expected that this person will be able to perform these tasks with limited assistance from peers, product documentation and vendor support services.'
It is often a challenge to find the IBM TSM manuals. These days the best place to look is http://publib.boulder.ibm.com/infocenter/tivihelp/v1r1/index.jsp
Use the left hand frame to navigate.

TSM SQL Hints
SQL tips and syntax
Listing the TSM tables, some special tables
SQL output formatting
Using Indexed Columns to speed up queries
TSM SQL date formats
How to combine two TSM tables in one query
How many client nodes are registered by domain?
How many client nodes are registered by platform?
Query all tapes for a node
What tapes were used today?
Library inventory
Which volume has my file
List all volumes that are not in READWRITE status
How many scratch tapes are there?
How many tapes can I reclaim by changing the reclamation threshold?
Which tape is in a slot?
How many tapes are used by each node?
How many active files are there?
What's happened in the last hour?
What happened yesterday?
How long did each of last night's backups run for?
How much data did each node backup yesterday?
How much data is stored for each filespace?
How did last night's backups go?
Produce a list of restores, how much data they processed, and their status
How many volumes does a storage group use?
Query the number of Volumes in use and available by Device Class
How many volumes are going offsite?
The following scripts were donated by Ron Delaware of IBM
Total data stored and tapes used per node in all storage pools
Total data stored and tapes used per node
Display the number of nodes on each tape
A query to display the names of the nodes with data on a tape
A script to move data from low utilised tapes
Offsite tapes needed to restore a node
________________________________________

SQL tips and syntax
TSM versions before 6.1 used a slightly non-standard SQL syntax, as the database was not really relational. The new database is DB2 and uses standard SQL. All queries have been tested using version 6.1 and I've indicated any differences in syntax. TSM 6.1 forces you to become command proficient, and often a simple query command will produce the same result as an SQL query. However you get much more control over the exact format of your results using SQL queries, so they are still useful.
There are a couple of major issues with TSM 6.1 - these may be fixed in 6.2
If you are running TSM 6.1 and you use ORDER BY against the SCRIPTS, CLIENTOPS or PATHS tables your query will fail with RC=11

Some table joins that ought to work will fail with 'No match found using this criteria'. An example is a join between the volumeusage and the auditocc tables. This IBM technote explains the details.

The basic SQL syntax is
SELECT ALL | DISTINCT columnname(,columnname)
FROM tablename(, tablename)
WHERE 'column selection conditions'
GROUP BY columnname
HAVING 'row selection conditions'
ORDER BY columnname ASC| DESC
The standard SQL operator, '=', is case sensitive when comparing strings. You can use SELECT DISTINCT, to eliminate duplicate records, and get data from more than one table using the JOIN operator. You can also combine WHERE statements using the AND operator.
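For instance, a minimal sketch combining two conditions with AND (the domain and platform values are illustrative, and must match the stored case exactly because '=' is case sensitive):
select node_name -
from nodes -
where domain_name='DO_WINNT' and platform_name='WinNT'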
The first thing you need is to know what TSM tables are available to query. These are described in the next section. After that, the best way to learn SQL is to try it out. It's a read only query language! The examples below might help explain what the syntax means, otherwise, try the following sites.

www.sqlcourse.com
www.sqlcourse2.com
www.dcs.napier.ac.uk/~andrew/sql

Prior to version 6.1 the TSM SQL query command did not support the full SQL language syntax. The following operations did not work:
• UNION
• INTERSECT
• EXCEPT
• subqueries that return multiple values
• using a semicolon as a command terminator

You can do maths in SQL statements, for example
SELECT AVG (total_mb/1024) -
AS "Average Total GB" -
FROM auditocc

You can select several columns, or items from a table by separating them with commas, like
select platform_name,count from nodes
or you can join items together; either of the two examples below will work.
select concat(FILESPACE_NAME,HL_NAME, LL_NAME) from backups
select filespace_name || hl_name || ll_name from backups

You can combine two tables together and select columns from each like this
SELECT nodes.domain_name,summary.activity FROM nodes, summary
or you can simplify the expression by giving the tables an alias
SELECT nn.domain_name,ss.activity FROM nodes nn, summary ss

Note that the aliases have two characters. For some reason TSM does not always like a single character. It seems to really object if you abbreviate 'summary' to 's'.
If you invoke SQL from a script then it may ask for confirmation to proceed, for example it may check that you are happy to process a lot of output. You can suppress the confirmation messages with the option -noconfirm
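For example, a sketch of a scripted query with confirmation prompts suppressed (the administrator ID and the query itself are illustrative):
dsmadmc -id=adminid -password=adminpassword -noconfirm "select count(*) from volumes"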

You could do fuzzy comparisons with TSM 5.5 using the LIKE parameter. This has been replaced by IN with version 6.1, IBM Technote 1380830 explains the issue.
________________________________________
What are the TSM tables?
To find out what TSM tables exist and what they contain, run the following queries
select * from syscat.tables
or to just list all tables try select distinct(tabname) from syscat.tables
select * from syscat.columns
select * from syscat.enumtypes

This sounds good, but if you run these under TSM version 6.1 several tables are missing from the SYSCAT.TABLES list. Queries against these tables still work, and APAR IC60909 may fix the issue.
The SUMMARY table contains a lot of useful entries for general statistics. A couple of useful fields are SUMMARY.ACTIVITY and SUMMARY.SUCCESSFUL. The activity field currently contains: 'TAPE MOUNT', 'FULL_DBBACKUP', 'INCR_DBBACKUP', 'EXPIRATION', 'RECLAMATION', 'MIGRATION', 'MOVE DATA', 'STGPOOL BACKUP', 'BACKUP', 'RESTORE', 'ARCHIVE' and 'RETRIEVE'. The successful field can be 'YES' or 'NO'. However you cannot rely on the summary table to report on the success of client events like backup and restore, as it just reports on progress so far.
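As an illustrative sketch, a query along these lines summarises the last 24 hours of activity by type and outcome, using the fields described above:
select activity, successful, count(*) as "Events" -
from summary -
where start_time>=current_timestamp - 24 hours -
group by activity, successful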
For TSM 5.5, the DATE field in the EVENTS table does not support expressions like 'scheduled_start >= current_timestamp-24 hours'. If you issue this query at 14:00 it will return all events that started after midnight today, but not those between 14:00 yesterday and midnight. You can get the correct results by combining the relative timestamp with a constant timestamp; for example, if yesterday was the 25th March then this will work.
scheduled_start > '2006-03-25' and scheduled_start >= current_timestamp-24 hours
This is no good if you want to schedule a query, but in that case you can simply use a very early fixed timestamp date, for example
select node_name, schedule_name, scheduled_start, status -
from events -
where scheduled_start >= '1900-01-01' and -
scheduled_start >= current_timestamp - 24 hours
The date field works fine in TSM 6.1 so there is no need for this convoluted code.

________________________________________
SQL output formatting
SETSQLDISPLAYMODE does not seem to work with TSM 6.1
If you enter SQL queries from the command line in the browser, you get the results in tabular format. It is possible to execute SQL from a host command line, and then you can pipe the command to a file and get the results in comma delimited format for importing to an Excel spreadsheet or similar.

The command is
dsmadmc -id=adminid -password=adminpassword -commadelimited
'select etc ' > filename
Command output direction can be a bit complicated as it works differently for different operating systems. In general the '>' symbol will direct output to a file, but it is also a valid SQL mathematical operator. If the '>' symbol has spaces on both sides of it, it will be considered as output redirection. If it has no space on either side, it will be considered as mathematical greater than.

So for example
select * from summary > summary.txt will direct lots of output text to a summary file, while
select * from summary where date>current_timestamp - 24 hours will look for events that happened today. Of course you can combine these as
select * from summary where date>current_timestamp - 24 hours > summary.out
If you run these commands in batch, the operating system might try to interpret the redirection command as greater than even if it is surrounded by spaces. In UNIX and LINUX you can escape the symbol with a backslash (\>), but the easier way is to put the whole command in quotes. "select * from summary where date>current_timestamp - 24 hours > summary.out"

________________________________________
Using Indexed Columns to speed up queries
This indexed columns tip does not apply to TSM release 6.1 or higher
TSM SQL queries can run for a long time, and use up a lot of resource. This is usually because you are searching the whole database to get the data you want. You can reduce the amount of database searching by selecting specific data from an indexed column using a WHERE statement. To find out which columns are indexed, use the query
select * from syscat.columns
A partial result looks like
TABSCHEMA: ADSM
TABNAME: MEDIA
COLNAME: LRD
COLNO: 9
INDEX_KEYSEQ:
INDEX_ORDER:
TYPENAME: TIMESTAMP
LENGTH: 0
SCALE: 0
NULLS: TRUE
REMARKS: Last Reference Date

TABSCHEMA: ADSM
TABNAME: MGMTCLASSES
COLNAME: DOMAIN_NAME
COLNO: 1
INDEX_KEYSEQ: 1
INDEX_ORDER: A
TYPENAME: VARCHAR
LENGTH: 30
SCALE: 0
NULLS: FALSE
REMARKS: Policy Domain Name
This tells you that the DOMAIN_NAME column in the MGMTCLASSES table is indexed, but the LRD column in the MEDIA table is not. So if you run a query like
SELECT * FROM MGMTCLASSES -
WHERE DOMAIN_NAME = 'DO_TDP'
Then you can expect your query to be quite fast.
________________________________________
TSM SQL date formats
The timestamp format is:
'yyyy-mm-dd hh:mm:ss.nnnnnn'

yyyy = year
mm = month
dd = day
hh = hours
mm = minutes
ss = seconds
nnnnnn = fraction of a second
'ss' and 'nnnnnn' are both optional. When referring to a timestamp, put it in single quotes, for example to select records that started after 12:00 on July 21st you would specify start_time > '2010-07-21 12:00:00'. Start_time needs to be a valid date type column name in your table, for example scheduled_start in the EVENTS table.
If you just want records that started after 21:00, you would add time(start_time) >= '21:00:00', for example
select date(scheduled_start) as Date, -
time(scheduled_start) as Time, node_name -
from events -
where node_name=your_node_name -
and date(scheduled_start)='2010-07-21'
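A sketch of the same query restricted to schedules that started after 21:00, using the time() function described above:
select date(scheduled_start) as Date, -
time(scheduled_start) as Time, node_name -
from events -
where date(scheduled_start)='2010-07-21' -
and time(scheduled_start)>='21:00:00'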
________________________________________
How to combine two TSM tables in one query
It is possible to combine two TSM tables in one query, but be aware that the TSM database in version 5.5 and lower was not really relational, so table joins take ages and use a lot of resource. It may be faster to do two queries, copy out the results then combine the data with an external program. The process seems much faster with TSM 6.1, but many of these joins do not work. For example a join between the volumeusage and the auditocc tables does not work with TSM 6.1. APAR IC63474 and IBM technote 1403575 describe the problem. It may be fixed in TSM 6.2

The key to database joins is
• Both tables must have one column that contains the same data
• It is easier if you give each table an alias name for reference purposes using 'table name alias' in the FROM statement
• You select the common column in both tables using aliasname.columnname
• You then join the data with WHERE alias1.column2=alias2.column2
For example
SELECT pct_utilized, node_name, -
vm.volume_name, vu.volume_name -
FROM volumes vm,volumeusage vu -
WHERE node_name='NODE01' -
AND vm.volume_name=vu.volume_name
This combines the percent utilised column from the volumes table with the node name column in the volumeusage table, together with the volume name column from each. Be aware that this is a really CPU intensive query.
Be careful to make the joins specific or you could end up joining every record in table A with every record in table B. If both these tables hold just 300 records, the result would be 90,000 records and the query will probably run out of memory. Always include a WHERE clause to limit the records.
________________________________________
How many client nodes are registered by domain?

Works with TSM 6.1
select domain_name,num_nodes -
from domains
Result -
DOMAIN_NAME NUM_NODES
------------------ ---------
DO_AIX 39
DO_HOLAN 61
DO_HOSAN 2
DO_LOTUSNOTES 34
DO_TDP 32
DO_TSMSERV 4
DO_UDB 7
DO_WINNT 126
STANDARD 0

________________________________________
How many client nodes are registered by platform?
Works with TSM 6.1
select platform_name,count(*) as "Number of Nodes" -
from nodes -
group by platform_name
Result -
PLATFORM_NAME Number of Nodes
---------------- ---------------
16
AIX 57
AIX-RS/6000 4
DB2 2
Mac 2
NetWare 59
OS/2 1
SUN SOLARIS 6
TDP Domino 3
TDP Domino NT 2
TDP Oracle AIX 6
WinNT 147
________________________________________
Query all tapes for a node
Works with TSM 6.1
How do I find all the tape volsers associated with a specific node?

select distinct node_name,volume_name,stgpool_name -
from volumeusage -
where node_name='xxxxx'
________________________________________
What tapes were used today?
Works with TSM 6.1
How do you find out what tapes were used on a specific day?
select volume_name,last_write_date -
from volumes -
order by last_write_date
________________________________________
Library inventory
Works with TSM 6.1
How can I display an inventory of my library in order of slot number
select home_element, volume_name -
from libvolumes -
order by home_element

________________________________________
Which volume has my file
This search took too long with TSM 6.1 and was cancelled
How can I find out which volume contains a specific file?
select volume_name,node_name,filespace_name,file_name -
from contents -
where node_name='nodename' -
and filespace_name='filespace' -
and file_name='filename'

________________________________________
List all volumes that are not in READWRITE status
Works with TSM 6.1
select VOLUME_NAME,ACCESS from volumes where access !='READWRITE'
Result -
VOLUME_NAME ACCESS
------------------ ----------
QZ1039 READONLY
QZ1170 READONLY

________________________________________
How many scratch tapes are there?
Works with TSM 6.1
How do I tell how many scratch tapes we have?
select count(*) as Scratch_count -
from libvolumes -
where status='Scratch'
If you have more than 1 library, you can find all your scratch tapes using the query
select LIBRARY_NAME,count(*) as "scratches" from libvolumes where -
status='SCRATCH' group by LIBRARY_NAME
Thanks to Sven Neirynck of Compu-mark for that tip
TSM has a MAXSCRATCH parameter which is set independently for each storage pool. This defines the maximum number of tapes that each tape pool can contain. The following query will display how close each pool is to its limit.
SELECT STGPOOLS.STGPOOL_NAME, STGPOOLS.MAXSCRATCH, -
Count(STGPOOLS.MAXSCRATCH) as "Allocated_SCRATCH", -
STGPOOLS.MAXSCRATCH-count(STGPOOLS.MAXSCRATCH) as "Remaining_SCRATCH" -
FROM STGPOOLS,VOLUMES -
WHERE (VOLUMES.STGPOOL_NAME = STGPOOLS.STGPOOL_NAME) -
AND ((STGPOOLS.DEVCLASS='3590_CLASS')) -
GROUP BY STGPOOLS.STGPOOL_NAME, STGPOOLS.MAXSCRATCH
Typical output looks like
STGPOOL_NAME MAXSCRATCH Allocated_SCRATCH Remaining_SCRATCH
------------------ ----------- ----------------- -----------------
ARCHTAPEPOOL 100 5 95
CARTPOOL 1340 932 408
VIRTCARTPOOL 200 13 187

________________________________________
How many tapes can I reclaim by changing the reclamation threshold?
Works with TSM 6.1
select count(*) from volumes -
where stgpool_name='poolname' -
and upper(status)='FULL' -
and pct_utilized < nn
where nn is 100 minus the reclamation threshold you are testing, as a full volume becomes eligible for reclamation once its percent utilisation drops below that value.
________________________________________
What happened yesterday?
How do I get the prior day's events?
select * -
from events -
where days(current_timestamp)-days(scheduled_start)=1
A simple query command will produce the same result.
Q EV * * BEGIND=TODAY-1
________________________________________
How long did each of last night's backups run for?
This query was donated by Francois Swanepoel of South Africa and it gives you the running time of your backups. It uses a built-in DB2 function, timestampdiff, so it will only work with TSM versions 6.1 and upwards.
select entity as Node_name, -
cast(sum(bytes/1024/1024) as decimal (10,3)) as Total_MB, -
substr(cast(min(start_time) as char(26)),1,19) as Date_Time, -
(timestampdiff(4,char(max(end_time)-min(start_time))))as Length_min -
from summary -
where start_time>=current_timestamp - 24 hours and activity='BACKUP' -
group by entity
Typical output looks like -
NODE_NAME TOTAL_MB DATE_TIME LENGTH_MIN
TSM_P11904DB2P07 92.000 2010-08-20-00.30.15 0
TSM_P18413PRW321 756.000 2010-08-20-01.00.19 0
TSM_P18413PRW321_ORA 43854.000 2010-08-20-03.01.45 61
TSM_P18413PRW321_P2_BASPC75 5304.000 2010-08-20-01.00.14 4
TSM_P18414PRW323 1026.000 2010-08-20-01.00.29 0
TSM_P18414PRW323_ORA 43282.000 2010-08-20-03.35.38 33
TSM_P18414PRW323_P2_BASPC76 88319.000 2010-08-20-01.00.21 114
________________________________________
What were the times and data transferred for yesterday?
This select query will get the amount of data backed up in the previous 24 hours, along with the start and end times of the backups. This query is TSM 6.1 compatible.
SELECT entity AS "Node name", -
CAST(sum(bytes/1024/1024) AS decimal(8,2)) AS "MB xfer", -
SUBSTR (CAST(min (start_time) AS char(29)),1,10) AS "start date", -
SUBSTR (CAST(min (start_time) AS char(29)),12,8) AS "start time", -
SUBSTR (CAST(max (end_time) AS char(29)),1,10) AS "end date", -
SUBSTR (CAST(max (end_time) AS char(29)),12,8) AS "end time" -
FROM summary -
WHERE activity='BACKUP' AND start_time>=current_timestamp - 24 hours -
GROUP BY entity
_______________________________________
How much data is stored for each filespace?
The easiest way to get this information is to use the 'query occupancy' command, which will return output like
Node Name  Type  Filespace       FSID  Storage    Number of  Physical    Logical
                 Name                  Pool Name  Files      Space       Space
                                                             Occupied    Occupied
                                                             (MB)        (MB)
---------  ----  --------------  ----  ---------  ---------  ----------  ----------
SRFFCQ04   Bkup  SRFFCQ04\SYS:      1  CARTPOOL          20        0.02        0.02
SRFFCQ04   Bkup  SRFFCQ04\CS04:     3  CARTPOOL   3,631,963  169,579.13  169,548.47

or you could try an SQL query like
SELECT node_name,filespace_name, -
physical_mb,stgpool_name -
FROM occupancy
and optionally add
WHERE node_name='nodename' -
AND type='Bkup'

The output looks like
NODE_NAME FILESPACE_NAME PHYSICAL_MB STGPOOL_NAME
--------- --------------- ------------ -------------
node01 node01BS01: 367781.61 CARTPOOL
node01 node01BS01: 0.64 DISKPOOL

________________________________________
How much data is stored for each node by copy type?
This query assumes that you have three types of storage pools, one called 'TAPEPOOL' which is used for standard backups, one called 'ARCHPOOL' which is used for Archives, and one called 'TDPPOOL' which is used for SQL TDP backups. This query will summarise the amount of backup space used by each node in each pool. This is not the same as the query above, which reports on space usage by filespace.
select node_name as NODENAME, -
sum(case when substr(stgpool_name,1,3) in ('ARC') -
then logical_mb else 0 end) as ARC_OCC, -
sum(case when substr(stgpool_name,1,3) in ('TAP') -
then logical_mb else 0 end) as BCK_OCC, -
sum(case when substr(stgpool_name,1,3) in ('TDP') -
then logical_mb else 0 end) as TDP_OCC -
from occupancy group by node_name
The output looks like
NODENAME  ARC_OCC  BCK_OCC  TDP_OCC
SRVGLUAY   339.28  5523.63  1988.79
The following query is more general and should work for any site.
SELECT NODE_NAME,TYPE,SUM(LOGICAL_MB) as MBytes -
FROM OCCUPANCY -
GROUP BY NODE_NAME,TYPE
________________________________________
How did last night's backups go?
You can use query event to get this information but the SQL query below will provide more detail. It can run for some time.
select Entity,Successful,Bytes,Examined,Affected,Failed -
from summary -
where activity='BACKUP' -
and cast((current_timestamp-start_time)hours -
as decimal(8,0)) < 24
If you only want to see the backups that had problems, add
and failed > 0
at the end of the 'where' statement
If you want to know the amount of data processed by all events, try the query below.
select nodes.domain_name,summary.activity, -
sum(cast(summary.bytes/1024/1024/1024 as decimal(6,2))) as GB -
from nodes, summary -
where (end_time between current_timestamp - 24 hours and current_timestamp) -
and (activity='BACKUP' or activity='RESTORE' -
or activity='ARCHIVE' or activity='RETRIEVE') -
and ((nodes.node_name=summary.entity)) -
group by domain_name,summary.activity -
order by activity,domain_name asc

TSM 6.1 has issues with this query; it does not seem to like the nodes table. James R Deyo suggested it be changed to use the "nodesview" table and that works fine.
select nodesview.domain_name,summary.activity, -
sum(cast(summary.bytes/1024/1024/1024 as decimal(6,2))) as GB -
from nodesview, summary -
where (end_time between current_timestamp-24 hours and current_timestamp) -
and (activity='BACKUP' or activity='RESTORE' -
or activity='ARCHIVE' or activity='RETRIEVE') -
and ((nodesview.node_name=summary.entity)) -
group by nodesview.domain_name,summary.activity -
order by summary.activity,nodesview.domain_name asc
Typical output looks like
DOMAIN_NAME ACTIVITY GB
------------------ ------------------ -----------
DO_AIX ARCHIVE 0.14
DO_AIX BACKUP 49.51
DO_HOLAN BACKUP 81.69
DO_LOTUSNOTES BACKUP 145.05
DO_TDP BACKUP 507.57
DO_UDB BACKUP 0.97
DO_WINNT BACKUP 127.43
DO_HOLAN RESTORE 0.02
DO_LOTUSNOTES RESTORE 0.20
DO_TDP RESTORE 225.53

_______________________________________
Produce a list of restores, how much data they processed, and their status
This query works fine in TSM 6.1 and is -
SELECT entity,start_time,end_time,bytes FROM SUMMARY WHERE ACTIVITY='RESTORE'
This query could take a while and produces a lot of output. Sample output for one restore is
ENTITY START_TIME END_TIME BYTES
-------------- -------------------------- -------------------------- ---------
TDPOCL_UX04PRD 2004-05-06 00:02:36.000000 2004-05-06 00:03:13.000000 514654219

________________________________________
How many volumes does a storage group use?
How can you determine how many volumes are used by each storage group? This works fine in TSM 6.1.
select stgpool_name,count(*) as count -
from volumes -
group by stgpool_name
________________________________________
Query the number of Volumes in use, and available by Device Class
This query, which works with 6.1, will find every storage pool that has a device class of 3590_class, and return the storage pool name, the maxscratch value for the storage pool, and how many volumes are in that pool. Note that 3590_class is site specific; you may have different tape device classes at your site, so substitute your own value.
SELECT a.stgpool_name,a.maxscratch,count(*) AS Volumes -
FROM stgpools a, volumes b -
WHERE a.stgpool_name = b.stgpool_name and a.devclass = '3590_CLASS' -
GROUP BY a.stgpool_name,a.maxscratch
Typical output looks like
STGPOOL_NAME MAXSCRATCH VOLUMES
------------------ ----------- -----------
ARCHTAPEPOOL 100 3
CARTPOOL 1500 1119
VIRTCARTPOOL 200 9
________________________________________
How many volumes are going offsite?
How can I tell which tapes are offsite?

SELECT volume_name,stgpool_name,access -
FROM volumes -
WHERE (stgpool_name='offsite_pool_name') -
AND (access='offsite')

________________________________________
Total data stored and tapes used per node in all storage pools
This select uses a join between the volumeusage and the auditocc tables and does not work with TSM 6.1. APAR IC63474 and IBM technote 1403575 describe the problem. It may be fixed in TSM 6.2
This select will show NODE_NAME; TOTAL_MB, which is the amount of data stored in TSM for this node; TAPES, which is the number of tapes that contain data for this node (in any storage pool); and AVG MB/tape, which is the average MB per tape (TOTAL_MB divided by the number of tapes with node data). It is sorted by worst data distribution. This query will pick up data stored in any storage pool, including data on a disk pool pending migration. That can skew the results.
select vu.node_name, ao.total_mb, count(distinct vu.volume_name) -
as tapes, ao.total_mb/count(distinct vu.volume_name) -
as "AVG MB/tape" from volumeusage vu, auditocc ao -
where vu.node_name=ao.node_name -
group by vu.node_name, ao.total_mb order by 4
Typical output
NODE_NAME TOTAL_MB TAPES AVG MB/tape
--------------- --------- --------- -----------
DEC_XL34RT2B 3394 207 16
XLF3LV02 88796 2 44398
XLFFAF01 51080 1 51080
XLF3AF02 544846 9 60538
________________________________________
Total data stored and tapes used per node in one storage pool
This select uses a join between the volumeusage and the auditocc tables and does not work with TSM 6.1. APAR IC63474 and IBM technote 1403575 describe the problem. It may be fixed in TSM 6.2
This select will show NODE_NAME; TOTAL_MB, which is the amount of data stored in TSM for this node; TAPES, which is the number of tapes that contain data for this node located in the specified storage pool; and AVG MB/tape, which is the average MB per tape (TOTAL_MB divided by the number of tapes with node data). It is sorted by worst data distribution.
select vu.node_name, ao.total_mb, count(distinct vu.volume_name) -
as tapes, ao.total_mb/count(distinct vu.volume_name) -
as "AVG MB/tape" from volumeusage vu, auditocc ao -
where vu.stgpool_name='YOUR_POOL_NAME' and vu.node_name=ao.node_name -
group by vu.node_name, ao.total_mb order by 4
Typical output
NODE_NAME TOTAL_MB TAPES AVG MB/tape
------------------ ----------- ------- -----------
DEC_XL34RT2B 0 1 0
NODEL81 92 1 92
NODEL265 294 1 294
XLF3AF03 119524 2 59762
XLF3AF02 544846 9 60538

________________________________________
Display the number of nodes on each tape
This select will show how many nodes a tape contains, sorted by tapes with higher number of nodes. It works with TSM 6.1 but takes a while to run.

select volume_name, stgpool_name, -
count(distinct node_name) as Nodes -
from volumeusage -
group by volume_name, stgpool_name -
order by 3 desc
Typical output
VOLUME_NAME STGPOOL_NAME NODES
------------------ ------------------ -----------
DZ2070 ARCHIVEPOOL 11
DZ1426 ARCHIVEPOOL 9
DZ1776 CARTPOOL 1
DZ1778 CARTPOOL 1


________________________________________
A query to display the names of the nodes with data on a tape
This select statement will display unique node names located on tape and works with TSM 6.1
select distinct node_name from volumeusage -
where volume_name='DZ1778'

typical output

NODE_NAME
----------
XLF3AF02

________________________________________
A script to move data from low utilised tapes which works fine with TSM 6.1
This select statement will create a script that will move data from low utilized tapes. This process is used to supplement Reclamation as it does not look at expired data and it is multi-streaming. The percent utilized is adjustable.

select 'move data ',volume_name, ' wait=yes', status -
from volumes where stgpool_name='pool_name' -
and pct_utilized>0 and pct_utilized<nn
________________________________________
Directing output to a file
If you add > file.name
after the command
i.e.
q volume > /tmp/tapevols

in your script the output will be saved in the tapevols file.
Under UNIX, you can't do this from within TSM, you have to run the command externally as dsmadmc -id=xxxx -pa=xxxx q volume > /filename.
If you want to use '>' as a comparator symbol, don't put spaces around it, then it won't be considered a redirection command, i.e. a>2 will report on values of 'a' greater than 2, while a > 2 will try to redirect the output from the 'a' command to a file called 2, and will fail with a syntax error.
How redirecting output works with Netware depends on the TSM client level, as clients at TSM 5.3.2.0 or higher use a dynamic C library, whereas older clients utilised a static SMS library named SMSUT.lib. Novell no longer supports this static library, which required TSM to switch to the dynamic C library at version 5.3.2.
For 5.3.2.0 and higher TSM clients, use the standard re-direction method:

q backup data:/accounts/ > output.txt
For 5.3.1.x and lower TSM clients, use the re-direction method:
q backup data:/accounts/ (CLIB_OPT)/>output.txt

________________________________________
Suppressing TSM information on queries
If you want to run a query then feed it into a program or script, then you do not want all the TSM/IBM copyright information, the date and time, the session established message, and all the other information that comes at the start of a session. If your client is running TSM 5.2 or higher, you can suppress these messages by launching the admin command line interface with the new -DATAOnly=Yes option. You do not need a 5.2 server for this. This option will not suppress error messages.

________________________________________
Controlling the amount of data going to the TSM scheduling and error logs
The DSMSCHED.log and the DSMERROR.log are usually the first port of call when investigating problems. They are usually found in the CLIENT/BA/ or BACLIENT/ directory. TSM will update both these files every time it runs a scheduled backup and will record every backed up file. The problem is that if they are not controlled, the logs will quickly become too big to manage.
You have two parameters in your dsm.opt file that control the data held in these files, schedlogretention and errorlogretention. The default values are schedlogretention N and errorlogretention N, which means never prune the logs. Other options are
ERRORLOGRetention 7 D
which means keep the errors for 7 days then discard them, or

ERRORLOGRetention 7 S
which means after 7 days move the data to DSMERLOG.PRU. The schedlog retention syntax is the same. You can select how many days you want to keep your logs for. You can also add a QUIET parameter in your DSM.OPT file, which will suppress most of the messages, but this is not recommended as you lose most of your audit trail.
A further pair of parameters were introduced with the 5.3 baclient:
schedlogmax nn
errorlogmax nn
These parameters cause the logs to wrap, so when they reach the end of the file, they start to overwrite the data at the beginning. The end of the current data is indicated by an 'END OF DATA' record. The nn value is the maximum size of the log in megabytes, with a range from 0 to 2047. 0 is the default and means do not wrap the log.
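For example, a dsm.opt fragment combining pruning and wrapping might look like this sketch (the retention periods and sizes are illustrative):
SCHEDLOGRETENTION 7 D
ERRORLOGRETENTION 7 S
SCHEDLOGMAX 10
ERRORLOGMAX 10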

________________________________________
Querying data transfer rates
How do you find out yesterday's data transfer rate for client sessions?
Use the command -
q ac begind=today-2 begint=23:59 endd=today-1
endt=23:59 originator=client

Obviously, you can manipulate the relative dates to get data from other days.

________________________________________
Encrypting the backup data
It is generally considered that TSM backup data is secure, as it cannot be read without a copy of the database. However if you have a legal requirement for full data encryption then standard DES 56-bit encryption is available. When you turn on encryption, you will be prompted to create a unique key. Without this key, you won't be able to restore your data. It is very important that you keep a copy of this key someplace other than the computer that is being backed up. If you forget the encryption key, then the data cannot be restored or retrieved under any circumstances.
To enable encryption you add an encryptkey parameter to the dsm.opt file on the client, and add include.encrypt and exclude.encrypt statements as required. TSM will not encrypt anything unless at least one include.encrypt statement is present.
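A hedged sketch of the dsm.opt entries, where the payroll path is purely illustrative:
ENCRYPTKEY SAVE
INCLUDE.ENCRYPT "C:\payroll\...\*"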
The encryptkey options are -
encryptkey prompt
With the prompt option you should see the following message every time you run a backup
User action is required, file requires an encryption key
and you need to provide the password twice
encryptkey save
You should only see the password required prompt once, and the password is saved in the tsm.pwd file
The manual states that with the encryptkey option set to Prompt you will be prompted for the password on every backup or restore, but in my experience it appears that TSM stores this password on the local client, probably in a file called tivinv/tsmbacnw.sig, so you can then run further backups or restores from that client without having to specify a password, and TSM encrypts or decrypts the data as required. If you delete that sig file, or presumably try to recover to a different client, then you will be given a selection screen with the following options
1 Prompt for password
2 Skip this file
3 Skip all encrypted files
4 Abort the restore
Taking option 1, you will be prompted for the encryption password twice, then the restore runs as normal.

________________________________________
Specifying 2 or more servers from 1 client
The first question is, why would you want to use several servers from the same client? The answer is that you typically do this when you want to handle different parts of your client data differently, maybe for databases or maybe for shared resources in client clusters. In this case you would define some virtual TSM servers in the same dsm.sys file.
On a Windows client, if you want to define two TSM servers, and you want to be able to specify either server from a TSM client, add the following lines to the dsm.opt file
servername s2
tcpserveraddress ip_address or domain_name

you then have a 'primary' server which you pick up by default, and you can invoke the secondary server using
dsmadmc -se=s2 etc
On an AIX client you would define a different dsm.opt file for each server. For example, suppose you want a basic client to back up the rootvg, an Oracle client for database backups, and an HACMP client for the non-database data on the shared resource. You need three opt files, which for example could be defined like this
dsm.opt (basic client)
servername base_server

dsm_Oracle.opt
servername oracle_server

dsm_HACMP.opt
servername hacmp_server
Then in your dsm.sys file you would code
servername base_server
tcpserveraddress tsm_server
schedlogname tsm_base_server_schedlog
errorlogname tsm_base_server_errorlog
nodename clientname_primary_node
.. more parameters

servername oracle_server
tcpserveraddress tsm_server
schedlogname tsm_oracle_server_schedlog
errorlogname tsm_oracle_server_errorlog
nodename clientname_oracle_node
.. more parameters

servername hacmp_server
tcpserveraddress tsm_server
schedlogname tsm_hacmp_server_schedlog
errorlogname tsm_hacmp_server_errorlog
nodename clientname_hacmp_node
.. more parameters
Note that the tcpserveraddress is the same for each 'server' and is the dns name of the real tsm server. If you make each server stanza write to a different set of logs then that makes it easier to investigate issues. Each of the three nodenames is defined independently to the TSM server, so they can be scheduled independently. You would also define two symbolic links for extra dsmcads, so each stanza can be scheduled independently like this.
ln -s dsmcad /tivoli/tsm/client/ba/bin/dsmcad_oracle
ln -s dsmcad /tivoli/tsm/client/ba/bin/dsmcad_hacmp
Then start each extra dsmcad with its matching option file, for example
dsmcad_oracle -optfile=/tivoli/tsm/client/ba/bin/dsm_Oracle.opt
dsmcad_hacmp -optfile=/tivoli/tsm/client/ba/bin/dsm_HACMP.opt

________________________________________
Which nodes are associated with each schedule
The command syntax is 'query association domain_name schedule_name'. You can query all associations by using
q assoc * *
If you want to query a specific schedule, but do not know the domain, use
q assoc * sched_name

________________________________________
Synchronising your TSM server with the OS
If your TSM server is out of step with your hosting server, possibly due to a daylight saving change, you can easily change TSM to match the OS server by entering the following command at the admin command line
ACCEPT DATE

________________________________________
Canceling sessions and processes
To cancel server processes like migration you need to know the process number, which you get with the Q PROCESS command. However note that if you cancel a migration session then a new one will start unless you also reset the pool thresholds.
CANCEL PROCESS process-number
To cancel tape mount requests use the command
CANCEL REQUEST request-number
CANCEL REQUEST ALL
To cancel active sessions you either need to know the session numbers, which you get with the Q SESSION command, or you can cancel all active sessions
CANCEL SESSION session-number
To prevent new sessions or processes from starting use
DISABLE SESSION client
DISABLE SESSION Server
DISABLE SESSION Admin
These commands just prevent new sessions from starting. All active sessions will run to completion unless you cancel them with the cancel command. The disable session server command will stop new server-to-server sessions from starting; it will not stop expiration or migration.
To do an emergency cancel of all sessions use the command
DISABLE SESSION ALL
CANCEL SESSION ALL

TSM Backup Hints
Scheduling - controlling the start time
Scheduling - choosing specific weekdays
Scheduling - tips for scheduling commands
Compression space errors
How to determine when TSM will expire a backup
Incremental by Date backups; faster but less secure
An NT drive is refusing to allow a backup
Incremental backup of multiple directories using the objects field
Querying Backupsets
Adding a new Management Class
Using different Management Classes
Selecting and Excluding Drives
Including and Excluding data
TSM fails with out of memory errors on the backup client.
Selective backups of Windows directories with embedded spaces
Backing up the Windows System Object
Changing the System Object Management class
TSM and Windows ASR
Backing up the Netware NDS
Backing up a Netware Cluster
Backing up Databases with TSM
________________________________________
Scheduling - controlling the start time
Ever been in the situation where you have a tight backup window (and if you haven't, how do you get a job like that?) You schedule 8 backups to start at midnight, but some of them don't start until 01:00? Frustrating or what?
The problem is that by default, TSM tries to spread backups out in a schedule, unless you tell it not to. Issue the command QUERY STATUS on your TSM server, and look for the parameter Schedule Randomisation Percentage. By default, this will be set to 25, which means TSM will spread the start times of all the backups in a schedule, over the first 25% of the schedule window.
If you want all your backups to fire in right at the start of the schedule, then change this parameter to 0.
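For example, a sketch from the admin command line (SET RANDOMIZE takes a percentage of the startup window):
set randomize 0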
If you get a problem with the schedule, the event will be reported as either 'failed' or 'missed'. 'Failed' means that the scheduled event began, but encountered an error condition that prevented it from completing successfully. 'Missed' means that the scheduled event never began. For example, if the scheduler is not running on the client, then of course it can not execute its scheduled events; nor can it report that it did not run.
On the client machines that missed their schedule, verify that the scheduler service is started. Check the dsmsched.log and dsmerror.log to verify that the scheduler was able to successfully contact the server and query its next scheduled event.

________________________________________
Scheduling - choosing specific weekdays
If you define your client schedules using the GUI interface, you can choose to run a backup every day, weekdays only, or on a specific day of the week. You can only choose one of these options. If you are running TSM version 3 and you use a command line, you can be more selective. Say you want to define a schedule to run every Monday, Wednesday and Friday. Use a command like
DEF SCH domain_name schedule_name T=C ACT=I STARTT=22:00:00 DUR=2 DAY=Monday,Wednesday,Friday

________________________________________
Scheduling - tips for scheduling commands
When you define a schedule, if you set the ACTION parameter to C, then you can specify a command in the OBJECT parameter and TSM will schedule that command. You typically use this for database backups where you want to schedule a script that executes the backup.
What if you want to write the output of the backup script to a file? You can direct the output to a file using the > symbol, but remember that TSM will try to interpret that symbol as 'greater than' if you put spaces on both sides of it, you need to miss the space out on at least one side for the re-direction to work. So in other words, define your schedule something like this:

DEF SCH etc. ACT=C OBJ='/scripts/db2_full_backup.sh >/logs/db2backup.log'

What if you want to create a different log file each day, date stamped so you can tell when the backup happened? The following command works with AIX

DEF SCH etc. ACT=C OBJ='/scripts/db2_full_backup.sh >/logs/`date +"%d%B%y"`_db2backup.log'

________________________________________
Compression space errors
Prior to a client sending a file, the space (the same as the amount allocated on the client) is allocated in the TSM server's disk storage pool. If caching is active in the disk storage pool, and files need to be removed to make space, they are. But if the file grows under compression (the client has COMPRESSIon=Yes and COMPRESSAlways=Yes), the cleared space may be insufficient to contain the incoming data.
Typically, this results in an error - 'ANR0534W Transaction failed for session ssss for node nnnn - size estimate exceeded and server is unable to obtain storage'
This commonly happens where client compression is turned on, and the client has large or many compressed files: TSM is fooled as compression increases the size of an already-compressed file.
The only resolution is to take client compression off.

________________________________________
How to determine when TSM will expire a backup
The key to understanding TSM backup retention is to understand the difference between 'active' and 'inactive' backups. The 'active' backup is the most recent backup, while all the older backups are 'inactive' backups. However, once a file is deleted from the client, its backup becomes inactive after the next incremental backup. The active backup of a file is never deleted, as it is needed to recreate a disk.
TSM uses 4 parameters to retain backups. The version parameters and the retain extra parameter can take a value of 1-9999 or NOLIMIT, while the retain only parameter can take these values and also take a value of 0.
There is a fundamental difference between the versions parameters and the retain parameters. The versions parameters are controlled by the backup function, so changes to versions will not take effect until the next backup runs. The retain parameters are controlled by expiration, so changes to retention parameters take effect immediately.
• 'Versions Data Exists' is used to determine the maximum number of backup versions that will be retained for files that currently exist on the client.
• 'Versions Data Deleted' is used to determine the maximum number of retained backups for files that have been deleted from the client.
• 'Retain Extra Versions' specifies the number of days that inactive backups are kept.
• 'Retain Only Version' controls how long to keep the last backup of a file that has been deleted from the client.
So the pecking order goes like this.
If the file is deleted, then the most recent backup is kept for the number of days specified in 'retain only version', while older backups are retained by whichever of 'retain extra versions' and 'versions data deleted' is met first
If the file is not deleted, then the most recent backup is kept forever, while older backups are retained by whichever of 'retain extra versions' and 'versions data deleted' is met first
For example, you have RETEXTRA=31 and VEREXISTS=31. If you create 31 versions in the same day, and then (still on the same day) you create version 32, version 1 will expire, regardless of RETEXTRA, because the VEREXISTS criterion has been exceeded. Likewise, if you create version 1 today, then create version 2 a week later, then never create another version after that, then version 1 will expire 31 days after the creation of version 2, since the RETEXTRA criterion has been exceeded.
If you need to GUARANTEE data retention for 31 days you would need to code the parameters as below, but be aware that you could end up keeping a lot of backups.
Versions Data Exists = NOLIMIT
Versions Data Deleted = NOLIMIT
Retain Extra Versions = 31
Retain Only Version = 31
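As a hedged sketch, those settings map onto a backup copy group like this (the domain, policy set and management class names are hypothetical):
UPD CO PDWIN PSWIN MCSTD TYPE=BACKUP -
VERE=NOLIMIT VERD=NOLIMIT RETE=31 RETO=31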
________________________________________
Incremental by Date backups; faster but less secure
You would use this type of backup if your backup window is not long enough during the week, but you have plenty of time at the weekend. An incremental by date backup uses the last updated timestamp on a file to decide if it needs to be backed up or not. The problem is that this field is not one hundred percent reliable on open systems data, as some applications can update data without changing the last update field. A normal TSM backup will compare the attributes of every file with the current active backup, and if they do not match, will take a new backup. Incremental by date simply looks at the last modification date, so it is much faster, and uses less memory. The downside is that it might not back up every changed file.
Also, it will not expire backup versions of files that are deleted from the workstation, it will not rebind backup versions to a new management class if the management class has changed, and it ignores the copy group frequency attribute of management classes.
To run an incremental by date backup, you add the parameter '-incrbydate' in the 'OPTIONS' box in the 'Define Client Schedule' GUI window.
If you use incremental by date, then to ensure you do get a full client backup, and correctly manage file expiration and management class changes, you should plan to take at least one full incremental backup every week.
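From the admin command line, the same option can be set on the schedule definition; a sketch with hypothetical domain and schedule names, paired with a plain incremental schedule at the weekend:
DEF SCH PDWIN WEEKDAY_INCR ACT=I OPT='-incrbydate' STARTT=22:00:00 DAY=WEEKDAY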

________________________________________
An NT drive is refusing to allow a backup
This is probably a permissions problem. You need to check what account the scheduler service uses to log in, then make sure the drive has permissions for that account.
If you are working with a Microsoft cluster, then for each logical node on the cluster you need to include CLUSTERnode Yes in the cluster.opt file. You also need to make sure that the registry replication is set up in the scheduler service, so the generated password is replicated to both nodes in the cluster. You can check this by issuing a command from either cluster node, and seeing if it connects to the TSM server correctly.

________________________________________
Incremental backup of multiple directories using the objects field
To perform an incremental backup for several directories, code them in the objects field in the schedule like this.
OBJECTS='/path1/ /path2/ /etc/'
The terminating slash at the end of the directory name backs up the files within the directories, and they need to be separated by spaces. If you wish to back up subdirectories within these directories, then you also need
OPTIONS=-SUBDIR=YES

________________________________________
Querying Backupsets
This will give you the list of backupset volumes from volhistory:
select volume_name -
from volhistory -
where type='BACKUPSET'
This will give you the NUMBER of backupset volumes from volhistory:
select count(volume_name) -
from volhistory -
where type='BACKUPSET'
This will give you a list of the backupset volumes that are NOT checked in:
select volume_name -
from volhistory -
where type='BACKUPSET' and volume_name not in -
(select volume_name from libvolumes)
This will give you the NUMBER of backupset volumes that are NOT checked in:
select count(volume_name) -
from volhistory -
where type='BACKUPSET' and volume_name not in -
(select volume_name from libvolumes)

________________________________________
Adding a new Management Class
Management Classes are used to determine how long backups are kept for, and where the backup data goes. To understand Management Classes you need to know how they fit in.
The top level construct is a Policy Domain. Generally, you will define a Policy Domain for each class of data. For example you may have Windows, UNIX, DB2, Oracle and MSSQL Policy Domains.
Each policy domain will contain one or more Policy Sets. Only one Policy set will be active in each Policy Domain and it is actually called 'ACTIVE'. You cannot update the active policy set. The other Policy Sets are used for version control.
Each policy set contains one or more Management Classes. One Management Class must be defined as the default, the others are picked up using the methods described in the next section. Typically you will have a standard retention default management class, then you may have others for longer retentions or LAN free.
Each Management Class can have two Copy Groups, one for Backup, one for Archive. It is the Copy Groups that dictate the retention times.
So suppose you want to add a new Management Class called MCDB2_TAPE that will allow DB2 backups to go LAN free direct to tape, and you want that to be the default Management Class. You already have an existing DB2 Policy Domain called PDDB2 with existing Management Classes. The first step is to make a copy of the Active Policy Set, called PSDB2_MAY2010, to make sure you keep the existing classes.
(create a working copy of the policy set)
COP PO PDDB2 ACTIVE PSDB2_MAY2010

(Define the new management class to that set)
DEF MG PDDB2 PSDB2_MAY2010 MCDB2_TAPE DESC="DB2 TAPE Management Class"

(Define a backup copygroup. The retention parms leave retention control
with DB2)

DEF CO PDDB2 PSDB2_MAY2010 MCDB2_TAPE TYPE=BACKUP DEST=PRIMARY_TAPE_POOL -
VERE=1 VERD=0 RETE=0 RETO=0

(Assign the new management class as the default)
AS DEFMG PDDB2 PSDB2_MAY2010 MCDB2_TAPE

(Validate the changed Policy Set)
VAL PO PDDB2 PSDB2_MAY2010

(Activate the new Policy Set)
ACT PO PDDB2 PSDB2_MAY2010
Of course there is a lot more to LAN free than just defining a Management Class, but LAN free will not work unless the target files are directed straight to tape.

________________________________________
Using Different Management Classes
One way to bind a set of backups to a different management class is to add an include statement into the client options file with a statement like
INCLUDE "C:Program FilesMicrosoft SQL ServerMSSQLBackup...*" MCSQLBK
means bind all files and files in subdirectories in the Backup directory to special management class MCSQLBK. If you add this statement, you will bind all previous backups of these files to the new management class. The '...' means scan subdirectories
The rebind happens next time a backup runs. This will work for every backup version of the file, not just for the active one. The file must be examined again to get a new backup class, so you cannot change management classes for files that have been deleted from the client. part of the statement means include subdirectories.
Another way is to define a client option set on the TSM server that contains INCLUDE and DIRMC statements that binds all files and directories to the desired management class, then update the client node to use that client option set.
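A sketch of the client option set approach, with hypothetical names:
DEF CLOPTSET SQLOPTS
DEF CLIENTOPT SQLOPTS INCLEXCL 'include "C:\Program Files\Microsoft SQL Server\MSSQL\Backup\...\*" MCSQLBK'
UPD NODE node01 CLOPTSET=SQLOPTS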
Finally you could define a domain and policy set that contains only the single management class by which you want to manage the client node data, then assign the desired nodes to that domain.

________________________________________
Selecting and Excluding Drives
You can select and exclude drives from scheduled backups by placing an entry in the DSM.OPT file on the client, with a line which looks like this for a Netware client
DOMain DATA: APPLICS: SYS:
or for a windows client
DOMAIN c: d:
The problem with these approaches is that you need to remember to update the dsm.opt file if you add new drives.
DOMAIN ALL-LOCAL
will back up everything. A good variant, if you never wanted to back up your SYS: drives for example, is
DOMAIN ALL-LOCAL -SYS:
This means that all drives are backed up except the SYS: drive, and you do not need to change the dsm.opt file as drives are added or removed. Specific selection criteria for the Windows System Object, or the Netware eDirectory (NDS) are given below.
However, this approach will only work if you are backing up at domain level, which is the normal approach with a scheduled backup. If you specifically select a volume that is excluded in the dsm.opt file, then it will be backed up.
dsmc incremental sys: -subdir=yes
will backup the entire SYS: volume, even though it is excluded in the domain statement. This is not necessarily a bad thing, it means that you can exclude SYS: from scheduled backups, but when you really want a backup, you can do it manually. This approach will also allow you to backup selected files or directories from an excluded disk
dsmc selective sys:\tivoli\tsm\* -subdir=yes
If you absolutely never want to let anyone backup the SYS: volume in any circumstances, then use EXCLUDE statements as these will always apply. To exclude an entire disk you need two commands, one to exclude the directories and one to exclude the files in the root folder, like this (Technically, this does not quite exclude the whole disk as the root of the volume itself will be backed up once).
exclude sys:\*
exclude.dir sys:\*

or

exclude c:\*
exclude.dir c:\*
So now if you try to backup with the command
dsmc selective sys:\tivoli\tsm\* -subdir=yes
nothing will happen, because those files are always excluded.
________________________________________
Including and Excluding data
There are two types of INCLUDE and EXCLUDE commands: INCLUDE and EXCLUDE, which include or exclude files, and INCLUDE.DIR and EXCLUDE.DIR, which include or exclude directories. Syntax is
INCLUDE c:\filespacename\...\*
EXCLUDE.DIR c:\filespacename\temp
Note that the directory exclude does not end with a '\'. The inclexcl list is normally processed from the bottom up, but EXCLUDE.DIR is processed before the regular INCLUDE or EXCLUDE statements so the position of EXCLUDE.DIR does not matter. However it is best to put the DIR processing statements at the bottom of the list to make it more obvious how the processing works.
If the data path you are describing includes spaces, you must enclose the full statement in quotes, i.e.

INCLUDE"C:Program FilesMicrosoft SQL ServerMSSQLBackup...*"

The ... means include subdirectories. If you just coded 'C:\Program Files\Microsoft SQL Server\MSSQL\Backup\*' then the include would only apply to files within the Backup directory, and not any files in subdirectories.
Remember, the EXCLUDE statements must come before the INCLUDE
EXCLUDE filespec\*.* will only exclude files which have a period in the name. If you want to exclude all files in a path, you need EXCLUDE filespec\*
Recent versions of the TSM client software provide an exclude.fs statement that will exclude an entire file system, and it is more efficient than a plain exclude statement. An exclude.fs suppresses any examination of the directories within the file system. An exclude covering an entire file system does not: the client still reads every file name in every directory within the file system, checks each name against the include and exclude statements, and only then decides not to back the file up (assuming the exclude for the entire file system is in the right place in the sequence of include and exclude statements).
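As a sketch, assuming a UNIX client with a hypothetical scratch file system that you never want examined, the dsm.opt (or include/exclude file) entry would be
exclude.fs /opt/scratch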
Another recent addition is the
query inclexcl
command, which lets you check your syntax. This was introduced in TSM 4.1. It was still possible to check syntax on older releases, using the unsupported show inclexcl command.
Another variant is to use ranges within EXCLUDE commands, for example, say you wanted to backup a large disk with three backup streams, but you did not want to have to change include statements when adding new directories (it would be more likely that you would not know when new directories were added, so they would be missed). You could define three clients, each with its own dsm.opt file, with exclude statements as shown below. You would need to ensure that the ranges covered every possible directory name.
Content of dsm_a.opt
NODENAME Servername_A
EXCLUDE.DIR \\Servername\diskname\[K-O]*
EXCLUDE.DIR \\Servername\diskname\[P-Z0-9$]*

Content of dsm_b.opt
NODENAME Servername_B
EXCLUDE.DIR \\Servername\diskname\[A-J]*
EXCLUDE.DIR \\Servername\diskname\[P-Z0-9$]*

Content of dsm_c.opt
NODENAME Servername_C
EXCLUDE.DIR \\Servername\diskname\[A-J]*
EXCLUDE.DIR \\Servername\diskname\[K-O]*
Be aware that if you are running TSM version 5.4.0.0 or 5.4.0.1 and change your include-exclude options using the client setup wizard or the Windows backup-archive GUI client preferences editor, the order in which INCLUDE and EXCLUDE options occur in the options file will be reversed. You will almost certainly end up missing files that should be backed up. Your best option is to install version 5.4.0.2.

________________________________________
TSM fails with out of memory errors on the backup client.
If you are having problems with TSM running out of client memory then you have a number of options to fix it. TSM uses about 300 bytes of memory for every object it backs up in a file system, which means that 2GB of virtual memory will support about 7 million objects. Note that every operating system has a maximum virtual memory limit, which places an upper limit on the number of objects that can be backed up using default parameters.
One solution is to specify memoryefficientbackup yes in the dsm.opt file. This changes the processing so now TSM only needs enough memory to process all the objects in one directory, not the whole file system. However this option does have a performance overhead.
Another option is to use incrbydate, described elsewhere on this page.
Other options include using Windows journalling backups, TSM image backups or, on a UNIX system, defining multiple virtual mount points within one file system, as sketched below. Each mount point is then backed up independently.
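A sketch of the virtual mount point approach, with hypothetical path names, would be to add lines like these to the server stanza in dsm.sys
virtualmountpoint /home/projects/teama
virtualmountpoint /home/projects/teamb
Each virtual mount point then appears as its own file space, so the memory needed by any one incremental is limited to the objects under that mount point.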

________________________________________
Selective backups of Windows directories with embedded spaces
It is possible to select a number of directories from a Windows command line interface by listing them as in the example command below.
dsmc inc c:\dir1\* "d:\dir2\sub dir1\*" d:\dir3\* -subdir=yes
This command will incrementally backup all files in directories
c:\dir1
d:\dir2\sub dir1
d:\dir3
and any subdirectories underneath them. Note that the asterisk is required in the second directory specification because 'sub dir1' contains an embedded space and so needs to be enclosed in quotation marks. The Windows command processor treats a \" combination in a special way, but it will parse a *\" combination as expected.

________________________________________
Backing up the Windows System Object
See the Windows section for details of what Microsoft calls the System State, and TSM calls the System Object.
You backup the Windows System object by either using the command
backup systemobject
Or by specifying an ALL-LOCAL domain in the dsm.opt file. The ALL-LOCAL domain includes the system object. Exactly what you backup will depend on the release of Windows, and what Windows components are installed.
The full list of system objects is
• Active Directory (domain controller only)
• Certificate Server Database
• Cluster Database (cluster node only)
• COM+ database
• Registry
• System and boot files
• System volume
• Event logs (system, security and application)
• Removable Storage Management Database (RSM)
• Replicated file systems
• Windows Management Instrumentation (WMI) repository
If you expand the System Object container in the Windows B/A GUI, you can see which system objects are active for a given client. Alternatively, you can issue the undocumented command
SHOW SYSTEMOBJECT
One way to check that the system object is being backed up, is to look for a 'SYSTEM OBJECT' file space on the TSM server.
The main issue with system object backups is that they are always backed up every night, and the amount of data can be about 2GB. You may want to restrict the number of backups held by binding the system object files to a management class that keeps relatively few versions. You do this with the following include statement in the dsm.opt file, or in the include/exclude file if you keep these separate
INCLUDE.SYSTEMOBJECT ALL yourmgmtclassname
If you don't want to backup your system object every day, you can exclude the system object in the dsm.opt file by prefixing it with a '-' (minus) sign. (If you are running TSM 5.1 or higher).
DOMAIN ALL-LOCAL -SYSTEMOBJECT
Then once a week, you would run a special schedule to execute the 'backup systemobject' command. You can't do this directly, you need to create a macro file that contains the command
BACKUP SYSTEMOBJECT
in a file called something like C:\Program Files\Tivoli\TSM\baclient\backsobj.mac and then create a schedule like

DEFINE SCHEDULE STANDARD BACKUPSOBJ ACTION=MACRO
OBJECTS="C:\Program Files\Tivoli\TSM\baclient\backsobj.mac"
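The schedule also needs to be associated with the nodes that should run it, and you would normally add timing parameters (STARTTIME, DAYofweek and so on) to make it weekly. A sketch, assuming a hypothetical node MYNODE and the STANDARD domain used above
define association standard backupsobj mynode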
Since version 5.2.2, TSM will use the Microsoft Volume Shadow Copy Service (VSS) on Windows Server 2003 to back up all system state components as a single object. This provides a consistent point-in-time snapshot of the system state.

________________________________________
Changing the Management class for System Objects
If you want to change the management class on System Objects, you need to add a line to dsm.opt. ALL system objects must be bound to the same management class.
include.systemobject ALL new_class_name
If the system object won't rebind to the new management class, try deleting the filespace; it should then rebind on the next backup. The command to do this from the server is
del fi nodename "SYSTEM OBJECT" nametype=unicode
Why would you want to do this? Well, the system state is large, so if your default management class holds, say, 40 backup versions, then you will use a lot of TSM backup space for the system state of every server. You can assign a management class to the system state that keeps fewer versions for less time, to save space.
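A sketch of such a class, assuming the STANDARD domain and policy set, an existing storage pool called BACKUPPOOL, and illustrative retention values
define mgmtclass standard standard sysobj_mc description="Short retention for system objects"
define copygroup standard standard sysobj_mc type=backup destination=backuppool verexists=3 verdeleted=1 retextra=14 retonly=14
activate policyset standard standard
Pick version and retention numbers to suit how far back you need to be able to restore the system state.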

________________________________________
TSM and Windows ASR
TSM can work with Windows 2003 ASR to simplify bare metal recovery, provided the TSM client is at 5.2.0 or above.
First, you must configure the TSM backup client for Online Image support and Open File support. Once you do that, you can take an ASR backup. This is not the same as creating an ASR recovery CD, that is discussed in the Windows section.
You take an ASR backup using the TSM client GUI, check the box that says 'Automated System Recovery' then click on the Backup button. Windows will then write system information to ASR state files called NTDLL.asr and SMSS.asr.
Then you need to backup the System State, by checking the boxes against System Services and System State and clicking on backup
Next, take an incremental backup of the local drive that contains the windows operating system. Once the backup is complete, select 'utilities' from the top line menu in the TSM GUI then select 'Create ASR diskette' from the drop down menu. You will be asked to confirm the floppy drive, and to insert a floppy diskette. If you did not take an ASR backup as in step one above, then this step will fail.

________________________________________
Backing up Databases with TSM
Database backups are usually a bit special as a database usually consists of a number of physical files that all need to be backed up as an entity, often with consistent time stamps. Databases also have transaction logs to ensure that the data stored in a database is consistent, even after a hardware failure. Databases usually have internal catalogs which record these files, so when you do a restore, you need to make sure that the catalogs hold the correct information too. To help with this lot, databases have a Database Management System (DBMS), which tracks physical database files, transaction logs and backups. A DBMS will usually be able to run a backup while the database is active, which effectively means no backup window is required.
This means that it is not a good idea to use TSM simply to backup databases as a collection of flat files, but what you want is for TSM to use the DBMS facilities to get a good, secure backup. TSM does this by providing (chargeable) add-on modules called TDPs or Tivoli Data Protection modules. TDPs exist for DB2/UDB, Oracle, Informix and MS-SQL databases; and also Lotus Domino and MS-Exchange e-mail databases.
It is also possible to interface TSM with DBMS systems for other databases using ADSMPIPE. This is described in the red paper
http://www.redbooks.ibm.com/abstracts/redp3980.html

TSM Restore Hints
Date options on restores
Using the GUI for PIT restores
Finding the tapes needed to restore a node
How do you recover files between 2 different clients?
Cross-client Windows restores
How to recover a file with spaces in the name
What is the difference between a no query restore and a standard restore?
How do you exclude files from a restore?
How to recover the Windows System Object
Restoring a Windows 2003 server using ASR
Windows 2000/NT Bare Metal Recovery Procedure
Restoring Netware Cluster files
________________________________________
Date options on restores
Date options on restores can be a bit confusing. The first thing to remember, especially if you move between companies, is that there are various different date formats available in TSM. The date format is set in the TSM OPT file used by the TSM server. The default format is MM/DD/YYYY; other possibilities are DD-MM-YYYY, YYYY-MM-DD, DD.MM.YYYY and YYYY.MM.DD.
There are three different types of date available for restores
-TODATE will restore all ACTIVE and INACTIVE files backed up BEFORE the date specified
-FROMDATE will restore all ACTIVE and INACTIVE files backed up AFTER the indicated date.
-PITDATE will restore only the files that were ACTIVE on the day specified.
To use these effectively, you have to consider when a backup was taken. For example, you want to recover d:/mydir, as it was on July 4th, 2001. The backups run overnight, and usually complete by 06:00. The command to use would be
res d:/mydir/* -pitdate=07/05/2001 -pittime=07:00:00 -subdir=yes
You specify 07:00 on July 5th to catch backups of files changed on July 4th, and that will get your directory back to how it was on the evening of July 4th.
-PITDATE is slower than -FROMDATE and -TODATE, though it restores less data. That is because the -PITDATE option uses a different 'No-Query' restore protocol. Your copygroup retention parameters determine how far back you can go with -PITDATE. If you want to be able to take a server, or a directory, back in time for 60 days, then you need to code the following copy group parameters (a sketch of how to apply them follows the list)
VEREXISTS=NOLIMIT
VERDELETED=NOLIMIT
RETEXTRA=60
RETONLY=60
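These are backup copy group parameters, so assuming the STANDARD domain and policy set and a management class called DEFAULT (hypothetical names), you would apply them with something like
update copygroup standard standard default type=backup verexists=nolimit verdeleted=nolimit retextra=60 retonly=60
activate policyset standard standard
Remember the changes only take effect once the policy set is activated.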

________________________________________
Finding the tapes needed to restore a node
Try the following SQL query
select distinct node_name,volume_name,stgpool_name
from volumeusage
where node_name='xxxxx'
You need to change the xxxxx to your node name, and it will be case sensitive.

________________________________________
Using the GUI for PIT restores - the files are missing!
There are two ways to run Point-in-time (PIT) restores; from the command line and from the GUI. However if you use the GUI you might find that it does not display any files for a given date, even though you know that backups exist.
The problem is down to the way the GUI works. It does not display a list of all available backups and versions for a client, as this would take too long to build and probably cause memory shortages on the client. You are initially presented with a list of available file spaces, and you have to drill down through the file spaces and directories to get a list of backups available in a given directory.
However if there is no retained backup of a directory then you cannot click on it to see the backups within it. Directory backup retention does not depend on the number of backup versions kept; directories are bound to the management class within the policy domain that has the highest 'Retain only version' (RETONLY) setting. It is possible that the management class (MC1) with the longest RETONLY setting will only keep one backup version, while you may have another class (MC2) that keeps twenty versions, but with a smaller RETONLY setting.
So consider the following scenario. You backup your directories to MC1 by default, but you bind all files in the /audit/ directory to the MC2 management class
Apr 01 - run your first backup; the directories all get the MC1 management class and all the audit files get the MC2 management class.
Apr 02-06 - further backups of audit files, up to 6 backup versions retained, depending on how often the files change.
Apr 07 - change the AUDIT directory and it gets backed up again; the first version expires.
Apr 08-12 - more backups of audit files, up to 12 backup versions now retained.
Apr 13 - decide to recover the AUDIT directory back to Apr 04 and run a PIT restore through the GUI, but you can't see anything, as the directory backup from Apr 01 has expired - even though you know that backups of the files exist, as they have a 20 day retention!
The short term solution is to use the command line as this does not rely on the ability to display directories before it can display its files and subdirectories.
If you want to use the GUI then the long term strategy is to make sure you keep enough backup versions of your directories by assigning them to a suitable management class. Set one up with VEREXISTS=NOLIMIT, VERDELETED=NOLIMIT, and RETEXTRA='n', where 'n' is the number of days that you must be able to go back with the GUI to do a PIT restore. You could set RETEXTRA to NOLIMIT of course, but that could be expensive in terms of database usage.
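A sketch of the client side of this, assuming you have set up a hypothetical class DIRKEEP_MC with the parameters above: add this line to dsm.opt (or the include/exclude file)
dirmc dirkeep_mc
The DIRMC option binds all directory backups to the named management class, so the directory entries will survive as long as the file backups you need to see in the GUI.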

________________________________________
How do you recover files between 2 different clients?

The first thing to note is that the clients must both be the same platform. You cannot recover a file backed up from a UNIX box to an NT box, for example.
After that, you have two options: you can GRANT permissions to recover between servers, or you can fool TSM by changing the dsm.opt file. Say you want to restore client A's filespaces to client B.
Just go into the dsm.opt of clientB, and change the NODENAME parameter from ClientB to ClientA. You can then kick off a normal restore.
However, remember to make sure no backups of clientA run while you are busy, or the wrong data will get backed up. And, of course, remember to change the dsm.opt file back again when you are finished.
If that method looks too hairy, then the safe way is to grant access. Issue this command from the DSM prompt in serverA
SET Access Backup * Server_B
the recovery command is
res -fromnode=serverA serverA/filename serverB/filename etc...
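As a concrete sketch, with invented node and path names, recovering clientA's payroll directory to a new location on clientB would look like
dsmc restore -fromnode=clienta "\\clienta\d$\payroll\*" d:\recovered\ -subdir=yes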
________________________________________
Cross-client Windows restores
First, you need to set restore permissions as above
With Windows, start the client command line with start dsm -nodename=xxxxxx where xxxxxx is the name of the node you are recovering from, and you are logged onto the node you are recovering to. Windows seems to be very fussy about having the exact file space name. Typically, this will be '\\servername\disk', for example '\\CL12482\d$'. You can get the exact file space name by running a 'query filespace' from the TSM server. Technically, you don't need quotes in these examples, but I like to give Windows restores all the help I can.
You will be prompted for the TSM userid and password for the source server, before the restore will start.

________________________________________
How to recover a file with spaces in the name
Enclose the filepath in quotes. For example

restore "\\server name\c$\PROGRAM FILES\COMMON FILES\file name.ext"
This applies in any circumstance where you have to specify files by name.

________________________________________
What is the difference between a no query restore and a standard restore?
In a standard restore, the client queries the server for all objects that match the restore file specification. The server sends this information to the client, then the client sorts it so that tape mounts will be optimized. However, the time involved in getting the information from the server, then sorting it (before any data is actually restored), can be quite lengthy.
A 'No query' restore lets the TSM server do the work: the client sends the restore file specification to the server, the server figures out the optimal tape mount order, and then starts sending the restored data to the client. The server can do this faster, and thus the time it takes to start actually restoring data is reduced. For a given restore, TSM mounts each needed tape once and only once, and reads it from front to back.
The most visible benefit of no query restore is that data starts coming back from the server sooner than it does with 'classic' restore.
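You can see the difference from the client command line. An unrestricted restore like the first command below is eligible for the no query protocol; adding options such as -pick, -inactive, -fromdate or -todate (as in the second) forces the classic protocol, because the client then has to build and filter the candidate list itself
dsmc restore d:\mydir\* -subdir=yes
dsmc restore d:\mydir\* -subdir=yes -pick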

________________________________________
How do you exclude files from a restore?
It is possible to do this using an undocumented command. All the usual caveats for undocumented commands apply: test it first before you use it for real, and if it goes wrong, don't expect to get any support.
The command is
EXCLUDE.RESTORE file.name

________________________________________
How to restore the Windows System Object
See the Windows section for details of what Microsoft calls the System State, and TSM calls the System Object.
The command is simply

restore systemobject
It will restore ALL of the System objects which were backed up, but exactly what was backed up will depend on your Windows version and setup. See the TSM backup tips section for details.
The restore systemobject command is only valid for Windows 2000, XP, and Windows.NET.
If you just want to restore the Windows NT registry system object, use the command
restore registry entire
This command has a number of parameters which allow you to restore just part of the registry
To restore the active Windows NT eventlog, use the command
restore eventlog entire

________________________________________
Restoring a Windows 2003 server using TSM ASR
Recovery pre-requisites
The main physical requirement for BMR is that the new server must have the same number of disks as the original, and each disk must be the same size, or larger than the original disks.
To run the restore, you need
• An ASR diskette, as described in the Windows section
• a Windows CD that is at the same operating system level, and with the same service packs, as the operating system was at when you took the ASR backup
• a TSM client CD that is at the same level or higher as the one that was used for the backup. It must be 5.2.0 or above.
• A network connection that supports DHCP. (Dynamic Host Configuration Protocol is an Internet protocol for automating the configuration of computers that use TCP/IP)
• The TSM node and password for the original client
The recovery process
1. Boot the server using the windows install CD. You will need to reply to the prompt 'Press any key to boot from CD .. ' quite quickly
2. Once setup commences you will see a message at the bottom of the screen that says 'Press F2 to run Automated System Recovery (ASR)' Again, you need to press F2 quickly or the setup will continue
3. Insert the ASR diskette that you created with TSM into the floppy drive when you see the prompt 'Please insert the disk labeled Windows Automated System Recovery Disk' TSM will actually have labeled this disk TSMASR
4. Windows will then check and reformat the disk partitions and the boot volume as necessary to match the original server setup, then copy some files across.
5. When prompted by the message 'Insert the CD labeled TSMCLI into your CD-ROM drive', do just that. This will copy TSMCLI.EXE over.
6. You will then be prompted to mount the TSMASR diskette again, so Windows can copy over three TSMASR files.
7. The server will reboot, you will be prompted to remove the TSMASR diskette first.
8. You then need to put the windows install CD back into the CD drive to allow the install to continue.
9. ASR will install the TSM client, then ask you if you want to recover from a remote TSM server, or a local backupset, and ask for the TSM userid and password.
10. TSM will then recover the system disks from the TSM backup, the system will be rebooted, and the server will be back to its former state, except that you will have to run normal TSM restores for all the data and application drives.
Of course, all this depends on you having an up to date TSM ASR diskette. What if you didn't create one? If you are stuck, you could create an ASR diskette on another (working) machine following the instructions in the backup section. You will need to edit the tsmasr.opt and tsmasr.cmd files on the diskette to change the nodename from the working server to the one you want to recover.

________________________________________
Windows 2000/NT Bare Metal Recovery Procedure
I have used the procedure below to recover Windows 2000 servers, but I strongly recommend that you test and refine it to suit your environment before you have to use it for real.
1. Build a base W2K system at the same service pack level as the system you are restoring, and use the same hostname and IP address.
2. Create the data partition(s) and change the drive letters as necessary then reboot. This all assumes that you have all this information recorded and up to date.
3. Install TSM client and reboot.
4. Create a new folder in %system% called 'Sysfiles'. Copy hal.dll, kernel32.dll, ntdll.dll, ntoskrnl.exe, win32k.sys and winsrv.dll to this folder. Back them up using ntbackup.exe to a file named sysfiles.bkf. Copy %system%\system32\drivers\atapi.sys to the Sysfiles folder and rename it to the same name as the live server's RAID driver (e.g. pedge.sys). Back it up using ntbackup.exe to a file in the same folder called driver.bkf.
5. Use TSM to restore the system partition and system state with the exception of boot.ini. Use 'replace all' and DO NOT REBOOT when prompted at the end of the restore.
6. Using ntbackup.exe restore the contents of sysfiles.bkf to %system%\system32 and driver.bkf to %system%\system32\drivers, then reboot.
7. Boot from the W2K install CD and start the setup routine; select Install and then Repair - if Windows doesn't autodetect your SCSI or RAID card you'll need a diskette with the drivers and will need to watch for the 'press the F6 key' prompt when setup starts. Let the setup routine redetect hardware and reinstall the system files
8. Check the LAN connections in Network Places and note the name of the working one. Run Regedit and open the following key;
9. HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Network
10. Export a copy of this key to the Sysfiles folder. Open the first GUID entry under this key, which should contain further GUIDs for each network adapter's connection. The first sub-key GUID entry should be for the current functioning NIC's connection. The name is noted in the connection sub-key of each entry.
11. Delete all other GUID entries but do not delete the Connections key.
12. Bind and configure IP correctly, if any errors are encountered uninstall and re-install the NIC first. Re-install the TSM client and restore the application and user data.
________________________________________
Data Archiving
The Archive Command

Data archiving is intended to preserve a copy of a related set of files as they stood at a point in time for legal or compliance purposes. This set of files might consist of tax records, end-of-project reports or similar.
An Archive will typically be requested by the customer as a one-off process, it would not normally be a scheduled event. An Archive will usually be retained for several years.
If an Archive is required again, then normally the entire set of files is brought back to a new location, and the process is called 'Retrieve'.
An Archive is not the same as Backup, which typically involves copying an unrelated set of changed files to tape and retaining them for a relatively short time. Also a Backup does not affect the source data, whereas an Archive can delete the source data. An Archive is not the same as HSM migration, which involves moving older files off primary disk to cheaper storage. Migration is about managing disk space, Archive is about retaining data.

The TSM ARCHIVE command has the following options
ARCHIVE pathname\filename
Will simply archive a file
ARCHIVE pathname/* -deletefiles -subdir=yes
Will archive files in a directory including subdirectories then delete them from source
ARCHIVE -filelist=textfilename
You create a list of files to be archived and put them in a text file. The Archive command reads this list and archives the data off. The file list must include the full path name, so this can be used to archive selected files from different paths
ARCHIVE -filelist=textfilename -archmc=mc7yrs
-deletefiles -description="End of year data for inland revenue, requested by Colin Green"
Archive a list of files, bind the archive to a seven year management class (that you have previously set up with a seven year retention), delete the originals and give it a meaningful description (254 chars max)
ARCHIVE pathname.filename -v2Archive
Use the v2archive option to generate secondary description tables - see the performance section below
It is possible to archive data using the Web client GUI, but you get fewer options than with the command line.

The Retrieve Process
The following command line options are available to retrieve the data. The different options can be used in combination
RETRIEVE pathname\filename
Will simply retrieve an archived file to the original location. You will be prompted if the data already exists
RETRIEVE pathname/* newpathname/
Retrieve all files in a directory to a new location
RETRIEVE pathname/* -pick
Get a pick list of archived files from a specified directory. You can then select those files you want retrieved.
RETRIEVE -filelist=textfilename
Retrieve a list of files that are specified in a file
It is also possible to retrieve files with the GUI as shown below

Finding Archived files
To find individual archive runs you could use an SQL query
select NODE_NAME,ARCHIVE_DATE,CLASS_NAME,DESCRIPTION
from ARCHIVES
where NODE_NAME='node'
One easy way to find archived files is to use the TSM Web Client GUI as shown above. Point your browser to http://servername:1581 and select the retrieve option. This will list all archives for that server, and you can drill down into each archive to find individual files.
If you need to provide a list of archived files, then from the command line you can use the QUERY ARCHIVE command and pipe the output into a file for perusal by a user.
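For example (the path, description and output file name are hypothetical)
dsmc query archive "d:\yearend\*" -subdir=yes -description="End of year data*" > c:\temp\archive_list.txt
The -description option accepts a wildcard, which is handy when you only remember part of the description.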

Archive Performance Improvements
Tivoli introduced a couple of new archive commands in the April '06 update to TSM 5.2 and 5.3. These are really just intended for sites that run thousands of archive operations and as a result have storage and performance issues.
The GUI and Web clients use the archive description as the primary way to navigate to a specific archive, but as these descriptions are in text format and were held in the same primary archive table as the archived file path names, the search can take a long time. To speed up search performance some of these search items are also held in secondary description tables. Archives that are invoked from the Web Client or GUI always use the secondary tables. Command line archives can be forced to use the secondary tables if they are 'converted' by using a CONVERT ARCHIVE command.

CONVERT ARCHIVE
Use this command to convert archives run from the command line, forcing them to store filespace and description data in secondary tables to speed up searches. This is only appropriate if you run repeated archives over the same set of data, giving each archive a different description. If you use the command line for archives and retrieves, and do not use the description to identify archives, then do not convert the archives (to save on database size), and use the -v2Archive option with subsequent archive requests. Syntax:

CONVERT ARCHIVE nodename

UPDATE ARCHIVE
Use this command to save on database space if your database has large numbers of archive entries, where large means 100,000 or more. This command should not be used if anyone uses, or may use the Web Client or GUI to work with archived files.
UPDate ARCHIve node_name -SHOWstats -RESETDescriptions -DELETEDirs
SHOWstats
statistics include the number of directory and file entries, the number of entries for directories with the same path specification but different descriptions, and whether the node is converted.
RESETDescriptions
resets the description field to the same description for all archive entries for a node. This means that every archive for a given directory will belong to the same package. Once the descriptions are changed, they cannot be restored.
DELETEDirs
deletes all archive directory entries for the node. This means that the original access permissions cannot be provided when files are retrieved. This might not be as important as saving the database space. Once the directory entries are removed, they cannot be restored.
UNDO ARCHCONVERSION
This command empties out the secondary description tables. It does not lose the archive directory or file data as that is held in the primary archive table entries. You can either use this command on its own to free up the database space used by the secondary description tables, or you can follow it with the CONVERT ARCHIVE command to audit and refresh the secondary description tables. The syntax is
UNDO ARCHConversion node_name

TSM Scripting Hints
Send offsite and recall Copy tapes
Free up tapes which are almost empty
Multiple macro commands
Exporting scripts
Updating Netware Client Passwords
________________________________________
Send offsite and recall Copy tapes
To move tapes offsite schedule this command daily
update volume * access=offsite location='offsite_location' wherestgpool=copy_pool whereaccess=readwrite wherestatus=full
{just send out full tapes}
To recall empty tapes back onsite schedule this command weekly
update volume * access=readwrite wherestgpool=copy_pool whereaccess=offsite wherestatus=empty
{recall empty tapes}
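A sketch of how the daily offsite command could be wrapped in an administrative schedule; the schedule name, location string and start time are illustrative
define schedule send_offsite type=administrative active=yes starttime=06:00 period=1 perunits=days cmd="update volume * access=offsite location='VAULT' wherestgpool=copy_pool whereaccess=readwrite wherestatus=full"
The weekly recall command can be scheduled the same way with perunits=weeks.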

________________________________________
Free up tapes which are almost empty
The script is
for volume in
$(dsmadmc -id=adminname -pass=adminpass -outfile
"select 'MARKER', volume_name from volumes
where pct_utilized < 10"
| grep '^MARKER' | cut -b 12-) ;
do dsmadmc -id=adminname -pass=adminpass -outfile
"move data ${volume} options..." ; done

Explanation: Use an SQL query to generate a list of almost empty tapes
SELECT 'MARKER', volume_name, stgpool_name, pct_utilized from volumes
WHERE PCT_UTILIZED < 10

will output something like :

MARKER  VOLUME_NAME  STGPOOL_NAME  PCT_UTILIZED
------  -----------  ------------  ------------
MARKER  A40012       pool_name              8.1
MARKER  A40015       pool_name              2.0

Use the grep command to extract lines we want
grep 'MARKER'

gives

MARKER A40012
MARKER A40015

use cut to extract the tape volsers
cut -b 12-

gives

A40012
A40015


________________________________________
Multiple macro commands
If you specify several commands in a macro, sometimes the macro does not seem to get past the first command. Try using 'commit' after each statement:

q vol access=destroyed
commit
q vol access=unavail
commit
________________________________________
Exporting scripts
The query script command will list out the names of all your scripts and their descriptions like this
Name: QUERY_BACKUPS
Description: Query Daily Backups
Managing profile:
If you want to export your scripts, say to build another server, then you need to know the details. The command to use is
Q script f=macro
which will produce output in a format that can be executed as a macro. If you pipe that command to a file, the file can be copied to another server for execution.
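A sketch of capturing that output from an operating system command line, with invented admin credentials and file name
dsmadmc -id=admin -password=secret -outfile=allscripts.mac "query script * format=macro"
The resulting allscripts.mac can then be executed on the new server with the macro command.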

________________________________________
TSM LAN free backups
Why would you want to use LAN free? Generally speaking, backing up and restoring over a SAN is much faster than over a LAN, but that really depends on how fast those two items are in your site. Another benefit for traditional TSM is that LAN free avoids clogging up backup disk pools with large backups (traditional TSM uses a front end disk pool that is offloaded to tape each day). LAN free is also very suitable for big databases. However many TSM installations do not use tape, or they may use active copy storage pools, so LAN free is not an automatic choice; you need to study the costs and the benefits to see if it will fit into your site. Also, tape drives are a finite resource and too much LAN free will quickly use them up.
Assuming you decide to install LAN free, this is how you would do it on AIX.
Connecting the Tape Drives
First you need to organise the cabling for your tape drives. If your backups are critical you will need two fiber cards installed in each client for resilience. Then you need them cabled up to your SAN switches and zoned in so your client can access the tape drives. It is possible to rename your tape drives to something more meaningful than the rmtnn names that UNIX provides by default, but to do that you need to install an Atape driver, supplied free by IBM. Once you do this, you rename the drives with smitty, or with the chdev command as below. It is best to give your drives a name that contains the WWN, so that they are unique. When you rename them, several files will be created in the /dev directory. The device parameter in your path name in TSM must match these names. The chdev command in full is:
chdev -l rmt0 -a new_name=T_AA02450

Defining the Storage Agent
Define at the TSM Server
Use a command like the one below to define a storage agent to the TSM server. If you want to look at existing storage agents to see how they are defined, you will find them by using the 'q server' command or the 'other servers' tab in the GUI.
define server nnnnn serverpassword=password hla=x.xx.xx.xx lla=xxxx
• nnnnn is the name of the storage agent. Each agent name needs to be unique, so come up with a good naming standard. A simple standard that works is nodename_agent.
• password is the password for the storage agent and must match the password supplied in the client definition below
• xx.xx.xx.xx is the ip address (or DNS entry) of the client machine
• xxxx is the port you use in the dsm.sys for the storage agent. The example below uses 1510.
Define the Tape Paths
To define the tape paths to the TSM server, use commands like
define path agent_name tape_name srctype=server desttype=drive library=LIBNAME device=/dev/tape_name
If you use alternate pathing, add a '0' onto the end of your tape name in the device parameter above.

Define at the Client
The Storage Agent is usually found in /opt/tivoli/tsm/StorageAgent/bin/ for TSM 6.1 clients and above, and /usr/tivoli/tsm/StorageAgent/bin/ for TSM 5.5 clients and below. A Storage Agent is basically just a cut-down version of a TSM server with a reduced command set. It needs a dsmsta.opt options file just like a real server, and typical values could be as follows.
Commmethod tcpip
Devconfig devconfig.sta
txngroupmax 512
enable3590library yes
commtimeout 7200
idletimeout 120
tcpport 1510
You need to work out what timeout parameters are best for your site. Commtimeout is in seconds and idletimeout in minutes, so they are both set here to two hours. If you are backing up Oracle databases with incremental RMAN, then Oracle can spend some time searching its catalog to work out what needs to be backed up. Without high timeout values the backup could fail, so the numbers above could be reasonable. However this does mean that if you hit a problem you are locking out tape drives for a long time, so smaller values could be more appropriate for standard backups.
Once you have an options file you create the storage agent with the command
dsmsta setstorageserver myname=nnnn mypassword=pppp myhladdress=nnn
servername=TSM1 serverpassword=pppp hla=nnn lla=mmmm
This will create a file called devconfig.sta that contains the above details, with the passwords encrypted. The parameters in the command are as follows (a filled-in example follows the list)
• myname is the name you call your storage agent, the same one you used to define the agent to the TSM server above.
• mypassword is the password for the storage agent and must match the password used when you defined the agent to the server.
• myhladdress is the TCPIP address of the client that is hosting this storage agent
• servername is the name of the TSM server
• serverpassword is the password for the TSM server
• hla is the tcpip address that you use for comms. with the TSM server, the same one that you use in the dsm.sys file. This address is not used on initial start up, so if you get this wrong, LAN free will appear to be working fine and the backups will work. However the tape dismount will fail and the tape drive will go into dismount retry failure mode. The only way to free the drive up is to stop the storage agent on the client.
• lla is the port name that you use to access the TSM server, the tcpport parameter in the dsm.sys file
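Putting those parameters together, a filled-in example (all the names, passwords and addresses are invented) would look like
dsmsta setstorageserver myname=myhost_agent mypassword=agentpw myhladdress=10.1.2.30 servername=TSM1 serverpassword=serverpw hla=10.1.2.9 lla=1500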
Next you need to start your storage agent, which you can do by simply typing dsmsta from the command line. To stop the storage agent, you can log into it using 'dsmadmc -se=storageagentname' then typing halt from the command line. Alternatively, just use the UNIX kill command.

Changes to dsm.sys
To use LAN free backups you need to make a few changes to the dsm.sys file. First you need to add a stanza for the storage agent like this, which is used to connect to the storage agent with the dsmadmc command. The node name must not be the one you use for normal backups.
servername nodename_agent
tcpserveraddress 127.0.0.1
tcpport 1510
commmethod tcpip
The tcpserveraddress could be the address of your real server, but it is shown here as the standard IP address of the localhost, as it is just used for internal communication.
The tcpport number does not have to be 1510, but it must match the value you used in dsmsta.opt. I avoid 1500 as that is the server default and like to use 1501 upwards for the webports, so it seems reasonable to standardise all the storage agents at 1510. You must make sure that you do not conflict with any address used by other software on your machine so this standard might not be suitable for you.
Next you need to add some lines to your existing backup stanzas like this
servername TSM1
commmethod tcpip
webports 1501,0
... other parms ...
lanfreecommmethod tcpip
lanfreetcpport 1510
enablelanfree yes
etc
The lanfreetcpport number must match the tcpport in dsmsta.opt and the tcpport in the storage agent stanza
Management Class
Finally you need to create a management class that writes direct to tape. The backup section discusses how to do this, and a sketch follows.
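As a sketch, assuming the STANDARD domain and policy set and a hypothetical tape pool LF_TAPEPOOL that is reachable over the SAN paths defined earlier
define mgmtclass standard standard lanfree_mc
define copygroup standard standard lanfree_mc type=backup destination=lf_tapepool
activate policyset standard standard
Then bind the data you want sent LAN free to that class with an include statement in the client option file, for example include /oradata/.../* lanfree_mc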

TSM Database and Recovery Log
TSM Database version 6.1 and later
Basic database structure
TSM 6.1 uses a DB2 relational database. This means it can be much bigger than the legacy database, does not need database audits and will automatically re-organise itself if required. Software database mirroring is no longer supported.
One thing to note is that you cannot have any other DB2 applications running on the same server as the one hosting your TSM database, though you can have multiple TSM databases on one server.
Several of the commands we used to use to manage the database are different. Some of these are :-
New Command       Effect                                                  Old Command
EXTEND DBSPACE    Increase the size of the database                       DEFINE DBVOLUME then EXTEND DB
QUERY DBSPACE     Check the database size                                 QUERY DBVOLUME
SET DBRECOVERY    Define the database backup device class; must be run    DEFINE DBBACKUPTRIGGER
                  before the first backup
SET DBREPORTMODE  Decide how much diagnostic information to report;       None
                  options are NONE, FULL or PARTIAL
Database sizing and tuning
The database does not consist of a collection of files or 'volumes' like the legacy database. Instead, the database can exist in up to 128 directories, or 'containers' to use the correct DB2 term. The data is striped evenly across the directories and, unlike the legacy DB, they do not require an initial format before they can be used. The Q DBSPACE output below shows a database striped over 3 containers.
Location               Total Size of File System (MB)   Space Used on File System (MB)   Free Space Available (MB)
/tsm/tivoli/dbdir001   102,144                          6,919.22                         98,224
/tsm/tivoli/dbdir002   102,144                          6,919.22                         98,224
/tsm/tivoli/dbdir003   102,144                          6,919.22                         98,224
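To grow the database you just add another container directory; for example, with a hypothetical fourth file system
extend dbspace /tsm/tivoli/dbdir004
The new directory then becomes available to the database without any formatting.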
The database can be sized anywhere between 2.2GB and 1TB. The DB2 database will be between 35 and 50% bigger than the equivalent legacy database, partly because it holds sort space for SQL queries. The DB2 database is largely self tuning, so there is no requirement for DB2 tuning skills. A new parameter, DBMEMPERCENT, replaces the old BUFFPOOLSIZE. This set of buffers contains much more data than the old buffer, so the recommendation is to set its size to unlimited. In fact, TSM/DB2 will try to change it to unlimited on startup.
Two other legacy features are not required now; database audits and indexed tables.
The database uses DB2 relational consistency rules to prevent incorrect data from entering, and is self auditing. The database will also run automatic 'runstats' from time to time. This is a DB2 feature that optimises storage paths through the database to improve performance.
The database also uses relational indices, so it does not require special index tables to speed up SQL queries.

Recovery log sizing and tuning
TSM 6.1 has two recovery logs.
The Active recovery log contains updates that have not been committed to disk yet and is used for roll-forward or roll-back in case of problems. Once a transaction is committed, the data is moved to the archive log. The default size for the Active log is 2GB and the size can be increased in increments of 512MB right up to 128GB.
The Archive log contains committed transaction data and is used for PIT recovery of the database. The Archive log is cleared out by a full database backup. However it retains all data updates applied right back to the second last backup, so you need to size your archive log with that in mind.
The log files form part of the TSM database, and unlike the legacy TSM database there is no need to create and format log volumes. The logmode is equivalent to legacy roll-forward. In DB2 terms, these are archive logs, not circular logs. This means that the log files can fill up, so log file management is still required. You can specify a failover log for the Archive log to help prevent this, but the Active log cannot failover and the size is fixed between 2GB and 128GB, so don't allocate all the space that you have available for the Active log, keep some in reserve for emergencies.
If the Active log fills up and the server stops, the process to get your TSM server up again is:
1. DSMSERV DISPLAY LOG - check the current log status
2. Update the Active log size parameter in dsmserv.opt (see the sketch below)
3. Start the server up
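The relevant dsmserv.opt entries look something like this; the size is in MB and the directory names are hypothetical
ACTIVELOGSIZE 8192
ACTIVELOGDIRECTORY /tsm/activelog
ARCHLOGDIRECTORY /tsm/archlog
ARCHFAILOVERLOGDIRECTORY /tsm/archfail
ARCHFAILOVERLOGDIRECTORY is the failover log for the Archive log mentioned above.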
________________________________________
TSM Database version 5.5 and earlier

Recovery log processing
Database Defragmentation
Extending the TSM database under AIX
Formatting the TSM database and log
Auditing the TSM database
Database size and disk setup
Database and log Mirroring
Disk Storage Pools
Recovery log processing
The TSM database is quite sophisticated, and uses a transaction log, called the recovery log. Multiple updates are grouped together into 'transactions', which have to be applied to the database as a unit. If the server crashes, say, and the updates in a transaction have not been applied, then the partial updates must be backed out using the log records. This all or nothing approach protects database integrity during updates.
If the server cannot update the recovery log because it is full, then the server crashes. So it's worth knowing what makes the log fill up, and how to avoid it.
The log has two pointers, a 'head' pointer and a 'tail'. The head marks the position where the next update can take place, new updates are added at the head. The tail marks the position where the oldest transaction is still processing, and also where the last update can take place. Tail movement depends on how the 'logmode' is set up. If you define logmode=rollforward, then the tail will only move when a database backup is run. If you use logmode=normal, then the tail moves when the oldest transaction completes. When the pointers reach the end of the file, they start again from the beginning. Consider the logfile as being a circle, with the head and tail pointers being points on the circumference. The command Q STATUS will tell you which logmode you are using.
The tail is then 'pinned' by the oldest in-flight transaction, and if this is not cleared before the head catches up, then the file is full. Tivoli provided a new command with TSM 4.2.2.1, 'show logpinned', which will identify the transaction which is pinning the log.
The log file usually fills up due to a combination of two events. An old transaction hangs around and 'pins' the tail, while another process is causing the head to move rapidly, so it catches up.
Long running transactions can be caused by very large database backups, or smaller backups running over slow lines. A process which is trying to recover from a tape I/O error can also hang around for a long time.
Rapid head movement is caused by something which is doing large quantities of database updates, very fast. Expire Inventory is a good example of this. There are ways to manage this
• Don't schedule inventory expiration when large backups are running
• Make the log as large as possible - the maximum is about 13GB at the moment - but leave a bit of free space so you can extend the log if the server crashes.
• Consider clearing out your log before the backups start, by temporarily reducing the dbbackup trigger. UPDATE DBB LOGF=20 should force a backup. However, remember that if you are running with logmode=rollforward, and the tail is pinned, then the database backup will not clear out the log.
• Consider running with a smaller value of dbbackuptrigger during the backup run, to help prevent the log from filling. However, this can cause lots of backups to be triggered, so use with caution.
• Monitor the log utilisation, and alert support staff if the log exceeds, say, 80%. The support staff then need to look for a process which is holding the tail and cancel it, or look for a process which is rapidly filling up the log and cancel that. Or, to be on the safe side, cancel them both.
• TXNGroupmax (maximum number of files sent to the server in a single transaction) and TXNBytelimit (total number of bytes in a single transaction) are usually set high to speed up backup performance. If you are getting problems with your log filling up, consider reducing these to force more frequent commit points, as sketched below.
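As a sketch of where these live: TXNGROUPMAX is a server option in dsmserv.opt, and TXNBYTELIMIT is a client option in dsm.opt or dsm.sys. Reduced values like these (illustrative only) force more frequent commits at the cost of some backup throughput
TXNGROUPMAX 64
TXNBYTELIMIT 2048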
Recovery log processing was enhanced in TSM 4.2. If the DB Backup Trigger is set correctly, and the LOGMODE is ROLLFORWARD, then a database backup will start when the log reaches 100% full. If the Recovery log hits 100%, then TSM will stop all processes except the database backup. When the backup completes, TSM issues the message
ANR2104I Activity log process restarted - recovered from an insufficient space condition in the Log or Database.
This should help us avoid some difficult recovery situations.

________________________________________
Database Defragmentation
This contentious issue applies to legacy databases only. The legacy TSM Server database has a b-tree format, and grows sequentially from beginning to end. When file entries are deleted by expiration processing or file space/volume delete processing, this leaves spaces or holes in the database. These may be re-used later when new information is added, but they mean that the TSM database is using more space than it needs to. The only way you can compress the database so that the 'holes' left by deleted pages are not present is to use the database unload/reload utility.
The problem is that while the dump takes about an hour, the load utility can take several hours. Does it make a difference? I have seen performance improve after defragmenting a database, and I've also seen an unload/reload make performance worse. A defrag will reduce the physical size of your database.
The Tivoli team supplied a new command with TSM 5.3, 'ESTIMATE DBREORGSTATS', for you to check what an unload/reload would achieve. This will estimate the amount of space that would be recovered by an unload/reload.
For older releases of TSM, use the QUERY DB output to see if you need to defrag your TSM DB.

Available Assigned Maximum Maximum Page Total Used Pct Max.
Space Capacity Extension Reduction Size Usable Pages Util Pct
(MB) (MB) (MB) (MB) (bytes) Pages Util
--------- -------- --------- --------- ------- ---------- --------- ----- -----
50,208 49,004 1,204 9,412 4,096 12,545,024 9,544,454 76.1 76.1

Here a 49GB database can be reduced by 9.4GB = 19%, but it is only 76% used, so 5% could be reclaimed by defragging. Some people claim that TSM tries to allocate pages in a way that leaves you with as good as possible performance, and defragging the database will degrade performance. It is also possible that after a defrag the database will quickly become fragmented again, as it inserts data into the tree. The following formula can be used to see how much space could be reclaimed by an unload/reload.
SELECT CAST((100 - (CAST(MAX_REDUCTION_MB AS FLOAT) * 256 ) /
(CAST(USABLE_PAGES AS FLOAT) - CAST(USED_PAGES AS FLOAT) ) * 100) AS
DECIMAL(4,2)) AS PERCENT_FRAG FROM DB
A high PERCENT_FRAG value can indicate problems. If you think your database needs a defrag, then if possible, take a copy and try that first. That will give you an indication of how much time is needed for the load.

________________________________________
Extending the TSM database under AIX
Create a new file system in AIX using SMITTY
make LV
make FS on existing LV
mount new-filesystem
THEN in TSM

dsmadmc ... define dbv /new-filesystem/filename
dsmadmc ... extend db
If you use incremental database backups, then remember that after an EXTEND DB the next DB backup must be a full backup.

________________________________________
Formatting the TSM database and log
Legacy TSM database files and log files have to be formatted before they can be used. There are two different commands for this, and it is vitally important that you know the difference. If you want to add a file to the database or recovery log, then you use the DSMFMT command to format the file. DSMSERV FORMAT looks similar, but that command will format the whole recovery log and database. So just to make things clear: DSMSERV FORMAT will wipe all your existing database and log files, so if you want to make a complete fresh start, that's what you use. DSMFMT will just format the file that you specify. The syntax of DSMFMT is
dsmfmt -m -log tsmlogvol7 5
which will format a 5MB log volume called tsmlogvol7. Size options are 'k', 'm' or 'g', and data type options are 'db', 'log' or 'data'.

________________________________________
Auditing the TSM database
The Audit process only applies to legacy TSM databases.
Richard Sims has correctly pointed out that a database audit with FIX=YES is a dangerous procedure. "Correcting database problems without TSM Support direction can result in worse problems, including data loss. Structural problems and inconsistencies in any database system can be much more complex than a vanilla utility can properly deal with. If one has a reason to believe that their TSM database has problems, they need to contact TSM Support for assistance in dealing with them, rather than attempt amateur surgery. IBM repeatedly advises customers NOT to attempt to fix database problems themselves".
I'd also suggest that if you run an audit, you always make sure you have a full database backup available first.
Database Audits are used to fix inconsistency problems between the database and its storage components. A full database audit can run for several hours, but it is possible to run smaller audits on parts of the database. As a general rule of thumb, a full database audit takes about 3 hours per million pages, and a 4 GB utilised database holds about a million pages. The actual times will mostly depend on the processing power of your server. An audit will write a lot of log records, so if you normally run with your recovery log in 'ROLLFORWARD' mode it is advisable to put the log into 'NORMAL' mode before running an audit, then put it back into 'ROLLFORWARD' mode when the audit completes.
dsmserv auditdb fix=yes admin detail=yes
Is a very quick check of the admin data
dsmserv auditdb fix=yes archstorage detail=yes
will audit the archive storage, and runs for 1-2 hours depending on your database size
dsmserv auditdb fix=yes diskstorage detail=yes
will audit the disk storage pools, and takes about 30 mins, depending on the size of the database, and how full the disk pools are. Best done when all the data is migrated out to tape.
dsmserv auditdb fix=yes inventory detail=yes
This is the long running one, 8-12 hours.
The following information was supplied by Maureen O'Connor of Fiserv Technology in April 2007. Maureen has provided some excellent detail on how to estimate how long an audit will take, and how to run audits against multiple TSM servers on one AIX server.
Running an audit of the TSM database can be a very long and time-consuming process, and it is not well documented by IBM, so estimations can be difficult to make.
Generally speaking, the best way to run the audit is to run it against the whole database, not just a piece of it, but if the db is very large, this can mean an extensive outage, so it should be planned well in advance.

The audit follows 33 steps:
1. ANR4726I The ICC support module has been loaded.
2. ANR0990I Server restart-recovery in progress.
3. ANR0200I Recovery log assigned capacity is 1000 megabytes.
4. ANR0201I Database assigned capacity is 2500 megabytes.
5. ANR0306I Recovery log volume mount in progress.
6. ANR0353I Recovery log analysis pass in progress.
7. ANR0354I Recovery log redo pass in progress.
8. ANR0355I Recovery log undo pass in progress.
9. ANR0352I Transaction recovery complete.
10. ANR4140I AUDITDB: Database audit process started.
11. ANR4075I AUDITDB: Auditing policy definitions.
12. ANR4040I AUDITDB: Auditing client node and administrator definitions.
13. ANR4135I AUDITDB: Auditing central scheduler definitions.
14. ANR3470I AUDITDB: Auditing enterprise configuration definitions.
15. ANR2833I AUDITDB: Auditing license definitions.
16. ANR4136I AUDITDB: Auditing server inventory.
17. ANR4138I AUDITDB: Auditing inventory backup objects.
18. ANR4137I AUDITDB: Auditing inventory file spaces.
19. ANR2761I AUDITDB: auditing inventory virtual file space mappings.
20. ANR4307I AUDITDB: Auditing inventory external space-managed objects.
21. ANR4310I AUDITDB: Auditing inventory space-managed objects.
22. ANR4139I AUDITDB: Auditing inventory archive objects.
23. ANR4230I AUDITDB: Auditing data storage definitions.
24. ANR4264I AUDITDB: Auditing file information.
25. ANR4265I AUDITDB: Auditing disk file information.
26. ANR4266I AUDITDB: Auditing sequential file information.
27. ANR4256I AUDITDB: Auditing data storage definitions for disk volumes.
28. ANR4263I AUDITDB: Auditing data storage definitions for sequential volumes.
29. ANR6646I AUDITDB: Auditing disaster recovery manager definitions.
30. ANR4210I AUDITDB: Auditing physical volume repository definitions.
31. ANR4446I AUDITDB: Auditing address definitions.
32. ANR4141I AUDITDB: Database audit process completed.
33. ANR4134I AUDITDB: Processed 187 entries in database tables and 255998 blocks in bit vectors. Elapsed time is 0:00:10.
Each step is called based on the architecture; the DSMSERV utility runs several concurrently, 5-10 at a time, returning output as each step completes and picking up the next step in order. Steps 1-9 finish almost immediately. Steps 10-16 run next and take slightly longer; these follow definitions in order of creation. When Step 17 begins, it triggers Step 33, and depending on how many entries there are in the database, the output from Step 33 will appear mixed in with the output from Steps 18-32. Step 33 reviews all the client data in the database, and is the longest running part of the audit process.
Typical output from Step 33 (from a large database) will look like this:
ANR4134I AUDITDB: Processed 8260728 entries in database tables and 0 blocks in bit vectors. Elapsed time is 1:05:00.
ANR4134I AUDITDB: Processed 9035641 entries in database tables and 0 blocks in bit vectors. Elapsed time is 1:10:00.
ANR4134I AUDITDB: Processed 9812999 entries in database tables and 0 blocks in bit vectors. Elapsed time is 1:15:00.
ANR4134I AUDITDB: Processed 10663992 entries in database tables and 0 blocks in bit vectors. Elapsed time is 1:20:00.
ANR4134I AUDITDB: Processed 11677212 entries in database tables and 0 blocks in bit vectors. Elapsed time is 1:25:00.
ANR4134I AUDITDB: Processed 12014759 entries in database tables and 0 blocks in bit vectors. Elapsed time is 1:30:00.
Note this output refers to 'entries'. Entries are not a standard reference in TSM; this is a parsed view of data files, part of the occupancy. To estimate how many entries the audit will process, run this formula on a command line within TSM:
select sum(num_files)*3 from occupancy
The '3' refers to the three pieces of a file: the entry, a header for the entry, and an active/inactive flag. Remember that this is only an estimate; the reason for running the audit is possible corruption, so there may be pieces missing or mis-filed.
Entries are read at anywhere from 500,000 to 1 million every five minutes, so the output from this formula can be used to estimate how long the audit will take to complete.
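As a rough worked example, if the select returns 90 million entries, and your server reads them at the midpoint rate of 750,000 entries every five minutes (9 million an hour), then the inventory audit should take in the order of 90 / 9 = 10 hours. The rate your server actually achieves depends on its processor and disk speed, so treat the result as a planning figure rather than a promise.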
Audits can be run on pieces of the database instead of the whole - a specific storage pool or the administrative portion - this can be a considerable time-saver, but if it is unknown what part of the database is corrupt, this may not be a worthwhile option.
To run an audit, the TSM server instance must be down. If there are multiple TSM instances on a server, the DSMSERV executable must be in the primary server directory, but if the audit is running on a secondary instance, for example, parameters must be passed to the operating system so the utility knows where to look for the database:
AIX# export DSMSERV_DIR=/usr/tivoli/tsm/server/bin
AIX# export DSMSERV_CONFIG=/usr/tivoli/tsm/server/bin/dsmserv.opt
To run an audit just on the administrative portion of the database (the fastest, 10-15 minutes), start the utility this way:
AIX# at now
dsmserv auditdb fix=yes admin detail=yes > /tmp/tsmauditadmin.log
[ctl-D]
The process will run in the background, and a log will be kept; this log can be followed with the tail -f command by multiple users to track the progress.
To run the audit on the archived data (1-2 hours, depending on size of archives), enter this:
dsmserv auditdb fix=yes archstorage detail=yes >/tmp/tsmauditarchive.log
To run the audit on the diskpool (very fast if all data is migrated), enter this:
dsmserv auditdb fix=yes diskstorage detail=yes > /tmp/tsmauditdisk.log
To run on the client data only, not including the archives (still the longest running), enter this:
dsmserv auditdb fix=yes inventory detail=yes > /tmp/tsmauditdata.log
While the inventory audit can be run separately, it is by far the longest piece, so running it on its own saves little time over auditing the whole database.
If any data is found to be damaged, location messages as well as the fix (usually a deletion) will output to the log as follows:
ANR1777I afaudit.c(967: Object 0.62882489 is WINDOWSINFDSUP.PNF for node
(257), filespace \c$ (1).
ANR1777I afaudit.c(967: Object 0.62882490 is WINDOWSINFDSUPT.PNF for node
(257), filespace \c$ (1).
ANR1777I afaudit.c(967: Object 0.62882491 is WINDOWSINFDVD.PNF for node
(257), filespace \c$ (1).
ANR4303E AUDITDB: Inventory references for object(0.62882489) deleted.
ANR4303E AUDITDB: Inventory references for object(0.62882490) deleted.
ANR4303E AUDITDB: Inventory references for object(0.62882491) deleted.
Be sure sufficient outage time is scheduled. Once an audit begins, it is not good practice to halt the process, because the current location of the audit is not truly known - a data file could be open, and halting may actually cause further corruption.

________________________________________
Legacy Database size and disk setup
The TSM database is critical to open systems backup and recovery. It needs to be 100% available, as without it it is impossible to recover files. The 'incremental forever' philosophy behind TSM means that it is impossible to build a list of files needed to recover a server without the TSM database. If the TSM database setup is not designed correctly then the database will perform badly, and this will affect your ability to fit backups within the overnight window.
TSM performance is very much dependent on the size of the database. TSM performance suffers if a database becomes too large, but there are no exact rules on how big too large is. The maximum possible size for a TSM database is 530GB. IBM recommend 120 GB as a general rule, with the caveat that 'when expiration, database restores, and other Tivoli Storage Manager admin processes take too long and client restores become too slow, it is too big'. Database backup and Expire Inventory are both CPU intensive processes that can be used to indicate server performance problems in general. The only sensible answer to 'how big should a TSM database be?' is to let your database grow until these processes start to become an issue. Expire Inventory should really run within 12 hours and should be processing 3 million pages an hour or more. Database backups should run in 30 minutes and process 6 million pages per hour or more, but these are just general rules-of-thumb. The actual size will depend on how fast your hosting server is, how good your disks are and what level of service you need to provide.
A TSM Database consists of a number of files, called 'disks'. As TSM will schedule one concurrent operation for each database disk it makes sense to allocate a lot of small disks, rather than a few large ones. A disk file size of 2 GB seems to be about right (The maximum possible size for a disk volume is 8 TB). IBM recommends that these database disk files be spread over as many physical disks as possible. This makes sense for low or mid tier disk subsystems, as this means that multiple disk heads can be seeking, reading, and writing simultaneously, but as high tier subsystems perform most of their I/O in cache this is less of an issue.
Most operating systems allow you to stripe files over logical and physical disks, or partitions, and recommend that this be used for large performance critical files. It is very difficult to get any kind of consensus from the TSM user community on the benefits of disk striping. For example to quote two users:-
USERA: 250GB database on a high tier EMC DMX disk subsystem. Disk striping was introduced and database backup time was reduced by more than half.
USERB: 80GB database striped on a mid-tier IBM FAStT subsystem. Striping was removed and the database converted to RAID5, with no impact on database backup times, expire inventory run times or client backup times.
TSM will allocate a default database and log file during installation, on AIX usually in the server installation directory /usr/tivoli/tsm/server/bin. These default files should be deleted and re-allocated to your strategic size and location.
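As an illustrative sketch of that reallocation (the paths and the 2 GB size are examples, not recommendations for your site), you would format new database volumes spread over several filesystems, extend the database into them, then delete the default volume:

define dbvolume /tsmdb1/dbvol01.dsm formatsize=2048 wait=yes
define dbvolume /tsmdb2/dbvol02.dsm formatsize=2048 wait=yes
extend db 4096
delete dbvolume /usr/tivoli/tsm/server/bin/db.dsm

The same pattern applies to the recovery log, using define logvolume, extend log and delete logvolume.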

________________________________________
Database and log Mirroring
There are three levels of mirroring, Hardware controlled, Operating Systems controlled and TSM controlled.
Mirroring protects the database from disk failure, and also from subsystem or site failure if the mirroring is between subsystems or sites. Mirroring also offers some protection from system failure, as the chance that at least one of the mirror writes was successful is much higher. If TSM mirroring detects that a partial write has occurred, a mirror volume can be used to construct a valid image of the missing pages. TSM mirroring can complement hardware mirroring. It is best to mirror both the database and the recovery log to optimise availability and recoverability.
If you are using automatic database or logfile expansion with mirroring, then this will place both the primary file and the mirrored file in the same directory, as only one directory path can be specified. This means that the primary file and mirrored file could end up on the same disk, so they will need to be separated.
This sounds obvious, but the mirrors need to be on different disks. It is possible to place them on the same disk, but that would be pretty pointless. It is also possible to mirror three ways as well as two ways; with three-way mirroring you get three copies of the data.
Hardware mirroring (RAID1)
Most disk subsystems support RAID1 mirroring, which is expensive as it needs twice as much disk, and will not detect logical errors in the data. All data is mirrored even if it is corrupt.
Operating System Mirroring
IBM state that disk striping is suitable for large sequential-access files that need high performance. AIX supports RAID0, RAID1 and RAID10. RAID0 is not really a good idea, as if a logical volume is spread over five physical volumes, then the logical volume is five times more likely to be affected by a disk crash, and if one disk crashes, the whole file is lost. RAID1 is straight disk mirroring with two stripes and requires twice as much disk. RAID10 combines striping and mirroring, and also uses twice as much disk.
If AIX is mirroring raw logical volumes it is possible for it to overwrite some TSM control information, as they both write to the same user area on a disk. The impact would be that TSM would be unable to vary volumes online.
TSM mirroring
Software mirroring just applies to the legacy database. If TSM is managing the mirror and it detects corrupt data during a write, it will not write the corrupt data to the second copy. TSM can then use the good copy to fix the corrupt mirror. TSM also mirrors at transaction level, while hardware mirrors at I/O level. Hardware will always mirror every I/O, but TSM will only mirror complete transactions, which also protects the mirror from corruption.
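A minimal sketch of setting up two-way TSM mirroring follows; the volume names are invented, and each copy should sit on a different physical disk:

define dbcopy /tsmdb1/dbvol01.dsm /tsmmir1/dbvol01.dsm
define logcopy /tsmlog1/logvol01.dsm /tsmmir1/logvol01.dsm

TSM formats the copy volume and synchronises it with the primary; q dbvolume and q logvolume will show the state of the mirrors.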

________________________________________
Disk Storage Pools
TSM will only perform one concurrent IO operation to every storage pool volume, so it is better to have a number of smaller volumes than a single large volume in your disk storage pools. Also, it is easier to relocate a small volume to a different disk pool if space requirements change. However, every volume will use one TSM processing thread, and the TSM server will crash if too many threads are allocated.
The normal process is to initially write backup data to disk, then move it off to tape. It is possible to copy the data to tape but not delete it from the disk pool, if the disk cache setting is set to 'yes'. The TSM server would then use the disk data for restores and delete it as space is needed for fresh backups. This would speed up recovery, but slow down backups as the TSM server has to do extra work clearing out data, and would also make the TSM database bigger as TSM needs to store two locations for recently backed up data. It is your choice, faster restores or slower backups and a bigger database.
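For illustration, a disk pool built from several small cached volumes might be defined like this (the pool, volume and next-pool names are made up):

define stgpool diskpool disk cache=yes nextstgpool=tapepool
define volume diskpool /tsmstg/vol01.dsm formatsize=2048
define volume diskpool /tsmstg/vol02.dsm formatsize=2048

If the bigger database becomes a problem, 'update stgpool diskpool cache=no' trades the faster restores back again.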

TSM Performance Tuning
General Angles
Windows Network Duplex Settings
Trace options
Buffer Parameters
Multi-streaming
Netware Directories
Server Operations
Indications of disk problems
Hardware Configuration
Tape Operations
General performance considerations
If your TSM performance is not too hot, there could be lots of reasons why. Here's a list of some of them.
• size of files being backed up
• number of files being backed up
• rate that files can be read from disk
• concurrent read datastreams from the same disk
• rate that client can send data
• network topology between client/server
• rate that server can receive data
• concurrent data streams into the server
• tape drive speed (streaming, start/stop)
• bus speed to tape drive
• concurrent data streams for multiple tape drives sharing a single bus
• compressibility of data
• I/O capability of tsm server
• cpu speed of tsm server
• anti-virus software can slow backups down
There is no easy way to work out exactly what is causing your problem. A good starting point is to find out if the problem is with TSM, or with the hardware, or the network. FTP a big file from the affected client to the TSM server disk, and see how long it takes. FTP will always be a bit faster than TSM, as it has no database overhead. However if the FTP times are slow, the problem is probably outside TSM. A common network problem is mixed full duplex/half duplex environments.
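A hedged sketch of that FTP test from a UNIX client (the hostname, login and file names are all illustrative):

# time a large transfer to the TSM server
time ftp -n tsmserver <<EOF
user testid testpw
bin
put /data/bigfile.bin /tmp/bigfile.bin
bye
EOF

Divide the file size by the elapsed time to get a raw network rate; if that is already poor, look at the network before tuning TSM.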

________________________________________
Windows Network Duplex Settings
If the backup throughput from a Windows server suddenly dies, or if you start to backup a new server and it is going very slowly, a very common cause is that the network speed and duplex settings are wrong. They should be set to 100Mb full duplex, not AUTO. On a Windows 2000 server, go into
Start
Settings
Network and dialup connections
Click on the LAN symbol
click on properties
Click on Configure
Open the Advanced window
Page down to the 'Speed and Duplex' panel
Check the value

To check this from an NT server, from the windows server go into
Start
Settings
Network and dialup connections
Hit the 'Configure' button
Take the 'Advanced' window
Select the 'Speed and Duplex' option
Check that the setting in the drop down window is set to 100Mb full


________________________________________
Trace Parameters
You can focus in on a problem by adding a trace parameter to your dsm.opt client options file. The parameter is

traceflags instr_client_detail

In addition to the summary information you usually get after a backup, you'll find something like this:
------------------------------------------------------------------
Final Detailed Instrumentation statistics
Elapsed time: 1502.420 sec
Section Total Time(sec) Average Time(msec) Frequency used
------------------------------------------------------------------
Client Setup 15.081 15081.0 1
Process Dirs 331.908 191.4 1734
Solve Tree 0.000 0.0 0
Compute 1.021 0.0 76446
Transaction 43.535 0.2 244347
BeginTxn Verb 0.000 0.0 269
File I/O 733.274 8.2 89169
Compression 0.000 0.0 0
Encryption 0.000 0.0 0
Delta 0.000 0.0 0
Data Verb 474.198 6.2 76446
Confirm Verb 0.251 16.7 15
EndTxn Verb 195.729 727.6 269
Client Cleanup 1.612 1612.0 1
------------------------------------------------------------------
You need to bounce DSMCAD to start, stop or change trace parameters.
You can use this information to get an idea where most time gets wasted. If just some of your clients are performing badly, then compare traces between good and bad clients.
Other trace options are
tracefile output_file_name.txt
traceflags perform

________________________________________
Buffering
Efficient use of buffers can speed up TSM backups and restores by up to 30%. In this context, buffering means both preloading data into central storage, and grouping data together into larger chunks for efficient transfer. Do a 'q db f=d' command to view the cache hit percentage on your server during high activity. If it is below 98 percent or so, you need to investigate improving your buffering.
The TSM parameters which control buffering are -
SELFTUNEBUFPOOLSIZE
BUFPOOLSIZE
These two parameters are complementary. You can set the size of your buffer pool yourself, using BUFPOOLSIZE, or you can let the system decide what to use by setting SELFTUNEBUFPOOLSIZE to 'YES'. These parameters determine how much server memory to use for the database buffer pool. The recommendation is to use about 10% of physical memory, with a target of keeping the cache hit ratio at 98% or higher, and to use SELFTUNEBUFPOOLSIZE.
Beware of setting BUFPOOLSIZE too high, as that can cause TSM to hold so much memory for its own use that the Operating System doesn't have enough. This will result in high paging, and poor TSM performance.
Use the command q db format=detail to see what your database cache hit rate is. If the system is paging, then the results of this command will be misleading. If the cache hit rate is too low, try raising the value of BUFPOOLSIZE, 1MB at a time. You have to restart the server for the values to take effect. Check that the increase has not caused paging, then repeat the q db command to see if the cache hit rate has risen.
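For illustration, the relevant dsmserv.opt entries might look like this. BUFPOOLSIZE is specified in KB, and the 131072 (128 MB) figure is just an example:

BUFPOOLSIZE 131072
SELFTUNEBUFPOOLSIZE NO

or, to let the server manage the buffer pool itself:

SELFTUNEBUFPOOLSIZE YES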
TCPNOdelay
Set this to YES
USELARGEBUFFERS
The default setting is USELARGEBUFFERS YES; make sure it's set on both the server and the clients.
DISKBUFFSIZE, LARGECOMMBUFFERS
LARGECOMMBUFFERS is a client parameter which seems to be interchangeable with USELARGEBUFFERS; it should be set to YES. If only life were so simple. For the large buffers to take effect, every single link in your network must also be configured for large buffers. If you have fast ethernet then make sure you explicitly configure the speeds on the switch ports rather than setting them to autodetect, to prevent transfer size mismatches.
LARGECOMMBUFFERS=YES was replaced by DISKBUFFSIZE=nn in TSM 5.3. DISKBUFFSIZE=32 seems to work well for Windows clients at least.
TXNGROUPMAX
TXNBYTELIMIT
This pair of parameters is used to batch up small file transfers, so the transfer overhead on an individual file is shared out. The default sizes are quite small, so you should try experimenting with larger sizes to find the optimum for your configuration. IBM recommend setting TXNGroupmax to 256 and TXNBytelimit to 2048 when the primary storage pool is on disk. For tape, txnbytelimit=2097152 seems to work well for LTO, DLT and 9940 drives, while 25600 seems best for 9840 and 3590 devices.
If you increase TXNGroupmax and TXNBytelimit, keep an eye on your recovery logs, as they will need more space. If you find that performance actually gets worse, it is possibly due to faults on your network, which are causing a lot of retries. Retries will take longer with bigger data chunks, which can totally offset the benefits of lower transport overheads.
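To illustrate where each parameter lives (TXNGROUPMAX is a server option, TXNBYTELIMIT a client option), the disk pool settings suggested above would be:

* in dsmserv.opt on the server
TXNGROUPMAX 256

* in dsm.sys (UNIX) or dsm.opt (Windows) on the client
TXNBYTELIMIT 2048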
TCPWindowsize
TCPBUFFSIZE
Setting depends on your TSM server platform. 63 is best for a Windows server, and 64 for a UNIX server. If a Windows 2000 server is communicating with Windows 2000 clients only, then the TCPWindowsize parameter can be larger, as Win2k supports TCP window scaling. Try a value of 512 for TCPBUFFSIZE; this seems to work well for WIN2K clients.
________________________________________
Multi-streaming
RESOURCEUTILIZATION is a flag which you set in the client options file, which enables multiple backup streams. The resources are the number of control sessions (sessions that figure out what to back up) and the number of transfer sessions (sessions that actually back up or archive the data). If you set RESOURCEUTILIZATION to 8 on a client, it will not necessarily use 4 concurrent data transfer sessions and 4 control sessions. RESOURCEUTILIZATION just provides a guideline for the number of resources the client should use. The number of concurrent sessions you get will be based on the real-time performance characteristics of the client, and the value of RESOURCEUTILIZATION. The higher the RESOURCEUTILIZATION value, the more producer/consumer sessions the client may use, but if the system is starved of other resources, or the number of files to process does not warrant it, then a larger number of sessions may not be used, even with a large RESOURCEUTILIZATION value.
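For example, in the client options file (the dsm.sys stanza on UNIX, or dsm.opt on Windows):

RESOURCEUTILIZATION 8

Remember that every extra stream is a server session, so check that the MAXSESSIONS server option is high enough if several clients will multi-stream at once.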
________________________________________
Directory structure restores
Few sites can afford the luxury of keeping all their TSM backup data on disk (anyone out there?). However, if you're recovering a Netware server, or even a large directory structure, then the restore goes a lot faster if the directories are held in a separate disk storage pool.
Set up a disk storage pool for directories, and allocate a management class which sends the directories to it. This disk storage pool should not require a lot of space, since directories are typically very small.
Then specify option DIRMc directorymgmtclassname.
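A hedged sketch of the whole setup, assuming the default STANDARD domain and policy set, with invented pool and class names:

define stgpool dirpool disk
define volume dirpool /tsmstg/dirvol01.dsm formatsize=512
define mgmtclass standard standard dirclass
define copygroup standard standard dirclass type=backup destination=dirpool
activate policyset standard standard

and in the client options file:

DIRMC dirclass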

________________________________________
Server tasks
EXPinterval
This parameter specifies how long between automatic expiration runs for backup and archive files. This process is very CPU intensive, and needs to run at a quiet time. It's best to set EXPinterval to 0, and run expiration from an admin schedule.
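For example, with EXPINTERVAL 0 in dsmserv.opt, a daily run at a quiet time could be scheduled like this (the time is illustrative):

define schedule expire_daily type=administrative cmd="expire inventory" active=yes starttime=06:00 period=1 perunits=days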
Logpool size
This parameter determines the size of the recovery log buffer pool. If the buffer pool is not big enough, transactions will wait while recovery records are written to the log. You can see if this is a problem by using the command q log format=detail. The command will show the wait percentage, which ideally should be 0. If it's not 0, try increasing the Logpoolsize parameter, but make sure it does not affect overall system memory usage. A logpoolsize of about 4096 is about standard.
TSM Server caching is designed to optimize restore times but sites have experienced slow migration times with caching active. If you are having problems with migration, consider turning caching off, but be aware that this could affect restore speeds.
________________________________________
Indications of disk problems
The following two SQL queries are based on an IBM white paper and are intended to help you decide if your TSM server disks need tuning. The basic idea is to look at how fast your database backups and expire inventory are going, and if they are below 'normal' figures then you might have disk issues.
Database Backups
Run the following SQL query on your server. The query is just shown as one long line so you can cut and paste it without having to remove end-of-line markers.
select activity, cast((end_time) as date) as "Date", ((bytes/1048576)/cast ((end_time-start_time) seconds as decimal(18,13))*3600) "MB/Hr" from summary where activity='FULL_DBBACKUP' and days(end_time) - days(start_time)=0
output looks something like
ACTIVITY Date MB/Hr
------------------ ---------- -----------------
FULL_DBBACKUP 2005-01-30 31026
FULL_DBBACKUP 2005-02-06 33976
IBM state that if the backup is processing less than about 28 GB per hour then this might indicate a disk problem, and further investigation is advised.
Another possible indication is expire inventory processing. Try the following SQL query
select activity, cast((end_time) as date) as "Date", (examined/cast ((end_time-start_time) seconds as decimal(24,2))*3600) "Objects Examined Up/Hr" from summary where activity='EXPIRATION'
output looks something like
ACTIVITY Date Objects Examined Up/Hr
------------------ ---------- ---------------------------------
EXPIRATION 2005-01-24 2086078.85587918800
EXPIRATION 2005-01-26 1430425.08811519200
EXPIRATION 2005-01-26 2309000.17643557200
EXPIRATION 2005-01-27 2343761.01158234400
EXPIRATION 2005-02-04 579753.49273113600
EXPIRATION 2005-02-04 64950.70455612000
EXPIRATION 2005-02-06 131093.51240872800
It is difficult to say what is acceptable with this query, as so many factors can affect the throughput, but if the throughput drops suddenly then this may indicate possible disk problems. The query above is clearly indicating a potential problem from February 4th onwards.
________________________________________
Hardware Configuration
Try to spread your database and log volumes across SCSI controllers.
Use several small volumes for disk pools rather than a small number of large volumes. Sessions lock volumes so more volumes means more simultaneous sessions.
Consider defining more TSM servers, to split your database. The hardware configuration dictates the size of the database that you can support. If your database backup is taking more than a couple of hours, then either you need a bigger TSM server, or two servers. Typically, an H70 RS/6000 will support a 70GB database, but that is too big for an R40.

________________________________________
Tape to Tape copy performance
The options which affect tape-to-tape copy most are movebatchsize, movesizethresh and bufpoolsize. Bufpoolsize is explained above.
Movebatchsize and Movesizethresh determine how many files are grouped together and moved in a single transaction. Movebatchsize is the number of files which will be grouped, and movesizethresh is the cumulative size of all the files that will be moved. Files are batched up until one of these thresholds is reached, then the files are sent. The default for movebatchsize is 40, but consider setting this to 1000 (the maximum), and set movesizethresh to 500. However, if the numbers are set high, then you will need more space in the recovery log. If you change the settings, keep an eye on the log for a while, and make sure it is not getting too full.
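The settings suggested above would go into dsmserv.opt as follows; treat them as starting points rather than universal recommendations:

MOVEBATCHSIZE 1000
MOVESIZETHRESH 500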

These parameters, and TXNGROUPMAX, can be dynamically changed by TSM, if SELFTUNETXNsize is set to YES.
There is a new (TSM 4.2) parameter, TAPEIOBUFS, which can speed up access to 3590 tapes on AIX servers. The default value is 1, and it can be set up to 9. I have no experience to make a recommendation on this one.



ServerGraph - A TSM Management and Automation tool
Overview
Servergraph is a useful browser based reporting and automation tool for managing multiple TSM servers. It extracts data from TSM logs and databases and combines these together to give you a composite picture of your TSM estate. It can issue problem alerts, provide trend graphs, automate manual tasks and more. Originally a TSM tool, Servergraph has been recently enhanced to support Netbackup and Legato too. This page concentrates on TSM management.
Installation
Servergraph is reasonably easy to install. It comes supplied with detailed documentation so there is no point in reproducing it here. You have two install options; a normal fully automated install that will update your system files after taking backup copies, or a manual install with a -m switch. With a manual install you are fully in control of what gets updated, but it takes a lot longer and you have to take your own system backups.
Some third party software is required for the install, including MySQL DBMS, Apache web server and Perl. The automatic install will check for these and install them for you, if they do not exist already.
One advantage Servergraph has over other TSM reporting products is that it only needs to be installed on one machine, it does not need agents installed on every TSM server or client.
Features
Servergraph extracts data from the TSM databases and logs and formats that data into its own MySQL database. That minimises the performance overhead on TSM, as data processing and interpretation is carried out away from the TSM databases
ServerGraph alerting traps all the TSM error messages and picks out the two hundred or so that are important. It can pass these to Tivoli TEC or other messaging products as SNMP alerts.
It is possible to tailor the ServerGraph front logon screen, but the default provides lots of useful information, including a summary of the results from the previous day's backups, a 30 day backup history, a summary of the state of the databases and logs, and a summary of the state of the tape volumes and drives. This lets you get an instant impression of the state of the backups and backup infrastructure on one screen. You can then drill down into most of these summaries to get detail, and extract that detail as Excel or csv files.

You can also expand the left hand menu to get details about the whole enterprise, and from individual servers. Expand the 'enterprise' menu then the 'nodes' menu and select 'top 50 storage abusers'. This will list fifty nodes sorted by their 'hog factor', which is the ratio of primary to local storage. I've found this report to be very useful to identify nodes where the retention period is incorrect, and so is keeping backups for far too long.
While the picture above just shows TSM servers, if you run Legato and / or NetBackup, ServerGraph gives you a composite picture of how all three are working from one screen
From the same Enterprise menu, select 'Long Term Trends' then highlight '90 days' and you see a report like this

The 'Predictions' option shown above displays charts that illustrate how the capacity of databases, logs and storage pools is changing and when they will run out. This is useful for both Capacity and Budget planning.
Tips
A particularly useful graph is one which shows each nodes behavior over 24 hours. This data can be used to detect nodes suffering from speed problems due to incorrect network card settings.
Some of the ServerGraph pages are wider than a screen and you have to scroll right to see the edge. This can be a bit annoying, but you can hide the Left side navigation menu by going into Main > Hide tree then you should not need to scroll. To see the navigation tree again, use Main > Show Tree.
Function key F5 will take you back to the primary menu from wherever you are in ServerGraph.
ServerGraph communicates with the TSM servers on port 1500 by default. You must ensure that this port is open and accessible through your firewalls.
ServerGraph 3.5 New Features
The following is a short summary of some of the changes and new features in ServerGraph 3.5.
User Views allow you to group nodes and users together, so you can restrict some users' views to a subset of nodes, or group nodes by business unit, geographical area or any other grouping that makes sense to you.
The health check graphs now display the volume count in a pool.
The library volumes table has been enhanced to display volume utilisation, error rates, scratch status and DRM status on one screen.
A help box is available to explain the meanings of the column names on the node data table.

TSM Tape Management
Tapes seem to be the biggest problem area in TSM. Issues range from defining and using specific drives, to managing and deleting tape storage. The sections below look at different problem areas with tapes.
Tape Hardware issues
discusses some of the issues with damaged or overwritten tapes, sharing tape libraries and 'virtual tape' systems. Tape drives are discussed elsewhere in this site.
Tape Library Management
discusses how TSM handles tape libraries
Freeing up old tapes
discusses how to check your scratch pool, and how to free up older tapes
General tape tips
talks about the effects collocation has on tapes, and how to calculate TSM tape requirements.

Hardware Issues
Library Sharing
Virtual Tapes
Finding Faulty Tapes
Overwritten Tapes
________________________________________
Library Sharing
How you define library sharing with TSM depends on whether it's a SCSI library or a 3494 library, and whether or not you have a SAN.
Library sharing for SCSI libraries requires that you define the library as follows:

for the library manager
DEFINE LIBRARY lib_name LIBTYPE=SCSI SHARED=YES DEVICE=library_device_name
for the library client
DEFINE LIBRARY lib_name LIBTYPE=SHARED PRIMARYLIBMANAGER=server_name
Library Sharing for 3494 libraries does not use the library manager/client configuration as described above. It needs the '3494SHARED YES' server option instead.
You still need to use separate categories for the different servers, otherwise you may end up with two servers having the same private/scratch volume in the library inventory. What 3494 library sharing brings is the ability to define all drives to all the servers sharing the 3494. The TSM server will detect if a drive is available and will retry based on the new retry options that were added in 4.1 (the DRIVEACQUIRERETRY and MPTIMEOUT options).

Virtual Tape Systems
The virtual tape section discusses some of the different virtual tape systems available. Basically there are two types: those that replace tape completely with disk, and those that front real tapes with a big disk cache. Most hardware implementations use post-process deduplication these days, which avoids storing duplicate data. TSM 6.1 will include software de-duplication.
The advantage of using a VTS with TSM is that you get a lot of virtual tape drives, and can run a lot of processes in parallel. Another advantage of a VTS is that it allows you to fill up big tapes with small files, though TSM does this for you anyway. If you need to recall data from physical tape to VTS cache, this can add a considerable overhead.
A VTS that still uses physical tapes has another quirk - the logical vs physical implementation. VTS emulates physical tape in that when a file is expired, it still uses space on the virtual tape volume. A virtual volume must be re-written to reclaim scratched space. When virtual volumes become scratch they too occupy space on the physical tapes, which again must be re-written to reclaim the space.
The jury is really still out on this one. It seems that TSM can get a lot of benefit from disk only virtual tape, and for example the Sepaton product is used by quite a few customers. The benefits of a physical tape backend are less clear.

Finding faulty tapes
Follow these steps to identify and fix faulty tapes.
List unavailable tapes
q volume access=readonly and q volume access=unavailable
This will give you a list of tapes that have been put in this state, probably because the system has identified an error. However, a tape will also be marked as unavailable if TSM tried to mount it and it is not in the library.
Look for tapes which have errors
select volume_name, read_errors,write_errors from volumes
where (read_errors > 0 or write_errors > 0)
This will give a list of tapes that have reported read and write errors. If you have a lot of these, consider upping the thresholds so you can concentrate on tapes with a lot of errors first.
To fix a problem, run the audit command
AUDIT VOLUME volser FIX=YES
If a part of a tape is faulty, the audit command will try to fix it. If it cannot fix a problem file, and a copy exists on another tape, then you need to use the RESTORE VOLUME command. If there is no copy, then the AUDIT command just deletes the entry from the database.
If the tape is hopelessly trashed, and you do not have a copy, the only answer is the 'delete volume discarddata=yes' command. However, it's always worth trying a MOVE DATA command first to see if you can rescue something from the tape.

What happens if a tape is accidentally overwritten by another application?
If you have a copy pool, you can restore the tape; otherwise you have to tell TSM to throw the data away.
To restore the volume, use the command
restore volume volname preview=yes
and look at the actlog after this process finishes. It will show you all the copy tapes needed to recreate the primary tape. Get all these tapes back from your offsite location, and run the command again without the preview=yes. The old tape will be marked as destroyed and the data copied to a new tape. The old volume will then be deleted once all the data is restored.
To discard the data use the command
DELETE VOL volname DISCARDDATA=YES
or if that fails
AUDIT VOL volname FIX=YES

and the active data will be backed up again on the next run. Of course, if the tape held older backup versions, they are gone forever.
If a tape appears to be assigned to a remote server and the remote server knows nothing about it, you can delete it from the volume history, and so free the tape up for reuse.
delete volhist todate=today type=remote volume=volumename force=yes

TSM - Library Tape Management

Moving Tapes offsite
Freeing up library slots
Reclaiming offsite tapes
Reporting on tape usage
Checkin and Checkout
________________________________________
Moving Tapes offsite
There are two parts to volume movement: updating TSM so it knows what is happening with the volumes, and physically managing the automated library inventory by using checkin/checkout commands.
If you are planning to take tapes offsite to a vault, then the important step is to update the access mode of those volumes to 'OFFSITE'. This tells TSM that it can still do some data processing like Reclamation and Move Data commands, but it will NEVER request a mount of the actual volume (it uses the primary copy instead). Note that only copy pool volumes can be set to 'OFFSITE'; this is because TSM always expects to have its primary pool volumes available (i.e. mountable). The checkout operation for the offsite copy pool volumes is a necessary extra step to get the tapes out of the library inventory.
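A sketch of the two steps, with invented pool, volume, library and location names:

update volume * access=offsite location="Vault" wherestgpool=copypool wherestatus=full,filling whereaccess=readwrite,readonly
checkout libvolume lib3584 ABC123 remove=yes checklabel=no

If you are licensed for Disaster Recovery Manager, the MOVE DRMEDIA command automates this sequence.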

________________________________________
Freeing up library slots
If your library becomes full, you may need to free up some slots. The trick is to eject older tapes from the library that you are not likely to use for a while, and keep the active tapes in the library. The 'MOVE MEDIA' command gives us the ability to have a combination automatic/manual library. We can 'move' tapes outside of the library to a nearby 'location', but the tapes are still considered as mountable. The distinction is whether they have the media state of 'MOUNTABLEINLIB' or 'MOUNTABLENOTINLIB', and this tells TSM whether to ask the robot to mount the volume or to issue a manual mount request. When processing a manual mount request, you must use the 'Checkin Libvol' command to update the library inventory and tell TSM that the tape is back in the robot (since that is ultimately how the tape gets mounted).
TSM will automatically toggle the volume's access mode from ReadOnly to ReadWrite and back again as it is moved in and out of the library. This is to allow any read operations to proceed (e.g. restore) and cause a manual mount request, while write operations will not attempt to access the volume.
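An illustrative example of shelving full tapes (the pool and location names are made up):

move media * stgpool=cartpool wherestate=mountableinlib wherestatus=full ovflocation="Shelf A" remove=yes

The tapes move to the 'Shelf A' overflow location in MOUNTABLENOTINLIB state, and a checkin libvol brings one back when TSM requests it.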

________________________________________
Reclaiming offsite tapes
You don't have to bring your offsite tapes in to do reclamation.
Set your copypool reclamation to a reasonable level, say 60%. TSM knows what files are still valid on offsite volumes that are to be reclaimed. It finds the copies of those files in the primary storage pool (which is still in the library); it moves a scratch tape to the copy pool and copies the files from the primary tape pool to the new copypool tape. The new copy tape is then marked to go offsite, and the old one marked for return.

________________________________________
Reporting on Tape usage
The following query will produce a report of the space usage of all storage pool volumes, summarised by storage pool and status.
SELECT STGPOOL_NAME AS STGPOOL,
COUNT(VOLUME_name) AS COUNT, STATUS,
CAST(MEAN(EST_CAPACITY_MB/1024)
AS DECIMAL(5,2)) AS GB_PER_VOL
FROM VOLUMES
GROUP BY STGPOOL_NAME,STATUS
Example output is shown below
STGPOOL COUNT STATUS GB_PER_VOL
------------- ----------- --------- ----------
ARCHCOPYPOOL 417 FULL 1.10
ARCHIVEPOOL 2 ONLINE 2.78
ARCHTAPEPOOL 2 FILLING 27.58
ARCHTAPEPOOL 2 FULL 49.60
BACKUPPOOL 7 ONLINE 38.26
CARTPOOL 1 EMPTY 0.00
CARTPOOL 221 FILLING 22.01
CARTPOOL 1021 FULL 33.99

________________________________________
Checkin and Checkout
TSM needs to know where its tapes are if they are stored in tape libraries, and to keep it informed you use CHECKIN and CHECKOUT commands.
A tape library has a small compartment usually called an IO station with a door. You place your new cartridges into the IO station then run a checkin command. The syntax of the command varies slightly depending on what type of library you have. The command for a SCSI library is
Checkin libv Library_name search=bulk checklabel=yes status=scr
This will read in all the tapes in the IO station, read the labels and define them to TSM as scratch. To check in a single named tape that contains required data, maybe something you are importing from a different system, try
Checkin libv Library_name Volume_name checklabel=yes
However, be aware that the robot will select the first tape from the IO station; it will not scan the IO station for your tape. I've had the Library Manager tell me the IO station was empty apart from a specific tape that I wanted, yet TSM kept selecting an incorrect tape on checkin. Eventually I tried a bulk checkin and then discovered there were four foreign tapes in the IO station.
Most of us have several TSM servers that share a physical library. This is done by partitioning the library into several logical libraries. Say you have two servers, TSM1 and TSM2, each with virtual libraries VLIB1 and VLIB2. In this case it is quite easy to transfer data between TSM servers. You run an Export, note the tape used (xyz123), then check it out of VLIB1 like this
CHECKOUT LIBV VLIB1 xyz123 REMOVE=YES
This will place the tape into the IO station. You then log into the Library Manager (the control software for the tape library) and re-assign the tape from VLIB1 to VLIB2. Finally you check it back in with
CHECKIN LIBV VLIB2 xyz123 STAT=PRI CHECKL=YES
If you find the tape is rejected with an invalid label, try a bulk checkin.
If a volume has been removed from the library, but TSM has not been informed you can clear it from TSM with the command below. Checklabel=no and remove=no means that TSM will do no validation, it just removes it from the database.
Checkout libvolume Library_name Volume_name checklabel=no remove=no

TSM - Freeing up Old Tapes
Reuse delay
Tape Mount and Dismount issues
maxscratch parameter
How to query # scratch tapes
TSM thinks a tape contains data, but it is empty
Altering the MOUNTABLE state
Running Expire Inventory and Reclamation
________________________________________
It's not unusual for TSM to run out of tapes because of the way it is engineered. That can make it vital that tapes are returned to the 'scratch pool' for reuse quickly. So why does TSM seem to hang on to empty tapes?
Reuse delay
If a tape has no active files left on it, but it still does not become scratch, then the issue might be the REUSEDELAY parameter on the copy storage pool.
A tape is not necessarily released for scratch straight away. Imagine the worst has happened, you've had a disaster and have had to restore your database back 48 hours. If tapes have been reused in the past 48 hours, then the database will not be accurate, data will be missing. To prevent this, you have a parameter called REUsedelay. This specifies the number of days that must elapse after all files are deleted from a volume before the volume can be rewritten or returned to the scratch pool. The default value for this is 0, but it may have been set to 5, say, to avoid problems with database rollback. That's one reason why tapes do not get recycled quickly.
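To check the current setting and change it (the 5 days is just the example from above):

query stgpool copypool format=detail
update stgpool copypool reusedelay=5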

________________________________________
Tape Mount and Dismount issues
TSM will keep adding data to a 'filling' tape until it is full. However, it will sometimes mount a scratch tape even if there is a 'filling' tape available for that node. This is because TSM will not wait for a tape that is currently dismounting. The logic is that it is faster to ask for a new scratch tape than to wait while a filling tape is dismounted, stored, retrieved then remounted. There is no easy answer to this feature, except to juggle your KEEPMP and MOUNTRetention values to minimise the risk.

________________________________________
Maxscratch parameter
The name of this parameter can be a bit confusing, as it limits the total number of tapes that a storage pool can contain, not the total number of scratch tapes. If your tape pool processing starts failing with insufficient space errors, then one cause can be that the maxscratch limit has been reached. You may have plenty of scratch tapes in your library, but TSM will not use them.
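To check the current limit and raise it (the 200 is illustrative):

query stgpool cartpool format=detail
update stgpool cartpool maxscratch=200

The detailed query shows the 'Maximum Scratch Volumes Allowed' alongside the number currently used.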

________________________________________
How to find out how many scratch tapes exist
The q vol command will only give you information about storage pool volumes, and so does not report on scratch volumes, as they are not associated with a storage pool. You need to use the following SQL

select count(*) as Scratch_count
from libvolumes
where status='Scratch'

________________________________________
TSM thinks a tape contains data, but it is empty
You have a CopyPool volume which is EMPTY and OFFSITE, but the tape does not change to scratch as normal. You cannot move the data off the tape because it is empty. You cannot delete the tape because it contains data, not even with the discard data option. The tape needs to be audited, but to do this it must be on-site. Recall the tape to your site and run an 'AUDit Volume VolName Fix=Yes'.

Altering the MOUNTABLE state
A volume is empty, but is not in scratch status because the volume STATE is mountablenotinlib. To change the STATE of the volume use the command
MOVE MED vol_name STG=pool_name WHERESTATE=MOUNTABLENOTINLIB
This will move the volume back into the scratch category

________________________________________
Running Expire Inventory and Reclamation
Expire Inventory deletes unwanted backups from the TSM catalog and marks the backups on tape as expired. It's best to run expire inventory daily. Once the data is expired, reclamation can release partly used tapes. To do this, schedule the following command

EXPIRE INVENTORY
It's best to run expire inventory at a time when no, or few, backups are running, as it waits when it hits a filespace that is being backed up, and will hold onto the recovery log, which can cause the log to fill up. You also want to avoid running EXPIRE INVENTORY alongside your TSM database backups.
EXPIRE INVENTORY has two undocumented and unsupported parameters, BEGINNODEID=nn and ENDNODEID=nn where nn are decimal node numbers. These can be used to limit the amount of work the process does but please note, these parameters are UNSUPPORTED, you use them at your own risk.
The only really good reason to run a limited EXPIRE INVENTORY is if you've just deleted lots of data from a node and would like to get it deleted quickly from the database. The problem is that you need to find out the node number of the filespace you just deleted the data from.
The only way I know is to use undocumented SHOW commands. You need to start with the object number for the NODE table, then drill down the b-tree to find the Node number of the file space that you are after.
Start by using SHOW OBJDIR to find the node number of the Nodes table. This is normally 38.
Then use SHOW NODE 38 to see the top level tree structure of the NODE table.
SHOW NODE 38

(SERVVM57)
<- Subtree=
Record 9 (KeyLen=9, DataLen=4):
Then finally do a SHOW NODE number for the subtree that contains the node you are interested in.
In the SHOW NODE against the subkeys, the KEY is the NODE_NAME. Field 1 is the node number and field 2 is PLATFORM_NAME.
SHOW NODE 8344593

Key:
->(SERVVM3V)(00000134)(WinNT)(plus lots more fields..)
So from that you see that the client node you are interested in, SERVVM3V, has a node number of 134.
Roger Deschner has pointed out an error in the original text of this page 'The node numbers you get out of the database with the SHOW NODE command are in hexadecimal. However, the node numbers you specify on the EXPIRE INVENTORY command must be decimal. You've got to convert them from hex to decimal yourself. To expire node SERVVM3V you must specify node number 308 (decimal), rather than 134 (hex)'.
So using the correct decimal number, to just expire data from that node you would use the command
EXPIRE INVENTORY BEGINNODEID=308 ENDNODEID=308
The EXPIRE INVENTORY process runs as an atomic transaction so if you need to cancel EXPIRE INVENTORY you should always use the CANCEL EXPIRATION command, as that will terminate the command cleanly and mark the transaction as complete. Next time you run the transaction, it will start up from where it left off. If you cancel with the 'cancel process_number ' command or cancel using the GUI, TSM rolls back the transaction to the beginning, so next time you run the command it will start again from the beginning.
Reclamation copies half empty tapes onto empty new tapes to consolidate the data and free up tapes. You probably want to control the times when reclamation can run, as it will use 2 tape drives. Schedule command
UPD STG cartpool RECLAIM=40
to start it, and switch it off again with
UPD STG cartpool RECLAIM=100
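A pair of illustrative admin schedules to bracket the reclamation window (the times are examples only):

define schedule reclaim_on type=administrative cmd="upd stg cartpool reclaim=40" active=yes starttime=09:00 period=1 perunits=days
define schedule reclaim_off type=administrative cmd="upd stg cartpool reclaim=100" active=yes starttime=16:00 period=1 perunits=days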
If you run the command Q VOL F=D, this will show how many tapes have reclaimable space of 40% or more, and so will be processed. The numbers on this command don't necessarily add up. Estimated Capacity (MB) shows the apparent full capacity of this tape devclass, which may be much lower than what can actually be put on the tape. Pct Util shows how much of that estimated capacity is currently used. Pct. Reclaimable Space shows how much of the space that was occupied by data is now free, not the percentage of the actual capacity of the tape that is free.
Estimated Capacity (MB): 18,499.6
Pct Util: 47.9
Volume Status: Full
Access: Read/Write
Pct. Reclaimable Space: 54.0

TSM - General Tape Tips
Calculating Tape requirements
Collocation
Controlling wait time for tapes
Auditing Tape Libraries
Auditing Volumes
________________________________________
Calculating Tape requirements
A method to forecast future tape growth. This assumes that you had steady growth in the last month, and you expect this to continue. If you know you are going to add 200 more servers next month, then they represent a step growth, which will be over and above the incremental growth forecast here. This method also assumes that you copy your tape storage pools.
Query the TSM server to get statistics on data copied from the primary storage pool in the past month using
Q ACTL BEGINDATE=-30 SEARCH=ANR1214
add up the daily archive and backup copy from primary to copypool storage and divide by 30; now you have a per-day average that is being backed up. Double the value if you have primary and copy storage pools. Call this value 'DailyData'.
now find out how much data is held per full tape, on average, with the following query
SELECT STGPOOL_NAME AS STGPOOL,
CAST(MEAN(EST_CAPACITY_MB/1024) AS DECIMAL(5,2))
AS GB_PER_FULL_VOL
FROM VOLUMES
WHERE STATUS='FULL'
GROUP BY STGPOOL_NAME
add the average of each storage pool and divide by the number of storage pools; this should give you an average capacity for a tape from all data types and using compression. Call this value 'AvCap'
DailyData / AvCap = TAPES TSM WANTS PER DAY!
Now we need to calculate how many tapes TSM frees up each day. Use the query
Q ACTL BEGINDATE=-30 SEARCH=ANR1341
Add these lines up and divide by 30 days. This will give you the number of tapes reclaimed a day.
The difference between the number of tapes used and the number of tapes reclaimed is your growth rate. It's unlikely to be a negative number.
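A worked example with made-up numbers: if the ANR1214 messages average out at 200 GB a day, doubling for the copy pool gives DailyData = 400 GB. If AvCap works out at 25 GB per full tape, TSM wants 400 / 25 = 16 tapes a day. If the ANR1341 messages show 12 tapes reclaimed a day, you are growing by 4 tapes a day, or roughly 120 tapes a month.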

________________________________________
Collocation
Setup
With collocation on, when TSM starts a backup, or migration for a given client, it tries to put the data on a 'filling' tape first, where data already exists for that client. If there isn't one, it selects a scratch tape unless the pool has already reached MAXSCRATCH, in which case it puts the data on the least-full tape available in the pool.
Setup? Just turn on collocation and set a MAXSCRATCH for your existing tape pools. New data will be collocated, and existing tapes will gradually get collocated as they go through the reclaim process.
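For illustration (the pool name and limit are made up; releases before 5.3 use COLLOCATE=YES, while 5.3 onwards offers NODE, GROUP and FILESPACE):

update stgpool cartpool collocate=node maxscratch=150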
Restores run significantly faster with collocated tapes, as fewer tape mounts are required.
Problems
One of the problems with collocation is that you can end up with very little data on very high capacity tapes. If TSM has a new client, and a scratch tape available, it will use the scratch tape, rather than a 'filling' tape with very little data on it. To efficiently fill your tapes, you need MAXSCRATCH set to fewer tapes than you have clients.
The other side of this is that your tapes will start to fill up over time, and there will be tapes that are full but not yet eligible for reclaim, so you will have fewer FILLING tapes in the pool. So you either have to increase MAXSCRATCH or reduce RECLAIM%, or you have fewer and fewer filling tapes and gradually lose the benefits of collocation. You really need the 'suck it and see' approach to find out the best balance between MAXSCRATCH vs. RECLAIM% for your site.
If you do set MAXSCRATCH to less than your number of clients, then you need to realize that you will never have any scratch tapes in your pool. Your tapes will always be either 'full' or 'filling'. If you use MOVE DATA to free up a tape, you will be back to square one after the next backup run. That's the way collocation works.
Another possible problem is that with collocation on you get a LOT more tape mounts during migration. Also, if you copy your onsite collocated pool to an offsite non-collocated copy pool, you will get more mounts during reclaim of offsite storage pools, and during backup stgpool. Some tape drives cannot handle all that activity. You can reduce the amount of extra tape activity somewhat by trying to schedule your copy storage pool before migration happens, so that most of the data goes from disk to tape, rather than tape to tape.
The term 'Imperfect collocation' is sometimes used to describe the situation that occurs when collocation is enabled, but there are insufficient scratch tapes to ensure that each node stores its data on different tapes. Some nodes will have their own tapes, and some will share, so some collocation will happen.

________________________________________
Controlling wait times for tape mounts
By default, a process will wait 60 minutes for a tape mount, before it gets cancelled. To change this, use the server option
MOUNTWAIT n
where n is a number in minutes from 0 to 9999.
The parameter will only start counting once the process gets the mount message, so if a process is waiting because all the drives are in use, it will not time out.

________________________________________
Auditing Tape Libraries
You need to run an audit occasionally to make sure that what TSM thinks is in your library, matches reality.
The audit command is
AUDIT LIBRARY library_name CHECKLABEL=BARCODE
The CHECKLABEL=BARCODE switch is optional, but it will make the audit go pretty fast, say 5 minutes. With that switch, all the audit involves is your robot scanning the barcode labels of all the tapes. Without that switch, the default action is to mount each tape, which will take a long time.
The audit may wait until all tapes are dismounted from drives, so it could take a while for an audit to start. Consider canceling tape processes if your audit is waiting, and you need it in a hurry.

________________________________________
Auditing Tape Volumes
You can audit a tape volume with the command
audit volume volser fix=yes
and this will check all the backups on the tape and fix the database entries for any that are damaged. It is also possible to audit all the volumes in a storage pool with a single command
audit volume stgpool=pool_name fix=yes
This could audit a lot of tapes and take a long time so you can restrict it by date. For example say you had a tape drive that went faulty on March 25th 2005 and was fixed on April 1st 2005. You want to check all the tapes that were written in that period for errors. Use the command
audit volume stgpool=pool_name fromdate=032505 todate=040105
If you choose a volume that is part of a volume set (because it contains files that span volumes), TSM will select the first volume in the set and scan them all, even if you pick a volume in the middle. If you just want to audit one specific volume in a set, then you need to use the skippartial parameter
audit volume volser fix=yes skippartial=yes

TSM new features
TSM 6.1

Released on March 27, 2009.
The biggest single change is the conversion of the database to DB2, or UDB as it is often called these days. This is a packaged DB2 version that does not need any maintenance by your DBAs. The DB2 database permits the following
• Online reorgs - the ability to reorganise the database without needing to stop the TSM service
• Better database integrity checking with the ability to repair database problems online
• Bigger TSM databases, so fewer servers
• Bigger recovery logs, up to 256 GB, with the ability to do roll-forward recovery
Upgrading to TSM 6.1 is disruptive. New disk capacity has to be provided for the DB2 database, and the old database 'prepared' before moving it to DB2. Once the preparation starts, the old database is unusable, and so has to be recovered from backup if there are any problems.
The change to DB2 means the end of several old database commands like extend log and extend DB, and we will need to learn some new commands instead
Other features in TSM 6 are
• Data de-duplication
• The ability to interface with external security products
• Improved auditing and reporting facilities
• The ability to move data and metadata between TSM servers, so simplifying load balancing
• The EXPIRE INVENTORY command will be granular, so you can expire by node or node group, policy domain or data type (see the sketch after this list)
• The ability to concurrently copy data between storage pools while migrating to the next pool in the hierarchy, that is, concurrent migration and storage pool backup
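As a sketch of the granular expiration, based on the 6.1 announcements, the syntax should allow something like the following, where FILESERV1 is a made-up node name:
expire inventory node=FILESERV1 type=backup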
There is a lot more information about TSM 6.1 on the TSM 6.1 features and TSM 6.1 Conversion conversations on the Lasconet.

Show commands
Disclaimer
IBM provide the SHOW commands for their own use, for diagnostic gathering. Tivoli / IBM do not support the SHOW commands, so it is no use raising problem reports with them if the results are not what you expect. Be aware that some of these commands are resource intensive and cannot be cancelled, and some will try to fix problems; that is, they may UPDATE the TSM database. If possible, try the commands on a test system before you use them in production. In general, you should not run a SHOW command unless you are familiar with that particular command, or TSM support personnel have asked you to run it.
This is not an exhaustive list of SHOW commands because
• I do not know them all
• I have missed out some obscure commands (that is, the ones that I don't understand)
The list is split into commands issued at the client and commands issued at the server.
Client Commands
Client commands can produce a lot of output that will scroll off the screen, so you may prefer to redirect them to a file. For example SHOW OPTTABLE > output.txt
SHOW OPTIONS
Displays the active client options.
SHOW OPTTABLE
You can configure a client so it can get its option settings from either the client option file or from the server. This command will tell you which one is in use for this client.
SHOW SESSION
Displays the capabilities that this client has for its connection to the server. The client and server negotiate and report their capabilities when a session starts, and this command shows the capabilities available to this particular client and server pair.
SHOW TRACEFLAGS
Use this to determine which trace options could be used for this client.
SHOW VERSION
Use this command to find out what release and version of TSM is installed.
SHOW CLUSTER
Displays information about the disk mappings in a Windows or a Netware Cluster.
The next five commands all apply to Windows clients
SHOW PLUGINS
If you want to use extra capabilities like image backup, then they are provided by 'plug-ins'. You use this command to find out what plug-ins are available for this client.
SHOW CACHE
TSM uses Subfile backups to backup only changed portions of files over slow network links. TSM knows what parts of a file have changed, by storing checksum information in a cache file on the client. This command will display information about the subfile cache, if the client is configured to use subfile backup.
SHOW SYSTEMOBJECT / SHOW SYSTEMSTATE
You use these commands to find out what system facilities are installed, and which ones can be backed up with TSM. SYSTEMOBJECT is used for Windows 2000 and XP clients, and SYSTEMSTATE for Windows 2003 clients.
SHOW SYSTEMSERVICES
For Windows 2003 clients, this displays the SYSTEM SERVICES data available on the client. It is useful for determining which SYSTEM SERVICES files are installed on this Windows client, and which of those could be backed up.
Server commands
SHOW CONFIG
This is one of the long running commands that produce lots of output. It actually issues a set of QUERY and SHOW commands then uses these to build a quite comprehensive picture of the state of the server. This command is worth running as a diagnostic data gathering exercise, to be analysed when free time permits.
Commands to help with session or tape drive problems
SHOW ASQUEUED
If you have a client session or process stuck, it may be waiting for a drive. You can use this command to see if there are sessions queued waiting for mount points.
SHOW DEVCLASS
Is also useful if you have problems with drives. It displays the status of each device class, including the status of allocated drives.
SHOW MP
Useful for determining which volume is in use by a given mount point, and other attributes of the assigned mount points.
SHOW ASVOL
If you are having problems with sessions or processes queued, or waiting for tape volumes, then this command will display the in-memory list of assigned volumes.
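When a session looks stuck on a tape mount, a reasonable diagnostic sequence combines these commands with the standard QUERY MOUNT:
query mount
show asqueued
show mp
show asvol
Between them, these tell you what is mounted, what is queued waiting for a mount point, and which volumes are assigned to which mount points.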
Commands to help manage the database and recovery log
SHOW BUFSTATS
Use this command to see whether you need to increase your database buffer pool size. As a rough rule of thumb, a cache hit percentage below about 98% suggests the buffer pool is too small.
Example output is
Database Buffer Pool Statistics:
Total Buffer Latch Requests: 184217213
Times Scavenging Required: 1123642 0.61%
Times Scavenging by Stealing: 1105210 0.60%
Times Scavenging by Waiting: 0 0.00%
Times Read Required: 1123574 0.61%
Cache Hit Percentage: 99.39%
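If the hit percentage is low, the buffer pool can be increased on the fly with SETOPT. The size below is in KB and is purely illustrative; it needs to fit in your server's real memory:
setopt bufpoolsize 131072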
SHOW BUFVARS
This command will display the status of the buffer pool variables and can be used to check for dirty buffer pages.
Example output is
Database buffer pool global variables:

CkptId=1642081, NumClean=32880, MinClean=32784, NumTempClean=32880, MinTempClean=16392,
BufPoolSize=32880, BufDescCount=33071, BufDescMaxFree=36168,
DpTableSize=49161, DpCount=0, DpDirty=10, DpCkptId=1642081, DpCursor=64,
NumEmergency=0 CumEmergency=0, MaxEmergency=0.
BuffersXlatched=0, xLatchesStopped=False, FullFlushWaiting=False.

The above output shows that there are 10 dirty buffers (DpDirty=10). You can use another undocumented command, FLUSH, to clear out the dirty buffers.
SHOW DBTXNT
Use this command to display the database transaction table. Sample output looks like
Open objects:
name ->AS.SegmentsSS.PoolsSS.Pool.IdsAF.Clusters Valid=1, inRollback=0, endNTA=0, State=2,
Index=10, LatchCount=0, SavePoint=0, TotLogRecs=0, TotLogBytes=0,
UndoLogRecs=0, UndoLogBytes=0, LogReserve=0, PageReserve=0,
Elapsed=346279 (secs), MinLsn=0.0.0, MaxLsn=0.0.0, LastLsn=0.0.0,
UndoNextLsn=0.0.0, logWriter=False, backupTxn=False
This is showing that this transaction has 4 database tables open, that the transaction is valid, and that it is not writing log records.
SHOW DBV
Displays database global attributes.
SHow DBBACKUPVOLS
Use this command to get details on the latest full and incremental database backup volumes.
SHOW LOGPINNED
You can use this command if your recovery log is running out of space. The oldest active record in the log is 'pinning' the log, and this command will tell you which task owns that oldest record. You could follow this up with
SHOW LOGPINNED CANCEL
which will cancel whatever is holding the pinned record. This is one of those 'UPDATE' SHOW commands that you need to use with extreme care, and preferably after advice from Tivoli. However, if your log is rapidly filling up, will you have time to make that support call? It would be best to investigate and test this command before you need to use it in an emergency. The command is only available in TSM version 5.1.7.0 or above.
Example output is
Dirty page Lsn=4597153.124.2199, Last DB backup Lsn=4596680.43.3348,
Transaction table Lsn=4597152.240.3249, Running DB backup Lsn=0.0.0, Log truncation Lsn=4596680.43.3348

Lsn=4596680.43.3348, Owner=DB, Length=194
Type=Update, Flags=C2, Action=ExtInsert, Page=2171054,
Tsn=0:818261705, PrevLsn=4596680.43.3137, UndoNextLsn=0.0.0,
UpdtLsn=4596680.43.3137 ===> ObjName=AS.Segments, Index=26,
RootAddr=14, PartKeyLen=3, NonPartKeyLen=4, DataLen=87

The recovery log is pinned by the last data base backup.
Space will not be freed until data base backup is run again.
The command tells you that the log is pinned by a backup.
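In that situation, the fix the output is pointing at is simply to run another database backup, for example (LTOCLASS is a made-up device class name):
backup db devclass=LTOCLASS type=full
Once the backup completes, the log truncation point moves forward and the space is freed.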
SHOW LOGV
You can use this command to display the recovery log global attributes, but it is less obvious what they all mean.
SHOW LOGstats
Use this command to see statistics on log usage.
An associated command is SHOW LOGRESET, which will reset the log statistics back to 0.
Commands to help with session hangs
SHOW LOCK
If your TSM server is running very slowly, it is worth trying this command. A TSM server uses locks to serialise updates and prevent corruption, and you use this command to see which resources are locked.
SHOW RESQUEUE
The TSM server monitors resource usage and will cancel any resource users that hold onto resources for too long. You use the SHOW RESQUEUE command to display the resource queue, which contains information about transactions, locks, and other resources. The only output I have ever seen for this command is
RESOURCETimeout is 60 minutes.
Resource list is unlocked.
There are current no waiters.
SHOW SESSION
Useful for diagnosing hangs or other general session problems while a session is still connected to the server. This is also useful in cases where a session is cancelled or terminated and still appears in QUERY SESSION.
SHOW TXNT
You use this command to get information about active server transactions. Transactions are the low level operations that actually read or update the database. This command can be useful for diagnosing hangs or other transaction related problems, but the command produces a lot of output, and many of the fields are a bit obscure. Partial output for one single transaction is shown below.
slot -> 51:
Tsn=0:822996787, Resurrected=False, InFlight=True, Distributed=False, Addr 3b6458b8
ThreadId=119, Timestamp=05/31/04 09:04:02, Creator=dfmigr.c(1789)
Participants=3, summaryVote=ReadOnly
EndInFlight False, endThreadId 119, tmidx 0 0, processBatchCount 0.
Participant DB: voteReceived=False, ackReceived=False
Participant BF: voteReceived=False, ackReceived=False
Participant SS: voteReceived=False, ackReceived=False
Locks held by Tsn=0:822996787 :
Type=34040, NameSpace=32997, SummMode=xLock, Mode=xLock, Key='127.0'
SHow INVObject 0 ObjectId
Use this command to show an inventory object, reporting its nodename, filespace, management class, and more. This command can be useful if you get errors with objects. For example, when exporting a server you may see a message like
ANR9999D xibf.c(664): Return code 87 encountered in writing
object 0.9041317 to export stream.
ANR0661E EXPORT SERVER: Internal error encountered in
accessing data storage.
The 0.9041317 is the Object ID. If you use the SHOW command
SHow INVObject 0 9041317
The result below will tell you what the object is
OBJECT: 0. 9041317 (Backup):
Node: ACSN08 Filespace: /y2. /msg/rlds/ temp
Type: 2 CG: 1 Size: 0.0 HeaderSize: 0

BACKUP OBJECTS ENTRY:
State: 1 Type: 2 MC: 1 CG: 1
/y2 : / msg/rlds/ temp
This (MC: DEFAULT)
Active, Inserted 08/01/03 07:58:58

EXPIRING OBJECTS ENTRY:
Expiring object entry not found.

Storage pool and LAN free commands
SHOW SSPOOL
Useful for displaying the states and attributes of defined storage pools.

SHow DAMAGED poolname
Contributed by Roy Adeyosoye. This command will list all the files in a storage pool that are marked as damaged. Typical output looks like
Volume ID: 34281, Volume Name: QZ1720
Segment number: 1, Segment start: 1606,
Segment Size: 0.85470464
file_name
etc....
Found 4349 damaged bitfiles.
To fix them, run an audit command like
AUDIT VOL QZ1720 FIX=YES
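You can also list damaged files with standard SQL against the CONTENTS table, though be warned that CONTENTS queries can run for a very long time. Assuming your release has the DAMAGED column:
select node_name, filespace_name, file_name from contents where volume_name='QZ1720' and damaged='YES'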

SHOW SLOTS
This command will list the total number of usable slots in a SCSI library. The full command is
SHOW SLOTS libraryname
SHOW TRANSFERSTATS poolname

You use this command to get statistics from the last migration process, for example -
SHOW TRANSFERSTATS BACKUPPOOL

Statistics for last migration from pool BACKUPPOOL
Start date/time: 05/26/04 02:30:21
Elapsed time: 24128 seconds
Total wait time: 17232 seconds
Number of participating processes: 2
Total duration of all processes: 48169 seconds
Total physical files: 11934
Total logical files: 316992
Total bytes: 273999794176
Average logical files per physical file: 26.6
Average physical file size: 22421.5 KB
Number of batch/file transactions ended: 782
Number of batch transactions aborted: 0
Number of file transactions started: 0
Number of file transactions aborted: 0

SHOW LANFREE nodeName storageAgent
This command was added with TSM version 5.2.2. It will check out all possible destination storage pools for a given client node and tell you if this storage pool can support LAN-free backup and restore.
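For example, with a made-up node FILESERV1 and storage agent STA1:
show lanfree FILESERV1 STA1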

SHOW OBJDIR
will display all the defined database object names, along with their numeric node identifier. Output looks like
Defined Database Object Names (Home Address):

Activity.Log(49), Activity.Summary(50),
Administrative.Attributes(37), Administrators(40), etc..
Show NODE number
This will display the details of one of these nodes, for example
Show Node 49

B-Tree ROOT node: PageAddr=49, Lsn=121563.242.2284, ObjName=Activity.Log
LeftSib=-1, RightSib=-1, Continuation=-1
NumRecs=2, Free=994, NextSlot=163, DirOff=0, PartKeyLen=0
Level=2, NumCols=19, KeyCols=2, PartCols=0, NodeAttr=80
MaxCapacity=1004, Capacity=1004, Occupancy=10, LowOccupancy=334
Record 0 (KeyLen=9, DataLen=4):
Key: ->(,.....)(002D)<-
Subtree=>4152978
(+Infinite)(+Infinite)<-
Subtree=>4152968
SHOW TREE tree.name
This gives details about the data in a tree, for example
SHOW TREE activity.log


************* Table Activity.Log VERIFIED **************

Tree Height: 3
Leaf Nodes: 198
Non-leaf Nodes: 3
Empty Leaf Nodes: 0
Empty Non-leaf Nodes: 0
Leaf Records: 6717
Non-leaf records: 200

Records/Node in leaves:
Mean=33.92, StdDev=3.04, Min=10.00, Max=37.00, Samples=198
Records/Node in non-leaves:
Mean=66.67, StdDev=27.64, Min=2.00, Max=113.00, Samples=3
Fractional occupancy in leaves:
Mean=0.98, StdDev=0.05, Min=0.31, Max=1.00, Samples=198
Fractional occupancy in non-leaves:
Mean=0.39, StdDev=0.16, Min=0.01, Max=0.67, Samples=3
Record length in leaves:
Mean=104.52, StdDev=17.61, Min=68.00, Max=221.00, Samples=6717
Record length in non-leaves:
Mean=13.08, StdDev=0.75, Min=6.00, Max=14.00, Samples=200
Key length in leaves:
Mean=9.17, StdDev=0.47, Min=8.00, Max=10.00, Samples=6717
Key length in non-leaves:
Mean=9.08, StdDev=0.75, Min=2.00, Max=10.00, Samples=200
Partition key length in leaves:
*** no sampled data ***

SHOW VERIFYEXPTABLE
This command used to be used to fix problems where data was not being expired correctly. It has now been replaced by the command CLEANUP EXPTABLE [BEGINNODE=nn ENDNODE=nn]. You must discuss this with IBM and get their agreement before running it, as it updates your database. Once started, this command cannot be cancelled, and it will prevent EXPIRE INVENTORY from running.

What is BMR?
Bare metal recovery is the art of recovering a machine from an empty hard drive, or 'bare metal'. At some time we must all have had to rebuild a home PC from a recovery disk, which puts the PC back into its 'factory state'. You must then find and re-install drivers for any new kit installed on top of the base version, then install all the applications that were not part of the base system. This can be painful; the last time I had to do it, I went out and bought some anti-virus software.
A server recovery essentially follows the same process, but while this may be an acceptable process for a home PC, in a commercial environment it can take too much time to rebuild an operating system by hand, and this also requires skilled staff. Even if you have a record of all the information needed to build a server, including copies of all the applications that were loaded on it, it can still take days to complete a rebuild.
There are ways of making this easier. If a server is really business critical, it may be worth investing in a hot standby (that is, a spare server with all the software loaded on it and ready to go) to minimise the business impact of a failure. If it does not make economic sense to have a farm of spare servers sitting idle, then BMR tools exist to simplify a machine rebuild.
Cristie BMR
CBMR can automate a basic system rebuild and will interface with TSM to store backups on the network. It consists of the following components:
• Backup and restore software (PC-BaX) - used to backup and restore files in Windows mode.
• Open File Module (OFM) - enables backup of files that are in use by the Windows system or other applications at the time of backup.
• Linux operating system - A failed server needs to be booted from the CBMR CD-ROM or a network based copy. This will boot a version of Linux in which the partitioning and formatting of the hard disks will be done.
• Linux mode restore software - restores the essential operating system files from the TSM server.
CBMR allows you to replace a hard disk with a bigger or a smaller one than the original. It will automatically scale the disk partitions to fit the size of the new hard disk.
CBMR can either store system rebuild information on a floppy disk or on a network share. This information includes the number and types of hard disks, their layout, Windows, CBMR and ITSM installation folders and the SCSI, RAID and network adapters installed.
In a disaster situation you boot the server from a provided Linux operating system, then the hard disks will be partitioned and formatted. CBMR then restores the operating system files, either from CD-ROM or from a TSM filespace. You then need to re-boot the server. This whole process can be automated with a script.
Installing CBMR for use with TSM
1. Install CBMR on the file server - this is a typical point and click process.
2. Create a CBMR client node on the TSM server. You can create one CBMR node for all the machines, and each machine's data will then be stored under a different filespace.
3. In CBMR, create a storage device to represent the TSM client node. The first time you launch PC-BaX it will ask for the storage device; at this point, tell it to use TSM.
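For step 2, registering the node is just the standard TSM command; the node name, password and domain below are examples only:
register node CBMR_NODE secretpw domain=STANDARD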

Hardware or software configuration
You need to save the client's configuration information by running setupbmr.exe. You should do this every time you change the configuration of the client.
Back up the important system files to the TSM server by running PC-BaX and selecting the do_a_dr_backup option.
Both these processes can be automated with the TSM scheduler to run on a regular basis. This would be appropriate for the system files backup, but a manual backup may be a better option after configuration changes.
System Recovery Process
1. Boot the server from the CBMR CD-ROM. It is possible to hold a copy of the CD-ROM on a remote server, and then boot a failed server from it via a RIB board.
2. Get the configuration information from the network share - CBMR will now configure the hardware.
3. Restore the system files from the TSM server.
4. When prompted, reboot the computer.
This whole process takes 5-10 minutes, and then the server is ready for data recovery. It is possible to recover multiple servers concurrently from one TSM server.
