[p4] Basic backup/recovery question

Discussion:

Michael Mirman

2017-02-16 20:32:49 UTC

The checkpoint is complete as of the moment you create it.
The reason to rotate the journal is that whatever goes into the next journal is not in the checkpoint.
Therefore, when you restore, you may need to replay the subsequent journal(s) after you restore the checkpoint.
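
To make the ordering concrete, here is a minimal Python sketch (not part of the original reply) that builds a restore command line: checkpoint first, then the journals oldest to newest. It assumes the default naming where rotated journals carry a numeric suffix (journal.N) and the live journal is simply "journal"; the helper names and paths are hypothetical, and the command is only constructed, never run.

```python
import re
from typing import List, Sequence, Tuple


def journal_sort_key(name: str) -> Tuple[int, int]:
    """Order rotated journals by numeric suffix; the live journal goes last."""
    m = re.search(r"\.(\d+)(?:\.gz)?$", name)
    if m is None:
        # Assumption: a file without a numeric suffix is the live journal.
        return (1, 0)
    return (0, int(m.group(1)))


def restore_argv(root: str, checkpoint: str, journals: Sequence[str]) -> List[str]:
    """Build (but do not run) a 'p4d -jr' command line.

    The checkpoint is listed first, followed by the journals in the
    order they were written.
    """
    ordered = sorted(journals, key=journal_sort_key)
    return ["p4d", "-r", root, "-jr", checkpoint, *ordered]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(restore_argv("/p4/root", "checkpoint.41",
                                ["journal", "journal.42", "journal.41"])))
```

Keeping the command as a list makes it easy to review before handing it to subprocess.run, which is a useful safeguard for a step that rebuilds the database.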
--
Michael Mirman
MathWorks, Inc.
3 Apple Hill Drive, Natick, MA 01760
508-647-7555

-----Original Message-----
Behalf Of briand
Sent: Thursday, February 16, 2017 2:40 PM
Subject: [p4] Basic backup/recovery question
Posted on behalf of forum user 'briand'.
I'm about ready to upgrade our server from the ancient 2012.1 to the current
2016.2. I've read several KB articles on backup, recovery, and upgrade (as
well as several manual pages). Even so, there is one item I'm not completely
clear on.
When you create a checkpoint using p4d -jc, it will also save and truncate the
journal. When I'm recovering from that checkpoint using p4d -jr, do I also
need to recover from the journal file too or does the checkpoint contain a
complete and up to date backup of the server?
Thanks.
--
http://forums.perforce.com/index.php?/topic/5182-basic-backuprecovery-question
_______________________________________________
http://maillist.perforce.com/mailman/listinfo/perforce-user

_______________________________________________
perforce-user mailing list - perforce-***@perforce.com
http://maillist.perforc

Michael Mirman

2017-02-17 02:05:49 UTC

First, let's make sure that we mean the same thing when we say "offline checkpoint".
For me, it means that the real server keeps running and the checkpoint is taken on a replica.
It does *not* mean that we take the server offline and take a checkpoint there.

The "standard" (and simplest) procedure is the latter. In that case, what is in the journal is only for the future - the db is not changed during creation of the checkpoint.
We have never used this approach because we like the idea of having our server available 24x7, so we don’t take it offline.
Rather, we take a checkpoint on a replica, and we do *not* truncate the journal at that time.

All journals are kept available for the restore procedure.
When the restore time comes (which for us is every night - when we test the restore on a test stack), first, we rebuild the db from the checkpoint, and then replay all journals starting with the "right" one.

Logically, we need to start replaying journals with the first journal that contains records that we did not get in the checkpoint.
Practically, it is a bit tricky because our checkpointing procedure is completely asynchronous to the journal rotation.

The way we do it is:
We grep the @db.counter@ record for the journal counter from the checkpoint - now we know the journal number from the checkpoint itself (as opposed to relying on having the journal counter embedded in the checkpoint name, although this is a fine approach, too).
Also, from the checkpoint, we get the time we started writing this checkpoint - the first @ex@ record.

Then, we go through our journals backwards (from the most recent to the oldest), looking for the first journal whose @ex@ record indicates a time preceding the @ex@ time from the checkpoint.
After that, we know all the journals we have to replay, in order from the oldest to the most recent.
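
The selection logic described above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the script used at MathWorks. It assumes the journal counter appears in a record shaped like "@pv@ <ver> @db.counters@ @journal@ <N>" (the pattern also accepts the "db.counter" spelling used above), that transaction markers look like "@ex@ <pid> <epoch-seconds>", and that the first @ex@ record in each journal is the one to compare. The function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Sequence, Tuple

# Assumed record layouts; check them against your own checkpoint files.
COUNTER_RE = re.compile(r"@db\.counters?@ @journal@ @?(\d+)@?")
EX_RE = re.compile(r"^@ex@ \d+ (\d+)")


def journal_counter(lines: Iterable[str]) -> Optional[int]:
    """Return the journal counter recorded in a checkpoint, if present."""
    for line in lines:
        m = COUNTER_RE.search(line)
        if m:
            return int(m.group(1))
    return None


def first_ex_time(lines: Iterable[str]) -> Optional[int]:
    """Return the timestamp of the first @ex@ record, if present."""
    for line in lines:
        m = EX_RE.match(line)
        if m:
            return int(m.group(1))
    return None


def journals_to_replay(checkpoint_ex_time: int,
                       journals: Sequence[Tuple[str, Sequence[str]]]) -> List[str]:
    """Pick the journals to replay after restoring the checkpoint.

    `journals` is a list of (name, lines) ordered oldest to newest.
    Walk backwards until a journal's first @ex@ time precedes the
    checkpoint's; replay from that journal forward. If none qualifies,
    fall back to replaying every journal supplied.
    """
    start = 0
    for i in range(len(journals) - 1, -1, -1):
        t = first_ex_time(journals[i][1])
        if t is not None and t < checkpoint_ex_time:
            start = i
            break
    return [name for name, _ in journals[start:]]
```

For real files you would pass open file handles instead of in-memory lists; both functions stop reading at the first match, so even large checkpoints are only scanned as far as needed.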

We have had several versions of the logic for maintaining the perfect correctness of what is replayed.
This logic may be a bit overcomplicated, but it has been working fine for years.
--
Michael Mirman
MathWorks, Inc.
508-647-7555

-----Original Message-----
From: perforce-user [mailto:perforce-user-***@perforce.com] On Behalf Of briand
Sent: Thursday, February 16, 2017 6:00 PM
To: perforce-***@perforce.com
Subject: Re: [p4] Basic backup/recovery question

Posted on behalf of forum user 'briand'.

Thanks for the thoughts.

In my test environment, when I'm doing an offline checkpoint, I see a couple
of lines present in the journal after the checkpoint is complete (for example,
setting the "journal" and "lastCheckpointAction" counters),
plus, the journal was last written after the checkpoint file was last written,
so that makes me think that the Perforce metadata gets updated after the
checkpoint is complete, even when performing an offline checkpoint.


Michael Mirman

2017-02-17 13:38:35 UTC

Peculiar.
I see in the release notes in the
Major new functionality in 2010.2
section:
#257688 **
To help administrators keep track of successful checkpoints, a
new internally generated counter 'lastCheckpointAction' has
been added which contains the operation timestamp. Also, when
the checkpoint completes, the MD5 digest of the checkpoint is
written to the file 'checkpoint.N.md5'. Together these data points
can help in verifying complete and undamaged checkpoints. Note
that if the -z flag was used to compress the checkpoint, it must
be uncompressed to verify the checksum. When restoring from a
journal, the server will now produce a warning if the journal
was written by a different version of the server, and will produce
an error if the journal was written using different case-handling
flags than are currently defined for the server.
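
As an illustration of the MD5 check mentioned in the release note, here is a small Python sketch. It assumes the 'checkpoint.N.md5' file contains the digest as a 32-character hex string somewhere in its text (the exact layout of that file is an assumption here), and that a checkpoint ending in ".gz" was compressed with -z, so it is decompressed on the fly before hashing, as the note requires. The function names are hypothetical.

```python
import gzip
import hashlib
import re


def checkpoint_md5(path: str) -> str:
    """MD5 of the checkpoint's uncompressed contents, as lowercase hex."""
    opener = gzip.open if path.endswith(".gz") else open
    digest = hashlib.md5()
    with opener(path, "rb") as fh:
        # Read in 1 MiB chunks so large checkpoints do not fill memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def recorded_md5(md5_path: str) -> str:
    """Extract the first 32-hex-digit token from the .md5 file."""
    with open(md5_path, "r", encoding="ascii", errors="replace") as fh:
        m = re.search(r"\b([0-9A-Fa-f]{32})\b", fh.read())
    if m is None:
        raise ValueError(f"no MD5 digest found in {md5_path}")
    return m.group(1).lower()


def checkpoint_is_intact(checkpoint_path: str, md5_path: str) -> bool:
    """True if the checkpoint matches the digest recorded next to it."""
    return checkpoint_md5(checkpoint_path) == recorded_md5(md5_path)
```

A check like this fits naturally into a nightly restore test, before the checkpoint is replayed.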

We are running 2016.1.
I don’t see the counter:
-> p4 counter lastCheckpointAction
0

IOW, even if it's set, we cannot query it.
Not too useful for the user, IMHO.
--
Michael Mirman
MathWorks, Inc.
3 Apple Hill Drive, Natick, MA 01760
508-647-7555

-----Original Message-----
Behalf Of Domenic
Sent: Friday, February 17, 2017 2:30 AM
Subject: Re: [p4] Basic backup/recovery question
Posted on behalf of forum user 'Domenic'.
Based on
https://www.perforce.com/perforce/doc.current/manuals/cmdref/p4_counters.html
it looks like lastCheckpointAction is a built-in one.

Michael Mirman

2017-02-17 17:13:14 UTC

Maybe you don't see it because you've always done your checkpoints off a
replica?

Ah! Of course! This makes perfect sense.
Now I understand why Sven asked me that question!
:-)
--
Michael Mirman
MathWorks, Inc.
3 Apple Hill Drive, Natick, MA 01760
508-647-7555

-----Original Message-----
Behalf Of Domenic
Sent: Friday, February 17, 2017 9:55 AM
Subject: Re: [p4] Basic backup/recovery question
Posted on behalf of forum user 'Domenic'.
[http://forums.perforce.com/index.php?app=forums&module=forums&section=findpost&pid=21198]

Originally posted to the perforce-user mailing list by: Michael Mirman
We are running 2016.1.
-> p4 counter lastCheckpointAction
0
IOW, even if it's set, we cannot query it.
Not too useful for the user, IMHO.

From our experience, it seems that the counter only gets updated when the
lastCheckpointAction = 1476690993 (2016/10/17 00:56:33 -0700 Pacific Daylight Time) checkpoint restored
..even though we take nightly checkpoints off a replica.
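
For anyone scripting against this counter, the value quoted above appears to consist of an epoch timestamp, a human-readable time in parentheses, and the action performed. A minimal Python sketch for splitting it, assuming that layout holds in general (only this one sample is available in the thread) and using a hypothetical function name:

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class CheckpointAction(NamedTuple):
    when_utc: datetime
    action: str


def parse_last_checkpoint_action(value: str) -> CheckpointAction:
    """Split '<epoch> (<local time>) <action>' into its parts."""
    # Collapse line wraps and repeated spaces before matching.
    flat = " ".join(value.split())
    m = re.match(r"(\d+) \((.*?)\) ?(.*)$", flat)
    if m is None:
        raise ValueError(f"unrecognised counter value: {value!r}")
    when = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
    return CheckpointAction(when, m.group(3).strip())


if __name__ == "__main__":
    sample = ("1476690993 (2016/10/17 00:56:33 -0700 Pacific Daylight Time) "
              "checkpoint restored")
    result = parse_last_checkpoint_action(sample)
    print(result.when_utc.isoformat())  # 2016-10-17T07:56:33+00:00
    print(result.action)                # checkpoint restored
```

The epoch converts to 07:56:33 UTC, which agrees with the 00:56:33 -0700 shown in the counter text.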
Maybe you don't see it because you've always done your checkpoints off a
replica?

Michael Mirman

2017-02-18 22:32:34 UTC

I don’t see any holes in this approach.
If I were doing it, I would probably consider shutting down the server, then restarting it in a way that nobody can access it except me, and rotating the journal. Then, I could shut down the server and create a full checkpoint, so that I would not have to worry about the journals.

My 0.02
--
Michael Mirman
MathWorks, Inc.
508-647-7555

-----Original Message-----
From: perforce-user [mailto:perforce-user-***@perforce.com] On Behalf Of briand
Sent: Friday, February 17, 2017 7:15 PM
To: perforce-***@perforce.com
Subject: Re: [p4] Basic backup/recovery question

Posted on behalf of forum user 'briand'.

Our "standard" nightly backup procedure is to take a checkpoint from a
replica and rotate the journals at the same time. Journals get rotated several
times throughout the day and get saved along with the checkpoints.

For this specific instance, since I'm upgrading from a very old p4d version
to the current p4d version, I need to completely rebuild the databases from a
checkpoint (KB article 5469). I've gotten approval from upper management to
shut down Perforce for the weekend, so that I can perform the upgrade without
time pressures (we need to do some major VMware maintenance at the same time, so
it works out well).

Once I shut down the master p4d server, I'll create a new checkpoint (p4d
-jc). This checkpoint and the journal file left over after the checkpoint
(containing the four lines discussed above) will then be used to rebuild the
databases with the new version of p4d and reseed the replicas. It may not be the
procedure that takes the least time, but I believe it will be the least
error-prone. This procedure is consistent with the steps outlined in KB article
5469.