Discussion:
[p4] Any tricks to make sync fast !!!
Janarthanan Moorthy
2005-05-27 05:35:51 UTC
Hi,

We are using the Perforce proxy (p4p) at our location, and our main server is in the US.
Since our repository is huge, it takes a lot of time to sync (even
though we are using the proxy). Any ideas to make sync faster? Currently
our sync takes around 6 hours...


-Jana
Arnt Gulbrandsen
2005-05-27 08:06:42 UTC
Post by Janarthanan Moorthy
We are using the Perforce proxy (p4p) at our location, and our main server is in the US.
Since our repository is huge, it takes a lot of time to sync (even
though we are using the proxy). Any ideas to make sync faster? Currently
our sync takes around 6 hours...
We have the same problem. Our current guess is that p4p involves one
roundtrip per file to be synced. If the p4p and the p4d are, say, 1.2
seconds from each other and there are 3,000 files to be synced, that's
an hour.

One way to solve it would be for Perforce to put their own p4d in Ulan Bator
and wait for r2005.2 to pipeline better. Short of that, I don't know a way.

Arnt
Ralf Huvendiek
2005-05-27 16:37:42 UTC
Post by Arnt Gulbrandsen
Post by Janarthanan Moorthy
We are using the Perforce proxy (p4p) at our location, and our main server is in the US.
Since our repository is huge, it takes a lot of time to sync (even
though we are using the proxy). Any ideas to make sync faster? Currently
our sync takes around 6 hours...
We have the same problem. Our current guess is that p4p involves one
roundtrip per file to be synced. If the p4p and the p4d are, say, 1.2
seconds from each other and there are 3,000 files to be synced, that's
an hour.
Have you ever tried tunnelling all your Perforce traffic through ssh?
If you suspect a latency problem this could maybe help; I have never tried
it myself, though.
Let's assume you have 3000 files. If p4d makes a TCP connection per
file, you end up with 3000 connections. As TCP uses a three-way
handshake to set up the connection, at least three packets travel
from location A to B. When the connection is torn down, another 3 packets
travel from A to B. Given those 3000 connections/files and 6 packets each, we
end up with 18000 packets going over the wire.

Now the scenario if you use ssh with local port forwarding. In this case
you have one connection which is kept open all the time. All those
little SYN/ACK/FIN exchanges for connection setup/teardown happen
locally and are therefore much faster.
At least that's the theory :) If it's no big hassle for you, maybe you
should try it. Honestly, if I had to wait 6 hours I would try everything
possible.
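
For what it's worth, a rough sketch of the tunnel I have in mind (hostnames
and port are placeholders, untested):

  # keep one ssh connection open and forward local port 1666 to the p4d host
  ssh -N -f -L 1666:p4d.example.com:1666 user@gateway.example.com
  # point the clients at the local end of the tunnel
  export P4PORT=localhost:1666
  # or run the proxy against the tunnel endpoint instead:
  ./p4p -r /var/p4cache -t localhost:1666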

Second, you should really check what Noel Yap asked: are your client
workspaces tight enough?

And finally, if it is not latency but bandwidth that is the problem, then I'd
try to rsync (or something similar) as much as possible during out-of-office
hours.
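
Something along these lines, assuming a mirror workspace kept up to date near
the server (host and paths are made up):

  # run from the remote site overnight; only changed files cross the wire
  rsync -az --delete buildhost.example.com:/mirror/workspace/ /local/workspace/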

Ralf
Arnt Gulbrandsen
2005-05-27 17:06:32 UTC
Post by Ralf Huvendiek
Have you ever tried tunnelling all your Perforce traffic through ssh?
If you suspect a latency problem this could maybe help; I have never tried
it myself, though.
Let's assume you have 3000 files. If p4d makes a TCP connection per
file, you end up with 3000 connections.
It doesn't, it creates (approximately) one connection per p4 command.
Post by Ralf Huvendiek
As TCP uses a three-way handshake to set up the connection, at least three
packets travel from location A to B. When the connection is torn down,
another 3 packets travel from A to B. Given those 3000 connections/files
and 6 packets each, we end up with 18000 packets going over the wire.
Now the scenario if you use ssh with local port forwarding. In this
case you have one connection which is kept open all the time. All
those little SYN/ACK/FIN exchanges for connection setup/teardown
happen locally and are therefore much faster.
At least that's the theory :)
The theory doesn't entirely work ;) The time from the first SYN packet
to first transported byte isn't changed much. ssh and sshd need to
communicate in order to set up the tunnel, and the time they need is
comparable to the time TCP needs to set up a connection. It's not
exactly the same, but also not miraculously faster.

Arnt
Jos Backus
2005-05-27 18:46:55 UTC
Perhaps this is wildly off-topic, but would this help?
Ralf Huvendiek
2005-05-30 16:49:17 UTC
Post by Arnt Gulbrandsen
Post by Ralf Huvendiek
Have you ever tried tunnelling all your Perforce traffic through ssh?
If you suspect a latency problem this could maybe help; I have never tried
it myself, though.
Let's assume you have 3000 files. If p4d makes a TCP connection per
file, you end up with 3000 connections.
It doesn't, it creates (approximately) one connection per p4 command.
Arnt, thanks for shedding some light on this. I think I misunderstood
you there. Unfortunately, I still have not understood 100% what you
meant by:

:We have the same problem. Our current guess is that p4p involves one
:roundtrip per file to be synced. If the p4p and the p4d are, say, 1.2
:seconds from each other and there are 3,000 files to be synced, that's
:an hour.

I had understood that when you sync your 3000 files, p4p will make one
connection to p4d for each file. As this was wrong, what did you mean
by 'one roundtrip per file'? The whole topic will become interesting for
us very soon, too.

In fact I just tried the p4p here. After syncing against a depot so that
the cache is loaded, it makes exactly one connection if I try to
sync again. Of course this would be different if my cache were not
up to date.

Regards,
Ralf
Arnt Gulbrandsen
2005-05-30 18:42:26 UTC
Post by Ralf Huvendiek
:We have the same problem. Our current guess is that p4p involves one
:roundtrip per file to be synced. If the p4p and the p4d are, say, 1.2
:seconds from each other and there are 3,000 files to be synced, that's
:an hour.
I had understood that when you sync your 3000 files, p4p will make one
connection to p4d for each file. As this was wrong, what did you mean
by 'one roundtrip per file'? The whole topic will become interesting
for us very soon, too.
A roundtrip means that a packet travels there, and a reply travels back.
There is an unstated implication that the packet's originator (e.g.
p4p) does nothing meanwhile, but just waits for the reply.

p4p probably does something like that for each file. Perhaps it tells
p4d "I'm now giving client foobar file //depot/mumble/stumble#42, which
I had in cache", and then p4d answers "OK, I recorded that in db.have"
and then they go on to the next file. In that case, there is exactly
one roundtrip per file change, so if 3000 files are to be changed, p4p
has to wait 3000 times. Each wait is as long as the roundtrip time that
ping reports.

If p4p were to instead tell p4d "I'm now giving client foobar the
following dozen files, which I have in cache: ..." and p4d were to
answer "OK, recorded in db.have", then one roundtrip would serve to
handle a dozen file changes, and the whole operation would be almost a
dozen times faster. Handling 3000 files would require waiting just 250
times.

(Of course p4p could also say "I'm now giving client foobar the
following 3000 files: ...", which would mean just one wait, and the
internet connection between p4p and p4d would work at maximum speed.)
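
To put numbers on it: 3,000 round trips at 1.2 seconds each is 3,600 seconds,
i.e. the hour above; 250 round trips would be 300 seconds, about five minutes;
and a single round trip would leave little but the transfer time itself.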
Post by Ralf Huvendiek
In fact I just tried the p4p here. After syncing against a depot so that
the cache is loaded, it makes exactly one connection if I try
to sync again. Of course this would be different if my cache were
not up to date.
Is your p4p far from your p4d? What roundtrip time does ping report
between those two hosts?

One good way to observe this is to run "p4 sync ...#none" and then "p4
sync ...". Typical commands for a build script.

Arnt
Ralf Huvendiek
2005-05-30 20:41:41 UTC
Post by Arnt Gulbrandsen
A roundtrip means that a packet travels there, and a reply travels back.
There is an unstated implication that the packet's originator (e.g.
p4p) does nothing meanwhile, but just waits for the reply.
p4p probably does something like that for each file. Perhaps it tells
p4d "I'm now giving client foobar file //depot/mumble/stumble#42, which
I had in cache", and then p4d answers "OK, I recorded that in db.have"
and then they go on to the next file. In that case, there is exactly
one roundtrip per file change, so if 3000 files are to be changed, p4p
has to wait 3000 times. Each wait is as long as the roundtrip time that
ping reports.
OK, that's just how I understood it the first time anyway. So my
suggestion about the ssh tunnel was still valid ;)
Though you're right about two things. Of course it will not work
miracles. If I knew how to do miracles, I would not work for money
anymore :)
And second, the p4p opens just one connection to the server and reuses
that. I verified it, and it's correct. So I think you're right, ditch
the tunnel idea :(
Post by Arnt Gulbrandsen
If p4p were to instead tell p4d "I'm now giving client foobar the
following dozen files, which I have in cache: ..." and p4d were to
answer "OK, recorded in db.have", then one roundtrip would serve to
handle a dozen file changes, and the whole operation would be almost a
dozen times faster. Handling 3000 files would require waiting just 250
times.
Perfectly right. Alas, right now we have to deal with what we have.
Post by Arnt Gulbrandsen
Post by Ralf Huvendiek
In fact I just tried the p4p here. After syncing against a depot so that
the cache is loaded, it makes exactly one connection if I try
to sync again. Of course this would be different if my cache were
not up to date.
Is your p4p far from your p4d? What roundtrip time does ping report
between those two hosts?
I had synced against public.perforce.com (cough, cough)

--- alameda0c.perforce.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 221.924/224.352/227.530/2.380 ms
Post by Arnt Gulbrandsen
One good way to observe this is to run "p4 sync ...#none" and then "p4
sync ...". Typical commands for a build script.
I tried it again and noticed a few things.

p4 sync #none takes quite some time; tcpdump shows some interaction
between the host and my proxy.
p4 sync takes a hell of a long time. There is quite a lot of
communication between the host and the proxy. And now the really nasty
part: I strongly suspect that the proxy does not save all files.
Instead, some of the big files are transferred again and again.
My cache has eight times fewer files than my workspace after syncing, and
from the delay for some files during sync I'd swear that they are
retransferred. Plus, I am unable to find those files in my cache afterwards.

Strange thing. Maybe I did something wrong; I started the proxy with
./p4p -r /tmp/p4cache -t public.perforce.com:1666 -e 0

What I learned today: never sync to #none; do incremental syncs instead. In
fact this is one thing I'm looking forward to. Right now we're still
using an old PVCS and, speaking in Perforce terms, we sync to #none and
then to #head when updating. This pretty much sucks, because it takes
long enough to be an annoyance (not 6 hours; if I had to wait 6 hours, I
would kill someone ;)).

*Attention, nasty hack suggestions coming up*

Assuming that the communication between the remote site and the server
is the problem, what about syncing each client at the site where the
server lives, then letting each user download his 'workspace' from
there?
I can think of a few workarounds here.

In all of these, 'sync' is always run locally, that is, at the site where
the Perforce server is located.

-sync to #none, then to #head. Afterwards the user can use tools like
rsync to sync his local workspace against the remote one. rsync should only
transfer the modified files, and of those only the modified parts (see the
sketch after this list).

-like above, but zip the resulting workspace and let the user download
that instead. In fact we're doing something like this right now, because our
old PVCS version is such a network hog. Downside: the user has to
download the whole zip file, which can be quite huge and much more than
the rsync method would transfer.

-skip the sync to #none and just sync to #head. You should end up with
just the new files the user needs. After this, zip or rsync as above.
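
Roughly, for the rsync variant (all names invented, untested):

  # at the server site: sync the user's client there, so the server records the have list
  p4 -c jana-ws sync //depot/project/...
  # from the remote site: pull only the changed pieces into the real working directory
  rsync -az --delete buildhost.example.com:/workspaces/jana-ws/ ~/jana-ws/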

Of course there are some downsides. I did say 'nasty hack suggestions',
didn't I? :)

- If the user does not properly sync his workstation with the copy on the
remote fileserver, his workspace will be pretty much toast, as the
server 'thinks' he has newer files than he actually has.
- If you have a lot of users/workspaces at your remote site, you may
need a pretty big disk for all the files your remote users will
get each night.
- Another thing I forgot?

Ralf

Noel Yap
2005-05-27 11:38:35 UTC
Are your clients as minimal as they can be?
Post by Janarthanan Moorthy
Hi,
We are using the Perforce proxy (p4p) at our location, and our main server is in the US.
Since our repository is huge, it takes a lot of time to sync (even
though we are using the proxy). Any ideas to make sync faster? Currently
our sync takes around 6 hours...
-Jana
Dave Lewis
2005-05-27 19:34:03 UTC
Post by Arnt Gulbrandsen
Post by Janarthanan Moorthy
We are using the Perforce proxy (p4p) at our location, and our main server is in the US.
Since our repository is huge, it takes a lot of time to sync (even
though we are using the proxy). Any ideas to make sync faster? Currently
our sync takes around 6 hours...
We have the same problem. Our current guess is that p4p involves one
roundtrip per file to be synced. If the p4p and the p4d are, say, 1.2
seconds from each other and there are 3,000 files to be synced, that's
an hour.
There could be several factors at work. I just did a test sync on
our p4p proxy in India, and 12,000 files took 30 minutes. Another
sync -f took 21 minutes. The size of the files totaled 598 MB.

A test sync here in Austin took 5m 20s.

We run a daemon to automatically sync all files of interest on a
continuous basis, thus populating the proxy's cache. Prior to
this, syncs would take about 6 hours.
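
A crude approximation of that idea, if anyone wants to try it (port, client
name, path and schedule are just examples, not our actual setup):

  # cron entry on the proxy host: resync a cache-warming client every 15 minutes
  */15 * * * * p4 -p localhost:1666 -c proxy-warmer sync //depot/project/... >/dev/null 2>&1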

I would guess that the latency of the line could be a big factor in
how quickly things can happen. The proxy has to check its most recent
revision against the current status of the depot.

stats on our connection:
PING Statistics----
30 packets transmitted, 30 packets received, 0% packet loss
round-trip (ms) min/avg/max = 272/299/392


dave
Chuck Karish
2005-05-28 16:56:15 UTC
Post by Dave Lewis
There could be several factors at work. I just did a test sync on
our p4p proxy in India, and 12,000 files took 30 minutes. Another
sync -f took 21 minutes. The size of the files totaled 598 MB.
A test sync here in Austin took 5m 20s.
We run a daemon to automatically sync all files of interest on a
continuous basis, thus populating the proxy's cache. Prior to
this, syncs would take about 6 hours.
The cache reduces the demand for bandwidth for file transfers.
As you note, metadata queries are still needed.
Post by Dave Lewis
I would guess that the latency of the line could be a big factor in
how quickly things can happen. The proxy has to check its most recent
revision against the current status of the depot.
Yup. If p4p checks one file per query now, there's probably potential
to speed up many operations significantly by batching multiple
requests per connection. Feature request for Perforce.
Post by Dave Lewis
PING Statistics----
30 packets transmitted, 30 packets received, 0% packet loss
round-trip (ms) min/avg/max = 272/299/392
Those are pretty good numbers. A year ago people told of much
higher latencies for connections to India.

Chuck Karish
Shawn Hladky
2005-05-28 13:48:22 UTC
We have a few extremely large workspaces that we need to sync in Sri
Lanka. As silly as it sounds, the following is literally the fastest way
to do the initial sync:
1. Sync a workspace local to the server, create a label and copy to
portable media.
2. FedEx the media to the remote location
3. Create as many remote workspaces as you need with the same mapping
as the original.
4. Flush the new workspaces to the label (preferably run the flush
local to the server)
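
In p4 terms that comes out to roughly the following (label, client, and path
names are only illustrative):

  p4 label -o snapshot-2005-05 | p4 label -i                          # step 1: create the label spec
  p4 -c local-ws labelsync -l snapshot-2005-05 //depot/project/...    # tag what the local workspace has
  # ...steps 1-2: copy the synced files to portable media and ship them...
  p4 -c remote-ws flush //depot/project/...@snapshot-2005-05          # step 4: record revisions without transferring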
(unknown poster)
In this case, we are using p4 to deploy 400k files to QA servers, and a
few dozen files change daily. Not exactly the typical use case, but
aside from the initial sync it works great. We run that script every 5
minutes on several workspaces with no problems whatsoever.

-----Original Message-----
From: perforce-user-***@perforce.com
[mailto:perforce-user-***@perforce.com] On Behalf Of Janarthanan
Moorthy
Sent: Thursday, May 26, 2005 11:36 PM
To: perforce-***@perforce.com
Subject: [p4] Any tricks to make sync fast !!!

Hi,

We are using the Perforce proxy (p4p) at our location, and our main server is in the US.
Since our repository is huge, it takes a lot of time to sync (even
though we are using the proxy). Any ideas to make sync faster? Currently
our sync takes around 6 hours...


-Jana

Dave Lewis
2005-05-28 17:56:42 UTC
Post by Chuck Karish
Yup. If p4p checks one file per query now, there's probably potential
to speed up many operations significantly by batching multiple
requests per connection. Feature request for Perforce.
Yes, I was wondering whether it did this; it would seem a logical
thing to do...

The server could figure out all the files it needs to query about
and send that list to the proxy for verification.
Post by Chuck Karish
Post by Dave Lewis
PING Statistics----
30 packets transmitted, 30 packets received, 0% packet loss
round-trip (ms) min/avg/max = 272/299/392
Those are pretty good numbers. A year ago people told of much
higher latencies for connections to India.
I wasn't in charge of connections stuff, but I think we changed
providers at least once.

dave
Ajay Deshpande
2005-05-30 04:29:46 UTC
Folks, this is a great discussion. We are grappling with similar
issues! Here are some more points we have explored.

* I believe what Shawn mentions below is the best way to get the
first set of clients "booted" up to as close to the latest in the
repository as possible. After that it is all differential syncs -
hopefully.

* If you have many clients to sync, then you can extend the same
logic: sync one client to the latest, do a local copy to the other
clients, and then use p4 flush on them (example after these points).

* Another aspect we have found is that we hit various bandwidth
limits. For example, if ten clients do a p4 sync together, we hit
the download limit, meaning all clients get slowed down. Since we use a
VPN, we have sometimes hit that limit. So you need to analyse your
traffic a bit.
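
For the copy-and-flush approach, something like this (client names and paths
are placeholders):

  p4 -c seed-ws sync //depot/project/...     # sync one client over the WAN
  cp -a /ws/seed-ws/. /ws/dev2-ws/           # local copy into the other client's root
  p4 -c dev2-ws flush //depot/project/...    # tell the server dev2-ws now has those revisions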

Question to the group/Perforce folks:
* Is there an easy (read: automated :-) way to ship differential
changes from the server to the client over other transports? For
example, we would like to receive p4 sync data over FTP rather than
through the normal TCP connections to the server.

Appreciate any more input!

ajay
Post by Shawn Hladky
We have a few extremely large workspaces that we need to sync in Sri
Lanka. As silly as it sounds, the following is literally the fastest way
to do the initial sync:
1. Sync a workspace local to the server, create a label and copy to
portable media.
2. FedEx the media to the remote location
3. Create as many remote workspaces as you need with the same mapping
as the original.
4. Flush the new workspaces to the label (preferably run the flush
local to the server)
Jeff A. Bowles
2005-05-30 06:21:15 UTC
Post by Ajay Deshpande
* Is there an easy (read: automated :-) way to ship differential
changes from the server to the client over other transports? For
example, we would like to receive p4 sync data over FTP rather than
through the normal TCP connections to the server.
I am not saying this would be pretty. But...

You might be able to do this specific task with a script that
does the following. Assume your client workspace is on machine B.
1) "Make sure that the Host field of the client spec is empty."
2) (debug step) Have a script on machine A pretend it is your
client, and do a "p4 sync -n" to see what would be brought down.
3) Have the script on machine A pretend that it's your
workspace, have it do a "p4 sync", and then copy the files to
machine B in the way you prefer (a rough sketch follows).
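
A bare-bones sketch of step 3, with every name invented:

  # runs on machine A; the client spec "remote-ws" has an empty Host field
  p4 -c remote-ws sync //depot/project/...
  # push the result to machine B however you like; scp here stands in for the ftp step
  scp -r /ws/remote-ws user@machine-b:/ws/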

You could make this more robust (and far more complicated) by
writing that script in C++ and linking against the
Perforce API, and playing shenanigans with the methods for file
creation/modification. (It could intercept calls to
create files locally, translating to the ftp-to-B counterparts.)

It's just a thought, and likely would be difficult to make into
something robust.

But it's worth considering as an exercise. (If you went further
with this, you might still want non-sync transactions to happen
from machine B to the server, with no contact with machine A at
all. At least, this would leave all other Perforce commands alone.)

-Jeff Bowles

ps. Don't forget that any "solution" will need to address the
traffic associated with "p4 revert", also.

pps. FTP?