diff mbox

[1/1] support/download: Add git download method for SHAs

Message ID 1476373859-51202-1-git-send-email-bryce.ferguson@rockwellcollins.com
State Changes Requested
Headers show

Commit Message

Bryce Ferguson Oct. 13, 2016, 3:50 p.m. UTC
From: Brandon Maier <brandon.maier@rockwellcollins.com>

Shallow clones don't work for downloading SHAs. However it's common to
track a tag or branch by the commit SHA so that a branch or tag doesn't
unexpectedly change.

We can take advantage of this scenario by searching the remote for a ref
that it equivalent to our sha, then shallow cloning that ref instead.

Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
Signed-off-by: Bryce Ferguson <bryce.ferguson@rockwellcollins.com>
---
 support/download/git | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Arnout Vandecappelle Oct. 14, 2016, 10:50 p.m. UTC | #1
On 13-10-16 17:50, Bryce Ferguson wrote:
> From: Brandon Maier <brandon.maier@rockwellcollins.com>
> 
> Shallow clones don't work for downloading SHAs. However it's common to
> track a tag or branch by the commit SHA so that a branch or tag doesn't
> unexpectedly change.
> 
> We can take advantage of this scenario by searching the remote for a ref
> that it equivalent to our sha, then shallow cloning that ref instead.
> 
> Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
> Signed-off-by: Bryce Ferguson <bryce.ferguson@rockwellcollins.com>
> ---
>  support/download/git | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/support/download/git b/support/download/git
> index 281db61..1c6548d 100755
> --- a/support/download/git
> +++ b/support/download/git
> @@ -55,6 +55,23 @@ if [ -n "$(_git ls-remote "'${repo}'" "'${cset}'" 2>&1)" ]; then
>      fi
>  fi
>  if [ ${git_done} -eq 0 ]; then
> +    # See if $cset is a sha that maps to a branch, then shallow clone that branch
> +    equivalent_ref="$(_git ls-remote --heads --tags "'${repo}'" | awk "/^${cset}/{ print \$2; exit }" | sed 's,refs/\(tags\|heads\)/,,')"

 There should be a redirect of stderr of ls-remote (we don't want to see any
errors from it, the relevant error is the one coming from git clone).

 This line is way too long.

 Also, you can get rid of the sed by including it in awk:

	{ split(\$2, a, \"/\"); print a[3]; exit }

 Also, wouldn't it be better to merge this into the first one? ls-remote takes a
relatively long time, and it seems to be independent of how much refs are
actually printed, so it seems a waste to do it twice. It does mean that the awk
program would become a little more complicated because it also has to match the
cset as a ref.

 Hm, actually, cset may be something like refs/tags/foo so the awk program would
get quite a bit more complicated... Still, I think it's worth it.

> +    if [ -n "$equivalent_ref" ]; then
> +        printf "Doing shallow clone with cset '%s' as '%s'\n" "$cset" "$equivalent_ref"
> +        if _git clone ${verbose} "${@}" --depth 1 -b "'${equivalent_ref}'" "'${repo}'" "'${basename}'"; then
> +            if (cd "${basename}" && _git show "'${cset}'" >/dev/null 2>&1); then

 Can you explain why this bit is still needed? Is there any way that the clone
could succeed but the cset not be present? It would mean that ls-remote was
lying to us, no?

 I can invent one scenario: the symbolic ref got updated between the ls-remote
and the clone, which means that the wrong commit got shallow-cloned, and the
actual cset is not reachable. Hm, ok, can you add a comment to explain that?


 Otherwise looks good :-)

 Regards,
 Arnout

> +                git_done=1
> +            else
> +                rm -rf "${basename}"
> +                printf "Shallow clone failed, checked out wrong revision, falling back to doing a full clone\n"
> +            fi
> +        else
> +            printf "Shallow clone failed, falling back to doing a full clone\n"
> +        fi
> +    fi
> +fi
> +if [ ${git_done} -eq 0 ]; then
>      printf "Doing full clone\n"
>      _git clone ${verbose} "${@}" "'${repo}'" "'${basename}'"
>  fi
>
Arnout Vandecappelle Oct. 15, 2016, 9:32 a.m. UTC | #2
On 15-10-16 00:50, Arnout Vandecappelle wrote:
> 
> 
> On 13-10-16 17:50, Bryce Ferguson wrote:
>> From: Brandon Maier <brandon.maier@rockwellcollins.com>
>>
>> Shallow clones don't work for downloading SHAs. However it's common to
>> track a tag or branch by the commit SHA so that a branch or tag doesn't
>> unexpectedly change.
>>
>> We can take advantage of this scenario by searching the remote for a ref
>> that it equivalent to our sha, then shallow cloning that ref instead.
>>
>> Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
>> Signed-off-by: Bryce Ferguson <bryce.ferguson@rockwellcollins.com>
>> ---
>>  support/download/git | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/support/download/git b/support/download/git
>> index 281db61..1c6548d 100755
>> --- a/support/download/git
>> +++ b/support/download/git
>> @@ -55,6 +55,23 @@ if [ -n "$(_git ls-remote "'${repo}'" "'${cset}'" 2>&1)" ]; then
>>      fi
>>  fi
>>  if [ ${git_done} -eq 0 ]; then
>> +    # See if $cset is a sha that maps to a branch, then shallow clone that branch
>> +    equivalent_ref="$(_git ls-remote --heads --tags "'${repo}'" | awk "/^${cset}/{ print \$2; exit }" | sed 's,refs/\(tags\|heads\)/,,')"

 Another thing: with --heads --tags, it will not capture "special refs" like
gerrit's refs/changes/42/242/1, which is a pity. Actually for these special refs
it shouldn't be needed to use a sha1 because gerrit makes sure they really are
immutable (if you update the change the last number will increment, and AFAIK
there is no way to delete it); still it would be nice to support sha1's wherever
they come from.

 Cc'ing Ricardo who was concerned about this.

 Regards,
 Arnout
Ricardo Martincoski Oct. 16, 2016, 4:57 a.m. UTC | #3
Hello,

On Fri, Oct 14, 2016 at 07:50 PM, Arnout Vandecappelle wrote:

> On 13-10-16 17:50, Bryce Ferguson wrote:
[snip]
>> +++ b/support/download/git
>> @@ -55,6 +55,23 @@ if [ -n "$(_git ls-remote "'${repo}'" "'${cset}'" 2>&1)" ]; then
>>      fi
>>  fi
>>  if [ ${git_done} -eq 0 ]; then
>> +    # See if $cset is a sha that maps to a branch, then shallow clone that branch
>> +    equivalent_ref="$(_git ls-remote --heads --tags "'${repo}'" | awk "/^${cset}/{ print \$2; exit }" | sed 's,refs/\(tags\|heads\)/,,')"
> 
>  There should be a redirect of stderr of ls-remote (we don't want to see any
> errors from it, the relevant error is the one coming from git clone).
> 
>  This line is way too long.
> 
>  Also, you can get rid of the sed by including it in awk:
> 
> 	{ split(\$2, a, \"/\"); print a[3]; exit }

This expression will need be more complex in order to support refs like this:
74c4e04dbb10a5cdeac36e3f1057946e551beb84        refs/heads/feature/vxlan

> 
>  Also, wouldn't it be better to merge this into the first one? ls-remote takes a
> relatively long time, and it seems to be independent of how much refs are
> actually printed, so it seems a waste to do it twice. It does mean that the awk
> program would become a little more complicated because it also has to match the
> cset as a ref.

I agree with merging this 'if' to the previous one.
It also will make the log cleaner, see example messages at the end.

> 
>  Hm, actually, cset may be something like refs/tags/foo so the awk program would
> get quite a bit more complicated... Still, I think it's worth it.

It worth a try.

But maybe it can turn out to be challenging to achieve a single line that
generates the proper reference to be used in the git clone.
In this case we could achieve the same behavior keeping the 2 ifs.
Something like this:

if _git check-ref-format "'${cset}'"; then
# branch or tag, so use the old if (if 'git ls-remote', 'git clone')
else
# sha1, so use the new if (ref='git ls-remote | awk', if ref 'git clone')
fi

Do you think it would be acceptable?

And yet another approach would be to first of all save the output of 'git
ls-remote' with all refs to a temporary file. And then use this file to check
${cset} is a sha1, branch or tag. Later on, in the same script this temp file
could be checked for special refs too (in another patch).
But I don't know if this approach would be acceptable either.

> 
>> +    if [ -n "$equivalent_ref" ]; then
>> +        printf "Doing shallow clone with cset '%s' as '%s'\n" "$cset" "$equivalent_ref"
>> +        if _git clone ${verbose} "${@}" --depth 1 -b "'${equivalent_ref}'" "'${repo}'" "'${basename}'"; then
>> +            if (cd "${basename}" && _git show "'${cset}'" >/dev/null 2>&1); then
> 
>  Can you explain why this bit is still needed? Is there any way that the clone
> could succeed but the cset not be present? It would mean that ls-remote was
> lying to us, no?
> 
>  I can invent one scenario: the symbolic ref got updated between the ls-remote
> and the clone, which means that the wrong commit got shallow-cloned, and the
> actual cset is not reachable. Hm, ok, can you add a comment to explain that?
> 
> 
>  Otherwise looks good :-)
> 
>  Regards,
>  Arnout
[snip]

With this patch applied and using this dirty hack:

TMUX_VERSION = 74c4e04dbb10a5cdeac36e3f1057946e551beb84
TMUX_SITE = https://review.openswitch.net/openswitch/ops
TMUX_SITE_METHOD = git

I get these log messages:

Doing shallow clone
Cloning into 'tmux-74c4e04dbb10a5cdeac36e3f1057946e551beb84'...
warning: Could not find remote branch 74c4e04dbb10a5cdeac36e3f1057946e551beb84 to clone.
fatal: Remote branch 74c4e04dbb10a5cdeac36e3f1057946e551beb84 not found in upstream origin
Shallow clone failed, falling back to doing a full clone
Doing shallow clone with cset '74c4e04dbb10a5cdeac36e3f1057946e551beb84' as 'feature/vxlan'
Cloning into 'tmux-74c4e04dbb10a5cdeac36e3f1057946e551beb84'...
remote: Counting objects: 585, done

Regards,
Ricardo
Ricardo Martincoski Oct. 16, 2016, 5:20 a.m. UTC | #4
Hello,

On Sat, Oct 15, 2016 at 06:32 AM, Arnout Vandecappelle wrote:

> On 15-10-16 00:50, Arnout Vandecappelle wrote:
>> 
>> On 13-10-16 17:50, Bryce Ferguson wrote:
[snip]
> 
>  Another thing: with --heads --tags, it will not capture "special refs" like
> gerrit's refs/changes/42/242/1, which is a pity. Actually for these special refs
> it shouldn't be needed to use a sha1 because gerrit makes sure they really are
> immutable (if you update the change the last number will increment, and AFAIK

Indeed. The last number is the Revision of the Change and serves the same
purpose as the -v in git format-patch, but it is calculated by the server.

> there is no way to delete it); still it would be nice to support sha1's wherever

Actually in some cases there is.
The developer can upload a change in the draft mode and later on publish it
using the web interface. Once published, AFAIK the change or revision cannot be
deleted. Changes and revisions in the state Draft can be deleted using the web
interface. Both draft and published changes have special refs. But the ref is
removed when the change or revision is deleted.

Out of curiosity I tried to create a draft revision,
6b7bb6a8e547f2edc247b88ec2b8e74d6115baa5        refs/changes/02/2/4
delete it, and create a new revision
0880cece5e05e56d4242487f1207a5e0f3362312        refs/changes/02/2/4
so when in Draft state, a revision of a change is not immutable.

> they come from.
> 
>  Cc'ing Ricardo who was concerned about this.

Thank you very much, Arnout.

But in this specific case I think --heads --tags makes sense.
This is because 'git clone -b' seems to operate only for those types of refs.
I didn't check de code from git.git, but I did some tests.
- branches are allowed, but only without the prefix refs/heads
- tags are allowed, but only without the prefix refs/tags
- HEAD is not allowed
- changes (special refs) are not allowed

The same does not apply to 'git fetch'.
- branches are allowed, in any form: refs/heads/master = heads/master = master
- tags are allowed, in any form: refs/tags/v1 = tags/v1 = v1
- HEAD is allowed
- changes (special refs) are allowed in the forms: refs/changes/01/1/2 =
changes/01/1/2
- other special refs are allowed too: refs/meta/config

Regards,
Ricardo
Arnout Vandecappelle Oct. 16, 2016, 7:25 a.m. UTC | #5
On 16-10-16 06:57, Ricardo Martincoski wrote:
> Hello,
> 
> On Fri, Oct 14, 2016 at 07:50 PM, Arnout Vandecappelle wrote:
> 
>> On 13-10-16 17:50, Bryce Ferguson wrote:
> [snip]
>>> +++ b/support/download/git
>>> @@ -55,6 +55,23 @@ if [ -n "$(_git ls-remote "'${repo}'" "'${cset}'" 2>&1)" ]; then
>>>      fi
>>>  fi
>>>  if [ ${git_done} -eq 0 ]; then
>>> +    # See if $cset is a sha that maps to a branch, then shallow clone that branch
>>> +    equivalent_ref="$(_git ls-remote --heads --tags "'${repo}'" | awk "/^${cset}/{ print \$2; exit }" | sed 's,refs/\(tags\|heads\)/,,')"
>>
>>  There should be a redirect of stderr of ls-remote (we don't want to see any
>> errors from it, the relevant error is the one coming from git clone).
>>
>>  This line is way too long.
>>
>>  Also, you can get rid of the sed by including it in awk:
>>
>> 	{ split(\$2, a, \"/\"); print a[3]; exit }
> 
> This expression will need be more complex in order to support refs like this:
> 74c4e04dbb10a5cdeac36e3f1057946e551beb84        refs/heads/feature/vxlan

 Actually, it is better to use the whole ref, includes refs/*/ because there may
be a branch and tag with the same name. And incidentally this simplifies the awk
expression :-)

 Grmbl, git clone doesn't support refs/.../... style of refs. git clone sucks.
Why don't we do

git init
git fetch

all the time? We're anyway doing a git fetch down there... Could just as well be
done always. Oh, and git fetch does support --depth so no issue there.

>>  Also, wouldn't it be better to merge this into the first one? ls-remote takes a
>> relatively long time, and it seems to be independent of how much refs are
>> actually printed, so it seems a waste to do it twice. It does mean that the awk
>> program would become a little more complicated because it also has to match the
>> cset as a ref.
> 
> I agree with merging this 'if' to the previous one.
> It also will make the log cleaner, see example messages at the end.
> 
>>
>>  Hm, actually, cset may be something like refs/tags/foo so the awk program would
>> get quite a bit more complicated... Still, I think it's worth it.
> 
> It worth a try.
> 
> But maybe it can turn out to be challenging to achieve a single line that
> generates the proper reference to be used in the git clone.
> In this case we could achieve the same behavior keeping the 2 ifs.
> Something like this:
> 
> if _git check-ref-format "'${cset}'"; then
> # branch or tag, so use the old if (if 'git ls-remote', 'git clone')
> else
> # sha1, so use the new if (ref='git ls-remote | awk', if ref 'git clone')
> fi
> 
> Do you think it would be acceptable?

 Certainly acceptable, especially if you write a nice commit message explaining
why it is better like that.

> 
> And yet another approach would be to first of all save the output of 'git
> ls-remote' with all refs to a temporary file. And then use this file to check
> ${cset} is a sha1, branch or tag. Later on, in the same script this temp file
> could be checked for special refs too (in another patch).
> But I don't know if this approach would be acceptable either.

 That would certainly be acceptable.


 Regards,
 Arnout

> 
>>
>>> +    if [ -n "$equivalent_ref" ]; then
>>> +        printf "Doing shallow clone with cset '%s' as '%s'\n" "$cset" "$equivalent_ref"
>>> +        if _git clone ${verbose} "${@}" --depth 1 -b "'${equivalent_ref}'" "'${repo}'" "'${basename}'"; then
>>> +            if (cd "${basename}" && _git show "'${cset}'" >/dev/null 2>&1); then
>>
>>  Can you explain why this bit is still needed? Is there any way that the clone
>> could succeed but the cset not be present? It would mean that ls-remote was
>> lying to us, no?
>>
>>  I can invent one scenario: the symbolic ref got updated between the ls-remote
>> and the clone, which means that the wrong commit got shallow-cloned, and the
>> actual cset is not reachable. Hm, ok, can you add a comment to explain that?
>>
>>
>>  Otherwise looks good :-)
>>
>>  Regards,
>>  Arnout
> [snip]
> 
> With this patch applied and using this dirty hack:
> 
> TMUX_VERSION = 74c4e04dbb10a5cdeac36e3f1057946e551beb84
> TMUX_SITE = https://review.openswitch.net/openswitch/ops
> TMUX_SITE_METHOD = git
> 
> I get these log messages:
> 
> Doing shallow clone
> Cloning into 'tmux-74c4e04dbb10a5cdeac36e3f1057946e551beb84'...
> warning: Could not find remote branch 74c4e04dbb10a5cdeac36e3f1057946e551beb84 to clone.
> fatal: Remote branch 74c4e04dbb10a5cdeac36e3f1057946e551beb84 not found in upstream origin
> Shallow clone failed, falling back to doing a full clone
> Doing shallow clone with cset '74c4e04dbb10a5cdeac36e3f1057946e551beb84' as 'feature/vxlan'
> Cloning into 'tmux-74c4e04dbb10a5cdeac36e3f1057946e551beb84'...
> remote: Counting objects: 585, done
> 
> Regards,
> Ricardo
>
Brandon Maier Oct. 16, 2016, 11:11 p.m. UTC | #6
On Sun, Oct 16, 2016 at 2:25 AM, Arnout Vandecappelle <arnout@mind.be>
wrote:
>
>
>  Actually, it is better to use the whole ref, includes refs/*/ because
> there may
> be a branch and tag with the same name. And incidentally this simplifies
> the awk
> expression :-)
>
>  Grmbl, git clone doesn't support refs/.../... style of refs. git clone
> sucks.
> Why don't we do
>
> git init
> git fetch
>
> all the time? We're anyway doing a git fetch down there... Could just as
> well be
> done always. Oh, and git fetch does support --depth so no issue there.
>

That seems reasonable. As Ricardo and you mentioned, git fetch seems to
work with any ref (including special refs), and is smart about searching
for matching branches. We could simplify this down greatly and also cover
the special refs scenario from Yann's patch. Something like...

equivalent_ref="$(git ls-remote $repo | awk '/^$cset/{ print $2; exit}')"
git init $basename
pushd $basename
git remote add origin $repo
if [ -n $equivalent_ref ]; then
    git fetch --depth=1 origin $equivalent_ref
    # $cset must be a SHA-1 here, so checkout $cset so we know we got the
correct commit
    git checkout $cset
    git_done=1
fi
if [ $git_done -ne 1 ]; then
    # Don't know if cset is a sha or ref, so don't try to add to local
refs, instead checkout FETCH_HEAD
    git fetch --depth=1 origin $cset
    git checkout FETCH_HEAD
    git_done=1
fi
# Full fetch w/ special refs
if [ $git_done -ne 1]; then
    git fetch -u origin 'refs/*:refs/*'
    git checkout $cset
fi

The downside is we won't add refs to our local refs/ on a shallow clone.
But it looks like we only needed them for checkout, which we can do via
FETCH_HEAD instead.
Arnout Vandecappelle Oct. 17, 2016, 8:18 p.m. UTC | #7
On 17-10-16 01:11, Brandon Maier wrote:
> On Sun, Oct 16, 2016 at 2:25 AM, Arnout Vandecappelle <arnout@mind.be
> <mailto:arnout@mind.be>> wrote:
> 
> 
>>      Actually, it is better to use the whole ref, includes refs/*/ because there may
>>     be a branch and tag with the same name. And incidentally this simplifies the awk
>>     expression :-)
>> 
>>      Grmbl, git clone doesn't support refs/.../... style of refs. git clone sucks.
>>     Why don't we do
>> 
>>     git init
>>     git fetch
>> 
>>     all the time? We're anyway doing a git fetch down there... Could just as well be
>>     done always. Oh, and git fetch does support --depth so no issue there.
> 
> 
> That seems reasonable. As Ricardo and you mentioned, git fetch seems to work
> with any ref (including special refs), and is smart about searching for matching
> branches. We could simplify this down greatly and also cover the special refs
> scenario from Yann's patch. Something like...
> 
> equivalent_ref="$(git ls-remote $repo | awk '/^$cset/{ print $2; exit}')"

 There are two boundary cases that we should also handle:
- an actual ref that could also be a commit sha, e.g. "added";
- an incomplete commit sha that matches several refs.

> git init $basename
> pushd $basename
> git remote add origin $repo

 I'm not a big fan of adding an explicit remote, just use $repo everywhere.

> if [ -n $equivalent_ref ]; then
>     git fetch --depth=1 origin $equivalent_ref
>     # $cset must be a SHA-1 here, so checkout $cset so we know we got the
> correct commit
>     git checkout $cset
>     git_done=1

 Only if checkout was successful, I guess.

> fi
> if [ $git_done -ne 1 ]; then

 I think this could be an else. If the first case failed, the second one is very
unlikely to succeed.

>     # Don't know if cset is a sha or ref, so don't try to add to local refs,
> instead checkout FETCH_HEAD
>     git fetch --depth=1 origin $cset
>     git checkout FETCH_HEAD

 I kind of prefer to specify the target ref explicitly instead of relying on
FETCH_HEAD. I.e., create a local special ref for it. Could be e.g.
refs/buildroot/$cset.

>     git_done=1

 Here checkout was always successful.

> fi
> # Full fetch w/ special refs
> if [ $git_done -ne 1]; then
>     git fetch -u origin 'refs/*:refs/*'

 This could theoretically be a much larger clone than a simple "git fetch -u
origin", so it might be worthwhile to first try that one still. And it's a
pretty unusual situation that you specified a sha that is only reachable from a
special ref, not from a normal branch/tag.

 However, in practice I don't think the extra objects added by gerrit will
amount to much on a large repo. gerrit will die a thousand deads before it can
even approximate one "change" per e.g. kernel commit.

>     git checkout $cset
> fi
>  
> The downside is we won't add refs to our local refs/ on a shallow clone. But it
> looks like we only needed them for checkout, which we can do via FETCH_HEAD instead.

 Except in the sha-gotten-through-shallow-clone-of-ref case, because in that
case the fetched ref may have already changed compared to what was there when we
did the ls-remote.


 Regards,
 Arnout
Brandon Maier Oct. 17, 2016, 10:11 p.m. UTC | #8
On Mon, Oct 17, 2016 at 3:18 PM, Arnout Vandecappelle <arnout@mind.be>
wrote:

>
>
> On 17-10-16 01:11, Brandon Maier wrote:
> >
> > That seems reasonable. As Ricardo and you mentioned, git fetch seems to
> work
> > with any ref (including special refs), and is smart about searching for
> matching
> > branches. We could simplify this down greatly and also cover the special
> refs
> > scenario from Yann's patch. Something like...
> >
> > equivalent_ref="$(git ls-remote $repo | awk '/^$cset/{ print $2; exit}')"
>
>  There are two boundary cases that we should also handle:
> - an actual ref that could also be a commit sha, e.g. "added";
>

I can move the 'git fetch $cset' before the 'git fetch $equivalent_ref' so
that if cset matches a ref, it will be fetched first. Also has the bonus
that we can skip the ls-remote call entirely if it's a ref or the server
supports fetching sha-1s.

- an incomplete commit sha that matches several refs.


As in a shortened sha that collides with multiple full shas? I could change
the awk to '\$1 == $cset{ print \$2; exit}' so that we only match on a
complete 40-byte sha. It looks like all the PKG_VERSIONS in buildroot use
the full sha anyway.


> > git init $basename
> > pushd $basename
> > git remote add origin $repo
>
>  I'm not a big fan of adding an explicit remote, just use $repo everywhere.
>

Either way is fine by me


>
> > if [ -n $equivalent_ref ]; then
> >     git fetch --depth=1 origin $equivalent_ref
> >     # $cset must be a SHA-1 here, so checkout $cset so we know we got the
> > correct commit
> >     git checkout $cset
> >     git_done=1
>
>  Only if checkout was successful, I guess.
>

Yep, I wrote this quick. But all the git fetches and checkouts should be
wrapped in an 'if ...; then git_done=1; else printf "failed..."; fi'


>
> > fi
> > if [ $git_done -ne 1 ]; then
>
>  I think this could be an else. If the first case failed, the second one
> is very
> unlikely to succeed.
>

If cset is a ref, equivalent_ref will be empty and we'd want to fallback
here.


>
> >     # Don't know if cset is a sha or ref, so don't try to add to local
> refs,
> > instead checkout FETCH_HEAD
> >     git fetch --depth=1 origin $cset
> >     git checkout FETCH_HEAD
>
>  I kind of prefer to specify the target ref explicitly instead of relying
> on
> FETCH_HEAD. I.e., create a local special ref for it. Could be e.g.
> refs/buildroot/$cset.
>

Is it possible for FETCH_HEAD to be incorrect? Maybe if someone is messing
around in the repo? Otherwise this could add an (albeit extremely unlikely)
corner case where the repo has a special ref called refs/buildroot/xyz.


>
> >     git_done=1
>
>  Here checkout was always successful.
>
> > fi
> > # Full fetch w/ special refs
> > if [ $git_done -ne 1]; then
> >     git fetch -u origin 'refs/*:refs/*'
>
>  This could theoretically be a much larger clone than a simple "git fetch
> -u
> origin", so it might be worthwhile to first try that one still. And it's a
> pretty unusual situation that you specified a sha that is only reachable
> from a
> special ref, not from a normal branch/tag.
>
>  However, in practice I don't think the extra objects added by gerrit will
> amount to much on a large repo. gerrit will die a thousand deads before it
> can
> even approximate one "change" per e.g. kernel commit.
>

We'd need to do a "git fetch -u origin 'refs/tags/*:refs/tags/*'
'refs/heads/*:refs/heads/*'" otherwise it wouldn't fetch the tags and
branches, which I'm fine with adding. But if we agree that special refs
probably won't make up a large portion of fetches, I'd rather keep it
simple and just fetch everything.


> >     git checkout $cset
> > fi
> >
> > The downside is we won't add refs to our local refs/ on a shallow clone.
> But it
> > looks like we only needed them for checkout, which we can do via
> FETCH_HEAD instead.
>
>  Except in the sha-gotten-through-shallow-clone-of-ref case, because in
> that
> case the fetched ref may have already changed compared to what was there
> when we
> did the ls-remote.
>

Yes, but in that case we can do "git checkout $cset" because $cset must be
a commit's sha, and we don't need to fetch refs to checkout shas.


Otherwise, if we think switching to fetches is an acceptable solution, I'll
create a new patchset to do the change-over.
Arnout Vandecappelle Oct. 17, 2016, 10:58 p.m. UTC | #9
On 18-10-16 00:11, Brandon Maier wrote:
> On Mon, Oct 17, 2016 at 3:18 PM, Arnout Vandecappelle <arnout@mind.be
> <mailto:arnout@mind.be>> wrote:
> 
> 
> 
>     On 17-10-16 01:11, Brandon Maier wrote:
>     >
>     > That seems reasonable. As Ricardo and you mentioned, git fetch seems to work
>     > with any ref (including special refs), and is smart about searching for matching
>     > branches. We could simplify this down greatly and also cover the special refs
>     > scenario from Yann's patch. Something like...
>     >
>     > equivalent_ref="$(git ls-remote $repo | awk '/^$cset/{ print $2; exit}')"
> 
>      There are two boundary cases that we should also handle:
>     - an actual ref that could also be a commit sha, e.g. "added";
> 
>  
> I can move the 'git fetch $cset' before the 'git fetch $equivalent_ref' so that
> if cset matches a ref, it will be fetched first. Also has the bonus that we can
> skip the ls-remote call entirely if it's a ref or the server supports fetching
> sha-1s.

 Sounds good!

> 
>     - an incomplete commit sha that matches several refs. 
> 
> 
> As in a shortened sha that collides with multiple full shas? I could change the
> awk to '\$1 == $cset{ print \$2; exit}' so that we only match on a complete
> 40-byte sha. It looks like all the PKG_VERSIONS in buildroot use the full sha
> anyway.

 I indeed meant an avbreviated sha. Inside Buildroot we indeed require full
sha's, but in BR2_EXTERNAL packages that's not always the case.

 I'd just error out if there are multiple matches.


>     > git init $basename
>     > pushd $basename
>     > git remote add origin $repo
> 
>      I'm not a big fan of adding an explicit remote, just use $repo everywhere.
> 
> 
> Either way is fine by me
>  
> 
> 
>     > if [ -n $equivalent_ref ]; then
>     >     git fetch --depth=1 origin $equivalent_ref
>     >     # $cset must be a SHA-1 here, so checkout $cset so we know we got the
>     > correct commit
>     >     git checkout $cset
>     >     git_done=1
> 
>      Only if checkout was successful, I guess.
> 
> 
> Yep, I wrote this quick. But all the git fetches and checkouts should be wrapped
> in an 'if ...; then git_done=1; else printf "failed..."; fi'
>  
> 
> 
>     > fi
>     > if [ $git_done -ne 1 ]; then
> 
>      I think this could be an else. If the first case failed, the second one is very
>     unlikely to succeed.
> 
> 
> If cset is a ref, equivalent_ref will be empty and we'd want to fallback here.

 Exactly, so an else of the [ -n $equivalent_ref ].

>  
> 
> 
>     >     # Don't know if cset is a sha or ref, so don't try to add to local refs,
>     > instead checkout FETCH_HEAD
>     >     git fetch --depth=1 origin $cset
>     >     git checkout FETCH_HEAD
> 
>      I kind of prefer to specify the target ref explicitly instead of relying on
>     FETCH_HEAD. I.e., create a local special ref for it. Could be e.g.
>     refs/buildroot/$cset.
> 
> 
> Is it possible for FETCH_HEAD to be incorrect? Maybe if someone is messing
> around in the repo? Otherwise this could add an (albeit extremely unlikely)
> corner case where the repo has a special ref called refs/buildroot/xyz.

 We own the buildroot namespace :-P

>  
> 
> 
>     >     git_done=1
> 
>      Here checkout was always successful.
> 
>     > fi
>     > # Full fetch w/ special refs
>     > if [ $git_done -ne 1]; then
>     >     git fetch -u origin 'refs/*:refs/*'
> 
>      This could theoretically be a much larger clone than a simple "git fetch -u
>     origin", so it might be worthwhile to first try that one still. And it's a
>     pretty unusual situation that you specified a sha that is only reachable from a
>     special ref, not from a normal branch/tag.
> 
>      However, in practice I don't think the extra objects added by gerrit will
>     amount to much on a large repo. gerrit will die a thousand deads before it can
>     even approximate one "change" per e.g. kernel commit.
> 
> 
> We'd need to do a "git fetch -u origin 'refs/tags/*:refs/tags/*'
> 'refs/heads/*:refs/heads/*'" otherwise it wouldn't fetch the tags and branches,

 Euh, isn't this exactly equivalent to "git fetch -u origin" ?

> which I'm fine with adding. But if we agree that special refs probably won't
> make up a large portion of fetches, I'd rather keep it simple and just fetch
> everything.

 Good enough for me.

> 
> 
>     >     git checkout $cset
>     > fi
>     >
>     > The downside is we won't add refs to our local refs/ on a shallow clone. But it
>     > looks like we only needed them for checkout, which we can do via FETCH_HEAD instead.
> 
>      Except in the sha-gotten-through-shallow-clone-of-ref case, because in that
>     case the fetched ref may have already changed compared to what was there when we
>     did the ls-remote.
> 
> 
> Yes, but in that case we can do "git checkout $cset" because $cset must be a
> commit's sha, and we don't need to fetch refs to checkout shas.
> 
> 
> Otherwise, if we think switching to fetches is an acceptable solution, I'll
> create a new patchset to do the change-over.

 Do it in two patches: first convert the current situation to fetches, then add
the shallow download of tag SHAs.

 Regards,
 Arnout
Ricardo Martincoski Nov. 2, 2016, 5:22 p.m. UTC | #10
Hello,

Please take a look at this patch series:
http://patchwork.ozlabs.org/patch/690099/
http://patchwork.ozlabs.org/patch/690097/ (please read the WARNING)
http://patchwork.ozlabs.org/patch/690098/
        
Regards,
Ricardo
diff mbox

Patch

diff --git a/support/download/git b/support/download/git
index 281db61..1c6548d 100755
--- a/support/download/git
+++ b/support/download/git
@@ -55,6 +55,23 @@  if [ -n "$(_git ls-remote "'${repo}'" "'${cset}'" 2>&1)" ]; then
     fi
 fi
 if [ ${git_done} -eq 0 ]; then
+    # See if $cset is a sha that maps to a branch, then shallow clone that branch
+    equivalent_ref="$(_git ls-remote --heads --tags "'${repo}'" | awk "/^${cset}/{ print \$2; exit }" | sed 's,refs/\(tags\|heads\)/,,')"
+    if [ -n "$equivalent_ref" ]; then
+        printf "Doing shallow clone with cset '%s' as '%s'\n" "$cset" "$equivalent_ref"
+        if _git clone ${verbose} "${@}" --depth 1 -b "'${equivalent_ref}'" "'${repo}'" "'${basename}'"; then
+            if (cd "${basename}" && _git show "'${cset}'" >/dev/null 2>&1); then
+                git_done=1
+            else
+                rm -rf "${basename}"
+                printf "Shallow clone failed, checked out wrong revision, falling back to doing a full clone\n"
+            fi
+        else
+            printf "Shallow clone failed, falling back to doing a full clone\n"
+        fi
+    fi
+fi
+if [ ${git_done} -eq 0 ]; then
     printf "Doing full clone\n"
     _git clone ${verbose} "${@}" "'${repo}'" "'${basename}'"
 fi