Commit bb470f4

newren authored and gitster committed
merge-ort: step 3 of tree writing -- handling subdirectories as we go
Our order for processing of entries means that if we have a tree of
files that looks like

   Makefile
   src/moduleA/foo.c
   src/moduleA/bar.c
   src/moduleB/baz.c
   src/moduleB/umm.c
   tokens.txt

Then we will process paths in the order of the leftmost column below.  I
have added two additional columns that help explain the algorithm that
follows; the 2nd column is there to remind us we have oid & mode info we
are tracking for each of these paths (which differs between the paths
which I'm not representing well here), and the third column annotates
the parent directory of the entry:

   tokens.txt         <version_info>    ""
   src/moduleB/umm.c  <version_info>    src/moduleB
   src/moduleB/baz.c  <version_info>    src/moduleB
   src/moduleB        <version_info>    src
   src/moduleA/foo.c  <version_info>    src/moduleA
   src/moduleA/bar.c  <version_info>    src/moduleA
   src/moduleA        <version_info>    src
   src                <version_info>    ""
   Makefile           <version_info>    ""

When the parent directory changes, if it's a subdirectory of the previous
parent directory (e.g. "" -> src/moduleB) then we can just keep appending.
If the parent directory differs from the previous parent directory and is
not a subdirectory, then we should process that directory.

So, for example, when we get to this point:

   tokens.txt         <version_info>    ""
   src/moduleB/umm.c  <version_info>    src/moduleB
   src/moduleB/baz.c  <version_info>    src/moduleB

and note that the next entry (src/moduleB) has a different parent than
the last one that isn't a subdirectory, we should write out a tree for it

   100644 blob <HASH> umm.c
   100644 blob <HASH> baz.c

then pop all the entries under that directory while recording the new
hash for that directory, leaving us with

   tokens.txt         <version_info>        ""
   src/moduleB        <new version_info>    src

This process repeats until at the end we get to

   tokens.txt         <version_info>        ""
   src                <new version_info>    ""
   Makefile           <version_info>        ""

and then we can write out the toplevel tree.  Since we potentially have
entries in our string_list corresponding to multiple different toplevel
directories, e.g. a slightly different repository might have:

   whizbang.txt       <version_info>        ""
   tokens.txt         <version_info>        ""
   src/moduleD        <new version_info>    src
   src/moduleC        <new version_info>    src
   src/moduleB        <new version_info>    src
   src/moduleA/foo.c  <version_info>        src/moduleA
   src/moduleA/bar.c  <version_info>        src/moduleA

When src/moduleA is popped off, we need to know that the "last directory"
reverts back to src, and how many entries in our string_list are
associated with that parent directory.  So I use an auxiliary offsets
string_list which would have (parent_directory,offset) information of the
form

   ""             0
   src            2
   src/moduleA    5

Whenever I write out a tree for a subdirectory, I set versions.nr to the
final offset value and then decrement offsets.nr...and then add an entry
to versions with a hash for the new directory.

The idea is relatively simple, there's just a lot of accounting to
implement this.

Signed-off-by: Elijah Newren <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent ee4012d commit bb470f4
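
The accounting described in the commit message can be sketched as a small stack simulation. This is a simplified illustration under stated assumptions, not the code from merge-ort.c: `struct sim`, `sim_enter()`, `sim_run()` and `example_parents` are hypothetical names, the sketch uses strcmp() where the real code compares pointers, it only counts pending entries instead of storing version_info, and it ignores the empty-directory case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_MAX 16

/* Hypothetical, simplified stand-in for struct directory_versions. */
struct sim {
	const char *dirs[SIM_MAX];      /* stack of open directories ("offsets" keys) */
	size_t offsets[SIM_MAX];        /* where each open directory starts in "versions" */
	size_t depth;                   /* number of open directories */
	size_t versions_nr;             /* number of pending entries in "versions" */
	const char *written[SIM_MAX];   /* directories written as trees, in order */
	size_t written_size[SIM_MAX];   /* number of entries in each written tree */
	size_t written_nr;
};

/* Parent directory of each path from the commit message, in processing order. */
static const char *const example_parents[] = {
	"",             /* tokens.txt */
	"src/moduleB",  /* src/moduleB/umm.c */
	"src/moduleB",  /* src/moduleB/baz.c */
	"src",          /* src/moduleB */
	"src/moduleA",  /* src/moduleA/foo.c */
	"src/moduleA",  /* src/moduleA/bar.c */
	"src",          /* src/moduleA */
	"",             /* src */
	"",             /* Makefile */
};

/* Update the accounting before an entry whose parent directory is dir. */
static void sim_enter(struct sim *s, const char *dir)
{
	const char *last = s->depth ? s->dirs[s->depth - 1] : NULL;

	if (last && !strcmp(last, dir))
		return; /* same directory as before: nothing to do */

	if (!last || !strncmp(dir, last, strlen(last))) {
		/* Just starting, or descending: remember where dir starts. */
		assert(s->depth < SIM_MAX);
		s->dirs[s->depth] = dir;
		s->offsets[s->depth] = s->versions_nr;
		s->depth++;
		return;
	}

	/* Leaving the innermost directory: "write" its tree, pop its entries. */
	assert(s->written_nr < SIM_MAX);
	s->depth--;
	s->written[s->written_nr] = last;
	s->written_size[s->written_nr] = s->versions_nr - s->offsets[s->depth];
	s->written_nr++;
	s->versions_nr = s->offsets[s->depth];

	/* Make sure dir is the innermost open directory from here on. */
	if (!s->depth || strcmp(s->dirs[s->depth - 1], dir)) {
		s->dirs[s->depth] = dir;
		s->offsets[s->depth] = s->versions_nr;
		s->depth++;
	}
}

/* Process n entries; each entry appends one item to "versions". */
static void sim_run(struct sim *s, const char *const *parents, size_t n)
{
	size_t i;

	memset(s, 0, sizeof(*s));
	for (i = 0; i < n; i++) {
		sim_enter(s, parents[i]);
		s->versions_nr++;
	}
}
```

Running `sim_run()` over `example_parents` writes three subtrees in the order src/moduleB, src/moduleA, src (two entries each) and leaves three pending toplevel entries (tokens.txt, src, Makefile) with a single remaining offsets row for "" at offset 0.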

1 file changed, +234 -8 lines changed

merge-ort.c

Lines changed: 234 additions & 8 deletions
@@ -521,7 +521,46 @@ static int string_list_df_name_compare(const char *one, const char *two)
 }
 
 struct directory_versions {
+	/*
+	 * versions: list of (basename -> version_info)
+	 *
+	 * The basenames are in reverse lexicographic order of full pathnames,
+	 * as processed in process_entries().  This puts all entries within
+	 * a directory together, and covers the directory itself after
+	 * everything within it, allowing us to write subtrees before needing
+	 * to record information for the tree itself.
+	 */
 	struct string_list versions;
+
+	/*
+	 * offsets: list of (full relative path directories -> integer offsets)
+	 *
+	 * Since versions contains basenames from files in multiple different
+	 * directories, we need to know which entries in versions correspond
+	 * to which directories.  Values of e.g.
+	 *     ""             0
+	 *     src            2
+	 *     src/moduleA    5
+	 * Would mean that entries 0-1 of versions are files in the toplevel
+	 * directory, entries 2-4 are files under src/, and the remaining
+	 * entries starting at index 5 are files under src/moduleA/.
+	 */
+	struct string_list offsets;
+
+	/*
+	 * last_directory: directory that previously processed file found in
+	 *
+	 * last_directory starts NULL, but records the directory in which the
+	 * previous file was found within.  As soon as
+	 *    directory(current_file) != last_directory
+	 * then we need to start updating accounting in versions & offsets.
+	 * Note that last_directory is always the last path in "offsets" (or
+	 * NULL if "offsets" is empty) so this exists just for quick access.
+	 */
+	const char *last_directory;
+
+	/* last_directory_len: cached computation of strlen(last_directory) */
+	unsigned last_directory_len;
 };
 
 static int tree_entry_order(const void *a_, const void *b_)
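
The relationship between the two lists in the struct comment can be illustrated with a lookup over the example offsets table. This is only a sketch: `struct dir_offset`, `example_offsets` and `owning_directory()` are hypothetical names that do not appear in merge-ort.c, and the real code never needs this lookup because it only ever works on the innermost directory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical row of the "offsets" list: directory -> first index in "versions". */
struct dir_offset {
	const char *dir;   /* full relative path of the directory */
	size_t start;      /* index of its first entry in "versions" */
};

/* The example values from the struct comment. */
static const struct dir_offset example_offsets[] = {
	{ "", 0 },
	{ "src", 2 },
	{ "src/moduleA", 5 },
};

/*
 * Return the directory that owns versions[index]: the last row whose
 * start is <= index.  Rows are ordered outermost to innermost, so when
 * two rows share a start the innermost one wins.
 */
static const char *owning_directory(const struct dir_offset *offsets,
				    size_t nr, size_t index)
{
	const char *owner = NULL;
	size_t i;

	for (i = 0; i < nr && offsets[i].start <= index; i++)
		owner = offsets[i].dir;
	return owner;
}
```

With the example table, indices 0-1 map to the toplevel directory, 2-4 to src, and 5 onward to src/moduleA, matching the ranges spelled out in the comment.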
@@ -596,6 +635,181 @@ static void record_entry_for_tree(struct directory_versions *dir_metadata,
 			   basename)->util = &mi->result;
 }
 
+static void write_completed_directory(struct merge_options *opt,
+				      const char *new_directory_name,
+				      struct directory_versions *info)
+{
+	const char *prev_dir;
+	struct merged_info *dir_info = NULL;
+	unsigned int offset;
+
+	/*
+	 * Some explanation of info->versions and info->offsets...
+	 *
+	 * process_entries() iterates over all relevant files AND
+	 * directories in reverse lexicographic order, and calls this
+	 * function.  Thus, an example of the paths that process_entries()
+	 * could operate on (along with the directories for those paths
+	 * being shown) is:
+	 *
+	 *     xtract.c             ""
+	 *     tokens.txt           ""
+	 *     src/moduleB/umm.c    src/moduleB
+	 *     src/moduleB/stuff.h  src/moduleB
+	 *     src/moduleB/baz.c    src/moduleB
+	 *     src/moduleB          src
+	 *     src/moduleA/foo.c    src/moduleA
+	 *     src/moduleA/bar.c    src/moduleA
+	 *     src/moduleA          src
+	 *     src                  ""
+	 *     Makefile             ""
+	 *
+	 * info->versions:
+	 *
+	 *     always contains the unprocessed entries and their
+	 *     version_info information.  For example, after the first five
+	 *     entries above, info->versions would be:
+	 *
+	 *         xtract.c     <xtract.c's version_info>
+	 *         tokens.txt   <tokens.txt's version_info>
+	 *         umm.c        <src/moduleB/umm.c's version_info>
+	 *         stuff.h      <src/moduleB/stuff.h's version_info>
+	 *         baz.c        <src/moduleB/baz.c's version_info>
+	 *
+	 *     Once a subdirectory is completed we remove the entries in
+	 *     that subdirectory from info->versions, writing it as a tree
+	 *     (write_tree()).  Thus, as soon as we get to src/moduleB,
+	 *     info->versions would be updated to
+	 *
+	 *         xtract.c     <xtract.c's version_info>
+	 *         tokens.txt   <tokens.txt's version_info>
+	 *         moduleB      <src/moduleB's version_info>
+	 *
+	 * info->offsets:
+	 *
+	 *     helps us track which entries in info->versions correspond to
+	 *     which directories.  When we are N directories deep (e.g. 4
+	 *     for src/modA/submod/subdir/), we have up to N+1 unprocessed
+	 *     directories (+1 because of toplevel dir).  Corresponding to
+	 *     the info->versions example above, after processing five entries
+	 *     info->offsets will be:
+	 *
+	 *         ""           0
+	 *         src/moduleB  2
+	 *
+	 *     which is used to know that xtract.c & tokens.txt are from the
+	 *     toplevel directory, while umm.c & stuff.h & baz.c are from the
+	 *     src/moduleB directory.  Again, following the example above,
+	 *     once we need to process src/moduleB, then info->offsets is
+	 *     updated to
+	 *
+	 *         ""           0
+	 *         src          2
+	 *
+	 *     which says that moduleB (and only moduleB so far) is in the
+	 *     src directory.
+	 *
+	 *     One unique thing to note about info->offsets here is that
+	 *     "src" was not added to info->offsets until there was a path
+	 *     (a file OR directory) immediately below src/ that got
+	 *     processed.
+	 *
+	 * Since process_entry() just appends new entries to info->versions,
+	 * write_completed_directory() only needs to do work if the next path
+	 * is in a directory that is different than the last directory found
+	 * in info->offsets.
+	 */
+
+	/*
+	 * If we are working with the same directory as the last entry, there
+	 * is no work to do.  (See comments above the directory_name member of
+	 * struct merged_info for why we can use pointer comparison instead of
+	 * strcmp here.)
+	 */
+	if (new_directory_name == info->last_directory)
+		return;
+
+	/*
+	 * If we are just starting (last_directory is NULL), or last_directory
+	 * is a prefix of the current directory, then we can just update
+	 * info->offsets to record the offset where we started this directory
+	 * and update last_directory to have quick access to it.
+	 */
+	if (info->last_directory == NULL ||
+	    !strncmp(new_directory_name, info->last_directory,
+		     info->last_directory_len)) {
+		uintptr_t offset = info->versions.nr;
+
+		info->last_directory = new_directory_name;
+		info->last_directory_len = strlen(info->last_directory);
+		/*
+		 * Record the offset into info->versions where we will
+		 * start recording basenames of paths found within
+		 * new_directory_name.
+		 */
+		string_list_append(&info->offsets,
+				   info->last_directory)->util = (void*)offset;
+		return;
+	}
+
+	/*
+	 * The next entry that will be processed will be within
+	 * new_directory_name.  Since at this point we know that
+	 * new_directory_name is within a different directory than
+	 * info->last_directory, we have all entries for info->last_directory
+	 * in info->versions and we need to create a tree object for them.
+	 */
+	dir_info = strmap_get(&opt->priv->paths, info->last_directory);
+	assert(dir_info);
+	offset = (uintptr_t)info->offsets.items[info->offsets.nr-1].util;
+	if (offset == info->versions.nr) {
+		/*
+		 * Actually, we don't need to create a tree object in this
+		 * case.  Whenever all files within a directory disappear
+		 * during the merge (e.g. unmodified on one side and
+		 * deleted on the other, or files were renamed elsewhere),
+		 * then we get here and the directory itself needs to be
+		 * omitted from its parent tree as well.
+		 */
+		dir_info->is_null = 1;
+	} else {
+		/*
+		 * Write out the tree to the git object directory, and also
+		 * record the mode and oid in dir_info->result.
+		 */
+		dir_info->is_null = 0;
+		dir_info->result.mode = S_IFDIR;
+		write_tree(&dir_info->result.oid, &info->versions, offset,
+			   opt->repo->hash_algo->rawsz);
+	}
+
+	/*
+	 * We've now used several entries from info->versions and one entry
+	 * from info->offsets, so we get rid of those values.
+	 */
+	info->offsets.nr--;
+	info->versions.nr = offset;
+
+	/*
+	 * Now we've taken care of the completed directory, but we need to
+	 * prepare things since future entries will be in
+	 * new_directory_name.  (In particular, process_entry() will be
+	 * appending new entries to info->versions.)  So, we need to make
+	 * sure new_directory_name is the last entry in info->offsets.
+	 */
+	prev_dir = info->offsets.nr == 0 ? NULL :
+		   info->offsets.items[info->offsets.nr-1].string;
+	if (new_directory_name != prev_dir) {
+		uintptr_t c = info->versions.nr;
+		string_list_append(&info->offsets,
+				   new_directory_name)->util = (void*)c;
+	}
+
+	/* And, of course, we need to update last_directory to match. */
+	info->last_directory = new_directory_name;
+	info->last_directory_len = strlen(info->last_directory);
+}
+
 /* Per entry merge function */
 static void process_entry(struct merge_options *opt,
 			  const char *path,
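
The function above stores each offset directly in the `util` pointer of a list item by casting through `uintptr_t`, and later pops a completed directory by truncating both lists. A minimal sketch of that pattern follows, assuming a fixed-size list: `struct item`, `struct list`, `push_directory()` and `pop_completed()` are hypothetical stand-ins, not Git's string_list API. The integer-to-pointer round trip is implementation-defined in C, though it behaves as expected on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, fixed-size stand-ins for string_list_item / string_list. */
struct item {
	const char *string;
	void *util;         /* holds an integer offset, not a real pointer */
};

struct list {
	struct item items[8];
	size_t nr;
};

/* Record that dir starts at index versions_nr of the versions list. */
static void push_directory(struct list *offsets, const char *dir,
			   size_t versions_nr)
{
	assert(offsets->nr < 8);
	offsets->items[offsets->nr].string = dir;
	offsets->items[offsets->nr].util = (void *)(uintptr_t)versions_nr;
	offsets->nr++;
}

/*
 * Pop the innermost directory: drop its offsets row and truncate the
 * versions list back to where that directory started.  Returns how many
 * entries the directory had; 0 corresponds to the is_null case, where no
 * tree is written and the directory is omitted from its parent.
 */
static size_t pop_completed(struct list *offsets, size_t *versions_nr)
{
	size_t start, count;

	assert(offsets->nr > 0);
	start = (size_t)(uintptr_t)offsets->items[offsets->nr - 1].util;
	count = *versions_nr - start;
	offsets->nr--;
	*versions_nr = start;
	return count;
}
```

Replaying the comment's example (offsets of "" at 0 and src/moduleB at 2, with five pending entries), popping src/moduleB reports three entries and leaves two pending entries and one offsets row.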
@@ -694,7 +908,9 @@ static void process_entries(struct merge_options *opt,
 	struct strmap_entry *e;
 	struct string_list plist = STRING_LIST_INIT_NODUP;
 	struct string_list_item *entry;
-	struct directory_versions dir_metadata = { STRING_LIST_INIT_NODUP };
+	struct directory_versions dir_metadata = { STRING_LIST_INIT_NODUP,
+						   STRING_LIST_INIT_NODUP,
+						   NULL, 0 };
 
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
@@ -714,6 +930,11 @@ static void process_entries(struct merge_options *opt,
 	/*
 	 * Iterate over the items in reverse order, so we can handle paths
 	 * below a directory before needing to handle the directory itself.
+	 *
+	 * This allows us to write subtrees before we need to write trees,
+	 * and it also enables sane handling of directory/file conflicts
+	 * (because it allows us to know whether the directory is still in
+	 * the way when it is time to process the file at the same path).
 	 */
 	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
 		char *path = entry->string;
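
The effect of the reverse iteration can be seen on the paths from the commit message. The sketch below is an assumption-laden illustration: it sorts with plain strcmp() and then reverses the array, whereas process_entries() sorts with string_list_df_name_compare() and walks the list backwards. For this particular set of paths plain strcmp() yields the order shown in the commit message; `cmp_paths()` and `order_for_processing()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator over an array of C strings (plain byte order). */
static int cmp_paths(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Sort paths ascending, then reverse in place, so that every path below
 * a directory comes before the directory itself.
 */
static void order_for_processing(const char **paths, size_t n)
{
	size_t i;

	qsort(paths, n, sizeof(*paths), cmp_paths);
	for (i = 0; i < n / 2; i++) {
		const char *tmp = paths[i];
		paths[i] = paths[n - 1 - i];
		paths[n - 1 - i] = tmp;
	}
}
```

Applied to the nine files and directories of the example tree, this puts tokens.txt first, each of src/moduleB and src/moduleA directly after its own files, src after both of them, and Makefile last.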
@@ -724,6 +945,8 @@ static void process_entries(struct merge_options *opt,
 		 */
 		struct merged_info *mi = entry->util;
 
+		write_completed_directory(opt, mi->directory_name,
+					  &dir_metadata);
 		if (mi->clean)
 			record_entry_for_tree(&dir_metadata, path, mi);
 		else {
@@ -732,17 +955,20 @@ static void process_entries(struct merge_options *opt,
 		}
 	}
 
-	/*
-	 * TODO: We can't actually write a tree yet, because dir_metadata just
-	 * contains all basenames of all files throughout the tree with their
-	 * mode and hash.  Not only is that a nonsensical tree, it will have
-	 * lots of duplicates for paths such as "Makefile" or ".gitignore".
-	 */
-	die("Not yet implemented; need to process subtrees separately");
+	if (dir_metadata.offsets.nr != 1 ||
+	    (uintptr_t)dir_metadata.offsets.items[0].util != 0) {
+		printf("dir_metadata.offsets.nr = %d (should be 1)\n",
+		       dir_metadata.offsets.nr);
+		printf("dir_metadata.offsets.items[0].util = %u (should be 0)\n",
+		       (unsigned)(uintptr_t)dir_metadata.offsets.items[0].util);
+		fflush(stdout);
+		BUG("dir_metadata accounting completely off; shouldn't happen");
+	}
 	write_tree(result_oid, &dir_metadata.versions, 0,
 		   opt->repo->hash_algo->rawsz);
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
+	string_list_clear(&dir_metadata.offsets, 0);
 }
 
 void merge_switch_to_result(struct merge_options *opt,
