Commit 7f6fc87

feat: support workspace filters when fetching them from the object database
1 parent 2ff41f9 commit 7f6fc87

10 files changed: +964 -65 lines changed

Cargo.lock

Lines changed: 2 additions & 0 deletions
Some generated files are not rendered by default.

crate-status.md

Lines changed: 17 additions & 7 deletions
@@ -293,16 +293,26 @@ The top-level crate that acts as hub to all functionality provided by the `gix-*
 Check out the [performance discussion][gix-diff-performance] as well.

 * **tree**
-* [x] changes needed to obtain _other tree_
+* [x] changes needed to obtain _other tree_
 * **patches**
-* There are various ways to generate a patch from two blobs.
-* [ ] any
+* There are various ways to generate a patch from two blobs.
+* [ ] text
+* [ ] binary
 * **lines**
-* [x] Simple line-by-line diffs powered by the `imara-diff` crate.
-* diffing, merging, working with hunks of data
-* find differences between various states, i.e. index, working tree, commit-tree
+* [x] Simple line-by-line diffs powered by the `imara-diff` crate.
+* **generic rename tracker to find renames and copies**
+* [x] find by exact match
+* [x] find by similarity check
+* [ ] heuristics to find best candidate
+* [ ] find by basename to help detecting simple moves
+* [ ] caching of diffable data
+* **blob**
+* [ ] use external command for diff generation
+* [ ] worktree conversions
+* [ ] `textconv` filters
+* [ ] working with hunks of data
 * [x] API documentation
-* [ ] Examples
+* [ ] Examples

 [gix-diff-performance]: https://github.com/Byron/gitoxide/discussions/74
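
The rename-tracker entries in this hunk (exact match first, then a similarity check) can be illustrated with a small standalone sketch. It is not the `gix-diff` implementation: the `Candidate` type, the `find_rename` and `line_similarity` functions, the `u64` stand-in for an object ID, and the line-set similarity measure are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a blob. A real tracker would use the object hash as `id`.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub path: &'static str,
    pub id: u64,
    pub content: &'static str,
}

/// Share of distinct lines common to both texts, relative to the larger
/// distinct-line count, from 0.0 to 1.0.
/// This measure is an assumption of the sketch, not the one used by the crate.
pub fn line_similarity(old: &str, new: &str) -> f32 {
    let old_lines: HashSet<&str> = old.lines().collect();
    let new_lines: HashSet<&str> = new.lines().collect();
    let larger = old_lines.len().max(new_lines.len());
    if larger == 0 {
        return 1.0; // two empty blobs are treated as identical
    }
    old_lines.intersection(&new_lines).count() as f32 / larger as f32
}

/// Find the likely source of `added` among `removed`:
/// 1. an exact match by id wins immediately,
/// 2. otherwise the most similar candidate at or above `threshold` is chosen.
pub fn find_rename<'a>(
    added: &Candidate,
    removed: &'a [Candidate],
    threshold: f32,
) -> Option<&'a Candidate> {
    if let Some(exact) = removed.iter().find(|c| c.id == added.id) {
        return Some(exact);
    }
    removed
        .iter()
        .map(|c| (c, line_similarity(c.content, added.content)))
        .filter(|(_, similarity)| *similarity >= threshold)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(c, _)| c)
}
```

Checking identity first keeps the common case cheap, since comparing IDs needs no blob content at all. The similarity pass only runs for entries that found no exact partner.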

gix-diff/Cargo.toml

Lines changed: 2 additions & 1 deletion
@@ -13,7 +13,7 @@ autotests = false
 [features]
 default = ["blob"]
 ## Enable diffing of blobs using imara-diff, which also allows for a generic rewrite tracking implementation.
-blob = ["dep:imara-diff"]
+blob = ["dep:imara-diff", "dep:gix-filter"]
 ## Data structures implement `serde::Serialize` and `serde::Deserialize`.
 serde = ["dep:serde", "gix-hash/serde", "gix-object/serde"]
 ## Make it possible to compile to the `wasm32-unknown-unknown` target.
@@ -25,6 +25,7 @@ doctest = false
 [dependencies]
 gix-hash = { version = "^0.13.1", path = "../gix-hash" }
 gix-object = { version = "^0.38.0", path = "../gix-object" }
+gix-filter = { version = "^0.6.0", path = "../gix-filter", optional = true }

 thiserror = "1.0.32"
 imara-diff = { version = "0.1.3", optional = true }

gix-diff/src/blob.rs

Lines changed: 0 additions & 18 deletions
This file was deleted.

gix-diff/src/blob/mod.rs

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
+//! For using text diffs, please have a look at the [`imara-diff` documentation](https://docs.rs/imara-diff),
+//! maintained by [Pascal Kuthe](https://github.com/pascalkuthe).
+pub use imara_diff::*;
+
+use bstr::BStr;
+use bstr::BString;
+
+/// Information about the diff performed to detect similarity.
+#[derive(Debug, Default, Clone, Copy, PartialEq, PartialOrd)]
+pub struct DiffLineStats {
+    /// The amount of lines to remove from the source to get to the destination.
+    pub removals: u32,
+    /// The amount of lines to add to the source to get to the destination.
+    pub insertions: u32,
+    /// The amount of lines of the previous state, in the source.
+    pub before: u32,
+    /// The amount of lines of the new state, in the destination.
+    pub after: u32,
+    /// A range from 0 to 1.0, where 1.0 is a perfect match and 0.5 is a similarity of 50%.
+    /// Similarity is the ratio between all lines in the previous blob and the current blob,
+    /// calculated as `(old_lines_count - new_lines_count) as f32 / old_lines_count.max(new_lines_count) as f32`.
+    pub similarity: f32,
+}
+
+/// A set of values to define how to diff something that is associated with it using `git-attributes`.
+///
+/// Some values are related to diffing, some are related to conversions.
+#[derive(Debug, Clone)]
+struct Driver {
+    /// The name of the driver, as referred to by `[diff "name"]` in the git configuration.
+    pub name: BString,
+    /// The per-driver algorithm to use.
+    pub algorithm: Option<Algorithm>,
+    /// The external filter program to call like `<binary_to_text_command> /path/to/blob` which outputs a textual version of the provided
+    /// binary file.
+    /// Note that it's invoked with a shell if arguments are given.
+    pub binary_to_text_command: Option<BString>,
+    /// `true` if this driver deals with binary files, which means that a `binary_to_text_command` should be used to convert binary
+    /// into a textual representation.
+    pub is_binary: bool,
+}
+
+/// A trait to help access the worktree version of a file, if present.
+pub trait ReadWorktreeBlob {
+    /// Write the contents of the file, executable or link (i.e. the link target itself) at
+    /// `rela_path` to the initially empty `buf`.
+    /// `is_source` is `true` if this is the blob for the source of a rewrite (copy or rename). Otherwise it is the
+    /// destination.
+    ///
+    /// Return `std::io::ErrorKind::NotFound` if the file is not available in the working tree, which will make the
+    /// implementation to extract it from the object database instead and convert it to its working-tree counterpart.
+    fn read_worktree_blob(&mut self, rela_path: &BStr, buf: &mut Vec<u8>, is_source: bool) -> std::io::Result<()>;
+}
+
+///
+pub mod platform {
+    use super::Algorithm;
+    use crate::blob::{Platform, ReadWorktreeBlob};
+
+    /// A conversion pipeline to take an object or path from what's stored in `git` to what can be diffed, while
+    /// following the guidance of git-attributes at the respective path to learn if diffing should happen or if
+    /// the content is considered binary.
+    ///
+    /// There are two different conversion flows, where the target of the flow is a buffer with diffable content:
+    ///
+    /// * `worktree on disk` -> `text conversion`
+    /// * `object` -> `worktree-filters` -> `text conversion`
+    ///
+    /// Based on whether or not [`ReadWorktreeBlob`] can find the file in question, we either read directly from disk
+    /// or transform from the object database.
+    // TODO: make public and put outside to driver, construct separately
+    pub(crate) struct FilterPipeline<ReadBlob> {
+        filter: gix_filter::Pipeline,
+        read_blob: ReadBlob,
+        drivers: Vec<super::Driver>,
+    }
+
+    #[derive(Default)]
+    pub(crate) struct Diffable {
+        /// `None` unless this instance is set to an object to diff.
+        id_and_mode: Option<(gix_hash::ObjectId, gix_object::tree::EntryMode)>,
+        /// The complete and fully transformed content to use for diffing, an allocation for reuse
+        /// between setting different source and destination objects.
+        buf: Vec<u8>,
+    }
+
+    /// Options for use in [Platform::new()].
+    #[derive(Default, Copy, Clone)]
+    pub struct Options {
+        /// The algorithm to use when diffing.
+        pub algorithm: Option<Algorithm>,
+
+        /// If `false`, default `false`, no diffing of binary files will be attempted even
+        /// if a [`binary_to_text_command`](Driver::binary_to_text_command) is set in the driver.
+        pub allow_binary_to_text_command: bool,
+    }
+    /// Lifecycle
+    impl<ReadBlob: ReadWorktreeBlob> Platform<ReadBlob> {
+        /// Create a new instance with `options`, a way to read worktree blobs with `read_blob` and `drivers`.
+        pub fn new(
+            options: Options,
+            read_blob: ReadBlob,
+            filter: gix_filter::Pipeline,
+            drivers: Vec<super::Driver>,
+        ) -> Self {
+            Platform {
+                opts: options,
+                old: Diffable::default(),
+                new: Diffable::default(),
+                filter: FilterPipeline {
+                    filter,
+                    read_blob,
+                    drivers,
+                },
+            }
+        }
+    }
+}
+
+/// A utility for performing a diff of two blobs, including flexible conversions, conversion-caching
+/// acquisition of diff information.
+/// Note that this instance will not call external filters as their output can't be known programmatically,
+/// but it allows to prepare their input if the caller wishes to perform this task.
+struct Platform<ReadBlob> {
+    opts: platform::Options,
+    /// The old version of a diff-able blob, if set.
+    old: platform::Diffable,
+    /// The new version of a diff-able blob, if set.
+    new: platform::Diffable,
+
+    /// A way to convert objects into a diff-able format.
+    filter: platform::FilterPipeline<ReadBlob>,
+}
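
The `ReadWorktreeBlob` documentation describes a fallback: if the worktree read reports `std::io::ErrorKind::NotFound`, the content comes from the object database instead. The sketch below illustrates that control flow with the standard library only. It is not the crate's code: the trait is a simplified copy that takes `&str` instead of `&BStr`, and `DiskReader`, `Origin`, `load_for_diff`, and the `from_odb` closure are hypothetical names. Worktree filters and text conversion are left out, and `File::open` follows symlinks instead of returning the link target.

```rust
use std::io::{self, Read};
use std::path::PathBuf;

/// Simplified mirror of the trait in the diff; `&str` replaces `&BStr` to avoid external crates.
pub trait ReadWorktreeBlob {
    fn read_worktree_blob(&mut self, rela_path: &str, buf: &mut Vec<u8>, is_source: bool) -> io::Result<()>;
}

/// Hypothetical reader that looks up files below a worktree root.
pub struct DiskReader {
    pub root: PathBuf,
}

impl ReadWorktreeBlob for DiskReader {
    fn read_worktree_blob(&mut self, rela_path: &str, buf: &mut Vec<u8>, _is_source: bool) -> io::Result<()> {
        // A missing file surfaces as `ErrorKind::NotFound`, the signal the caller relies on.
        let mut file = std::fs::File::open(self.root.join(rela_path))?;
        file.read_to_end(buf)?;
        Ok(())
    }
}

/// Where the diffable content came from.
#[derive(Debug, PartialEq)]
pub enum Origin {
    Worktree,
    ObjectDatabase,
}

/// Fill `buf` from the worktree if possible, otherwise from `from_odb`,
/// which stands in for the object lookup and conversion.
pub fn load_for_diff<R: ReadWorktreeBlob>(
    reader: &mut R,
    rela_path: &str,
    buf: &mut Vec<u8>,
    from_odb: impl FnOnce(&mut Vec<u8>) -> io::Result<()>,
) -> io::Result<Origin> {
    buf.clear(); // the trait expects an initially empty buffer
    match reader.read_worktree_blob(rela_path, buf, false) {
        Ok(()) => Ok(Origin::Worktree),
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            buf.clear();
            from_odb(buf)?;
            Ok(Origin::ObjectDatabase)
        }
        // Any other I/O error is a real failure and is passed on unchanged.
        Err(err) => Err(err),
    }
}
```

Only `NotFound` triggers the fallback in this sketch. Other errors, such as a permission problem, propagate unchanged, so an unreadable worktree file does not fall back to the stored object without notice.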
