Skip to content

Generate stable sql objects names #1802

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

fredericDelaporte
Copy link
Member

@fredericDelaporte fredericDelaporte commented Jul 16, 2018

Fix #1769.

Possible breaking change:

  • Auto-generated constraint names will not be the same than the ones generated with previous NHibernate versions under .Net Framework. (Under .Net Core those names were anyway changing at each run.)

@fredericDelaporte fredericDelaporte added this to the 5.2 milestone Jul 16, 2018
@fredericDelaporte fredericDelaporte force-pushed the sqlObjectsStableNames branch 2 times, most recently from 579dcb0 to 464cb10 Compare July 16, 2018 16:14
using NHibernate.Mapping;
using NUnit.Framework;

namespace NHibernate.Test.NHSpecificTest.NH1399
{
[TestFixture]
[TestFixture, Obsolete]
public class Fixture
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture is testing Table.UniqueColumnString with a non-null referencedTable parameter, case which was no more having any usage in NHibernate. So indeed these tests are obsolete since some times.

if (string.IsNullOrEmpty(fk.Name))
{
fk.Name = Constraint.GenerateName(
fk.GeneratedConstraintNamePrefix, table, fk.ReferencedTable, fk.Columns);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a heavily simplified port of the Hibernate code, which instead uses name policy generation and bothers with quoting/unquoting (like the things removed in #1703).

// Hash the generated name for avoiding collisions with user choosen names.
// This is not 100% reliable, as hashing may still have a chance of generating
// collisions.
var name = prefix + HashName(sb.ToString());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quite dislike the idea of "uniqueness through hashcode", since it is fundamentally flawed: collisions may occur, even if very unlikely. But I have no better idea here.

if (name.Length > 30)
{
// This, of course, increases the collision risk.
name = name.Substring(0, 30);
Copy link
Member Author

@fredericDelaporte fredericDelaporte Jul 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I was tempted to use Dialect.MaxAliasLength. But that would require to partially undo #1703... Maybe instead should we add some setting for the max length here. After all we are not generating an alias. And a setting would be quite less heavy to resolve than the dialect.

@hazzik, any thoughts here?

/// <returns>The hased name.</returns>
private static string HashName(string name)
{
// Hibernate uses MD5, but with .Net this would throw on FIPS enabled machine.
Copy link
Member Author

@fredericDelaporte fredericDelaporte Jul 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thing is from my point of view a very unfortunate design decision of the .Net Framework: if FIPS is enabled on the machine, any usage of non FIPS compliant algorithm is forbidden. But FIPS does not forbid them, it just forbids to use them for handling a security related task. And here we are not handling a security related task. But there is no way to tell it for allowing using MD5 here even on FIPS enabled machine. See this Microsoft blog (may be very long to load) for more information.

So if we want to use MD5 here without causing NHibernate to be unusable on a FIPS enabled machine, we would have to use a custom implementation, like this one.

{
// Hibernate uses MD5, but with .Net this would throw on FIPS enabled machine.
// As a consequence generated names will be quite longer.
using (var hasher = SHA256.Create())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should pass on a FIPS enabled machine, SHA256.Create being documented as choosing a FIPS compliant one if needed. At first I was willing to use HashAlgorithm.Create instead (which default to SHA-1, a bit shorter), but it throws PlatformNotSupported under Core.

// Converting to base 32 for shortening the name.
// Hibernate uses base 35, but we do not have a native implementation
// in .Net, and base 32 is easier to implement.
return ToBase32String(hash);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hibernate creates a BigInteger from the byte array (.Net can do that too) then asks its base 35 representation: .Net does not have this natively, and implementing it was a bit heavy in my opinion.

@hazzik
Copy link
Member

hazzik commented Jul 16, 2018

I think using a cryptographic hash for this trivial task is an overkill. I was researching using lightweight non-cryptographic hashes to perform the operation, like MurmurHash2 and xxHash. xx claims to be the fastest. They both generate 32 bit integer as a result (they have 64 bit versions, but I think this is an overkill again) and provide 10/10 results on SMHasher. I think it should be enough for us.

Both algorithms are open-sourced with compatible licenses

@fredericDelaporte
Copy link
Member Author

For avoiding adding a new dependency, we could take xxHash version from Seok-Ju Yun by taking just the part of the source we need. The speed claim is about C version, so it may not be faster than MurMurHash2 unsafe or hackish versions from David Landmann (which we could also incorporate as source code).

But for the usage it will have, mapping compilation and only on some foreign keys and unique constraints, I do not think it will make much differences.

@hazzik
Copy link
Member

hazzik commented Jul 17, 2018

I benchmarked the implementations:

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i7-2670QM CPU 2.20GHz (Sandy Bridge), 1 CPU, 8 logical and 4 physical cores
Frequency=2143568 Hz, Resolution=466.5119 ns, Timer=TSC
.NET Core SDK=2.1.302
  [Host] : .NET Core 2.1.2 (CoreCLR 4.6.26628.05, CoreFX 4.6.26629.01), 64bit RyuJIT
  Clr    : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3131.0
  Core   : .NET Core 2.1.2 (CoreCLR 4.6.26628.05, CoreFX 4.6.26629.01), 64bit RyuJIT

Method Job Runtime N Mean Error StdDev Rank
_MurmurHash2Simple Clr Clr 1000 1,941.1 ns 16.336 ns 15.281 ns 7
_MurmurHash2InlineBitConverter Clr Clr 1000 1,249.7 ns 7.251 ns 6.782 ns 4
_MurmurHash2Unsafe Clr Clr 1000 387.5 ns 5.415 ns 4.800 ns 2
_xxHash Clr Clr 1000 2,791.2 ns 17.844 ns 16.691 ns 8
_xxHash2 Clr Clr 1000 2,820.5 ns 25.035 ns 19.546 ns 9
_MurmurHash2Simple Core Core 1000 803.6 ns 2.764 ns 2.586 ns 3
_MurmurHash2InlineBitConverter Core Core 1000 1,249.1 ns 15.606 ns 12.184 ns 4
_MurmurHash2Unsafe Core Core 1000 383.0 ns 1.749 ns 1.636 ns 1
_xxHash Core Core 1000 1,693.8 ns 16.848 ns 14.069 ns 6
_xxHash2 Core Core 1000 1,680.4 ns 6.565 ns 6.141 ns 5
_MurmurHash2Simple Clr Clr 10000 19,309.7 ns 183.592 ns 143.336 ns 14
_MurmurHash2InlineBitConverter Clr Clr 10000 12,535.8 ns 57.996 ns 48.429 ns 12
_MurmurHash2Unsafe Clr Clr 10000 3,748.7 ns 25.359 ns 21.176 ns 10
_xxHash Clr Clr 10000 28,138.2 ns 161.633 ns 134.971 ns 15
_xxHash2 Clr Clr 10000 28,212.8 ns 61.340 ns 57.378 ns 15
_MurmurHash2Simple Core Core 10000 7,969.3 ns 28.366 ns 25.146 ns 11
_MurmurHash2InlineBitConverter Core Core 10000 12,541.1 ns 65.146 ns 57.750 ns 12
_MurmurHash2Unsafe Core Core 10000 3,754.9 ns 15.327 ns 14.337 ns 10
_xxHash Core Core 10000 16,782.3 ns 113.966 ns 106.604 ns 13
_xxHash2 Core Core 10000 16,825.2 ns 42.787 ns 40.023 ns 13

@hazzik
Copy link
Member

hazzik commented Jul 17, 2018

Let's stick with MurMurHash2Simple

Remove cryptographic hash
@fredericDelaporte fredericDelaporte changed the title WIP - Generate stable sql objects names Generate stable sql objects names Jul 17, 2018
{
// Use a concatenation that guarantees uniqueness, even if identical names
// exist between all table and column identifiers.
var sb = new StringBuilder("table`" + table.Name + "`");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use StringBuilder.Append instead of concatenation of the string. Eg:

var sb = new StringBuilder("table`").Append(table.Name).Append("`");

// exist between all table and column identifiers.
var sb = new StringBuilder("table`" + table.Name + "`");
if (referencedTable != null)
sb.Append("references`" + referencedTable.Name + "`");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

public static readonly ColumnComparator Instance = new ColumnComparator();

public int Compare(Column col1, Column col2) {
return StringComparer.Ordinal.Compare(col1?.Name, col2?.Name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should compare on Column.CanonicalName. The Name of a Column class is case-sensitive when quoted and case-insensitive when not quoted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it depends on database. SQL Server 2017 keeps respecting the collation of the database (in fact, of its metadata tables), quoted or not.
It may also be case sensitive when unquoted, not only when quoted. And this varies from databases to databases.

Quoted identifier being case sensitive is an Oracle feature. So I think it is better taking the name "as is", relying on the user to map its objects in a consistent way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not talking about databases here, but about rules of NHibernate itself. Column has overridden Equals and GetHashCode operators which follow this rule. So we need to be consistent.

foreach (var column in alphabeticalColumns)
{
var columnName = column == null ? "" : column.Name;
sb.Append("column`").Append(columnName).Append("`");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, here should be CanonicalName prob.

// bindings are given.
var alphabeticalColumns = new List<Column>(columns);
alphabeticalColumns.Sort(ColumnComparator.Instance);
foreach (var column in alphabeticalColumns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use foreach (var column in columns.OrderBy(c=>c.Name))?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I have ported the Java code without thinking about it.

Simplify code and use canonical name
@bahusoid
Copy link
Member

Maybe we should include in generated key some truncated real parts like table and/or column names to minimize collisions risks on large databases and make generated keys more "user friendly"
Something like "FK_TABLEX_3B355A0C"

@hazzik
Copy link
Member

hazzik commented Jul 18, 2018

User has an ability to specify a name, so I would not bother

@fredericDelaporte fredericDelaporte merged commit 2f8f317 into nhibernate:master Jul 18, 2018
@fredericDelaporte fredericDelaporte deleted the sqlObjectsStableNames branch July 18, 2018 13:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants