HASH function (CORE-4436)

classic Classic list List threaded Threaded
35 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

HASH function (CORE-4436)

Adriano dos Santos Fernandes-3
Hi!

We have HASH function that returns a 64 bit integer. The algorithm is
bad and the result too small for a hash.

I propose to extend the same function with a second parameter with the
algorithm name.

When only one parameter is passed, things works as now.

When two parameters are passed, it will return a VARCHAR(64) CHARACTER
SET OCTETS. That's sufficient for a SHA-256, for example.

We implement (preferably getting code with compatible license or public
domain - like was done for SHA-1 used in authentication - instead of use
a bloated library) some algorithms (MD5, some useful SHA-*). No need for
"hash plugins" for now.


Adriano


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Leyne, Sean

> We have HASH function that returns a 64 bit integer. The algorithm is bad and
> the result too small for a hash.
>
> I propose to extend the same function with a second parameter with the
> algorithm name.
>
> When only one parameter is passed, things works as now.
>
> When two parameters are passed, it will return a VARCHAR(64) CHARACTER
> SET OCTETS. That's sufficient for a SHA-256, for example.
>
> We implement (preferably getting code with compatible license or public
> domain - like was done for SHA-1 used in authentication - instead of use a
> bloated library) some algorithms (MD5, some useful SHA-*). No need for
> "hash plugins" for now.

Would this approach have any performance advantages over using UDFs?


Sean


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Dmitry Yemanov-3
In reply to this post by Adriano dos Santos Fernandes-3
18.07.2017 21:55, Adriano dos Santos Fernandes wrote:
>
> We have HASH function that returns a 64 bit integer. The algorithm is
> bad and the result too small for a hash.

Yes, and we have CORE-4436 with related complains.

> I propose to extend the same function with a second parameter with the
> algorithm name.

Agreed.

> When two parameters are passed, it will return a VARCHAR(64) CHARACTER
> SET OCTETS. That's sufficient for a SHA-256, for example.

Would it make sense to reserve more bytes, 256 or 1024 octets for
example? Just to avoid extending the result every five years...

> We implement (preferably getting code with compatible license or public
> domain - like was done for SHA-1 used in authentication - instead of use
> a bloated library) some algorithms (MD5, some useful SHA-*). No need for
> "hash plugins" for now.

Agreed.


Dmitry


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Dmitry Yemanov-3
In reply to this post by Leyne, Sean
18.07.2017 22:00, Leyne, Sean wrote:
>
> Would this approach have any performance advantages over using UDFs?

People hate writing UDFs for common tasks. And IMHO getting a robust
hash belongs to this category.


Dmitry

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Adriano dos Santos Fernandes-3
On 18/07/2017 16:34, Dmitry Yemanov wrote:
> 18.07.2017 22:00, Leyne, Sean wrote:
>>
>> Would this approach have any performance advantages over using UDFs?
>
> People hate writing UDFs for common tasks. And IMHO getting a robust
> hash belongs to this category.
>
Moreover, we already have HASH and it's bad (can't be removed, but can
be improved).


Adriano


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Adriano dos Santos Fernandes-3
In reply to this post by Dmitry Yemanov-3
On 18/07/2017 16:32, Dmitry Yemanov wrote:
>
>
>> When two parameters are passed, it will return a VARCHAR(64) CHARACTER
>> SET OCTETS. That's sufficient for a SHA-256, for example.
>
> Would it make sense to reserve more bytes, 256 or 1024 octets for
> example? Just to avoid extending the result every five years...

Ok for me, I believe 256 would be good then, as very large column is
difficult to read in ISQL.


Adriano


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Dimitry Sibiryakov-3
In reply to this post by Dmitry Yemanov-3
18.07.2017 21:32, Dmitry Yemanov wrote:
> Would it make sense to reserve more bytes, 256 or 1024 octets for example? Just to avoid
> extending the result every five years...

   Just take care not to overflow index limit.


--
   WBR, SD.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Adriano dos Santos Fernandes-3
On 18/07/2017 16:48, Dimitry Sibiryakov wrote:
> 18.07.2017 21:32, Dmitry Yemanov wrote:
>> Would it make sense to reserve more bytes, 256 or 1024 octets for
>> example? Just to avoid extending the result every five years...
>
>   Just take care not to overflow index limit.
>
>

It will be described as VARCHAR, so real length depends on the algorithm
used by the user.


Adriano


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Dimitry Sibiryakov-3
18.07.2017 21:52, Adriano dos Santos Fernandes wrote:
> It will be described as VARCHAR, so real length depends on the algorithm
> used by the user.

   Index is built on full data length, not real one.


--
   WBR, SD.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Leyne, Sean
In reply to this post by Adriano dos Santos Fernandes-3


> On 18/07/2017 16:32, Dmitry Yemanov wrote:
> >
> >
> >> When two parameters are passed, it will return a VARCHAR(64)
> >> CHARACTER SET OCTETS. That's sufficient for a SHA-256, for example.
> >
> > Would it make sense to reserve more bytes, 256 or 1024 octets for
> > example? Just to avoid extending the result every five years...
>
> Ok for me, I believe 256 would be good then, as very large column is difficult
> to read in ISQL.

If we are going to support common HASH functions then we need to support all currently/commonly available.

In which case a length of at least 512 bits/128 bytes (for SHA-512 and SHA3-512) is required -- difficulty reading is the last thing that should be of concern.


Sean


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

firebird-devel mailing list
In reply to this post by Adriano dos Santos Fernandes-3
18.07.2017 21:55, Adriano dos Santos Fernandes wrote:

> Hi!
>
> We have HASH function that returns a 64 bit integer. The algorithm is
> bad and the result too small for a hash.
>
> I propose to extend the same function with a second parameter with the
> algorithm name.
> > When only one parameter is passed, things works as now.
>
> When two parameters are passed, it will return a VARCHAR(64) CHARACTER
> SET OCTETS. That's sufficient for a SHA-256, for example.

   Algorithm name could define result length. If alg name is passed as
parameter, we could use maximum known length (of all known algs) at
prepare time.

Just my 0.02 UAH,
Vlad

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Adriano dos Santos Fernandes-3
Em 18/07/2017 17:35, Vlad Khorsun via Firebird-devel escreveu:

>
>   Algorithm name could define result length.

Yes, for constants.


> If alg name is passed as
> parameter, we could use maximum known length (of all known algs) at
> prepare time.
>

Maximum known length means if a new algorithm is added, it will be
increased and that seems not good. It may break applications.

Firebird has a interesting but unused feature, which is:

- A system function can declare a fixed length, say VARCHAR(20)

- Actual function execution can return VARCHAR(30)

- User CAST function execution to return VARCHAR(30)

- It works! :)


Adriano

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Adriano dos Santos Fernandes-3
In reply to this post by Dimitry Sibiryakov-3
Em 18/07/2017 16:58, Dimitry Sibiryakov escreveu:
> 18.07.2017 21:52, Adriano dos Santos Fernandes wrote:
>> It will be described as VARCHAR, so real length depends on the algorithm
>> used by the user.
>
>   Index is built on full data length, not real one.
>

People will not be crazy to create computed index on hashes.

They will store hash on field of appropriate length for the given hash
and index it.


Adriano

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Dmitry Yemanov-3
In reply to this post by Adriano dos Santos Fernandes-3
19.07.2017 01:32, Adriano dos Santos Fernandes wrote:
>
>>
>>    Algorithm name could define result length.
>
> Yes, for constants.

We need to decide whether the algorithm name can be passed dynamically
(and thus be presented as "value" in the grammar) or must be predefined
(via a string literal or maybe token). The latter gives us more
flexibility regarding the result type. Tricks with runtime type coercion
are possible but require careful programming at the user side (proper
CASTs must present), I'd rather avoid that.


Dmitry

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Dimitry Sibiryakov-3
In reply to this post by Dimitry Sibiryakov-3
19.07.2017 2:47, Adriano dos Santos Fernandes wrote:
> They will store hash on field of appropriate length for the given hash
> and index it.

   On contrary, there is no point to store hash value if its purpose is to speed up lookup
only.

--
   WBR, SD.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Adriano dos Santos Fernandes-3
On 19/07/2017 05:27, Dimitry Sibiryakov wrote:
> 19.07.2017 2:47, Adriano dos Santos Fernandes wrote:
>> They will store hash on field of appropriate length for the given hash
>> and index it.
>
>   On contrary, there is no point to store hash value if its purpose is
> to speed up lookup only.
>

I hope you don't need to recreate your index and recompute whole table
hashes with a slow algorithm.

In most cases original string (password) is not even stored and hashes
are used to check if a password matches a previously one.


Adriano


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Adriano dos Santos Fernandes-3
In reply to this post by Dmitry Yemanov-3
On 19/07/2017 02:29, Dmitry Yemanov wrote:

> 19.07.2017 01:32, Adriano dos Santos Fernandes wrote:
>>
>>>
>>>    Algorithm name could define result length.
>>
>> Yes, for constants.
>
> We need to decide whether the algorithm name can be passed dynamically
> (and thus be presented as "value" in the grammar) or must be
> predefined (via a string literal or maybe token). The latter gives us
> more flexibility regarding the result type.

This is an interesting idea.

We can use special syntax (like some others system functions):

HASH( <value> [WITH <algorithm>] )

<algorithm> will be an identifier (not keyword) interpreted at compile time.


Adriano


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

firebird-devel mailing list


On 19.07.2017 14:07, Adriano dos Santos Fernandes wrote:

> On 19/07/2017 02:29, Dmitry Yemanov wrote:
>> 19.07.2017 01:32, Adriano dos Santos Fernandes wrote:
>>>>     Algorithm name could define result length.
>>> Yes, for constants.
>> We need to decide whether the algorithm name can be passed dynamically
>> (and thus be presented as "value" in the grammar) or must be
>> predefined (via a string literal or maybe token). The latter gives us
>> more flexibility regarding the result type.
> This is an interesting idea.
>
> We can use special syntax (like some others system functions):
>
> HASH( <value> [WITH <algorithm>] )
>
> <algorithm> will be an identifier (not keyword) interpreted at compile time.

If we are not going to use hash plugins I'd vote for this syntax.


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Leyne, Sean
In reply to this post by Adriano dos Santos Fernandes-3


> > We need to decide whether the algorithm name can be passed dynamically
> > (and thus be presented as "value" in the grammar) or must be
> > predefined (via a string literal or maybe token). The latter gives us
> > more flexibility regarding the result type.
>
> This is an interesting idea.
>
> We can use special syntax (like some others system functions):
>
> HASH( <value> [WITH <algorithm>] )

Why do we need to extend the current function?

Why not create separate, built-in, functions for each hash type with names* that align with the common algorithm name?

MD2()
...
MD5()
SHA0()
SHA1()
SHA_224()
...
SHA512_256()
...
SHA3_224()
...
SHA3_512()


(would save needing to look at documentation to determine the supported algorithms)?



Sean

* https://en.wikipedia.org/wiki/Comparison_of_cryptographic_hash_functions



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: HASH function (CORE-4436)

Adriano dos Santos Fernandes-3
On 19/07/2017 11:44, Leyne, Sean wrote:

>
>>> We need to decide whether the algorithm name can be passed dynamically
>>> (and thus be presented as "value" in the grammar) or must be
>>> predefined (via a string literal or maybe token). The latter gives us
>>> more flexibility regarding the result type.
>> This is an interesting idea.
>>
>> We can use special syntax (like some others system functions):
>>
>> HASH( <value> [WITH <algorithm>] )
> Why do we need to extend the current function?
>
> Why not create separate, built-in, functions for each hash type with names* that align with the common algorithm name?
>
Because "HASH" is an excellent namespace/container for hash functions.


Adriano


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
12
Loading...